A Windows XP help forum. PCbanter

Windows vim script to sort lines & remove duplicates & concatenate data based on a given field



 
 
June 17th 19, 04:42 AM, posted to alt.windows7.general, microsoft.public.windowsxp.general, comp.os.msdos.programmer
Arlen G. Holder

Save this script if you ever need to sort a text file based on a field.
o Windows vim script to sort lines & remove duplicates & concatenate data based on a given field

A recent thread asked this question, which, to my knowledge, has
absolutely ZERO workable solutions on Windows that stay wholly inside the
freeware vi/Vim/gVim text editor. That is despite the question having been
asked many times on the net over the years, and despite it being trivial
to do on Linux, but VERY (very) hard to do inside of Vim on Windows.

The question was how to create a file of unique domains from autogenerated
data covering multiple domains, where the domain is the first
space-separated field and each line has the generic format:
http://domain1.com (various data fields)
http://domain1.com (more data fields)
http://domain1.com (stuff)
http://domain2.net (random data fields)
http://domain3.xyz (more data fields aplenty)
The original file was 10,000 lines long, so a program was needed not only
to eliminate duplicate domains but also to concatenate the domain-specific
fields on a single line, resulting in a file of the format:
http://domain1.com (various data fields) (more data fields) (stuff)
http://domain2.net (random data fields)
http://domain3.xyz (more data fields aplenty)

This is the alt-os-linux thread on the subject:
o Finding first field duplicate lines in a sorted text file without uniq
or awk or col using only vim - is it possible?
https://groups.google.com/forum/#!topic/alt.os.linux/aZlsGxn_nEE

And this is the solution, which, I repeat, has NEVER been done on Windows
to my knowledge, without adding Linux commands (like Cygwin) or without
using Access or Excel or some other non-VIM database tool to eliminate
duplicates based on a given field.
[START HERE]
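" Join every later line that starts with the same first field onto the
" current line. a:blah is used as a search pattern, so "." matches anything.
function! C(blah)
  " Capture the ":s ... n" report, e.g. "3 matches on 3 lines".
  redir => cnt
  silent exe "%s#^" . a:blah . "##gn"
  redir END
  " The captured text begins with a newline, so the count starts at index 1.
  let res = strpart(cnt, 1, stridx(cnt, " "))
  let i = 0
  " One join per duplicate, so run the macro (count - 1) times.
  while i < res - 1
    normal! @q
    let i += 1
  endwhile
endfunction

" Walk down the buffer, merging the duplicates of each line's first field.
function! All()
  let i = 0
  while i < g:dc
    " Yank the first WORD (the domain plus trailing space) into register a.
    normal! "ayW
    call C(getreg("a"))
    normal! j
    let i += 1
  endwhile
endfunction

" Upper bound on the number of lines to process.
let dc = 10000
" Macro: delete the next line's first WORD, then join that line onto this one.
let @q='0jdWkJ0'
" Sort and drop identical lines so lines for the same domain are adjacent.
sort u
normal! gg
call All()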
[END HERE]

NOTE: Set the "dc" variable to at least the number of lines in your file.
NOTE: Source the file within VIM (e.g., :source doitall.vim).
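
For comparison, here is a minimal sketch of a different technique, not the macro-based script above. It groups lines in a Vim dictionary keyed on the first field, so it needs no "dc" line count and treats the domain as literal text rather than as a search pattern. The names MergeByFirstField and MergeBuffer are made up for this illustration. Unlike "sort u", it keeps the domains in first-seen order instead of sorting them.

```vim
" Sketch only: MergeByFirstField and MergeBuffer are hypothetical names,
" not part of the original post.

" Return a new list in which lines sharing the same first
" space-separated field are merged into a single line.
function! MergeByFirstField(lines)
  " First fields, in first-seen order.
  let l:order = []
  " Maps each first field to the list of data strings seen for it.
  let l:data = {}
  for l:line in a:lines
    " Drop leading whitespace and skip blank lines.
    let l:text = substitute(l:line, '^\s\+', '', '')
    if l:text ==# ''
      continue
    endif
    let l:key = matchstr(l:text, '^\S\+')
    let l:rest = substitute(l:text, '^\S\+\s*', '', '')
    if !has_key(l:data, l:key)
      call add(l:order, l:key)
      let l:data[l:key] = []
    endif
    " Keep each distinct data string once per first field.
    if l:rest !=# '' && index(l:data[l:key], l:rest) < 0
      call add(l:data[l:key], l:rest)
    endif
  endfor
  let l:out = []
  for l:key in l:order
    call add(l:out, join([l:key] + l:data[l:key], ' '))
  endfor
  return l:out
endfunction

" Replace the current buffer's contents with the merged lines.
function! MergeBuffer()
  let l:merged = MergeByFirstField(getline(1, '$'))
  silent %delete _
  call setline(1, l:merged)
endfunction
```

To try it, save the sketch to a file, :source it with the data file open, then run :call MergeBuffer().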

Note: The problem set sounds simple until you actually _try_ to solve it.
https://stackoverflow.com/questions/1915636/is-there-a-way-to-uniq-by-column
https://stackoverflow.com/questions/22849757/how-to-delete-duplicated-rows-based-in-a-column-value
https://unix.stackexchange.com/questions/104525/sort-based-on-the-third-column
https://stackoverflow.com/questions/17847799/sort-and-remove-duplicates-based-on-column
https://superuser.com/questions/416134/how-can-i-sort-objects-by-third-column-in-powershell
https://www.biostars.org/p/271720/
https://unix.stackexchange.com/questions/77406/sort-only-on-the-second-column
https://unix.stackexchange.com/questions/171091/remove-lines-based-on-duplicates-within-one-column-without-sort
https://www.unix.com/shell-programming-and-scripting/179059-remove-duplicate-lines-based-field-sort.html
https://stackoverflow.com/questions/33265934/duplicates-in-an-unix-text-file-based-on-multiple-fields
https://superuser.com/questions/332268/windows-7-sorting-contents-of-a-file/332269
https://stackoverflow.com/questions/22306490/how-to-sort-csv-by-columns-using-batch-scripting
https://stackoverflow.com/questions/12850909/how-to-remove-duplicates-in-a-csv-file-based-on-two-columns
https://stackoverflow.com/questions/1450085/list-only-duplicate-lines-based-on-one-column-from-a-semi-colon-delimited-file
https://stackoverflow.com/questions/6438896/sorting-data-based-on-second-column-of-a-file
https://www.mathworks.com/matlabcentral/answers/278956-sort-matrix-based-on-unique-values-in-one-column
https://www.ultraedit.com/support/tutorials-power-tips/ultraedit/advanced-column-based-sort.html

All times are GMT +1.