bash tip: collapse or parse a big text doc into individual sorted words from columns

Start with list.txt like:

server7858   server7858   server7858   server7858   server7861   server7860   server8310   server8310   server7863   server8311

server7859   server7859   server7859   server7859   server8781   server8676   server8677   server8677   server8679   server8782

It has duplicates, long lines, and other crap. Run this:

rm -f list2.txt

rm -f list3.txt

for word in $(cat list.txt); do echo "$word"; done | sort | uniq >> list2.txt

sed -i.bak -e 's/ //g' list2.txt

cat list2.txt | sort | uniq > list3.txt

vi list3.txt

ta da!
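The whole thing can also be done in one pipeline. A minimal sketch (the sample data and filenames are made up):

```shell
#!/bin/sh
# Fake sample input: duplicate words spread across columns.
printf 'server7858   server7858   server7861\nserver7858   server7860\n' > list.txt

# tr -s squeezes the runs of spaces into single newlines;
# sort -u sorts and removes duplicates in one step.
tr -s ' ' '\n' < list.txt | sort -u > list3.txt

cat list3.txt
```

This skips the intermediate files and the sed cleanup pass entirely.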

if you need word counts and such, swap uniq for uniq -c (piping through wc only gives you totals, not per-word counts)
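A quick sketch of the counting variant, on made-up sample words: sort groups the duplicates, uniq -c tallies each group, and a final sort -rn lists the most frequent first.

```shell
#!/bin/sh
# Fake sample input with repeated words.
printf 'apple banana apple\nbanana apple\n' > words.txt

# sort groups duplicates together; uniq -c prefixes each unique word
# with its count; sort -rn puts the biggest count on top.
tr -s ' ' '\n' < words.txt | sort | uniq -c | sort -rn
```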

If you need to collapse multi-line data like this:

fldcvisla8524:

packages.MQSeriesServer.installdate: 1439579830

fldcvfsla13746:

packages.MQSeriesServer.installdate: 1486575523

Into:

fldcvisla8524:  packages.MQSeriesServer.installdate: 1439579830

fldcvfsla13746:  packages.MQSeriesServer.installdate: 1486575523

Then try this:

paste -s -d ' \n' list-mq.txt >> list-mq-out.txt

If you have to adapt to collapsing every 3rd, 4th, or 5th line (or whatever), add more spaces before the \n in the delimiter list: paste cycles through the list, using one delimiter per line break, and the \n ends each group. Try again until it loops properly.
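A sketch of the two-line case (hostnames and values are made up). paste -s reads one file serially and cycles through the -d list, so the space joins each pair of lines and the \n closes the group:

```shell
#!/bin/sh
# Fake input: two lines per record (hostname, then attribute line).
printf 'hostA:\npackages.foo.installdate: 111\nhostB:\npackages.foo.installdate: 222\n' > list-mq.txt

# -s joins all lines of one file; -d ' \n' alternates space, newline,
# collapsing every pair of lines onto one line.
paste -s -d ' \n' list-mq.txt
```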

This one is useful: it searches for a pattern and then combines the following lines. It needs tweaking for your use case:

cat temp.txt | sed -n '/+version/ {s/.*//; N; N; s/\n//g; p;}'
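To see how that sed behaves, here is a sketch on made-up data: on the line matching the pattern it empties the line (s/.*//), pulls the next two lines into the pattern space (N; N), deletes the embedded newlines, and prints the combined result.

```shell
#!/bin/sh
# Fake input: a marker line followed by two continuation lines.
printf 'junk\n+version\nfoo \nbar\nmore junk\n' > temp.txt

# -n suppresses normal output; only the combined line is printed.
sed -n '/+version/ {s/.*//; N; N; s/\n//g; p;}' temp.txt
# prints: foo bar
```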

Remember, the paste command can also combine two files side by side!

Join too.

file1.txt:

January

February

file2.txt:

01 1970

07 1967

paste file1.txt file2.txt > file3.txt

gives file3.txt:

January 01 1970

February 07 1967

join will match up key fields from two separate files; it's a SQL join across text files!
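A minimal sketch of join (the keys, values, and filenames are made up; both inputs must be sorted on the join field):

```shell
#!/bin/sh
# Fake inputs sharing a key in field 1; join requires sorted input.
printf 'alpha 1970\nbeta 1967\n' > years.txt
printf 'alpha January\nbeta February\n' > months.txt

# By default join matches on field 1 of each file and prints the key
# followed by the remaining fields from both files.
join years.txt months.txt
# prints:
# alpha 1970 January
# beta 1967 February
```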

more metadata, this one is hard to find:

bash, list, word, parse, parsing, text, columns, sorting, collapse, words, shell, script, split, line, lines, grep, multi, multiple, multiline, multi-line, multi line, cut, paste
