parse a text file

Hello guys, I'm trying to parse a text file that contains information and I can't seem to get started. Does anyone have any good ideas?

For example, the text file looks like this:

idx=929392, pl= 12, name= someperson
age=19
state=ma
status=n/a

idx=929393, pl= 12, name= someperson2
age=20
state=ma
status=n/a

idx=929394, pl= 12, name= someperson3
age=21
state=ma
status=n/a


I want to write the name and age to another text file in this format:

someperson 19
someperson2 20
someperson3 21

Possibly also include other attributes next to the age, like idx?
Thanks for the help in advance!
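Let's look at a regular expression to match the name. The value can contain spaces, lower-case letters, and digits. That is:
 
[ a-z0-9]

(No commas or backslashes are needed inside the brackets; a comma there would match a literal comma.)

Age is similar.

If the records are in a file, x.txt, we can do the following (-o makes egrep print only the matching part of each line):
 
cat x.txt | egrep -o "name=[ a-z0-9]*|age=[ a-z0-9]*"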

That generates:
name= someperson
age=19
name= someperson2
age=20
name= someperson3
age=21
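To see why the output above contains only the name=... and age=... parts rather than whole lines, here is a small sketch comparing grep with and without -o on the first line of the sample. The variable names are just for illustration, and -o is a GNU/BSD grep option rather than a POSIX one, so this assumes one of those greps.

```shell
#!/bin/sh
# First line of the sample input from the question.
line='idx=929392, pl= 12, name= someperson'

# Without -o, grep prints the whole line that contains a match.
whole=$(printf '%s\n' "$line" | grep -E 'name=[ a-z0-9]*')

# With -o, grep prints only the part of the line that matched.
only=$(printf '%s\n' "$line" | grep -E -o 'name=[ a-z0-9]*')

printf 'without -o: %s\n' "$whole"
printf 'with -o:    %s\n' "$only"
```

Without -o, the later cut on = would pick up the idx value instead of the name.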


We separate the field name from the value by splitting on = with cut:
 
cut -d = -f 2


So applying that we get:
 
cat x.txt | egrep -o "name=[ a-z0-9]*|age=[ a-z0-9]*" | cut -d = -f 2

That gives us:
 someperson
19
 someperson2
20
 someperson3
21


We need to build one output line from each name/age pair. Enter AWK:
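{
    # i starts at 0 (unset in awk), so lines alternate: name, age, name, age, ...
    # Odd count: this line is an age, print it with the remembered name.
    if (i%2 == 1)
        printf("%s %s\n", name, $1);

    # Even count: this line is a name, remember it ($1 drops the leading space).
    if (i%2 == 0)
        name = $1;

    ++i;
}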


Putting it all together:
 
cat x.txt | egrep -o "name=[ a-z0-9]*|age=[ a-z0-9]*" | cut -d = -f 2 | awk '{ if (i%2 == 1) printf("%s %s\n", name, $1); if (i%2 == 0) name=$1; ++i }'

Output:
someperson 19
someperson2 20
someperson3 21
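Since the question asked for the result in another text file, here is a minimal sketch that recreates the sample input, wraps the pipeline in a function, and redirects the output to a file. The function name pair_name_age, the temporary directory, and the out.txt file name are assumptions made for this example, not part of the original posts.

```shell
#!/bin/sh
# Recreate the sample input from the question in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/x.txt" <<'EOF'
idx=929392, pl= 12, name= someperson
age=19
state=ma
status=n/a

idx=929393, pl= 12, name= someperson2
age=20
state=ma
status=n/a

idx=929394, pl= 12, name= someperson3
age=21
state=ma
status=n/a
EOF

# pair_name_age (hypothetical name): print "name age" for each record in file $1.
pair_name_age() {
    grep -E -o 'name=[ a-z0-9]*|age=[ a-z0-9]*' "$1" |   # keep only name=... and age=...
        cut -d = -f 2 |                                  # drop the field name
        awk '{ if (i%2 == 1) printf("%s %s\n", name, $1); else name = $1; ++i }'
}

# Write the result to another text file, then show it.
pair_name_age "$tmpdir/x.txt" > "$tmpdir/out.txt"
cat "$tmpdir/out.txt"
```

Like the one-liner, this assumes every record has exactly one name and one age, in that order.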
http://anaturb.net/C/string_exapm.htm
This page has good examples.
Call the cat home and raise:
sed -n '/name=/{ s/\(idx=[0-9]\+\).*\(name= [a-z0-9]\+\)/\2 \1/; H }; /age=/H; ${g; s/^\n//; s/name= //g; s/idx=\([0-9]\+\)\nage=\([0-9]\+\)/\2 \1/g; p}' x.txt
someperson 19 929392
someperson2 20 929393
someperson3 21 929394

Luckily, that program can be saved as a file:
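# x.sed
# On the idx/name line, reduce it to "name= <name> idx=<n>" and append it to the hold space.
/name=/{
  s/\(idx=[0-9]\+\).*\(name= [a-z0-9]\+\)/\2 \1/
  H
}

# Append each age line to the hold space as well.
/age=/H

# On the last input line, fetch everything collected and rewrite it as "name age idx".
${
  g
  s/^\n//
  s/name= //g
  s/idx=\([0-9]\+\)\nage=\([0-9]\+\)/\2 \1/g
  p
}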

# sed -n -f x.sed x.txt 
someperson 19 929392
someperson2 20 929393
someperson3 21 929394

The functionality obviously depends on the input having both a line with idx and name and a line with age for each "record". Note also that \+ in the patterns is a GNU sed extension.
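If some records might be missing a line, one way around that dependency is to let awk read a whole blank-line-separated record at a time. This is a different technique from the sed script above, offered only as a rough sketch: the function name extract_records is made up for the example, and it assumes values never contain commas or = signs.

```shell
#!/bin/sh
# extract_records (hypothetical name): print "name age idx" per record in file $1,
# skipping any record that lacks a name or an age.
extract_records() {
    awk '
    BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per blank-line-separated block
    {
        split("", f)               # clear the key/value table for this record
        for (i = 1; i <= NF; i++) {
            n = split($i, parts, /, */)          # "idx=1, pl= 12, name= x" -> 3 pieces
            for (j = 1; j <= n; j++) {
                eq = index(parts[j], "=")
                if (eq == 0) continue
                key = substr(parts[j], 1, eq - 1)
                val = substr(parts[j], eq + 1)
                gsub(/^ +| +$/, "", key)         # trim surrounding spaces
                gsub(/^ +| +$/, "", val)
                f[key] = val
            }
        }
        if (("name" in f) && ("age" in f))
            print f["name"], f["age"], f["idx"]
        # records without both name and age are silently skipped
    }' "$1"
}
```

Usage would be something like extract_records x.txt > out.txt. Because each record is looked up by key, adding another attribute such as state to the print line is a one-word change.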