Processing files with awk, part two

In this second of two columns on the awk programming utility, we show you how to print reports with awk's print and printf commands.

Summary
This is the second of two parts on awk, so if you missed the first part in last month's issue it's advisable to review it (see Resources below). Awk is a text processing utility that runs through a text file by reading and processing a record at a time. This month we show you how to print and format a user list with awk. (2,600 words)

 


One more piece of awk syntax will make it an even more useful tool. I said in last month's column that awk treats the spaces in a record as a field separator. It is possible to change the field separator to another value.

Figure 1 is an example of a passwd file. The password itself in this example is replaced with a single exclamation mark. This file has several separate fields in it, but the field separator is a colon (:) rather than spaces.

Figure 1

root:!:0:1:Super User:/:
daemon:!:1:1:System Daemons:/etc
lbw:!:209:200:Lavinia Bowder Washinton:/home/lbw:/bin/csh
bob:!:210:200:Robbie Cramer:/home/bob:/bin/ksh
joann:!:213:200:Jo Ann Batson:/home/joann:/bin/ksh
jlan:!:214:200:Jack Landon:/home/jlan:/bin/ksh
jank:!:215:200:Jan Kingly:/home/jank:/bin/ksh
ljn:!:216:200:Laura Nugent:/home/ljn:/bin/ksh
mjb:!:220:200:Mo Budlong:/home/mjb:/bin/ksh
bda:!:235:500:Basic Development Accnt:/home/bda:/bin/ksh
obrero:!:245:500::/home/obrero:/bin/ksh
guest1:!:501:500:Guest1 Account:/disk2/guest1:/bin/ksh
guest2:!:502:500:Guest2 Account:/disk2/guest2:/bin/ksh
guest3:!:503:500:Guest3 Account:/disk2/guest3:/bin/ksh
beb:!:248:202:Becky E Brown  :/home/beb:/bin/ksh

A passwd file can be used as the input file for an awk report by changing the field separator. Figure 2 is a short example. There are two points to notice.

First, look at the logic in BEGIN{FS=":"}. In awk, FS is a pre-defined variable that contains the field separator. If you make no changes to it, FS is set to a single space, which awk treats as meaning any run of spaces or tabs. In this listing, the BEGIN logic sets FS to a colon (:), so the value of the field separator is changed before the first record is read. This allows the passwd records to be broken into fields at the colons.

The second point to notice is on line 3 of Figure 2. In all previous examples the file was piped into awk, as in "ls -l|awk etc." In this example, the file is named explicitly by placing it on the command line after the closing single quote at the end of the awk commands. Awk can take its input from a pipe, as in the previous examples, or from one or more explicitly named files, as in Figure 2.

Remember to type a TAB wherever you see the ^ mark.

Figure 2

awk '
BEGIN{FS=":"}
{print $1 "  ^" $5}' /etc/passwd

Unless you are in the C shell, the closing quote ends multiline input so be sure to type the closing quote, followed by a space and followed by the name of the file.
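If you would rather experiment without reading the real /etc/passwd, the same FS technique can be tried on a couple of sample records. The following is only a minimal sketch for the Bourne or Korn shell. The shell variable names users and report are made up for this illustration, and it uses the "\t" escape inside the awk string in place of a typed TAB, which most versions of awk should accept.

```shell
# Two sample records in passwd format (copied from Figure 1).
users='root:!:0:1:Super User:/:
bob:!:210:200:Robbie Cramer:/home/bob:/bin/ksh'

# Set FS to a colon before the first record is read, then print
# the user id and the user name with a tab between them.
report=$(printf '%s\n' "$users" | awk '
BEGIN{FS=":"}
{print $1 "\t" $5}')

printf '%s\n' "$report"
```

Here the records reach awk through a pipe rather than from a named file, so no file name follows the closing quote.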

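Figure 3 is the same example using C shell continuation characters: in the C shell, each line inside the quotes except the last must end with a backslash. The example shown in Figure 2 works correctly in the Bourne and Korn shells. Figure 4 gives you two further examples for those shells, one version that works and another that won't.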

Figure 3

awk ' \
BEGIN{FS=":"} \
{print $1 "  ^" $5}' /etc/passwd

Figure 4

awk '
BEGIN{FS=":"}
{print $1 "  ^" $5}
' /etc/passwd        < this works as multiline input is still active

awk '
BEGIN{FS=":"}
{print $1 "  ^" $5}' < multiline input ends here
/etc/passwd          < this won't work multiline input
                      ended on the previous line

Figure 5 is a sample output from Figure 2 or Figure 3 for the C shell. The awk script selects field $1 which is the user id, and field $5 which is the user name and prints them with a tab between them.

Figure 5

root         Super User
daemon       System Daemons
lbw          Lavinia Bowder Washinton
bob          Robbie Cramer
joann        Jo Ann Batson
jlan         Jack Landon
jank         Jan Kingly
ljn          Laura Nugent
mjb          Mo Budlong
bda          Basic Development Accnt
obrero       
guest1       Guest1 Account
guest2       Guest2 Account
guest3       Guest3 Account
beb          Becky E Brown

Awk has a number of pre-defined variables. You have already seen FS. Another useful one is NR. This is a variable that contains the number of the current record. It is incremented by 1 as each record is read. You may use this to number the output records as in Figure 6, the output of which would look like Figure 7.

Figure 6

awk '
BEGIN{FS=":"}
{print NR ".   ^" $1 "  ^" $5}' /etc/passwd

Figure 7

1.     root         Super User
2.     daemon       System Daemons
3.     lbw          Lavinia Bowder Washinton
4.     bob          Robbie Cramer
5.     joann        Jo Ann Batson
6.     jlan         Jack Landon
7.     jank         Jan Kingly
8.     ljn          Laura Nugent
9.     mjb          Mo Budlong
10.    bda          Basic Development Accnt
11.    obrero       
12.    guest1       Guest1 Account
13.    guest2       Guest2 Account
14.    guest3       Guest3 Account
15.    beb          Becky E Brown
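To watch NR count up on something smaller than a full passwd file, a sketch along the following lines may help. It assumes a Bourne or Korn shell, and the shell variable names users and numbered are invented for the example. The layout is simplified to a period and a single space after the number.

```shell
# Three sample records in passwd format (copied from Figure 1).
users='root:!:0:1:Super User:/:
daemon:!:1:1:System Daemons:/etc
lbw:!:209:200:Lavinia Bowder Washinton:/home/lbw:/bin/csh'

# NR holds the number of the record currently being processed.
numbered=$(printf '%s\n' "$users" | awk '
BEGIN{FS=":"}
{print NR ". " $1}')

printf '%s\n' "$numbered"
```

With three input records, the numbers printed run from 1 to 3 with no gaps.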

You may also use NR in the END logic. After the last record is read, NR is left set to the number of the last record, which is the total number of records read. Figure 8 would produce output that looks like Figure 9.

Figure 8

awk '
BEGIN{FS=":"}
{print $1 "  ^" $5}
END{print "Total users = " NR}' /etc/passwd

Figure 9

root         Super User
daemon       System Daemons
lbw          Lavinia Bowder Washinton
bob          Robbie Cramer
joann        Jo Ann Batson
jlan         Jack Landon
jank         Jan Kingly
ljn          Laura Nugent
mjb          Mo Budlong
bda          Basic Development Accnt
obrero       
guest1       Guest1 Account
guest2       Guest2 Account
guest3       Guest3 Account
beb          Becky E Brown  
Total users = 15
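One point worth keeping in mind is that NR counts every record awk reads, so a blank line in the input would be counted too. The sketch below illustrates the difference under some assumptions: the sample data has a blank line added on purpose, the awk variable n and the shell variable names are invented for the example, and the NF > 0 test is just one possible way to skip empty records.

```shell
# Two user records with a blank line between them.
users='root:!:0:1:Super User:/:

bob:!:210:200:Robbie Cramer:/home/bob:/bin/ksh'

# NF is the number of fields in the current record; a blank line has none.
totals=$(printf '%s\n' "$users" | awk '
BEGIN{FS=":"; n = 0}
NF > 0 {n = n + 1}
END{print "Records read = " NR; print "Total users = " n}')

printf '%s\n' "$totals"
```

For a clean passwd file the two numbers should agree, and NR alone is enough.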

Complex reporting: using printf to make it look right
The awk print command is good enough for a lot of reporting, but when it comes to more complex or longer print layouts involving tidy columns of information, you need something more powerful. The intent of Figure 10 is to print four columns of information from the /etc/passwd file -- user id, name, home path, and login shell. The columns are separated by tabs. The actual output looks something like Figure 11. A single tab is not enough to produce decent alignment when the fields are of substantially varying lengths.

Figure 10

awk '
BEGIN{FS=":";print "User  ^Name  ^Home  ^Shell"}
{print $1 "  ^" $5 " ^" $6 "  ^" $7}
END{print "Total users = " NR}' /etc/passwd

Figure 11

User  Name  Home  Shell
root  Super User  /
daemon      System Daemons    /etc
lbw   Lavinia Bowder Washinton     /home/lbw   /bin/csh
bob   Robbie Cramer     /home/bob   /bin/ksh
joann Jo Ann Batson     /home/joann /bin/ksh
jlan  Jack Landon /home/jlan  /bin/ksh
jank  Jan Kingly  /home/jank  /bin/ksh
ljn   Laura Nugent      /home/ljn   /bin/ksh
mjb   Mo Budlong  /home/mjb   /bin/ksh
bda   Basic Development Accnt /home/bda    /bin/ksh
obrero            /home/obrero      /bin/ksh
guest1      Guest1 Account    /disk2/guest1     /bin/ksh
guest2      Guest2 Account    /disk2/guest2     /bin/ksh
guest3      Guest3 Account    /disk2/guest3     /bin/ksh
beb   Becky E Brown       /home/beb   /bin/ksh
Total users = 15

To handle this it is necessary to use the other awk print command which is printf (print formatted). The printf command is similar to the printf command of the C programming language, but a simplified explanation of the command is in order for those who do not know C.

The printf command is executed by providing a format string and a list of the values to be printed using the format string. These are separated by commas as in:

printf "format_string", $1, $3, $6, $7

Some versions of awk require parentheses around the arguments as in:

printf("format_string", $1, $3, $6, $7)

It is always safe to include the parentheses.

The values that can be used in a format string are very extensive and can format data in all sorts of ways, but for simple reports, the most useful format is the fixed width string.

A fixed width string field starts with a percent sign (%). If a minus sign (-) follows, then the printed data is left-justified within the fixed width of the field. Most string data is left-justified, so you should usually include the minus sign. The next part of the format is the length of the field, and finally an `s' ends the formatting. An example of this would be "%-30s" which is a field containing 30 left-justified characters. Using this format string with printf would look something like:

printf("%-30s",$1)

This would print field $1 in a left-justified, 30-character field space.

If field $1 does not contain 30 characters, then the field is padded with spaces until 30 character spaces are filled. One big advantage of a format string is that you can force a field to always print with a certain width by filling unused portions of the field with spaces. You may combine multiple format fields in a format string as in:

printf("%-20s%-30s", $1, $2)

This example will take field $1 and place it, left-justified into the first printing position. The field will be padded until it is 20 characters long. Then field $2 will be appended and padded out to 30 characters. This guarantees that columns will line up under one another. The format string for each field should be long enough to accommodate the largest value that will be placed in the field.

There is one small hitch in printf. The print command automatically prints a newline at the end of each print statement. The printf command does not, so you must explicitly end the format string with a newline "\n".
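The padding and newline behavior can be made visible with a short experiment. The sketch below is only an illustration: the square brackets are added to show where each field begins and ends, the widths 8 and 16 are arbitrary, and the shell variable name out is made up. It also shows what seems to happen when a value is longer than its field: the width acts as a minimum, so the value is printed in full rather than cut off.

```shell
out=$(echo 'bob:Robbie Cramer' | awk '
BEGIN{FS=":"}
{
    printf("[%-8s][%-16s]\n", $1, $2)   # both values padded with spaces
    printf("[%-8s]\n", $2)              # longer than 8: printed in full
    printf("%s", $1)                    # no newline here ...
    printf("%s\n", " <- same line")     # ... so this continues the line
}')

printf '%s\n' "$out"
```

The second line of output is the reason to choose a width that fits the largest value: a field that overflows pushes everything after it out of alignment.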

Using these rules, let's create a format string for the four fields that we want to print from the /etc/passwd file. In Figure 12 I have taken the four fields, found the longest example, made a guess as to a safe width to use, and then created a format string that is one character longer than the safe width. This allows for a minimum of a single space between fields.

Figure 12

Field      Longest   Safe Width   Format
User id    6         10           "%-11s"
Name       24        30           "%-31s"
Home       13        15           "%-16s"
Shell      8         15           "%-16s"
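Rather than hunting for the longest value by eye, awk itself can be asked to measure the fields. The following is a rough sketch, not part of the original listings: it uses awk's built-in length function, the awk variable names id, name, home and shell are invented for the example, and it runs on three sample records from Figure 1 rather than on a real passwd file.

```shell
# Three sample records in passwd format (copied from Figure 1).
users='root:!:0:1:Super User:/:
lbw:!:209:200:Lavinia Bowder Washinton:/home/lbw:/bin/csh
guest1:!:501:500:Guest1 Account:/disk2/guest1:/bin/ksh'

# Remember the longest value seen so far in each of the four fields.
widths=$(printf '%s\n' "$users" | awk '
BEGIN{FS=":"; id = 0; name = 0; home = 0; shell = 0}
{
    if (length($1) > id)    id    = length($1)
    if (length($5) > name)  name  = length($5)
    if (length($6) > home)  home  = length($6)
    if (length($7) > shell) shell = length($7)
}
END{print "User id " id ", Name " name ", Home " home ", Shell " shell}')

printf '%s\n' "$widths"
```

The numbers reported are only the longest values in the data at hand, so it still makes sense to round them up to a safe width as described above.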

The next step is to combine all of the fields into one long format string and append a newline.

printf("%-11s%-31s%-16s%-16s\n")

Finally list the fields to be printed with separating commas.

printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)

For your version of awk the format string and list of values after printf may not need to be enclosed in parentheses as in:

printf "%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7

It is always safe to use the parentheses, but in many versions of awk you do not need them.

Figure 13 is the first version of the awk script using printf. It does not include column titles.

Figure 13

awk '
BEGIN{FS=":"}
{printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd

Figure 14 is the C shell version of the same listing.

Figure 14

awk ' \
BEGIN{FS=":"} \
{printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)} \
END{print "Total users = " NR}' /etc/passwd

Adding column titles involves ensuring that the column titles actually line up with the fields in the format string. Figure 15 uses a simple trick to ensure that the column titles do align. The values used by printf to fill a format string when printing do not need to be variables. They can also be strings. The header or title line can be created by using the same format string that was used in the body of the report.

Figure 15

awk '
BEGIN{FS=":";
printf("%-11s%-31s%-16s%-16s\n","User","Name","Home","Shell")}
{printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd

The output from Figure 15 is shown in Figure 16 -- it's a much more readable and useful output.

Figure 16

User       Name                           Home            Shell
root       Super User                     /
daemon     System Daemons                 /etc
lbw        Lavinia Bowder Washinton       /home/lbw       /bin/csh
bob        Robbie Cramer                  /home/bob       /bin/ksh
joann      Jo Ann Batson                  /home/joann     /bin/ksh
jlan       Jack Landon                    /home/jlan      /bin/ksh
jank       Jan Kingly                     /home/jank      /bin/ksh
ljn        Laura Nugent                   /home/ljn       /bin/ksh
mjb        Mo Budlong                     /home/mjb       /bin/ksh
bda        Basic Development Accnt        /home/bda       /bin/ksh
obrero                                    /home/obrero    /bin/ksh
guest1     Guest1 Account                 /disk2/guest1   /bin/ksh
guest2     Guest2 Account                 /disk2/guest2   /bin/ksh
guest3     Guest3 Account                 /disk2/guest3   /bin/ksh
beb        Becky E Brown                  /home/beb       /bin/ksh
Total users = 15

In case you're offended by Figure 15
Before I put this article to bed, there is one thing in Figure 15 that offends me as a programmer. The format string appears twice, on lines 3 and 4. From a programming standpoint this is not optimal. If you need to change the report layout you have to modify the format string in both places, and that invites typographical errors.

You will recall from one of the earlier examples that we used a variable to save the total bytes for all files that were listed. Why not create a variable that contains the format string? In Figure 17 the format string has been assigned to a variable named format as part of the BEGIN logic. In the printf commands, the variable "format" is used as the format string for both the title line and the individual record lines instead of a literal format string. The output is exactly the same as Figure 16. Figure 18 is the C shell version.

Figure 17

awk '
BEGIN{FS=":";
format = "%-11s%-31s%-16s%-16s\n";
printf(format,"User","Name","Home","Shell")}
{printf(format,$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd

Figure 18

awk ' \
BEGIN{FS=":"; \
format = "%-11s%-31s%-16s%-16s\n"; \
printf(format,"User","Name","Home","Shell")} \
{printf(format,$1,$5,$6,$7)} \
END{print "Total users = " NR}' /etc/passwd

So far all the examples I have given have been typed directly at the command line. You may also open a file with vi and type the lines exactly as given in Figure 17. Add an initial line that forces a Bourne or Korn shell to execute the commands, as in Figure 19, and save the file as userlist.

Figure 19

#!/bin/ksh
# (or /bin/sh)
awk '
BEGIN{FS=":";
format = "%-11s%-31s%-16s%-16s\n";
printf(format,"User","Name","Home","Shell")}
{printf(format,$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd

Change the execution privileges using:

chmod a+x userlist

and you now have a script that will display a user list any time you type "userlist" (provided the directory holding the script is in your PATH). You may also send the output to a file using redirection as in:

userlist >userlist.txt

or to a printer using one of the printer pipes such as:

userlist|lp

In Figure 19 I created a shell script that executed an awk command on a specific file. This is not a true awk script, but a shell script that executed awk. An awk script includes only the awk commands. Assume for a moment that for security reasons, a copy of the /etc/passwd file is saved every week, allowing a running record of who had access to the system at any time in the past. An awk script could be created by using only the awk commands in Figure 19. This would look like Figure 20. Save this file as userfmt.awk or some similar name to identify it as containing awk commands.

Figure 20

BEGIN{FS=":";
format = "%-11s%-31s%-16s%-16s\n";
printf(format,"User","Name","Home","Shell")}
{printf(format,$1,$5,$6,$7)}
END{print "Total users = " NR}

To execute the awk script, use a -f switch to identify the awk script as in:

awk -f userfmt.awk /etc/passwd

Using this awk script you can process any earlier saved versions of the passwd file as in:

awk -f userfmt.awk /old/passwd.970404 >users_970404.txt
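The -f approach can be tried end to end without touching any real files. The sketch below is one way to do it in a Bourne or Korn shell, under a few assumptions: it relies on the mktemp command being available to create a scratch directory, and the file name passwd.sample and the shell variable names are invented for the example.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# The awk commands from Figure 20, saved as an awk script.
cat > "$workdir/userfmt.awk" <<'EOF'
BEGIN{FS=":";
format = "%-11s%-31s%-16s%-16s\n";
printf(format,"User","Name","Home","Shell")}
{printf(format,$1,$5,$6,$7)}
END{print "Total users = " NR}
EOF

# A stand-in for a saved copy of the passwd file.
cat > "$workdir/passwd.sample" <<'EOF'
root:!:0:1:Super User:/:
bob:!:210:200:Robbie Cramer:/home/bob:/bin/ksh
EOF

# Run the awk script against the named input file.
listing=$(awk -f "$workdir/userfmt.awk" "$workdir/passwd.sample")
printf '%s\n' "$listing"

rm -r "$workdir"
```

Because the awk script names no input file itself, the same userfmt.awk can be pointed at any file in passwd format.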

Believe it or not, these two articles only scratch the surface of awk. An excellent book on the subject is sed & awk, published by O'Reilly and Associates, Inc (see Resources below). If you intend to pursue awk further, I strongly recommend the book.

Copyright©2001 King Computer Services Inc. All rights reserved.