The sed utility goes well beyond search and replace, however, and you can use it to mangle a file beyond all recognition.
Before I launch you on a career as file mangler extraordinaire, you should understand what
At the simplest level,
The last rule is perhaps the most calming, because, to the new user,
Let's look at the process a little more closely. The
Assume that you have a text file in which you want to change all examples of
s/ms-dos/MS-DOS/g
s/MS-DOS system/MS-DOS operating system/
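As a quick sketch (the two input lines are made up for illustration), you can hand both substitutions to sed with separate -e options. The commands run in order on each line, so the second one only matches because the first has already capitalized ms-dos:

```shell
#!/bin/sh
# Each s command sees the result of the one before it, so
# "ms-dos system" becomes "MS-DOS system" and then
# "MS-DOS operating system".
printf '%s\n' 'Runs under ms-dos.' 'Boot the ms-dos system.' |
  sed -e 's/ms-dos/MS-DOS/g' -e 's/MS-DOS system/MS-DOS operating system/'
```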
A
d (no address limit, deletes all lines)
1d (delete the first line)
10,15d (delete lines 10 through 15)
$d (delete the last line)
/Sacramento/d (delete any lines containing the word Sacramento)
/CUTHERE/,/TOHERE/d (delete everything from the line containing CUTHERE through the line containing TOHERE, inclusive)
/^$/d (delete all blank lines; ^=line start, $=line end)
/STOPHERE/,$d (delete everything from STOPHERE to end of input)
Now, let's say that you have attended an early run of Romeo and Ethel the Dancer, and the text of your review is ready for typesetting, as listed below. The playwright calls you in a panic to inform you that he has opted for a simpler name (Romeo and Juliet, of all things), and you are stuck with a long and glowing review that you must rewrite. The sample below is the first of the 200 pages you produced after getting a bit carried away by the performance. Note that a newline character terminates each line in the following text.
Romeo and Ethel the Dancer Moves Audience to Tears. I was treated to the off Broadway opening of Romeo and Ethel the Dancer. This moving story of star-crossed lovers had the audience in tears half way through the third act. Do not go to see this play without a hanky, but even the weeping from the back row could not diminish the brilliance of Romeo and Ethel the Dancer by William Shakespeare.
The first effort at a search-and-replace script would give you:
#romeo.sed
s/Romeo and Ethel the Dancer/Romeo and Juliet/g
If you review the text, you will see that this matches only the title of the article, because the other occurrences of the title are split across line breaks. The result is this confusing output:
Romeo and Juliet Moves Audience to Tears. I was treated to the off Broadway opening of Romeo and Ethel the Dancer. This moving story of star-crossed lovers had the audience in tears half way through the third act. Do not go to see this play without a hanky, but even the weeping from the back row could not diminish the brilliance of Romeo and Ethel the Dancer by William Shakespeare.
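Here is a minimal sketch of why that happens, using two hypothetical input lines that split the title after "the". sed applies the s command to one line at a time, so neither line ever contains the complete title:

```shell
#!/bin/sh
# Hypothetical two-line fragment with the title broken across lines.
printf '%s\n' 'opening of Romeo and Ethel the' 'Dancer. This moving story' |
  sed 's/Romeo and Ethel the Dancer/Romeo and Juliet/g'
# Both lines pass through unchanged: the pattern would have to
# match across the newline, and a plain s command never sees one.
```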
What is really needed here is some way of wrapping a search around line breaks. To do this effectively, we use a special syntax option of sed. An address can be used to identify a range of lines over which multiple commands are to be executed. The syntax is as follows:
[address]{
command
command
etc.
}
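A small sketch of the braces in action, assuming some made-up stage directions as input. The address /Romeo/ selects the lines, and both commands inside the braces run only on those lines; the middle line is printed untouched:

```shell
#!/bin/sh
# Hypothetical input; the grouped commands apply only to lines
# that match /Romeo/.
printf '%s\n' 'Romeo enters' 'Ethel dances' 'Romeo exits' | sed '/Romeo/{
s/Romeo/ROMEO/
s/^/> /
}'
```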
What we want is to execute a subroutine (or a sub-search-and-replace procedure) whenever the word
#romeo.sed
s/Romeo and Ethel the Dancer/Romeo and Juliet/g
/Romeo/{
# commands to run on each line containing Romeo go here
}
The
The following listing includes line numbers to make the explanation easier, although a
1 #romeo.sed
2 s/Romeo and Ethel the Dancer/Romeo and Juliet/
3 /Romeo/{
4 $!N
5 /Dancer/{
6 s/\n/ /
7 s/Romeo and Ethel the Dancer\. */Romeo and Juliet.\
8 /
9 s/Romeo and Ethel the Dancer */Romeo and Juliet\
10 /
11 }
12 }
Line 1 includes a comment identifying the script. Line 2 is the basic search-and-replace procedure that is being performed on the article. If the search text is contained entirely within one line, then this script line takes care of it.
Line 3 identifies any line containing
Line 4 looks strange, but we'll break it down. The
At line 5, an inner routine limits things even further. Having pulled another line, we only want to continue if the pattern buffer now also contains Dancer. When the next line is pulled into the pattern buffer, the newline character that separated the two lines is left in the buffer. Line 6 gets rid of that newline by replacing it with a space; if you don't, further search commands would have to take the newline into account. Replacing it with a space in this manner thus simplifies matters. If you replace the newline with nothing at all, then this:
Romeo and Ethel the
Dancer
becomes this:
Romeo and Ethel theDancer
Forcing a space to replace the newline creates the correct spacing:
Romeo and Ethel the Dancer
Lines 7 through 10 match two possible versions of the title: lines 7 and 8 match the title at the end of a sentence, while lines 9 and 10 match the title within a sentence.
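To see the difference the replacement makes, here is a sketch that joins a two-line fragment with $!N (append the next input line unless this is already the last one) and then removes the embedded newline in each of the two ways described above:

```shell
#!/bin/sh
# $!N leaves a newline between the joined lines.
# Replacing it with nothing glues "the" and "Dancer" together...
printf 'Romeo and Ethel the\nDancer\n' | sed '$!N;s/\n//'
# ...while replacing it with a space restores the correct spacing.
printf 'Romeo and Ethel the\nDancer\n' | sed '$!N;s/\n/ /'
```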
Two separate replacement texts are needed because the intent of the search and replace is to combine two lines of text that contain the words
I was treated to the off-Broadway opening of Romeo and Ethel the
Dancer. This moving story of star-crossed lovers
are combined in the pattern buffer and become this:
I was treated to the off-Broadway opening of Romeo and Ethel the Dancer. This moving story of star-crossed lovers
If the replacement action were simply to replace
I was treated to the off-Broadway opening of Romeo and Juliet
. This moving story of star-crossed lovers
By including two possible search texts, one with a period and one without, the newline can be placed either after the replacement text or after the period that ends the original search text. If the title were followed by a comma anywhere in the original article, a third version of the search text would be needed to handle that case. There is a syntax for matching a string followed by any punctuation, but it is beyond the scope of this article.
The title is changed and output with a period and a newline, or with a newline only. Look at the eighth and tenth lines in the listing below (the line numbers are removed so that the slashes appear in their true positions). Each of those lines begins with the slash that closes the replacement text. The complete replacement text includes everything up to that closing slash, including the embedded newline that follows the backslash at the end of the seventh and ninth lines.
#romeo.sed
s/Romeo and Ethel the Dancer/Romeo and Juliet/
/Romeo/{
$!N
/Dancer/{
s/\n/ /
s/Romeo and Ethel the Dancer\. */Romeo and Juliet.\
/
s/Romeo and Ethel the Dancer */Romeo and Juliet\
/
}
}
This gives us the effect we wanted.
Romeo and Juliet Moves Audience to Tears. I was treated to the off Broadway opening of Romeo and Juliet. This moving story of star-crossed lovers had the audience in tears half way through the third act. Do not go to see this play without a hanky, but even the weeping from the back row could not diminish the brilliance of Romeo and Juliet by William Shakespeare.
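To try the listing end to end, here is a small shell sketch. The script is copied from the listing above; the line breaks in the hypothetical review.txt are an assumption (the original sample's breaks were not preserved), chosen so that the title is split after "the" in the two places described earlier:

```shell
#!/bin/sh
# Write romeo.sed exactly as in the listing, without line numbers.
# The quoted 'EOF' keeps the trailing backslashes literal.
cat > romeo.sed <<'EOF'
#romeo.sed
s/Romeo and Ethel the Dancer/Romeo and Juliet/
/Romeo/{
$!N
/Dancer/{
s/\n/ /
s/Romeo and Ethel the Dancer\. */Romeo and Juliet.\
/
s/Romeo and Ethel the Dancer */Romeo and Juliet\
/
}
}
EOF

# Hypothetical input with assumed line breaks.
cat > review.txt <<'EOF'
Romeo and Ethel the Dancer Moves Audience to Tears.

I was treated to the off Broadway opening of Romeo and Ethel the
Dancer. This moving story of star-crossed lovers had the audience
in tears half way through the third act.
Do not go to see this play without a hanky, but even the
weeping from the back row could not diminish the brilliance
of Romeo and Ethel the
Dancer by William Shakespeare.
EOF

sed -f romeo.sed review.txt
```

The blank line after the headline matters in this sketch. The headline also contains Romeo, so without the blank line its $!N would pull in the "opening of Romeo and Ethel the" line, and the "Dancer." line after it would be read in a fresh cycle where /Romeo/ no longer matches, leaving that split title unchanged.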
Sometimes it's useful to read in the contents of a file rather than
specifying it in the
/include closing/{
r closing.txt
}
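As a sketch, with file names and contents that are purely hypothetical, here is the r command run against a small letter. Note that r does not replace the matching line: the line itself is still output, and the file's contents follow it:

```shell
#!/bin/sh
# Hypothetical files: closing.txt holds the text to pull in, and
# letter.txt marks the insertion point with "include closing".
printf '%s\n' 'Sincerely,' 'The Reviewer' > closing.txt
printf '%s\n' 'Thanks for reading.' 'include closing' 'P.S. Bring a hanky.' > letter.txt

# r queues closing.txt to be written after the current line is output.
# The filename runs to the end of the line, so } goes on its own line.
sed '/include closing/{
r closing.txt
}' letter.txt
```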
The
Let's say we need to interchange the two paragraphs of the sample text with which we have been working. For this example we'll add two lines to our Romeo and Juliet review.
Romeo and Juliet Moves Audience to Tears. LAST I was treated to the off Broadway opening of Romeo and Juliet. This moving story of star-crossed lovers had the audience in tears half way through the third act. TOP Do not go to see this play without a hanky, but even the weeping from the back row could not diminish the brilliance of Romeo and Juliet by William Shakespeare.
The following script will read in from the line containing
#romeo2.sed
/LAST/,/TOP/{
H
d
}
${
G
}
As each line is read in,
Romeo and Juliet Moves Audience to Tears. Do not go to see this play without a hanky, but even the weeping from the back row could not diminish the brilliance of Romeo and Juliet by William Shakespeare. LAST I was treated to the off Broadway opening of Romeo and Juliet. This moving story of star-crossed lovers had the audience in tears half way through the third act. TOP
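Here is a shortened, hypothetical version of the marked-up review run through the hold-space technique. On the last line this sketch uses G, which appends a newline plus the held lines after the current line; x would instead swap the two, so the final input line would survive only if it happened to be blank:

```shell
#!/bin/sh
# H appends each line from LAST through TOP to the hold space
# (the first H also adds a leading newline), and d keeps those
# lines out of the normal output.
cat > romeo2.sed <<'EOF'
#romeo2.sed
/LAST/,/TOP/{
H
d
}
${
G
}
EOF

# Hypothetical, shortened input.
cat > review2.txt <<'EOF'
Romeo and Juliet Moves Audience to Tears.
LAST
I was treated to the off Broadway opening of Romeo and Juliet.
This moving story of star-crossed lovers had the audience in tears.
TOP
Do not go to see this play without a hanky.
EOF

sed -f romeo2.sed review2.txt
```

The leading newline that H added on its first use shows up as a blank line between the two paragraphs, which is handy here.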
There is much more to
Copyright © 2001 King Computer Services Inc. All rights reserved.