sed: modify your text without opening your files
Unix-like systems such as Linux are rarely used to manage technical documentation. This is surprising, given the wealth of tools these platforms offer for manipulating text in all sorts of ways.
Consider the dialogue between M. Jourdain and his philosophy teacher in Molière’s Le Bourgeois Gentilhomme:
MONSIEUR JOURDAIN:
[…] So I would like to put in a note to her: « Belle marquise, vos beaux yeux me font mourir d’amour » (“Fair marquise, your beautiful eyes make me die of love”); but I would like it to be put in a gallant manner, to be nicely turned.
[…]
PHILOSOPHY TEACHER:
They can be put, first, as you said: Belle marquise, vos beaux yeux me font mourir d’amour. Or else: D’amour mourir me font, belle marquise, vos beaux yeux. Or else: Vos yeux beaux d’amour me font, belle marquise, mourir. Or else: Mourir vos beaux yeux, belle marquise, d’amour me font. Or else: Me font vos yeux beaux mourir, belle marquise, d’amour.
Let’s start by displaying the original sentence in a terminal:
$ echo "Belle marquise, vos beaux yeux me font mourir d'amour."
Belle marquise, vos beaux yeux me font mourir d'amour.

Now we need to swap the words in the sentence to create a new one. For a simple transposition, you might find awk easier to use. awk works not on raw lines but on the fields of each record: by default, a record is a line and fields are separated by whitespace. In other words, awk treats text like a database. It can easily display the whole line, or just one or more fields in any order. Fields are written $n, where n is the position of the field in the line, counting from the left: $1 is the first field, $2 the second, and so on. $0 is the whole line.
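A minimal illustration of this numbering on an arbitrary three-word line (any POSIX awk should give the same result). It also uses NF, awk’s built-in count of the fields in the current record, which the text above does not mention:

```shell
#!/bin/sh
# $3 and $1 are individual fields; print separates its arguments with a space.
echo "one two three" | awk '{ print $3, $1 }'
# prints: three one

# $0 is the whole record; NF is the number of fields in it.
echo "one two three" | awk '{ print NF ": " $0 }'
# prints: 3: one two three
```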
So we’re going to feed Mr. Jourdain’s declaration of love to a one-line awk program, using the pipe symbol (|):
$ echo "Belle marquise, vos beaux yeux me font mourir d'amour." | awk '{print $9" "$8" "$6" "$7" "$1" "$2" "$3" "$4" "$5}'
d'amour. mourir me font Belle marquise, vos beaux yeux

The output of the echo command is not displayed. What is displayed is the output of the awk program, whose input was the output of echo: Mr. Jourdain’s declaration of love.
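However, the final output is not what was intended. The fields do not correspond exactly to words, because punctuation stays attached to them: $2 is marquise, with its comma, and $9 is d'amour. with its period, which now ends up in the middle of the sentence. The awk command therefore needs to be refined.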
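The sketch below first lists the fields to show where the punctuation sits, then tries one possible refinement. The refinement is our own assumption, not the article’s method: delete the commas and the period, then put them back by hand in the print statement. Calling gsub() without a target edits $0, and changing $0 makes awk split the record into fields again.

```shell
#!/bin/sh
s="Belle marquise, vos beaux yeux me font mourir d'amour."

# List the fields: the comma stays in $2 and the period in $9.
echo "$s" | awk '{ for (i = 1; i <= NF; i++) print "$" i " = " $i }'

# Delete commas and periods; gsub() edits $0, so awk re-splits the record
# and $1..$9 become bare words. The punctuation is then re-inserted by hand
# for this one particular word order.
echo "$s" | awk '{ gsub(/[,.]/, ""); print $9, $8, $6, $7 ",", $1, $2 ",", $3, $4, $5 "." }'
# prints: d'amour mourir me font, Belle marquise, vos beaux yeux.
```

This works, but the punctuation is hard-coded for a single word order and Belle is still capitalized, so every variation would need its own hand-written print statement.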
It’s simpler to turn to sed. sed selects sets of characters in lines, either written literally or described with the metacharacters of regular expressions. A familiar example is *, which you have probably used on the command line as a shell wildcard matching any sequence of characters, as in:
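$ ls *.rst

In a regular expression, however, * means “zero or more occurrences of the preceding item”, so .* matches any sequence of characters, including an empty one. sed also supports back references: \1 to \9 let you reinsert, wherever you like, the text matched by the first to ninth groups enclosed in \( and \). Fortunately for us, Mr. Jourdain’s declaration of love contains exactly nine words, which is the maximum number of back references possible.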
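Before applying back references to all nine words, here is a minimal two-word sketch (any POSIX sed should behave the same). Each \(...\) captures a group, and \1 and \2 reinsert the captured text in the replacement. The second command shows that .* is greedy: the first group takes everything up to the last space it can, comma included.

```shell
#!/bin/sh
# Swap two words: \1 and \2 refer to the text captured by each \(...\) group.
echo "Belle marquise" | sed 's/\(.*\) \(.*\)/\2 \1/'
# prints: marquise Belle

# .* is greedy: the first group grabs everything up to the LAST space,
# including the comma after "marquise".
echo "Belle marquise, vos" | sed 's/\(.*\) \(.*\)/[\1] [\2]/'
# prints: [Belle marquise,] [vos]
```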
$ echo "Belle marquise, vos beaux yeux me font mourir d'amour." | sed "s#\(.*\) \(.*\), \(.*\) \(.*\) \(.*\) \(.*\) \(.*\) \(.*\) \(d'.*\)#\9 \8 \6 \7, \1 \2, \3 \4 \5#"
d'amour. mourir me font, Belle marquise, vos beaux yeux

We’ve run into the same problem: the regular expression .* doesn’t match a word, but any sequence of characters, punctuation included. We therefore switch to the \<.*\> form, which must start at the beginning of a word and end at the end of one. Since the groups are separated by spaces, each group captures exactly one word, such as those Mr. Jourdain strings together to make prose, and the final period is left outside the last group, at the end of the line. The backslashes are essential: in GNU sed, \< matches the start of a word and \> its end (a GNU extension), whereas a bare < or > is just a literal character:
$ export p="\(\<.*\>\) \(\<.*\>\), \(\<.*\>\) \(\<.*\>\) \(\<.*\>\) \(\<.*\>\) \(\<.*\>\) \(\<.*\>\) \(d'\<.*\>\)"
$ echo "Belle marquise, vos beaux yeux me font mourir d'amour." | sed "s#$p#\9 \8 \6 \7, \1 \2, \3 \4 \5#"
d'amour mourir me font, Belle marquise, vos beaux yeux.

We could also use the [[:alpha:]]* form, which matches a run of letters only. It is more readable but less concise:
$ export a="[[:alpha:]]"
$ export n="\($a*\) \($a*\), \($a*\) \($a*\) \($a*\) \($a*\) \($a*\) \($a*\) \(d'$a*\)"
$ echo "Belle marquise, vos beaux yeux me font mourir d'amour." | sed "s#$n#\9 \8 \6 \7, \1 \2, \3 \4 \5#"
d'amour mourir me font, Belle marquise, vos beaux yeux.

That’s better, but we’ve got a capitalization problem. So we’re going to use the judiciously placed \u and \l operators, another GNU sed extension: in the replacement, \u converts the next character to uppercase and \l converts it to lowercase. First, we’ll export some variables to make the script more concise and readable:
$ export w="\(\<.*\>\)"
$ export m="$w $w, $w $w $w $w $w $w"
$ echo "Belle marquise, vos beaux yeux me font mourir d'amour." | sed "s#$m \(d'\<.*\>\)#\u\9 \8 \6 \7, \l\1 \2, \3 \4 \5#"
D'amour mourir me font, belle marquise, vos beaux yeux.

We can now easily redistribute the back references to get all of the philosophy teacher’s variations:
$ echo "Belle marquise, vos beaux yeux me font mourir d'amour." | sed "s#$m \(d'\<.*\>\)#\u\3 \5 \4 \9 \6 \7, \l\1 \2, \8#"
Vos yeux beaux d'amour me font, belle marquise, mourir.
$ echo "Belle marquise, vos beaux yeux me font mourir d'amour." | sed "s#$m \(d'\<.*\>\)#\u\8 \3 \4 \5, \l\1 \2, \9 \6 \7#"
Mourir vos beaux yeux, belle marquise, d'amour me font.
$ echo "Belle marquise, vos beaux yeux me font mourir d'amour." | sed "s#$m \(d'\<.*\>\)#\u\6 \7 \3 \5 \4 \8, \l\1 \2, \9#"
Me font vos yeux beaux mourir, belle marquise, d'amour.

Molière and GNU/Linux
Let’s rewrite the dialogue between M. Jourdain and his philosophy teacher in geek style:
MONSIEUR JOURDAIN:
I’d like to show her, on the standard output:
Belle marquise, vos beaux yeux me font mourir d'amour.
But I wish it were put in a gallant way, that it were turned nicely.
PHILOSOPHY TEACHER:
They can be put, first, as you said:
$ echo "Belle marquise, vos beaux yeux me font mourir d'amour."
Or:
$ export declaration="Belle marquise, vos beaux yeux me font mourir d'amour."
$ echo "$declaration"
Or:
$ export w="\(\<.*\>\)"
$ export m="$w $w, $w $w $w $w $w $w"
$ echo "$declaration" | sed "s#$m \(d'\<.*\>\)#\u\9 \8 \6 \7, \l\1 \2, \3 \4 \5#"
Or else:
$ echo "$declaration" | sed "s#$m \(d'\<.*\>\)#\u\3 \5 \4 \9 \6 \7, \l\1 \2, \8#"
Or else:
$ echo "$declaration" | sed "s#$m \(d'\<.*\>\)#\u\8 \3 \4 \5, \l\1 \2, \9 \6 \7#"
Or else:
$ echo "$declaration" | sed "s#$m \(d'\<.*\>\)#\u\6 \7 \3 \5 \4 \8, \l\1 \2, \9#"

A lot of effort…
Admittedly, that’s a lot of effort for very little, you might say. But imagine a file containing 1,000 sentences with the same structure:
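Cher docteur, ces grands malheurs vous font pleurer d’amertume. (Dear doctor, these great misfortunes make you weep with bitterness.)
Petit garçon, cette bonne glace te fait saliver d’envie. (Little boy, this good ice cream makes you salivate with envy.)
Vaste océan, la forte houle te fait tanguer d’ivresse. (Vast ocean, the strong swell makes you pitch with drunkenness.)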
Such a file may sound unlikely, but technical documentation often contains many sentences with the same structure, for the sake of stylistic consistency.
To carry out our tests on a sample, let’s place the three sentences above in a file:
$ echo "Cher docteur, ces grands malheurs vous font pleurer d'amertume." > variations.txt
$ echo "Petit garçon, cette bonne glace te fait saliver d'envie." >> variations.txt
$ echo "Vaste océan, la forte houle te fait tanguer d'ivresse." >> variations.txt

Let’s put each sed command in its own script, reusing the $p variable defined earlier:
$ echo "s#$p#\u\9 \8 \6 \7, \l\1 \2, \3 \4 \5#" > moliere1.sed
$ echo "s#$p#\u\3 \5 \4 \9 \6 \7, \l\1 \2, \8#" > moliere2.sed
$ echo "s#$p#\u\8 \3 \4 \5, \l\1 \2, \9 \6 \7#" > moliere3.sed
$ echo "s#$p#\u\6 \7 \3 \5 \4 \8, \l\1 \2, \9#" > moliere4.sed

Now let’s apply every sed script to every line of the file:
$ for (( i=1; i<5; i++ )); do sed -f "moliere$i.sed" variations.txt; done
D'amertume pleurer vous font, cher docteur, ces grands malheurs.
D'envie saliver te fait, petit garçon, cette bonne glace.
D'ivresse tanguer te fait, vaste océan, la forte houle.
Ces malheurs grands d'amertume vous font, cher docteur, pleurer.
Cette glace bonne d'envie te fait, petit garçon, saliver.
La houle forte d'ivresse te fait, vaste océan, tanguer.
Pleurer ces grands malheurs, cher docteur, d'amertume vous font.
Saliver cette bonne glace, petit garçon, d'envie te fait.
Tanguer la forte houle, vaste océan, d'ivresse te fait.
Vous font ces malheurs grands pleurer, cher docteur, d'amertume.
Te fait cette glace bonne saliver, petit garçon, d'envie.
Te fait la houle forte tanguer, vaste océan, d'ivresse.

And there it is. In just a few moments, without ever opening a single file, we have applied a series of complex operations to any number of sentences with the same structure. This is hardly feasible with a word processor, with any other graphical tool, or with documents stored in binary formats.
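To try the whole procedure from scratch, here is a sketch that gathers the steps above into a single script. It assumes a POSIX shell and GNU sed (for \<, \>, \u and \l). It writes variations.txt and moliere1.sed to moliere4.sed in the current directory, overwriting any existing copies. It also uses printf '%s\n' rather than echo to write the sed scripts, because some echo implementations interpret backslash sequences; that choice is ours, not part of the original walkthrough.

```shell
#!/bin/sh
# Sketch: rebuild the sample file and the four sed scripts, then apply them.
# Assumes GNU sed, which provides \< and \> (word boundaries) and
# \u / \l (uppercase / lowercase the next character of the replacement).
set -eu

# One "word" group, and the nine-group pattern (the last group starts with d').
w='\(\<.*\>\)'
p="$w $w, $w $w $w $w $w $w \(d'\<.*\>\)"

printf '%s\n' \
  "Cher docteur, ces grands malheurs vous font pleurer d'amertume." \
  "Petit garçon, cette bonne glace te fait saliver d'envie." \
  "Vaste océan, la forte houle te fait tanguer d'ivresse." > variations.txt

# printf '%s\n' writes each argument verbatim, backslashes included.
printf '%s\n' "s#$p#\u\9 \8 \6 \7, \l\1 \2, \3 \4 \5#" > moliere1.sed
printf '%s\n' "s#$p#\u\3 \5 \4 \9 \6 \7, \l\1 \2, \8#" > moliere2.sed
printf '%s\n' "s#$p#\u\8 \3 \4 \5, \l\1 \2, \9 \6 \7#" > moliere3.sed
printf '%s\n' "s#$p#\u\6 \7 \3 \5 \4 \8, \l\1 \2, \9#" > moliere4.sed

# Apply each script to the whole file; sed reads it line by line itself.
for i in 1 2 3 4; do
  sed -f "moliere$i.sed" variations.txt
done
```

Running it should print the same twelve lines as the loop above.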