Awk.Info

"Cause a little auk awk
goes a long way."

About awk.info
 »  table of contents
 »  featured topics
 »  page tags


About Awk
 »  advocacy
 »  learning
 »  history
 »  Wikipedia entry
 »  mascot
Implementations
 »  Awk (rarely used)
 »  Nawk (the-one-true, old)
 »  Gawk (widely used)
 »  Mawk
 »  Xgawk (gawk + xml + ...)
 »  Spawk (SQL + awk)
 »  Jawk (Awk in Java JVM)
 »  QTawk (extensions to gawk)
 »  Runawk (a runtime tool)
 »  platform support
Coding
 »  one-liners
 »  ten-liners
 »  tips
 »  the Awk 100
Community
 »  read our blog
 »  read/write the awk wiki
 »  discussion news group

Libraries
 »  Gawk
 »  Xgawk
 »  the Lawker library
Online doc
 »  reference card
 »  cheat sheet
 »  manual pages
 »  FAQ

Reading
 »  articles
 »  books:

WHAT'S NEW?

Mar 01: Michael Sanders demos an X-windows GUI for AWK.

Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK

Feb 28: Tim Menzies asks this community to write an AWK cookbook.

Feb 28: Arnold Robbins announces a new debugger for GAWK.

Feb 28: Awk100#23: Premysl Janouch offers a IRC bot, In AWK

Feb 28: Updated: the AWK FAQ

Feb 28: Tim Menzies offers a tiny content management system, in Awk.

Jan 31: Comment system added to awk.info. For example, see discussion bottom of ?keys2awk

Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).

Jan 31: The AWK FAQ is being updated. For comments/ corrections/ extensions, please mail tim@menzies.us

Jan 31: Martin Cohen finds Awk on the Android platform.

Jan 31: Aleksey Cheusov released a new version of runawk.

Jan 31: Hirofumi Saito contributes a candidate Awk mascot.

Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.

Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.

[More ...]

Bookmark and Share

categories: Macros,Tools,Mar,2009,Timm

Macros

These pages focus on macro pre-processors (a natural application for Awk).


categories: Awk100,Macros,Tools,Mar,2009,JonB

m1 : A Micro Macro Processor

Contents

Synopsis

awk -f m1.awk [file...]

Download

Download from LAWKER.

Description

M1 is a simple macro language that supports the essential operations of defining strings and replacing strings in text by their definitions. It also provides facilities for file inclusion and for conditional expan- sion of text. It is not designed for any particular application, so it is mildly useful across several applications, including document preparation and programming. This paper describes the evolution of the program; the final version is implemented in about 110 lines of Awk.

M1 copies its input file(s) to its output unchanged except as modified by certain "macro expressions." The following lines define macros for subsequent processing:

 @comment Any text
 @@                     same as @comment
 @define name value
 @default name value    set if name undefined
 @include filename
 @if varname            include subsequent text if varname != 0
 @unless varname        include subsequent text if varname == 0
 @fi                    terminate @if or @unless
 @ignore DELIM          ignore input until line that begins with DELIM
 @stderr stuff          send diagnostics to standard error

A definition may extend across many lines by ending each line with a backslash, thus quoting the following newline.

Any occurrence of @name@ in the input is replaced in the output by the corresponding value.

@name at beginning of line is treated the same as @name@.

Applications

Form Letters

We'll start with a toy example that illustrates some simple uses of m1. Here's a form letter that I've often been tempted to use:

@default MYNAME Jon Bentley 
@default TASK respond to your special offer 
@default EXCUSE the dog ate my homework 
Dear @NAME@: 
    Although I would dearly love to @TASK@, 
I am afraid that I am unable to do so because @EXCUSE@. 
I am sure that you have been in this situation 
many times yourself. 
            Sincerely, 
            @MYNAME@ 

If that file is namedsayno.mac, it might be invoked with this text:

@define NAME Mr. Smith 
@define TASK subscribe to your magazine 
@define EXCUSE I suddenly forgot how to read 

Recall that a @default takes effect only if its variable was not previously @defined.

Troff Pre-Processing

I've found m1 to be a handy Troff preprocessor. Many of my text files (including this one) start with m1 definitions like:

@define ArrayFig @StructureSec@.2 
@define HashTabFig @StructureSec@.3 
@define TreeFig @StructureSec@.4 
@define ProblemSize 100 

Even a simple form of arithmetic would be useful in numeric sequences of definitions. The longer m1 variables get around Troff's dreadful two-character limit on string names; these variables are also avail- able to Troff preprocessors like Pic and Eqn. Various forms of the @define, @if, and @include facilities are present in some of the Troff-family languages (Pic and Troff) but not others (Tbl); m1 provides a consistent mechanism.

I include figures in documents with lines like this:

@define FIGNUM @FIGMFMOVIE@ 
@define FIGTITLE The Multiple Fragment heuristic. 
@FIGSTART@ 
<PS> <@THISDIR@/mfmovie.pic</PS>
@FIGEND@ 

The two @defines are a hack to supply the two parameters of number and title to the figure. The figure might be set off by horizontal lines or enclosed in a box, the number and title might be printed at the top or the bottom, and the figures might be graphs, pictures, or animations of algorithms. All figures, though, are presented in the consistent format defined by FIGSTART and FIGEND.

Awk Library Management

I have also used m1 as a preprocessor for Awk programs. The @include statement allows one to build simple libraries of Awk functions (though some- but not all- Awk implementations provide this facility by allowing multiple program files). File inclusion was used in an earlier version of this paper to include individual functions in the text and then wrap them all together into the completem1 program. The conditional statements allow one to customize a program with macros rather than run-time if statements, which can reduce both run time and compile time.

Controlling Experiments

The most interesting application for which I've used this macro language is unfortunately too complicated to describe in detail. The job for which I wrote the original version of m1 was to control a set of experiments. The experiments were described in a language with a lexical structure that forced me to make substitutions inside text strings; that was the original reason that substitutions are bracketed by at-signs. The experiments are currently controlled by text files that contain descriptions in the experiment language, data extraction programs written in Awk, and graphical displays of data written in Grap; all the programs are tailored bym1commands.

Most experiments are driven by short files that set a few keys parameters and then@includea large file with many @defaults. Separate files describe the fields of shared databases:

 @define N ($1) 
 @define NODES ($2) 
 @define CPU ($3) 
 ... 

These files are @included in both the experiment files and in Troff files that display data from the databases. I had tried to conduct a similar set of experiments before I built m1, and got mired in muck. The few hours I spent building the tool were paid back handsomely in the first days I used it.

The Substitution Function

M1 uses as fast substitution function. The idea is to process the string from left to right, searching for the first substitution to be made. We then make the substitution, and rescan the string starting at the fresh text. We implement this idea by keeping two strings: the text processed so far is in L (for Left), and unprocessed text is in R (for Right). Here is the pseudocode for dosubs:

L = Empty 
R = Input String 
while R contains an "@" sign do 
	let R = A @ B; set L = L A and R = B 
	if R contains no "@" then 
		L = L "@" 
		break 
	let R = A @ B; set M = A and R = B 
	if M is in SymTab then 
		R = SymTab[M] R 
	else 
		L = L "@" M 
		R = "@" R 
	return L R 

Possible Extensions

There are many ways in which them1program could be extended. Here are some of the biggest temptations to "creeping creaturism":

  • A long definition with a trail of backslashes might be more graciously expressed by a @longdefinestatement terminated by a@longend.
  • An @undefinestatement would remove a definition from the symbol table.
  • I've been tempted to add parameters to macros, but so far I have gotten around the problem by using an idiom described in the next section.
  • It would be easy to add stack-based arithmetic and strings to the language by adding@pushand @popcommands that read and write variables.
  • As soon as you try to write interesting macros, you need to have mechanisms for quoting strings (to postpone evaluation) and for forcing immediate evaluation.

Code

The following code is short (around 100 lines), which is significantly shorter than other macro processors; see, for instance, Chapter 8 of Kernighan and Plauger [1981]. The program uses several techniques that can be applied in many Awk programs.

  • Symbol tables are easy to implement with Awk┐s associative arrays.
  • The program makes extensive use of Awk's string-handling facilities: regular expressions, string concatenation, gsub, index, andsubstr.
  • Awk's file handling makes the dofile procedure straightforward.
  • The readline function and pushback mechanism associated with buffer are of general utility.

error

function error(s) {
	print "m1 error: " s | "cat 1>&2"; exit 1
}

dofile

function dofile(fname,  savefile, savebuffer, newstring) {
	if (fname in activefiles)
		error("recursively reading file: " fname)
	activefiles[fname] = 1
	savefile = file; file = fname
	savebuffer = buffer; buffer = ""
	while (readline() != EOF) {
		if (index($0, "@") == 0) {
			print $0
		} else if (/^@define[ \t]/) {
			dodef()
		} else if (/^@default[ \t]/) {
			if (!($2 in symtab))
				dodef()
		} else if (/^@include[ \t]/) {
			if (NF != 2) error("bad include line")
			dofile(dosubs($2))
		} else if (/^@if[ \t]/) {
			if (NF != 2) error("bad if line")
			if (!($2 in symtab) || symtab[$2] == 0)
				gobble()
		} else if (/^@unless[ \t]/) {
			if (NF != 2) error("bad unless line")
			if (($2 in symtab) && symtab[$2] != 0)
				gobble()
		} else if (/^@fi([ \t]?|$)/) { # Could do error checking here
		} else if (/^@stderr[ \t]?/) {
			print substr($0, 9) | "cat 1>&2"
		} else if (/^@(comment|@)[ \t]?/) {
		} else if (/^@ignore[ \t]/) { # Dump input until $2
			delim = $2
			l = length(delim)
			while (readline() != EOF)
				if (substr($0, 1, l) == delim)
					break
		} else {
			newstring = dosubs($0)
			if ($0 == newstring || index(newstring, "@") == 0)
				print newstring
			else
				buffer = newstring "\n" buffer
		}
	}
	close(fname)
	delete activefiles[fname]
	file = savefile
	buffer = savebuffer
}

readline

Put next input line into global string "buffer". Return "EOF" or "" (null string).

function readline(  i, status) {
	status = ""
	if (buffer != "") {
		i = index(buffer, "\n")
		$0 = substr(buffer, 1, i-1)
		buffer = substr(buffer, i+1)
	} else {
		# Hume: special case for non v10: if (file == "/dev/stdin")
		if (getline <file <= 0)
			status = EOF
	}
	# Hack: allow @Mname at start of line w/o closing @
	if ($0 ~ /^@[A-Z][a-zA-Z0-9]*[ \t]*$/)
		sub(/[ \t]*$/, "@")
	return status
}

gobble

function gobble(  ifdepth) {
	ifdepth = 1
	while (readline() != EOF) {
		if (/^@(if|unless)[ \t]/)
			ifdepth++
		if (/^@fi[ \t]?/ && --ifdepth <= 0)
			break
	}
}

dosubs

function dosubs(s,  l, r, i, m) {
	if (index(s, "@") == 0)
		return s
	l = ""	# Left of current pos; ready for output
	r = s	# Right of current; unexamined at this time
	while ((i = index(r, "@")) != 0) {
		l = l substr(r, 1, i-1)
		r = substr(r, i+1)	# Currently scanning @
		i = index(r, "@")
		if (i == 0) {
			l = l "@"
			break
		}
		m = substr(r, 1, i-1)
		r = substr(r, i+1)
		if (m in symtab) {
			r = symtab[m] r
		} else {
			l = l "@" m
			r = "@" r
		}
	}
	return l r
}

docodef

function dodef(fname,  str, x) {
	name = $2
	sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]*/, "")  # OLD BUG: last * was +
	str = $0
	while (str ~ /\\$/) {
		if (readline() == EOF)
			error("EOF inside definition")
		# OLD BUG: sub(/\\$/, "\n" $0, str)
		x = $0
		sub(/^[ \t]+/, "", x)
		str = substr(str, 1, length(str)-1) "\n" x
	}
	symtab[name] = str
}

BEGIN

BEGIN {	
    EOF = "EOF"
	if (ARGC == 1)
		dofile("/dev/stdin")
	else if (ARGC >= 2) {
		for (i = 1; i < ARGC; i++)
			dofile(ARGV[i])
	} else
		error("usage: m1 [fname...]")
}

Bugs

M1 is three steps lower than m4. You'll probably miss something you have learned to expect.

History

M1 was documented in the 1997 sedawk book by Dale Dougherty & Arnold Robbins (ISBN 1-56592-225-5) but may have been written earlier.

This page was adapted from 131.191.66.141:8181/UNIX_BS/sedawk/examples/ch13/m1.pdf (download from LAWKER).

Author

Jon L. Bentley.


categories: Macros,Tools,Mar,2009,WillW

m5 - macro processor

Download

Download from LAWKER.

Synopsis

m5 [ -Dname ] [ -Dname=def ] [-c] [ -dp char ] 
   [ -o file ] [-sp char ] [ file ... ]
 
[g|n]awk -f m5.awk X [ -Dname ] [ -Dname=def ]  [-c]  [ -dp char ] 
                     [ -o file ] [ -sp char ] [ file ... ]

Description

M5 is a Bourne shell script for invoking m5.awk, which actu- ally performs the macro processing. m5, unlike many macroprocessors, does not directly interpret its input. Instead it uses a two-pass approach in which the first pass translates the input to an awk program, and the second pass executes the awk program to produce the final output. Details of usage are provided below.

This two pass sytem means that macros can contain awk commands, to be executed on the second pass. This greatly extends the expressability of the m5 macro system.

As noted in the synopsis above, its invocation may require specification of awk, gawk, or nawk, depending on the ver- sion of awk available on your system. This choice is further complicated on some systems, e.g. Sun, which have both awk (original awk) and nawk (new awk). Other systems appear to have new awk, but have named it just awk. New awk should be used, regardless of what it has been named. The macro processor translator will not work using original awk because the former, for example, uses the built-in function match().

Options

The following options are supported:

-Dname
Following the cpp convention, define name as 1 (one). This is the same as if a -Dname=1 appeared as an option or #name=1 appeared as an input line. Names specified using -D are awk variables defined just before main is invoked.
-Dname=def
Define name as "def". This is the same as if #name="def" appeared as an input line. Names specified using -D are awk variables defined just before main is invoked.
X
Yes, that really is a capital "X". The ver- sion of nawk on Sun Solaris 2.5.1 apparently does its own argument processing before pass- ing the arguments on to the awk program. In this case, X (and all succeeding options) are believed by nawk to be file names and are passed on to the macro processor translator (m5.awk) for its own argument processing). Without the X, Sun nawk attempts to process succeeding options (e.g., -Dname) as valid nawk arguments or files, thus causing an error. This may not be a problem for all awks.
-c
Compile only. The output program is still produced, but the final output is not.
-dp char
The directive prefix character (default is #).
-o file
The output program file (default is a.awk).
-sp char
The substitution prefix character (default is $).

Usage

Overview

The program that performs the first pass noted above is called the m5 translator and is named m5.awk. The input to the translator may be either standard input or one or more files listed on the command line. An input line with the directive prefix character (# by default) in column 1 is treated as a directive statement in the MP directive language (awk). All other input lines are processed as text lines. Simple macros are created using awk assignment statements and their values referenced using the substitu- tion prefix character ($ by default). The backslash (\) is the escape character; its presence forces the next character to literally appear in the output. This is most useful when forcing the appearance of the directive prefix character, the substitution prefix character, and the escape character itself.

Macro Substitution

All input lines are scanned for macro references that are indicated by the substitution prefix character. Assuming the default value of that character, macro references may be of the form $var, $(var), $(expr), $[str], $var[expr], or $func(args). These are replaced by an awk variable, awk variable, awk expression, awk array reference to the special array M[], regular awk array reference, or awk function call, respectively. These are, in effect, macros. The MP translator checks for proper nesting of parentheses and dou- ble quotes when translating $(expr) and $func(args) macros, and checks for proper nesting of square brackets and double quotes when translating $[expr] and $var[expr] macros. The substitution prefix character indicates a a macro reference unless it is (i) escaped (e.g., \$abc), (ii) followed by a character other than A-Z, a-z, (, or [ (e.g., $@), or (iii) inside a macro reference (e.g., $($abc); probably an error).

An understanding of the implementation of macro substitution will help in its proper usage. When a text line is encoun- tered, it is scanned for macros, embedded in an awk print statement, and copied to the output program. For example, the input line

The quick $fox jumped over the lazy $dog.

is transformed into

print "The quick " fox " jumped over the lazy " dog "."

Obviously the use of this transformation technique relies completely on the presence of the awk concatenation operator (one or more blanks).

Macros Containing Macros

As already noted, a macro reference inside another macro reference will not result in substitution and will probably cause an awk execution-time error. Furthermore, a substitution prefix character in the substituted string is also generally not significant because the substitution pre- fix character is detected at translation time, and macro values are assigned at execution time. However, macro references of the form $[expr] provide a simple nested referencing capability. For example, if $[abc] is in a text line, or in a directive line and not on the left hand side of an assignment statement, it is replaced by eval(M["abc"])/. When the output program is executed, the m5 runtime routine eval()/ substitutes the value of M["abc"] examining it for further macro references of the form $[str] (where "str" denotes an arbitrary string). If one is found, substitution and scanning proceed recursively. Function type macro references may result in references to other mac- ros, thus providing an additional form of nested referenc- ing.

Directive Lines

Except for the include directive, when a directive line is detected, the directive prefix is removed, the line is scanned for macros, and then the line is copied to the out- put program (as distinct from the final output). Any valid awk construct, including the function statement, is allowed in a directive line. Further information on writing awk programs may be found in Aho, Kernighan, and Weinberger, Dougherty and Robbins, and Robbins.

Include Directive

A single non-awk directive has been provided: the include directive. Assuming that # is the directive prefix, #include(filename) directs the MP translator to immediately read from the indicated file, processing lines from it in the normal manner. This processing mode makes the include directive the only type of directive to take effect at translation time. Nested includes are allowed. Include directives must appear on a line by themselves. More ela- borate types of file processing may be directly programmed using appropriate awk statements in the input file.

Main Program and Functions

The MP translator builds the resulting awk program in one of two ways, depending on the form of the first input line. If that line begins with "function", it is assumed that the user is providing one or more functions, including the func- tion "main" required by m5. If the first line does not begin with "function", then the entire input file is translated into awk statements that are placed inside "main". If some input lines are inside functions, and oth- ers are not, awk will will detect this and complain. The MP by design has little awareness of the syntax of directive lines (awk statements), and as a consequence syntax errors in directive lines are not detected until the output program is executed.

Output

Finally, unless the -c (compile only) option is specified on the command line, the output program is executed to produce the final output (directed by default to standard output). The version of awk specified in ARGV[0] (a built-in awk variable containing the command name) is used to execute the program. If ARGV[0] is null, awk is used.

EXAMPLE

Understanding this example requires recognition that macro substitution is a two-step process: (i) the input text is translated into an output awk program, and (ii) the awk program is executed to produce the final output with the macro substitutions actually accomplished. The examples below illustrate this process. # and $ are assumed to be the directive and substitution prefix characters. This example was successfully executed using awk on a Cray C90 running UNICOS 10.0.0.3, gawk on a Gateway E-3200 runing SuSE Linux Version 6.0, and nawk on a Sun Ultra 2 Model 2200 running Solaris 2.5.1.

Input Text

#function main() {

   Example 1: Simple Substitution
   ------------------------------
#  br = "brown"
   The quick $br fox.

   Example 2: Substitution inside a String
   ---------------------------------------
#  r = "row"
   The quick b$(r)n fox.

   Example 3: Expression Substitution
   ----------------------------------
#  a = 4
#  b = 3
   The quick $(2*a + b) foxes.

   Example 4: Macros References inside a Macro
   -------------------------------------------
#  $[fox] = "\$[q] \$[b] \$[f]"
#  $[q] = "quick"
#  $[b] = "brown"
#  $[f] = "fox"
   The $[fox].

   Example 5: Array Reference Substitution
   ---------------------------------------
#  x[7] = "brown"
#  b = 3
   The quick $x[2*b+1] fox.

   Example 6: Function Reference Substitution
   ------------------------------------------
   The quick $color(1,2) fox.

   Example 7: Substitution of Special Characters
   ---------------------------------------------
\#  The \$ quick \\ brown $# fox. $$
#}
#include(testincl.m5)

Included File testincl.m5

#function color(i,j) {
   The lazy dog.
#  if (i == j)
#     return "blue"
#  else
#     return "brown"
#}

Output Program

function main() {
   print
   print "   Example 1: Simple Substitution"
   print "   ------------------------------"
   br = "brown"
   print "   The quick " br " fox."
   print
   print "   Example 2: Substitution inside a String"
   print "   ---------------------------------------"
   r = "row"
   print "   The quick b" r "n fox."
   print
   print "   Example 3: Expression Substitution"
   print "   ----------------------------------"
   a = 4
   b = 3
   print "   The quick " 2*a + b " foxes."
   print
   print "   Example 4: Macros References inside a Macro"
   print "   -------------------------------------------"
   M["fox"] = "$[q] $[b] $[f]"
   M["q"] = "quick"
   M["b"] = "brown"
   M["f"] = "fox"
   print "   The " eval(M["fox"]) "."
   print
   print "   Example 5: Array Reference Substitution"
   print "   ---------------------------------------"
   x[7] = "brown"
   b = 3
   print "   The quick " x[2*b+1] " fox."
   print
   print "   Example 6: Function Reference Substitution"
   print "   ------------------------------------------"
   print "   The quick " color(1,2) " fox."
   print
   print "   Example 7: Substitution of Special Characters"
   print "   ---------------------------------------------"
   print "\#  The \$ quick \\ brown $# fox. $$"
}
function color(i,j) {
   print "   The lazy dog."
   if (i == j)
      return "blue"
   else
      return "brown"
}

function eval(inp   ,isplb,irb,out,name) {

   splb = SP "["
   out = ""

   while( isplb = index(inp, splb) ) {
      irb = index(inp, "]")
      if ( irb == 0 ) {
         out = out substr(inp,1,isplb+1)
         inp = substr( inp, isplb+2 )
      } else {
         name = substr( inp, isplb+2, irb-isplb-2 )
         sub( /^ +/, "", name )
         sub( / +$/, "", name )
         out = out substr(inp,1,isplb-1) eval(M[name])
         inp = substr( inp, irb+1 )
      }
   }

   out = out inp

   return out
}
BEGIN {
   SP = "$"
   main()
   exit
}

Final Output

   Example 1: Simple Substitution
   ------------------------------
   The quick brown fox.

   Example 2: Substitution inside a String
   ---------------------------------------
   The quick brown fox.

   Example 3: Expression Substitution
   ----------------------------------
   The quick 11 foxes.

   Example 4: Macros References inside a Macro
   -------------------------------------------
   The quick brown fox.

   Example 5: Array Reference Substitution
   ---------------------------------------
   The quick brown fox.

   Example 6: Function Reference Substitution
   ------------------------------------------
   The lazy dog.
   The quick brown fox.

   Example 7: Substitution of Special Characters
   ---------------------------------------------
#  The $ quick \ brown $# fox. $$

File

a.awk is the default output program file.

See Also

awk(1), cpp(1), gawk(1), m4(1), nawk(1). vi(1)

Author

William A. Ward, Jr., School of Computer and Information Sciences, University of South Alabama, Mobile, Alabama, July 23, 1999.

blog comments powered by Disqus