Awk.Info

"Cause a little auk awk
goes a long way."

About awk.info
 »  table of contents
 »  featured topics
 »  page tags


About Awk
 »  advocacy
 »  learning
 »  history
 »  Wikipedia entry
 »  mascot
Implementations
 »  Awk (rarely used)
 »  Nawk (the-one-true, old)
 »  Gawk (widely used)
 »  Mawk
 »  Xgawk (gawk + xml + ...)
 »  Spawk (SQL + awk)
 »  Jawk (Awk in Java JVM)
 »  QTawk (extensions to gawk)
 »  Runawk (a runtime tool)
 »  platform support
Coding
 »  one-liners
 »  ten-liners
 »  tips
 »  the Awk 100
Community
 »  read our blog
 »  read/write the awk wiki
 »  discussion news group

Libraries
 »  Gawk
 »  Xgawk
 »  the Lawker library
Online doc
 »  reference card
 »  cheat sheet
 »  manual pages
 »  FAQ

Reading
 »  articles
 »  books:

WHAT'S NEW?

Mar 01: Michael Sanders demos an X-windows GUI for AWK.

Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK

Feb 28: Tim Menzies asks this community to write an AWK cookbook.

Feb 28: Arnold Robbins announces a new debugger for GAWK.

Feb 28: Awk100#23: Premysl Janouch offers a IRC bot, In AWK

Feb 28: Updated: the AWK FAQ

Feb 28: Tim Menzies offers a tiny content management system, in Awk.

Jan 31: Comment system added to awk.info. For example, see discussion bottom of ?keys2awk

Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).

Jan 31: The AWK FAQ is being updated. For comments/ corrections/ extensions, please mail tim@menzies.us

Jan 31: Martin Cohen finds Awk on the Android platform.

Jan 31: Aleksey Cheusov released a new version of runawk.

Jan 31: Hirofumi Saito contributes a candidate Awk mascot.

Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.

Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.

[More ...]

Bookmark and Share

categories: SysAdmin,Papers,WhyAwk,Apr,2009,HenryS

Awk: A Systems Programming Language?

At the Proceedings of the Winter Usenix Conference (Dallas'91) Henry Spencer wrote in Awk As A Major Systems Programming Language that...

    ...even experienced Unix programmers often don't know awk, or know it but view it as a counterpart of sed: useful "glue" for sticking things together in shell programming, but quite unsuited for major programming tasks. This is a major underestimate of a very powerful tool, and has hampered the development of support software that would make awk much more useful.

    There is no fundamental reason why awk programs have to be small "glue" programs: even the "old" awk is a powerful programming language in its own right. Effective use of its data structures and its stream-oriented structure takes some adjustment for C programmers, but the results can be quite striking.

    On the other hand, getting there can be a bit painful, and improvements in both the language and its support tools would help.

In 2009, Arnold Robbins comments:

    The paper is still interesting, although some bits are outdated (we now have a profiler, for instance).

categories: Awk100,Interpreters,Apr,2009,HenryS

AASL: Parser Genrator in Awk

Download

Download from LAWKER

Synopsis

aaslg [ -x ] [ file ... ]
aaslr [ -x ] table [ file ... ]

Description

Aaslg and aaslr implement the Amazing Awk Syntax Language, AASL (pro- nounced ``hassle''). Aaslg (pronounced ``hassling'') takes an AASL specification from the concatenation of the file(s) (default standard input) and emits the corresponding AASL table on standard output. Aaslr parses the contents of the file(s) (default standard input) according to the AASL table in file table, emitting the table's output on standard output.

Both take a -x option to turn on verbose and cryptic debugging output. Both look in a library directory for pieces of the AASL system; the AASLDIR environment variable, if present, overrides the default notion of the location of this directory.

Aaslr expects input to consist of input tokens, one per line. For sim- ple tokens, the line is just the text of the token. For metatokens like ``identifier'', the line is the metatoken's name, a tab, and the text of the token. [xxx discuss `#' lines]

Aaslr output, in the absence of syntax errors, consists of the input tokens plus action tokens, which are lines consisting of `#!' followed immediately by an identifier. If the syntax of the input does not match that specified in the AASL table, aaslr emits complaint(s) on standard error and attempts to repair the input into a legal form; see ``ERROR REPAIR'' below. Unless errors have cascaded to the point where aaslr gives up (in which case it emits the action token ``#!aargh'' to inform later passes of this), the output will always conform to the AASL syntax given in the table.

Normally, a complete program using AASL consists of three passes, the middle one being an invocation of aaslr. The first pass is a lexical analyzer, which breaks free-form input down into input tokens in some suitable way. The third pass is a semantics interpreter, which typi- cally responds to input tokens by momentarily remembering them and to action tokens by executing some action, often using the remembered value of the previous input token. Aaslg is in fact implemented using AASL, following this structure; it implements the -x option by just passing it to aaslr.

AASL Specifications

An AASL specification consists of class definitions, text definitions, and rules, in arbitrary order (except that class definitions must pre- cede use of the classes they define). A `#' (not enclosed in a string) begins a comment; characters from it to the end of the line are ignored. An identifier follows the same rules as a C identifier, except that in most contexts it can be at most 16 characters long. A string is enclosed in double quotes ("") and generally follows C syn- tax. Most strings denote input tokens, and references to ``input token'' as part of AASL specification syntax should be read as ``string denoting input token''.

A class definition is an identifier enclosed in angle brackets (<>) followed by one or more input tokens followed by a semicolon (;). It gives a name to a set of input tokens. Classes whose names start with capital letters are user abbreviations; see below. Classes whose names start with lowercase letters are special classes, used for internal purposes. The current special classes are:

trivial
tokens which the parser can discard at will, in the expectation that they might be inserted erroneously; see ``ERROR REPAIR'' for details.
lineterm
tokens which terminate a logical line for purposes of resyn- chronization in error repair; see ``ERROR REPAIR'' for details.
endmarker
xxx

For example, the class definitions used for AASL itself are:

<trivial> "," ";"   ;
<lineterm> ";" ;
<endmarker> "EOF"   ;

When AASL error repair is invoked, the parser sometimes needs to gener- ate input tokens. In the case of a metatoken, the parser knows the token's name but needs to generate a text for it as well. A text defi- nition consists of an input token, an arrow (->), and a string specify- ing what text should be generated for that token. For example, the text definitions used for AASL itself are:

"id" -> "___"
"string" -> "\"___\""

The rules of a specification define the syntax that the parser should accept. The order of rules is not significant, except that the first rule is considered to be the top level of the specification. The spec- ification is executed by calling the first rule; when execution of that rule terminates, execution of the specification terminates. If the user wishes this to occur only at end of input, he should arrange for the lexical analyzer to produce an endmarker token (conventionally ``EOF'') at the end of the input, and should write the first rule to require that token at the end.

Note that an input token may be recognized considerably before it is accepted, but the parser emits it to the output only on acceptance.

A rule consists of an identifier naming it, a colon (:), a sequence of items which is the body of the rule, and a semicolon (;). When a rule is called, it is executed by executing the individual items of the body in order (as modified by control structures) until either one of them explicitly terminates execution of the rule or the last item is exe- cuted.

An item which is an input token requires that that token appear in the input at that point, and accepts it (causing it to be emitted as out- put).

An item which is an identifier denotes a call to another rule, which executes the body of that rule and then returns to the caller. It is an error to call a nonexistent rule.

An item which is an identifier preceded by `!' causes that identifier to be emitted as an action token; the identifier has no other signifi- cance.

An item which is `<<' causes execution of the current rule to terminate immediately, returning to the calling rule.

An item which is `>>' causes the execution of the innermost enclosing loop (see below) to terminate immediately, with execution continuing after the end of that loop. The loop must be within the same rule.

An item which is an identifier preceded by `@%&!' causes an internal semantic action to be executed within the parser; this is normally needed only for bizarre situations like C's typedef. [xxx should give details I suppose]

A choice is a sequence of branches enclosed in parentheses (()) and separated by vertical bars (|). The first of the branches that can be executed, is, after which execution continues after the end of the choice.

A loop is a sequence of branches enclosed in braces ({}) and separated by vertical bars (|). The first of the branches that can be executed, is, and this is done repeatedly until the loop is terminated by `>>', after which execution continues after the end of the loop. (A loop can also be terminated by `<<' terminating execution of the whole rule.)

A branch is just a sequence of items, like a rule body, except that it must begin with either an input token or a lookahead. If it begins with an input token, it can be executed only when that token is the next token in the input, and execution starts with acceptance of that token.

A lookahead specifies conditions for execution of a branch based on recognizing but not accepting input token(s). The simplest form is just an input token enclosed in brackets ([]), in which case execution of that branch is possible only when that token is the next token in the input. The brackets can also contain multiple input tokens sepa- rated by commas, in which case the parser looks for any of those tokens. If a user-abbreviation class name appears, either by itself or as an element of a comma-separated list, it stands for the list of tokens given in its definition.

If a lookahead's brackets contain only a `*', this is a default branch, executable regardless of the state of the input.

As a very special case, a lookahead's brackets can contain two input tokens separated by slash (/), in which case that branch is executable only when those two tokens, in sequence, are next in the input. Warn- ing: this is implemented by a delicate perversion of the error-repair machinery, and if the first of those tokens is not then accepted, the parser will die in convulsions. A further restriction is that the same input token may not appear as the first token of a double lookahead and as a normal lookahead token in the same choice/loop.

Certain simple choice/loop structures appear frequently, and there are abbreviations for them:

abbreviation	    expansion
( items ?)	        ( items  |  [*] )
{ items ?}	        { items  |  [*] >> }
( ! [look] items ?) ( [ look]  |  items )
{ ! [look] items ?} { [ look] >>  |  items }

For example, here are the rules of the AASL specification for AASL, minus the actions (which add considerable clutter and are unintelligi- ble without the third pass):

	       rules: {
				   "id" ":" contents ";"
				   | "<" "id" ">" {"string" ?} ";"
				   | "string" "->" "string"
				   | "EOF" >>
	       };
	       contents: {
				   ">>"
				   | "<<"
				   | "id"
				   | "!" "id"
				   | "@%&!" "id"
				   | "string"
				   | "(" branches ")"
				   | "{" branches "}"
				   | [*] >>
	       };
	       branches: (
				   "!" "[" look "]" contents "?"
				   | [*] branch (
				   ["|"] {"|" branch ?}
				   | "?" !endbranch
				   | [*]
				   )
	       );
	       branch: (
				   "string" contents
				   | "[" look "]" contents
	       );
	       look: (
				   ["string"/"/"] "string" "/" "string"
				   | "*"
				   | [*] looker {"," looker ?}
	       );
	       looker: ( "string" | "id" ) ;

Error Repair

When the input token is not one of those desired, either because the item being executed is an input token and a different token appears on the input, or because none of the branches of a choice/loop is exe- cutable, error repair is invoked to try to fix things up. Sometimes it can actually guess right and fix the error, but more frequently it merely supplies a legal output so that later passes will not be thrown into chaos by a minor syntax error.

The general error-repair strategy of an AASL parser is to give the parser what it wants and then attempt to resynchronize the input with the parser.

[xxx long discussion of how ``what it wants'' is determined when there are multiple possibilities]

Resynchronization is performed in three stages. The first stage attempts to resynchronize within a logical line, and is applied only if neither the input token nor the desired token is a line terminator (a member of the ``lineterm'' class). If the input token is trivial (a member of the ``trivial'' class), it is discarded. Otherwise it is retained, in hopes that it will be the next token that the parser asks for.

Either way, an error message is produced, indicating what was desired, what was seen, and what was handed to the parser. If too many of these messages have been produced for a single line, the parser gives up, produces a last despairing message, emits a ``#!aargh'' action token to alert later pases, and exits. Barring this disaster, parsing then con- tinues. If the parser at some point is willing to accept the input token, it is accepted and error repair terminates. If a line termina- tor is seen in input, or the parser requests one, before the parser is willing to accept the input token, the second phase begins.

The second stage of resynchronization attempts to line both input and parser up on a line terminator. If the desired token is a line termi- nator and the input token is not, input is discarded until a line ter- minator appears. If the desired token is not a line terminator and the input token is, the input token is retained and parsing continues until the parser asks for a line terminator. Either way, the third phase then begins.

The third stage of resynchronization attempts to reconcile line termi- nators. If the desired and input tokens are identical, the input token is accepted and error repair terminates. If they are not identical and the input token is trivial (yes, line terminators can be trivial, and ones like `;' probably should be), the input token is discarded. If the desired token is the endmarker, then the input token is discarded. Otherwise, the input token continues to be retained in hopes that it will eventually be accepted. [xxx this needs more thought] In any case, the second phase begins again.

Files

all in $AASLDIR:
interp  table interpreter
lex     first pass of aaslg
syn     AASL table for aaslg
sem     third pass of aaslg

See Also

awk(1), yacc(1)

Diagnostics

``error-repair disaster'' means that the first token of a double looka- head could not be accepted and error repair was invoked on it.

History

Written at University of Toronto by Henry Spencer, somewhat in the spirit of S/SL (see ACM TOPLAS April 1982).

Bugs

Some of the restrictions on double lookahead are annoying.

Most of the C string escapes are recognized but disregarded, with only a backslashed double-quote interpreted properly during text generation.

Error repair needs further tuning; it has an annoying tendency to infi- nite-loop in certain odd situations (although the messages/line limit eventually breaks the loop).

Complex choices/loops with many branches can result in very long lines in the table.

Assessment

The implementation of AASL was fairly straight forward, with AASL itself used to describe its own syntax. An AASL specification is compiled into a table, which is then processed by a table-walking interpreter. The interpreter expects input to be as tokens, one per line, much likethe output of a traditional scanner. A complete program using AASL (for example, the AASL table generator) is normally three passes: thescanner,the parser (tables plus interpreter), and a semantics pass. The first set of tables was generated byhand for bootstrapping.

Apart from the minor nuisance of repeated iterations of language design, the biggest problem ofimplementing AASL wasthe question of semantic actions. Inserting awk semantic routines into the table interpreter, in the style of yacc,would not be impossible, but it seemed clumsy and inelegant. Awks lack of anyprovision for compile time initialization of tables strongly suggested reading them in at run time, rather than taking up space with a huge BEGIN action whose only purpose was to initialize the tables. This makes insertions into the interpreters code awkward.

The problem was solved by a crucial observation: traditional compilers (etc.) merge a two-stepprocess, first validating a token stream and inserting semantic action cookiesinto it, then interpreting thestream and the cookies to interface to semantics. Forexample, yaccs grammar notation can be viewed asinserting fragments of C code into a parsed output, and then interpreting that output. This approach yieldsan extremely natural pass structure for an AASL parser,with the parsersoutput stream being (in the absenceof syntax errors) a copy of its input stream with annotations. The following semantic pass then processesthis, momentarily remembering normal tokens and interpreting annotations as operations on the remembered values. (The semantic pass is, in fact, a classic pattern+action awk program, with a pattern and anaction for each annotation, and a general save the value in a variableaction for normal tokens.)

The one difficulty that arises with this method is when the language definition involves feedbackloops between semantics and parsing, an obvious example being Cs typedef.Dealing with this reallydoes require some imbedding of semantics into the interpreter,although with care it need not be much: thein-parser code for recognizing C typedefs, including the complications introduced by block structure andnested redeclarations of type names, is about 40 lines of awk.The in-parser actions are invoked by a special variant of the AASL emit semantic annotationsyntax.

Aside benefit of top-down parsing is that the context of errors is known, and it is relatively easy to implement automatic error recovery. When the interpreter is faced with an input token that does not appearin the list of possibilities in the parser table, it givesthe parser one of the possibilities anyway, and then usessimple heuristics to try to adjust the input to resynchronize. The result is that the parser,and subsequentpasses, always see a syntactically-correct program. (This approach is borrowed from S/SL and its predecessors.) Although the detailed error-recovery algorithm is still experimental, and the current one is notentirely satisfactory when a complex AASL specification does certain things, in general it deals with minorsyntax errors simply and cleanly without anyneed for complicating the specification with details of errorrecovery.Knowing the context of errors also makes it much easier to generate intelligible error messagesautomatically.

The AASL implementation is not large. The scanner is 78 lines of awk,the parser is 61 lines of AASL (using a fairly low-density paragraphing style and a good manycomments), and the semantics pass is 290 lines of awk. The table interpreter is 340 lines, about half of which (and most of the complexity) can be attributed to the automatic error recovery.

As an experiment with a more ambitious AASL specification, one for ANSI C was written. This occupies 374 lines excluding comments and blank lines, andwith the exception of the messy details of Cdeclaratorsis mostly a fairly straightforward transcription of the syntax given in the ANSI standard. Generating tables for this takes about three minutes of CPU time on a Sun 3/180; the tables are about 10K bytes.

The performance of the resulting ANSI C parser is not impressive: in very round numbers, averagedoveralarge program, it parses about one line of C per CPU second. (The scanner,164 lines of awk, accounts for a negligible fraction of this.) Some attention to optimization of both the tables and the interpreter might speed this up somewhat, but remarkable improvements are unlikely. As things stand in the absence of better awk implementations or a rewrite of the table interpreter in C, its a cute toy, possibly of some pedagogical value, but not a useful production tool. On the other hand, there does not appear to be any fundamental reason for the performance shortfall: itspurely the result of the slowexecution of awk programs.

Lessons From AASL

The scanner would be much faster with better regular-expression matching, because it can use regular expressions to determine whether a string is a plausible token but must use substr to extract the string first. Nawk functions would be very handy for modularizing code, especially the complicated and seldom-invoked error-recovery procedure. A switch statement modelled on the pattern+action scheme would be useful in several places.

Another troublesome issue is that arrays are second-class citizens in awk (and continue to be so in nawk): there is no array assignment. This lack leads to endless repetitions of code like:

for (i in array) 
    arraystack[i ":" sp] = array[i] 

whenever block structuring or a stack is desired. Nawk's multi-dimensional arrays supply some syntactic sugar for this but don't really fix the problem. Not only is this code clumsy, it is woefully inefficient compared to something like

arraystack[sp] = array 

even if the implementation is very clever. This significantly reduces the usefulness of arrays as symboltables and the like, a role for which they are otherwise very well suited.

It would also be of some use if there were some way to initialize arrays as constant tables, or alternatively a guarantee that the BEGIN action would be implemented cleverly and would not occupy space after it had finished executing.

A minor nuisance that surfaces constantly is that getting an error message out to the standard-error descriptor is painfully clumsy: one gets to choose between putting error messages out to a temporary file and having a shell "wrapper" process them later, or piping them into "cat >&2" (!).

The multi-pass input-driven structure that awk naturally lends itself to produces very clean and readable code with different phases neatly separated, but creates substantial difficulties when feedback loops appear. (In the case of AASL,this perhaps says more about language design than about awk.)

Author

Henry Spencer.


categories: Awk100,Top10,Interpreters,Dsl,Apr,2009,HenryS

Amazing Awk Assembler

Download from

Download from LAWKER.

Description

"aaa" (the Amazing Awk Assembler) is a primitive assembler written entirely in awk and sed. It was done for fun, to establish whether it was possible. It is; it works. It's quite slow, the input syntax is eccentric and rather restricted, and error-checking is virtually nonexistent, but it does work. Furthermore it's very easy to adapt to a new machine, provided the machine falls into the generic "8-bit-micro" category. It is supplied "as is", with no guarantees of any kind. I can't be bothered to do any more work on it right now, but even in its imperfect state it may be useful to someone.

aaa is the mainline shell file.

aux is a subdirectory with machine-independent stuff. Anon, 6801, and 6809 are subdirectories with machine-dependent stuff, choice specified by a -m option (default is "anon"). Actually, even the stuff that is supposedly machine-independent does have some machine-dependent assumptions; notably, it knows that bytes are 8 bits (not serious) and that the byte is the basic unit of instructions (more serious). These would have to change for the 68000 (going to 16-bit "bytes" might be sufficient) and maybe for the 32016 (harder).

aaa thinks that the machine subdirectories and the aux subdirectory are in the current directory, which is almost certainly wrong.

abst is an abstract for a paper. "card", in each machine directory, is a summary card for the slightly-eccentric input language. There is no real manual at present; sorry.

try.s is a sample piece of 6809 input; it is semantic trash, purely for test purposes. The assembler produces try.a, try.defs, and try.x as outputs from "aaa try.s". try.a is an internal file that looks somewhat like an assembly listing. try.defs is another internal file that looks somewhat like a symbol table. These files are preserved because of possible usefulness; tmp[123] are non-preserved temporaries. try.x is the Intel-hex output. try.x.good is identical to try.x and is a saved copy for regression testing of new work.

01pgm.s is a self-programming program for a 68701, based on the one in the Motorola ap note. 01pgm.x.good is another regression-test file.

If your C library (used by awk) has broken "%02x" so it no longer means "two digits of hex, *zero-filled*" (as some SysV libraries have), you will have to fall back from aux/hex to aux/hex.argh, which does it the hard way. Oh yes, you'll note that aaa feeds settings into awk on the command line; don't assume your awk won't do this until you try it.

Author

Henry Spencer


categories: Wp,Awk100,Wp,Tools,Apr,2009,HenryS

awf

The amazingly workable (text) formatter

Synopsis

awf -macros [ file ] ...

Download

Download from LAWKER. Type "make r" to run a regression test, formatting the manual page (awf.1) and comparing it to a preformatted copy (awf.1.out). Type "make install" to install it. Pathnames may need changing.

Description

Awf formats the text from the input file(s) (standard input if none) in an imitation of nroff's style with the -man or -ms macro packages. The -macro option is mandatory and must be `-man' or `-ms'.

Awf is slow and has many restrictions, but does a decent job on most manual pages and simple -ms documents, and isn't subject to AT&T's brain-damaged licensing that denies many System V users any text formatter at all. It is also a text formatter that is simple enough to be tinkered with, for people who want to experiment.

Awf implements the following raw nroff requests:

.\"  .ce  .fi  .in  .ne  .pl  .sp
.ad  .de  .ft  .it  .nf  .po  .ta
.bp  .ds  .ie  .ll  .nr  .ps  .ti
.br  .el  .if  .na  .ns  .rs  .tm

and the following in-text codes:

\$   \%   \*   \c   \f   \n   \s

plus the full list of nroff/troff special characters in the original V7 troff manual.

Many restrictions are present; the behavior in general is a subset of nroff's. Of particular note are the following:

  • Point sizes do not exist; .ps and \s are ignored.
  • Conditionals implement only numeric comparisons on \n(.$, string com- parisons between a macro parameter and a literal, and n (always true) and t (always false).
  • The implementation of strings is generally primitive.
  • Expressions in (e.g.) .sp are fairly general, but the |, &, and : operators do not exist, and the implementation of \w requires that quote (') be used as the delimiter and simply counts the characters inside (so that, e.g., \w'\(bu' equals 4).

White space at the beginning of lines, and imbedded white space within lines, is dealt with properly. Sentence terminators at ends of lines are understood to imply extra space afterward in filled lines. Tabs are implemented crudely and not quite correctly, although in most cases they work as expected. Hyphenation is done only at explicit hyphens, emdashes, and nroff discretionary hyphens.

MAN Macros

The -man macro set implements the full V7 manual macros, plus a few semi- random oddballs. The full list is:

.B   .DT  .IP  .P   .RE  .SM
.BI  .HP  .IR  .PD  .RI  .TH
.BR  .I   .LP  .PP  .RS  .TP
.BY  .IB  .NB  .RB  .SH  .UC

.BY and .NB each take a single string argument (respectively, an indi- cation of authorship and a note about the status of the manual page) and arrange to place it in the page footer.

MS Macros

The -ms macro set is a substantial subset of the V7 manuscript macros. The implemented macros are:

.AB  .CD  .ID  .ND  .QP  .RS  .UL
.AE  .DA  .IP  .NH  .QS  .SH  .UX
.AI  .DE  .LD  .NL  .R   .SM
.AU  .DS  .LG  .PP  .RE  .TL
.B   .I   .LP  .QE  .RP  .TP

Size changes are recognized but ignored, as are .RP and .ND. .UL just prints its argument in italics. .DS/.DE does not do a keep, nor do any of the other macros that normally imply keeps.

Assignments to the header/footer string variables are recognized and implemented, but there is otherwise no control over header/footer formatting. The DY string variable is available. The PD, PI, and LL number registers exist and can be changed.

Output

The only output format supported by awf, in its distributed form, is that appropriate to a dumb terminal, using overprinting for italics (via underlining) and bold. The nroff special characters are printed as some vague approximation (it's sometimes very vague) to their correct appearance.

Awf's knowledge of the output device is established by a device file, which is read before the user's input. It is sought in awf's library directory, first as dev.term (where term is the value of the TERM environment variable) and, failing that, as dev.dumb. The device file uses special internal commands to set up resolution, special characters, fonts, etc., and more normal nroff commands to set up page length etc.

FiLes

All in /usr/lib/awf (this can be overridden by the AWFLIB environment variable):

common     common device-independent initialization
dev.*      device-specific initialization
mac.m*     macro packages
pass1      macro substituter
pass2.base central formatter
pass2.m*   macro-package-specific bits of formatter
pass3      line and page composer

See Also

awk(1), nroff(1), man(7), ms(7)

Diagnostics

Unlike nroff, awf complains whenever it sees unknown commands and macros. All diagnostics (these and some internal ones) appear on standard error at the end of the run.

Author

Written at University of Toronto by Henry Spencer, more or less as a supplement to the C News project.

Copyright

Copyright 1990 University of Toronto. All rights reserved. Written by Henry Spencer. This software is not subject to any license of the American Telephone and Telegraph Company or of the Regents of the University of California.

Permission is granted to anyone to use this software for any purpose on any computer system, and to alter it and redistribute it freely, subject to the following restrictions:

  1. The author is not responsible for the consequences of use of this software, no matter how awful, even if they arise from flaws in it.
  2. The origin of this software must not be misrepresented, either by explicit claim or by omission. Since few users ever read sources, credits must appear in the documentation.
  3. Altered versions must be plainly marked as such, and must not be misrepresented as being the original software. Since few users ever read sources, credits must appear in the documentation.
  4. This notice may not be removed or altered.

Bugs

There are plenty, but what do you expect for a text formatter written entirely in (old) awk?

The -ms stuff has not been checked out very thoroughly.

blog comments powered by Disqus