About awk.info
Mar 01: Michael Sanders demos an X-windows GUI for AWK.
Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK
Feb 28: Tim Menzies asks this community to write an AWK cookbook.
Feb 28: Arnold Robbins announces a new debugger for GAWK.
Feb 28: Awk100#23: Premysl Janouch offers an IRC bot, in AWK
Feb 28: Updated: the AWK FAQ
Feb 28: Tim Menzies offers a tiny content management system, in Awk.
Jan 31: Comment system added to awk.info. For example, see discussion bottom of ?keys2awk
Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).
Jan 31: The AWK FAQ is being updated. For comments/ corrections/ extensions, please mail tim@menzies.us
Jan 31: Martin Cohen finds Awk on the Android platform.
Jan 31: Aleksey Cheusov released a new version of runawk.
Jan 31: Hirofumi Saito contributes a candidate Awk mascot.
Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.
Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.
The Awk.info Top 10 page highlights the "best" (most impressive, most insightful, most fun, most visited) pages on this site.
echo Goal | gawk -f story.awk [ -v Grammar=FILE ] [ -v Seed=NUMBER ]
echo Goal | gawk -f storyp.awk [ -v Grammar=FILE ] [ -v Seed=NUMBER ]
Download from LAWKER.
This code inputs a set of productions and outputs a string of words that satisfy the production rules.
This page describes two versions of that system: story.awk and storyp.awk. The former selects productions at random with equal probability. The latter allows the user to bias the selection by adding a weight at the end of the line, after each production.
This grammar...
Sentence -> Nounphrase Verbphrase
Nounphrase -> the boy
Nounphrase -> the girl
Verbphrase -> Verb Modlist Adverb
Verb -> runs
Verb -> walks
Modlist ->
Modlist -> very Modlist
Adverb -> quickly
Adverb -> slowly
... and this input ...
for i in 1 2 3 4 5 6 7 8 9 10;do
  echo Sentence | gawk -f ../story.awk -v Grammar=english.rules -v Seed=$i | fmt
done
... generates these sentences:
the boy runs very slowly
the girl runs slowly
the boy runs very slowly
the girl walks very very quickly
the boy runs quickly
the girl walks very very slowly
the boy walks very very very very very very quickly
the boy walks very quickly
the girl runs slowly
the girl runs very quickly
Here is Gahan Wilson's sci-fi plot generator ...
Using the above, we can generate the following stories:
Earth scientists invent giant bugs who want Our Women, And Take A Few And Leave
Earth is Attacked By tiny lunar superbeings who Under Stand and Are Not radioactive and can not be killed by the Navy but They Die From Catching A Cold
Earth scientists invent enormous bugs who are Friendly and and They Get Married And Live Happily Forever After
Earth is Struck By A Giant cloud and Magically Saved
Earth scientists invent giant bugs who Under Stand and Are Not radioactive and can not be killed by the Air Force so They Kill Us
Earth is Attacked By enormous extra Galactic blobs who Under Stand and Are Not radioactive and can be killed by the Air Force
Earth scientists discover enormous blobs who Under Stand and Are Not radioactive and can be killed by a Crowd Of Peasants
Earth falls Into Sun and Some Resuced
Earth is Struck By A Giant comet but Is Saved
Earth is Struck By A Giant comet and Is Destroyed
This is generated from the following code:
for i in 1 2 3 4 5 6 7 8 9 10;do
  echo
  echo Start | gawk -f ../story.awk -v Grammar=scifi.rules -v Seed=$i | fmt
done
running on the following grammar:
Start -> Earth IsStressed
IsStressed -> Catestrophes
IsStressed -> Science
IsStressed -> Attack
IsStressed -> Collision
Catestrophes -> Catestrophe and PossibleMegaDeath
Catestrophe -> burnsUp
Catestrophe -> freezes
Catestrophe -> fallsIntoSun
Collision -> isStruckByAGiant Floater AndThen
Floater -> comet
Floater -> asteroid
Floater -> cloud
AndThen -> butIsSaved
AndThen -> andIsDestroyed
AndThen -> andMagicallySaved
PossibleMegaDeath -> everybodyDies
PossibleMegaDeath -> Some GoOn
SomeSaved -> somePeople
SomeSaved -> everybody
SomeSaved -> almostEverybody
GoOn -> dies
GoOn -> Resuced
GoOn -> Saved
Rescued -> isRescuedBy Sizes Extraterestrial Beings
Saved -> butIsSavedBy SomeOne scientists the Science
SomeOne -> earth
SomeOne -> extraterestrial
Science -> scientists DoSomething Sizes Beings Whichetc
DoSomething -> invent
DoSomething -> discover
Attack -> isAttackedBy Sizes Extraterestrial Beings Whichetc
Sizes -> tiny
Sizes -> giant
Sizes -> enormous
Extraterestrial -> martian
Extraterestrial -> lunar
Extraterestrial -> extraGalactic
Beings -> bugs
Beings -> reptiles
Beings -> blobs
Beings -> superbeings
Whichetc -> who WantSomething
WantSomething -> WantWomen
WantSomething -> areFriendly and DenoumentOrHappyEnding
WantSomething -> UnderStand ButEtc
Understand -> areFriendly butMisunderstood
Understand -> misunderstandUs
Understand -> understandUsAllTooWell
Understand -> hungry
DenoumentOrHappyEnding -> Denoument
DenoumentOrHappyEnding -> HappyEnding
Dine -> Hungry and eat us Denoument?
WhichEtc ->
Hungry -> lookUponUsAsASourceOfNourishment
WantWomen -> wantOurWomen, AndTakeAFewAndLeave
ButEtc -> AndAre radioactive and TryToKill
AndAre -> andAre
AndAre -> andAreNot
Killers -> Killer
Killers -> Killer and Killer
Killer -> aCrowdOfPeasants
Killer -> theArmy
Killer -> theNavy
Killer -> theAirForce
Killer -> theMarines
Killer -> theCoastGuard
Killer -> theAtomBomb
TryToKill -> can be killed by Killers
TryToKill -> can not be killed by Killers SoEtc
SoEtc -> butTheyDieFromCatchingACold
SoEtc -> soTheyKillUs
SoEtc -> soTheyPutUsUnderABenignDictatorShip
SoEtc -> soTheyEatUs
SoEtc -> soScientistsInventAWeapon Which
SeEtc -> but Denoument
Which -> whichTurnsThemIntoDisgustingLumps
Which -> whichKillsThem
Which -> whichFails SoEtc
Denomument? ->
Denomument? -> Denoument
Denoument -> aCuteLittleKidConvincesThemPeopleAreOk Ending
Denoument -> aPriestTalksToThemOfGod Ending
Denoument -> theyFallInLoveWithThisBeautifulGirl EndSadOrHappy
EndSadOrHappy -> Ending
EndSadOrHappy -> HappyEnding
Ending -> andTheyDie
Ending -> andTheyLeave
Ending -> andTheyTurnIntoDisgustingLumps
HappyEnding -> andTheyGetMarriedAndLiveHappilyForeverAfter
Here is a grammar suitable for storyp.awk. Note the number at the end of each line, which biases how often that production is selected. For example, "runs" and "slowly" are nine times more likely than the other Verb and Adverb.
Sentence -> Nounphrase Verbphrase 1
Nounphrase -> the boy 0.75
Nounphrase -> the girl 0.25
Verbphrase -> Verb Modlist Adverb 1
Verb -> runs 0.9
Verb -> walks 0.1
Modlist -> 0.5
Modlist -> very Modlist 0.5
Adverb -> quickly 0.1
Adverb -> slowly 0.9
The following code executes the biased story generation:
for((i=1;i<=10;i++)); do echo Sentence ; done | gawk -f ../storyp.awk -v Grammar=englishp.rules
This produces the following output. Note that, usually, we run slowly.
the boy runs very slowly
the boy runs slowly
the girl runs very slowly
the boy runs slowly
the boy runs slowly
the girl walks very slowly
the boy walks slowly
the girl runs slowly
the boy runs slowly
the boy runs slowly
BEGIN {
    srand(Seed ? Seed : 1)
    Grammar = Grammar ? Grammar : "grammar"
    while ((getline < Grammar) > 0)          # parenthesized, as in storyp.awk
        if ($2 == "->") {
            i = ++lhs[$1]                    # count lhs
            rhscnt[$1, i] = NF-2             # how many in rhs
            for (j = 3; j <= NF; j++)        # record them
                rhslist[$1, i, j-2] = $j
        } else
            if ($0 !~ /^[ \t]*$/)
                print "illegal production: " $0
}
{   if ($1 in lhs) {                         # nonterminal to expand
        gen($1)
        printf("\n")
    } else
        print "unknown nonterminal: " $0
}
function gen(sym,    i, j) {
    if (sym in lhs) {                        # a nonterminal
        i = int(lhs[sym] * rand()) + 1       # random production
        for (j = 1; j <= rhscnt[sym, i]; j++)    # expand rhs's
            gen(rhslist[sym, i, j])
    } else {
        gsub(/[A-Z]/, " &", sym)             # put a space before each capital
        printf("%s ", sym)
    }
}
storyp.awk is almost the same as story.awk, but it assumes that each production ends in a number that biases how often that production gets selected.
BEGIN {
    srand(Seed ? Seed : 1)
    Grammar = Grammar ? Grammar : "grammar"
    while ((getline < Grammar) > 0)
        if ($2 == "->") {
            i = ++lhs[$1]                    # count lhs
            rhsprob[$1, i] = $NF             # 0 <= probability <= 1
            rhscnt[$1, i] = NF-3             # how many in rhs
            for (j = 3; j < NF; j++)         # record them
                rhslist[$1, i, j-2] = $j
        } else
            print "illegal production: " $0
    for (sym in lhs)
        for (i = 2; i <= lhs[sym]; i++)
            rhsprob[sym, i] += rhsprob[sym, i-1]
}
{   if ($1 in lhs) {                         # nonterminal to expand
        gen($1)
        printf("\n")
    } else
        print "unknown nonterminal: " $0
}
function gen(sym,    i, j) {
    if (sym in lhs) {                        # a nonterminal
        j = rand()                           # random production
        for (i = 1; i <= lhs[sym] && j > rhsprob[sym, i]; i++) ;
        for (j = 1; j <= rhscnt[sym, i]; j++)    # expand rhs's
            gen(rhslist[sym, i, j])
    } else
        printf("%s ", sym)
}
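The biased selection in storyp.awk relies on running totals of the weights. The following is a minimal standalone sketch of that idea, not part of storyp.awk itself: the helper name pick and the fixed draws (standing in for rand()) are assumptions made for illustration, using the Nounphrase weights from the grammar above.

```shell
picks=$(awk 'BEGIN {
    # Weights for two alternatives: "the boy" 0.75 and "the girl" 0.25.
    n = 2; w[1] = 0.75; w[2] = 0.25
    # Turn the weights into running totals: cum[1] = 0.75, cum[2] = 1.00.
    cum[1] = w[1]
    for (i = 2; i <= n; i++) cum[i] = cum[i-1] + w[i]
    # Fixed draws stand in for rand() so the result is repeatable.
    printf("%d %d %d\n", pick(0.10), pick(0.74), pick(0.80))
}
function pick(r,    i) {
    # Return the first alternative whose running total reaches r.
    for (i = 1; i < n && r > cum[i]; i++) ;
    return i
}')
echo "$picks"    # prints: 1 1 2
```

Unlike the loop in gen(), this sketch stops at the last alternative (i < n), so a draw slightly above the final running total still selects a production.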
The code comes from Alfred Aho, Brian Kernighan, and Peter Weinberger from the book "The AWK Programming Language", Addison-Wesley, 1988.
The scifi grammar was written by Tim Menzies, 2009, and is based on Gahan Wilson's sci-fi plot generator: "The Science Fiction Horror Movie Pocket Computer" ( in "The Year's Best Science Fiction No. 5", edited by Harry Harrison and Brian Aldiss, Sphere, London, 1972).
Donald 'Paddy' McCarthy has a nice Awk solution to the Monty Hall Problem, which he describes as follows:
It turns out that if the contestant follows a strategy of always switching when asked, then he will maximise his chances of winning. Donald's simulator shows that:
BEGIN {
    srand()
    doors = 3
    iterations = 10000
    # Behind a door:
    EMPTY = "empty"; PRIZE = "prize"
    # Algorithm used
    KEEP = "keep"; SWITCH="switch"; RAND="random";
}
function monty_hall( choice, algorithm ) {
    # Set up doors
    for ( i=0; i<doors; i++ ) {
        door[i] = EMPTY
    }
    door[int(rand()*doors)] = PRIZE    # One door with prize
    chosen = door[choice]
    delete door[choice]                # awk's keyword is "delete", not "del"
    # if you didn't choose the prize first time around then
    # that will be the alternative
    alternative = (chosen == PRIZE) ? EMPTY : PRIZE
    if( algorithm == KEEP) {
        return chosen
    }
    if( algorithm == SWITCH) {
        return alternative
    }
    return rand() <0.5 ? chosen : alternative
}
function simulate(algo){
    prizecount = 0
    for(j=0; j< iterations; j++){
        if( monty_hall( int(rand()*doors), algo) == PRIZE) {
            prizecount ++
        }
    }
    printf " Algorithm %7s: prize count = %i, = %6.2f%%\n", \
        algo, prizecount,prizecount*100/iterations
}
BEGIN {
    print "\nMonty Hall problem simulation:"
    print doors, "doors,", iterations, "iterations.\n"
    simulate(KEEP)
    simulate(SWITCH)
    simulate(RAND)
}
gawk -f montyHall.awk

Monty Hall problem simulation:
3 doors, 10000 iterations.

 Algorithm    keep: prize count = 3411, =  34.11%
 Algorithm  switch: prize count = 6655, =  66.55%
 Algorithm  random: prize count = 4991, =  49.91%
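The simulated percentages can be cross-checked by counting cases instead of sampling. This is a small sketch, separate from Donald's simulator, that assumes the host always opens an empty door, so switching wins exactly when the first choice was wrong. The variable names are chosen for illustration only.

```shell
counts=$(awk 'BEGIN {
    doors = 3
    # Enumerate every (prize door, first choice) pair; all are equally likely.
    for (prize = 0; prize < doors; prize++)
        for (choice = 0; choice < doors; choice++) {
            total++
            if (choice == prize) keep++     # keeping wins only if the first pick was right
            else                 swtch++    # otherwise the other closed door holds the prize
        }
    printf("keep %d/%d switch %d/%d\n", keep, total, swtch, total)
}')
echo "$counts"    # prints: keep 3/9 switch 6/9
```

The ratios 3/9 and 6/9 agree with the roughly 34% and 67% seen in the simulation output.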
echo name | gawk -f gender.awk
Download from LAWKER
The following code predicts gender, given a first name.
This code is an excellent example of rule-based programming in Awk.
For a full description of the code, see
{ sex = "m" } # Assume male.
/^.*[aeiy]$/ { sex = "f" } # Female names ending in a/e/i/y.
/^All?[iy]((ss?)|z)on$/ { sex = "f" } # Allison (and variations)
/^.*een$/ { sex = "f" } # Cathleen, Eileen, Maureen,...
/^[^S].*r[rv]e?y?$/ { sex = "m" } # Barry, Larry, Perry,...
/^[^G].*v[ei]$/ { sex = "m" } # Clive, Dave, Steve,...
/^[^BD].*(b[iy]|y|via)nn?$/ { sex = "f" } # Carolyn,Gwendolyn,Vivian,...
/^[^AJKLMNP][^o][^eit]*([glrsw]ey|lie)$/ { sex = "m" } # Dewey, Stanley, Wesley,...
/^[^GKSW].*(th|lv)(e[rt])?$/ { sex = "f" } # Heather, Ruth, Velvet,...
/^[CGJWZ][^o][^dnt]*y$/ { sex = "m" } # Gregory, Jeremy, Zachary,...
/^.*[Rlr][abo]y$/ { sex = "m" } # Leroy, Murray, Roy,...
/^[AEHJL].*il.*$/ { sex = "f" } # Abigail, Jill, Lillian,...
/^.*[Jj](o|o?[ae]a?n.*)$/ { sex = "f" } # Janet, Jennifer, Joan,...
/^.*[GRguw][ae]y?ne$/ { sex = "m" } # Duane, Eugene, Rene,...
/^[FLM].*ur(.*[^eotuy])?$/ { sex = "f" } # Fleur, Lauren, Muriel,...
/^[CLMQTV].*[^dl][in]c.*[ey]$/ { sex = "m" } # Lance, Quincy, Vince,...
/^M[aei]r[^tv].*([^cklnos]|([^o]n))$/ { sex = "f" } # Margaret, Marylou, Miriam,...
/^.*[ay][dl]e$/ { sex = "m" } # Clyde, Kyle, Pascale,...
/^[^o]*ke$/ { sex = "m" } # Blake, Luke, Mike,...
/^[CKS]h?(ar[^lst]|ry).+$/ { sex = "f" } # Carol, Karen, Sharon,...
/^[PR]e?a([^dfju]|qu)*[lm]$/ { sex = "f" } # Pam, Pearl, Rachel,...
/^.*[Aa]nn.*$/ { sex = "f" } # Annacarol, Leann, Ruthann,...
/^.*[^cio]ag?h$/ { sex = "f" } # Deborah, Leah, Sarah,...
/^[^EK].*[grsz]h?an(ces)?$/ { sex = "f" } # Frances, Megan, Susan,...
/^[^P]*([Hh]e|[Ee][lt])[^s]*[ey].*[^t]$/ { sex = "f" } # Ethel, Helen, Gretchen,...
/^[^EL].*o(rg?|sh?)?(e|ua)$/ { sex = "m" } # George, Joshua, Theodore,..
/^[DP][eo]?[lr].*se$/ { sex = "f" } # Delores, Doris, Precious,...
/^[^JPSWZ].*[denor]n.*y$/ { sex = "m" } # Anthony, Henry, Rodney,...
/^K[^v]*i.*[mns]$/ { sex = "f" } # Karin, Kim, Kristin,...
/^Br[aou][cd].*[ey]$/ { sex = "m" } # Bradley, Brady, Bruce,...
/^[ACGK].*[deinx][^aor]s$/ { sex = "f" } # Agnes, Alexis, Glynis,...
/^[ILW][aeg][^ir]*e$/ { sex = "m" } # Ignace, Lee, Wallace,...
/^[^AGW][iu][gl].*[drt]$/ { sex = "f" } # Juliet, Mildred, Millicent,...
/^[ABEIUY][euz]?[blr][aeiy]$/ { sex = "m" } # Ari, Bela, Ira,...
/^[EGILP][^eu]*i[ds]$/ { sex = "f" } # Iris, Lois, Phyllis,...
/^[ART][^r]*[dhn]e?y$/ { sex = "m" } # Randy, Timothy, Tony,...
/^[BHL].*i.*[rtxz]$/ { sex = "f" } # Beatriz, Bridget, Harriet,...
/^.*oi?[mn]e$/ { sex = "m" } # Antoine, Jerome, Tyrone,...
/^D.*[mnw].*[iy]$/ { sex = "m" } # Danny, Demetri, Dondi,...
/^[^BG](e[rst]|ha)[^il]*e$/ { sex = "m" } # Pete, Serge, Shane,...
/^[ADFGIM][^r]*([bg]e[lr]|il|wn)$/ { sex = "f" } # Angel, Gail, Isabel,...
{ print sex } # Output prediction
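Because every pattern is tested against every input line, the last rule that matches decides the answer. A cut-down sketch using only three of the rules above shows that ordering at work. The three test names are chosen for illustration, and the final action prints the name next to the prediction, which the full program does not do.

```shell
guesses=$(printf 'Steve\nAllison\nMaria\n' | awk '
    { sex = "m" }                              # default
    /^.*[aeiy]$/             { sex = "f" }     # ends in a/e/i/y
    /^All?[iy]((ss?)|z)on$/  { sex = "f" }     # Allison (and variations)
    /^[^G].*v[ei]$/          { sex = "m" }     # Clive, Dave, Steve,...
    { print $0, sex }
')
echo "$guesses"
```

"Steve" is first marked "f" by the vowel-ending rule and then corrected to "m" by the later, more specific rule; "Allison" and "Maria" come out as "f".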
TINY TIM is a tiny web-site manager written in AWK. For a live demo of the site, see http://at.ttoy.net/?tinytim. The site supports runtime content generation; e.g. the quote shown top right of the demo site is auto-generated each time you refresh the page.
The site was written to demonstrate that a little AWK goes a long way. At the time of this writing, the current system is under 100 lines of code (excluding a separate formatter, which is another 170 lines of code). It took longer to write this documentation and the various HTML/CSS theme files than the code itself (fyi: 6 hours for the themes/doc and 3 hours for the code).
TINY TIM has the following features:
In a web accessible directory, type
svn export http://knit.googlecode.com/svn/branches/0.2/tinytim/
In the resulting directory, perform the local juju required to make index.cgi web-runnable (e.g. on my ISP, chmod u+rx index.cgi).
Follow the directions in the next section to customize the site.
TINY TIM is controlled by the following index.cgi file. To select a theme, comment out all but one of the last lines (using the "#" character). For screenshots of the current themes, see below.
#!/bin/bash
[ -n "$1" ] && export QUERY_STRING="$1"
tinytim() {
    cat content/* themes/$1/theme.txt |
    gawk -f lib/tinytim.awk |
    sed 's/^<pre>/<script type="syntaxhighlighter" class="brush: cpp"><![CDATA[/' |
    sed 's/^<\/pre>/<\/script>/'
}
#tinytim auklet
#tinytim trendygreen
tinytim wink
Notes:
Themes are defined in the sub-directory themes/themename. Each theme is defined by a theme.txt file that holds:
To write a new theme:
The following themes are defined in the directory themes.
Auklet:
Trendygreen (adapted from GetTemplates):
Wink:
The first entry in the content defines strings that can slip into the theme templates. For example, the following slots define the title of a site; the name of the formatter script that renders each page; the url of the home directory of the site; a menu to add to the top of each page; a footer to add to the bottom of each page; and a web-accessible directory for storing images.
``title`` Just another Tiny Tim demo
``formatter`` lib/markup.awk
``description`` (simple cms)
``home`` http://at.ttoy.net
``menu`` <a href="?index">Home</a> |
<a href="?contact">Contact</a> |
<a href="?about">About</a>
``footer`` <p>Powered by <a href="?tinytim">TINY TIM</a>.
© 2010 by Tim Menzies
``images`` http://at.ttoy.net/img
Note the following important convention. TINY TIM auto-generates some of its own strings. The names of these strings start with an uppercase letter. To avoid confusing your strings with those that are auto-generated, it is best to start your strings with a lower-case letter (e.g. like all those in the above example).
Google offers a nice free site-specific search engine. It takes a few days for the spiders to find the site but after that, it works fine. To set this up, follow the instructions at Google custom search, then
For example, look for google-search in the current templates and content/0config.txt.
After the first entry, the rest of the entries in the content/* define the pages of a site. Each entry must begin with the magic string
For example, this site contains a missing page report. This page is defined as follows. In the following definition of that page, the name is "404"; the tags are "Admin Feb10" and the title is "Sorry".
#12345#################################################################################### 404 Admin Feb10 Sorry I have bad news: <center> [img/404book.jpg] </center>
The contents can contain HTML and MARKUP tags.
MARKUP is a shorthand for writing HTML pages based on MARKDOWN:
Also, in MARKUP, major, minor, sub-, and sub-sub- headings are two line paragraphs where the second line contains two or more "=", "-", "+", "_" (respectively). MARKUP collects these headings as a table of contents, which is added to the top of the page.
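As a rough illustration of the two-line heading convention, here is a standalone sketch. It is not taken from lib/markup.awk: it assumes an underline is a line made up only of two or more of the same character, and the h1 to h4 labels are invented for the example.

```shell
headings=$(printf 'Intro\n=====\nsome text\nDetails\n-----\n' | awk '
    # Assumption: an underline consists solely of two or more
    # "=", "-", "+" or "_" characters; the real formatter may differ.
    /^==+$/      { print "h1: " prev }    # major heading
    /^--+$/      { print "h2: " prev }    # minor heading
    /^[+][+]+$/  { print "h3: " prev }    # sub-heading
    /^__+$/      { print "h4: " prev }    # sub-sub-heading
    { prev = $0 }                         # remember the line above
')
echo "$headings"
```

Collecting the printed headings in an array instead of printing them would give the raw material for a table of contents.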
Note that MARKUP is separate from TINY TIM. To change the formatting of pages, write your own AWK code and change the string ``formatter`` in the first entry of content/0config.txt.
If a `` entry contains a semi-colon (e.g. ``quotes;``) then it is a plugin. Plugin content is generated at runtime. To write a plugin, modify the file lib/plugins.awk. Currently, that file looks like this:
function slotsPlugIns(str,slots,    tmp) {
    split(str,tmp,";")
    if (tmp[1]=="quotes")
        return quotes(str,slots)
    return str
}
function quotes(str,slots,    n,tmp) {
    srand(systime() + PROCINFO["pid"])
    n=split(slots["quotes"],tmp,"\n")
    return tmp[int(rand()*n) + 1]
}
The function slotsPlugIns is a "traffic-cop" that decides which plugin to call (in the above, there is only one current plugin: quotes).
Each plugin function (e.g. quotes) is passed the string from the template (see str) and an array of key/value pairs holding all the defined string values (see slots). These functions must return a string to be inserted into the rendered HTML.
In the example above, quotes just returns a random quote. It assumes that the predefined strings include a set of quotes, one per line:
``quotes`` Small things with great love. <br>-- Mother Teresa
It's hard work to make it look effortless.<br>-- Katarina Witt
"God bless us every one!".<br>-- Tiny Tim
The quote generated by this plugin can be viewed at the top right of this page.
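Adding a second plugin would follow the same pattern: one more test in the traffic-cop and one more function. The sketch below is self-contained and runs outside TINY TIM. The plugin name shout and its behaviour (upper-casing the title slot) are hypothetical, invented only to show the dispatch.

```shell
rendered=$(awk '
function slotsPlugIns(str, slots,    tmp) {
    split(str, tmp, ";")
    if (tmp[1] == "shout")              # hypothetical plugin name
        return shout(str, slots)
    return str                          # not a plugin: leave the string unchanged
}
# Hypothetical plugin: return the site title in upper case.
function shout(str, slots) {
    return toupper(slots["title"])
}
BEGIN {
    slots["title"] = "Just another Tiny Tim demo"
    print slotsPlugIns("shout;", slots)
    print slotsPlugIns("footer", slots)
}')
echo "$rendered"
```

The first call is routed to the plugin; the second falls through and returns its argument untouched.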
Brian Kernighan has granted permission for this site to host the code from the original Awk book:
The code can be viewed here.
These pages focus on domain-specific languages (a.k.a. "little languages") written in Awk.
These little languages can range from the simple to the quite intricate. For example, LAWKER contains code for
Interestingly, without comments, the LISP interpreter is only three times longer than the HTML markup language. This comments either on the power of Awk, the regularity of LISP's core semantics, or both.
gawk -f graph.awk graphFile
A processor for a little language, specialized for graph-drawing.
The code inputs data, which includes a specification of a graph. The output is the data plotted in the specified area. For example, here is an input specification:
label here's some stuff
bottom ticks 1 5 10
left ticks 1 2 10 20
range 1 1 10 22
height 10
width 30
1 2 *
2 4 *
3 6 *
4 8 *
7 14 +
8 12 +
9 10 +
mb 0.9 11 =
It produces the following output
      |----------------------|
20    -                 = =  =
      |       = =  = =       |
      =  = =         +  +    |
10    -                   +  |
      |    *  *              |
      |  *                   |
2     *---------|------------|
     1         5            10
         here's some stuff
Set frame dimensions: height and width; offset for x and y axes.
BEGIN {
ht = 24; wid = 80
ox = 6; oy = 2
number = "^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)" \
"([eE][-+]?[0-9]+)?$"
}
Skip comments
/^[ \t]*#/ { next }
Simple tags
$1 == "height" { ht = $2; next }
$1 == "width" { wid = $2; next }
$1 == "label" { # for bottom
sub(/^ *label */, "")
botlab = $0
next
}
$1 == "bottom" && $2 == "ticks" { # ticks for x-axis
for (i = 3; i <= NF; i++) bticks[++nb] = $i
next
}
$1 == "left" && $2 == "ticks" { # ticks for y-axis
for (i = 3; i <= NF; i++) lticks[++nl] = $i
next
}
$1 == "range" { # xmin ymin xmax ymax
xmin = $2; ymin = $3; xmax = $4; ymax = $5
next
}
Handling numerics.
$1 ~ number && $2 ~ number { # pair of numbers
nd++ # count number of data points
x[nd] = $1; y[nd] = $2
ch[nd] = $3 # optional plotting character
next
}
$1 ~ number && $2 !~ number { # single number
nd++ # count number of data points
x[nd] = nd; y[nd] = $1; ch[nd] = $2
next
}
Line functions, defined by a slope "m" and a y-intercept "b".
$1 == "mb" { # m b [mark]
expand()
for(i=xmin;i<=xmax;i++) {
nd++; x[nd]=i; y[nd]=$2*i + $3; ch[nd]=$4
}
next;
}
Final case: input error.
{ print "?? line " NR ": ["$0"]" >"/dev/stderr" }
Draw the graph
END { expand(); frame(); ticks(); label(); data(); draw() }
Expand the "x" and "y" boundaries to include all points.
function expand(note) { if (xmin == "") expand1(note) }
function expand1(note) {
xmin = xmax = x[1]
ymin = ymax = y[1]
for (i = 2; i <= nd; i++) {
if (x[i] < xmin) xmin = x[i]
if (x[i] > xmax) xmax = x[i]
if (y[i] < ymin) ymin = y[i]
if (y[i] > ymax) ymax = y[i] }
}
Draw the frame around the graph.
function frame() {
for (i = ox; i < wid; i++) plot(i, oy, "-") # bottom
for (i = ox; i < wid; i++) plot(i, ht-1, "-") # top
for (i = oy; i < ht; i++) plot(ox, i, "|") # left
for (i = oy; i < ht; i++) plot(wid-1, i, "|") # right
}
Create tick marks for both axes.
function ticks( i) {
for (i = 1; i <= nb; i++) {
plot(xscale(bticks[i]), oy, "|")
splot(xscale(bticks[i])-1, 1, bticks[i])
}
for (i = 1; i <= nl; i++) {
plot(ox, yscale(lticks[i]), "-")
splot(0, yscale(lticks[i]), lticks[i])
}
}
Center labels under x-axis.
function label() {
splot(int((wid + ox - length(botlab))/2), 0, botlab)
}
Create data points.
function data( i) {
for (i = 1; i <= nd; i++)
plot(xscale(x[i]),yscale(y[i]),ch[i]=="" ? "*" : ch[i])
for(i in mark) print mark[i]
}
Print graph from array.
function draw( i, j) {
for (i = ht-1; i >= 0; i--) {
for (j = 0; j < wid; j++)
printf((j,i) in array ? array[j,i] : " ")
printf("\n")
}
}
Scale x-values, y-values.
function xscale(x) {
return int((x-xmin)/(xmax-xmin) * (wid-1-ox) + ox + 0.5)
}
function yscale(y) {
return int((y-ymin)/(ymax-ymin) * (ht-1-oy) + oy + 0.5)
}
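To see how the scaling maps data onto character columns, the following sketch evaluates xscale for the three bottom ticks of the example input (width 30, x range 1 to 10, default offset ox = 6). It copies the function from the listing; only the surrounding BEGIN block is added for illustration.

```shell
cols=$(awk 'BEGIN {
    wid = 30; ox = 6          # width from the example input, default x offset
    xmin = 1; xmax = 10       # from "range 1 1 10 22"
    printf("%d %d %d\n", xscale(1), xscale(5), xscale(10))
}
function xscale(x) {
    return int((x-xmin)/(xmax-xmin) * (wid-1-ox) + ox + 0.5)
}')
echo "$cols"    # prints: 6 16 29
```

Columns 6, 16 and 29 are where the "|" tick marks sit on the bottom axis of the sample plot (the first is hidden under a data point).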
Put one character into array.
function plot(x, y, c) {
array[x,y] = c
}
Put string "s" into array.
function splot(x, y, s, i, n) {
n = length(s)
for (i = 0; i < n; i++)
array[x+i, y] = substr(s, i+1, 1)
}
This code comes from the original Awk book by Alfred Aho, Peter Weinberger & Brian Kernighan and contains some small modifications by Tim Menzies.
awk [-v profiling=1] -f awklisp [optional-Lisp-source-files]
The -v profiling=1 option turns call-count profiling on.
If you want to use it interactively, be sure to include '-' (for the standard input) among the source files. For example:
gawk -f awklisp startup numbers lists -
This program arose out of one-upmanship. At my previous job I had to use MapBasic, an interpreter so astoundingly slow (around 100 times slower than GWBASIC) that one must wonder if it itself is implemented in an interpreted language. I still wonder, but it clearly could be: a bare-bones Lisp in awk, hacked up in a few hours, ran substantially faster. Since then I've added features and polish, in the hope of taking over the burgeoning market for stately language implementations.
This version tries to deal with as many of the essential issues in interpreter implementation as is reasonable in awk (though most would call this program utterly unreasonable from start to finish, perhaps...). Awk's impoverished control structures put error recovery and tail-call optimization out of reach, in that I can't see a non-painful way to code them. The scope of variables is dynamic because that was easier to implement efficiently. Subject to all those constraints, the language is as Schemely as I could make it: it has a single namespace with uniform evaluation of expressions in the function and argument positions, and the Scheme names for primitives and special forms.
The rest of this file is a reference manual. My favorite tutorial would be The Little LISPer (see section 5, References); don't let the cute name and the cartoons turn you off, because it's a really excellent book with some mind-stretching material towards the end. All of its code will work with awklisp, except for the last two chapters. (You'd be better off learning with a serious Lisp implementation, of course.)
For more details on the implementation, see the Implementation notes (below).
Code:
(define fib
  (lambda (n)
    (if (< n 2)
        1
        (+ (fib (- n 1))
           (fib (- n 2))))))
(fib 20)
Command line:
gawk -f awklisp startup numbers lists fib.lsp
Output:
10946
Here are the standard ELIZA dialogue patterns:
(define rules
'(((hello)
(How do you do -- please state your problem))
((I want)
(What would it mean if you got -R-)
(Why do you want -R-)
(Suppose you got -R- soon))
((if)
(Do you really think its likely that -R-)
(Do you wish that -R-)
(What do you think about -R-)
(Really-- if -R-))
((I was)
(Were you really?)
(Perhaps I already knew you were -R-)
(Why do you tell me you were -R- now?))
((I am)
(In what way are you -R-)
(Do you want to be -R-))
((because)
(Is that the real reason?)
(What other reasons might there be?)
(Does that reason seem to explain anything else?))
((I feel)
(Do you often feel -R-))
((I felt)
(What other feelings do you have?))
((yes)
(You seem quite positive)
(You are sure)
(I understand))
((no)
(Why not?)
(You are being a bit negative)
(Are you saying no just to be negative?))
((someone)
(Can you be more specific?))
((everyone)
(Surely not everyone)
(Can you think of anyone in particular?)
(Who for example?)
(You are thinking of a special person))
((perhaps)
(You do not seem quite certain))
((are)
(Did you think they might not be -R-)
(Possibly they are -R-))
(()
(Very interesting)
(I am not sure I understand you fully)
(What does that suggest to you?)
(Please continue)
(Go on)
(Do you feel strongly about discussing such things?))))
Command line:
gawk -f awklisp startup numbers lists eliza.lsp -
Interaction:
> (eliza)
Hello-- please state your problem
> (I feel sick)
Do you often feel sick
> (I am in love with awk)
In what way are you in love with awk
> (because it is so easy to use)
Is that the real reason?
> (I was laughed at by the other kids at space camp)
Were you really?
> (everyone hates me)
Can you think of anyone in particular?
> (everyone at space camp)
Surely not everyone
> (perhaps not tina fey)
You do not seem quite certain
> (I want her to laugh at me)
What would it mean if you got her to laugh at me
Lisp evaluates expressions, which can be simple (atoms) or compound (lists).
An atom is a string of characters, which can be letters, digits, and most punctuation; the characters may -not- include spaces, quotes, parentheses, brackets, '.', '#', or ';' (the comment character). In this Lisp, case is significant ( X is different from x ).
A list is a '(', followed by zero or more objects (each of which is an atom or a list), followed by a ')'.
The special object nil is both an atom and the empty list. That is, nil = (). A non-nil list is called a -pair-, because it is represented by a pair of pointers, one to the first element of the list (its -car-), and one to the rest of the list (its -cdr-). For example, the car of ((a list) of stuff) is (a list), and the cdr is (of stuff). It's also possible to have a pair whose cdr is not a list; the pair with car A and cdr B is printed as (A . B).
That's the syntax of programs and data. Now let's consider their meaning. You can use Lisp like a calculator: type in an expression, and Lisp prints its value. If you type 25, it prints 25. If you type (+ 2 2), it prints 4. In general, Lisp evaluates a particular expression in a particular environment (set of variable bindings) by following this algorithm:
If the procedure's body has more than one expression -- e.g., (lambda () (write 'Hello) (write 'world!)) -- evaluate them each in turn, and return the value of the last one.
We still need the rules for special forms. They are:
It's possible to define new special forms using the macro facility provided in the startup file. The macros defined there are:
(let ((<var> <expr>)...) <body>...)
Bind each <var> to its corresponding <expr> (evaluated in the current environment), and evaluate <body> in the resulting environment.
(cond (<test-expr> <result-expr>...)... (else <result-expr>...))
where the final else clause is optional. Evaluate each <test-expr> in turn, and for the first non-nil result, evaluate its <result-expr>. If none are non-nil, and there's no else clause, return nil.
(and <expr>...)
Evaluate each <expr> in order, until one returns nil; then return nil. If none are nil, return the value of the last <expr>.
(or <expr>...)
Evaluate each <expr> in order, until one returns non-nil; return that value. If all are nil, return nil.
Since the code should be self-explanatory to anyone knowledgeable about Lisp implementation, these notes assume you know Lisp but not interpreters. I haven't got around to writing up a complete discussion of everything, though.
The code for an interpreter can be pretty low on redundancy -- this is natural because the whole reason for implementing a new language is to avoid having to code a particular class of programs in a redundant style in the old language. We implement what that class of programs has in common just once, then use it many times. Thus an interpreter has a different style of code, perhaps denser, than a typical application program.
Conceptually, a Lisp datum is a tagged pointer, with the tag giving the datatype and the pointer locating the data. We follow the common practice of encoding the tag into the two lowest-order bits of the pointer. This is especially easy in awk, since arrays with non-consecutive indices are just as efficient as dense ones (so we can use the tagged pointer directly as an index, without having to mask out the tag bits). (But, by the way, mawk accesses negative indices much more slowly than positive ones, as I found out when trying a different encoding.)
This Lisp provides three datatypes: integers, lists, and symbols. (A modern Lisp provides many more.)
For an integer, the tag bits are zero and the pointer bits are simply the numeric value; thus, N is represented by N*4. This choice of the tag value has two advantages. First, we can add and subtract without fiddling with the tags. Second, negative numbers fit right in. (Consider what would happen if N were represented by 1+N*4 instead, and we tried to extract the tag as N%4, where N may be either positive or negative. Because of this problem and the above-mentioned inefficiency of negative indices, all other datatypes are represented by positive numbers.)
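A few lines of awk make both points concrete. This is only an illustrative sketch of the arithmetic, not code from awklisp, and the variable names are made up.

```shell
tags=$(awk 'BEGIN {
    # Integers carry tag 0: N is stored as N*4.
    a = 3 * 4; b = -5 * 4
    sum = a + b                        # add the encoded values directly
    printf("%d %d ", sum / 4, (sum % 4 == 0))
    # The rejected encoding 1+N*4: the remainder stops being the tag
    # as soon as N is negative.
    printf("%d %d\n", (1 + 2*4) % 4, (1 + -1*4) % 4)
}')
echo "$tags"    # prints: -2 1 1 -3
```

The sum decodes to -2 and still has tag 0, while under the alternative encoding N = 2 yields remainder 1 but N = -1 yields -3.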
The following is from an email discussion; it doesn't develop everything from first principles but is included here in the hope it will be helpful.
Hi. I just took a look at awklisp, and remembered that there's more to your question about why we need a stack -- it's a good question. The real reason is because a stack is accessible to the garbage collector.
We could have had apply() evaluate the arguments itself, and stash the results into variables like arg0 and arg1 -- then the case for ADD would look like
if (proc == ADD) return is(a_number, arg0) + is(a_number, arg1)
The obvious problem with that approach is how to handle calls to user-defined procedures, which could have any number of arguments. Say we're evaluating ((lambda (x) (+ x 1)) 42). (lambda (x) (+ x 1)) is the procedure, and 42 is the argument.
A (wrong) solution could be to evaluate each argument in turn, and bind the corresponding parameter name (like x in this case) to the resulting value (while saving the old value to be restored after we return from the procedure). This is wrong because we must not change the variable bindings until we actually enter the procedure -- for example, with that algorithm ((lambda (x y) y) 1 x) would return 1, when it should return whatever the value of x is in the enclosing environment. (The eval_rands()-type sequence would be: eval the 1, bind x to 1, eval the x -- yielding 1 which is *wrong* -- and bind y to that, then eval the body of the lambda.)
Okay, that's easily fixed -- evaluate all the operands and stash them away somewhere until you're done, and *then* do the bindings. So the question is where to stash them. How about a global array? Like
for (i = 0; arglist != NIL; ++i) {
global_temp[i] = eval(car[arglist])
arglist = cdr[arglist]
}
followed by the equivalent of extend_env(). This will not do, because the global array will get clobbered in recursive calls to eval(). Consider (+ 2 (* 3 4)) -- first we evaluate the arguments to the +, like this: global_temp[0] gets 2, and then global_temp[1] gets the eval of (* 3 4). But in evaluating (* 3 4), global_temp[0] gets set to 3 and global_temp[1] to 4 -- so the original assignment of 2 to global_temp[0] is clobbered before we get a chance to use it. By using a stack[] instead of a global_temp[], we finesse this problem.
You may object that we can solve that by just making the global array local, and that's true; lots of small local arrays may or may not be more efficient than one big global stack, in awk -- we'd have to try it out to see. But the real problem I alluded to at the start of this message is this: the garbage collector has to be able to find all the live references to the car[] and cdr[] arrays. If some of those references are hidden away in local variables of recursive procedures, we're stuck. With the global stack, they're all right there for the gc().
(In C we could use the local-arrays approach by threading a chain of pointers from each one to the next; but awk doesn't have pointers.)
(You may wonder how the code gets away with having a number of local variables holding lisp values, then -- the answer is that in every such case we can be sure the garbage collector can find the values in question from some other source. That's what this comment is about:
# All the interpretation routines have the precondition that their
# arguments are protected from garbage collection.
In some cases where the values would not otherwise be guaranteed to be available to the gc, we call protect().)
Oh, there's another reason why apply() doesn't evaluate the arguments itself: it's called by do_apply(), which handles lisp calls like (apply car '((x))) -- where we *don't* want the x to get evaluated by apply().
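The clobbering problem described in the email above can be reproduced with a much smaller recursion than eval(). The following is only a simplified stand-in, assuming a function that computes n + sum(n-1); the names sum_global and sum_stack are invented for this sketch and do not appear in awklisp.

```shell
out=$(awk '
# sum_global stashes its operand in one shared array, as global_temp[] would.
function sum_global(n) {
    if (n == 0) return 0
    global_temp[0] = n
    global_temp[1] = sum_global(n - 1)   # the recursive call overwrites global_temp[0]
    return global_temp[0] + global_temp[1]
}
# sum_stack pushes the operand instead, so each call level keeps its own slot.
function sum_stack(n,    r) {
    if (n == 0) return 0
    stack[sp++] = n
    r = sum_stack(n - 1)
    return stack[--sp] + r
}
BEGIN {
    print "global:", sum_global(3)
    print "stack:", sum_stack(3)
}')
printf "%s\n" "$out"
```

The global version prints 3 instead of 6, because every level ends up reading the innermost value of global_temp[0]. The stack version also keeps every pending value in one global array, which is the property the garbage collector relies on.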
Roger Rohrbach wrote a Lisp interpreter, in old awk (which has no procedures!), called walk. It can't do as much as this Lisp, but it certainly has greater hack value. Cooler name, too. It's available at http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/impl/awk/0.html
Eval doesn't check the syntax of expressions. This is a probably-misguided attempt to bump up the speed a bit, that also simplifies some of the code. The macroexpander in the startup file would be the best place to add syntax-checking.
Darius Bacon dairus@wry.me
Copyright (c) 1994, 2001 by Darius Bacon.
Permission is granted to anyone to use this software for any purpose on any computer system, and to redistribute it freely, subject to the following restrictions:
awk -f markdown.awk file.txt > file.html
Download from LAWKER.
(Note: this code was originally called txt2html.awk by its author but that caused a name clash inside LAWKER. Hence, I've taken the liberty of renaming it. --Timm)
The following code implements a subset of John Gruber's Markdown language: a widely-used, ultra light-weight markup language for HTML.

Level 1 Header
===============
Level 2 Header
--------------
Level 3 Header
______________
The number of leading "#" characters encodes the heading level:
# Level 1 Header
#### Level 4 Header
- List item 1
- List item 2
Note: the beginning and end of a list are automatically inferred, maybe not always correctly.
Denoted by a number at start-of-line.
1 A numbered list item
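The "#" heading rule described above can be tried in isolation. This is a small sketch of that one rule, not the full program; unlike the listing further down it also strips the blank after the "#" run, and the variable names level and title are chosen just for this example.

```shell
out=$(printf "%s\n" "# Level 1 Header" "#### Level 4 Header" "######## Too deep" |
awk '
/^#/ {
    match($0, /^#+/)                        # RLENGTH = number of leading "#" characters
    level = (RLENGTH > 6) ? 6 : RLENGTH     # HTML stops at <h6>
    title = substr($0, RLENGTH + 1)
    sub(/^[ \t]+/, "", title)               # drop the blank after the "#" run
    print "<h" level ">" title "</h" level ">"
}')
printf "%s\n" "$out"
```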
The following code demonstrates an "exception-style" of Awk programming. Note how all the processing relating to each mark-up tag is localized (except for carrying around prior text and environments). The modularity of the following code should make it easily hackable.
BEGIN {
env = "none";
text = "";
}
/^!\[.+\] *\(.+\)/ {
split($0, a, /\] *\(/);
split(a[1], b, /\[/);
imgtext = b[2];
split(a[2], b, /\)/);
imgaddr = b[1];
print "<p><img src=\"" imgaddr "\" alt=\"" imgtext "\" title=\"\" /></p>\n";
text = "";
next;
}
/\] *\(/ {
do {
na = split($0, a, /\] *\(/);
split(a[1], b, "[");
linktext = b[2];
nc = split(a[2], c, ")");
linkaddr = c[1];
text = text b[1] "<a href=\"" linkaddr "\">" linktext "</a>" c[2];
for(i = 3; i <= nc; i++)
text = text ")" c[i];
for(i = 3; i <= na; i++)
text = text "](" a[i];
$0 = text;;
text = "";
}
while (na > 2);
}
/`/ {
while (match($0, /`/) != 0) {
if (env == "code") {
sub(/`/, "</code>");
env = pcenv;
}
else {
sub(/`/, "<code>");
pcenv = env;
env = "code";
}
}
}
/\*\*/ {
while (match($0, /\*\*/) != 0) {
if (env == "emph") {
sub(/\*\*/, "</emph>");
env = peenv;
}
else {
sub(/\*\*/, "<emph>");
peenv = env;
env = "emph";
}
}
}
(Plus h3 with underscores.)
/^=+$/ {
print "<h1>" text "</h1>\n";
text = "";
next;
}
/^-+$/ {
print "<h2>" text "</h2>\n";
text = "";
next;
}
/^_+$/ {
print "<h3>" text "</h3>\n";
text = "";
next;
}
/^#/ {
match($0, /#+/);
n = RLENGTH;
if(n > 6)
n = 6;
print "<h" n ">" substr($0, RLENGTH + 1) "</h" n ">\n";
next;
}
/^[-*+]/ {
if (env == "none") {
env = "ul";
print "<ul>";
}
print "<li>" substr($0, 3) "</li>";
text = "";
next;
}
/^[0-9]./ {
if (env == "none") {
env = "ol";
print "<ol>";
}
print "<li>" substr($0, 3) "</li>";
next;
}
/^[ \t]*$/ {
if (env != "none") {
if (text)
print text;
text = "";
print "</" env ">\n";
env = "none";
}
if (text)
print "<p>" text "</p>\n";
text = "";
next;
}
// {
text = text $0;
}
END {
if (env != "none") {
if (text)
print text;
text = "";
print "</" env ">\n";
env = "none";
}
if (text)
print "<p>" text "</p>\n";
text = "";
}
Does not implement the full Markdown syntax.
Jesus Galan (yiyus) 2006
<yiyu DOT jgl AT gmail DOT com>
gawk -f awkpp file-name-of-awk++-program
This command is platform independent and sends the translated program to standard output (stdout). See Running awk++ for variations.
This is an updated revision (#21), released August 1, 2009. In this new version:
Download awkpp21.zip from LAWKER
Awk++ is a preprocessor; that is, it reads in a program written in the awk++ language and outputs a new program. However, it's different from awka: the output from the awk++ preprocessor is awk code, not C or an executable program. So, some version of AWK, such as awk or gawk, has to be used to run the preprocessed program. awka can be used, in a second step, to turn the preprocessed awk++ program into an executable, if desired.
The awk++ language provides object oriented programming for AWK that includes:
Awk++ adds new keywords to standard Awk:
a = class1.new[(optional parameters)] *** similar to Ruby
b = a.get("aProperty")
a.delete
class class1 {
property aProperty
method new([optional parameters]) {
# put initialization stuff here
}
method get(propName) {
if(propName == "aProperty")
return aProperty ### Note the use of 'return'. It behaves
### exactly the same as in an AWK function.
}
}
To define a class (similar to C++ but no public/private):
class class_name {.....}
To define a class with inheritance:
class class_name : inherited_class_name [ : inherited_class_name...] {.....}
To add local/private variables (persistent variables; syntax is unique to awk++):
class class_name {
attribute|attr|property|prop|element|elem|variable|var variable_name
..... }
To help programmers who are used to other OO languages, "attribute", "property", "element", and "variable", along with their 4-letter abbreviations, are interchangeable.
Note: these persistent variables cannot be accessed directly. The programmer must define method(s) to return them, if their values are to be made available to code that's outside the class.
To add methods
class class_name {
attribute variable_name1
method method_name(parameters) {
...any awk code....
}
..other method definitions...
}
To create an object
object_variable = class_name.new[(optional parameters)]
(runs the method named "new", if it exists; returns the object ID)
To call an object method
object_variable.method_name(parameters)
The dot isn't used for concatenation in awk/gawk, so it's a natural choice for the separator between the object and method.
To reclaim the memory used by an object, use the delete method, i.e.:
object_variable.delete
but don't define delete() in your classes. awk++ recognizes delete() as a special method and will take care of deleting the object. Deleting objects is only necessary, though, if they hold a lot of data. Overhead for objects themselves is insignificant.
OO syntax goals:
The OO syntax is based partly on C++, partly on Javascript, partly on Ruby and partly on the book "The Object-Oriented Thought Process". It isn't lifted in toto from one language because other languages provide features that gawk can't accomplish or have syntax that is hard to parse.
In awk++, if a method is called that isn't in the object's class and there are inherited classes (superclasses) specified, the inherited classes are called in left to right order until one of them returns a value. That value becomes the result of the method call. This is the way awk++ resolves the diamond problem. As a programmer, you control the sequence in which superclasses are called by the left to right order of the list of inherited classes in the class definition.
There are two important things to note.
Calls to undefined methods do nothing and return nothing, silently.
The command to preprocess an awk++ program looks like this:
gawk -f awkpp file-name-of-awk++-program
or, if the "she-bang" line (line 1 in awkpp) has the right path to gawk, and awkpp is executable and in a directory in PATH,
awkpp file-name-of-awk++-program
To run the output program immediately,
gawk -f awkpp -r file-name-of-awk++-program [awk options] data-files-to-be-processed
or
awkpp -r file-name-of-awk++-program [awk options] data-files-to-be-processed
When running an awk++ program immediately, standard input (stdin) cannot be used for data. One or more data file paths must be listed on the command line.
There is a bug in the standard AWK distributions that affects the preprocessor. Additionally, the preprocessor uses the 3rd array option of the match() function. So, it's best to use GAWK to run the preprocessor.
On the other hand, the AWK code created by translating awk++ is intended to work with all versions of AWK. If you find otherwise, please notify the developer(s).
Copyright (c) 2008, 2009 Jim Hart, jhart@mail.avcnet.org All rights reserved. The awk++ code is licensed under the GNU Public license (GPL) any version. awk++ documentation, including this page, may be copied only in unmodified form, subject to fair use guidelines.
These pages are focused on Functional Gawk (a.k.a. "Funky").
Funky is enabled by a new feature added to Gawk 3.2: indirect functions. For example:
function foo() { print "foo" }
function bar() { print "bar" }
BEGIN {
the_func = "foo"
@the_func() # calls foo()
the_func = "bar"
@the_func() # calls bar()
}
At the time of this writing, Gawk 3.2 is pre-release and indirect functions can be accessed using the gawk-devel CVS tree:
cvs -d:pserver:anonymous@cvs.sv.gnu.org:/sources/gawk co gawk-devel
Indirect functions enable a new view on library management in Gawk and, perhaps, a way to emulate functional abstraction in languages like Lisp.
So, anyone care to try, say:
In this exchange from comp.lang.awk, Jason Quinn discusses his super-for loop trick. Arnold Robbins then chimes in to say that, with indirect functions, super-for loops could become a generic tool.
Jason Quinn writes:
#shows an example of a superfor loop
BEGIN {
#define loop maximums
loopmax[1]=4
loopmax[2]=6
loopmax[3]=8
loopmax[4]=10
loopmax[5]=12
loopmax[6]=20
#call the loop
superfor(6)
}
function superfor(loopdepth, zz) { # zz is a local variable
currloopnum++
#start of prologue
#end of prologue
for(loopcounter[currloopnum]=1;
loopcounter[currloopnum]<=loopmax[currloopnum];
loopcounter[currloopnum]++) {
if ( loopdepth==1 ) {
#start of superfor body
for (zz=1;zz<=currloopnum;zz++) {
printf loopcounter[zz] FS
}
print ""
#end of superfor body
}
else if ( loopdepth>1 )
superfor(loopdepth-1)
}
#start of epilog
#end of epilog
loopdepth++ ; currloopnum--
}
Arnold Robbins replies:
function superfor(loopdepth, prologue, body, epilogue, zz)
{
currloopnum++
@prologue()
for(loopcounter[currloopnum]=1;
loopcounter[currloopnum]<=loopmax [currloopnum];
loopcounter[currloopnum]++) {
if ( loopdepth==1 ) {
@body()
}
else if ( loopdepth>1 )
superfor(loopdepth-1, prologue,
body, epilogue)
}
@epilogue()
loopdepth++ ; currloopnum--
}
all( fun, array [,max])
collect( fun, array1, array2 [,max])
select( fun, array1, array2 [,max])
reject( fun, array1, array2 [,max])
detect( fun, array [,max])
inject( fun, array, carry [,max])
All these functions return the size of array or array2
An interesting new feature in Gawk 3.1.7 is indirect functions. This allows a function name to be held in a variable, passed as an argument to another function, and called using the syntax
@fun(arg1,arg2,...)
This enables a new kind of functional programming style in Gawk. For example, generic enumeration patterns can be coded once, then called many different ways with different function names passed as arguments.
This document illustrates this style of programming.
For example, here are some standard enumeration functions:
Applies the function fun to all items in the array. If called with the max argument, then they are iterated in the order i=1 .. max, otherwise we use for(i in a).
Applies fun to each item in array1 and collects the results in array2.
Find all the items in array1 that satisfy fun and add them to array2.
Find all the items in array1 that do not satisfy fun and add them to array2.
Return the first item found in array that satisfies fun. If no such item is found, then return the magic global value Fail.
(This one is a little tricky.) The result of applying fun to each item in array is carried into the processing of the next item. Initially, the carried value is carry. This function returns the final carry.
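To see how the carried value evolves in inject, the sketch below traces a product over five items. It is an illustration only: the function is fixed to mult() so that it also runs in awks without indirect calls, and the name inject_mult is invented for this example.

```shell
out=$(awk '
function mult(x, y) { return x * y }
# Same shape as inject(), but with the function fixed to mult();
# one trace line is printed per step.
function inject_mult(a, carry, max,    i) {
    for (i = 1; i <= max; i++) {
        carry = mult(a[i], carry)
        print "after a[" i "]=" a[i] ": carry=" carry
    }
    return carry
}
BEGIN {
    n = split("1 2 3 4 5", arr)
    print "result=" inject_mult(arr, 1, n)
}')
printf "%s\n" "$out"
```

The carry goes 1, 2, 6, 24, 120, and the last value is what inject returns.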
To illustrate the above, consider the following functions. Each of these is defined for one array item.
function odd(x) { return (x % 2) == 1 }
function show(x) { print "[" x "]" }
function mult(x,y) { return x * y }
function halve(x) { return x/2 }
function do_all( arr) {
split("22 23 24 25 26 27 28",arr)
all("show",arr)
}
When we run this ...
eg/enum1
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_all() }'
we see every item in arr printed using the above show function ...
eg/enum1.out
[25] [26] [27] [28] [22] [23] [24]
function do_collect( max,arr1,arr2,i) {
max=split("22 23 24 25 26 27 28",arr1)
collect("halve",arr1,arr2,max)
for(i=1;i<=max;i++) print arr2[i]
}
When we run this ...
eg/enum2
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_collect() }'
we see every item in arr divided in two ...
eg/enum2.out
11 11.5 12 12.5 13 13.5 14
function do_select( all,less,arr1,arr2,i) {
all = split("22 23 24 25 26 27 28",arr1)
less = select("odd",arr1,arr2,all)
for(i=1;i<=less;i++) print arr2[i]
}
When we run this ...
eg/enum3
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_select() }'
we see every item in arr that satisfies odd....
eg/enum3.out
23 25 27
function do_reject( all,less,arr1,arr2,i) {
all = split("22 23 24 25 26 27 28",arr1)
less = reject("odd",arr1,arr2,all)
for(i=1;i<=less;i++) print arr2[i]
}
When we run this ...
eg/enum4
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_reject() }'
we see every item in arr that does not satisfy odd....
eg/enum4.out
22 24 26 28
function do_detect( all,arr1) {
all = split("22 23 24 25 26 27 28",arr1)
print detect("odd",arr1,all)
}
When we run this ...
eg/enum5
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_detect() }'
we see the first item in arr that satisfies odd....
eg/enum5.out
23
function do_inject( all,less,arr1,arr2,i) {
split("1 2 3 4 5",arr1)
print inject("mult",arr1,1)
}
When we run this ...
eg/enum6
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_inject() }'
we see the result of multiplying together all the items in arr.
eg/enum6.out
120
Note one design principle in the following: any newly generated arrays have indexes 1..max where max is the number of elements in that array.
function all (fun,a,max, i) {
if (max)
for(i=1;i<=max;i++) @fun(a[i])
else
for(i in a) @fun(a[i])
}
function collect (fun,a,b,max, i,n) {
if (max)
for(i=1;i<=max;i++) {n++; b[i]= @fun(a[i]) }
else
for(i in a) {n++; b[i]= @fun(a[i])}
return n
}
function select (fun,a,b,max, i,n) {
if (max)
for(i=1;i<=max;i++) {
if (@fun(a[i])) {n++; b[n]= a[i] }}
else
for(i in a) {
if (@fun(a[i])) {n++; b[n]= a[i] }}
return n
}
function reject (fun,a,b,max, i,n) {
if (max)
for(i=1;i<=max;i++) {
if (! @fun(a[i])) {n++; b[n]= a[i] }}
else
for(i in a) {
if (! @fun(a[i])) {n++; b[n]= a[i] }}
return n
}
BEGIN {Fail="someUnLIKELYSymbol"}
function detect (fun,a,max, i) {
if (max)
for(i=1;i<=max;i++) {
if (@fun(a[i])) return a[i] }
else
for(i in a) {
if (@fun(a[i])) return a[i] }
return Fail
}
function inject (fun,a,carry,max, i) {
if (max)
for(i=1;i<=max;i++)
carry = @fun(a[i],carry)
else
for(i in a)
carry = @fun(a[i],carry)
return carry
}
The above code does not pass around any state information that the fun functions can use. So all their deliberations are either with the current array values (integers or strings) or with global state. It might be worthwhile writing new versions of the above with one more argument, to carry that state.
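One possible shape for such a state-carrying version is sketched here for select only. Everything in it is an assumption made for illustration: the names select_with, call2, above, divides and show are hypothetical, and call2() is a portable stand-in for the Gawk call @fun(x, state) so that the sketch runs in any awk.

```shell
out=$(awk '
# Hypothetical predicates taking (item, state).
function above(x, limit) { return x > limit }
function divides(x, k)   { return x % k == 0 }
# Portable stand-in for the gawk call  @fun(x, state).
function call2(fun, x, state) {
    if (fun == "above")   return above(x, state)
    if (fun == "divides") return divides(x, state)
    return 0
}
# select() with one extra argument that is handed on to fun.
function select_with(fun, a, b, max, state,    i, n) {
    for (i = 1; i <= max; i++)
        if (call2(fun, a[i], state)) b[++n] = a[i]
    return n + 0
}
function show(b, n,    i, s, sep) {
    for (i = 1; i <= n; i++) { s = s sep b[i]; sep = " " }
    print s
}
BEGIN {
    max = split("22 23 24 25 26 27 28", arr1)
    split("", big); split("", third)          # start with empty arrays
    show(big, select_with("above", arr1, big, max, 25))
    show(third, select_with("divides", arr1, third, max, 3))
}')
printf "%s\n" "$out"
```

The same predicate can now be reused with different thresholds or divisors without touching global variables.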
These pages focus on macro pre-processors (a natural application for Awk).
These pages focus on tools for larger Gawk programs; e.g. ways to load multiple files or auto-generate documentation straight from the source code.
(Note: see recent update.)
Download from LAWKER or a tar file or from SourceForge.
runawk - wrapper for AWK interpreter
runawk [options] program_file
runawk -e program
After years of using AWK for programming I've found that, despite its simplicity and limitations, AWK is good enough for scripting a wide range of tasks. AWK is not as powerful as its bigger counterparts like Perl, Ruby, TCL and others, but it has its own advantages such as compactness, simplicity and availability on almost all UNIX-like systems. I personally also like its data-driven nature and token orientation, a very useful technique for simple text processing utilities.
But! Unfortunately, awk interpreters lack some important features and sometimes do not work as well as they should.
Problems I see (some of them, of course)
AWK lacks support for modules. Even if I create small programs, I often want to use functions created earlier and already used in other scripts. That is, it would be great to organise functions into so-called libraries (modules).
In order to pass arguments to a #!/usr/bin/awk -f script (not to the awk
interpreter), it is necessary to prepend the list of
arguments with -- (two minus signs). In my view, this looks bad.
Example:
awk_program:
#!/usr/bin/awk -f
BEGIN {
for (i=1; i < ARGC; ++i){
printf "ARGV [%d]=%s\n", i, ARGV [i]
}
}
Shell session:
% awk_program --opt1 --opt2
/usr/bin/awk: unknown option --opt1 ignored
/usr/bin/awk: unknown option --opt2 ignored
% awk_program -- --opt1 --opt2
ARGV [1]=--opt1
ARGV [2]=--opt2
%
In my opinion awk_program script should work like this
% awk_program --opt1 --opt2
ARGV [1]=--opt1
ARGV [2]=--opt2
%
It is possible using runawk.
When #!/usr/bin/awk -f script handles arguments (options) and wants
to read from stdin, it is necessary to add
/dev/stdin (or `-') as a last argument explicitly.
Example:
awk_program:
#!/usr/bin/awk -f
BEGIN {
if (ARGV [1] == "--flag"){
flag = 1
ARGV [1] = "" # to not read file named "--flag"
}
}
{
print "flag=" flag " $0=" $0
}
Shell session:
% echo test | awk_program -- --flag
% echo test | awk_program -- --flag /dev/stdin
flag=1 $0=test
%
Ideally awk_program should work like this
% echo test | awk_program --flag
flag=1 $0=test
%
runawk was created to solve all these problems
Display help information.
Display version information.
Turn on a debugging mode in which runawk prints argument list with which real awk interpreter will be run.
Always add stdin file name to a list of awk arguments
Do not add stdin file name to a list of awk arguments
Specify program. If -e is not specified program is read from program_file.
Under UNIX-like OS-es you can use runawk by beginning your script with
#!/usr/local/bin/runawk
line or something like this instead of
#!/usr/bin/awk -f
or similar.
In order to activate modules you should add them into awk script like this
#use "module1.awk"
#use "module2.awk"
that is, the line that specifies a module name is treated as a comment line by a normal AWK interpreter but is processed specially by runawk.
Note that #use should begin with column 0, no spaces are allowed before it and no spaces are allowed between # and use.
Also note that AWK modules can "use" other modules, and so forth. All of them are collected in depth-first order, and each one is added to the list of awk interpreter arguments, prepended with the -f option. That is, the #use directive is *NOT* similar to #include in the C programming language: runawk's module code is not inserted in place of #use. Runawk's modules are closer to Perl's "use" command. If a module is mentioned more than once, only one -f is added for it, i.e. duplicates are removed automatically.
Position of #use directive in a source file does matter, i.e. the earlier module is mentioned, the earlier -f will be generated for it.
Example:
file prog:
#!/usr/local/bin/runawk
#use "A.awk"
#use "B.awk"
#use "E.awk"
PROG code
...
file B.awk:
#use "A.awk"
#use "C.awk"
B code
...
file C.awk:
#use "A.awk"
#use "D.awk"
C code
...
A.awk and D.awk don't contain #use directive.
If you run
runawk prog file1 file2
or
/path/to/prog file1 file2
the following command
awk -f A.awk -f D.awk -f C.awk -f B.awk -f E.awk -f prog -- file1 file2
will actually run.
You can check this by running
runawk -d prog file1 file2
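The ordering in this example can be reproduced with a short depth-first walk. This is only a sketch of the rule as described above, not runawk's own implementation; the uses[] table and the visit() function are invented for the illustration and hard-code the #use lines of the example files.

```shell
out=$(awk '
# Post-order walk: a module is emitted after everything it uses,
# and the "seen" table drops duplicates.
function visit(mod,    parts, n, i) {
    if (mod in seen) return
    seen[mod] = 1
    n = split(uses[mod], parts, " ")
    for (i = 1; i <= n; i++) visit(parts[i])
    cmd = cmd " -f " mod
}
BEGIN {
    # The #use lines of the example files, in source order.
    uses["prog"]  = "A.awk B.awk E.awk"
    uses["B.awk"] = "A.awk C.awk"
    uses["C.awk"] = "A.awk D.awk"
    cmd = "awk"
    visit("prog")
    print cmd " -- file1 file2"
}')
printf "%s\n" "$out"
```

The printed command line matches the one shown above: A.awk comes first, D.awk and C.awk precede B.awk because B.awk uses them, and A.awk appears only once.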
Modules are first searched in a directory where main program (or module in which #use directive is specified) is placed. If it is not found there, then AWKPATH environment variable is checked. AWKPATH keeps a colon separated list of search directories. Finally, module is searched in system runawk modules directory, by default PREFIX/share/runawk but this can be changed at build time.
An absolute path of the module can also be specified.
In order to pass arguments to an AWK script correctly, runawk treats its arguments beginning with `-' (minus) specially. The following command
runawk prog2 -x -f=file -o=output file1 file2
or
/path/to/prog2 -x -f=file -o=output file1 file2
will actually run
awk -f prog2 -- -x -f=file -o=output file1 file2
Therefore the -x, -f and -o options will be passed to awk's ARGV/ARGC variables together with file1 and file2. If all arguments begin with `-' (minus), runawk will add the stdin filename to the end of the argument list (unless the -I option is specified), i.e. running
runawk prog3 --value=value
or
/path/to/prog3 --value=value
will actually run the following
awk -f prog3 -- --value=value /dev/stdin
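The rule just described (insert --, and append the stdin file name when every argument starts with a minus) can be written down as a small function. This is a sketch of the rule only, assuming space-separated arguments and ignoring runawk's own options; command_for is a hypothetical name, not part of runawk.

```shell
out=$(awk '
# Hypothetical helper: builds the awk command line for a script and its
# space-separated arguments, following the rule described in the text.
function command_for(prog, args,    parts, n, i, only_options, cmd) {
    n = split(args, parts, " ")
    only_options = 1
    for (i = 1; i <= n; i++)
        if (substr(parts[i], 1, 1) != "-") only_options = 0
    cmd = "awk -f " prog " --"
    if (n > 0) cmd = cmd " " args
    if (only_options) cmd = cmd " /dev/stdin"
    return cmd
}
BEGIN {
    print command_for("prog2", "-x -f=file -o=output file1 file2")
    print command_for("prog3", "--value=value")
}')
printf "%s\n" "$out"
```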
Like some other interpreters runawk can obtain the script from a command line like this
/path/to/runawk -e '
#use "alt_assert.awk"
{
assert($1 >= 0 && $1 <= 10, "Bad value: " $1)
# your code below
...
}'
For some reason you may prefer one AWK interpreter over another. You can select it with the #interp command, like this
file prog:
#!/usr/local/bin/runawk
#use "A.awk"
#use "B.awk"
#interp "/usr/pkg/bin/nbawk"
# your code here
...
The reason may be efficiency for a particular task, useful but non-standard extensions, or anything else.
Note that #interp directive should also begin with column 0, no spaces are allowed before it and between # and interp.
In some cases you may want to run AWK interpreter with a specific environment. For example, your script may be oriented to process ASCII text only. In this case you can run AWK with LC_CTYPE=C environment and use regexp ranges.
runawk provides the #env directive for this. The string inside double quotes is passed to the putenv(3) libc function.
Example:
file prog:
#!/usr/local/bin/runawk
#env "LC_ALL=C"
$1 ~ /^[A-Z]+$/ { # A-Z is valid if LC_CTYPE=C
print $1
}
If AWK interpreter exits normally, runawk exits with its exit status. If AWK interpreter was killed by signal, runawk exits with exit status 128+signal.
Colon separated list of directories where awk modules are searched.
Sets the path to the AWK interpreter, used by default, i.e. this variable overrides the compile-time default. Note that #interp directive overrides this.
Copyright (c) 2007-2008 Aleksey Cheusov <vle@gmx.net>
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Please send any comments, questions, bug reports etc. to me by e-mail or (even better) register them at sourceforge project home. Feature requests are also welcomed.
awk -f m1.awk [file...]
Download from LAWKER.
M1 is a simple macro language that supports the essential operations of defining strings and replacing strings in text by their definitions. It also provides facilities for file inclusion and for conditional expansion of text. It is not designed for any particular application, so it is mildly useful across several applications, including document preparation and programming. This paper describes the evolution of the program; the final version is implemented in about 110 lines of Awk.
M1 copies its input file(s) to its output unchanged except as modified by certain "macro expressions." The following lines define macros for subsequent processing:
@comment Any text
@@                     same as @comment
@define name value
@default name value    set if name undefined
@include filename
@if varname            include subsequent text if varname != 0
@unless varname        include subsequent text if varname == 0
@fi                    terminate @if or @unless
@ignore DELIM          ignore input until line that begins with DELIM
@stderr stuff          send diagnostics to standard error
A definition may extend across many lines by ending each line with a backslash, thus quoting the following newline.
Any occurrence of @name@ in the input is replaced in the output by the corresponding value.
@name at beginning of line is treated the same as @name@.
We'll start with a toy example that illustrates some simple uses of m1. Here's a form letter that I've often been tempted to use:
@default MYNAME Jon Bentley
@default TASK respond to your special offer
@default EXCUSE the dog ate my homework
Dear @NAME@:
Although I would dearly love to @TASK@,
I am afraid that I am unable to do so because @EXCUSE@.
I am sure that you have been in this situation
many times yourself.
Sincerely,
@MYNAME@
If that file is named sayno.mac, it might be invoked with this text:
@define NAME Mr. Smith
@define TASK subscribe to your magazine
@define EXCUSE I suddenly forgot how to read
Recall that a @default takes effect only if its variable was not previously @defined.
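The interplay of @define and @default can be checked with a few lines of Awk. This is a simplified sketch, not the m1 program itself: it only fills a symbol table, assumes single-line definitions, and its parsing of the value is an approximation written for this example.

```shell
out=$(printf "%s\n" \
    "@define TASK subscribe to your magazine" \
    "@default MYNAME Jon Bentley" \
    "@default TASK respond to your special offer" |
awk '
/^@(define|default)[ \t]/ {
    name = $2
    value = $0
    sub(/^@[a-z]+[ \t]+[^ \t]+[ \t]*/, "", value)   # strip directive and name
    # @define always assigns; @default only fills an unset name.
    if ($1 == "@define" || !(name in symtab))
        symtab[name] = value
}
END {
    print "TASK=" symtab["TASK"]
    print "MYNAME=" symtab["MYNAME"]
}')
printf "%s\n" "$out"
```

TASK keeps the value given by the earlier @define, while MYNAME falls back to its @default.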
I've found m1 to be a handy Troff preprocessor. Many of my text files (including this one) start with m1 definitions like:
@define ArrayFig @StructureSec@.2
@define HashTabFig @StructureSec@.3
@define TreeFig @StructureSec@.4
@define ProblemSize 100
Even a simple form of arithmetic would be useful in numeric sequences of definitions. The longer m1 variables get around Troff's dreadful two-character limit on string names; these variables are also available to Troff preprocessors like Pic and Eqn. Various forms of the @define, @if, and @include facilities are present in some of the Troff-family languages (Pic and Troff) but not others (Tbl); m1 provides a consistent mechanism.
I include figures in documents with lines like this:
@define FIGNUM @FIGMFMOVIE@
@define FIGTITLE The Multiple Fragment heuristic.
@FIGSTART@
<PS> <@THISDIR@/mfmovie.pic</PS>
@FIGEND@
The two @defines are a hack to supply the two parameters of number and title to the figure. The figure might be set off by horizontal lines or enclosed in a box, the number and title might be printed at the top or the bottom, and the figures might be graphs, pictures, or animations of algorithms. All figures, though, are presented in the consistent format defined by FIGSTART and FIGEND.
I have also used m1 as a preprocessor for Awk programs. The @include statement allows one to build simple libraries of Awk functions (though some, but not all, Awk implementations provide this facility by allowing multiple program files). File inclusion was used in an earlier version of this paper to include individual functions in the text and then wrap them all together into the complete m1 program. The conditional statements allow one to customize a program with macros rather than run-time if statements, which can reduce both run time and compile time.
The most interesting application for which I've used this macro language is unfortunately too complicated to describe in detail. The job for which I wrote the original version of m1 was to control a set of experiments. The experiments were described in a language with a lexical structure that forced me to make substitutions inside text strings; that was the original reason that substitutions are bracketed by at-signs. The experiments are currently controlled by text files that contain descriptions in the experiment language, data extraction programs written in Awk, and graphical displays of data written in Grap; all the programs are tailored by m1 commands.
Most experiments are driven by short files that set a few key parameters and then @include a large file with many @defaults. Separate files describe the fields of shared databases:
@define N ($1)
@define NODES ($2)
@define CPU ($3)
...
These files are @included in both the experiment files and in Troff files that display data from the databases. I had tried to conduct a similar set of experiments before I built m1, and got mired in muck. The few hours I spent building the tool were paid back handsomely in the first days I used it.
M1 uses as fast substitution function. The idea is to process the string from left to right, searching for the first substitution to be made. We then make the substitution, and rescan the string starting at the fresh text. We implement this idea by keeping two strings: the text processed so far is in L (for Left), and unprocessed text is in R (for Right). Here is the pseudocode for dosubs:
L = Empty
R = Input String
while R contains an "@" sign do
    let R = A @ B; set L = L A and R = B
    if R contains no "@" then
        L = L "@"
        break
    let R = A @ B; set M = A and R = B
    if M is in SymTab then
        R = SymTab[M] R
    else
        L = L "@" M
        R = "@" R
return L R
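To watch the loop work on a single line, here is a minimal sketch that wraps the same substitution loop in a shell function and runs it once. The function name dosubs_demo and the two symbol-table entries (NAME and GREETING) are invented for illustration; they are not part of m1.

```shell
# Hypothetical demo wrapper: run the substitution loop on one string.
dosubs_demo() {   # usage: dosubs_demo 'text with @NAME@ references'
  awk -v input="$1" '
  function dosubs(s,    l, r, i, m) {
    if (index(s, "@") == 0)
      return s
    l = ""                        # text already processed
    r = s                         # text still to be scanned
    while ((i = index(r, "@")) != 0) {
      l = l substr(r, 1, i-1)     # copy everything before the opening @
      r = substr(r, i+1)
      i = index(r, "@")
      if (i == 0) {               # no closing @: keep the lone @ and stop
        l = l "@"
        break
      }
      m = substr(r, 1, i-1)       # candidate macro name
      r = substr(r, i+1)
      if (m in symtab) {
        r = symtab[m] r           # rescan starting at the fresh text
      } else {
        l = l "@" m               # not a macro: pass it through
        r = "@" r                 # the closing @ may open another name
      }
    }
    return l r
  }
  BEGIN {
    symtab["NAME"] = "world"              # made-up definitions
    symtab["GREETING"] = "hello @NAME@"
    print dosubs(input)
  }'
}

dosubs_demo '@GREETING@, and @UNDEF@ stays'
```

The call prints "hello world, and @UNDEF@ stays". GREETING expands to text that itself contains @NAME@, which is rescanned and expanded in turn, while the undefined @UNDEF@ is copied through untouched.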
There are many ways in which the m1 program could be extended. Here are some of the biggest temptations to "creeping creaturism":
The following code is short (around 100 lines), which is significantly shorter than other macro processors; see, for instance, Chapter 8 of Kernighan and Plauger [1981]. The program uses several techniques that can be applied in many Awk programs.
function error(s) {
print "m1 error: " s | "cat 1>&2"; exit 1
}
function dofile(fname, savefile, savebuffer, newstring) {
if (fname in activefiles)
error("recursively reading file: " fname)
activefiles[fname] = 1
savefile = file; file = fname
savebuffer = buffer; buffer = ""
while (readline() != EOF) {
if (index($0, "@") == 0) {
print $0
} else if (/^@define[ \t]/) {
dodef()
} else if (/^@default[ \t]/) {
if (!($2 in symtab))
dodef()
} else if (/^@include[ \t]/) {
if (NF != 2) error("bad include line")
dofile(dosubs($2))
} else if (/^@if[ \t]/) {
if (NF != 2) error("bad if line")
if (!($2 in symtab) || symtab[$2] == 0)
gobble()
} else if (/^@unless[ \t]/) {
if (NF != 2) error("bad unless line")
if (($2 in symtab) && symtab[$2] != 0)
gobble()
} else if (/^@fi([ \t]?|$)/) { # Could do error checking here
} else if (/^@stderr[ \t]?/) {
print substr($0, 9) | "cat 1>&2"
} else if (/^@(comment|@)[ \t]?/) {
} else if (/^@ignore[ \t]/) { # Dump input until $2
delim = $2
l = length(delim)
while (readline() != EOF)
if (substr($0, 1, l) == delim)
break
} else {
newstring = dosubs($0)
if ($0 == newstring || index(newstring, "@") == 0)
print newstring
else
buffer = newstring "\n" buffer
}
}
close(fname)
delete activefiles[fname]
file = savefile
buffer = savebuffer
}
Put the next input line into $0, taking it from the global string "buffer" when that is non-empty and from the current file otherwise. Return "EOF" or "" (null string).
function readline( i, status) {
status = ""
if (buffer != "") {
i = index(buffer, "\n")
$0 = substr(buffer, 1, i-1)
buffer = substr(buffer, i+1)
} else {
# Hume: special case for non v10: if (file == "/dev/stdin")
if ((getline <file) <= 0)
status = EOF
}
# Hack: allow @Mname at start of line w/o closing @
if ($0 ~ /^@[A-Z][a-zA-Z0-9]*[ \t]*$/)
sub(/[ \t]*$/, "@")
return status
}
function gobble( ifdepth) {
ifdepth = 1
while (readline() != EOF) {
if (/^@(if|unless)[ \t]/)
ifdepth++
if (/^@fi[ \t]?/ && --ifdepth <= 0)
break
}
}
function dosubs(s, l, r, i, m) {
if (index(s, "@") == 0)
return s
l = "" # Left of current pos; ready for output
r = s # Right of current; unexamined at this time
while ((i = index(r, "@")) != 0) {
l = l substr(r, 1, i-1)
r = substr(r, i+1) # Currently scanning @
i = index(r, "@")
if (i == 0) {
l = l "@"
break
}
m = substr(r, 1, i-1)
r = substr(r, i+1)
if (m in symtab) {
r = symtab[m] r
} else {
l = l "@" m
r = "@" r
}
}
return l r
}
function dodef(name, str, x) {
name = $2
sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]*/, "") # OLD BUG: last * was +
str = $0
while (str ~ /\\$/) {
if (readline() == EOF)
error("EOF inside definition")
# OLD BUG: sub(/\\$/, "\n" $0, str)
x = $0
sub(/^[ \t]+/, "", x)
str = substr(str, 1, length(str)-1) "\n" x
}
symtab[name] = str
}
BEGIN {
EOF = "EOF"
if (ARGC == 1)
dofile("/dev/stdin")
else if (ARGC >= 2) {
for (i = 1; i < ARGC; i++)
dofile(ARGV[i])
} else
error("usage: m1 [fname...]")
}
M1 is three steps lower than m4. You'll probably miss something you have learned to expect.
M1 was documented in the 1997 sedawk book by Dale Dougherty & Arnold Robbins (ISBN 1-56592-225-5) but may have been written earlier.
This page was adapted from 131.191.66.141:8181/UNIX_BS/sedawk/examples/ch13/m1.pdf (download from LAWKER).
Jon L. Bentley.
Download from LAWKER.
m5 [ -Dname ] [ -Dname=def ] [-c] [ -dp char ]
[ -o file ] [-sp char ] [ file ... ]
[g|n]awk -f m5.awk X [ -Dname ] [ -Dname=def ] [-c] [ -dp char ]
[ -o file ] [ -sp char ] [ file ... ]
M5 is a Bourne shell script for invoking m5.awk, which actually performs the macro processing. m5, unlike many macro processors, does not directly interpret its input. Instead it uses a two-pass approach in which the first pass translates the input to an awk program, and the second pass executes the awk program to produce the final output. Details of usage are provided below.
This two-pass system means that macros can contain awk commands, to be executed on the second pass. This greatly extends the expressiveness of the m5 macro system.
As noted in the synopsis above, its invocation may require specification of awk, gawk, or nawk, depending on the version of awk available on your system. This choice is further complicated on some systems, e.g. Sun, which have both awk (original awk) and nawk (new awk). Other systems appear to have new awk, but have named it just awk. New awk should be used, regardless of what it has been named. The macro processor translator will not work using original awk because the translator, for example, uses the built-in function match().
The following options are supported:
The program that performs the first pass noted above is called the m5 translator and is named m5.awk. The input to the translator may be either standard input or one or more files listed on the command line. An input line with the directive prefix character (# by default) in column 1 is treated as a directive statement in the MP directive language (awk). All other input lines are processed as text lines. Simple macros are created using awk assignment statements and their values referenced using the substitution prefix character ($ by default). The backslash (\) is the escape character; its presence forces the next character to literally appear in the output. This is most useful when forcing the appearance of the directive prefix character, the substitution prefix character, and the escape character itself.
All input lines are scanned for macro references that are indicated by the substitution prefix character. Assuming the default value of that character, macro references may be of the form $var, $(var), $(expr), $[str], $var[expr], or $func(args). These are replaced by an awk variable, awk variable, awk expression, awk array reference to the special array M[], regular awk array reference, or awk function call, respectively. These are, in effect, macros. The MP translator checks for proper nesting of parentheses and double quotes when translating $(expr) and $func(args) macros, and checks for proper nesting of square brackets and double quotes when translating $[expr] and $var[expr] macros. The substitution prefix character indicates a macro reference unless it is (i) escaped (e.g., \$abc), (ii) followed by a character other than A-Z, a-z, (, or [ (e.g., $@), or (iii) inside a macro reference (e.g., $($abc); probably an error).
An understanding of the implementation of macro substitution will help in its proper usage. When a text line is encountered, it is scanned for macros, embedded in an awk print statement, and copied to the output program. For example, the input line
The quick $fox jumped over the lazy $dog.
is transformed into
print "The quick " fox " jumped over the lazy " dog "."
Obviously the use of this transformation technique relies completely on the presence of the awk concatenation operator (one or more blanks).
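The transformation of a text line can be sketched in a few lines of awk. This is only an illustration, not the m5 translator itself: the function name translate is made up, only the plain $var form is recognized, and no attempt is made to escape double quotes or backslashes in the text or to handle directive lines.

```shell
# Hypothetical sketch: turn each text line into an awk print statement.
translate() {
  awk '
  {
    out = "print \""              # open the first string literal
    line = $0
    # Each $name becomes a bare variable between two string literals.
    while (match(line, /\$[A-Za-z_][A-Za-z0-9_]*/)) {
      out = out substr(line, 1, RSTART - 1) "\" " \
                substr(line, RSTART + 1, RLENGTH - 1) " \""
      line = substr(line, RSTART + RLENGTH)
    }
    print out line "\""           # close the last string literal
  }'
}

printf '%s\n' 'The quick $fox jumped over the lazy $dog.' | translate
```

Run on the example line, the sketch emits the same print statement shown above, with fox and dog left as awk variables to be given values when the generated program runs.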
As already noted, a macro reference inside another macro reference will not result in substitution and will probably cause an awk execution-time error. Furthermore, a substitution prefix character in the substituted string is also generally not significant because the substitution prefix character is detected at translation time, and macro values are assigned at execution time. However, macro references of the form $[expr] provide a simple nested referencing capability. For example, if $[abc] is in a text line, or in a directive line and not on the left hand side of an assignment statement, it is replaced by eval(M["abc"]). When the output program is executed, the m5 runtime routine eval() substitutes the value of M["abc"], examining it for further macro references of the form $[str] (where "str" denotes an arbitrary string). If one is found, substitution and scanning proceed recursively. Function type macro references may result in references to other macros, thus providing an additional form of nested referencing.
Except for the include directive, when a directive line is detected, the directive prefix is removed, the line is scanned for macros, and then the line is copied to the output program (as distinct from the final output). Any valid awk construct, including the function statement, is allowed in a directive line. Further information on writing awk programs may be found in Aho, Kernighan, and Weinberger, Dougherty and Robbins, and Robbins.
A single non-awk directive has been provided: the include directive. Assuming that # is the directive prefix, #include(filename) directs the MP translator to immediately read from the indicated file, processing lines from it in the normal manner. This processing mode makes the include directive the only type of directive to take effect at translation time. Nested includes are allowed. Include directives must appear on a line by themselves. More elaborate types of file processing may be directly programmed using appropriate awk statements in the input file.
The MP translator builds the resulting awk program in one of two ways, depending on the form of the first input line. If that line begins with "function", it is assumed that the user is providing one or more functions, including the function "main" required by m5. If the first line does not begin with "function", then the entire input file is translated into awk statements that are placed inside "main". If some input lines are inside functions, and others are not, awk will detect this and complain. The MP by design has little awareness of the syntax of directive lines (awk statements), and as a consequence syntax errors in directive lines are not detected until the output program is executed.
Finally, unless the -c (compile only) option is specified on the command line, the output program is executed to produce the final output (directed by default to standard output). The version of awk specified in ARGV[0] (a built-in awk variable containing the command name) is used to execute the program. If ARGV[0] is null, awk is used.
Understanding this example requires recognition that macro substitution is a two-step process: (i) the input text is translated into an output awk program, and (ii) the awk program is executed to produce the final output with the macro substitutions actually accomplished. The examples below illustrate this process. # and $ are assumed to be the directive and substitution prefix characters. This example was successfully executed using awk on a Cray C90 running UNICOS 10.0.0.3, gawk on a Gateway E-3200 running SuSE Linux Version 6.0, and nawk on a Sun Ultra 2 Model 2200 running Solaris 2.5.1.
#function main() {
Example 1: Simple Substitution
------------------------------
# br = "brown"
The quick $br fox.
Example 2: Substitution inside a String
---------------------------------------
# r = "row"
The quick b$(r)n fox.
Example 3: Expression Substitution
----------------------------------
# a = 4
# b = 3
The quick $(2*a + b) foxes.
Example 4: Macros References inside a Macro
-------------------------------------------
# $[fox] = "\$[q] \$[b] \$[f]"
# $[q] = "quick"
# $[b] = "brown"
# $[f] = "fox"
The $[fox].
Example 5: Array Reference Substitution
---------------------------------------
# x[7] = "brown"
# b = 3
The quick $x[2*b+1] fox.
Example 6: Function Reference Substitution
------------------------------------------
The quick $color(1,2) fox.
Example 7: Substitution of Special Characters
---------------------------------------------
\# The \$ quick \\ brown $# fox. $$
#}
#include(testincl.m5)
#function color(i,j) {
The lazy dog.
# if (i == j)
# return "blue"
# else
# return "brown"
#}
function main() {
print
print " Example 1: Simple Substitution"
print " ------------------------------"
br = "brown"
print " The quick " br " fox."
print
print " Example 2: Substitution inside a String"
print " ---------------------------------------"
r = "row"
print " The quick b" r "n fox."
print
print " Example 3: Expression Substitution"
print " ----------------------------------"
a = 4
b = 3
print " The quick " 2*a + b " foxes."
print
print " Example 4: Macros References inside a Macro"
print " -------------------------------------------"
M["fox"] = "$[q] $[b] $[f]"
M["q"] = "quick"
M["b"] = "brown"
M["f"] = "fox"
print " The " eval(M["fox"]) "."
print
print " Example 5: Array Reference Substitution"
print " ---------------------------------------"
x[7] = "brown"
b = 3
print " The quick " x[2*b+1] " fox."
print
print " Example 6: Function Reference Substitution"
print " ------------------------------------------"
print " The quick " color(1,2) " fox."
print
print " Example 7: Substitution of Special Characters"
print " ---------------------------------------------"
print "\# The \$ quick \\ brown $# fox. $$"
}
function color(i,j) {
print " The lazy dog."
if (i == j)
return "blue"
else
return "brown"
}
function eval(inp ,isplb,irb,out,name) {
splb = SP "["
out = ""
while( isplb = index(inp, splb) ) {
irb = index(inp, "]")
if ( irb == 0 ) {
out = out substr(inp,1,isplb+1)
inp = substr( inp, isplb+2 )
} else {
name = substr( inp, isplb+2, irb-isplb-2 )
sub( /^ +/, "", name )
sub( / +$/, "", name )
out = out substr(inp,1,isplb-1) eval(M[name])
inp = substr( inp, irb+1 )
}
}
out = out inp
return out
}
BEGIN {
SP = "$"
main()
exit
}
Example 1: Simple Substitution
------------------------------
The quick brown fox.
Example 2: Substitution inside a String
---------------------------------------
The quick brown fox.
Example 3: Expression Substitution
----------------------------------
The quick 11 foxes.
Example 4: Macros References inside a Macro
-------------------------------------------
The quick brown fox.
Example 5: Array Reference Substitution
---------------------------------------
The quick brown fox.
Example 6: Function Reference Substitution
------------------------------------------
The lazy dog.
The quick brown fox.
Example 7: Substitution of Special Characters
---------------------------------------------
# The $ quick \ brown $# fox. $$
William A. Ward, Jr., School of Computer and Information Sciences, University of South Alabama, Mobile, Alabama, July 23, 1999.
awkwords --title "Title" file > file.html
awkwords file > file.html
This code requires gawk and bash. To download:
wget http://lawker.googlecode.com/svn/fridge/lib/bash/awkwords
chmod +x awkwords
To test the code, apply it to itself:
AwkWords is a simple-to-use markup language for writing documentation for programs whose comment lines start with "#" and whose comments contain HTML code.
For example, awk.info?tools/awkwords shows the html generated from this bash script.
When used with the --title option, a stand-alone web page is generated (to control the style of that page, see the css function, discussed below). When used without --title, it generates HTML suitable for inclusion in other pages.
Also, AwkWords finds all the <h2>, <h3>, <h4>, <h5>, <h6>, <h7>, <h8>, <h9> headings and copies them to a table of contents at the front of the file. Note that AwkWords assumes that the file contains only one <h1> heading; this is printed before the table of contents.
AwkWords adds some short cuts for HTML markup, as well as including nested contents (see below: "including nested content"). This is useful for including, say, program output along with the actual program.
AwkWords is divided into three functions: unhtml fixes the printing of pre-formatted blocks, toc adds the table of contents, and includes handles the details of the extra mark-up.
unhtml() { cat $1| gawk '
BEGIN {IGNORECASE=1}
/^<PRE>/ {In=1; print; next}
/^<\/PRE>/ {In=0; print; next}
In {gsub("<","\\&lt;",$0); print; next }
{print $0 }'
}
toc() { cat $1 | gawk '
BEGIN { IGNORECASE = 1 }
/^<[h]1>/ { Header=$0; next}
/^[<]h[23456789]>/ {
T++ ;
Toc[T] = gensub(/(.*)<h(.*)>[ \t]*(.*)[ \t]*<\/h(.*)>(.*)/,
"<""h\\2><""font color=black>\\&bull;</font></a> <""a href=#" T ">\\3</a></h\\4>",
"g",$0)
Pre="<a name="T"></a>" }
{ Line[++N] = Pre $0; Pre="" }
END { print Header;
print "<" "h2>Contents</h2>"
print "<" "div id=\"htmltoc\">"
for(I=1;I<=T;I++) print Toc[I]
print "<" "/div><!--- htmltoc --->"
print "<" "div id=\"htmlbody\">"
for(I=1;I<=N;I++) print Line[I]
print "</" "div><!--- htmlbody --->"
}'
}
The xpand function controls recursive inclusion of content. Note that
includes() { cat $1 | gawk '
function xpand(pre, tmp) {
if ($1 ~ "^#.IN") xpands($2,pre)
else if ($1 ~ "^#.BODY" ) xpandsBody($2,pre)
else if ($1 ~ "^#.LISTING") {
print "<" "pre>"
xpands($2,1) # <===== note the recursive call with "1"
print "<" "/pre>" }
else if ($1 ~ "^#.CODE") {
print "<" "p>" $2 "\n<" "pre>"
xpands($2,1) # <===== note the recursive call with "1"
print "<" "/pre>" }
else if ($1 ~ "^#.URL") {
tmp = $2; $1=$2="";
print "<" "a href=\""tmp"\">" trim($0) "</a>"
}
else if ($1 ~ "^#.TO") {
tmp = $2; $1=$2="";
print "<" "a href=\"mailto:"tmp"\">" trim($0) "</a>"
}
else
xpand1(pre)
}
The xpand1 function controls the printing of a single line. If we are formatting verbatim text, we must remove the start-of-html character "<". Otherwise, we expand any html shortcuts.
function xpand1(pre) {
if (pre)
gsub("<","\\&lt;",$0) # <=== remove start-of-html-character
else {
$0= xpandHtml($0) # <=== expand html short cuts
sub(/^#/,"",$0) }
print $0
}
The function xpandHtml controls the html short cuts
function xpandHtml( str,tag) {
if ($0 ~ /^#\.H1/) {
$1=""
return "<" "h""1><join>" $0 "</join></" "h1>" }
if (sub(/^#\./,"",$1)) {
tag=$1; $1=""
return "<" tag ">" (($0 ~ /^[ \t]*$/) ? "" : $0"</"tag">")
}
return $0
}
The rest of the code is just some book-keeping and managing the recursive addition of content.
function xpands(f,pre) {
if (newFile(f)) {
while((getline <f) > 0) xpand(pre)
close(f) }
}
function xpandsBody(f,pre, using) {
if (newFile(f)) {
while((getline <f) >0) {
if ( !using && ($0 ~ /^[\t ]*$/) ) using = 1
if ( using ) xpand(pre)}
close(f) }
}
function newFile(f) { return ++Seen[f]==1 }
function trim (s) { sub(/^[ \t]*/,"",s); sub(/[ \t]*$/,"",s); return s }
BEGIN { IGNORECASE=1 }
{ xpand() }'
}
If used to generate a full web page, then the following styles are added. Note that the htmltoc class controls the appearance of the table of contents.
css() {
echo "<""STYLE type=\"text/css\">"
cat<<-'EOF'
div.htmltoc h2 { font-size: medium; font-weight: normal;
margin: 0 0 0 0; margin-left: 30px;}
div.htmltoc h3 { font-size: medium; font-weight: normal;
margin: 0 0 0 0; margin-left: 60px;}
div.htmltoc h4 { font-size: medium; font-weight: normal;
margin: 0 0 0 0; margin-left: 90px;}
div.htmltoc h5 { font-size: medium; font-weight: normal;
margin: 0 0 0 0; margin-left: 120px;}
div.htmltoc h6 { font-size: medium; font-weight: normal;
margin: 0 0 0 0; margin-left: 150px;}
div.htmltoc h7 { font-size: medium; font-weight: normal;
margin: 0 0 0 0; margin-left: 180px; }
</STYLE>
EOF
}
main() { cat $1 | includes | unhtml | toc; }
if [ "$1" = "--title" ]
then
echo "<""html><""head><""title>$2</title>`css`</head><""body>";
shift 2
main $1
echo "<""/body><""/html>"
else
main $1
fi
There's no checking for valid input (e.g. pre-formatting tags that never close).
If the input file contains no html mark up, the results are pretty messy.
Recursive includes fail silently if the referenced file does not exist.
I don't like the way I need a separate pass to do "unhtml". I tried making it work within the code but it got messy.
awk [-v Dictionaries="sysdict1 sysdict2 ..."] -f spell.awk -- \
[=suffixfile1 =suffixfile2 ...] [+dict1 +dict2 ...] \
[-strip] [-verbose] [file(s)]
Download from LAWKER.
This program is an example par excellence of the power of awk. Yes, if written in "C", it would run faster. But goodness me, it would be much longer to code. These few lines implement a powerful spell checker, with user-specifiable exception lists. The built-in dictionary is constructed from a list of standard Unix spelling dictionaries, overridable on the command line.
It also offers some tips on how to structure larger-than-ten-line awk programs. In the code below, note the:
(And to write even larger programs, divided into many files, see runawk.)
Dictionaries are simple text files, with one word per line. Unlike those for Unix spell(1), the dictionaries need not be sorted, and there is no dependence on the locale in this program that can affect which exceptions are reported, although the locale can affect their reported order in the exception list. A default list of dictionaries can be supplied via the environment variable DICTIONARIES, but that can be overridden on the command line.
For the purposes of this program, words are located by replacing ASCII control characters, digits, and punctuation (except apostrophe) with ASCII space (32). What remains are the words to be matched against the dictionary lists. Thus, files in ASCII and ISO-8859-n encodings are supported, as well as Unicode files in UTF-8 encoding.
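The word-location rule can be tried on its own with a short sketch. It is a simplification made for illustration: the function name words is invented, and only ASCII letters and the apostrophe are kept, whereas the full program below also keeps the bytes in the range 128 to 255.

```shell
# Hypothetical sketch: list the words the rule above finds on each line.
words() {
  awk '
  BEGIN { nonword = "[^A-Za-z\047]" }   # \047 is the apostrophe
  {
    gsub(nonword, " ")                  # blank out everything else
    for (k = 1; k <= NF; k++) {
      w = $k
      sub("^\047+", "", w)              # strip leading apostrophes
      sub("\047+$", "", w)              # strip trailing apostrophes
      if (w != "")
        print w
    }
  }'
}

printf '%s\n' "Don't stop, 42 'quoted' words." | words
```

This prints Don't, stop, quoted and words on separate lines: the digits and punctuation disappear, the apostrophe inside a word survives, and the quoting apostrophes around a word are stripped.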
All word matching is case insensitive (subject to the workings of tolower()).
In this simple version, which is intended to support multiple languages, no attempt is made to strip word suffixes, unless the -strip option is supplied.
Suffixes are defined as regular expressions, and may be supplied from suffix files (one per name) named on the command line, or from an internal default set of English suffixes. Comments in the suffix file run from sharp (#) to end of line. Each suffix regular expression should end with $, to anchor the expression to the end of the word. Each suffix expression may be followed by a list of one or more strings that can replace it, with the special convention that "" represents an empty string. For example:
ies$ ie ies y # flies -> fly, series -> series, ties -> tie
ily$ y ily # happily -> happy, wily -> wily
nnily$ n # funnily -> fun
Although it is permissible to include the suffix in the replacement list, it is not necessary to do so, since words are looked up before suffix stripping.
Suffixes are tested in order of decreasing length, so that the longest matches are tried first.
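As a worked example of a single suffix rule, the sketch below applies one suffix expression and its replacement list to one word and prints the candidate spellings that would then be looked up in the dictionary. The function name candidates and its three arguments are assumptions made for this illustration; the real program reads its rules from suffix files and tries them longest first.

```shell
# Hypothetical helper: candidates WORD SUFFIX_REGEX "REPLACEMENT LIST"
candidates() {
  awk -v word="$1" -v suffix="$2" -v repl="$3" '
  BEGIN {
    if (!match(word, suffix)) {         # suffix does not apply
      print word
      exit
    }
    stem = substr(word, 1, RSTART - 1)  # word with the suffix removed
    n = split(repl, ending, " ")
    if (n == 0) {                       # no replacements: bare stem only
      print stem
      exit
    }
    for (k = 1; k <= n; k++) {
      if (ending[k] == "\"\"")          # "" stands for the empty string
        ending[k] = ""
      out = out (k > 1 ? " " : "") stem ending[k]
    }
    print out
  }'
}

candidates flies 'ies$' 'ie ies y'
```

For flies this prints "flie flies fly"; of those candidates, fly is the one a dictionary is likely to contain, so the word would be accepted.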
The default output is just a sorted list of unique spelling exceptions, one per line. With the -verbose option, output lines instead take the form
filename:linenumber:exception
Some Unix text editors recognize such lines, and can use them to move quickly to the indicated location.
BEGIN { initialize() }
{ spell_check_line() }
END { report_exceptions() }
function get_dictionaries( files, key)
{
if ((Dictionaries == "") && ("DICTIONARIES" in ENVIRON))
Dictionaries = ENVIRON["DICTIONARIES"]
if (Dictionaries == "") # Use default dictionary list
{
DictionaryFiles["/usr/dict/words"]++
DictionaryFiles["/usr/local/share/dict/words.knuth"]++
}
else # Use system dictionaries from command line
{
split(Dictionaries, files)
for (key in files)
DictionaryFiles[files[key]]++
}
}
function initialize()
{
NonWordChars = "[^" \
"'" \
"ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
"abcdefghijklmnopqrstuvwxyz" \
"\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217" \
"\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237" \
"\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\257" \
"\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277" \
"\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317" \
"\320\321\322\323\324\325\326\327\330\331\332\333\334\335\336\337" \
"\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357" \
"\360\361\362\363\364\365\366\367\370\371\372\373\374\375\376\377" \
"]"
get_dictionaries()
scan_options()
load_dictionaries()
load_suffixes()
order_suffixes()
}
function load_dictionaries( file, word)
{
for (file in DictionaryFiles)
{
## print "DEBUG: Loading dictionary " file > "/dev/stderr"
while ((getline word < file) > 0)
Dictionary[tolower(word)]++
close(file)
}
}
function load_suffixes( file, k, line, n, parts)
{
if (NSuffixFiles > 0) # load suffix regexps from files
{
for (file in SuffixFiles)
{
## print "DEBUG: Loading suffix file " file > "/dev/stderr"
while ((getline line < file) > 0)
{
sub(" *#.*$", "", line) # strip comments
sub("^[ \t]+", "", line) # strip leading whitespace
sub("[ \t]+$", "", line) # strip trailing whitespace
if (line == "")
continue
n = split(line, parts)
Suffixes[parts[1]]++
Replacement[parts[1]] = parts[2]
for (k = 3; k <= n; k++)
Replacement[parts[1]]= Replacement[parts[1]] " " parts[k]
}
close(file)
}
}
else # load default table of English suffix regexps
{
split("'$ 's$ ed$ edly$ es$ ing$ ingly$ ly$ s$", parts)
for (k in parts)
{
Suffixes[parts[k]] = 1
Replacement[parts[k]] = ""
}
}
}
function order_suffixes( i, j, key)
{
# Order suffixes by decreasing length
NOrderedSuffix = 0
for (key in Suffixes)
OrderedSuffix[++NOrderedSuffix] = key
for (i = 1; i < NOrderedSuffix; i++)
for (j = i + 1; j <= NOrderedSuffix; j++)
if (length(OrderedSuffix[i]) < length(OrderedSuffix[j]))
swap(OrderedSuffix, i, j)
}
function report_exceptions( key, sortpipe)
{
sortpipe= Verbose ? "sort -f -t: -u -k1,1 -k2n,2 -k3" : "sort -f -u -k1"
for (key in Exception)
print Exception[key] | sortpipe
close(sortpipe)
}
function scan_options( k)
{
for (k = 1; k < ARGC; k++)
{
if (ARGV[k] == "-strip")
{
ARGV[k] = ""
Strip = 1
}
else if (ARGV[k] == "-verbose")
{
ARGV[k] = ""
Verbose = 1
}
else if (ARGV[k] ~ /^=/) # suffix file
{
NSuffixFiles++
SuffixFiles[substr(ARGV[k], 2)]++
ARGV[k] = ""
}
else if (ARGV[k] ~ /^[+]/) # private dictionary
{
DictionaryFiles[substr(ARGV[k], 2)]++
ARGV[k] = ""
}
}
# Remove trailing empty arguments (for nawk)
while ((ARGC > 0) && (ARGV[ARGC-1] == ""))
ARGC--
}
function spell_check_line( k, word)
{
## for (k = 1; k <= NF; k++) print "DEBUG: word[" k "] = \"" $k "\""
gsub(NonWordChars, " ") # eliminate nonword chars
for (k = 1; k <= NF; k++)
{
word = $k
sub("^'+", "", word) # strip leading apostrophes
sub("'+$", "", word) # strip trailing apostrophes
if (word != "")
spell_check_word(word)
}
}
function spell_check_word(word, key, lc_word, location, w, wordlist)
{
lc_word = tolower(word)
## print "DEBUG: spell_check_word(" word ") -> tolower -> " lc_word
if (lc_word in Dictionary) # acceptable spelling
return
else # possible exception
{
if (Strip)
{
strip_suffixes(lc_word, wordlist)
## for (w in wordlist) print "DEBUG: wordlist[" w "]"
for (w in wordlist)
if (w in Dictionary)
break
if (w in Dictionary)
return
}
## print "DEBUG: spell_check():", word
location = Verbose ? (FILENAME ":" FNR ":") : ""
if (lc_word in Exception)
Exception[lc_word] = Exception[lc_word] "\n" location word
else
Exception[lc_word] = location word
}
}
function strip_suffixes(word, wordlist, ending, k, n, regexp)
{
## print "DEBUG: strip_suffixes(" word ")"
split("", wordlist)
for (k = 1; k <= NOrderedSuffix; k++)
{
regexp = OrderedSuffix[k]
## print "DEBUG: strip_suffixes(): Checking \"" regexp "\""
if (match(word, regexp))
{
word = substr(word, 1, RSTART - 1)
if (Replacement[regexp] == "")
wordlist[word] = 1
else
{
split(Replacement[regexp], ending)
for (n in ending)
{
if (ending[n] == "\"\"")
ending[n] = ""
wordlist[word ending[n]] = 1
}
}
break
}
}
## for (n in wordlist) print "DEBUG: strip_suffixes() -> \"" n "\""
}
function swap(a, i, j, temp)
{
temp = a[i]
a[i] = a[j]
a[j] = temp
}
Arnold Robbins and Nelson H.F. Beebe in "Classic Shell Scripting", O'Reilly Books
Some of the code at awk.info is somewhat historical in nature. For example, Scott Pakin's gender predictor was written in 1991. Given that, it might be mistakenly concluded that Awk is somehow old-fashioned and not suitable for modern tasks.
Text mining, on the other hand, could be the killer app for Awk in the 21st century. The language excels at creating one-off reports that handle the quirks of a particular file format.
There is a growing interest in using Awk for this kind of work. All the examples presented below come from work conducted in 2007 and 2008:
If we could properly understand unstructured text, this would be a result of tremendous practical importance. A recent study concluded that:
That is, if we can tame the text mining problem, it would be possible to reason and learn from a much wider range of business data than ever before.
Note that, in the Menzies/Marcus and Schmitt/Christianson tool kits, Awk by itself was not enough. Both of the toolkits mentioned above are intricate combinations of Awk, sed, bash, and other tools. Within that combination, Awk was very useful for handling the specifics not managed by the other tools.
Lothar M. Schmitt and Kiel T. Christianson:
Their notes include a short introduction to programming the Bourne-shell and rather short, but complete descriptions of sed and awk customized in regard to language analysis.
Tim Menzies and Andrian Marcus:
Severis is a set of Awk, bash, sed, etc scripts for finding predictors of high severity issues in text reports. Test engineers write such issue reports whenever they encounter anomalies in the code they are inspecting.
Severis was designed to be an audit tool for test engineers, a second "look over the shoulder" to alert a senior engineer if a junior test engineer was doing something strange.
At least for the text issue reports studied by Severis, very simple tools were enough to determine the terms that predict different issue severities.
Donald 'Paddy' McCarthy reports an interesting comparison of Awk vs Perl vs Python for doing some text pre-processing.
The example shows off Awk's ability to quickly prototype a one-off specialized report for a particular data format.
It also offers some comment on the language wars between Awk and <insert your favorite scripting language here>: there is no evidence in the following code that dear old-fashioned Awk is more complex or arcane or slower than more recent, supposedly better, languages.
<string:date> [ <float:data-n> <int:flag-n> ]*24
e.g.
1991-03-31 10.000 1 10.000 1 ... 20.000 1 35.000 1
The awk example:
# Author Donald 'Paddy' McCarthy Jan 01 2007
BEGIN{
nodata = 0; # Current run of consecutive flags < 0 in lines of file
nodata_max=-1; # Max consecutive flags < 0 in lines of file
nodata_maxline="!"; # ... and line number(s) where it occurs
}
FNR==1 {
# Accumulate input file names
if(infiles){
infiles = infiles "," FILENAME
} else {
infiles = FILENAME
}
}
{
tot_line=0; # sum of line data
num_line=0; # number of line data items with flag>0
# extract field info, skipping initial date field
for(field=2; field <= NF; field+=2){
datum=$field;
flag=$(field+1);
if(flag < 1){
nodata++
}else{
# check run of data-absent fields
if(nodata_max==nodata && (nodata>0)){
nodata_maxline=nodata_maxline ", " $1
}
if(nodata_max < nodata && (nodata>0)){
nodata_max=nodata
nodata_maxline=$1
}
# re-initialise run of nodata counter
nodata=0;
# gather values for averaging
tot_line+=datum
num_line++;
}
}
# totals for the file so far
tot_file += tot_line
num_file += num_line
printf "Line: %11s Reject: %2i Accept: %2i Line_tot: %10.3f Line_avg: %10.3f\n", \
$1, ((NF -1)/2) -num_line, num_line, tot_line, (num_line>0)? tot_line/num_line: 0
# debug prints of original data plus some of the computed values
#printf "%s %15.3g %4i\n", $0, tot_line, num_line
#printf "%s\n %15.3f %4i %4i %4i %s\n", $0, tot_line, num_line, nodata, nodata_max, nodata_maxline
}
END{
printf "\n"
printf "File(s) = %s\n", infiles
printf "Total = %10.3f\n", tot_file
printf "Readings = %6i\n", num_file
printf "Average = %10.3f\n", tot_file / num_file
printf "\nMaximum run(s) of %i consecutive false readings ends at line starting with date(s): %s\n", nodata_max, nodata_maxline
}
The same functionality in perl is very similar to the awk program:
# Author Donald 'Paddy' McCarthy Jan 01 2007
BEGIN {
$nodata = 0; # Current run of consecutive flags < 1 in lines of file
$nodata_max=-1; # Max consecutive flags < 1 in lines of file
$nodata_maxline="!"; # ... and line number(s) where it occurs
}
foreach (@ARGV) {
# Accumulate input file names
if($infiles ne ""){
$infiles = "$infiles, $_";
} else {
$infiles = $_;
}
}
while (<>){
$tot_line=0; # sum of line data
$num_line=0; # number of line data items with flag>0
# extract field info, skipping initial date field
chomp;
@fields = split(/\s+/);
$nf = @fields;
$date = $fields[0];
for($field=1; $field < $nf; $field+=2){
$datum = $fields[$field] +0.0;
$flag = $fields[$field+1] +0;
if(($flag+1 < 2)){
$nodata++;
}else{
# check run of data-absent fields
if($nodata_max==$nodata and ($nodata>0)){
$nodata_maxline = "$nodata_maxline, $fields[0]";
}
if($nodata_max < $nodata and ($nodata>0)){
$nodata_max = $nodata;
$nodata_maxline=$fields[0];
}
# re-initialise run of nodata counter
$nodata = 0;
# gather values for averaging
$tot_line += $datum;
$num_line++;
}
}
# totals for the file so far
$tot_file += $tot_line;
$num_file += $num_line;
printf "Line: %11s Reject: %2i Accept: %2i Line_tot: %10.3f Line_avg: %10.3f\n",
$date, (($nf -1)/2) -$num_line, $num_line, $tot_line, ($num_line>0)? $tot_line/$num_line: 0;
}
printf "\n";
printf "File(s) = %s\n", $infiles;
printf "Total = %10.3f\n", $tot_file;
printf "Readings = %6i\n", $num_file;
printf "Average = %10.3f\n", $tot_file / $num_file;
printf "\nMaximum run(s) of %i consecutive false readings ends at line starting with date(s): %s\n",
$nodata_max, $nodata_maxline;
The Python program, however, splits the fields in each line slightly differently (although it could also use the method used in the Perl and Awk programs):
# Author Donald 'Paddy' McCarthy Jan 01 2007
import fileinput
import sys
nodata = 0; # Current run of consecutive flags < 1 in lines of file
nodata_max=-1; # Max consecutive flags < 1 in lines of file
nodata_maxline=[]; # ... and line number(s) where it occurs
tot_file = 0 # Sum of file data
num_file = 0 # Number of file data items with flag>0
infiles = sys.argv[1:]
for line in fileinput.input():
    tot_line=0; # sum of line data
    num_line=0; # number of line data items with flag>0
    # extract field info
    field = line.split()
    date = field[0]
    data = [float(f) for f in field[1::2]]
    flags = [int(f) for f in field[2::2]]
    for datum, flag in zip(data, flags):
        if flag < 1:
            nodata += 1
        else:
            # check run of data-absent fields
            if nodata_max==nodata and nodata>0:
                nodata_maxline.append(date)
            if nodata_max < nodata and nodata>0:
                nodata_max=nodata
                nodata_maxline=[date]
            # re-initialise run of nodata counter
            nodata=0;
            # gather values for averaging
            tot_line += datum
            num_line += 1
    # totals for the file so far
    tot_file += tot_line
    num_file += num_line
    print "Line: %11s Reject: %2i Accept: %2i Line_tot: %10.3f Line_avg: %10.3f" % (
        date,
        len(data) -num_line,
        num_line, tot_line,
        tot_line/num_line if (num_line>0) else 0)
print ""
print "File(s) = %s" % (", ".join(infiles),)
print "Total = %10.3f" % (tot_file,)
print "Readings = %6i" % (num_file,)
print "Average = %10.3f" % (tot_file / num_file,)
print "\nMaximum run(s) of %i consecutive false readings ends at line starting with date(s): %s" % (
nodata_max, ", ".join(nodata_maxline))
xmonthly is a hybrid shell script (part bash/gawk/xtoolkit intrinsics) that displays an overview of reminders based on the current month.
It is a simple example of how to use X-window tools with Gawk.
xmonthly employs gawk's implementation of the strftime() function to determine the current month. Non-English users must use the first three letters of each month in the xmonthly database, as determined by the end user's locale.
Download: 16K
Author: Michael S. Sanders