Awk.Info

"Cause a little auk awk
goes a long way."

About awk.info
 »  table of contents
 »  featured topics
 »  page tags


About Awk
 »  advocacy
 »  learning
 »  history
 »  Wikipedia entry
 »  mascot
Implementations
 »  Awk (rarely used)
 »  Nawk (the-one-true, old)
 »  Gawk (widely used)
 »  Mawk
 »  Xgawk (gawk + xml + ...)
 »  Spawk (SQL + awk)
 »  Jawk (Awk in Java JVM)
 »  QTawk (extensions to gawk)
 »  Runawk (a runtime tool)
 »  platform support
Coding
 »  one-liners
 »  ten-liners
 »  tips
 »  the Awk 100
Community
 »  read our blog
 »  read/write the awk wiki
 »  discussion news group

Libraries
 »  Gawk
 »  Xgawk
 »  the Lawker library
Online doc
 »  reference card
 »  cheat sheet
 »  manual pages
 »  FAQ

Reading
 »  articles
 »  books:

WHAT'S NEW?

Mar 01: Michael Sanders demos an X-windows GUI for AWK.

Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK

Feb 28: Tim Menzies asks this community to write an AWK cookbook.

Feb 28: Arnold Robbins announces a new debugger for GAWK.

Feb 28: Awk100#23: Premysl Janouch offers an IRC bot, in AWK

Feb 28: Updated: the AWK FAQ

Feb 28: Tim Menzies offers a tiny content management system, in Awk.

Jan 31: Comment system added to awk.info. For example, see discussion bottom of ?keys2awk

Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).

Jan 31: The AWK FAQ is being updated. For comments/ corrections/ extensions, please mail tim@menzies.us

Jan 31: Martin Cohen finds Awk on the Android platform.

Jan 31: Aleksey Cheusov released a new version of runawk.

Jan 31: Hirofumi Saito contributes a candidate Awk mascot.

Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.

Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.

[More ...]


categories: CMS,Tools,Mar,2010,Timm

TinyTim: a Content Management System

TINY TIM is a tiny web-site manager written in AWK. For a live demo of the site, see http://at.ttoy.net/?tinytim. The site supports runtime content generation; e.g. the quote shown top right of the demo site is auto-generated each time you refresh the page.

The site was written to demonstrate that a little AWK goes a long way. At the time of this writing, the current system is under 100 lines of code (excluding a separate formatter, which is another 170 lines of code). It took longer to write this doco and the various HTML/CSS theme files than the actual code itself (fyi: 6 hours for the themes/doc and 3 hours for the code).

TINY TIM has the following features:

  • Pages can be accessed by their (lowercase) name, or by their (uppercase) tags.
  • Pages can be displayed using a set of customizable themes.
  • Page contents can be written using an HTML shorthand language called MARKUP.
  • Pages can be searched using a Google search box.
  • Source code is auto-displayed using a syntax highlighter.
  • Page content can be auto-created via programmer-modifiable plugins.

Install

In a web accessible directory, type

 svn export http://knit.googlecode.com/svn/branches/0.2/tinytim/ 

In the resulting directory, perform the local juju required to make index.cgi web-runnable (e.g. on my ISP, chmod u+rx index.cgi).

Follow the directions in the next section to customize the site.

Using TINY TIM

index.cgi

TINY TIM is controlled by the following index.cgi file. To select a theme, comment out all but one of the last lines (using the "#" character). For screen-shots of the current themes, see below.

#!/bin/bash
 
[ -n "$1" ] && export QUERY_STRING="$1"
 
tinytim() {
  cat content/* themes/$1/theme.txt |
  gawk -f lib/tinytim.awk |
  sed 's/^<pre>/<script type="syntaxhighlighter" class="brush: cpp"><![CDATA[/' |
  sed 's/^<\/pre>/<\/script>/'
} 
  
 #tinytim auklet
 #tinytim trendygreen
 tinytim wink

Notes:

  • The sed commands: these render normal <pre> using Alex Gorbatchev's excellent syntax highlighter. To change the highlighting rules for a different language, change brush: cpp to one of the supported aliases.
  • The cat command: this assembles the content for the system. Multiple authors can write multiple files in the sub-directory content.

Themes

Themes are defined in the sub-directory themes/themename. Each theme is defined by a theme.txt file that holds:

  • The HTML template for the theme.
  • The in-line style sheet for the theme.
  • The page contents with pre-defined string names marked with ``; e.g. ``title``. To change those strings, see the instructions at the end of this page.
  • If a `` entry contains a semi-colon (e.g. ``quotes;``) then it is a plugin. Plugin content is generated at runtime using a method described at the end of this document.

To write a new theme:

  1. Create a new folder themes/new.
  2. Copy (e.g.) wink/theme.txt to new.
  3. Using the copied theme as a template, start tinkering.

The following themes are defined in the directory themes.

Auklet:

Trendygreen (adapted from GetTemplates):

Wink:

Defining String Values

The first entry in the content defines strings that can slip into the theme templates. For example, the following slots define the title of a site; the name of the formatter script that renders each page; the url of the home directory of the site; a menu to add to the top of each page; a footer to add to the bottom of each page; and a web-accessible directory for storing images.

 ``title``       Just another Tiny Tim demo
 ``formatter``   lib/markup.awk
 ``description`` (simple cms)
 ``home``        http://at.ttoy.net
 ``menu``        <a href="?index">Home</a> | 
                 <a href="?contact">Contact</a>  |
                 <a href="?about">About</a>
 ``footer``      <p>Powered by <a href="?tinytim">TINY TIM</a>. 
                                 © 2010 by Tim Menzies 
 ``images``      http://at.ttoy.net/img

Note the following important convention. TINY TIM auto-generates some of its own strings. The names of these strings start with an uppercase letter. To avoid confusing your strings with those that are auto-generated, it is best to start your strings with a lower-case letter (e.g. like all those in the above example).

Adding a Search Engine

Google offers a nice free site-specific search engine. It takes a few days for the spiders to find the site but after that, it works fine. To set this up, follow the instructions at Google custom search, then

  • Add the appropriate magic strings into the first entry of the content (usually content/0config.txt).
  • Add references to those strings to your template.

For example, look for google-search in the current templates and content/0config.txt.

Writing pages

After the first entry, the rest of the entries in content/* define the pages of a site.

  • Each entry must begin with the magic string #12345
  • The entry consists of paragraphs (separated by blank lines).
  • Paragraph one contains the (short) page name (on line one) followed by the page tags (on line two).
      • Note that the page name must start with a lower case letter.
      • And the tags must start with an upper case letter.
  • Paragraph two contains the heading of the page.
  • The remaining paragraphs are the page contents.

For example, this site contains a missing page report. This page is defined as follows. In the following definition of that page, the name is "404"; the tags are "Admin Feb10" and the title is "Sorry".

 #12345####################################################################################
 
 404
 Admin Feb10
  
 Sorry
  
 I have bad news:
 
 <center>
 [img/404book.jpg]
 </center>

The contents can contain HTML and MARKUP tags.

MARKUP

MARKUP is a shorthand for writing HTML pages based on MARKDOWN:

  • Italics, bold, and typewriter font are marked by matching _, *, and ` characters (respectively).
  • Lists are marked by leading "+" characters.
  • Numbered lists are marked by leading "1." strings.
  • Links are enclosed in [square brackets]. The first word in the bracket is the URL and subsequent words are the text for the URL link.
  • Images are marked up with the same [square brackets], but the first word must end in one of .png, .gif, .jpg. Any subsequent words are passed as tags to the <img> tag.

Also, in MARKUP, major, minor, sub-, and sub-sub- headings are two line paragraphs where the second line contains two or more "=", "-", "+", "_" (respectively). MARKUP collects these headings as a table of contents, which is added to the top of the page.
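As an illustrative sketch (not taken from the distribution; the link and image names are made up), a page paragraph written with these conventions might look like this:

 A Minor Heading
 ---------------

 This line shows _italics_, *bold* and `typewriter` text,
 plus a link to [http://awk.info the Awk community site].

 + first item
 + second item, with an image: [img/logo.png width=100]

The two-line paragraph at the top would be rendered as a minor heading (two or more "-" characters on the second line) and collected into the page's table of contents.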

Note that MARKUP is separate from TINY TIM. To change the formatting of pages, write your own AWK code and change the string ``formatter`` in the first entry of content/0config.txt.

Plugins

If a `` entry contains a semi-colon (e.g. ``quotes;``) then it is a plugin. Plugin content is generated at runtime. To write a plugin, modify the file lib/plugins.awk. Currently, that file looks like this:

 function slotsPlugIns(str,slots,   tmp) {
    split(str,tmp,";")
    if (tmp[1]=="quotes")
        return quotes(str,slots)
    return str
 }
 function quotes(str,slots,    n,tmp) {
    srand(systime() + PROCINFO["pid"])
    n=split(slots["quotes"],tmp,"\n")
    return tmp[int(rand()*n) + 1]
 }

The function slotsPlugIns is a "traffic-cop" who decides what plugin to call (in the above, there is only one current plugin: quotes).

Each plugin function (e.g. quotes) is passed the string from the template (see str) and an array of key/value pairs holding all the defined string values (see slots). These functions must return a string to be inserted into the rendered HTML.

In the example above, quotes just returns a random quote. It assumes that the predefined strings include a set of quotes, one per line:

 ``quotes`` Small  things with great love. <br>-- Mother Teresa
     It's hard work to make it look effortless.<br>-- Katarina Witt
    "God bless us every one!".<br>-- Tiny Tim

The quote generated by this plugin can be viewed at the top right of this page.
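As a sketch of how a second plugin could be added alongside quotes (the ``date;`` string and the today() function are illustrative assumptions, not part of the distribution; strftime() is a gawk extension and this assumes the site is rendered with gawk, as in index.cgi above):

 function slotsPlugIns(str,slots,   tmp) {
    split(str,tmp,";")
    if (tmp[1]=="quotes")
        return quotes(str,slots)
    if (tmp[1]=="date")              # hypothetical second plugin
        return today(str,slots)
    return str
 }
 function today(str,slots) {
    # return the current date; strftime() is a gawk extension
    return strftime("%Y-%m-%d")
 }

A theme could then include ``date;`` wherever the current date should appear.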


categories: Runawk,Project,Tools,Jan,2010,AlexC

Runawk 0.19 Released

Download

http://sourceforge.net/projects/runawk

About

runawk is a small wrapper for the AWK interpreter that helps one write standalone AWK scripts. Its main feature is to provide a module/library system for AWK which is somewhat similar to Perl's "use" command. It also allows you to select a preferred AWK interpreter and to set up the environment for your scripts. It also provides other helpful features; for example, it includes numerous useful modules.

Major Changes IN RUNAWK-0.19.0

  • fix in runawk.c: \n was missing in the "running '%s' failed: %s" error message. The problem was seen on an ancient (12-year-old) HP-UX
  • fix in tests/test.mk: "diff -u" is not portable (SunOS, HP-UX); a DIFF_PROG variable is introduced to fix the problem
  • fix in modules/power_getopt.awk: after printing the help message we should exit immediately without running the END section (s/exit/exitnow/)
  • new function heapsort_values in heapsort.awk module
  • new function quicksort_values in quicksort.awk module
  • new function sort_values in sort.awk module

Author

Aleksey Cheusov


categories: Runawk,Project,Tools,Nov,2009,AlexC

Runawk 0.18 Released

Download

http://sourceforge.net/projects/runawk

About

runawk is a small wrapper for the AWK interpreter that helps one write standalone AWK scripts. Its main feature is to provide a module/library system for AWK which is somewhat similar to Perl's "use" command. It also allows you to select a preferred AWK interpreter and to set up the environment for your scripts. It also provides other helpful features; for example, it includes numerous useful modules.

Major Changes IN RUNAWK-0.18.0

Makefile:

  • "install-dirs" target has been renamed to "installdirs"
  • At compile time MODULESDIR can contain a *list* of colon-separated directories, e.g. /usr/local/share/runawk:/usr/local/share/awk
  • Support for options applied multiple times, e.g. -vvv for increasing the verbosity level. If an option without arguments is applied multiple times, the getarg() function returns the number of times it was applied, not just 0 or 1.

New modules:

  • init_getopt.awk, using alt_getopt.awk and used by power_getopt.awk. Its goal is to initialize the `long_opts' and `short_opts' variables but not run the `getopt' function.
  • heapsort.awk : heapsort :-)
  • quicksort.awk : quicksort :-)
  • sort.awk : either heapsort or quicksort, the default is heapsort. Unfortunately GAWK's asort() and asorti() functions do *not* satisfy my needs. Another (and more important) reason is portability.

Improvements, clean-ups and fixes in regression tests.

Also, runawk-0-18-0 was successfully tested on the following platforms: NetBSD-5.0/x86, NetBSD-2.0/alpha, OpenBSD-4.5/x86, FreeBSD-7.1/x86, FreeBSD-7.1/sparc, Linux/x86 and Darwin/ppc.

Author

Aleksey Cheusov


categories: Runawk,Project,Tools,Sept,2009,AlexC

New release: RUNAWK 0.17

What is RUNAWK?

RUNAWK is a small wrapper for the AWK interpreter that helps one write standalone AWK scripts. Its main feature is to provide a module/library system for AWK which is somewhat similar to Perl's "use" command. It also allows you to select a preferred AWK interpreter and to setup the environment for your scripts. RUNAWK makes programming AWK easy and efficient. RUNAWK also provides many useful AWK modules.

Sources

Major Changes

Version 0.17.0, by Aleksey Cheusov, Sat, 12 Sep 2009

runawk:

  • ADDED: new option for runawk for #use'ing modules: -f. runawk can also be used for oneliners! ;-)
          runawk -f abs.awk -e 'BEGIN {print abs(-123); exit}'
    
  • In multiline code passed to runawk using option -e, spaces are allowed before #directives.
  • After inventing the alt_getopt.awk module there is no reason for the heuristic that detects whether to add `-' to AWK arguments or not, so I've removed it. Use the alt_getopt.awk module or another "smart" module for handling options correctly!

alt_getopt.awk and power_getopt.awk:

  • FIX: for "abc:" short options specifier BSD and GNU getopt(3) accept "-acb" and understand it as "-a -cb", they also accept "-ac b" and also translate it to "-a -cb". Now alt_getopt.awk and power_getopt.awk work the same way.

power_getopt.awk:

  • -h option doesn't print usage information, --help (and its short synonym) does.

New modules:

  • shquote.awk, implementing shquote() function.
    shquote(str):
      `shquote' transforms the string `str' by adding shell escape and quoting characters, so that it can be passed as an argument to the system() and popen() functions and will have the correct value after being evaluated by the shell.
    Inspired by NetBSD's shquote(3) from libc.
  • runcmd.awk, implementing functions runcmd1() and xruncmd1()
    runcmd1(CMD, OPTS, FILE):
      wrapper for the function system() that runs a command CMD with options OPTS and one filename FILE. Unlike system(CMD " " OPTS " " FILE), the function runcmd1() correctly handles FILE and CMD containing spaces, single quotes, double quotes, tildes etc.
  • xruncmd1(FILE):
      safe wrapper for 'runcmd(1)'. awk exits with error if running command failed.
  • isnum.awk, implementing trivial isnum() function, see the source code.
  • alt_join.awk, implementing the following functions:
    join_keys(HASH, SEP):
      returns string consisting of all keys from HASH separated by SEP.
    join_values(HASH, SEP):
      returns string consisting of all values from HASH separated by SEP.
    join_by_numkeys (ARRAY, SEP [, START [, END]]):
      returns string consisting of all values from ARRAY separated by SEP. Indices from START (default: 1) to END (default: +inf) are analysed. Collecting values is stopped on index absent in ARRAY.

categories: Runawk,Project,Tools,Apr,2009,AlexC

New release: Runawk 0.16

In comp.lang.awk, Aleksey Cheusov writes:

I've made runawk-0.16.0 release. This release has lots of important improvements and additions. Sources are available from

What is runawk?

RUNAWK is a small wrapper for the AWK interpreter that helps one write standalone programs in AWK. It provides MODULES for AWK, similar to Perl's "use" command, and other powerful features. Dozens of ready-to-use modules are also provided.

(For more information, see details from the last release.)

Major changes in this release

Lots of demo programs for most runawk modules were created and they are in examples/ subdirectory now.

New MEGA module ;-) power_getopt.awk. See the documentation and the demo program examples/demo_power_getopt. It makes option handling REALLY easy (see below).

New modules:

  • embed_str.awk
  • has_suffix.awk
  • has_prefix.awk
  • readfile.awk
  • modinfo.awk

Minor fixes and improvements in dirname.awk and basename.awk. Now they are fully compatible with dirname(1) and basename(1)

RUNAWK sets the following environment variables for the child awk subprocess:

  • RUNAWK_MODC - The number of modules (-f filename) passed to AWK
  • RUNAWK_MODV_<n> - Full path to the module #n, where n is in [0..RUNAWK_MODC) range.

RUNAWK sets the RUNAWK_ART_STDIN environment variable for the child awk subprocess to 1 if an additional/artificial `-' was added to the list of awk's arguments.
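A module (or the main program) can read these variables from the standard ENVIRON array; a minimal sketch (not part of the release notes):

  BEGIN {
      n = ENVIRON["RUNAWK_MODC"] + 0
      for (i = 0; i < n; i++)
          print "module " i ": " ENVIRON["RUNAWK_MODV_" i]
  }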

Makefile:

  • bmake-isms were removed. Now the Makefile is fully compatible with FreeBSD make.
  • CLEANFILES target is used instead of hand-made rules
  • Minor fix in 'test_all' target

Power_GetOpt.awk

The most powerful feature of this release is the power_getopt.awk module. It provides a very powerful and very easy way to handle options. Everything is driven by the usage message; you barely need to do anything at all. I think the example below makes this clear.

Example Code

% cat 1.awk
#!/usr/bin/env runawk

#use "power_getopt.awk"

#.begin-str help
# power_getopt - program demonstrating a power of power_getopt.awk module
# usage: power_getopt [OPTIONS]
# OPTIONS:
#    -h|--help                  display this screen
#    -f|--flag                  flag
#       --long-flag             long flag only
#    -s                         short flag only
#    =F|--FLAG           flag with value
#.end-str

BEGIN {
        print "f         --- " getarg("f")
        print "flag      --- " getarg("flag")
        print "long-flag --- " getarg("long-flag")
        print "s         --- " getarg("s")
        print "F         --- " getarg("F", "default1")
        print "FLAG      --- " getarg("FLAG", "default2")

        exit 0
}

./1.awk

% ./1.awk
f         --- 0
flag      --- 0
long-flag --- 0
s         --- 0
F         --- default1
FLAG      --- default2

./1.awk -h

% ./1.awk -h
power_getopt - program demonstrating a power of power_getopt.awk module
usage: power_getopt [OPTIONS]
OPTIONS:
   -h|--help                  display this screen
   -f|--flag                  flag
      --long-flag             long flag only
   -s                         short flag only
   -F|--FLAG           flag with value

./1.awk -f

% ./1.awk -f
f         --- 1
flag      --- 1
long-flag --- 0
s         --- 0
F         --- default1
FLAG      --- default2

./1.awk -F value

% ./1.awk -F value
f         --- 0
flag      --- 0
long-flag --- 0
s         --- 0
F         --- value
FLAG      --- value

./1.awk --FLAG=value

% ./1.awk --FLAG=value
f         --- 0
flag      --- 0
long-flag --- 0
s         --- 0
F         --- value
FLAG      --- value

categories: Sorting,Tools,Nov,2009,EdM

Sorting in Awk

Contents

Download

About

Code

selSort

keySort

genSort

Main Loop

Author

Download

Download from LAWKER.

About

Below is a script I wrote to demonstrate how to use arrays, functions, numerical vs string comparison, etc.

It also provides a framework for people to implement sorting algorithms for comparison. I've implemented a couple and I'm hoping others will contribute more in the same style.

I deliberately put in very few comments because I think the only parts that are hard to understand, given a small amount of reading of the awk manuals, are the actual sorting algorithms. Those should already be well documented elsewhere given a reference, except for my made-up "Key Sort", but I think that one is very easy to understand.

Code

selSort

Selection Sort, O(n^2): http://en.wikipedia.org/wiki/Selection_sort

function selSort(keyArr,outArr,   swap,thisIdx,minIdx,cmpIdx,numElts) {
  for (thisIdx in keyArr) {
      outArr[++numElts] = thisIdx
  }
  for (thisIdx=1; thisIdx<=numElts; thisIdx++) {
      minIdx = thisIdx
      for (cmpIdx=thisIdx + 1; cmpIdx <= numElts; cmpIdx++) {
          if (keyArr[outArr[minIdx]] > keyArr[outArr[cmpIdx]]) {
              minIdx = cmpIdx
          }
      }
      if (thisIdx != minIdx) {
          swap = outArr[thisIdx]
          outArr[thisIdx] = outArr[minIdx]
          outArr[minIdx] = swap
      }
  }
  return numElts+0
}

keySort

Key Sort O(n^2): made up by Ed Morton for simplicity.

function keySort(keyArr,outArr,   \
                occArr,thisIdx,thisKey,cmpIdx,outIdx,numElts) {
  for (thisIdx in keyArr) {
      thisKey = keyArr[thisIdx]
      outIdx=++occArr[thisKey]  # start at 1 plus num occurrences
      for (cmpIdx in keyArr) {
          if (thisKey > keyArr[cmpIdx]) {
              outIdx++
          }
      }
      outArr[outIdx] = thisIdx
      numElts++
  }
  return numElts+0
}

genSort

This code demonstrates the use of arrays, functions, and string vs numeric comparisons in awk. It also provides a framework for people to implement various sorting algorithms in awk such as those listed at http://en.wikipedia.org/wiki/Sorting_algorithm

Traverses the input array, storing its indices in the output array in sorted order of the input array elements. e.g.

 in:  inArr["foo"]="b"; inArr["bar"]="a"; inArr["xyz"]="b"
      outArr[] is empty

 out: inArr["foo"]="b"; inArr["bar"]="a"; inArr["xyz"]="b"
      outArr[1]="bar"; outArr[2]="foo"; outArr[3]="xyz"

Can sort on specific fields given a field number and field separator.

A sortType of "n" means sort by numerical comparison; otherwise, sort by string comparison.

function genSort(sortAlg,sortType,inArr,outArr,fldNum,fldSep,           \
              keyArr,thisIdx,thisArr) {
  if (fldNum) {
      if (sortType == "n") {
          for (thisIdx in inArr) {
              split(inArr[thisIdx],thisArr,fldSep)
              keyArr[thisIdx] = thisArr[fldNum]+0
          }
      } else {
          for (thisIdx in inArr) {
              split(inArr[thisIdx],thisArr,fldSep)
              keyArr[thisIdx] = thisArr[fldNum]""
          }
      }
  } else {
      if (sortType == "n") {
          for (thisIdx in inArr) {
              keyArr[thisIdx] = inArr[thisIdx]+0
          }
      } else {
          for (thisIdx in inArr) {
              keyArr[thisIdx] = inArr[thisIdx]""
          }
      }
  }
  if (sortAlg ~ /^sel/) {
      numElts = selSort(keyArr,outArr)
  } else {
      numElts = keySort(keyArr,outArr)
  }
  return numElts
}

Main Loop

 { inArr[NR]=$0 }
END {
  numElts = genSort(sortAlg,sortType,inArr,outArr,fldNum,FS)
  for (outIdx=1;outIdx<=numElts;outIdx++) {
      print inArr[outArr[outIdx]]
  }
}
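To run the above, save the functions and main loop together in (say) sort.awk and invoke it along these lines (the file name and values are illustrative; sortAlg, sortType and fldNum are the variables read by the END block):

 gawk -v sortAlg=sel -v sortType=n -v fldNum=3 -F: -f sort.awk /etc/passwd

This would print the lines of /etc/passwd ordered numerically by their third colon-separated field (the numeric user id), using the selection sort above.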

Author

Ed Morton


categories: Tools,Nov,2009,PierreG

levenshtein.awk

Contents

Synopsis

Download

Notes

Code

levdist

Demo code

Unit tests

Author

Synopsis

gawk -f levenshtein.awk --source 'BEGIN {
        print levdist("kitten", "sitting")}' 

(The above code should print "3").

Download

Download from LAWKER.

Notes

The Levenshtein edit distance calculation is useful for comparing text strings for similarity, such as would be done with a spell checker.

Hi_saito (from awk.freeshell.org) has written what looks like a straightforward implementation of the reference algorithm described in the above-linked Wikipedia article. hi_saito's code is linked to rather than included outright because no licensing terms appear on the page.

Gnomon (from awk.freeshell.org) is planning to write a more compact (and hopefully speedier) implementation that will appear here soon. The plan is to compute and retain only those values that are necessary to calculate the edit distance, rather than calculating the entire NxM? matrix. The lazy-evaluation method, which can post substantial speed improvements, probably requires more effort and code complexity than the performance gains would be worth; still, for short strings, the lazy code could perhaps be modeled via recursion by executing from the end of the string rather than the beginning. If experiments are run, the results will also appear here.

Here is the abovementioned streamlined implementation. There were eleven previous versions, all of which were benchmarked across gawk, mawk and busybox awk. The approaches started with a naive implementation and explored table-based, recursive (with no, single and shared memoization) and lazy models. As expected, the lazy version was incredibly fiddly and not pleasant to read or pursue. Findings will appear here later, but for now, here's the code.

Code

levdist

function levdist(str1, str2,    l1, l2, tog, arr, i, j, a, b, c) {
        if (str1 == str2) {
                return 0
        } else if (str1 == "" || str2 == "") {
                return length(str1 str2)
        } else if (substr(str1, 1, 1) == substr(str2, 1, 1)) {
                # strip any common prefix and recurse
                a = 2
                while (substr(str1, a, 1) == substr(str2, a, 1)) a++
                return levdist(substr(str1, a), substr(str2, a))
        } else if (substr(str1, l1=length(str1), 1) == substr(str2, l2=length(str2), 1)) {
                # strip any common suffix and recurse
                b = 1
                while (substr(str1, l1-b, 1) == substr(str2, l2-b, 1)) b++
                return levdist(substr(str1, 1, l1-b), substr(str2, 1, l2-b))
        }
        # otherwise fill in the usual dynamic-programming matrix,
        # keeping only two rows that are toggled via "tog"
        for (i = 0; i <= l2; i++) arr[0, i] = i
        for (i = 1; i <= l1; i++) {
                arr[tog = ! tog, 0] = i
                for (j = 1; j <= l2; j++) {
                        # costs of deletion, insertion, and substitution (or match)
                        a = arr[! tog, j  ] + 1
                        b = arr[  tog, j-1] + 1
                        c = arr[! tog, j-1] + (substr(str1, i, 1) != substr(str2, j, 1))
                        arr[tog, j] = (((a<=b)&&(a<=c)) ? a : ((b<=a)&&(b<=c)) ? b : c)
                }
        }
        return arr[tog, j-1]
}

Demo code

Run demo.awk using gawk -f levenshtein.awk -f demo.awk.

#demo.awk
BEGIN {OFS = "\t"}
{words[NR] = $0}
END {
   max = 0
   for (i = 2; i in words; i++) {
      for (j = i + 1; j in words; j++) {
         new = levdist(words[i], words[j])
         print words[i], words[j], new
         if (new > max) {
            max = new
            bestpair = (words[i] " - " words[j] ": " new)
         }
      }
   }
   print bestpair
}

Unit tests

Run utests.awk using gawk -f levenshtein.awk -f utests.awk.

#utests.awk
function testlevdist(str1, str2, correctval,    testval) {
    testval = levdist(str1, str2)
    if (testval == correctval) {
        printf "%s:\tCorrect distance between '%s' and '%s'\n", testval, str1, str2
        return 1
    } else {
        print "MISMATCH on words '%s' and '%s' (wanted %s, got %s)\n", str1, str2, correctval, testval
        return 0
    }
}
BEGIN {
    testlevdist("kitten",    "sitting",   3)
    testlevdist("Saturday",  "Sunday",    3)
    testlevdist("acc",       "ac",    1)
    testlevdist("foo",       "four",      2)
    testlevdist("foo",       "foo",       0)
    testlevdist("cow",       "cat",       2)
    testlevdist("cat",       "moocow",    5)
    testlevdist("cat",       "cowmoo",    5)
    testlevdist("sebastian", "sebastien", 1)
    testlevdist("more",      "cowbell",   5)
    testlevdist("freshpack", "freshpak",  1)
    testlevdist("freshpak",  "freshpack", 1)
}

Author

pierre.gaston <a.t> gmail.com


categories: Tools,Nov,2009,Admin

Columnate

Contents

Synopsis

Download

About

Code

Author

Synopsis

#e.g.
gawk -F: -f columnate.awk /etc/passwd

Download

Download from LAWKER.

About

This script columnates the input file, so that columns line up like in the GNU column(1) command. Its output is like that of column -t. First, awk reads the whole file, keeps track of the maximum width of each field, and saves all the lines/records. At the END, the lines are printed in columnated format. If your terminal is not too narrow, you'll get a handsome display of the file.

Code

{   line[NR] = $0    # saves the line
    for (f=1; f<=NF; f++) {
        len = length($f)
        if (len>max[f])
            max[f] = len }  # an array of maximum field widths
}
END {
    for(nr=1; nr<=NR; nr++) {
        nf = split(line[nr], fields)
        for (f=1; f<nf; f++)
            printf "%-*s", max[f]+2, fields[f]
        print fields[f] }     # the last field need not be padded
}

Author

h-67-101-152-180.nycmny83.dynamic.covad.net


categories: Macros,Tools,Mar,2009,Timm

Macros

These pages focus on macro pre-processors (a natural application for Awk).


categories: Tools,Jul,2009,WmM

Finite State Machine Generator

Contents

Download

Download from LAWKER

Usage

In general, specify the state machine in FILE.fsm and define the action functions in FILE_actions.c. Then run fsm.awk, and compile and link fsm.c, fsm_FILE.c and any driver file. That's it.

Multiple fsms may be built and run in the same application using the function fsm_allocFsm(). Moreover, calls to fsm() may be nested using the same state machine as long as a different context is used. fsm_allocFsm() returns a context number that must be stored and passed to fsm() on each invocation. In the provided sample, the context is stored in myContext in test_driver.c.

Fsm() may be called either by polling for events or from inside an interrupt service routine. If fsm() is called from an interrupt service routine, it must be protected from nested calls using the same context. Interrupting calls using other contexts is permitted.

Note that the function fsminit() is called only once and should not be called for each fsm. If there are special requirements for a given fsm, an appropriate init function should be provided and called for that particular fsm.

Currently, fsm traceEnable is set to true and cannot be disabled (without changing fsm_allocFsm()). An array is maintained within each fsm context wherein each state and event are recorded for each call to fsm().

DESCRIPTION

Fsm.awk is an awk script designed to read a finite state machine (fsm) specification and produce C files which implement that fsm. The file fsm.c, included in the distribution, provides the actual state transition function, and the user provides the state transition "action" functions and any special initialization.

The fsm distribution consists mainly of fsm.awk and fsm.c, although there are a number of header files for declarations - doesn't get much simpler than that.

Typically, the fsm specification is named in the form fsm_name.fsm, but may be named any legal filename. The action functions may be placed in any number of files by any name the user chooses. Each function should return either true or false so that the appropriate next state may be chosen.

The chief benefit of using fsm.awk is easy-to-read, consistent state machine specifications and reuse of existing, tested code. Multiple tables and multiple users are happily accommodated. It's not hi-tech, but it provides an easy avenue to generalization and consistency where fsms are required.

This distribution represents a rewrite of an earlier version written many years ago - rewritten with newer versions of awk and gcc in mind. Consequently, it has not been tested using other compiler suites. There are no known bugs, but, it IS a rewrite.

Although a good candidate for C++, C was used because C++ was not being used in any of the systems currently using fsm-gen. Maybe a C++ version will be in a subsequent release.

Building the Sample FSM

The distribution provides the following files:

COPYING and      FSF licenses
COPYING.LESSER
filelist         the "packing list"
fsm.awk          the code generator
fsm.c            the context and transition code
fsm.h            definitions for the API
makefile         simple makefile for the test driver code
utils.h          error and utility definitions
test.fsm         a sample fsm specification named "test"
test_actions.c   action functions for the sample

To build the sample,

  1. Download the .zip
  2. extract the files from the zip - unzip contents.zip
  3. build the example fsm - ./fsm.awk test.fsm This step will produce fsm_test.c and fsm_test.h.
  4. compile and link the executable (test) using make
  5. run the sample - the executable produced by the makefile is "test". See the section THE EXAMPLE FSM below for information on using the example.

    When fsm.awk is run (via fsm.awk fsmName.fsm), it produces two files, fsm_fsmName.c and fsm_fsmName.h. Fsm_fsmName.c will contain an array of struct fsm_s tagged as fsm_fsmName, e.g.,

    struct fsm_s fsm_fsmName [STATES_COUNT][EVENTS_COUNT].
    

    In the fsm distribution, the files fsm_test.c, fsm_test.h and test_actions.c may be built as an executable sample.

    The file fsm.c should be compiled and linked with the final executable, as it contains the C code necessary to read the generated tables and update context. Building the example should compile error free, with the exception of a warning about using "gets()" in the sample driver. Hey - it's just a driver for a test.

Example FSM Specification File

In its purist form, a fsm specifies state, event, action, new state. For example, a rudimentary ftp server might be specified as follows:

# current     event     action          next 
# state                                  state
# --------------+----------+---------------+------------
IDLE            CONN_REQ   makeConnection  CONNECTED
CONNECTED       GET_REQ    sendBuffer      SENDING
SENDING         FILE_SENT  closeFile       IDLE

It is useful on occasion to make the next state depend on the success or failure of the action function. Here, "ok" and "fail" mean "true" and "false", respectively. For example, as each buffer is sent it would be useful to specify a different state if sendBuffer() returns fail (indicating EOF).

# current     event     action     next         next 
# state                             state        state
#                                    ok           fail
# --------------+----------+----------+---------+-----
CONNECTED       GET_REQ    sendBuffer SENDING   IDLE

State, event, action, and new state may be specified according to the same rules as C variables/functions. In the above table, the words CONNECTED, GET_REQ, SENDING, and IDLE are used to generate #defines, and the action sendBuffer is the name of a user supplied function.

The file test.fsm illustrates several idioms:

  • an event may be a single event or a comma separated list of events that all result in the same action and same next state. For example, the specification
    # current     event     action     next         next 
    # state                             state        state
    #                                    ok           fail
    # --------------+----------+----------+---------+-----
    S1              EVENT_1,EVENT_2    action_1   S2        S3
    

    means: when receiving event EVENT_1 or EVENT_2 in state S1, execute action action_1 and go to state S2 if the return value of action_1() is true; go to state S3 if the return value of action_1() is false.
  • note that all events must be specified for each state. See the example specification file, test.fsm.
  • an action specified as "-" means, "do nothing". fsm.awk will generate a NULL in the state transition tables which will be treated as "do nothing". When so specified, the next state will always be the next-state-ok state.
  • an action specified as fsm_invalid_event will call the function fsm_invalid_event(void) which always returns false. This function may be edited to suit the situation at hand. When fsm_invalid_event is specified, the next state (both) may be left unspecified - fsm.awk will generate next state information as being the current state (ie., no change in the current state).
  • a fail next state specified as "-" means the fail next state is the same as the success next state. That is, in the specification
    # current     event     action     next         next 
    # state                             state        state
    #                                    ok           fail
    # --------------+----------+----------+---------+-----
    S1              EVENT_1    action_1   S2        -
    
    means, when receiving event EVENT_1 in state S1, execute action action_1 and go to state S2 irrespective of the return value of action_1().

The Example FSM

Included in the distribution are test.fsm and test_actions.c which implement a very simple state machine called "test". After the executable "test" is produced (via make), it may be used to show the behavior of the fsm.

The example fsm was built and tested with gcc version 4.0.2 and awk version 3.1.4.

Example Output from the Sample

On running "test", first the line "testing fsm test" is printed, then a line indicating the initial state. It then asks for the next event. All events in the example are the lowercase letters 'a' thru 'd', entered from the keyboard. A special event 'z' will cause the trace to be dumped. Entering 'q' will cause test to exit. Note that to keep the example simple, other than special events 'z' and 'q', there is no checking of input for being outside the known set of events. A sample session might look like this:

$>
$> ./test
testing fsm test

starting in state 1
next event: a
got a (0)  ----> called fsm_s2_ab ----> ,went to state 0
next event: d
got d (3)  ----> invalid eventwent to state 0
next event: b
got b (1)  ----> called fsm_s1_b ----> ,went to state 1
next event: c
got c (2)  ----> went to state 1
next event: z
trace index is 4
event      state
0          0
3          0
1          1
2          1
0          0 <-- next/oldest
0          0
0          0
0          0

next event: q
bye
$>

Copyright

Copyright 2008 Wm Miller

This file is part of fsm-gen, and is distributed under the terms of the GNU Lesser General Public License .

Copies of the GNU General Public License and the GNU Lesser General Public License are included with this distribution in the files COPYING and COPYING.LESSER, respectively.

Fsm-gen is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Fsm-gen is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with fsm-gen. If not, see http://www.gnu.org/licenses.

Author

Wm Miller. The author may be contacted at wmmsf at users.sourceforge.net.

categories: Sigs,Tools,Apr,2009,Anon

Hiding Email Address

Contents

Synopsis

Download

Description

Code

Author

Synopsis

gawk -f cryptosig.awk tim@menzies.us

Download

Download from LAWKER.

Description

Generates a one-line Awk program that can print your email, from a seemingly jumbled string. This program can then become your email sig and only the Awk cognoscenti can generate a reply.

Example

% gawk -f cryptosig.awk tim@menzies.us
BEGIN{a="7059631863556476595569007169";while(a){printf("%c",46+substr(a,1,2));a=substr(a,3)}}

This can be tested as follows:

echo 'BEGIN{a="7059631863556476595569007169";while(a){printf("%c",46+substr(a,1,2));a=substr(a,3)}}' | gawk -f -

or

gawk -f cryptosig.awk tim@menzies.us | gawk -f -

both of which should print "tim@menzies.us".

Code

BEGIN {
  for (i=0; i<=255; i++) {           # build table of char=value pairs
    ord_arr[sprintf("%c",i)] = i     # character = ordinal value
  }
  for (i=1; i<=ARGC-1; i++) {
    str = ""
    for (j=1; j<=length(ARGV[i]); j++) {
      str = sprintf("%s%02d",str,ord_arr[substr(ARGV[i],j,1)]-46)
    }
    printf("BEGIN{a=\"%s\";while(a){printf(\"%%c\",46+substr(a,1,2));a=substr(a,3)}}\n",str)
  }
  exit(0)
}

Author

BEGIN{a="535170696159626207061118755158656500536563";
      while(a){
          printf("%c",46+substr(a,1,2));a=substr(a,3)};
      print("")
}

categories: Sigs,Tools,Apr,2009,Timm

Random Signatures

Contents

Synopsis

chmod +x sigs; ./sigs

Download

Download from LAWKER.

Description

Generates random signatures. The signatures and generation code are included in the same file, so installation is just a matter of calling one file.

Most of the file is a large "here" document. Paragraph 1 of that document is always added to the signature, followed by one of the following paragraphs, selected at random.

To add to the signatures, include them in the here document, with one preceding blank line.

Code

Pick1

pick1() {
    gawk 'BEGIN { srand(); RS=""    }
          NR==1 { print $0 "\n"     }
          NR>1  { Recs[rand()] = $0 }
          END   { for ( R in Recs ) {print Recs[R]; exit}}
        ' $1
}

The Signatures

cat << SoMEI_mpOSSIblE_sYMBOl | pick1
tim.menzies {
  title:   dr (Ph.D.) and associate professor;
  align:   csee, west virginia university;
  cell:   esb 841A; 
  url:   http://menzies.us;
  fyi:   unless marked "URGENT", i usually won't get 2 your email b4 5pm; 
}

Doing a job RIGHT the first time gets the job done. Doing the job WRONG
fourteen times gives you job security.

Rome did not create a great empire by having meetings, they did it by
killing all those who opposed them.

INDECISION is the key to FLEXIBILITY.

"When a subject becomes totally obsolete we make it a required
course."  Peter Drucker

I saw two shooting stars last night but they were only satellites .
Its wrong to wish on space hardware. I wish, I wish, I wish you cared.
-- Billy Bragg

Then, in 1995, came the most amazing event in the
history of programming languages: the introduction
of Java.  -- Programming Languages: Principles and Practice

Suburbia is where the developer bulldozes out the trees, then names
the streets after them. --Bill Vaughan

Instant gratification takes too long.
-- Carrie Fisher

Complexity is easy. Simplicity is hard.
--Unknown

Author

Tim Menzies


categories: Stats,Tools,May,2009,TimS

Correlate.awk

Contents

Synopsis

Notes

Example

Code

Author

Synopsis

cat data | gawk -f correlate.awk 

Notes

This script calculates the correlation between two columns of numbers.

For more Sherwood scripts, see Some useful Awk scripts.
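For reference, the quantities printed by the script are the standard sums of squares and the Pearson correlation coefficient computed from them:

 ssx  = sum(x^2) - (sum(x))^2 / NR
 ssy  = sum(y^2) - (sum(y))^2 / NR
 ssxy = sum(x*y) - sum(x)*sum(y) / NR
 r    = ssxy / sqrt(ssx * ssy)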

Example

cat <<EOF | gawk -f correlate.awk
1	1.417600305
2	2.265271781
3	3.241368347
4	4.367711955
5	5.390612315
6	6.296879718
7	7.43218197
8	8.117831008
9	9.338019481
10	10.01823657
EOF

This outputs

NR=10
ssx=82.5
ssy=79.0584
ssxy=80.6985
r=0.999227

Code

{   xy+=($1*$2); 
	x+=$1; 
	y+=$2; 
	x2+=($1*$1); 
	y2+=($2*$2);
} 
END { 
	print "NR=" NR; 
	ssx=x2-((x*x)/NR); 
	print "ssx=" ssx; 
	ssy=y2-((y*y)/NR); 
	print "ssy=" ssy; 
	ssxy = xy - ((x*y)/NR); 
	print "ssxy=" ssxy; 
	r=ssxy/sqrt(ssx*ssy); 
	print "r=" r; 
}

Author

Tim Sherwood


categories: Music,Tools,June,2009,Admin

Music and Awk

These pages focus on music players and music analysis tools in Awk.


categories: Project,Tools,Mar,2009,Admin

Project Tools

These pages focus on tools for larger Gawk programs; e.g. ways to load multiple files or auto-generate documentation straight from the source code.


categories: Awk100,Music,Tools,June,2009,StephenJ

Plaiter: a music player

Synopsis

plaiter [options] [file, playlist, directory or stream ...]

Download

Download from LAWKER or, for the latest version, from SourceForge

Description

Plaiter (pronounced "player") is a command line front end to command line music players. It uses shell scripting to try to create the command line music player that Plait would have used if it already existed. It complements Plait but is also quite useful on its own, especially if you already use mpg123 or similar programs and find yourself wanting more features.

What does Plaiter do that (say) mpg123 can't already? It queues tracks, first of all. Secondly, it understands commands like play, pause, stop, next and prev. Finally, unlike most of the command line music players out there, Plaiter can handle a play list with more than one type of audio file, selecting the proper helper app to handle each type of file you throw at it.

Plaiter will automatically configure itself to use ogg123, mpg123, and/or mpg321, if they are installed on your system. If you have a helper application that plays other types of audio, Plaiter can be configured to use it as well.

Like many of us, Plaiter is part daemon and part controller. The controller builds a play list from the files you provide on the command line and forwards commands to the daemon. The daemon reads commands and executes them by running helper applications.

Options

--daemon,-d
daemon mode
--queue,-q
add tracks to queue
--enqueue
add tracks to queue
--random
random shuffle
--play
play
--pause
toggle pause mode
--stop,-s
stop
--latch [on|off]
toggle or set stop after current track
--next,-n [n]
skip forward [n tracks]
--prev [n]
skip backward [n tracks]
--search
search in playlist
--rsearch
reverse search in playlist
--reset,-r
play track 1
--loop [on|off]
toggle or set loop mode
--quit
quit daemon
--status
show status
--list,-l
show playlist
--help
show help
--version
show version
-v
be verbose

Copyright

Copyright (C) 2005, 2006 by Stephen Jungels. Released under the GPL.

Author

Written by Stephen Jungels (sjungels@gmail.com)


categories: Music,Tools,June,2009,DavidH

Humdrum

Download

http://www.music-cog.ohio-state.edu/HumdrumDownload/downloading.html.

Description

The Humdrum Toolkit provides a set of free software tools intended to assist in music research. The toolkit is suitable for use in a wide variety of computer-based musical tasks.

The Humdrum web site contains a comprehensive collection of over 200 web pages providing both detailed and summary information concerning all aspects of the Humdrum Toolkit.

About 15% of the code is written in C, another 15% in kornshell, and about 2% using the LEX lexical parser and YACC compiler-compiler. The bulk of the code is written in AWK.

Questions that can be answered in Humdrum are:

  • Determine the rhyme scheme for a vocal text.
  • Identify any French sixth chords.
  • Locate instances of the pitch sequence D-S-C-H in Shostakovich's music.
  • Are German drinking songs more likely to be in triple meter?
  • Determine whether Haydn tends to avoid V-IV progressions.
  • Locate any doubled seventh scale degrees.

(For a longer list of such questions, see the Humdrum sample problems page.)

Author

David Huron

For more information

Go to http://www.music-cog.ohio-state.edu/Humdrum/.


categories: TenLiners,Tools,June,2009,Timm

shuffle.awk

Contents

Synopsis

To rearrange the items in the input list:

 nshuffle(Array)

To rearrange the items in a copy of the input list:

 shuffle(Array,Copy)

The above calls assume that array item zero stores the length of the array. If this is not the case, use:

 shuffles(Array,Copy)

Download

Download from LAWKER.

Description

Suppose we want to shuffle the items of an array into a random order. This shuffle does so in linear time and memory.

The algorithm comes from the dawn of computer time but I first heard of it from Bart Massey (at Portland State). Thank Bart for the clarity of the explanation and blame me for any silliness in the implementation.

The Slow Way

A simple way to shuffle an input array of elements is to:

  • Allocate an output array of the same size.
  • Copy items selected at random from the input to the output array.
  • Compact the input array by sliding the elements after the removed item down to fill the hole it leaves.

This algorithm is clearly correct. However, the algorithm requires time quadratic in the size of the list, and 2x space.
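For concreteness, here is a minimal sketch of that quadratic approach (not part of the download; like the code below, it assumes the array length is stored at item zero and that srand() has already been called):

function slowShuffle(a,out,   i,j,k,n) {
  n = a[0]
  for (i=1; i<=n; i++) {
    j = int(rand()*(n-i+1)) + 1   # pick one of the n-i+1 remaining items
    out[i] = a[j]                 # copy it to the output array
    for (k=j; k<n-i+1; k++)       # close the hole by sliding later items down
      a[k] = a[k+1]
  }
  return n
}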

The Better Way

We can easily reduce the time complexity to O(N). The only thing done with the input array is to select random elements from it; the order of the elements in it is irrelevant. Therefore, instead of closing the hole left by a removed element by shifting elements, we'll close it by moving the first remaining element of the input array to fill the gap.

Note an important invariant of the algorithm:

   the number of elements left in the input array 
 + the number of elements in the output array 
 ------------------------------------------------
 = the number of elements initially passed in.  

This means that once an element is removed from the input array and the hole filled, there is a fresh hole created right at the beginning of the input array. Let us put the newly removed element in that hole. Now we can dispense with the output array altogether, and just return the input array. Now the space complexity is just x+1.

Code

This code assumes that the array "a" stores its size at "a[0]".

function nshuffle(a,  i,j,n,tmp) {
  n=a[0]; # a has items at 1...n
  for(i=1;i<=n;i++) {
    j=i+round(rand()*(n-i));
    tmp=a[j];
    a[j]=a[i];
    a[i]=tmp;
  };
  return n;
}
function round(x) { return int(x + 0.5) }

nshuffle is fast, but rearranges the order of items in the original list. shuffle generates a new copy of the list with the items in a random order.

function shuffle(a,b) {
  for(i in a) b[i]=a[i];
  nshuffle(b);
}

nshuffle also assumes that the list stores its size at position zero. If this is not the case, use shuffles.

function shuffles(a,b,   c,n) {
  for(i in a) {n++; c[i]=a[i]};
  c[0]=n;
  shuffle(c,b);
}

Correctness proof

By number of loop iterations

Base case:
When i = 0 the 0 array elements in a below i form a shuffled list of 0 elements. All remaining elements are candidates for append.
Inductive case:
Assume that i = k and that the sequence of elements in a below k are a random subsequence of the input values of length k. Now every possible remaining candidate is equally likely to occur at position k in this iteration. Thus at the end of the iteration i = k + 1 and the sequence of elements in a below k + 1 are a random subsequence of the input values of length k + 1.

Examples

Random orders

One way to use the above is to run down a list in a random order. For example:

BEGIN {
  if (ShuffleDemo) {
  		if (Seed) { srand(Seed) } else { srand() };
  		s2i(ShuffleDemo,L1," ");
  		shuffles(L1,L2);
  		while(Item =pop(L2)) print Item;
  }
}
function s2i(str,a,sep,   n,i,tmp) {
  n=split(str,tmp,sep);
  for(i=1;i<=n;i++) a[i]=tmp[i];
  return n;
}
function pop(a,   x,i) {
  i=a[0]--;  
  if (!i) {return ""} else {x=a[i]; delete a[i]; return x}
} 

The above can be run using

 gawk -f shuffle.awk  -v ShuffleDemo="aa bb cc dd"

If you run this twice, you'll see two different orderings. Here's one:

 cc
 aa
 dd
 bb

And here's another:

 dd
 bb
 cc
 aa

Fast sampling

If you are generating the above lists very quickly, then be aware that srand() seeds its random number generator using the current time in seconds. So, if you are calling the above command line many times per second, you can get repeated outputs.

The fix is to supply a seed from the Bash $RANDOM variable:

 gawk -f shuffle.awk -v ShuffleDemo="aa bb cc dd" -v Seed=$RANDOM

Even when called much faster than once a second, the above will generate (far) fewer repeats.

Repeats

If you want to repeat some prior run (say, during debugging), set the Seed variable on the command line using (e.g.)

 gawk -f shuffle.awk -v ShuffleDemo="aa bb cc dd" -v Seed=23

This will always print out the same ordering.

Author

Tim Menzies


categories: Runawk,Project,Tools,Mar,2009,AlexC

runawk - wrapper for AWK interpreter

(Note: see recent update.)

Contents

Download from...

Download from LAWKER or a tar file or from SourceForge.

NAME

runawk - wrapper for AWK interpreter

SYNOPSIS

runawk [options] program_file

runawk -e program

DESCRIPTION

After years of using AWK for programming I've found that, despite its simplicity and limitations, AWK is good enough for scripting a wide range of different tasks. AWK is not as powerful as its bigger counterparts like Perl, Ruby, TCL and others, but it has its own advantages like compactness, simplicity and availability on almost all UNIX-like systems. I personally also like its data-driven nature and token orientation, a very useful technique for simple text processing utilities.

But! Unfortunately awk interpreters lack some important features and sometimes do not work as well as they should.

Problems I see (some of them, of course)

  1. AWK lacks support for modules. Even if I create small programs, I often want to use the functions created earlier and already used in other scripts. That is, it would be great to organise functions into so-called libraries (modules).

  2. In order to pass arguments to a #!/usr/bin/awk -f script (not to the awk interpreter), it is necessary to prepend the list of arguments with -- (two minus signs). In my view, this looks bad.

    Example:

    awk_program:

        #!/usr/bin/awk -f
    
        BEGIN {
           for (i=1; i < ARGC; ++i){
              printf "ARGV [%d]=%s\n", i, ARGV [i]
           }
        }

    Shell session:

        % awk_program --opt1 --opt2
        /usr/bin/awk: unknown option --opt1 ignored
        /usr/bin/awk: unknown option --opt2 ignored
    
        % awk_program -- --opt1 --opt2
        ARGV [1]=--opt1
        ARGV [2]=--opt2
        %

    In my opinion awk_program script should work like this

        % awk_program --opt1 --opt2
        ARGV [1]=--opt1
        ARGV [2]=--opt2
        %

    It is possible using runawk.

  3. When a #!/usr/bin/awk -f script handles arguments (options) and wants to read from stdin, it is necessary to add /dev/stdin (or `-') as the last argument explicitly.

    Example:

    awk_program:

        #!/usr/bin/awk -f
    
        BEGIN {
           if (ARGV [1] == "--flag"){
              flag = 1
              ARGV [1] = "" # to not read file named "--flag"
           }
        }
        {
           print "flag=" flag " $0=" $0
        }

    Shell session:

        % echo test | awk_program -- --flag
        % echo test | awk_program -- --flag /dev/stdin
        flag=1 $0=test
        %

    Ideally awk_program should work like this

        % echo test | awk_program --flag
        flag=1 $0=test
        %

runawk was created to solve all these problems

OPTIONS

-h|--help

Display help information.

-V|--version

Display version information.

-d|--debug

Turn on a debugging mode in which runawk prints the argument list with which the real awk interpreter will be run.

-i|--with-stdin

Always add stdin file name to a list of awk arguments

-I|--without-stdin

Do not add stdin file name to a list of awk arguments

-e|--execute program

Specify the program. If -e is not specified, the program is read from program_file.

DETAILS/INTERNALS

Standalone script

Under UNIX-like OS-es you can use runawk by beginning your script with

   #!/usr/local/bin/runawk

line or something like this instead of

   #!/usr/bin/awk -f

or similar.

AWK modules

In order to activate modules you should add them into awk script like this

  #use "module1.awk"
  #use "module2.awk"

that is, the line that specifies the module name is treated as a comment line by a normal AWK interpreter but is processed specially by runawk.

Note that #use should begin with column 0, no spaces are allowed before it and no spaces are allowed between # and use.

Also note that AWK modules can themselves "use" other modules and so forth. All of them are collected in depth-first order and each one is added to the list of awk interpreter arguments, prepended with the -f option. That is, the #use directive is *NOT* similar to #include in the C programming language; runawk's module code is not inserted in place of #use. Runawk's modules are closer to Perl's "use" command. In case some module is mentioned more than once, only one -f will be added for it, i.e. duplications are removed automatically.

The position of a #use directive in a source file does matter, i.e. the earlier a module is mentioned, the earlier its -f will be generated.

Example:

  file prog:
     #!/usr/local/bin/runawk

     #use "A.awk"
     #use "B.awk"
     #use "E.awk"

     PROG code
     ...
  file B.awk:
     #use "A.awk"
     #use "C.awk"
     B code
     ...
  file C.awk:
     #use "A.awk"
     #use "D.awk"

     C code
     ...
A.awk and D.awk don't contain any #use directives.

If you run

  runawk prog file1 file2

or

  /path/to/prog file1 file2

the following command

  awk -f A.awk -f D.awk -f C.awk -f B.awk -f E.awk -f prog -- file1 file2

will actually run.

You can check this by running

  runawk -d prog file1 file2

Module search strategy

Modules are first searched for in the directory where the main program (or the module in which the #use directive appears) is placed. If a module is not found there, then the AWKPATH environment variable is checked. AWKPATH keeps a colon-separated list of search directories. Finally, the module is searched for in the system runawk modules directory, by default PREFIX/share/runawk, but this can be changed at build time.

An absolute path of the module can also be specified.
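
For example (the directory names here are only illustrative), a shell session could extend the search path before running a script:

  # hypothetical module directories
  export AWKPATH="$HOME/awk-modules:/opt/awk/modules"
  runawk prog file1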

AWK interpreter and its arguments

In order to pass arguments to the AWK script correctly, runawk treats arguments beginning with `-' (minus) specially. The following command

  runawk prog2 -x -f=file -o=output file1 file2

or

  /path/to/prog2 -x -f=file -o=output file1 file2

will actually run

  awk -f prog2 -- -x -f=file -o=output file1 file2

therefore the -x, -f and -o options will be passed in awk's ARGV/ARGC variables together with file1 and file2. If all arguments begin with `-' (minus), runawk will add the stdin filename to the end of the argument list (unless the -I option is specified), i.e. running

  runawk prog3 --value=value

or

  /path/to/prog3 --value=value

will actually run the following

  awk -f prog3 -- --value=value /dev/stdin
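
Since such options are passed through to the script untouched, the script itself has to interpret them. Here is a minimal sketch of what prog3 might look like (the option name and the BEGIN-block logic are only illustrative):

  #!/usr/local/bin/runawk

  # hypothetical option handling: runawk passes --value=... through in ARGV
  BEGIN {
     for (i = 1; i < ARGC; i++) {
        if (ARGV[i] ~ /^--value=/) {
           value = substr(ARGV[i], 9)   # text after "--value="
           ARGV[i] = ""                 # stop awk treating it as a file name
        }
     }
  }
  { print value ": " $0 }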

Program as an argument

Like some other interpreters, runawk can take the script from the command line, like this

 /path/to/runawk -e '
 #use "alt_assert.awk"

 {
   assert($1 >= 0 && $1 <= 10, "Bad value: " $1)

   # your code below
   ...
 }'

Selecting a preferred AWK interpreter

You may prefer one AWK interpreter over another; you can select it with the #interp directive, like this

  file prog:
     #!/usr/local/bin/runawk

     #use "A.awk"
     #use "B.awk"

     #interp "/usr/pkg/bin/nbawk"

     # your code here
     ...

The reason may be efficiency for a particular task, useful but non-standard extensions, or anything else.

Note that the #interp directive must also begin at column 0; no spaces are allowed before it or between # and interp.

Setting environment

In some cases you may want to run the AWK interpreter with a specific environment. For example, your script may be intended to process ASCII text only. In this case you can run AWK with LC_CTYPE=C in the environment and use regexp ranges.

runawk provides the #env directive for this. The string inside the double quotes is passed to the putenv(3) libc function.

Example:

  file prog:
     #!/usr/local/bin/runawk

     #env "LC_ALL=C"

     $1 ~ /^[A-Z]+$/ { # A-Z is valid if LC_CTYPE=C
         print $1
     }

EXIT STATUS

If the AWK interpreter exits normally, runawk exits with its exit status. If the AWK interpreter was killed by a signal, runawk exits with exit status 128+signal.
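
A calling shell script can use this to tell the two cases apart; a small sketch (prog and data.txt are placeholder names):

  runawk prog data.txt
  status=$?
  if [ "$status" -gt 128 ]; then
      echo "awk was killed by signal $((status - 128))" >&2
  fi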

ENVIRONMENT

AWKPATH

Colon-separated list of directories in which awk modules are searched for.

RUNAWK_AWKPROG

Sets the path to the AWK interpreter used by default, i.e. this variable overrides the compile-time default. Note that the #interp directive overrides this variable.

AUTHOR/LICENSE

Copyright (c) 2007-2008 Aleksey Cheusov <vle@gmx.net>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

BUGS/FEEDBACK

Please send any comments, questions, bug reports etc. to me by e-mail or (even better) register them at the sourceforge project home. Feature requests are also welcome.


categories: Awk100,Macros,Tools,Mar,2009,JonB

m1 : A Micro Macro Processor

Contents

Synopsis

awk -f m1.awk [file...]

Download

Download from LAWKER.

Description

M1 is a simple macro language that supports the essential operations of defining strings and replacing strings in text by their definitions. It also provides facilities for file inclusion and for conditional expansion of text. It is not designed for any particular application, so it is mildly useful across several applications, including document preparation and programming. This paper describes the evolution of the program; the final version is implemented in about 110 lines of Awk.

M1 copies its input file(s) to its output unchanged except as modified by certain "macro expressions." The following lines define macros for subsequent processing:

 @comment Any text
 @@                     same as @comment
 @define name value
 @default name value    set if name undefined
 @include filename
 @if varname            include subsequent text if varname != 0
 @unless varname        include subsequent text if varname == 0
 @fi                    terminate @if or @unless
 @ignore DELIM          ignore input until line that begins with DELIM
 @stderr stuff          send diagnostics to standard error

A definition may extend across many lines by ending each line with a backslash, thus quoting the following newline.

Any occurrence of @name@ in the input is replaced in the output by the corresponding value.

@name at beginning of line is treated the same as @name@.
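
For example, this fragment (the macro name SIGNOFF is invented) defines a two-line value with a trailing backslash and then references it, once inside a line and once on a line of its own:

 @comment SIGNOFF is an invented macro name
 @define SIGNOFF Yours sincerely,\
 The Management
 Best regards from @SIGNOFF@
 @SIGNOFF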

Applications

Form Letters

We'll start with a toy example that illustrates some simple uses of m1. Here's a form letter that I've often been tempted to use:

@default MYNAME Jon Bentley 
@default TASK respond to your special offer 
@default EXCUSE the dog ate my homework 
Dear @NAME@: 
    Although I would dearly love to @TASK@, 
I am afraid that I am unable to do so because @EXCUSE@. 
I am sure that you have been in this situation 
many times yourself. 
            Sincerely, 
            @MYNAME@ 

If that file is named sayno.mac, it might be invoked with this text:

@define NAME Mr. Smith 
@define TASK subscribe to your magazine 
@define EXCUSE I suddenly forgot how to read 

Recall that a @default takes effect only if its variable was not previously @defined.
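
One way to combine the two (a sketch; the driver file name smith.mac is invented) is a short driver that supplies the @defines and then pulls in the form letter:

 @comment smith.mac: a hypothetical driver file
 @define NAME Mr. Smith
 @define TASK subscribe to your magazine
 @define EXCUSE I suddenly forgot how to read
 @include sayno.mac

which would be run as awk -f m1.awk smith.mac.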

Troff Pre-Processing

I've found m1 to be a handy Troff preprocessor. Many of my text files (including this one) start with m1 definitions like:

@define ArrayFig @StructureSec@.2 
@define HashTabFig @StructureSec@.3 
@define TreeFig @StructureSec@.4 
@define ProblemSize 100 

Even a simple form of arithmetic would be useful in numeric sequences of definitions. The longer m1 variables get around Troff's dreadful two-character limit on string names; these variables are also available to Troff preprocessors like Pic and Eqn. Various forms of the @define, @if, and @include facilities are present in some of the Troff-family languages (Pic and Troff) but not others (Tbl); m1 provides a consistent mechanism.

I include figures in documents with lines like this:

@define FIGNUM @FIGMFMOVIE@ 
@define FIGTITLE The Multiple Fragment heuristic. 
@FIGSTART@ 
<PS> <@THISDIR@/mfmovie.pic</PS>
@FIGEND@ 

The two @defines are a hack to supply the two parameters of number and title to the figure. The figure might be set off by horizontal lines or enclosed in a box, the number and title might be printed at the top or the bottom, and the figures might be graphs, pictures, or animations of algorithms. All figures, though, are presented in the consistent format defined by FIGSTART and FIGEND.

Awk Library Management

I have also used m1 as a preprocessor for Awk programs. The @include statement allows one to build simple libraries of Awk functions (though some, but not all, Awk implementations provide this facility by allowing multiple program files). File inclusion was used in an earlier version of this paper to include individual functions in the text and then wrap them all together into the complete m1 program. The conditional statements allow one to customize a program with macros rather than run-time if statements, which can reduce both run time and compile time.
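
A sketch of that usage (the file names and the abs helper are invented): suppose sum.m1 contains

 @include mylib.awk
 # abs() comes from mylib.awk, a hypothetical library file
 { total += abs($1) }
 END { print total }

Then

 awk -f m1.awk sum.m1 > sum.awk
 awk -f sum.awk data.txt

expands the @include and runs the resulting program.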

Controlling Experiments

The most interesting application for which I've used this macro language is unfortunately too complicated to describe in detail. The job for which I wrote the original version of m1 was to control a set of experiments. The experiments were described in a language with a lexical structure that forced me to make substitutions inside text strings; that was the original reason that substitutions are bracketed by at-signs. The experiments are currently controlled by text files that contain descriptions in the experiment language, data extraction programs written in Awk, and graphical displays of data written in Grap; all the programs are tailored by m1 commands.

Most experiments are driven by short files that set a few key parameters and then @include a large file with many @defaults. Separate files describe the fields of shared databases:

 @define N ($1) 
 @define NODES ($2) 
 @define CPU ($3) 
 ... 

These files are @included in both the experiment files and in Troff files that display data from the databases. I had tried to conduct a similar set of experiments before I built m1, and got mired in muck. The few hours I spent building the tool were paid back handsomely in the first days I used it.

The Substitution Function

M1 uses a fast substitution function. The idea is to process the string from left to right, searching for the first substitution to be made. We then make the substitution, and rescan the string starting at the fresh text. We implement this idea by keeping two strings: the text processed so far is in L (for Left), and unprocessed text is in R (for Right). Here is the pseudocode for dosubs:

L = Empty 
R = Input String 
while R contains an "@" sign do 
	let R = A @ B; set L = L A and R = B 
	if R contains no "@" then 
		L = L "@" 
		break 
	let R = A @ B; set M = A and R = B 
	if M is in SymTab then 
		R = SymTab[M] R 
	else 
		L = L "@" M 
		R = "@" R 
	return L R 

Possible Extensions

There are many ways in which the m1 program could be extended. Here are some of the biggest temptations to "creeping creaturism":

  • A long definition with a trail of backslashes might be more graciously expressed by a @longdefine statement terminated by a @longend.
  • An @undefine statement would remove a definition from the symbol table.
  • I've been tempted to add parameters to macros, but so far I have gotten around the problem by using an idiom described in the next section.
  • It would be easy to add stack-based arithmetic and strings to the language by adding @push and @pop commands that read and write variables.
  • As soon as you try to write interesting macros, you need to have mechanisms for quoting strings (to postpone evaluation) and for forcing immediate evaluation.

Code

The following code is short (around 100 lines), which is significantly shorter than other macro processors; see, for instance, Chapter 8 of Kernighan and Plauger [1981]. The program uses several techniques that can be applied in many Awk programs.

  • Symbol tables are easy to implement with Awk's associative arrays.
  • The program makes extensive use of Awk's string-handling facilities: regular expressions, string concatenation, gsub, index, and substr.
  • Awk's file handling makes the dofile procedure straightforward.
  • The readline function and pushback mechanism associated with buffer are of general utility.

error

function error(s) {
	print "m1 error: " s | "cat 1>&2"; exit 1
}

dofile

function dofile(fname,  savefile, savebuffer, newstring) {
	if (fname in activefiles)
		error("recursively reading file: " fname)
	activefiles[fname] = 1
	savefile = file; file = fname
	savebuffer = buffer; buffer = ""
	while (readline() != EOF) {
		if (index($0, "@") == 0) {
			print $0
		} else if (/^@define[ \t]/) {
			dodef()
		} else if (/^@default[ \t]/) {
			if (!($2 in symtab))
				dodef()
		} else if (/^@include[ \t]/) {
			if (NF != 2) error("bad include line")
			dofile(dosubs($2))
		} else if (/^@if[ \t]/) {
			if (NF != 2) error("bad if line")
			if (!($2 in symtab) || symtab[$2] == 0)
				gobble()
		} else if (/^@unless[ \t]/) {
			if (NF != 2) error("bad unless line")
			if (($2 in symtab) && symtab[$2] != 0)
				gobble()
		} else if (/^@fi([ \t]?|$)/) { # Could do error checking here
		} else if (/^@stderr[ \t]?/) {
			print substr($0, 9) | "cat 1>&2"
		} else if (/^@(comment|@)[ \t]?/) {
		} else if (/^@ignore[ \t]/) { # Dump input until $2
			delim = $2
			l = length(delim)
			while (readline() != EOF)
				if (substr($0, 1, l) == delim)
					break
		} else {
			newstring = dosubs($0)
			if ($0 == newstring || index(newstring, "@") == 0)
				print newstring
			else
				buffer = newstring "\n" buffer
		}
	}
	close(fname)
	delete activefiles[fname]
	file = savefile
	buffer = savebuffer
}

readline

Put next input line into global string "buffer". Return "EOF" or "" (null string).

function readline(  i, status) {
	status = ""
	if (buffer != "") {
		i = index(buffer, "\n")
		$0 = substr(buffer, 1, i-1)
		buffer = substr(buffer, i+1)
	} else {
		# Hume: special case for non v10: if (file == "/dev/stdin")
		if (getline <file <= 0)
			status = EOF
	}
	# Hack: allow @Mname at start of line w/o closing @
	if ($0 ~ /^@[A-Z][a-zA-Z0-9]*[ \t]*$/)
		sub(/[ \t]*$/, "@")
	return status
}

gobble

function gobble(  ifdepth) {
	ifdepth = 1
	while (readline() != EOF) {
		if (/^@(if|unless)[ \t]/)
			ifdepth++
		if (/^@fi[ \t]?/ && --ifdepth <= 0)
			break
	}
}

dosubs

function dosubs(s,  l, r, i, m) {
	if (index(s, "@") == 0)
		return s
	l = ""	# Left of current pos; ready for output
	r = s	# Right of current; unexamined at this time
	while ((i = index(r, "@")) != 0) {
		l = l substr(r, 1, i-1)
		r = substr(r, i+1)	# Currently scanning @
		i = index(r, "@")
		if (i == 0) {
			l = l "@"
			break
		}
		m = substr(r, 1, i-1)
		r = substr(r, i+1)
		if (m in symtab) {
			r = symtab[m] r
		} else {
			l = l "@" m
			r = "@" r
		}
	}
	return l r
}

dodef

function dodef(fname,  str, x) {
	name = $2
	sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]*/, "")  # OLD BUG: last * was +
	str = $0
	while (str ~ /\\$/) {
		if (readline() == EOF)
			error("EOF inside definition")
		# OLD BUG: sub(/\\$/, "\n" $0, str)
		x = $0
		sub(/^[ \t]+/, "", x)
		str = substr(str, 1, length(str)-1) "\n" x
	}
	symtab[name] = str
}

BEGIN

BEGIN {	
    EOF = "EOF"
	if (ARGC == 1)
		dofile("/dev/stdin")
	else if (ARGC >= 2) {
		for (i = 1; i < ARGC; i++)
			dofile(ARGV[i])
	} else
		error("usage: m1 [fname...]")
}

Bugs

M1 is three steps lower than m4. You'll probably miss something you have learned to expect.

History

M1 was documented in the 1997 sed & awk book by Dale Dougherty & Arnold Robbins (ISBN 1-56592-225-5) but may have been written earlier.

This page was adapted from 131.191.66.141:8181/UNIX_BS/sedawk/examples/ch13/m1.pdf (download from LAWKER).

Author

Jon L. Bentley.


categories: Macros,Tools,Mar,2009,WillW

m5 - macro processor

Download

Download from LAWKER.

Synopsis

m5 [ -Dname ] [ -Dname=def ] [-c] [ -dp char ] 
   [ -o file ] [-sp char ] [ file ... ]
 
[g|n]awk -f m5.awk X [ -Dname ] [ -Dname=def ]  [-c]  [ -dp char ] 
                     [ -o file ] [ -sp char ] [ file ... ]

Description

M5 is a Bourne shell script for invoking m5.awk, which actually performs the macro processing. m5, unlike many macro processors, does not directly interpret its input. Instead it uses a two-pass approach in which the first pass translates the input to an awk program, and the second pass executes the awk program to produce the final output. Details of usage are provided below.

This two-pass system means that macros can contain awk commands, to be executed on the second pass. This greatly extends the expressiveness of the m5 macro system.

As noted in the synopsis above, its invocation may require specification of awk, gawk, or nawk, depending on the version of awk available on your system. This choice is further complicated on some systems, e.g. Sun, which have both awk (original awk) and nawk (new awk). Other systems appear to have new awk, but have named it just awk. New awk should be used, regardless of what it has been named: the macro processor translator will not work with original awk because it uses, for example, the built-in function match().

Options

The following options are supported:

-Dname
Following the cpp convention, define name as 1 (one). This is the same as if a -Dname=1 appeared as an option or #name=1 appeared as an input line. Names specified using -D are awk variables defined just before main is invoked.
-Dname=def
Define name as "def". This is the same as if #name="def" appeared as an input line. Names specified using -D are awk variables defined just before main is invoked.
X
Yes, that really is a capital "X". The version of nawk on Sun Solaris 2.5.1 apparently does its own argument processing before passing the arguments on to the awk program. In this case, X (and all succeeding options) are believed by nawk to be file names and are passed on to the macro processor translator (m5.awk) for its own argument processing. Without the X, Sun nawk attempts to process succeeding options (e.g., -Dname) as valid nawk arguments or files, thus causing an error. This may not be a problem for all awks.
-c
Compile only. The output program is still produced, but the final output is not.
-dp char
The directive prefix character (default is #).
-o file
The output program file (default is a.awk).
-sp char
The substitution prefix character (default is $).

Usage

Overview

The program that performs the first pass noted above is called the m5 translator and is named m5.awk. The input to the translator may be either standard input or one or more files listed on the command line. An input line with the directive prefix character (# by default) in column 1 is treated as a directive statement in the MP directive language (awk). All other input lines are processed as text lines. Simple macros are created using awk assignment statements and their values referenced using the substitution prefix character ($ by default). The backslash (\) is the escape character; its presence forces the next character to literally appear in the output. This is most useful when forcing the appearance of the directive prefix character, the substitution prefix character, and the escape character itself.

Macro Substitution

All input lines are scanned for macro references that are indicated by the substitution prefix character. Assuming the default value of that character, macro references may be of the form $var, $(var), $(expr), $[str], $var[expr], or $func(args). These are replaced by an awk variable, awk variable, awk expression, awk array reference to the special array M[], regular awk array reference, or awk function call, respectively. These are, in effect, macros. The MP translator checks for proper nesting of parentheses and double quotes when translating $(expr) and $func(args) macros, and checks for proper nesting of square brackets and double quotes when translating $[expr] and $var[expr] macros. The substitution prefix character indicates a macro reference unless it is (i) escaped (e.g., \$abc), (ii) followed by a character other than A-Z, a-z, (, or [ (e.g., $@), or (iii) inside a macro reference (e.g., $($abc); probably an error).

An understanding of the implementation of macro substitution will help in its proper usage. When a text line is encountered, it is scanned for macros, embedded in an awk print statement, and copied to the output program. For example, the input line

The quick $fox jumped over the lazy $dog.

is transformed into

print "The quick " fox " jumped over the lazy " dog "."

Obviously the use of this transformation technique relies completely on the presence of the awk concatenation operator (one or more blanks).

Macros Containing Macros

As already noted, a macro reference inside another macro reference will not result in substitution and will probably cause an awk execution-time error. Furthermore, a substitution prefix character in the substituted string is also generally not significant because the substitution prefix character is detected at translation time, and macro values are assigned at execution time. However, macro references of the form $[expr] provide a simple nested referencing capability. For example, if $[abc] is in a text line, or in a directive line and not on the left hand side of an assignment statement, it is replaced by eval(M["abc"]). When the output program is executed, the m5 runtime routine eval() substitutes the value of M["abc"], examining it for further macro references of the form $[str] (where "str" denotes an arbitrary string). If one is found, substitution and scanning proceed recursively. Function type macro references may result in references to other macros, thus providing an additional form of nested referencing.

Directive Lines

Except for the include directive, when a directive line is detected, the directive prefix is removed, the line is scanned for macros, and then the line is copied to the output program (as distinct from the final output). Any valid awk construct, including the function statement, is allowed in a directive line. Further information on writing awk programs may be found in Aho, Kernighan, and Weinberger, Dougherty and Robbins, and Robbins.

Include Directive

A single non-awk directive has been provided: the include directive. Assuming that # is the directive prefix, #include(filename) directs the MP translator to immediately read from the indicated file, processing lines from it in the normal manner. This processing mode makes the include directive the only type of directive to take effect at translation time. Nested includes are allowed. Include directives must appear on a line by themselves. More elaborate types of file processing may be directly programmed using appropriate awk statements in the input file.
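
A minimal sketch (the file names are invented): if main.m5 contains

#include(defs.m5)
   The $adj fox.

and defs.m5 contains

#  # adj is an invented macro
#  adj = "quick brown"

then the included assignment becomes part of the output program, and the final output is:

   The quick brown fox.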

Main Program and Functions

The MP translator builds the resulting awk program in one of two ways, depending on the form of the first input line. If that line begins with "function", it is assumed that the user is providing one or more functions, including the function "main" required by m5. If the first line does not begin with "function", then the entire input file is translated into awk statements that are placed inside "main". If some input lines are inside functions, and others are not, awk will detect this and complain. The MP by design has little awareness of the syntax of directive lines (awk statements), and as a consequence syntax errors in directive lines are not detected until the output program is executed.

Output

Finally, unless the -c (compile only) option is specified on the command line, the output program is executed to produce the final output (directed by default to standard output). The version of awk specified in ARGV[0] (a built-in awk variable containing the command name) is used to execute the program. If ARGV[0] is null, awk is used.
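
For instance (input.m5 is a placeholder name), one can generate the awk program without running it and then execute it separately:

 m5 -c -o prog.awk input.m5    # compile only: write the awk program
 nawk -f prog.awk              # run it to produce the final output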

EXAMPLE

Understanding this example requires recognition that macro substitution is a two-step process: (i) the input text is translated into an output awk program, and (ii) the awk program is executed to produce the final output with the macro substitutions actually accomplished. The examples below illustrate this process. # and $ are assumed to be the directive and substitution prefix characters. This example was successfully executed using awk on a Cray C90 running UNICOS 10.0.0.3, gawk on a Gateway E-3200 running SuSE Linux Version 6.0, and nawk on a Sun Ultra 2 Model 2200 running Solaris 2.5.1.

Input Text

#function main() {

   Example 1: Simple Substitution
   ------------------------------
#  br = "brown"
   The quick $br fox.

   Example 2: Substitution inside a String
   ---------------------------------------
#  r = "row"
   The quick b$(r)n fox.

   Example 3: Expression Substitution
   ----------------------------------
#  a = 4
#  b = 3
   The quick $(2*a + b) foxes.

   Example 4: Macros References inside a Macro
   -------------------------------------------
#  $[fox] = "\$[q] \$[b] \$[f]"
#  $[q] = "quick"
#  $[b] = "brown"
#  $[f] = "fox"
   The $[fox].

   Example 5: Array Reference Substitution
   ---------------------------------------
#  x[7] = "brown"
#  b = 3
   The quick $x[2*b+1] fox.

   Example 6: Function Reference Substitution
   ------------------------------------------
   The quick $color(1,2) fox.

   Example 7: Substitution of Special Characters
   ---------------------------------------------
\#  The \$ quick \\ brown $# fox. $$
#}
#include(testincl.m5)

Included File testincl.m5

#function color(i,j) {
   The lazy dog.
#  if (i == j)
#     return "blue"
#  else
#     return "brown"
#}

Output Program

function main() {
   print
   print "   Example 1: Simple Substitution"
   print "   ------------------------------"
   br = "brown"
   print "   The quick " br " fox."
   print
   print "   Example 2: Substitution inside a String"
   print "   ---------------------------------------"
   r = "row"
   print "   The quick b" r "n fox."
   print
   print "   Example 3: Expression Substitution"
   print "   ----------------------------------"
   a = 4
   b = 3
   print "   The quick " 2*a + b " foxes."
   print
   print "   Example 4: Macros References inside a Macro"
   print "   -------------------------------------------"
   M["fox"] = "$[q] $[b] $[f]"
   M["q"] = "quick"
   M["b"] = "brown"
   M["f"] = "fox"
   print "   The " eval(M["fox"]) "."
   print
   print "   Example 5: Array Reference Substitution"
   print "   ---------------------------------------"
   x[7] = "brown"
   b = 3
   print "   The quick " x[2*b+1] " fox."
   print
   print "   Example 6: Function Reference Substitution"
   print "   ------------------------------------------"
   print "   The quick " color(1,2) " fox."
   print
   print "   Example 7: Substitution of Special Characters"
   print "   ---------------------------------------------"
   print "\#  The \$ quick \\ brown $# fox. $$"
}
function color(i,j) {
   print "   The lazy dog."
   if (i == j)
      return "blue"
   else
      return "brown"
}

function eval(inp   ,isplb,irb,out,name) {

   splb = SP "["
   out = ""

   while( isplb = index(inp, splb) ) {
      irb = index(inp, "]")
      if ( irb == 0 ) {
         out = out substr(inp,1,isplb+1)
         inp = substr( inp, isplb+2 )
      } else {
         name = substr( inp, isplb+2, irb-isplb-2 )
         sub( /^ +/, "", name )
         sub( / +$/, "", name )
         out = out substr(inp,1,isplb-1) eval(M[name])
         inp = substr( inp, irb+1 )
      }
   }

   out = out inp

   return out
}
BEGIN {
   SP = "$"
   main()
   exit
}

Final Output

   Example 1: Simple Substitution
   ------------------------------
   The quick brown fox.

   Example 2: Substitution inside a String
   ---------------------------------------
   The quick brown fox.

   Example 3: Expression Substitution
   ----------------------------------
   The quick 11 foxes.

   Example 4: Macros References inside a Macro
   -------------------------------------------
   The quick brown fox.

   Example 5: Array Reference Substitution
   ---------------------------------------
   The quick brown fox.

   Example 6: Function Reference Substitution
   ------------------------------------------
   The lazy dog.
   The quick brown fox.

   Example 7: Substitution of Special Characters
   ---------------------------------------------
#  The $ quick \ brown $# fox. $$

File

a.awk is the default output program file.

See Also

awk(1), cpp(1), gawk(1), m4(1), nawk(1), vi(1)

Author

William A. Ward, Jr., School of Computer and Information Sciences, University of South Alabama, Mobile, Alabama, July 23, 1999.


categories: Wp,Project,Tools,Mar,2009,Timm

AWKWORDS

Contents

Synopsis

awkwords --title "Title" file > file.html

awkwords file > file.html

Download

This code requires gawk and bash. To download:

wget  http://lawker.googlecode.com/svn/fridge/lib/bash/awkwords
chmod +x awkwords

To test the code, apply it to itself:

  • ./awkwords --title "Does this work?" awkwords > awkwards.html

Description

AwkWords is a simple-to-use markup language for writing documentation for programs whose comment lines start with "#" and whose comments contain HTML code.

For example, awk.info?tools/awkwords shows the html generated from this bash script.

When used with the --title option, a stand-alone web page is generated (to control the style of that page, see the css function discussed below). When used without --title, it generates HTML suitable for inclusion in other pages.

Also, AwkWords finds all the <h2>, <h3>, <h4>, <h5>, <h6>, <h7>, <h8>, <h9> headings and copies them to a table of contents at the front of the file. Note that AwkWords assumes that the file contains only one <h1> heading; this is printed before the table of contents.

AwkWords adds some short cuts for HTML markup, as well as including nested contents (see below: "including nested content"). This is useful for including, say, program output along with the actual program.

Extra Markup

Short cuts for HTML

#.XX
This is replaced by <XX>.
#.XX words
This is replaced by <XX>words</XX>. Note that this tag won't work properly if the source text spills over more than one line.
#.TO url words
This is replaced by a link that mails to url; words is the link text.
#.URL url words
This is replaced by a link to url; words is the link text.

Including nested content:

#.IN file
This line is replaced by the contents of file.
#.LISTING file
This line is replaced by a verbatim display of file (no formatting).
#.CODE file
This line is replaced by the name of the file, followed by the contents of file verbatim (no formatting).
#.BODY file
This line is replaced by file, less the lines before the first blank line.
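
For example (a sketch; the script and file names are invented), a documentation file whose comment lines carry this markup

 #.H1 linecount
 #.H2 Description
 # Counts the lines on standard input.
 # (linecount.awk is a placeholder file name)
 #.CODE linecount.awk

could be formatted with

 ./awkwords --title "linecount" doc.sh > linecount.html

producing a page with a heading, a table of contents, the descriptive text, and a verbatim listing of the script.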

Programmer's Guide

Awkwords is divided into three functions: unhtml fixes the printing of pre-formatted blocks; toc adds the table of contents; and includes handles the details of the extra mark-up.

Functions

unhtml

unhtml() { cat $1| gawk '
  BEGIN {IGNORECASE=1}
  /^<PRE>/   {In=1; print; next}
  /^<\/PRE>/ {In=0; print; next}
  In         {gsub("<","\\<",$0); print; next }
             {print $0 }'
}

toc

toc() { cat $1 | gawk '
 BEGIN             { IGNORECASE = 1 }
 /^<[h]1>/         { Header=$0; next}
 /^[<]h[23456789]>/  { 
       T++ ;
      Toc[T]  = gensub(/(.*)<h(.*)>[ \t]*(.*)[ \t]*<\/h(.*)>(.*)/,
      "<""h\\2><""font color=black>\\•</font></a> <""a href=#" T ">\\3</a></h\\4>",
                "g",$0)
		Pre="<a name="T"></a>" }
     { Line[++N] = Pre $0; Pre="" }
 END { print Header;
       print "<" "h2>Contents</h2>"
       print "<" "div id=\"htmltoc\">"
       for(I=1;I<=T;I++) print Toc[I]	
       print "<" "/div><!--- htmltoc --->"
       print "<" "div id=\"htmlbody\">"
       for(I=1;I<=N;I++) print Line[I]
       print "</" "div><!--- htmlbody --->"		
     }'
}

includes

The xpand function controls recursive inclusion of content. Note that

  • The last act of this function must be to call xpand1.
  • When including verbatim text, the recursive call to xpands must pass "1" as the second parameter.
includes() { cat $1 | gawk '
function xpand(pre,  tmp) {
   if      ($1 ~ "^#.IN")    xpands($2,pre) 
   else if ($1 ~ "^#.BODY" ) xpandsBody($2,pre)
   else if ($1 ~ "^#.LISTING")  {
  	    print "<" "pre>"
	    xpands($2,1)     # <===== note the recursive call with "1"
	    print "<" "/pre>" } 
   else if ($1 ~ "^#.CODE")  {
  	    print "<" "p>" $2 "\n<" "pre>"
	    xpands($2,1)     # <===== note the recursive call with "1"
	    print "<" "/pre>" } 
   else if ($1 ~ "^#.URL") {
	    tmp = $2; $1=$2="";
	    print "<" "a href=\""tmp"\">" trim($0) "</a>"
	    }
   else if ($1 ~ "^#.TO") {
	    tmp = $2; $1=$2="";
	    print "<" "a href=\"mailto:"tmp"\">" trim($0) "</a>"
	    }
   else 
	xpand1(pre)
}

The xpand1 function controls the printing of a single line. If we are formatting verbatim text, we must remove the start-of-html character "<". Otherwise, we expand any html shortcuts.

function xpand1(pre) {
   if (pre)
        gsub("<","\\<",$0)  # <=== remove start-of-html-character
   else {
        $0= xpandHtml($0)      # <=== expand html short cuts
        sub(/^#/,"",$0) }
        print $0 
}

The function xpandHtml controls the html short cuts

function xpandHtml(    str,tag) {
   if ($0 ~ /^#\.H1/) {         
	   $1=""
	   return "<" "h""1><join>" $0 "</join></" "h1>" }
   if (sub(/^#\./,"",$1)) {
	   tag=$1;  $1=""
	   return "<" tag ">"  (($0 ~ /^[ \t]*$/) ? "" : $0"</"tag">")
   }
   return $0
}

The rest of the code is just some book-keeping and managing the recursive addition of content.

function xpands(f,pre) {
     if (newFile(f)) {
	  while((getline <f) > 0) xpand(pre)
          close(f) }
}
function xpandsBody(f,pre, using) {
     if (newFile(f)) { 
	  while((getline <f) >0) {
	    if ( !using && ($0 ~ /^[\t ]*$/) ) using = 1
	    if ( using ) xpand(pre)}
	  close(f) }
}
function newFile(f) { return ++Seen[f]==1 }
function trim (s)   { sub(/^[ \t]*/,"",s);  sub(/[ \t]*$/,"",s); return s } 

BEGIN { IGNORECASE=1 }
      { xpand()      }'
}

CSS styles

If used to generate a full web page, then the following styles are added. Note that the htmltoc class controls the appearance of the table of contents.

css() { 
      echo "<""STYLE type=\"text/css\">"
      cat<<-'EOF'
         div.htmltoc h2 { font-size: medium; font-weight: normal; 
                          margin: 0 0 0 0; margin-left: 30px;}
	 div.htmltoc h3 { font-size: medium; font-weight: normal; 
                          margin: 0 0 0 0; margin-left: 60px;}
         div.htmltoc h4 { font-size: medium; font-weight: normal; 
                          margin: 0 0 0 0; margin-left: 90px;}
         div.htmltoc h5 { font-size: medium; font-weight: normal; 
                          margin: 0 0 0 0; margin-left: 120px;}
         div.htmltoc h6 { font-size: medium; font-weight: normal; 
                          margin: 0 0 0 0; margin-left: 150px;}
         div.htmltoc h7 { font-size: medium; font-weight: normal; 
                          margin: 0 0 0 0; margin-left: 180px; }
      </STYLE>
EOF
}

Main command line

main() { cat $1 | includes | unhtml | toc; }

if [ $1 == "--title" ]
then 
     echo "<""html><""head><""title>$2</title>`css`</head><""body>"; 
     shift 2
     main $1
     echo "<""/body><""/html>"
else 
     main $1
fi 

Bugs

There's no checking for valid input (e.g. pre-formatting tags that never close).

If the input file contains no html mark up, the results are pretty messy.

Recursive includes fail silently if the referenced file does not exist.

I don't like the way I need a separate pass to do "unhtml". I tried making it work within the code but it got messy.

Author

Tim Menzies

categories: Wp,Awk100,Wp,Tools,Apr,2009,HenryS

awf

The amazingly workable (text) formatter

Synopsis

awf -macros [ file ] ...

Download

Download from LAWKER. Type "make r" to run a regression test, formatting the manual page (awf.1) and comparing it to a preformatted copy (awf.1.out). Type "make install" to install it. Pathnames may need changing.

Description

Awf formats the text from the input file(s) (standard input if none) in an imitation of nroff's style with the -man or -ms macro packages. The -macro option is mandatory and must be `-man' or `-ms'.
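
For example (hello.1 is a placeholder manual page source), a page can be formatted for the terminal with:

 awf -man hello.1 | more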

Awf is slow and has many restrictions, but does a decent job on most manual pages and simple -ms documents, and isn't subject to AT&T's brain-damaged licensing that denies many System V users any text formatter at all. It is also a text formatter that is simple enough to be tinkered with, for people who want to experiment.

Awf implements the following raw nroff requests:

.\"  .ce  .fi  .in  .ne  .pl  .sp
.ad  .de  .ft  .it  .nf  .po  .ta
.bp  .ds  .ie  .ll  .nr  .ps  .ti
.br  .el  .if  .na  .ns  .rs  .tm

and the following in-text codes:

\$   \%   \*   \c   \f   \n   \s

plus the full list of nroff/troff special characters in the original V7 troff manual.

Many restrictions are present; the behavior in general is a subset of nroff's. Of particular note are the following:

  • Point sizes do not exist; .ps and \s are ignored.
  • Conditionals implement only numeric comparisons on \n(.$, string comparisons between a macro parameter and a literal, and n (always true) and t (always false).
  • The implementation of strings is generally primitive.
  • Expressions in (e.g.) .sp are fairly general, but the |, &, and : operators do not exist, and the implementation of \w requires that quote (') be used as the delimiter and simply counts the characters inside (so that, e.g., \w'\(bu' equals 4).

White space at the beginning of lines, and imbedded white space within lines, is dealt with properly. Sentence terminators at ends of lines are understood to imply extra space afterward in filled lines. Tabs are implemented crudely and not quite correctly, although in most cases they work as expected. Hyphenation is done only at explicit hyphens, emdashes, and nroff discretionary hyphens.

MAN Macros

The -man macro set implements the full V7 manual macros, plus a few semi-random oddballs. The full list is:

.B   .DT  .IP  .P   .RE  .SM
.BI  .HP  .IR  .PD  .RI  .TH
.BR  .I   .LP  .PP  .RS  .TP
.BY  .IB  .NB  .RB  .SH  .UC

.BY and .NB each take a single string argument (respectively, an indication of authorship and a note about the status of the manual page) and arrange to place it in the page footer.

MS Macros

The -ms macro set is a substantial subset of the V7 manuscript macros. The implemented macros are:

.AB  .CD  .ID  .ND  .QP  .RS  .UL
.AE  .DA  .IP  .NH  .QS  .SH  .UX
.AI  .DE  .LD  .NL  .R   .SM
.AU  .DS  .LG  .PP  .RE  .TL
.B   .I   .LP  .QE  .RP  .TP

Size changes are recognized but ignored, as are .RP and .ND. .UL just prints its argument in italics. .DS/.DE does not do a keep, nor do any of the other macros that normally imply keeps.

Assignments to the header/footer string variables are recognized and implemented, but there is otherwise no control over header/footer formatting. The DY string variable is available. The PD, PI, and LL number registers exist and can be changed.

Output

The only output format supported by awf, in its distributed form, is that appropriate to a dumb terminal, using overprinting for italics (via underlining) and bold. The nroff special characters are printed as some vague approximation (it's sometimes very vague) to their correct appearance.

Awf's knowledge of the output device is established by a device file, which is read before the user's input. It is sought in awf's library directory, first as dev.term (where term is the value of the TERM environment variable) and, failing that, as dev.dumb. The device file uses special internal commands to set up resolution, special characters, fonts, etc., and more normal nroff commands to set up page length etc.

Files

All in /usr/lib/awf (this can be overridden by the AWFLIB environment variable):

common     common device-independent initialization
dev.*      device-specific initialization
mac.m*     macro packages
pass1      macro substituter
pass2.base central formatter
pass2.m*   macro-package-specific bits of formatter
pass3      line and page composer

See Also

awk(1), nroff(1), man(7), ms(7)

Diagnostics

Unlike nroff, awf complains whenever it sees unknown commands and macros. All diagnostics (these and some internal ones) appear on standard error at the end of the run.

Author

Written at University of Toronto by Henry Spencer, more or less as a supplement to the C News project.

Copyright

Copyright 1990 University of Toronto. All rights reserved. Written by Henry Spencer. This software is not subject to any license of the American Telephone and Telegraph Company or of the Regents of the University of California.

Permission is granted to anyone to use this software for any purpose on any computer system, and to alter it and redistribute it freely, subject to the following restrictions:

  1. The author is not responsible for the consequences of use of this software, no matter how awful, even if they arise from flaws in it.
  2. The origin of this software must not be misrepresented, either by explicit claim or by omission. Since few users ever read sources, credits must appear in the documentation.
  3. Altered versions must be plainly marked as such, and must not be misrepresented as being the original software. Since few users ever read sources, credits must appear in the documentation.
  4. This notice may not be removed or altered.

Bugs

There are plenty, but what do you expect for a text formatter written entirely in (old) awk?

The -ms stuff has not been checked out very thoroughly.


categories: Tools,May,2009,AlexR

Linking Awk to Spreadsheets

Axel Renihold's MacroCALC (mc) is an interactive, macro-programmable spreadsheet calculator. mc has no graphic features, which means it can also run on plain terminals. It uses a convenient, well-known user interface and has some special features that are especially interesting in the UNIX environment.

mc has an elaborate interface to the operating system via piping. This means that mc and Unix tools like Awk can be easily integrated.

A "cell" statement has the syntax:

cell < command
(and "command" is any Unix script, e.g. using Awk). When such a cell is entered, it will:
  • execute the command
  • put the command's output into the range of cells starting with cell as the upper-left corner.

The output is read line by line into the rows of the range. The columns, which have to be separated by "tab" in the output of the command, are placed into the columns of the range.

At the end of the data a special cell value designated 'EOF' (end of file) is placed in the cell below the data. This offers great flexibility based upon the Unix operating system's piping mechanism.
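
For example (a sketch; the cell address, field layout, and file name are all invented), a cell statement could summarize a tab-separated data file with Awk and drop the per-key totals into the sheet starting at that cell:

 A1 < awk -F'\t' '{ sum[$1] += $2 } END { for (k in sum) print k "\t" sum[k] }' sales.txt

Each output line fills one row; the tab-separated fields fill successive columns, and mc marks the cell below the last row with EOF.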

For more details, see the MacroCALC home page.
