Awk.Info

"Cause a little auk awk
goes a long way."

categories: Dsl,Mar,2009,Admin

Domain-Specific Languages

These pages focus on domain-specific languages (a.k.a. "little languages") written in Awk.

These little languages can range from the simple to the quite intricate. For example, LAWKER contains code for

  • Simple:
    • Graph- a simple ascii graph generator;
    • Markdown- an ultra lightweight HTML markup language;
  • Intricate:
    • Awk++- enables object-oriented programming in Awk;
    • AwkLisp- a fully functioning LISP interpreter, written in Awk.

Interestingly, without comments, the LISP interpreter is only three times longer than the HTML markup language. This says something about the power of Awk, the regularity of LISP's core semantics, or both.


categories: Dsl,Mar,2009,BrianK,PeterW,AlfredA

Graph.awk

Synopsis

gawk -f graph.awk graphFile

Description

A processor for a little language, specialized for graph-drawing.

The code reads data that includes a specification of a graph. The output is the data plotted in the specified area.

For example, here is an input specification:

label here's some stuff
bottom ticks 1 5 10 
left ticks 1 2 10 20
range 1 1 10 22
height 10
width 30
1 2 *
2 4 * 
3 6 *
4 8 *
7 14 +
8 12 +
9 10 +
mb 0.9 11 =

It produces the following output:

      |----------------------|
20    -                 = =  =
      |       = =  = =       |
      =  = =         +  +    |
10    -                   +  |
      |    *  *              |
      |  *                   |
2     *---------|------------|
     1         5            10
         here's some stuff    

Code

Initialization

Set frame dimensions: height and width; offset for x and y axes.

BEGIN {                
    ht = 24; wid = 80  
    ox = 6; oy = 2     
    number = "^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)" \
                            "([eE][-+]?[0-9]+)?$"
}

Handling patterns

Skip comments

/^[ \t]*#/     { next } 

Simple tags

$1 == "height" { ht = $2;  next }
$1 == "width"  { wid = $2; next }
$1 == "label"  {                       # for bottom
    sub(/^ *label */, "")
    botlab = $0
    next
}
$1 == "bottom" && $2 == "ticks" {     # ticks for x-axis
    for (i = 3; i <= NF; i++) bticks[++nb] = $i
    next
}
$1 == "left" && $2 == "ticks" {       # ticks for y-axis
    for (i = 3; i <= NF; i++) lticks[++nl] = $i
    next
}
$1 == "range" {                       # xmin ymin xmax ymax
    xmin = $2; ymin = $3; xmax = $4; ymax = $5
    next
}

Handling numerics.

$1 ~ number && $2 ~ number {  # pair of numbers
    nd++                      # count number of data points
    x[nd] = $1; y[nd] = $2
    ch[nd] = $3               # optional plotting character
    next
}
$1 ~ number && $2 !~ number { # single number
    nd++                      # count number of data points
    x[nd] = nd; y[nd] = $1; ch[nd] = $2
    next
}
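The two rules above can be cross-checked outside Awk. Here is a minimal Python sketch of the same classification; the function name classify is hypothetical, and the regular expression is copied from the "number" variable in the BEGIN block.

```python
import re

# Same pattern as the awk variable "number" in the BEGIN block.
NUMBER = re.compile(r"^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][-+]?[0-9]+)?$")

def classify(line, nd):
    """Return (x, y, mark) for a data line, or None if it is not one.

    nd is the number of data points seen so far; a lone number gets
    x = nd + 1, just as the awk rule uses the running count.
    """
    f = line.split()
    first = f[0] if len(f) > 0 else ""
    second = f[1] if len(f) > 1 else ""
    third = f[2] if len(f) > 2 else ""
    if NUMBER.match(first) and NUMBER.match(second):
        return (float(first), float(second), third)   # pair of numbers
    if NUMBER.match(first):
        return (nd + 1, float(first), second)         # single number
    return None

print(classify("7 14 +", 0))    # pair with a plotting character
print(classify("3.5 *", 4))     # single number becomes the fifth point
print(classify("label x", 0))   # not a data line
```

An empty mark is left as an empty string here; in graph.awk the data() function substitutes "*" at drawing time.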

Line functions, defined by a slope "m" and a y-intercept "b".

$1 == "mb" {  # m b [mark]
    expand()
    for (i = xmin; i <= xmax; i++) {
        nd++; x[nd] = i; y[nd] = $2*i + $3; ch[nd] = $4
    }
    next
}

Final case: input error.

{ print "?? line " NR ": ["$0"]" >"/dev/stderr" }

Draw the graph

END { expand();   frame(); ticks(); label(); data(); draw() }

Functions

Expand the "x" and "y" boundaries to include all points.

function expand(note) { if (xmin == "") expand1(note) }
function expand1(note) {
    xmin = xmax = x[1]
    ymin = ymax = y[1]
    for (i = 2; i <= nd; i++) {
        if (x[i] < xmin) xmin = x[i]
        if (x[i] > xmax) xmax = x[i]
        if (y[i] < ymin) ymin = y[i]
        if (y[i] > ymax) ymax = y[i]
    }
}

Draw the frame around the graph.

function frame() {        
    for (i = ox; i < wid; i++) plot(i, oy, "-")     # bottom
    for (i = ox; i < wid; i++) plot(i, ht-1, "-")   # top
    for (i = oy; i < ht; i++) plot(ox, i, "|")      # left
    for (i = oy; i < ht; i++) plot(wid-1, i, "|")   # right
}

Create tick marks for both axes.

function ticks(    i) {   
    for (i = 1; i <= nb; i++) {
        plot(xscale(bticks[i]), oy, "|")
        splot(xscale(bticks[i])-1, 1, bticks[i])
    }
    for (i = 1; i <= nl; i++) {
        plot(ox, yscale(lticks[i]), "-")
        splot(0, yscale(lticks[i]), lticks[i])
    }
}

Center labels under x-axis.

function label() {        
    splot(int((wid + ox - length(botlab))/2), 0, botlab)
}

Create data points.

function data(    i) {    
    for (i = 1; i <= nd; i++)
        plot(xscale(x[i]),yscale(y[i]),ch[i]=="" ? "*" : ch[i])
    for(i in mark) print mark[i]
}

Print graph from array.

function draw(    i, j) { 
    for (i = ht-1; i >= 0; i--) {
        for (j = 0; j < wid; j++)
            printf("%s", (j,i) in array ? array[j,i] : " ")  # "%s" keeps a "%" mark from being read as a format
        printf("\n")
    }
}

Scale x-values, y-values.

function xscale(x) {      
    return int((x-xmin)/(xmax-xmin) * (wid-1-ox) + ox + 0.5)
}
function yscale(y) {      
    return int((y-ymin)/(ymax-ymin) * (ht-1-oy) + oy + 0.5)
}
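To see what the scaling does with real numbers, here is a small Python sketch of the same formula, applied to the example specification (range 1 1 10 22, width 30, height 10, with ox = 6 and oy = 2). The helper name make_scaler is hypothetical.

```python
def make_scaler(lo, hi, origin, size):
    """Map a data value in [lo, hi] to a cell index in [origin, size - 1]."""
    def scale(v):
        # int() truncates like awk's int(); adding 0.5 rounds to the
        # nearest cell for the non-negative values used here.
        return int((v - lo) / (hi - lo) * (size - 1 - origin) + origin + 0.5)
    return scale

# xmin=1, xmax=10, ox=6, wid=30  and  ymin=1, ymax=22, oy=2, ht=10
xscale = make_scaler(1, 10, 6, 30)
yscale = make_scaler(1, 22, 2, 10)

print(xscale(1), xscale(5), xscale(10))    # columns of the bottom ticks
print(yscale(1), yscale(20), yscale(22))   # rows, counted from the bottom
```

The smallest value lands on the axis offset and the largest on the last cell inside the frame, which is why the tick for 5 sits at column 16 in the example output.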

Put one character into array.

function plot(x, y, c) {  
    array[x,y] = c
}

Put string "s" into array.

function splot(x, y, s,    i, n) { 
    n = length(s)
    for (i = 0; i < n; i++)
        array[x+i, y] = substr(s, i+1, 1)
}
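The plot, splot and draw functions treat the array as a sparse canvas: only cells that were written exist, and everything else prints as a blank. A rough Python equivalent, with a dict standing in for the awk array (the explicit canvas, wid and ht parameters are an adaptation, since the awk code uses globals):

```python
def plot(canvas, x, y, c):
    """Put one character at column x, row y."""
    canvas[(x, y)] = c

def splot(canvas, x, y, s):
    """Put string s on row y, starting at column x."""
    for i, c in enumerate(s):
        canvas[(x + i, y)] = c

def draw(canvas, wid, ht):
    """Render rows from the top (ht - 1) down to 0, blank where nothing was plotted."""
    rows = []
    for y in range(ht - 1, -1, -1):
        rows.append("".join(canvas.get((x, y), " ") for x in range(wid)))
    return "\n".join(rows)

canvas = {}
splot(canvas, 0, 0, "ab")   # bottom row
plot(canvas, 3, 1, "*")     # one row up
print(draw(canvas, 4, 2))
```

Row 0 is printed last, so y grows upward on the page, as it does in the graph.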

Author

This code comes from the original Awk book by Alfred Aho, Peter Weinberger & Brian Kernighan and contains some small modifications by Tim Menzies.


categories: Dsl,April,2009,MartinF

UML in Awk

Synopsis

./uml.sh file.sdml > sequence_diagram

Description

This program turns SDML into simple ASCII-text UML sequence diagrams. SDML is an extremely simplistic UML Sequence Diagram Markup Language. SDML is specified as:

  • Lines starting with a [ are a comma-separated list of actors (bar headers)
  • Events are defined by the following symbols:
    >    rightward event
    <    leftward event
    -    extension of the previous event
  • Actors can be skipped with a |
  • Text on a line after a # is a comment
  • Lines starting with a @ are text lines
  • Lines starting with a " are indented text lines
  • Lines starting with a : are comma-separated lists of parameter assignments. Parameters are:
    EP   Event Padding (spaces on each side)
    ES   Event Spacing (lines below)
    EA   Events Above (put event text above arrows)
    HP   Header Padding (spaces on each side)
    HS   Header Spacing (lines below)
    LM   Left Margin (spaces on the left)
    TSM  Text Spacing Margin (lines above & below)
    TD   Text Dots (instead of bars in text margins)
    SS   Enable Single Arrow Spans (|---A-->|, not |-A-+-A>|)

Example

Given this input:

    [Client, Proxy, DNS, Server
    Query Name->
    Answer IP<-
    http GET >->
    <<-html

this code generates:

    Client          Proxy           DNS         Server
       |              |              |             |
       |----------Query Name-------->|             |
       |<---------Answer IP----------|             |
       |--http GET -->|----------http GET -------->|
       |<----html-----|<-----------html------------|
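The line types described above can be recognized roughly as follows. This is a Python sketch of the first parsing step only, not a port of the script: the function name parse_sdml_line is hypothetical, and stripping leading whitespace from every line is a simplification. The separator class is the same one the script uses as its field separator.

```python
import re

EVENT_SEPARATORS = r"[|<>-]"   # same character class as EFS in the script

def parse_sdml_line(line):
    """Classify one SDML line; returns a (kind, payload) tuple."""
    line = line.strip()
    if line.startswith("#"):
        return ("comment", None)
    line = re.sub(r"#.*$", "", line)              # drop a trailing comment
    if line.startswith("["):
        return ("header", [a.strip() for a in line[1:].split(",")])
    if line.startswith("@"):
        return ("text", line[1:])
    if line.startswith('"'):
        return ("indented text", line[1:])
    if line.startswith(":"):
        params = {}
        for item in line[1:].split(","):
            key, _, value = item.partition("=")   # a bare name gets an empty value
            params[key.strip()] = value.strip()
        return ("params", params)
    fields = re.split(EVENT_SEPARATORS, line)     # event text, one entry per column
    seps = re.findall(EVENT_SEPARATORS, line)     # the arrows and bars between them
    return ("event", (fields, seps))

print(parse_sdml_line("[Client, Proxy, DNS, Server"))
print(parse_sdml_line("Query Name->"))
print(parse_sdml_line("<<-html"))
```

Most of the real work in the script happens after this step, when runs of separators are turned into left and right arrow spans.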

Code

#!/bin/sh

if [ "$1" = "--awkprog" ] ; then

cat - <<"EOF"

BEGIN {
  EFS="[|<>-]";
  AFS="[<>-]";
  RAFS="[{}RL]";
  FS= EFS;
  ARROWS = 2 ; # Arrowhead constant
  ST=1;

  ARG["EP"] = 1;  # Event Padding
  ARG["ES"] = 0;  # Event Spacing (lines below)
  ARG["EA"] = 0;  # Events Above

  ARG["HP"] = 2;  # Header Padding
  ARG["HS"] = 1;  # Header Spacing (lines below)

  ARG["LM"] = 0;  # Left Margin

  ARG["SP"] = 2;  # Start Row Padding (For continuous operation)

  ARG["TSM"] = 1; # Text Spacing Margin (lines above & below)
  ARG["TD"] = 1;  # Text Dots (instead of bars in text margins)
  ARG["SS"] = 1;  # Enable Single Arrow Spans (|---A-->|, not |-A-+-A>|)
}

function padding(outter, inner, extra    ,p,m) {
  p = (outter - inner);
  m = p % 2 ;
  p =  ((p - m)/2) + (extra ? m:0);
  if(p<0) return 0;
  return p;
}
function pad(char, count    ,i,r) {
  for(i=1 ; i <= count ; i++) { r = r char };
  return r;
}
function ltrim(s) { sub(/^[ \t]*/, "", s) ; return s; }

function center(string, width, padchar, favor    ,p,r,sw) {
  sw = length(string);
  p = padding(width, sw, favor=="r"?1:0);
  r = pad(padchar, p);
  r = r string;
  p = padding(width, sw, favor=="r"?0:1);
  return r pad(padchar, p);
}

function getevent_rev(row, field   ,p) {
  for(p=field-1; p>0; p--) { # search to the left
    if(RF_s[row,p] !~ AFS) return "";
    if(RF_f[row,p] != "") return RF_f[row,p];
  }
  return "";
}
function getevent_for(row, field   ,n) {
  for(n=field+1; n <= R_nf[row]; n++) { # search to the right
    if(RF_s[row,n-1] !~ AFS) return "";
    if(RF_f[row,n] != "") return RF_f[row,n];
  }
  return "";
}

function rlarrow(arrow, prevarrow) {
  if(arrow == ">") return "R";
  if(arrow == "<") return "L";
  if(arrow == "R" || arrow=="L") return arrow;
  return prevarrow;
}

function debug_events(s) {
  for(r=1; r <= NRS; r++) debug_row(r, s);
}

function debug_row(r, s) {
  if(!DEBUG_ROW) return;
  printf("Row["r"]/Stage["s"]:  ");
  for(f=1; f <= R_nf[r]; f++) {
    printf(f"="RF_f[r,f]"("RF_s[r,f]") ");
  }
  printf("\n");
}

function print_bars(num, char    ,i,out) {
  if(char == "") char = "|";
  while(num--) {
    # Center the bars under the Headers
    out = pad(" ", F_width[0]);
    for(i=1; i<= NH; i++) {
      out = out  char pad(" ", F_width[i]);
    }
    print out;
  }
}

function print_event(r, type   ,i,bar,out,aspad,span_width,arrow){
  out = pad(" ", F_width[0]);

  for(i=1; i<= MAXNF; i++) {

    out = out "|";

    arrow=" ";
    if(type == "both" || type == "arrow") {
      if(RF_s[r,i] == "{") arrow = "<";
      if(RF_s[r,i] ~ /[}RL]/)  arrow = "-";
    }
    out = out arrow;


    aspad = "-"; # arrow or space pad
    if(RF_s[r,i] == "|" || RF_s[r,i] == ""|| type == "event") aspad = " ";

    span_width = F_width[i];
    if(ARG["SS"]) while(RF_s[r,i] == "R" || RF_s[r,i+1] == "L") {
      span_width += 1 + F_width[++i]; # include bar
    }

    event ="";
    if(type == "both" || type == "event") {
      event = RF_f[r,i];
    }
    out = out center(event, span_width - ARROWS, aspad, i>MAXNF/2? "r":"l");


    if(type == "both" || type == "arrow") {
      if(RF_s[r,i] == "}") arrow = ">";
      if(RF_s[r,i] ~ /[{RL]/) arrow = "-";
    }
    out = out arrow;
  }
  out = out "|";
  print out;
}

function print_sd(start_row) {
 print "         1         2         3         4         5         6"
 print "123456789012345678901234567890123456789012345678901234567890"
  if(start_row!=1) { for(i=0; i<ARG["SP"];i++) print ""; }

  for(j=start_row; j<= NRS; j++) {

    if(R_ltype[j] == "Header") {
      NH = R_nf[j];
      out = pad(" ", ARG["HP"]+ARG["LM"]);
      i =1;
      out = out RF_f[j,i];
      hp = ARG["HP"] + ARG["LM"] + RF_l[j,i]; # header pointer (last char)
      bp = F_width[0] + 1 + F_width[i] + 1; # bar pointer
 print "HP:" hp " BP: "bp
      for(i=2; i<= NH; i++) {
        l = int(RF_l[j,i]/2); r = RF_l[j,i] -l; # Header left & right
        lp = (bp - hp) - (l + 1); # left padding
        out = out pad(" ", lp) RF_f[j,i];
        hp = bp + r - 1;
        bp = bp + F_width[i] + 1;
 print "HP:" hp " BP: "bp " LP:"lp " r:"r" l:"l
      }

      print out;
      print_bars(ARG["HS"]);
    }

    if(R_ltype[j] == "Text") {
      if(R_ltype[j-1] != "Text") {
        if(ARG["TD"]) { 
          print_bars(ARG["TSM"], ".");
        } else {
          for(l=0;l<ARG["TSM"]; l++) print "";
        }
      }

      if(T_type[j] == "indent") printf(pad(" ", F_width[0]));
      print RF_f[j,1];

      if(R_ltype[j+1] != "Text") {
        if(ARG["TD"]) { 
          print_bars(ARG["TSM"], ".");
        } else {
          for(l=0;l<ARG["TSM"]; l++) print "";
        }
      }
    }

    if(R_ltype[j] == "Event") {
      if (ARG["EA"]) {
        print_event(j, "event");
        print_event(j, "arrow");
      } else print_event(j, "both");
      print_bars(ARG["ES"]);
    }

  }
  return j;
}


/^[ \t]*#/ {next} # we don't want bars for comment only lines!
/#/ { sub(/#.*$/, ""); }  # strip trailing comments (sub() edits $0 in place and returns a count)

/^:/ {
 print "Argument Variable Assignment" $0
  i = split(substr($0,2), v, /,/);
  for(;i>0;i--) {
    j = split(v[i], kv, "=");
    if(j==1) { ARG[kv[1]]= ""; }
    if(j==2) { ARG[kv[1]]=kv[2]; }
  }
 for(k in ARG) { printf("ARG["k"]='"ARG[k]"' "); } ; print "";
  next ;
}

{
  NRS++; # NRSequences
}

/^;/ { ST=print_sd(ST); next; }  # Allow continuous operation

/^@/ {
 print "text line"
  R_ltype[NRS] = "Text";
  T_type[NRS] = "left";
  sub(/^@/,"");
  RF_f[NRS,1]=$0;
  next;
}

/^"/ {
 print "text line"
  R_ltype[NRS] = "Text";
  T_type[NRS] = "indent";
  sub(/^"/,"");
  RF_f[NRS,1]=$0;
  next;
}

/^\[/ {
 print "Event Headers (Titles)" $0
  R_ltype[NRS] = "Header";

  sub(/^\[/,"");
  FS=","; $0 = $0; # resplit line
  R_nf[NRS] = NF;
  if(MAXNF < R_nf[NRS]-1) MAXNF= R_nf[NRS]-1; # print MAXNF;
  for(i=1; i<= NF; i++) {
    f= ltrim($i);
    RF_f[NRS,i]=f;
    RF_l[NRS,i]= length(f);
    RF_s[NRS,i]= ",";
  }
  for(i=1; i<= NF; i++) {
    F_width[i] = padding(RF_l[NRS,i] + 2*ARG["HP"], 1, 1) +\
                 padding(RF_l[NRS,i+1] + 2*ARG["HP"], 1, 0)\
                 -1; # Do not include width of bar
    if(F_width[i] < 2*ARG["HP"])  F_width[i] = 2*ARG["HP"];

 print padding(RF_l[NRS,i] + 2*ARG["HP"], 1, 1) " "\
       padding(RF_l[NRS,i+1] + 2*ARG["HP"], 1, 0);
  }
  F_width[0] = padding(RF_l[NRS,1] + 2*ARG["HP"], 1, 1);
 print padding(RF_l[NRS,1] + 2*ARG["HP"],1,0);
  if(F_width[0] < ARG["HP"])  F_width["0"] = ARG["HP"];
  F_width[0] += ARG["LM"];
 for(i=0; i<= MAXNF; i++) printf("FW["i"]="F_width[i]" "); print ""

  FS=EFS;
  next;
}

{
 print "Event Line: " $0 ; DEBUG_ROW=1;
  R_ltype[NRS] = "Event";

  stl=0;
  for(i=1; i<= NF; i++) {
    f = $i;
    l = length(f);
    stl += l +1;
    s = substr($0, stl, 1);

    RF_f[NRS,i]= f;
    RF_s[NRS,i]= s;
  }
  R_nf[NRS] = NF;
  debug_row(NRS, 1);

  # Fill in missing (assumed) fields
  for(i=1; i<= R_nf[NRS]; i++) {
    if (RF_f[NRS,i]=="") RF_f[NRS,i] = getevent_rev(NRS, i);
    if (RF_f[NRS,i]=="") RF_f[NRS,i] = getevent_for(NRS, i);
  }
  debug_row(NRS, 2);

  # ->  <-   ->>  >->  <-<  <<-
  # >-  -<        >>-  -<<
  # R>  <L   R>>  >R>  <L<  <<L

  for(i=1; i<= R_nf[NRS]; ) {
    if(RF_s[NRS,i] ~ AFS) {
      if(RF_s[NRS,i] == "-") { # left tail
        for(n=i+1; n<= R_nf[NRS]; n++) {
          if(RF_s[NRS,n]==">") {
            pi=i; i=n;  RF_s[NRS,n]="}";
            for(n--; n>=pi; n--) RF_s[NRS,n]="R"; n= R_nf[NRS];
          } else if(RF_s[NRS,n]=="<") {
            pi=i; i=n;  RF_s[NRS,pi]="{";
            for(; n>pi; n--) RF_s[NRS,n]="L"; n= R_nf[NRS];
          }
        }
        i++;
      } else if(RF_s[NRS,i+1] != "-") { # singleton
        RF_s[NRS,i]= RF_s[NRS,i]==">" ? "}":"{";
        i++;
      } else {
        rl= rlarrow(RF_s[NRS,i], "");
        for(n=i+1; n<= R_nf[NRS] && RF_s[NRS,n] ~ AFS; n++) {
          rl= rlarrow(RF_s[NRS,n], rl);
        }
        n--;
        if (RF_s[NRS,n] == "-") { # right tail
          if (rl=="R") RF_s[NRS,n--]="}";
          for(; n>=i && RF_s[NRS,n] == "-"; n--) RF_s[NRS,n]=rl;
          if (rl=="L") RF_s[NRS,n]="{"; else RF_s[NRS,n]="R";
        } else if (RF_s[NRS,n-1] != "-") { # singleton
          RF_s[NRS,n]= RF_s[NRS,n]==">" ? "}":"{";
        } else { # double ended -
          if(RF_s[NRS,i]=="<") { # trumps no matter what
            RF_s[NRS,i]="{";
            for(i++; i<= R_nf[NRS] && RF_s[NRS,i]=="-"; i++) {
              RF_s[NRS,i]="L";
            }
          } else {
            for(n=i+1; n<= R_nf[NRS] && RF_s[NRS,n] =="-"; n++) ;
            if(RF_s[NRS,n]==">") {
              RF_s[NRS,n]="}";
              for(n--; n>i && RF_s[NRS,n]=="-"; n--) {
                RF_s[NRS,n]="R";
              }
            } else { # >-<  # > is on the right and trumps
              for(; i<= R_nf[NRS] && RF_s[NRS,i]=="-"; i++) {
                RF_s[NRS,i]="R";
              }
              RF_s[NRS,i]="}";
            }
          }
        }
      }
    } else i++;
  }

  debug_row(NRS, 3);


  # ~ we need to test this with multi shifts (arrow/bar/arrow)
  shift = 0;
  for(i=1; i<= R_nf[NRS]+1; i++) {
    if(RF_s[NRS,i-1] ~ RAFS && RF_s[NRS,i] !~ RAFS) shift++;
    if(shift) RF_f[NRS,i-shift]=RF_f[NRS,i];
  }
  R_nf[NRS] = R_nf[NRS] - shift;
  debug_row(NRS, 4);

  # Trim empty trailing fields
  for(i= R_nf[NRS]; i>0 && RF_f[NRS,i]==""; i--) R_nf[NRS]--;
  debug_row(NRS, 5);

  # Get event wlength and adjust the max length of each event
  for(i=1; i<= R_nf[NRS]; i++) {
    RF_l[NRS,i]= length(RF_f[NRS,i]);
    if(RF_l[NRS,i] > E_ml[i]) E_ml[i] = RF_l[NRS,i];
  }

  # Adjust the max width of each column (headers/events)
  if(MAXNF < R_nf[NRS]) MAXNF= R_nf[NRS]; # print MAXNF;
  for(i=1; i<= MAXNF; i++) {
    w = E_ml[i] + 2 * ARG["EP"] + ARROWS;
    if (F_width[i] < w)  F_width[i] = w;
   printf("FW:"F_width[i]" W:"w" ");
  }
 print ""
}

END { ST=print_sd(ST); }


EOF
exit
fi


Usage()
{
  cat - <<-EOF

  use(v1.0): $0 file.sdml >  sequence_diagram

  This program will turn SDML into simple ascii text uml sequence
  diagrams.  SDML is an extremely simplistic uml Sequence Diagram
  Markup Language.  SDML is specified as:

  .Lines starting with a [ are a comma separated list
    of actors (bar headers)
  .Events are defined easily by the following symbols:
    >  rightward event
    <  leftward event
    -  extension of the previous event
  .Actors can be skipped with a |
  .Text on a line after a # is a comment
  .Lines starting with a @ are text lines
  .Lines starting with a " are indented text lines
  .Lines starting with a : are comma separated list of
    parameter assignment lines.  Parameters are:

    EP  Event Padding (spaces on each side)
    ES  Event Spacing (lines below)
    EA  Events Above (put event text above arrows)

    HP  Header Padding (spaces on each side)
    HS  Header Spacing (lines below)

    LM  Left Margin (spaces on the left)

    TSM Text Spacing Margin (lines above & below)
    TD  Text Dots (instead of bars in text margins)
    SS  Enable Single Arrow Spans (|---A-->|, not |-A-+-A>|)

  Example SDML Input:

    [Client, Proxy, DNS, Server
    Query Name->
    Answer IP<-
    http GET >->
    <<-html

  Sequence Diagram Output:

    Client          Proxy           DNS         Server
       |              |              |             |
       |----------Query Name-------->|             |
       |<---------Answer IP----------|             |
       |--http GET -->|----------http GET -------->|
       |<----html-----|<-----------html------------|

  Copyright:  Martin Fick <mogulguy@yahoo.com>, Date: 2008-02-15
  License:    None.  This is released into the public domain: do
              as you wish.

EOF
exit
}

[ "$1" = "--help"  -o  "$1" = "-h"  -o  "$1" = "-u" ] &&  Usage


# Hack to attempt to make this somewhat portable


AWK_PROG="`"$0" --awkprog`"

AWK=awk  # default (should work most places)
[ -x /usr/bin/nawk ] && AWK=/usr/bin/nawk # solaris

$AWK "$AWK_PROG" "$@"
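The padding and center functions in the script decide where event text sits inside an arrow span. A Python sketch of the same arithmetic may make the "favor" argument easier to follow; it mirrors the awk functions of the same names but is only an illustration.

```python
def padding(outer, inner, extra):
    """Half of the spare width; the odd leftover cell is added when extra is set."""
    spare = outer - inner
    odd = spare % 2
    p = (spare - odd) // 2 + (odd if extra else 0)
    return max(p, 0)            # never negative, as in the awk version

def center(string, width, padchar, favor):
    """Center string in width cells; favor "r" pushes the text to the right."""
    left = padding(width, len(string), favor == "r")
    right = padding(width, len(string), favor != "r")
    return padchar * left + string + padchar * right

print(center("ab", 5, "-", "l"))   # odd spare cell goes on the right
print(center("ab", 5, "-", "r"))   # odd spare cell goes on the left
```

In print_event the favor is "r" for columns in the right half of the diagram and "l" otherwise, so off-center text leans toward the middle.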

Author

Martin Fick


categories: Eliza,Top10,AwkLisp,Interpreters,Dsl,Mar,2009,DariusB

AWKLISP v1.2

Download from

Synopsis

awk [-v profiling=1] -f awklisp [optional-Lisp-source-files]

The -v profiling=1 option turns call-count profiling on.

If you want to use it interactively, be sure to include '-' (for the standard input) among the source files. For example:

gawk -f awklisp startup numbers lists -

Description

Overview

This program arose out of one-upmanship. At my previous job I had to use MapBasic, an interpreter so astoundingly slow (around 100 times slower than GWBASIC) that one must wonder if it itself is implemented in an interpreted language. I still wonder, but it clearly could be: a bare-bones Lisp in awk, hacked up in a few hours, ran substantially faster. Since then I've added features and polish, in the hope of taking over the burgeoning market for stately language implementations.

This version tries to deal with as many of the essential issues in interpreter implementation as is reasonable in awk (though most would call this program utterly unreasonable from start to finish, perhaps...). Awk's impoverished control structures put error recovery and tail-call optimization out of reach, in that I can't see a non-painful way to code them. The scope of variables is dynamic because that was easier to implement efficiently. Subject to all those constraints, the language is as Schemely as I could make it: it has a single namespace with uniform evaluation of expressions in the function and argument positions, and the Scheme names for primitives and special forms.

The rest of this file is a reference manual. My favorite tutorial would be The Little LISPer (see section 5, References); don't let the cute name and the cartoons turn you off, because it's a really excellent book with some mind-stretching material towards the end. All of its code will work with awklisp, except for the last two chapters. (You'd be better off learning with a serious Lisp implementation, of course.)

For more details on the implementation, see the Implementation notes (below).

Examples

fib.lsp

Code:

(define fib
  (lambda (n)
    (if (< n 2)
        1
        (+ (fib (- n 1))
           (fib (- n 2))))))
(fib 20)

Command line:

gawk -f awklisp startup numbers  lists fib.lsp

Output:

10946

Eliza

Here are the standard ELIZA dialogue patterns:

(define rules
  '(((hello)
     (How do you do -- please state your problem))
    ((I want)
     (What would it mean if you got -R-)
     (Why do you want -R-)
     (Suppose you got -R- soon))
    ((if)
     (Do you really think its likely that -R-)
     (Do you wish that -R-)
     (What do you think about -R-)
     (Really-- if -R-))
    ((I was)
     (Were you really?)
     (Perhaps I already knew you were -R-)
     (Why do you tell me you were -R- now?))
    ((I am)
     (In what way are you -R-)
     (Do you want to be -R-))
    ((because)
     (Is that the real reason?)
     (What other reasons might there be?)
     (Does that reason seem to explain anything else?))
    ((I feel)
     (Do you often feel -R-))
    ((I felt)
     (What other feelings do you have?))
    ((yes)
     (You seem quite positive)
     (You are sure)
     (I understand))
    ((no)
     (Why not?)
     (You are being a bit negative)
     (Are you saying no just to be negative?))
    ((someone)
     (Can you be more specific?))
    ((everyone)
     (Surely not everyone)
     (Can you think of anyone in particular?)
     (Who for example?)
     (You are thinking of a special person))
    ((perhaps)
     (You do not seem quite certain))
    ((are)
     (Did you think they might not be -R-)
     (Possibly they are -R-))
    (()
     (Very interesting)
     (I am not sure I understand you fully)
     (What does that suggest to you?)
     (Please continue)
     (Go on)
     (Do you feel strongly about discussing such things?))))

Command line:

gawk -f awklisp startup numbers  lists eliza.lsp -

Interaction:

> (eliza)
Hello-- please state your problem 
> (I feel sick)
Do you often feel sick 
> (I am in love with awk)
In what way are you in love with awk 
> (because it is so easy to use)
Is that the real reason? 
> (I was laughed at by the other kids at space camp)
Were you really? 
> (everyone hates me)
Can you think of anyone in particular? 
> (everyone at space camp)
Surely not everyone 
> (perhaps not tina fey)
You do not seem quite certain 
> (I want her to laugh at me)
What would it mean if you got her to laugh at me 

Expressions and their evaluation

Lisp evaluates expressions, which can be simple (atoms) or compound (lists).

An atom is a string of characters, which can be letters, digits, and most punctuation; the characters may -not- include spaces, quotes, parentheses, brackets, '.', '#', or ';' (the comment character). In this Lisp, case is significant ( X is different from x ).

  • Atoms: atom 42 1/137 + ok? hey:names-with-dashes-are-easy-to-read
  • Not atoms: don't-include-quotes (or spaces or parentheses)

A list is a '(', followed by zero or more objects (each of which is an atom or a list), followed by a ')'.

  • Lists: () (a list of atoms) ((a list) of atoms (and lists))
  • Not lists: ) ((()) (two) (lists)

The special object nil is both an atom and the empty list. That is, nil = (). A non-nil list is called a -pair-, because it is represented by a pair of pointers, one to the first element of the list (its -car-), and one to the rest of the list (its -cdr-). For example, the car of ((a list) of stuff) is (a list), and the cdr is (of stuff). It's also possible to have a pair whose cdr is not a list; the pair with car A and cdr B is printed as (A . B).
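For readers who think better in code, here is a small Python sketch of pairs, car, cdr and the dotted-pair notation. It is an illustration of the concepts only; the class and function names are hypothetical, and printing nil as () is an arbitrary choice.

```python
NIL = ()   # stands in for nil, the empty list

class Pair:
    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr

def cons(x, y):
    return Pair(x, y)

def from_list(items):
    """Build a proper list: nested pairs ending in NIL."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def show(x):
    """(a b c) for proper lists, (A . B) when the final cdr is not a list."""
    if not isinstance(x, Pair):
        return "()" if x == NIL else str(x)
    parts = []
    while isinstance(x, Pair):
        parts.append(show(x.car))
        x = x.cdr
    if x != NIL:                      # improper tail: use dot notation
        parts += [".", show(x)]
    return "(" + " ".join(parts) + ")"

stuff = from_list([from_list(["a", "list"]), "of", "stuff"])
print(show(stuff.car))        # the car of ((a list) of stuff)
print(show(stuff.cdr))        # the cdr
print(show(cons("A", "B")))   # a pair whose cdr is not a list
```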

That's the syntax of programs and data. Now let's consider their meaning. You can use Lisp like a calculator: type in an expression, and Lisp prints its value. If you type 25, it prints 25. If you type (+ 2 2), it prints 4. In general, Lisp evaluates a particular expression in a particular environment (set of variable bindings) by following this algorithm:

  • If the expression is a number, return that number.
  • If the expression is a non-numeric atom (a -symbol-), return the value of that symbol in the current environment. If the symbol is currently unbound, that's an error.
  • Otherwise the expression is a list. If its car is one of the symbols: quote, lambda, if, begin, while, set!, or define, then the expression is a -special- -form-, handled by special rules. Otherwise it's just a procedure call, handled like this: evaluate each element of the list in the current environment, and then apply the operator (the value of the car) to the operands (the values of the rest of the list's elements). For example, to evaluate (+ 2 3), we first evaluate each of its subexpressions: the value of + is (at least in the initial environment) the primitive procedure that adds, the value of 2 is 2, and the value of 3 is 3. Then we call the addition procedure with 2 and 3 as arguments, yielding 5. For another example, take (- (+ 2 3) 1). Evaluating each subexpression gives the subtraction procedure, 5, and 1. Applying the procedure to the arguments gives 4.

We'll see all the primitive procedures in the next section. A user-defined procedure is represented as a list of the form (lambda <parameters> <body>), such as (lambda (x) (+ x 1)). To apply such a procedure, evaluate its body in the environment obtained by extending the current environment so that the parameters are bound to the corresponding arguments. Thus, to apply the above procedure to the argument 41, evaluate (+ x 1) in the same environment as the current one except that x is bound to 41.

If the procedure's body has more than one expression -- e.g., (lambda () (write 'Hello) (write 'world!)) -- evaluate them each in turn, and return the value of the last one.
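The algorithm so far (numbers, symbols, and procedure calls) fits in a few lines. Below is a Python sketch under some assumptions: Lisp lists are written as Python lists, symbols as strings, and the environment as a stack of dicts that apply pushes and pops, which gives dynamic scope. Special forms are left out, so the lambda lists are placed in the environment by hand. None of this is taken from the awklisp source.

```python
def lookup(symbol, env):
    """Find the most recent binding; env is a stack of dicts (dynamic scope)."""
    for frame in reversed(env):
        if symbol in frame:
            return frame[symbol]
    raise NameError("unbound symbol: " + symbol)

def evaluate(expr, env):
    if isinstance(expr, (int, float)):      # a number evaluates to itself
        return expr
    if isinstance(expr, str):               # a symbol: look up its value
        return lookup(expr, env)
    # Otherwise a procedure call: evaluate every element, then apply.
    values = [evaluate(e, env) for e in expr]
    return apply_procedure(values[0], values[1:], env)

def apply_procedure(proc, args, env):
    if callable(proc):                      # primitive procedure
        return proc(*args)
    _, params, *body = proc                 # ["lambda", parameters, body...]
    env.append(dict(zip(params, args)))     # extend the current environment
    try:
        result = None
        for e in body:                      # value of the last body expression
            result = evaluate(e, env)
        return result
    finally:
        env.pop()

env = [{
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "add1": ["lambda", ["x"], ["+", "x", 1]],
    "show-y": ["lambda", [], "y"],
    "call-with-y": ["lambda", ["y"], ["show-y"]],
}]

print(evaluate(["-", ["+", 2, 3], 1], env))
print(evaluate(["add1", 41], env))
print(evaluate(["call-with-y", 7], env))    # show-y sees its caller's y
```

The last call shows what dynamic scope means in practice: show-y has no parameter y of its own, yet it finds the binding made by whoever called it.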

We still need the rules for special forms. They are:

  • The value of (quote <x>) is <x>. There's a shorthand for this form: '. E.g., the value of '(+ 2 2) is (+ 2 2), -not- 4.
  • (lambda <parameters> <body>...) returns itself: e.g., the value of (lambda (x) x) is (lambda (x) x).
  • To evaluate (if <test-expr> <then-exp> <else-exp>), first evaluate <test-expr>. If the value is true (non-nil), then return the value of <then-exp>, otherwise return the value of <else-exp>. (<else-exp> is optional; if it's left out, pretend there's a nil there.) Example: (if nil 'yes 'no) returns no.
  • To evaluate (begin <expr-1> <expr-2>...), evaluate each of the subexpressions in order, returning the value of the last one.
  • To evaluate (while <test> <expr-1> <expr-2>...), first evaluate <test>. If it's nil, return nil. Otherwise, evaluate <expr-1>, <expr-2>,... in order, and then repeat.
  • To evaluate (set! <variable> <expr>), evaluate <expr>, and then set the value of <variable> in the current environment to the result. If the variable is currently unbound, that's an error. The value of the whole set! expression is the value of <expr>.
  • (define <variable> <expr>) is like set!, except it's used to introduce new bindings, and the value returned is <variable>.
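The special-form rules above translate almost line for line into code. Here is a Python sketch, assuming Lisp lists are Python lists, symbols are strings, nil is the empty list, and the environment is a single dict; procedure calls are limited to Python callables to keep it short. It is not derived from the awklisp source.

```python
NIL = []   # nil, the empty list; every other value counts as true

def evaluate(expr, env):
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    if expr == NIL:
        return NIL
    op, *rest = expr
    if op == "quote":
        return rest[0]
    if op == "lambda":
        return expr                            # a lambda returns itself
    if op == "if":
        test, then, *alt = rest
        if evaluate(test, env) != NIL:
            return evaluate(then, env)
        return evaluate(alt[0], env) if alt else NIL
    if op == "begin":
        result = NIL
        for e in rest:
            result = evaluate(e, env)
        return result
    if op == "while":
        while evaluate(rest[0], env) != NIL:
            for e in rest[1:]:
                evaluate(e, env)
        return NIL
    if op == "set!":
        if rest[0] not in env:
            raise NameError("unbound variable: " + rest[0])
        env[rest[0]] = evaluate(rest[1], env)
        return env[rest[0]]
    if op == "define":
        env[rest[0]] = evaluate(rest[1], env)
        return rest[0]                         # define returns the variable
    proc = evaluate(op, env)                   # primitive call
    return proc(*[evaluate(e, env) for e in rest])

env = {
    "nil": NIL,
    "+": lambda a, b: a + b,
    "<": lambda a, b: "t" if a < b else NIL,   # any non-nil value means true
}

print(evaluate(["if", "nil", ["quote", "yes"], ["quote", "no"]], env))
evaluate(["define", "n", 0], env)
evaluate(["while", ["<", "n", 3], ["set!", "n", ["+", "n", 1]]], env)
print(env["n"])
```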

It's possible to define new special forms using the macro facility provided in the startup file. The macros defined there are:

  • (let ((<var> <expr>)...)
      <body>...)
    Bind each <var> to its corresponding <expr> (evaluated in the current environment), and evaluate <body> in the resulting environment.
  • (cond (<test-expr> <result-expr>...)... (else <result-expr>...))
    where the final else clause is optional. Evaluate each <test-expr> in turn, and for the first non-nil result, evaluate its <result-expr>. If none are non-nil, and there's no else clause, return nil.
  • (and <expr>...)
    Evaluate each <expr> in order, until one returns nil; then return nil. If none are nil, return the value of the last <expr>.
  • (or <expr>...)
    Evaluate each <expr> in order, until one returns non-nil; return that value. If all are nil, return nil.
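A macro of this kind is just a rewrite of one list into another before evaluation. The Python sketch below shows one plausible rewriting for let and and; how the startup file actually defines them is not shown in this manual, so treat the expansions and the function names as assumptions. Lisp lists are written as Python lists and symbols as strings.

```python
def expand_let(form):
    """(let ((v e)...) body...)  ->  ((lambda (v...) body...) e...)"""
    _, bindings, *body = form
    names = [b[0] for b in bindings]
    inits = [b[1] for b in bindings]
    return [["lambda", names, *body], *inits]

def expand_and(form):
    """(and a b c)  ->  (if a (if b c nil) nil); needs at least one argument."""
    args = form[1:]
    if not args:
        raise ValueError("this sketch does not handle (and) with no arguments")
    if len(args) == 1:
        return args[0]
    return ["if", args[0], expand_and(["and", *args[1:]]), "nil"]

print(expand_let(["let", [["x", 1], ["y", 2]], ["+", "x", "y"]]))
print(expand_and(["and", "a", "b", "c"]))
```

The let expansion works because the operator position is evaluated like any other expression, so a lambda list can appear there directly.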

Built-in procedures

List operations:

  • (null? <x>) returns true (non-nil) when <x> is nil.
  • (atom? <x>) returns true when <x> is an atom.
  • (pair? <x>) returns true when <x> is a pair.
  • (car <pair>) returns the car of <pair>.
  • (cdr <pair>) returns the cdr of <pair>.
  • (cadr <pair>) returns the car of the cdr of <pair>. (i.e., the second element.)
  • (cddr <pair>) returns the cdr of the cdr of <pair>.
  • (cons <x> <y>) returns a new pair whose car is <x> and whose cdr is <y>.
  • (list <x>...) returns a list of its arguments.
  • (set-car! <pair> <x>) changes the car of <pair> to <x>.
  • (set-cdr! <pair> <x>) changes the cdr of <pair> to <x>.
  • (reverse! <list>) reverses <list> in place, returning the result.

Numbers:

  • (number? <x>) returns true when <x> is a number.
  • (+ <n> <n>) returns the sum of its arguments.
  • (- <n> <n>) returns the difference of its arguments.
  • (* <n> <n>) returns the product of its arguments.
  • (quotient <n> <n>) returns the quotient. Rounding is towards zero.
  • (remainder <n> <n>) returns the remainder.
  • (< <n1> <n2>) returns true when <n1> is less than <n2>.
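"Rounding is towards zero" matters only for negative operands. A short Python sketch makes the difference visible, since Python's own // operator rounds down instead. The sign convention for remainder is an assumption here (chosen so that quotient and remainder are consistent with each other); the manual does not state it.

```python
def quotient(a, b):
    """Integer division rounding toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def remainder(a, b):
    """Chosen so that a == b * quotient(a, b) + remainder(a, b)."""
    return a - b * quotient(a, b)

print(quotient(7, 2), quotient(-7, 2))   # toward zero
print(-7 // 2)                           # Python's floor division, for contrast
print(remainder(-7, 2))
```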

I/O:

  • (write <x>) writes <x> followed by a space.
  • (newline) writes the newline character.
  • (read) reads the next expression from standard input and returns it.

Meta-operations:

  • (eval <x>) evaluates <x> in the current environment, returning the result.
  • (apply <proc> <list>) calls <proc> with arguments <list>, returning the result.

Miscellany:

  • (eq? <x> <y>) returns true when <x> and <y> are the same object. Be careful using eq? with lists, because (eq? (cons <x> <y>) (cons <x> <y>)) is false.
  • (put <x> <y> <z>) associates the value <z> with the keys <x> and <y>, for later retrieval by get.
  • (get <x> <y>) returns the last value <z> that was put for <x> and <y>, or nil if there is no such value.
  • (symbol? <x>) returns true when <x> is a symbol.
  • (gensym) returns a new symbol distinct from all symbols that can be read.
  • (random <n>) returns a random integer between 0 and <n>-1 (if <n> is positive).
  • (error <x>...) writes its arguments and aborts with error code 1.
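
For readers more at home in awk than in Lisp, the behaviour of put and get can be modelled with a two-subscript array. This is a hypothetical model for illustration, not a description of how awklisp stores properties; the array name property and the "nil" string are assumptions of the sketch.

```awk
# Hypothetical awk-side model of a two-key property table.
function put(x, y, z) { property[x, y] = z }
function get(x, y)    { return ((x, y) in property) ? property[x, y] : "nil" }

BEGIN {
    put("apple", "color", "red")
    put("apple", "color", "green")    # a later put replaces the earlier value
    print get("apple", "color")       # green
    print get("apple", "size")        # nil
}
```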

Implementation Notes

Overview

Since the code should be self-explanatory to anyone knowledgeable about Lisp implementation, these notes assume you know Lisp but not interpreters. I haven't got around to writing up a complete discussion of everything, though.

The code for an interpreter can be pretty low on redundancy -- this is natural because the whole reason for implementing a new language is to avoid having to code a particular class of programs in a redundant style in the old language. We implement what that class of programs has in common just once, then use it many times. Thus an interpreter has a different style of code, perhaps denser, than a typical application program.

Data representation

Conceptually, a Lisp datum is a tagged pointer, with the tag giving the datatype and the pointer locating the data. We follow the common practice of encoding the tag into the two lowest-order bits of the pointer. This is especially easy in awk, since arrays with non-consecutive indices are just as efficient as dense ones (so we can use the tagged pointer directly as an index, without having to mask out the tag bits). (But, by the way, mawk accesses negative indices much more slowly than positive ones, as I found out when trying a different encoding.)

This Lisp provides three datatypes: integers, lists, and symbols. (A modern Lisp provides many more.)

For an integer, the tag bits are zero and the pointer bits are simply the numeric value; thus, N is represented by N*4. This choice of the tag value has two advantages. First, we can add and subtract without fiddling with the tags. Second, negative numbers fit right in. (Consider what would happen if N were represented by 1+N*4 instead, and we tried to extract the tag as N%4, where N may be either positive or negative. Because of this problem and the above-mentioned inefficiency of negative indices, all other datatypes are represented by positive numbers.)
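
A minimal sketch of the integer encoding may make this concrete. The helper names make_int, int_value, and tag are made up for this example; only the N*4 scheme itself comes from the notes above.

```awk
# Hypothetical helper names; only the N*4 scheme comes from the text.
function make_int(n)  { return n * 4 }   # tag bits are 00
function int_value(d) { return d / 4 }
function tag(d)       { return d % 4 }

BEGIN {
    a = make_int(7)
    b = make_int(-3)

    # Sums and differences of encoded integers are already correctly encoded.
    print int_value(a + b)     # 4
    print int_value(a - b)     # 10

    # The rejected alternative, 1 + N*4, breaks for negative N because
    # awk's % takes the sign of the dividend.
    print tag(1 + 3 * 4)       # 1
    print tag(1 + -3 * 4)      # -3
}
```

The last line is the problem case: the tag comes out as -3 rather than 1, so a negative integer would no longer be recognizable by its tag.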

The evaluation/saved-bindings stack

The following is from an email discussion; it doesn't develop everything from first principles but is included here in the hope it will be helpful.

Hi. I just took a look at awklisp, and remembered that there's more to your question about why we need a stack -- it's a good question. The real reason is because a stack is accessible to the garbage collector.

We could have had apply() evaluate the arguments itself, and stash the results into variables like arg0 and arg1 -- then the case for ADD would look like

if (proc == ADD) return is(a_number, arg0) + is(a_number, arg1)

The obvious problem with that approach is how to handle calls to user-defined procedures, which could have any number of arguments. Say we're evaluating ((lambda (x) (+ x 1)) 42). (lambda (x) (+ x 1)) is the procedure, and 42 is the argument.

A (wrong) solution could be to evaluate each argument in turn, and bind the corresponding parameter name (like x in this case) to the resulting value (while saving the old value to be restored after we return from the procedure). This is wrong because we must not change the variable bindings until we actually enter the procedure -- for example, with that algorithm ((lambda (x y) y) 1 x) would return 1, when it should return whatever the value of x is in the enclosing environment. (The eval_rands()-type sequence would be: eval the 1, bind x to 1, eval the x -- yielding 1 which is *wrong* -- and bind y to that, then eval the body of the lambda.)

Okay, that's easily fixed -- evaluate all the operands and stash them away somewhere until you're done, and *then* do the bindings. So the question is where to stash them. How about a global array? Like

for (i = 0; arglist != NIL; ++i) {
    global_temp[i] = eval(car[arglist])
    arglist = cdr[arglist]
}

followed by the equivalent of extend_env(). This will not do, because the global array will get clobbered in recursive calls to eval(). Consider (+ 2 (* 3 4)) -- first we evaluate the arguments to the +, like this: global_temp[0] gets 2, and then global_temp[1] gets the eval of (* 3 4). But in evaluating (* 3 4), global_temp[0] gets set to 3 and global_temp[1] to 4 -- so the original assignment of 2 to global_temp[0] is clobbered before we get a chance to use it. By using a stack[] instead of a global_temp[], we finesse this problem.
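
The clobbering is easy to reproduce with a toy evaluator. The sketch below is not awklisp code: the expression-tree layout (op[], lhs[], rhs[], val[]) and every function name are invented for the illustration. It evaluates (+ 2 (* 3 4)) once with a shared global_temp[] and once with a stack.

```awk
# Toy expression tree; layout and names are invented for this sketch.
function leaf(v)       { val[++nnodes] = v; op[nnodes] = ""; return nnodes }
function node(o, l, r) { op[++nnodes] = o; lhs[nnodes] = l; rhs[nnodes] = r; return nnodes }

# Broken: operands are stashed in one array shared by every recursive call.
function eval_global(e) {
    if (op[e] == "") return val[e]
    global_temp[0] = eval_global(lhs[e])
    global_temp[1] = eval_global(rhs[e])   # a nested call overwrites global_temp[0]
    return (op[e] == "+") ? global_temp[0] + global_temp[1] : global_temp[0] * global_temp[1]
}

function push(v) { stack[sp++] = v }

# Working: each call keeps its operands in its own region of the stack.
function eval_stack(e,    base, result) {
    if (op[e] == "") return val[e]
    base = sp
    push(eval_stack(lhs[e]))
    push(eval_stack(rhs[e]))
    result = (op[e] == "+") ? stack[base] + stack[base + 1] : stack[base] * stack[base + 1]
    sp = base                              # pop this call's operands
    return result
}

BEGIN {
    sp = 0
    expr = node("+", leaf(2), node("*", leaf(3), leaf(4)))
    print eval_global(expr)    # 15 (wrong: the 2 was overwritten by 3)
    print eval_stack(expr)     # 14
}
```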

You may object that we can solve that by just making the global array local, and that's true; lots of small local arrays may or may not be more efficient than one big global stack, in awk -- we'd have to try it out to see. But the real problem I alluded to at the start of this message is this: the garbage collector has to be able to find all the live references to the car[] and cdr[] arrays. If some of those references are hidden away in local variables of recursive procedures, we're stuck. With the global stack, they're all right there for the gc().
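
To show why a global stack helps the collector, here is a small mark-and-sweep sketch that uses the stack as its only set of roots. It is a simplified model under stated assumptions, not awklisp's gc(): pairs are numbered cells in car[] and cdr[], atoms are plain strings that never collide with a cell number, and the names cons, mark, and marked are invented here.

```awk
# Invented layout: pairs are numbered cells in car[]/cdr[]; anything that is
# not an index of car[] is treated as an atom.
function cons(a, d) { car[++ncells] = a; cdr[ncells] = d; return ncells }

function mark(x) {
    if (!(x in car) || (x in marked)) return
    marked[x] = 1
    mark(car[x])
    mark(cdr[x])
}

# Mark everything reachable from the global stack, then free the rest.
# Returns the number of cells reclaimed.
function gc(    i, x, n, dead) {
    split("", marked)                        # clear marks from any earlier run
    for (i = 0; i < sp; i++) mark(stack[i])
    n = 0
    for (x in car) if (!(x in marked)) dead[n++] = x
    for (i = 0; i < n; i++) { delete car[dead[i]]; delete cdr[dead[i]] }
    return n
}

BEGIN {
    sp = 0
    live = cons("a", cons("b", "nil"))
    stack[sp++] = live          # reachable: the stack refers to it
    cons("c", "nil")            # garbage: nothing refers to it
    print gc()                  # 1
}
```

A value held only in a local variable of some awk function would be invisible to mark(), which is exactly the situation the precondition on the interpretation routines rules out.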

(In C we could use the local-arrays approach by threading a chain of pointers from each one to the next; but awk doesn't have pointers.)

(You may wonder how the code gets away with having a number of local variables holding lisp values, then -- the answer is that in every such case we can be sure the garbage collector can find the values in question from some other source. That's what this comment is about:

  # All the interpretation routines have the precondition that their
  # arguments are protected from garbage collection.

In some cases where the values would not otherwise be guaranteed to be available to the gc, we call protect().)

Oh, there's another reason why apply() doesn't evaluate the arguments itself: it's called by do_apply(), which handles lisp calls like (apply car '((x))) -- where we *don't* want the x to get evaluated by apply().

References

  • Harold Abelson and Gerald J. Sussman, with Julie Sussman. Structure and Interpretation of Computer Programs. MIT Press, 1985.
  • John Allen. Anatomy of Lisp. McGraw-Hill, 1978.
  • Daniel P. Friedman and Matthias Felleisen. The Little LISPer. Macmillan, 1989.

Roger Rohrbach wrote a Lisp interpreter, in old awk (which has no procedures!), called walk. It can't do as much as this Lisp, but it certainly has greater hack value. Cooler name, too. It's available at http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/impl/awk/0.html

Bugs

Eval doesn't check the syntax of expressions. This is a probably-misguided attempt to bump up the speed a bit, that also simplifies some of the code. The macroexpander in the startup file would be the best place to add syntax checking.

Author

Darius Bacon darius@wry.me

Copyright

Copyright (c) 1994, 2001 by Darius Bacon.

Permission is granted to anyone to use this software for any purpose on any computer system, and to redistribute it freely, subject to the following restrictions:

  1. The author is not responsible for the consequences of use of this software, no matter how awful, even if they arise from defects in it.
  2. The origin of this software must not be misrepresented, either by explicit claim or by omission.
  3. Altered versions must be plainly marked as such, and must not be misrepresented as being the original software.

categories: Awk100,Top10,Interpreters,Dsl,Apr,2009,HenryS

Amazing Awk Assembler

Download

Download from LAWKER.

Description

"aaa" (the Amazing Awk Assembler) is a primitive assembler written entirely in awk and sed. It was done for fun, to establish whether it was possible. It is; it works. It's quite slow, the input syntax is eccentric and rather restricted, and error-checking is virtually nonexistent, but it does work. Furthermore it's very easy to adapt to a new machine, provided the machine falls into the generic "8-bit-micro" category. It is supplied "as is", with no guarantees of any kind. I can't be bothered to do any more work on it right now, but even in its imperfect state it may be useful to someone.

aaa is the mainline shell file.

aux is a subdirectory with machine-independent stuff. anon, 6801, and 6809 are subdirectories with machine-dependent stuff; the choice is specified by a -m option (default is "anon"). Actually, even the stuff that is supposedly machine-independent does have some machine-dependent assumptions; notably, it knows that bytes are 8 bits (not serious) and that the byte is the basic unit of instructions (more serious). These would have to change for the 68000 (going to 16-bit "bytes" might be sufficient) and maybe for the 32016 (harder).

aaa thinks that the machine subdirectories and the aux subdirectory are in the current directory, which is almost certainly wrong.

abst is an abstract for a paper. "card", in each machine directory, is a summary card for the slightly-eccentric input language. There is no real manual at present; sorry.

try.s is a sample piece of 6809 input; it is semantic trash, purely for test purposes. The assembler produces try.a, try.defs, and try.x as outputs from "aaa try.s". try.a is an internal file that looks somewhat like an assembly listing. try.defs is another internal file that looks somewhat like a symbol table. These files are preserved because of possible usefulness; tmp[123] are non-preserved temporaries. try.x is the Intel-hex output. try.x.good is identical to try.x and is a saved copy for regression testing of new work.

01pgm.s is a self-programming program for a 68701, based on the one in the Motorola ap note. 01pgm.x.good is another regression-test file.

If your C library (used by awk) has broken "%02x" so it no longer means "two digits of hex, *zero-filled*" (as some SysV libraries have), you will have to fall back from aux/hex to aux/hex.argh, which does it the hard way. Oh yes, you'll note that aaa feeds settings into awk on the command line; don't assume your awk won't do this until you try it.
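
For the curious, "the hard way" can be as simple as looking the two digits up in a string. The function below is a guess at the general idea, not the contents of aux/hex.argh; the name hex2 is invented, and it assumes a non-negative integer argument.

```awk
# Hypothetical stand-in for printf "%02x": build the two hex digits by hand.
# Assumes n is a non-negative integer; only the low 8 bits are used.
function hex2(n,    digits) {
    digits = "0123456789abcdef"
    n = n % 256
    return substr(digits, int(n / 16) + 1, 1) substr(digits, n % 16 + 1, 1)
}

BEGIN {
    print hex2(10)     # 0a
    print hex2(255)    # ff
}
```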

Author

Henry Spencer


categories: Wp,Dsl,Jul,2009,JesusG

md2html : Update to Markdown.awk

Jesus Galan (yiyus) (yiyu DOT jgl AT gmail DOT com) has updated his markdown system.

His new md2html.awk code adds several extensions and fixes numerous bugs.

For more on this new code, see his history of a rewrite.

Download

Download from LAWKER.


categories: Top10,Wp,Dsl,Mar,2009,JesusG

Markdown.awk

Synopsis

awk -f markdown.awk file.txt > file.html

Download

Download from LAWKER.

Description

(Note: this code was originally called txt2html.awk by its author but that caused a name clash inside LAWKER. Hence, I've taken the liberty of renaming it. --Timm)

The following code implements a subset of John Gruber's Markdown language: a widely-used, ultra light-weight markup language for html.

  • Paragraphs: denoted by a leading blank line.
  • Images:
    ![alt text](/path/img.jpg "Title")
  • Emphasis: **To be in italics**
  • Code: `<code>` spans are delimited by backticks.
  • Headings (Setext style):
    Level 1 Header
    ===============

    Level 2 Header
    --------------

    Level 3 Header
    ______________

  • Headings (Atx style):

    The number of leading "#" characters gives the heading level:

    # Level 1 Header
    #### Level 4 Header

  • Unordered lists:
    - List item 1
    - List item 2

    Note: the beginning and end of a list are inferred automatically, and perhaps not always correctly.

  • Ordered lists: denoted by a number at the start of a line.

    1 A numbered list item
    

Code

The following code demonstrates an "exception-style" of Awk programming. Note how all the processing relating to each mark-up tag is localized (except for carrying around prior text and environments). The modularity of the following code should make it easily hackable.

Globals

BEGIN {
	env = "none";
	text = "";
}

Images

/^!\[.+\] *\(.+\)/ {
	split($0, a, /\] *\(/);
	split(a[1], b, /\[/);
	imgtext = b[2];
	split(a[2], b, /\)/);
	imgaddr = b[1];
	print "<p><img src=\"" imgaddr "\" alt=\"" imgtext "\" title=\"\" /></p>\n";
	text = "";
	next;
}

Links

/\] *\(/ {
	do {
		na = split($0, a, /\] *\(/);
		split(a[1], b, "[");
		linktext = b[2];
		nc = split(a[2], c, ")");
		linkaddr = c[1];
		text = text b[1] "<a href=\"" linkaddr "\">" linktext "</a>" c[2];
		for(i = 3; i <= nc; i++)
			text = text ")" c[i];
		for(i = 3; i <= na; i++)
			text = text "](" a[i];
		$0 = text;
		text = "";
	}
	while (na > 2);
}

Code

/`/ {
	while (match($0, /`/) != 0) {
		if (env == "code") {
			sub(/`/, "</code>");
			env = pcenv;
		}
		else {
			sub(/`/, "<code>");
			pcenv = env;
			env = "code";
		}
	}
}

Emphasis

/\*\*/ {
	# Each "**" marker alternately opens and closes an emphasis span.
	while (match($0, /\*\*/) != 0) {
		if (env == "emph") {
			sub(/\*\*/, "</em>");
			env = peenv;
		}
		else {
			sub(/\*\*/, "<em>");
			peenv = env;
			env = "emph";
		}
	}
}

Setext-style Headers

(Plus h3 with underscores.)

/^=+$/ {
	print "<h1>" text "</h1>\n";
	text = "";
	next;
}

/^-+$/ {
	print "<h2>" text "</h2>\n";
	text = "";
	next;
}

/^_+$/ {
	print "<h3>" text "</h3>\n";
	text = "";
	next;
}

Atx-style headers

/^#/ {
	match($0, /#+/);
	n = RLENGTH;
	if(n > 6)
		n = 6;
	print "<h" n ">" substr($0, RLENGTH + 1) "</h" n ">\n";
	next;
}

Unordered Lists

/^[-*+]/ {
	if (env == "none") {
		env = "ul";
		print "<ul>";
	}
	print "<li>" substr($0, 3) "</li>";
	text = "";
	next;
}

Ordered Lists

/^[0-9]./ {
	if (env == "none") {
		env = "ol";
		print "<ol>";
	}
	print "<li>" substr($0, 3) "</li>";
	next;
}

Paragraphs

/^[ \t]*$/ {
	if (env != "none") {
		if (text)
			print text;
		text = "";
		print "</" env ">\n";
		env = "none";
	}
	if (text)
		print "<p>" text "</p>\n";
	text = "";
	next;
}

Default

// {
	text = text $0;
}

End

END {
        if (env != "none") {
                if (text)
                        print text;
                text = "";
                print "</" env ">\n";
                env = "none";
        }
        if (text)
                print "<p>" text "</p>\n";
        text = "";
}

Bugs

Does not implement the full Markdown syntax.

Author

Jesus Galan (yiyus) 2006

<yiyu DOT jgl AT gmail DOT com>

categories: Awk100,Oo,Dsl,Mar,2009,Jimh

Awk++

Synopsis

 gawk -f awkpp file-name-of-awk++-program
This command is platform independent and sends the translated program to standard output (stdout). See Running awk++ for variations.

This is an updated revision (#21), released August 1, 2009. In this new version:

  • The code no longer needs a shell script or batch file to launch awkpp
  • Multiple inheritance improved
  • Configuration items added at the top of the program
This document may be copied only as part of an awk++ distribution and in unmodified form.

Download

Download awkpp21.zip from LAWKER

Description

Awk++ is a preprocessor: it reads in a program written in the awk++ language and outputs a new program. It differs from awka, however, in that the output of the awk++ preprocessor is awk code, not C or an executable program. So some version of AWK, such as awk or gawk, has to be used to run the preprocessed program. If desired, awka can be used in a second step to turn the preprocessed awk++ program into an executable.

OO in AWK++

The awk++ language provides object oriented programming for AWK that includes:

  • classes
  • class properties (persistent object variables)
  • methods
  • inheritance, including multiple inheritance

Awk++ adds new keywords to standard Awk:

  • class
  • method
  • prop
  • property
  • attr
  • attribute
  • elem
  • element
  • var
  • variable

Syntax

Samples:

 a = class1.new[(optional parameters)]   ### similar to Ruby
 b = a.get("aProperty")
 a.delete

 class class1 {
     property aProperty
     method new([optional parameters]) {
         # put initialization stuff here
     }

     method get(propName) {
         if (propName == "aProperty")
             return aProperty   ### Note the use of 'return'. It behaves
                                ### exactly the same as in an AWK function.
     }
 }

Details

To define a class (similar to C++ but no public/private):

class class_name {.....}

To define a class with inheritance:

class class_name : inherited_class_name [ : inherited_class_name...] {.....}

To add local/private variables (persistent variables; syntax is unique to awk++):

class class_name {
 attribute|attr|property|prop|element|elem|variable|var variable_name
 ..... }

To help programmers who are used to other OO languages, "attribute", "property", "element", and "variable", along with their 4-letter abbreviations, are interchangeable.

Note: these persistent variables cannot be accessed directly. The programmer must define method(s) to return them, if their values are to be made available to code that's outside the class.

To add methods

class class_name {
    attribute variable_name1

    method method_name(parameters) {
        ...any awk code....
    }
    ..other method definitions...
}

To create an object

 object_variable = class_name.new[(optional parameters)]
(runs the method named "new", if it exists; returns the object ID)

To call an object method

object_variable.method_name(parameters)

The dot isn't used for concatenation in awk/gawk, so it's a natural choice for the separator between the object and method.

To reclaim the memory used by an object, use the delete method, i.e.:

object_variable.delete

but don't define delete() in your classes. awk++ recognizes delete() as a special method and will take care of deleting the object. Deleting objects is only necessary, though, if they hold a lot of data. Overhead for objects themselves is insignificant.

Naming and behavior rules:

  • Class names must obey the same rules as user defined function names.
  • Method names must follow the same rules as AWK user defined function names.
  • Class "local" variables (properties, attributes, etc.) must follow the same naming rules as AWK variables.
  • Objects are number variables, so they must obey number variable rules. However, the values in variables holding objects should never be changed, as they are simply identifiers. Performing math operations on them is meaningless.

Syntax notes

OO syntax goals:

  • easy to parse and match to awk code using an awk program as the "preprocessor"
  • easy to understand
  • easy to remember
  • easy and fast to type
  • distinct from existing AWK syntax

The OO syntax is based partly on C++, partly on Javascript, partly on Ruby and partly on the book "The Object-Oriented Thought Process". It isn't lifted in toto from one language because other languages provide features that gawk can't accomplish or have syntax that is hard to parse.

Multiple Inheritance

In awk++, if a method is called that isn't in the object's class and there are inherited classes (superclasses) specified, the inherited classes are called in left to right order until one of them returns a value. That value becomes the result of the method call. This is the way awk++ resolves the diamond problem. As a programmer, you control the sequence in which superclasses are called by the left to right order of the list of inherited classes in the class definition.

There are two important things to note.

  1. The search will proceed up through as many ancestors as it takes to find a matching method.
  2. A "match" is made when a value is returned. If a superclass has a matching method that returns nothing, the search will continue. Thus, it's possible that more than one method could be executed, resulting in unintended consequences. Be careful!

Calls to undefined methods do nothing and return nothing, silently.
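
The lookup rule can be modelled in a few lines of plain awk. This is a toy model, not the code awkpp generates. The tables parents[] and returns[], the function names call and search, and the depth-first search order are all assumptions made for the sketch; the documentation above only promises left-to-right order among a class's listed superclasses.

```awk
# Toy model of the lookup rule; tables and names are invented for this sketch.
# parents[class] is a space-separated list of superclasses, left to right.
# returns[class, method] is what that class's method returns ("" = nothing).

function call(class, method) {
    if ((class, method) in returns) return returns[class, method]
    return search(class, method)
}

# Try each superclass in order (depth-first, an assumption of this sketch),
# stopping at the first one that returns a value.
function search(class, method,    n, i, sup, v) {
    n = split(parents[class], sup, " ")
    for (i = 1; i <= n; i++) {
        v = ((sup[i], method) in returns) ? returns[sup[i], method] : ""
        if (v == "") v = search(sup[i], method)   # nothing returned: keep climbing
        if (v != "") return v
    }
    return ""
}

BEGIN {
    parents["Duck"]    = "Swimmer Flyer"
    parents["Swimmer"] = "Animal"
    returns["Swimmer", "move"] = ""         # defined, but returns nothing
    returns["Flyer", "move"]   = "flies"
    returns["Animal", "name"]  = "animal"

    print call("Duck", "move")              # flies
    print call("Duck", "name")              # animal
    print "[" call("Duck", "quack") "]"     # []
}
```

In the first call, Swimmer's move runs but returns nothing, so the search carries on to Flyer. That is the "more than one method could be executed" case the second note warns about.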

Running awk++

The command to preprocess an awk++ program looks like this:

gawk -f awkpp file-name-of-awk++-program
or, if the "she-bang" line (line 1 in awkpp) has the right path to gawk, and awkpp is executable and in a directory in PATH,
awkpp file-name-of-awk++-program
To run the output program immediately,
gawk -f awkpp -r file-name-of-awk++-program [awk options] data-files-to-be-processed
or
awkpp -r file-name-of-awk++-program [awk options] data-files-to-be-processed
When running an awk++ program immediately, standard input (stdin) cannot be used for data. One or more data file paths must be listed on the command line.

Bugs

There is a bug in the standard AWK distributions that affects the preprocessor. Additionally, the preprocessor uses the optional third (array) argument of the match() function, which is a gawk extension. So, it's best to use GAWK to run the preprocessor.

On the other hand, the AWK code created by translating awk++ is intended to work with all versions of AWK. If you find otherwise, please notify the developer(s).

Copyright

Copyright (c) 2008, 2009 Jim Hart, jhart@mail.avcnet.org. All rights reserved. The awk++ code is licensed under the GNU General Public License (GPL), any version. awk++ documentation, including this page, may be copied only in unmodified form, subject to fair use guidelines.

Author

Jim Hart, jhart@mail.avcnet.org

categories: Awk100,Oo,Dsl,May,2009,AlexS

Awk + ANSI-C = OO

Description

ooc is an awk program which reads class descriptions and performs the routine coding tasks necessary to do object-oriented coding in ANSI C.

The tool is exceptionally well documented in Object oriented programming with ANSI-C.

Download

Download a 2002 copy of this code from LAWKER.

Or go to the author's web site.

Description

ooc is a technique to do object-oriented programming (classes, methods, dynamic linkage, simple inheritance, polymorphisms, persistent objects, method existence testing, message forwarding, exception handling, etc.) using ANSI-C.

ooc is a preprocessor to simplify the coding task by converting class descriptions and method implementations into ANSI-C as required by the technique. You implement the algorithms inside the methods and the ooc preprocessor produces the boilerplate.

ooc consists of a shell script driving a modular awk script (with provisions for debugging), a set of reports -- code generation templates -- interpreted by the script, and the source of a root class to provide basic functionality. Everything is designed to be changed if desired. There are manual pages, lots of examples, among them a calculator based on curses and X11, and you can ask me about the book.

ooc as a technique requires an ANSI-C system -- classic C would necessitate substantial changes. The preprocessor needs a healthy Bourne-Shell and "new" awk as described in Aho, Weinberger, and Kernighan's book.

ooc was developed primarily to teach about object-oriented programming without having to learn a new language. If you see how it is done in a familiar setting, it is much easier to grasp the concepts and to know what miracles to expect from the technique and what not. Conceivably, the preprocessor can be used for production programming but this was not the original intent. Being able to roll your own object-oriented coding techniques has its possibilities, however...

Technical Details

Most sources should be viewed with tab stops set at 4 characters.

The original system ran on NeXTSTEP 3.2 and older, ESIX (System V) 4.0.4, and Linux 0.99.pl4-49. This rerelease was tested on MacOS X version 10.1.2 and Solaris version 5.8. You need to review paths in the script 'ooc/ooc' before running anything. Make sure the first line of this script points to a Bourne-style shell. Also make sure that the first line of '09/munch' points to a (new) awk.

The rereleased 'ooc' awk-programs have been tested with GNU awk versions 3.0.1 and 3.0.3. Previous versions did not support AWKPATH properly (but this is not essential).

The makefiles could be smarter but they are naive enough for all systems. This is a heterogeneous system -- set the environment variable $OSTYPE to an architecture-specific name. 'make' in the current directory will create everything by calling 'make' in the various subdirectories. Each 'makefile' includes 'make/Makefile.$OSTYPE'; review your 'make/Makefile.$OSTYPE' before you start.

The following make calls are supported throughout:

make [all]	create examples
make test	[make and] run examples
make clean	remove all but sources
make depend	make dependencies (if makefile.$OSTYPE supports it)

Make dependencies can be built with the -MM option of the GNU C compiler. They are stored in a file 'depend' in each subdirectory. They should apply to all systems. 'makefile.$OSTYPE' may include a target 'depend' to recreate 'depend' -- check 'makefile.darwin1.4' for an example.

Contents

The following is a walk through the file hierarchy in the order of the book:

makefile
dispatch standard make calls to known directories
make/
Makefile: boilerplate code for makefiles
01/*
chapter 1: abstract data types
  • sets: Set demo
  • bags: Bag demo: Set with reference count
02/*
chapter 2: dynamic linkage
  • strings: String demo
  • atoms: Atom demo: unique String
03/*
chapter 3: manipulating expressions with dyn. linkage
  • postfix: postfix output of expression
  • value: expression evaluation
  • infix: infix output of expression
04/*
chapter 4: inheritance
  • points: Point demo
  • circles: Circle demo: Circle: Point with radius
05/*
chapter 5: symbol table with inheritance
  • value: expression evaluation with vars, consts, functions
06/*
chapter 6: class hierarchy and meta classes
  • any: objects that do not differ from any object
07/*
chapter 7: ooc preprocessor; use ooc -7
  • points: Point demo: PointClass is a new metaclass
  • circles: Circle demo: Circle is a new class
  • queue: Queue demo: List is an abstract base class
  • stack: Stack demo: another subclass of List
08/*
chapter 8: dynamic type checking; use ooc -8
  • circles: Circle demo: nothing changed
  • list: List demo: traps insertion of numbers or strings
09/*
chapter 9: automatic initialization; use ooc -9
  • munch: awk program to collect class list from nm -p output
  • circles: Circle demo: no more init calls
  • list: List demo: no more init calls
10/*
chapter 10: respondsTo method; use ooc -10
  • cmd: Filter demo: how flags and options are handled
  • wc: word count filter
  • sort: sorting filter, adds sort method to List
11/*
chapter 11: class methods
  • value: expression evaluator, based on class hierarchy
  • value: x memory reclamation enabled
12/*
chapter 12: persistent objects
  • value: expression evaluator, with save and load
13/*
chapter 13: exception handling
  • value: expression evaluator with exception handler
  • except: Exception demo
14/*
chapter 14: message forwarding
  • makefile.etc: (naive) generated rules for the library
  • Xapp: resources for X11-based programs
  • hello: LineOut demo: hello, world
  • button: Button demo
  • run: terminal-oriented calculator
  • cbutton: Crt demo: hello, world changes into a
  • crun: curses-based calculator
  • xhello: XLineOut demo: hello, world
  • xbutton: XButton demo with XawBox and XawForm
  • xrun: X11-based calculator with callbacks
man/*
manual pages
  • *.1: tools
  • *.2: functions
  • *.3: some classes
  • *.4: classes in chapter 14
ooc/*
ooc preprocessor
  • ooc: command script; review 'home' 'OOCPATH' 'AWKPATH'
  • awk/*.awk: modules
  • awk/*.dbg: debugging modules
  • rep/*.rep: reports
  • rep-*/*.rep: reports for early chapters

Copyright

Copyright (c) 1993

While you may use this software package, neither I nor my employers can be made responsible for whatever problems you might cause or encounter.

While you may give away this package and/or software derived with it, you should not charge for it, you should not claim that ooc is your work, and I have published my own book about ooc before you did.

The same restrictions apply to whoever might get this package from you.

Author

Axel T. Schreiner, http://www.cs.rit.edu/~ats/