About awk.info
Mar 01: Michael Sanders demos an X-windows GUI for AWK.
Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK
Feb 28: Tim Menzies asks this community to write an AWK cookbook.
Feb 28: Arnold Robbins announces a new debugger for GAWK.
Feb 28: Awk100#23: Premysl Janouch offers an IRC bot, in AWK
Feb 28: Updated: the AWK FAQ
Feb 28: Tim Menzies offers a tiny content management system, in Awk.
Jan 31: Comment system added to awk.info. For example, see the discussion at the bottom of ?keys2awk
Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).
Jan 31: The AWK FAQ is being updated. For comments, corrections, or extensions, please mail tim@menzies.us
Jan 31: Martin Cohen finds Awk on the Android platform.
Jan 31: Aleksey Cheusov released a new version of runawk.
Jan 31: Hirofumi Saito contributes a candidate Awk mascot.
Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.
Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.
These pages focus on domain-specific languages (a.k.a. "little languages") written in Awk.
These little languages can range from the simple to the quite intricate. For example, LAWKER contains code for
Interestingly, without comments, the LISP interpreter is only three times longer than the HTML markup language. This comments either on the power of Awk, the regularity of LISP's core semantics, or both.
gawk -f graph.awk graphFile
A processor for a little language, specialized for graph-drawing.
The code reads data that includes a specification of a graph; the output is the data plotted in the specified area. For example, here is an input specification:
label here's some stuff
bottom ticks 1 5 10
left ticks 1 2 10 20
range 1 1 10 22
height 10
width 30
1 2 *
2 4 *
3 6 *
4 8 *
7 14 +
8 12 +
9 10 +
mb 0.9 11 =
It produces the following output
|----------------------|
20 - = = =
| = = = = |
= = = + + |
10 - + |
| * * |
| * |
2 *---------|------------|
1 5 10
here's some stuff
Set frame dimensions: height and width; offset for x and y axes.
BEGIN {
ht = 24; wid = 80
ox = 6; oy = 2
number = "^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)" \
"([eE][-+]?[0-9]+)?$"
}
Skip comments
/^[ \t]*#/ { next }
Simple tags
$1 == "height" { ht = $2; next }
$1 == "width" { wid = $2; next }
$1 == "label" { # for bottom
sub(/^ *label */, "")
botlab = $0
next
}
$1 == "bottom" && $2 == "ticks" { # ticks for x-axis
for (i = 3; i <= NF; i++) bticks[++nb] = $i
next
}
$1 == "left" && $2 == "ticks" { # ticks for y-axis
for (i = 3; i <= NF; i++) lticks[++nl] = $i
next
}
$1 == "range" { # xmin ymin xmax ymax
xmin = $2; ymin = $3; xmax = $4; ymax = $5
next
}
Handling numerics.
$1 ~ number && $2 ~ number { # pair of numbers
nd++ # count number of data points
x[nd] = $1; y[nd] = $2
ch[nd] = $3 # optional plotting character
next
}
$1 ~ number && $2 !~ number { # single number
nd++ # count number of data points
x[nd] = nd; y[nd] = $1; ch[nd] = $2
next
}
Line functions, defined by a slope "m" and a y-intercept "b".
$1 == "mb" { # m b [mark]
expand()
for(i=xmin;i<=xmax;i++) {
nd++; x[nd]=i; y[nd]=$2*i + $3; ch[nd]=$4
}
next;
}
Final case: input error.
{ print "?? line " NR ": ["$0"]" >"/dev/stderr" }
Draw the graph
END { expand(); frame(); ticks(); label(); data(); draw() }
Expand the "x" and "y" boundaries to include all points.
function expand(note) { if (xmin == "") expand1(note) }
function expand1(note) {
xmin = xmax = x[1]
ymin = ymax = y[1]
for (i = 2; i <= nd; i++) {
if (x[i] < xmin) xmin = x[i]
if (x[i] > xmax) xmax = x[i]
if (y[i] < ymin) ymin = y[i]
if (y[i] > ymax) ymax = y[i] }
}
Draw the frame around the graph.
function frame() {
for (i = ox; i < wid; i++) plot(i, oy, "-") # bottom
for (i = ox; i < wid; i++) plot(i, ht-1, "-") # top
for (i = oy; i < ht; i++) plot(ox, i, "|") # left
for (i = oy; i < ht; i++) plot(wid-1, i, "|") # right
}
Create tick marks for both axes.
function ticks( i) {
for (i = 1; i <= nb; i++) {
plot(xscale(bticks[i]), oy, "|")
splot(xscale(bticks[i])-1, 1, bticks[i])
}
for (i = 1; i <= nl; i++) {
plot(ox, yscale(lticks[i]), "-")
splot(0, yscale(lticks[i]), lticks[i])
}
}
Center labels under x-axis.
function label() {
splot(int((wid + ox - length(botlab))/2), 0, botlab)
}
Create data points.
function data( i) {
for (i = 1; i <= nd; i++)
plot(xscale(x[i]),yscale(y[i]),ch[i]=="" ? "*" : ch[i])
for(i in mark) print mark[i]
}
Print graph from array.
function draw( i, j) {
for (i = ht-1; i >= 0; i--) {
for (j = 0; j < wid; j++)
printf("%s", (j,i) in array ? array[j,i] : " ")
printf("\n")
}
}
Scale x-values, y-values.
function xscale(x) {
return int((x-xmin)/(xmax-xmin) * (wid-1-ox) + ox + 0.5)
}
function yscale(y) {
return int((y-ymin)/(ymax-ymin) * (ht-1-oy) + oy + 0.5)
}
Put one character into array.
function plot(x, y, c) {
array[x,y] = c
}
Put string "s" into array.
function splot(x, y, s, i, n) {
n = length(s)
for (i = 0; i < n; i++)
array[x+i, y] = substr(s, i+1, 1)
}
This code comes from the original Awk book by Alfred Aho, Peter Weinberger & Brian Kernighan and contains some small modifications by Tim Menzies.
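As a rough check of the scaling step, the sketch below reuses the xscale() formula with the values from the sample specification (range 1 1 10 22, width 30, and the default x offset of 6). The wrapper name scale_demo is made up for this illustration and is not part of graph.awk.

```sh
#!/bin/sh
# scale_demo: hypothetical wrapper, not part of graph.awk.
# Prints the screen column that xscale() assigns to x = 1, 5 and 10.
scale_demo() {
    awk '
    # Same formula as xscale() in the listing above.
    function xscale(x) {
        return int((x - xmin) / (xmax - xmin) * (wid - 1 - ox) + ox + 0.5)
    }
    BEGIN {
        xmin = 1; xmax = 10      # from "range 1 1 10 22"
        wid = 30; ox = 6         # from "width 30" and the BEGIN default
        print xscale(1), xscale(5), xscale(10)
    }'
}
scale_demo
```

The smallest x lands on column ox (the left frame) and the largest on column wid-1 (the right frame); everything else is rounded to the nearest column in between.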
./uml.sh file.sdml > sequence_diagram
This program turns SDML into simple ASCII-text UML sequence diagrams. SDML is an extremely simplistic UML Sequence Diagram Markup Language; its rules are listed in the Usage text of the script below.
Given this input:
[Client, Proxy, DNS, Server
Query Name->
Answer IP<-
http GET >->
<<-html
this code generates:
  Client          Proxy           DNS         Server
     |              |              |             |
     |----------Query Name-------->|             |
     |<---------Answer IP----------|             |
     |--http GET -->|----------http GET -------->|
     |<----html-----|<-----------html------------|
if [ "$1" = "--awkprog" ] ; then
cat - <<"EOF"
BEGIN {
EFS="[|<>-]";
AFS="[<>-]";
RAFS="[{}RL]";
FS= EFS;
ARROWS = 2 ; # Arrowhead constant
ST=1;
ARG["EP"] = 1; # Event Padding
ARG["ES"] = 0; # Event Spacing (lines below)
ARG["EA"] = 0; # Events Above
ARG["HP"] = 2; # Header Padding
ARG["HS"] = 1; # Header Spacing (lines below)
ARG["LM"] = 0; # Left Margin
ARG["SP"] = 2; # Start Row Padding (For continuous operation)
ARG["TSM"] = 1; # Text Spacing Margin (lines above & below)
ARG["TD"] = 1; # Text Dots (instead of bars in text margins)
ARG["SS"] = 1; # Enable Single Arrow Spans (|---A-->|, not |-A-+-A>|)
}
function padding(outter, inner, extra ,p,m) {
p = (outter - inner);
m = p % 2 ;
p = ((p - m)/2) + (extra ? m:0);
if(p<0) return 0;
return p;
}
function pad(char, count ,i,r) {
for(i=1 ; i <= count ; i++) { r = r char };
return r;
}
function ltrim(s) { gsub(/^[ ]*/, "", s) ; return s; }
function center(string, width, padchar, favor ,p,r,sw) {
sw = length(string);
p = padding(width, sw, favor=="r"?1:0);
r = pad(padchar, p);
r = r string;
p = padding(width, sw, favor=="r"?0:1);
return r pad(padchar, p);
}
function getevent_rev(row, field ,p) {
for(p=field-1; p>0; p--) { # search to the left
if(RF_s[row,p] !~ AFS) return "";
if(RF_f[row,p] != "") return RF_f[row,p];
}
return "";
}
function getevent_for(row, field ,n) {
for(n=field+1; n <= R_nf[row]; n++) { # search to the right
if(RF_s[row,n-1] !~ AFS) return "";
if(RF_f[row,n] != "") return RF_f[row,n];
}
return "";
}
function rlarrow(arrow, prevarrow) {
if(arrow == ">") return "R";
if(arrow == "<") return "L";
if(arrow == "R" || arrow=="L") return arrow;
return prevarrow;
}
function debug_events(s) {
for(r=1; r <= NRS; r++) debug_row(r, s);
}
function debug_row(r, s) {
if(!DEBUG_ROW) return;
printf("Row["r"]/Stage["s"]: ");
for(f=1; f <= R_nf[r]; f++) {
printf(f"="RF_f[r,f]"("RF_s[r,f]") ");
}
printf("\n");
}
function print_bars(num, char ,i,out) {
if(char == "") char = "|";
while(num--) {
# Center the bars under the Headers
out = pad(" ", F_width[0]);
for(i=1; i<= NH; i++) {
out = out char pad(" ", F_width[i]);
}
print out;
}
}
function print_event(r, type ,i,bar,out,aspad,span_width,arrow){
out = pad(" ", F_width[0]);
for(i=1; i<= MAXNF; i++) {
out = out "|";
arrow=" ";
if(type == "both" || type == "arrow") {
if(RF_s[r,i] == "{") arrow = "<";
if(RF_s[r,i] ~ /[}RL]/) arrow = "-";
}
out = out arrow;
aspad = "-"; # arrow or space pad
if(RF_s[r,i] == "|" || RF_s[r,i] == ""|| type == "event") aspad = " ";
span_width = F_width[i];
if(ARG["SS"]) while(RF_s[r,i] == "R" || RF_s[r,i+1] == "L") {
span_width += 1 + F_width[++i]; # include bar
}
event ="";
if(type == "both" || type == "event") {
event = RF_f[r,i];
}
out = out center(event, span_width - ARROWS, aspad, i>MAXNF/2? "r":"l");
if(type == "both" || type == "arrow") {
if(RF_s[r,i] == "}") arrow = ">";
if(RF_s[r,i] ~ /[{RL]/) arrow = "-";
}
out = out arrow;
}
out = out "|";
print out;
}
function print_sd(start_row) {
print " 1 2 3 4 5 6"
print "123456789012345678901234567890123456789012345678901234567890"
if(start_row!=1) { for(i=0; i<ARG["SP"];i++) print ""; }
for(j=start_row; j<= NRS; j++) {
if(R_ltype[j] == "Header") {
NH = R_nf[j];
out = pad(" ", ARG["HP"]+ARG["LM"]);
i =1;
out = out RF_f[j,i];
hp = ARG["HP"] + ARG["LM"] + RF_l[j,i]; # header pointer (last char)
bp = F_width[0] + 1 + F_width[i] + 1; # bar pointer
print "HP:" hp " BP: "bp
for(i=2; i<= NH; i++) {
l = int(RF_l[j,i]/2); r = RF_l[j,i] -l; # Header left & right
lp = (bp - hp) - (l + 1); # left padding
out = out pad(" ", lp) RF_f[j,i];
hp = bp + r - 1;
bp = bp + F_width[i] + 1;
print "HP:" hp " BP: "bp " LP:"lp " r:"r" l:"l
}
print out;
print_bars(ARG["HS"]);
}
if(R_ltype[j] == "Text") {
if(R_ltype[j-1] != "Text") {
if(ARG["TD"]) {
print_bars(ARG["TSM"], ".");
} else {
for(l=0;l<ARG["TSM"]; l++) print "";
}
}
if(T_type[j] == "indent") printf(pad(" ", F_width[0]));
print RF_f[j,1];
if(R_ltype[j+1] != "Text") {
if(ARG["TD"]) {
print_bars(ARG["TSM"], ".");
} else {
for(l=0;l<ARG["TSM"]; l++) print "";
}
}
}
if(R_ltype[j] == "Event") {
if (ARG["EA"]) {
print_event(j, "event");
print_event(j, "arrow");
} else print_event(j, "both");
print_bars(ARG["ES"]);
}
}
return j;
}
/^[ ]*#/ {next} # we don't want bars for comment only lines!
/#/ { sub(/#.*$/, ""); } # strip trailing comment (sub() edits $0 in place and returns a count)
/^:/ {
print "Argument Variable Assignment" $0
i = split(substr($0,2), v, /,/);
for(;i>0;i--) {
j = split(v[i], kv, "=");
if(j==1) { ARG[kv[1]]= ""; }
if(j==2) { ARG[kv[1]]=kv[2]; }
}
for(k in ARG) { printf("ARG["k"]='"ARG[k]"' "); } ; print "";
next ;
}
{
NRS++; # NRSequences
}
/^;/ { ST=print_sd(ST); next; } # Allow continuous operation
/^@/ {
print "text line"
R_ltype[NRS] = "Text";
T_type[NRS] = "left";
sub(/^@/,"");
RF_f[NRS,1]=$0;
next;
}
/^"/ {
print "text line"
R_ltype[NRS] = "Text";
T_type[NRS] = "indent";
sub(/^"/,"");
RF_f[NRS,1]=$0;
next;
}
/^\[/ {
print "Event Headers (Titles)" $0
R_ltype[NRS] = "Header";
sub(/^\[/,"");
FS=","; $0 = $0; # resplit line
R_nf[NRS] = NF;
if(MAXNF < R_nf[NRS]-1) MAXNF= R_nf[NRS]-1; # print MAXNF;
for(i=1; i<= NF; i++) {
f= ltrim($i);
RF_f[NRS,i]=f;
RF_l[NRS,i]= length(f);
RF_s[NRS,i]= ",";
}
for(i=1; i<= NF; i++) {
F_width[i] = padding(RF_l[NRS,i] + 2*ARG["HP"], 1, 1) +\
padding(RF_l[NRS,i+1] + 2*ARG["HP"], 1, 0)\
-1; # Do not include width of bar
if(F_width[i] < 2*ARG["HP"]) F_width[i] = 2*ARG["HP"];
print padding(RF_l[NRS,i] + 2*ARG["HP"], 1, 1) " "\
padding(RF_l[NRS,i+1] + 2*ARG["HP"], 1, 0);
}
F_width[0] = padding(RF_l[NRS,1] + 2*ARG["HP"], 1, 1);
print padding(RF_l[NRS,1] + 2*ARG["HP"],1,0);
if(F_width[0] < ARG["HP"]) F_width["0"] = ARG["HP"];
F_width[0] += ARG["LM"];
for(i=0; i<= MAXNF; i++) printf("FW["i"]="F_width[i]" "); print ""
FS=EFS;
next;
}
{
print "Event Line: " $0 ; DEBUG_ROW=1;
R_ltype[NRS] = "Event";
stl=0;
for(i=1; i<= NF; i++) {
f = $i;
l = length(f);
stl += l +1;
s = substr($0, stl, 1);
RF_f[NRS,i]= f;
RF_s[NRS,i]= s;
}
R_nf[NRS] = NF;
debug_row(NRS, 1);
# Fill in missing (assumed) fields
for(i=1; i<= R_nf[NRS]; i++) {
if (RF_f[NRS,i]=="") RF_f[NRS,i] = getevent_rev(NRS, i);
if (RF_f[NRS,i]=="") RF_f[NRS,i] = getevent_for(NRS, i);
}
debug_row(NRS, 2);
# -> <- ->> >-> <-< <<-
# >- -< >>- -<<
# R> <L R>> >R> <L< <<L
for(i=1; i<= R_nf[NRS]; ) {
if(RF_s[NRS,i] ~ AFS) {
if(RF_s[NRS,i] == "-") { # left tail
for(n=i+1; n<= R_nf[NRS]; n++) {
if(RF_s[NRS,n]==">") {
pi=i; i=n; RF_s[NRS,n]="}";
for(n--; n>=pi; n--) RF_s[NRS,n]="R"; n= R_nf[NRS];
} else if(RF_s[NRS,n]=="<") {
pi=i; i=n; RF_s[NRS,pi]="{";
for(; n>pi; n--) RF_s[NRS,n]="L"; n= R_nf[NRS];
}
}
i++;
} else if(RF_s[NRS,i+1] != "-") { # singleton
RF_s[NRS,i]= RF_s[NRS,i]==">" ? "}":"{";
i++;
} else {
rl= rlarrow(RF_s[NRS,i], "");
for(n=i+1; n<= R_nf[NRS] && RF_s[NRS,n] ~ AFS; n++) {
rl= rlarrow(RF_s[NRS,n], rl);
}
n--;
if (RF_s[NRS,n] == "-") { # right tail
if (rl=="R") RF_s[NRS,n--]="}";
for(; n>=i && RF_s[NRS,n] == "-"; n--) RF_s[NRS,n]=rl;
if (rl=="L") RF_s[NRS,n]="{"; else RF_s[NRS,n]="R";
} else if (RF_s[NRS,n-1] != "-") { # singleton
RF_s[NRS,n]= RF_s[NRS,n]==">" ? "}":"{";
} else { # double ended -
if(RF_s[NRS,i]=="<") { # trumps no matter what
RF_s[NRS,i]="{";
for(i++; i<= R_nf[NRS] && RF_s[NRS,i]=="-"; i++) {
RF_s[NRS,i]="L";
}
} else {
for(n=i+1; n<= R_nf[NRS] && RF_s[NRS,n] =="-"; n++) ;
if(RF_s[NRS,n]==">") {
RF_s[NRS,n]="}";
for(n--; n>i && RF_s[NRS,n]=="-"; n--) {
RF_s[NRS,n]="R";
}
} else { # >-< # > is on the right and trumps
for(; i<= R_nf[NRS] && RF_s[NRS,i]=="-"; i++) {
RF_s[NRS,i]="R";
}
RF_s[NRS,i]="}";
}
}
}
}
} else i++;
}
debug_row(NRS, 3);
# ~ we need to test this with multi shifts (arrow/bar/arrow)
shift = 0;
for(i=1; i<= R_nf[NRS]+1; i++) {
if(RF_s[NRS,i-1] ~ RAFS && RF_s[NRS,i] !~ RAFS) shift++;
if(shift) RF_f[NRS,i-shift]=RF_f[NRS,i];
}
R_nf[NRS] = R_nf[NRS] - shift;
debug_row(NRS, 4);
# Trim empty trailing fields
for(i= R_nf[NRS]; i>0 && RF_f[NRS,i]==""; i--) R_nf[NRS]--;
debug_row(NRS, 5);
# Get event wlength and adjust the max length of each event
for(i=1; i<= R_nf[NRS]; i++) {
RF_l[NRS,i]= length(RF_f[NRS,i]);
if(RF_l[NRS,i] > E_ml[i]) E_ml[i] = RF_l[NRS,i];
}
# Adjust the max width of each column (headers/events)
if(MAXNF < R_nf[NRS]) MAXNF= R_nf[NRS]; # print MAXNF;
for(i=1; i<= MAXNF; i++) {
w = E_ml[i] + 2 * ARG["EP"] + ARROWS;
if (F_width[i] < w) F_width[i] = w;
printf("FW:"F_width[i]" W:"w" ");
}
print ""
}
END { ST=print_sd(ST); }
EOF
exit
fi
Usage()
{
cat - <<-EOF
use(v1.0): $0 file.sdml > sequence_diagram
This program will turn SDML into simple ascii text uml sequence
diagrams. SDML is an extremely simplistic uml Sequence Diagram
Markup Language. SDML is specified as:
.Lines starting with a [ are a comma separated list
of actors (bar headers)
.Events are defined easily by the following symbols:
> rightward event
< leftward event
- extension of the previous event
.Actors can be skipped with a |
.Text on a line after a # is a comment
.Lines starting with a @ are text lines
.Lines starting with a " are indented text lines
.Lines starting with a : are comma separated list of
parameter assignment lines. Parameters are:
EP Event Padding (spaces on each side)
ES Event Spacing (lines below)
EA Events Above (put event text above arrows)
HP Header Padding (spaces on each side)
HS Header Spacing (lines below)
LM Left Margin (spaces on the left)
TSM Text Spacing Margin (lines above & below)
TD Text Dots (instead of bars in text margins)
SS Enable Single Arrow Spans (|---A-->|, not |-A-+-A>|)
Example SDML Input:
[Client, Proxy, DNS, Server
Query Name->
Answer IP<-
http GET >->
<<-html
Sequence Diagram Output:
  Client          Proxy           DNS         Server
     |              |              |             |
     |----------Query Name-------->|             |
     |<---------Answer IP----------|             |
     |--http GET -->|----------http GET -------->|
     |<----html-----|<-----------html------------|
Copyright: Martin Fick <mogulguy@yahoo.com>, Date: 2008-02-15
License: None. This is released into the public domain: do
as you wish.
EOF
exit
}
[ "$1" = "--help" -o "$1" = "-h" -o "$1" = "-u" ] && Usage
Hack to attempt to make this somewhat portable
AWK_PROG="`"$0" --awkprog`"
AWK=awk # default (should work most places)
[ -x /usr/bin/nawk ] && AWK=/usr/bin/nawk # solaris
$AWK "$AWK_PROG" "$@"
Martin Fick
awk [-v profiling=1] -f awklisp [optional-Lisp-source-files]
The -v profiling=1 option turns call-count profiling on.
If you want to use it interactively, be sure to include '-' (for the standard input) among the source files. For example:
gawk -f awklisp startup numbers lists -
This program arose out of one-upmanship. At my previous job I had to use MapBasic, an interpreter so astoundingly slow (around 100 times slower than GWBASIC) that one must wonder if it itself is implemented in an interpreted language. I still wonder, but it clearly could be: a bare-bones Lisp in awk, hacked up in a few hours, ran substantially faster. Since then I've added features and polish, in the hope of taking over the burgeoning market for stately language implementations.
This version tries to deal with as many of the essential issues in interpreter implementation as is reasonable in awk (though most would call this program utterly unreasonable from start to finish, perhaps...). Awk's impoverished control structures put error recovery and tail-call optimization out of reach, in that I can't see a non-painful way to code them. The scope of variables is dynamic because that was easier to implement efficiently. Subject to all those constraints, the language is as Schemely as I could make it: it has a single namespace with uniform evaluation of expressions in the function and argument positions, and the Scheme names for primitives and special forms.
The rest of this file is a reference manual. My favorite tutorial would be The Little LISPer (see section 5, References); don't let the cute name and the cartoons turn you off, because it's a really excellent book with some mind-stretching material towards the end. All of its code will work with awklisp, except for the last two chapters. (You'd be better off learning with a serious Lisp implementation, of course.)
For more details on the implementation, see the Implementation notes (below).
Code:
(define fib
(lambda (n)
(if (< n 2)
1
(+ (fib (- n 1))
(fib (- n 2))))))
(fib 20)
Command line:
gawk -f awklisp startup numbers lists fib.lsp
Output:
10946
Here are the standard ELIZA dialogue patterns:
(define rules
'(((hello)
(How do you do -- please state your problem))
((I want)
(What would it mean if you got -R-)
(Why do you want -R-)
(Suppose you got -R- soon))
((if)
(Do you really think its likely that -R-)
(Do you wish that -R-)
(What do you think about -R-)
(Really-- if -R-))
((I was)
(Were you really?)
(Perhaps I already knew you were -R-)
(Why do you tell me you were -R- now?))
((I am)
(In what way are you -R-)
(Do you want to be -R-))
((because)
(Is that the real reason?)
(What other reasons might there be?)
(Does that reason seem to explain anything else?))
((I feel)
(Do you often feel -R-))
((I felt)
(What other feelings do you have?))
((yes)
(You seem quite positive)
(You are sure)
(I understand))
((no)
(Why not?)
(You are being a bit negative)
(Are you saying no just to be negative?))
((someone)
(Can you be more specific?))
((everyone)
(Surely not everyone)
(Can you think of anyone in particular?)
(Who for example?)
(You are thinking of a special person))
((perhaps)
(You do not seem quite certain))
((are)
(Did you think they might not be -R-)
(Possibly they are -R-))
(()
(Very interesting)
(I am not sure I understand you fully)
(What does that suggest to you?)
(Please continue)
(Go on)
(Do you feel strongly about discussing such things?))))
Command line:
gawk -f awklisp startup numbers lists eliza.lsp -
Interaction:
> (eliza)
Hello-- please state your problem
> (I feel sick)
Do you often feel sick
> (I am in love with awk)
In what way are you in love with awk
> (because it is so easy to use)
Is that the real reason?
> (I was laughed at by the other kids at space camp)
Were you really?
> (everyone hates me)
Can you think of anyone in particular?
> (everyone at space camp)
Surely not everyone
> (perhaps not tina fey)
You do not seem quite certain
> (I want her to laugh at me)
What would it mean if you got her to laugh at me
Lisp evaluates expressions, which can be simple (atoms) or compound (lists).
An atom is a string of characters, which can be letters, digits, and most punctuation; the characters may -not- include spaces, quotes, parentheses, brackets, '.', '#', or ';' (the comment character). In this Lisp, case is significant ( X is different from x ).
A list is a '(', followed by zero or more objects (each of which is an atom or a list), followed by a ')'.
The special object nil is both an atom and the empty list. That is, nil = (). A non-nil list is called a -pair-, because it is represented by a pair of pointers, one to the first element of the list (its -car-), and one to the rest of the list (its -cdr-). For example, the car of ((a list) of stuff) is (a list), and the cdr is (of stuff). It's also possible to have a pair whose cdr is not a list; the pair with car A and cdr B is printed as (A . B).
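To make the pair representation concrete, here is a minimal sketch that stores pairs in two awk arrays, car[] and cdr[], and prints them in the notation just described. It is only an illustration: the names cons, show and pair_demo are made up here, and cell ids are strings such as "#1" rather than the tagged numbers awklisp uses.

```sh
#!/bin/sh
# pair_demo: hypothetical sketch of pairs stored in car[] and cdr[] arrays.
# Cell ids here are strings like "#1"; awklisp itself uses tagged numbers.
pair_demo() {
    awk '
    function cons(a, d,   id) {
        id = "#" (++ncells)            # "#" cannot occur in an atom
        car[id] = a; cdr[id] = d
        return id
    }
    function show(x,   s) {
        if (!(x in car)) return x      # atom, including nil
        s = "(" show(car[x])
        for (x = cdr[x]; x in car; x = cdr[x])
            s = s " " show(car[x])
        if (x != "nil") s = s " . " x  # improper tail, e.g. (A . B)
        return s ")"
    }
    BEGIN {
        inner = cons("a", cons("list", "nil"))
        outer = cons(inner, cons("of", cons("stuff", "nil")))
        print show(car[outer])         # the car of ((a list) of stuff)
        print show(cdr[outer])         # the cdr of ((a list) of stuff)
        print show(cons("A", "B"))     # a pair whose cdr is not a list
    }'
}
pair_demo
```

Run as is, this prints (a list), (of stuff) and (A . B), matching the examples in the paragraph above.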
That's the syntax of programs and data. Now let's consider their meaning. You can use Lisp like a calculator: type in an expression, and Lisp prints its value. If you type 25, it prints 25. If you type (+ 2 2), it prints 4. In general, Lisp evaluates a particular expression in a particular environment (set of variable bindings) by following this algorithm:
If the procedure's body has more than one expression -- e.g., (lambda () (write 'Hello) (write 'world!)) -- evaluate them each in turn, and return the value of the last one.
We still need the rules for special forms. They are:
It's possible to define new special forms using the macro facility provided in the startup file. The macros defined there are:
(let ((<var> <expr>)...) <body>...)
Bind each <var> to its corresponding <expr> (evaluated in the current environment), and evaluate <body> in the resulting environment.
(cond (<test-expr> <result-expr>...)... (else <result-expr>...))
where the final else clause is optional. Evaluate each <test-expr> in turn, and for the first non-nil result, evaluate its <result-expr>. If none are non-nil, and there's no else clause, return nil.
(and <expr>...)
Evaluate each <expr> in order, until one returns nil; then return nil. If none are nil, return the value of the last <expr>.
(or <expr>...)
Evaluate each <expr> in order, until one returns non-nil; return that value. If all are nil, return nil.
Since the code should be self-explanatory to anyone knowledgeable about Lisp implementation, these notes assume you know Lisp but not interpreters. I haven't got around to writing up a complete discussion of everything, though.
The code for an interpreter can be pretty low on redundancy -- this is natural because the whole reason for implementing a new language is to avoid having to code a particular class of programs in a redundant style in the old language. We implement what that class of programs has in common just once, then use it many times. Thus an interpreter has a different style of code, perhaps denser, than a typical application program.
Conceptually, a Lisp datum is a tagged pointer, with the tag giving the datatype and the pointer locating the data. We follow the common practice of encoding the tag into the two lowest-order bits of the pointer. This is especially easy in awk, since arrays with non-consecutive indices are just as efficient as dense ones (so we can use the tagged pointer directly as an index, without having to mask out the tag bits). (But, by the way, mawk accesses negative indices much more slowly than positive ones, as I found out when trying a different encoding.)
This Lisp provides three datatypes: integers, lists, and symbols. (A modern Lisp provides many more.)
For an integer, the tag bits are zero and the pointer bits are simply the numeric value; thus, N is represented by N*4. This choice of the tag value has two advantages. First, we can add and subtract without fiddling with the tags. Second, negative numbers fit right in. (Consider what would happen if N were represented by 1+N*4 instead, and we tried to extract the tag as N%4, where N may be either positive or negative. Because of this problem and the above-mentioned inefficiency of negative indices, all other datatypes are represented by positive numbers.)
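The arithmetic behind this choice can be checked directly in awk. The sketch below is an illustration only; the helper names box, unbox and tag_demo are assumptions and do not come from awklisp.

```sh
#!/bin/sh
# tag_demo: hypothetical wrapper illustrating the N*4 integer encoding.
tag_demo() {
    awk '
    function box(n)   { return n * 4 }   # integer -> tagged value (tag bits 0)
    function unbox(v) { return v / 4 }   # tagged value -> integer
    BEGIN {
        # Addition works directly on tagged values: 2*4 + 3*4 = 5*4.
        print unbox(box(2) + box(3))
        # A negative integer still has tag 0 under awk % (fmod semantics).
        ok = (box(-3) % 4 == 0)
        print (ok ? "tag ok" : "tag broken")
        # With a 1+N*4 encoding, N = -1 is stored as -3, and -3 % 4 is -3,
        # so the tag can no longer be read back as 1.
        v = 1 + (-1) * 4
        print v % 4
    }'
}
tag_demo
```

The three printed lines are 5, tag ok and -3; the last one is the problem the paragraph above warns about.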
The following is from an email discussion; it doesn't develop everything from first principles but is included here in the hope it will be helpful.
Hi. I just took a look at awklisp, and remembered that there's more to your question about why we need a stack -- it's a good question. The real reason is because a stack is accessible to the garbage collector.
We could have had apply() evaluate the arguments itself, and stash the results into variables like arg0 and arg1 -- then the case for ADD would look like
if (proc == ADD) return is(a_number, arg0) + is(a_number, arg1)
The obvious problem with that approach is how to handle calls to user-defined procedures, which could have any number of arguments. Say we're evaluating ((lambda (x) (+ x 1)) 42). (lambda (x) (+ x 1)) is the procedure, and 42 is the argument.
A (wrong) solution could be to evaluate each argument in turn, and bind the corresponding parameter name (like x in this case) to the resulting value (while saving the old value to be restored after we return from the procedure). This is wrong because we must not change the variable bindings until we actually enter the procedure -- for example, with that algorithm ((lambda (x y) y) 1 x) would return 1, when it should return whatever the value of x is in the enclosing environment. (The eval_rands()-type sequence would be: eval the 1, bind x to 1, eval the x -- yielding 1 which is *wrong* -- and bind y to that, then eval the body of the lambda.)
Okay, that's easily fixed -- evaluate all the operands and stash them away somewhere until you're done, and *then* do the bindings. So the question is where to stash them. How about a global array? Like
for (i = 0; arglist != NIL; ++i) {
global_temp[i] = eval(car[arglist])
arglist = cdr[arglist]
}
followed by the equivalent of extend_env(). This will not do, because the global array will get clobbered in recursive calls to eval(). Consider (+ 2 (* 3 4)) -- first we evaluate the arguments to the +, like this: global_temp[0] gets 2, and then global_temp[1] gets the eval of (* 3 4). But in evaluating (* 3 4), global_temp[0] gets set to 3 and global_temp[1] to 4 -- so the original assignment of 2 to global_temp[0] is clobbered before we get a chance to use it. By using a stack[] instead of a global_temp[], we finesse this problem.
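The clobbering can be reproduced with a toy evaluator. The sketch below is not awklisp code: it evaluates the prefix expression + 2 * 3 4 (standing in for (+ 2 (* 3 4))) twice, once stashing operands in a global_temp[] array and once on a stack[]. The names eval_global, eval_stack, next_tok and stack_demo are made up for the illustration.

```sh
#!/bin/sh
# stack_demo: hypothetical toy evaluator, not taken from awklisp.
stack_demo() {
    awk '
    function next_tok() { return tok[++pos] }

    # Operands stashed in one global array: the inner call overwrites slot 0.
    function eval_global(   t, op) {
        t = next_tok()
        if (t != "+" && t != "*") return t + 0
        op = t
        global_temp[0] = eval_global()
        global_temp[1] = eval_global()
        return (op == "+") ? global_temp[0] + global_temp[1] \
                           : global_temp[0] * global_temp[1]
    }

    # Operands pushed on a stack: each call only touches its own two slots.
    function eval_stack(   t, op, v, a, b) {
        t = next_tok()
        if (t != "+" && t != "*") return t + 0
        op = t
        v = eval_stack(); stack[++sp] = v
        v = eval_stack(); stack[++sp] = v
        b = stack[sp--]; a = stack[sp--]
        return (op == "+") ? a + b : a * b
    }

    BEGIN {
        split("+ 2 * 3 4", tok, " ")
        pos = 0;         print "global_temp:", eval_global()
        pos = 0; sp = 0; print "stack:", eval_stack()
    }'
}
stack_demo
```

The global_temp version prints 15 because the 2 was overwritten by the 3 before the addition ran; the stack version prints the expected 14.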
You may object that we can solve that by just making the global array local, and that's true; lots of small local arrays may or may not be more efficient than one big global stack, in awk -- we'd have to try it out to see. But the real problem I alluded to at the start of this message is this: the garbage collector has to be able to find all the live references to the car[] and cdr[] arrays. If some of those references are hidden away in local variables of recursive procedures, we're stuck. With the global stack, they're all right there for the gc().
(In C we could use the local-arrays approach by threading a chain of pointers from each one to the next; but awk doesn't have pointers.)
(You may wonder how the code gets away with having a number of local variables holding lisp values, then -- the answer is that in every such case we can be sure the garbage collector can find the values in question from some other source. That's what this comment is about:
# All the interpretation routines have the precondition that their
# arguments are protected from garbage collection.
In some cases where the values would not otherwise be guaranteed to be available to the gc, we call protect().)
Oh, there's another reason why apply() doesn't evaluate the arguments itself: it's called by do_apply(), which handles lisp calls like (apply car '((x))) -- where we *don't* want the x to get evaluated by apply().
Roger Rohrbach wrote a Lisp interpreter, in old awk (which has no procedures!), called walk . It can't do as much as this Lisp, but it certainly has greater hack value. Cooler name, too. It's available at http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/impl/awk/0.html
Eval doesn't check the syntax of expressions. This is a probably-misguided attempt to bump up the speed a bit, that also simplifies some of the code. The macroexpander in the startup file would be the best place to add syntax-checking.
Darius Bacon darius@wry.me
Copyright (c) 1994, 2001 by Darius Bacon.
Permission is granted to anyone to use this software for any purpose on any computer system, and to redistribute it freely, subject to the following restrictions:
Download from LAWKER.
"aaa" (the Amazing Awk Assembler) is a primitive assembler written entirely in awk and sed. It was done for fun, to establish whether it was possible. It is; it works. It's quite slow, the input syntax is eccentric and rather restricted, and error-checking is virtually nonexistent, but it does work. Furthermore it's very easy to adapt to a new machine, provided the machine falls into the generic "8-bit-micro" category. It is supplied "as is", with no guarantees of any kind. I can't be bothered to do any more work on it right now, but even in its imperfect state it may be useful to someone.
aaa is the mainline shell file.
aux is a subdirectory with machine-independent stuff. Anon, 6801, and 6809 are subdirectories with machine-dependent stuff, choice specified by a -m option (default is "anon"). Actually, even the stuff that is supposedly machine-independent does have some machine-dependent assumptions; notably, it knows that bytes are 8 bits (not serious) and that the byte is the basic unit of instructions (more serious). These would have to change for the 68000 (going to 16-bit "bytes" might be sufficient) and maybe for the 32016 (harder).
aaa thinks that the machine subdirectories and the aux subdirectory are in the current directory, which is almost certainly wrong.
abst is an abstract for a paper. "card", in each machine directory, is a summary card for the slightly-eccentric input language. There is no real manual at present; sorry.
try.s is a sample piece of 6809 input; it is semantic trash, purely for test purposes. The assembler produces try.a, try.defs, and try.x as outputs from "aaa try.s". try.a is an internal file that looks somewhat like an assembly listing. try.defs is another internal file that looks somewhat like a symbol table. These files are preserved because of possible usefulness; tmp[123] are non-preserved temporaries. try.x is the Intel-hex output. try.x.good is identical to try.x and is a saved copy for regression testing of new work.
01pgm.s is a self-programming program for a 68701, based on the one in the Motorola ap note. 01pgm.x.good is another regression-test file.
If your C library (used by awk) has broken "%02x" so it no longer means "two digits of hex, *zero-filled*" (as some SysV libraries have), you will have to fall back from aux/hex to aux/hex.argh, which does it the hard way. Oh yes, you'll note that aaa feeds settings into awk on the command line; don't assume your awk won't do this until you try it.
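A quick way to see whether a given awk zero-fills "%02x" is a one-line check like the sketch below. The helper name hex_check is made up, and the value is passed with -v, the POSIX form of a command-line setting; whether aaa itself uses that exact form is not stated here.

```sh
#!/bin/sh
# hex_check: hypothetical helper, not part of aaa.
# Formats its argument as two hex digits using awk printf.
hex_check() {
    awk -v byte="$1" 'BEGIN { printf "%02x\n", byte }'
}
hex_check 10    # a correct printf zero-fills and prints 0a
```

If the output is not zero-filled, the fallback described above (aux/hex.argh) is the route to take.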
Henry Spencer
Jesus Galan (yiyus) (yiyu DOT jgl AT gmail DOT com) has updated his markdown system.
His new md2html.awk code adds several new functionality extensions and implements numerous bug fixes.
For more on this new code, see his history of a rewrite.
Download from LAWKER.
awk -f markdown.awk file.txt > file.html
Download from LAWKER.
(Note: this code was originally called txt2html.awk by its author, but that caused a name clash inside LAWKER. Hence, I've taken the liberty of renaming it. --Timm)
The following code implements a subset of John Gruber's Markdown language: a widely used, ultra-lightweight markup language for HTML.

Level 1 Header
===============
Level 2 Header
--------------
Level 3 Header
______________
The number of leading "#" characters gives the heading level:
# Level 1 Header
#### Level 4 Header
- List item 1
- List item 2
Note: the beginning and end of a list are inferred automatically, and perhaps not always correctly.
Denoted by a number at start-of-line.
1 A numbered list item
The following code demonstrates an "exception style" of Awk programming. Note how all the processing relating to each mark-up tag is localized (except for carrying around prior text and environments). The modularity of the following code should make it easily hackable.
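# env:  the environment currently open ("none", "ul", "ol", "code", "emph").
# text: the ordinary text carried forward until something flushes it.
BEGIN {
    env = "none";
    text = "";
}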
# Images: ![alt text](address) at the start of a line.
/^!\[.+\] *\(.+\)/ {
    split($0, a, /\] *\(/);
    split(a[1], b, /\[/);
    imgtext = b[2];
    split(a[2], b, /\)/);
    imgaddr = b[1];
    print "<p><img src=\"" imgaddr "\" alt=\"" imgtext "\" title=\"\" /></p>\n";
    text = "";
    next;
}
# Links: [text](address), possibly several on one line.
/\] *\(/ {
    do {
        na = split($0, a, /\] *\(/);
        split(a[1], b, "[");
        linktext = b[2];
        nc = split(a[2], c, ")");
        linkaddr = c[1];
        text = text b[1] "<a href=\"" linkaddr "\">" linktext "</a>" c[2];
        for (i = 3; i <= nc; i++)
            text = text ")" c[i];
        for (i = 3; i <= na; i++)
            text = text "](" a[i];
        $0 = text;
        text = "";
    }
    while (na > 2);
}
# Inline code: each backquote toggles <code> on or off.
/`/ {
    while (match($0, /`/) != 0) {
        if (env == "code") {
            sub(/`/, "</code>");
            env = pcenv;
        }
        else {
            sub(/`/, "<code>");
            pcenv = env;
            env = "code";
        }
    }
}
# Emphasis: each ** toggles <emph> on or off.
/\*\*/ {
    while (match($0, /\*\*/) != 0) {
        if (env == "emph") {
            sub(/\*\*/, "</emph>");
            env = peenv;
        }
        else {
            sub(/\*\*/, "<emph>");
            peenv = env;
            env = "emph";
        }
    }
}
# Underlined headings: the text carried so far becomes the heading.
# (Plus h3 with underscores.)
/^=+$/ {
    print "<h1>" text "</h1>\n";
    text = "";
    next;
}
/^-+$/ {
    print "<h2>" text "</h2>\n";
    text = "";
    next;
}
/^_+$/ {
    print "<h3>" text "</h3>\n";
    text = "";
    next;
}
# "#" headings: the number of leading "#" gives the level (capped at 6).
/^#/ {
    match($0, /#+/);
    n = RLENGTH;
    if (n > 6)
        n = 6;
    print "<h" n ">" substr($0, RLENGTH + 1) "</h" n ">\n";
    next;
}
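# Unordered lists: a line starting with "-", "*" or "+".
# ("-" comes first in the bracket so it is a literal, not a range.)
/^[-*+]/ {
    if (env == "none") {
        env = "ul";
        print "<ul>";
    }
    print "<li>" substr($0, 3) "</li>";
    text = "";
    next;
}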
# Ordered lists: a digit at start-of-line, followed by any one character.
/^[0-9]./ {
    if (env == "none") {
        env = "ol";
        print "<ol>";
    }
    print "<li>" substr($0, 3) "</li>";
    next;
}
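# Blank line (spaces and tabs only): close any open environment
# and flush the carried text as a paragraph.
/^[ \t]*$/ {
    if (env != "none") {
        if (text)
            print text;
        text = "";
        print "</" env ">\n";
        env = "none";
    }
    if (text)
        print "<p>" text "</p>\n";
    text = "";
    next;
}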
# Default: every line not consumed above is appended to the carried text.
// {
    text = text $0;
}
# End of input: same flush as for a blank line.
END {
    if (env != "none") {
        if (text)
            print text;
        text = "";
        print "</" env ">\n";
        env = "none";
    }
    if (text)
        print "<p>" text "</p>\n";
    text = "";
}
Does not implement the full Markdown syntax.
Jesus Galan (yiyus) 2006
<yiyu DOT jgl AT gmail DOT com>
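The exception style of the Markdown code above is easier to see in miniature. The following sketch is not the author's code: it keeps only two of the "exceptions" ("#" headings and blank-line paragraph breaks) and, unlike the original, joins carried lines with a space and trims the space after the "#" marks. The function name `mini_md` is invented for the example.

```sh
mini_md() {
    awk '
    # Exception 1: a heading line is handled completely, then skipped.
    /^#/ {
        match($0, /^#+/)
        n = (RLENGTH > 6) ? 6 : RLENGTH
        title = substr($0, RLENGTH + 1)
        sub(/^[ \t]+/, "", title)
        print "<h" n ">" title "</h" n ">"
        next
    }
    # Exception 2: a blank line flushes the text carried so far.
    /^[ \t]*$/ {
        if (text != "") print "<p>" text "</p>"
        text = ""
        next
    }
    # Default: carry ordinary text forward until something flushes it.
    { text = (text == "" ? $0 : text " " $0) }
    END { if (text != "") print "<p>" text "</p>" }
    '
}

printf '## Title\nfirst line\nsecond line\n\nlast\n' | mini_md
# prints:
#   <h2>Title</h2>
#   <p>first line second line</p>
#   <p>last</p>
```

Each rule that recognizes a mark-up construct ends in `next`, so only lines that no rule claims reach the default rule. Adding a new construct means adding one more rule ahead of the default.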
gawk -f awkpp file-name-of-awk++-program
This command is platform independent and sends the translated program to standard output (stdout). See Running awk++ for variations.
This is an updated revision (#21), released August 1, 2009. In this new version:
Download awkpp21.zip from LAWKER
Awk++ is a preprocessor: it reads in a program written in the awk++ language and outputs a new program. However, it differs from awka. The output from the awk++ preprocessor is awk code, not C or an executable program, so some version of AWK, such as awk or gawk, has to be used to run the preprocessed program. If desired, awka can be used in a second step to turn the preprocessed awk++ program into an executable.
The awk++ language provides object oriented programming for AWK that includes:
Awk++ adds new keywords to standard Awk:
a = class1.new[(optional parameters)]    *** similar to Ruby
b = a.get("aProperty")
a.delete
class class1 {
    property aProperty
    method new([optional parameters]) {
        # put initialization stuff here
    }
    method get(propName) {
        if (propName == "aProperty")
            return aProperty   ### Note the use of 'return'. It behaves
                               ### exactly the same as in an AWK function.
    }
}
To define a class (similar to C++ but no public/private):
class class_name {.....}
To define a class with inheritance:
class class_name : inherited_class_name [ : inherited_class_name...] {.....}
To add local/private variables (persistent variables; syntax is unique to awk++):
class class_name {
    attribute|attr|property|prop|element|elem|variable|var variable_name
    .....
}
To help programmers who are used to other OO languages, "attribute", "property", "element", and "variable", along with their 4-letter abbreviations, are interchangeable.
Note: these persistent variables cannot be accessed directly. The programmer must define method(s) to return them, if their values are to be made available to code that's outside the class.
To add methods
class class_name {
    attribute variable_name1
    method method_name(parameters) {
        ...any awk code....
    }
    ..other method definitions...
}
To create an object
object_variable = class_name.new[(optional parameters)]
(runs the method named "new", if it exists; returns the object ID)
To call an object method
object_variable.method_name(parameters)
The dot isn't used for concatenation in awk/gawk, so it's a natural choice for the separator between the object and method.
To reclaim the memory used by an object, use the delete method:
object_variable.delete
Do not define delete() in your own classes, though. awk++ recognizes delete() as a special method and will take care of deleting the object. Deleting objects is only necessary if they hold a lot of data; the overhead for the objects themselves is insignificant.
OO syntax goals:
The OO syntax is based partly on C++, partly on Javascript, partly on Ruby and partly on the book "The Object-Oriented Thought Process". It isn't lifted in toto from one language because other languages provide features that gawk can't accomplish or have syntax that is hard to parse.
In awk++, if a method is called that isn't in the object's class and there are inherited classes (superclasses) specified, the inherited classes are called in left to right order until one of them returns a value. That value becomes the result of the method call. This is the way awk++ resolves the diamond problem. As a programmer, you control the sequence in which superclasses are called by the left to right order of the list of inherited classes in the class definition.
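The left-to-right lookup can be pictured with a small simulation in plain awk. This is only an illustration of the ordering rule, not how awkpp generates code. The class names (`Child`, `Left`, `Right`, `Base`), the `parents` and `impl` tables and the `resolve` function are all invented for the example. It also assumes the rule is applied recursively to each superclass, and it treats "the class defines the method" as a stand-in for "the call returns a value".

```sh
resolve_demo() {
    awk '
    # Return the name of the class that answers the call: the class
    # itself first, then its superclasses, searched depth-first in the
    # left-to-right order in which they were declared.
    function resolve(cls, method,    n, i, list, found) {
        if ((cls, method) in impl) return cls
        n = split(parents[cls], list, ":")
        for (i = 1; i <= n; i++) {
            found = resolve(list[i], method)
            if (found != "") return found
        }
        return ""          # defined nowhere: nothing, silently
    }
    BEGIN {
        parents["Child"] = "Left:Right"   # class Child : Left : Right
        parents["Left"]  = "Base"         # a diamond: both sides
        parents["Right"] = "Base"         # inherit from Base
        impl["Right", "area"] = 1
        impl["Base",  "area"] = 1
        impl["Right", "name"] = 1
        print resolve("Child", "area")
        print resolve("Child", "name")
        print "[" resolve("Child", "missing") "]"
    }'
}

resolve_demo
# prints:
#   Base
#   Right
#   []
```

In this model the call to "area" is answered through Left (which reaches Base) before Right is ever consulted, so swapping the order to "Right:Left" in the class definition would change the answer to Right.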
There are two important things to note.
Calls to undefined methods do nothing and return nothing, silently.
The command to preprocess an awk++ program looks like this:
gawk -f awkpp file-name-of-awk++-program
or, if the "she-bang" line (line 1 in awkpp) has the right path to gawk, and awkpp is executable and in a directory in PATH,
awkpp file-name-of-awk++-program
To run the output program immediately,
gawk -f awkpp -r file-name-of-awk++-program [awk options] data-files-to-be-processed
or
awkpp -r file-name-of-awk++-program [awk options] data-files-to-be-processed
When running an awk++ program immediately, standard input (stdin) cannot be used for data. One or more data file paths must be listed on the command line.
There is a bug in the standard AWK distributions that affects the preprocessor. Additionally, the preprocessor uses the optional third (array) argument of the match() function, which is a gawk extension. So, it's best to use GAWK to run the preprocessor.
On the other hand, the AWK code created by translating awk++ is intended to work with all versions of AWK. If you find otherwise, please notify the developer(s).
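For readers who have not met it, the array form of match() that ties the preprocessor to gawk looks like the sketch below. The regular expression and the function name `parse_method` are invented for this example and are not taken from awkpp. The input line is the method header from the class example earlier on this page.

```sh
parse_method() {
    echo 'method get(propName) {' |
    gawk '{
        # m[0] holds the whole match; m[1] and m[2] hold the two groups.
        if (match($0, /method +([A-Za-z_]+)\(([^)]*)\)/, m))
            print m[1] " takes " m[2]
    }'
}

if command -v gawk >/dev/null 2>&1; then
    parse_method    # prints: get takes propName
else
    echo "gawk not found" >&2
fi
```

Other awks accept only the two-argument match(), which sets RSTART and RLENGTH but does not capture the parenthesized groups.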
Copyright (c) 2008, 2009 Jim Hart, jhart@mail.avcnet.org All rights reserved. The awk++ code is licensed under the GNU Public license (GPL) any version. awk++ documentation, including this page, may be copied only in unmodified form, subject to fair use guidelines.
ooc is an awk program which reads class descriptions and performs the routine coding tasks necessary to do object-oriented coding in ANSI C.
The tool is exceptionally well documented in Object oriented programming with ANSI-C.
Download a 2002 copy of this code from LAWKER.
Or go to the author's web site.
ooc is a technique to do object-oriented programming (classes, methods, dynamic linkage, simple inheritance, polymorphisms, persistent objects, method existence testing, message forwarding, exception handling, etc.) using ANSI-C.
ooc is a preprocessor to simplify the coding task by converting class descriptions and method implementations into ANSI-C as required by the technique. You implement the algorithms inside the methods and the ooc preprocessor produces the boilerplate.
ooc consists of a shell script driving a modular awk script (with provisions for debugging), a set of reports -- code generation templates -- interpreted by the script, and the source of a root class to provide basic functionality. Everything is designed to be changed if desired. There are manual pages, lots of examples, among them a calculator based on curses and X11, and you can ask me about the book.
ooc as a technique requires an ANSI-C system -- classic C would necessitate substantial changes. The preprocessor needs a healthy Bourne-Shell and "new" awk as described in Aho, Weinberger, and Kernighan's book.
ooc was developed primarily to teach about object-oriented programming without having to learn a new language. If you see how it is done in a familiar setting, it is much easier to grasp the concepts and to know what miracles to expect from the technique and what not. Conceivably, the preprocessor can be used for production programming but this was not the original intent. Being able to roll your own object-oriented coding techniques has its possibilities, however...
Most sources should be viewed with tab stops set at 4 characters.
The original system ran on NeXTSTEP 3.2 and older, ESIX (System V) 4.0.4, and Linux 0.99.pl4-49. This rerelease was tested on MacOS X version 10.1.2 and Solaris version 5.8. You need to review paths in the script 'ooc/ooc' before running anything. Make sure the first line of this script points to a Bourne-style shell. Also make sure that the first line of '09/munch' points to a (new) awk.
The rereleased 'ooc' awk-programs have been tested with GNU awk versions 3.0.1 and 3.0.3. Previous versions did not support AWKPATH properly (but this is not essential).
The makefiles could be smarter but they are naive enough for all systems. This is a heterogeneous system -- set the environment variable $OSTYPE to an architecture-specific name. 'make' in the current directory will create everything by calling 'make' in the various subdirectories. Each 'makefile' includes 'make/Makefile.$OSTYPE'; review your 'make/Makefile.$OSTYPE' before you start.
The following make calls are supported throughout:
make [all]     create examples
make test      [make and] run examples
make clean     remove all but sources
make depend    make dependencies (if makefile.$OSTYPE supports it)
Make dependencies can be built with the -MM option of the GNU C compiler. They are stored in a file 'depend' in each subdirectory. They should apply to all systems. 'makefile.$OSTYPE' may include a target 'depend' to recreate 'depend' -- check 'makefile.darwin1.4' for an example.
The following is a walk through the file hierarchy in the order of the book:
Copyright (c) 1993
While you may use this software package, neither I nor my employers can be made responsible for whatever problems you might cause or encounter.
While you may give away this package and/or software derived with it, you should not charge for it, you should not claim that ooc is your work, and I have published my own book about ooc before you did.
The same restrictions apply to whoever might get this package from you.