These pages are focused on Functional Gawk (a.k.a. "Funky").
Funky is enabled by a new feature added to Gawk 3.2: indirect functions. For example:
function foo() { print "foo" }
function bar() { print "bar" }
BEGIN {
    the_func = "foo"
    @the_func()   # calls foo()
    the_func = "bar"
    @the_func()   # calls bar()
}
At the time of this writing, Gawk 3.2 is pre-release and indirect functions can be accessed using the gawk-devel CVS tree:
cvs -d:pserver:anonymous@cvs.sv.gnu.org:/sources/gawk co gawk-devel
Indirect functions enable a new view on library management in Gawk and, perhaps, a way to emulate functional abstraction in languages like Lisp.
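As a minimal sketch of what passing function names around looks like, consider the following. The names apply_to, double and square are made up for illustration, and a Gawk build with indirect function support is assumed.

```awk
# Hypothetical helpers; any one-argument function would do.
function double(x) { return 2 * x }
function square(x) { return x * x }

# apply_to() is written once and is told, by name, which function to call.
function apply_to(fun, x) { return @fun(x) }

BEGIN {
    print apply_to("double", 7)   # prints 14
    print apply_to("square", 7)   # prints 49
}
```

The caller chooses the behaviour by passing a string, which is roughly the role a function value plays in Lisp.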
So, anyone care to try, say:
In this exchange from comp.lang.awk, Jason Quinn discusses his super-for loop trick. Arnold Robbins then chimes in to say that, with indirect functions, super-for loops could become a generic tool.
Jason Quinn writes:
#shows an example of a superfor loop
BEGIN {
    #define loop maximums
    loopmax[1]=4
    loopmax[2]=6
    loopmax[3]=8
    loopmax[4]=10
    loopmax[5]=12
    loopmax[6]=20
    #call the loop
    superfor(6)
}
function superfor(loopdepth,   zz) { # zz is a local variable
    currloopnum++
    #start of prologue
    #end of prologue
    for(loopcounter[currloopnum]=1;
        loopcounter[currloopnum]<=loopmax[currloopnum];
        loopcounter[currloopnum]++) {
        if ( loopdepth==1 ) {
            #start of superfor body
            for (zz=1;zz<=currloopnum;zz++) {
                printf loopcounter[zz] FS
            }
            print ""
            #end of superfor body
        }
        else if ( loopdepth>1 )
            superfor(loopdepth-1)
    }
    #start of epilogue
    #end of epilogue
    loopdepth++ ; currloopnum--
}
Arnold Robbins replies:
function superfor(loopdepth, prologue, body, epilogue,   zz)
{
    currloopnum++
    @prologue()
    for(loopcounter[currloopnum]=1;
        loopcounter[currloopnum]<=loopmax[currloopnum];
        loopcounter[currloopnum]++) {
        if ( loopdepth==1 ) {
            @body()
        }
        else if ( loopdepth>1 )
            superfor(loopdepth-1, prologue,
                     body, epilogue)
    }
    @epilogue()
    loopdepth++ ; currloopnum--
}
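To see how the generic version might be called, here is a small sketch. The callbacks nothing and show_counters, and the calls counter, are hypothetical and do not come from the thread; superfor is restated so the example runs on its own.

```awk
# Generic superfor, restated from the reply above: the three callbacks
# are passed by name and invoked indirectly.
function superfor(loopdepth, prologue, body, epilogue) {
    currloopnum++
    @prologue()
    for (loopcounter[currloopnum] = 1;
         loopcounter[currloopnum] <= loopmax[currloopnum];
         loopcounter[currloopnum]++) {
        if (loopdepth == 1)
            @body()
        else if (loopdepth > 1)
            superfor(loopdepth - 1, prologue, body, epilogue)
    }
    @epilogue()
    currloopnum--
}

# Hypothetical callbacks (not from the thread).
function nothing() { return }
function show_counters(   zz, line) {
    calls++                      # global: how many times the body ran
    for (zz = 1; zz <= currloopnum; zz++)
        line = line (zz > 1 ? " " : "") loopcounter[zz]
    print line
}

BEGIN {
    loopmax[1] = 2
    loopmax[2] = 3
    # Two nested loops: prints "1 1", "1 2", "1 3", "2 1", "2 2", "2 3"
    superfor(2, "nothing", "show_counters", "nothing")
}
```

Swapping in a different body only means passing a different name; superfor itself stays untouched.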
all( fun, array [,max])
collect( fun, array1, array2 [,max])
select( fun, array1, array2 [,max])
reject( fun, array1, array2 [,max])
detect( fun, array [,max])
inject( fun, array, carry [,max])
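Of these, collect, select, and reject return the size of array2; all returns nothing, detect returns the item it found (or Fail), and inject returns the final carry.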
An interesting new feature in Gawk 3.1.7 is indirect functions. This allows a function name to be held in a variable, passed as an argument to another function, and called using the syntax
@fun(arg1,arg2,...)
This enables a new kind of functional programming style in Gawk. For example, generic enumeration patterns can be coded once, then called many different ways with different function names passed as arguments.
This document illustrates this style of programming.
For example, here are some standard enumeration functions:
all: Applies the function fun to all items in the array. If called with the max argument, the items are visited in the order i=1 .. max; otherwise we use for(i in a).
collect: Applies fun to each item in array1 and collects the results in array2.
select: Finds all the items in array1 that satisfy fun and adds them to array2.
reject: Finds all the items in array1 that do not satisfy fun and adds them to array2.
detect: Returns the first item found in array that satisfies fun. If no such item is found, it returns the magic global value Fail.
inject: (This one is a little tricky.) The result of applying fun to each item in array is carried into the processing of the next item. Initially, the carried value is carry. This function returns the final carry.
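The two less obvious cases, the Fail value and the carried value, can be sketched as follows. The detect and inject shown here are simplified copies that only handle the ordered 1..max traversal, and the helpers big and add are hypothetical.

```awk
BEGIN { Fail = "someUnLIKELYSymbol" }

# Simplified copies of detect and inject (ordered 1..max traversal only),
# included so this sketch runs on its own.
function detect(fun, a, max,   i) {
    for (i = 1; i <= max; i++)
        if (@fun(a[i])) return a[i]
    return Fail
}
function inject(fun, a, carry, max,   i) {
    for (i = 1; i <= max; i++)
        carry = @fun(a[i], carry)
    return carry
}

# Hypothetical helper functions.
function big(x)    { return x > 100 }
function add(x, y) { return x + y }

BEGIN {
    n = split("22 23 24", arr)
    found = detect("big", arr, n)
    if (found == Fail) print "no item over 100"
    # inject folds left to right: add(22,0)=22, add(23,22)=45, add(24,45)=69
    print inject("add", arr, 0, n)    # prints 69
}
```

Comparing against Fail is how a caller distinguishes "nothing matched" from a legitimately empty or zero item.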
To illustrate the above, consider the following functions. Each of these is defined for one array item.
function odd(x) { return (x % 2) == 1 }
function show(x) { print "[" x "]" }
function mult(x,y) { return x * y }
function halve(x) { return x/2 }
function do_all(   arr) {
    split("22 23 24 25 26 27 28",arr)
    all("show",arr)
}
When we run this ...
eg/enum1
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_all() }'
we see every item in arr printed using the above show function ...
eg/enum1.out
[25]
[26]
[27]
[28]
[22]
[23]
[24]
function do_collect(   max,arr1,arr2,i) {
    max=split("22 23 24 25 26 27 28",arr1)
    collect("halve",arr1,arr2,max)
    for(i=1;i<=max;i++) print arr2[i]
}
When we run this ...
eg/enum2
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_collect() }'
we see every item in arr1 divided by two ...
eg/enum2.out
11
11.5
12
12.5
13
13.5
14
function do_select(   all,less,arr1,arr2,i) {
    all = split("22 23 24 25 26 27 28",arr1)
    less = select("odd",arr1,arr2,all)
    for(i=1;i<=less;i++) print arr2[i]
}
When we run this ...
eg/enum3
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_select() }'
we see every item in arr1 that satisfies odd ...
eg/enum3.out
23
25
27
function do_reject(   all,less,arr1,arr2,i) {
    all = split("22 23 24 25 26 27 28",arr1)
    less = reject("odd",arr1,arr2,all)
    for(i=1;i<=less;i++) print arr2[i]
}
When we run this ...
eg/enum4
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_reject() }'
we see every item in arr1 that does not satisfy odd ...
eg/enum4.out
22
24
26
28
function do_detect(   all,arr1) {
    all = split("22 23 24 25 26 27 28",arr1)
    print detect("odd",arr1,all)
}
When we run this ...
eg/enum5
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_detect() }'
we see the first item in arr1 that satisfies odd ...
eg/enum5.out
23
function do_inject(   arr1) {
    split("1 2 3 4 5",arr1)
    print inject("mult",arr1,1)
}
When we run this ...
eg/enum6
gawk317="$HOME/opt/gawk/bin/gawk"
$gawk317 -f ../enumerate.awk --source 'BEGIN { do_inject() }'
we see the result of multiplying together every item in arr1, starting from the initial carry of 1.
eg/enum6.out
120
Note one design principle in the following: any newly generated arrays have indexes 1..max where max is the number of elements in that array.
function all(fun,a,max,   i) {
    if (max)
        for(i=1;i<=max;i++) @fun(a[i])
    else
        for(i in a) @fun(a[i])
}
function collect(fun,a,b,max,   i,n) {
    # n is declared local so the count restarts at zero on every call
    if (max)
        for(i=1;i<=max;i++) {n++; b[i]= @fun(a[i]) }
    else
        for(i in a) {n++; b[i]= @fun(a[i])}
    return n
}
function select(fun,a,b,max,   i,n) {
    if (max)
        for(i=1;i<=max;i++) {
            if (@fun(a[i])) {n++; b[n]= a[i] }}
    else
        for(i in a) {
            if (@fun(a[i])) {n++; b[n]= a[i] }}
    return n
}
function reject(fun,a,b,max,   i,n) {
    if (max)
        for(i=1;i<=max;i++) {
            if (! @fun(a[i])) {n++; b[n]= a[i] }}
    else
        for(i in a) {
            if (! @fun(a[i])) {n++; b[n]= a[i] }}
    return n
}
BEGIN {Fail="someUnLIKELYSymbol"}
function detect(fun,a,max,   i) {
    if (max)
        for(i=1;i<=max;i++) {
            if (@fun(a[i])) return a[i] }
    else
        for(i in a) {
            if (@fun(a[i])) return a[i] }
    return Fail
}
function inject(fun,a,carry,max,   i) {
    if (max)
        for(i=1;i<=max;i++)
            carry = @fun(a[i],carry)
    else
        for(i in a)
            carry = @fun(a[i],carry)
    return carry
}
The above code does not pass around any state information that the fun functions can use. So all their deliberations are either with the current array values (integers or strings) or with global state. It might be worthwhile writing new versions of the above with one more argument, to carry that state.
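One way such a variant might look is sketched below. The name select_with, its argument order, and the predicate above are all assumptions made for illustration; only the ordered 1..max traversal is handled.

```awk
# select_with: like select, but hands an extra `state` value to fun.
# Hypothetical name and signature; ordered 1..max traversal only.
function select_with(fun, a, b, state, max,   i, n) {
    n = 0
    for (i = 1; i <= max; i++)
        if (@fun(a[i], state)) b[++n] = a[i]
    return n
}

# The predicate reads its threshold from state rather than from a global.
function above(x, threshold) { return x > threshold }

BEGIN {
    all = split("22 23 24 25 26 27 28", arr1)
    less = select_with("above", arr1, arr2, 25, all)
    for (i = 1; i <= less; i++) print arr2[i]    # prints 26, 27, 28
}
```

Because the threshold travels as an argument, the same predicate can be reused with different cut-offs without touching any global variable.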