Awk.Info

"Cause a little auk awk
goes a long way."

categories: Funky,Mar,2009,Timm

Funky: Functional Gawk

These pages are focused on Functional Gawk (a.k.a. "Funky").

Funky is enabled by a new feature added to Gawk 3.2: indirect functions. For example:

function foo() { print "foo" }
function bar() { print "bar" }

BEGIN {
                the_func = "foo"
                @the_func()     # calls foo()
                the_func = "bar"
                @the_func()     # calls bar()
}

At the time of this writing, Gawk 3.2 is pre-release and indirect functions can be accessed using the gawk-devel CVS tree:

cvs -d:pserver:anonymous@cvs.sv.gnu.org:/sources/gawk co gawk-devel

categories: Funky,Mar,2009,Timm

The Functional Challenge

Indirect functions enable a new view on library management in Gawk and, perhaps, a way to emulate functional abstraction in languages like Lisp.

So, anyone care to try, say:


categories: Funky,Tips,Mar,2009,ArnoldR

Super-For Loops

In this exchange from comp.lang.awk, Jason Quinn discusses his super-for loop trick. Arnold Robbins then chimes in to say that, with indirect functions, super-for loops could become a generic tool.

Jason Quinn writes:

  • Frequently when programming, situations arise for me where I need a nested number of for-loops. Such a case arose for me again just recently while I was inventing a dice game. Anyway, here is the implementation that I ended up using to create a "super-for" loop in AWK (a little trickier than C).
  • This simple example merely lists all possible outcomes of rolling 4, 6, 8, 10, 12, and 20 sided dice at once. A super-for loop requires an array to specify the loop indices... here we have 6 dice and the number of sides determines the indices. The code is easily modified for an arbitrary number of dice (which is the whole point).
  • I identify three parts of a super-for which I called the prologue, body, and epilog. Under most circumstances, I think the main body only would get used.
  • For example:
    #shows an example of a superfor loop
    BEGIN {
    	#define loop maximums
    	loopmax[1]=4
    	loopmax[2]=6
    	loopmax[3]=8
    	loopmax[4]=10
    	loopmax[5]=12
    	loopmax[6]=20
    	#call the loop
    	superfor(6)
    }
    function superfor(loopdepth, zz) { # zz is a local variable
            currloopnum++
    
            #start of prologue
            #end of prologue
    
            for(loopcounter[currloopnum]=1; 
                loopcounter[currloopnum]<=loopmax[currloopnum]; 
                loopcounter[currloopnum]++) {
                    if ( loopdepth==1 ) {
                            #start of superfor body
                            for (zz=1;zz<=currloopnum;zz++) {
                                    printf loopcounter[zz] FS
                                    }
                            print ""
                            #end of superfor body
                            }
                    else if ( loopdepth>1 )
                            superfor(loopdepth-1)
                    }
    
            #start of epilog
            #end of epilog
    
            loopdepth++ ; currloopnum--
            }
    

Arnold Robbins replies:

  • I think this would make a great application for indirect function calls. For example:
    function superfor(loopdepth, prologue, body, epilogue,     zz)
    {
            currloopnum++
    
            @prologue()
    
            for(loopcounter[currloopnum]=1; 
                loopcounter[currloopnum]<=loopmax[currloopnum]; 
                loopcounter[currloopnum]++) {
                    if ( loopdepth==1 ) {
                            @body()
                    }
                    else if ( loopdepth>1 )
                            superfor(loopdepth-1, prologue, 
                                     body, epilogue)
                    }
    
            @epilogue()
    
            loopdepth++ ; currloopnum--
    }
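
    As a rough illustration of how the generic version might be called, here is a minimal sketch. It assumes a gawk build that supports indirect function calls; the names noop and show_roll and the two small loopmax values are invented for this example and do not come from the original thread.

```awk
# Sketch only: needs a gawk with indirect calls (@name()).
# noop, show_roll, and the two-dice setup are illustrative assumptions.
function noop() { }                      # stands in for an empty prologue/epilogue

function show_roll(   zz, line) {        # body: print the current loop counters
    for (zz = 1; zz <= currloopnum; zz++)
        line = line loopcounter[zz] (zz < currloopnum ? FS : "")
    print line
}

function superfor(loopdepth, prologue, body, epilogue) {
    currloopnum++
    @prologue()
    for (loopcounter[currloopnum] = 1;
         loopcounter[currloopnum] <= loopmax[currloopnum];
         loopcounter[currloopnum]++) {
        if (loopdepth == 1)
            @body()
        else if (loopdepth > 1)
            superfor(loopdepth - 1, prologue, body, epilogue)
    }
    @epilogue()
    currloopnum--
}

BEGIN {
    loopmax[1] = 2; loopmax[2] = 3       # two small "dice"
    superfor(2, "noop", "show_roll", "noop")
}
```

    With these two loop maximums the body runs six times, once for each combination from "1 1" through "2 3".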
    

categories: Funky,Mar,2009,Timm

Functional Enumeration in Gawk 3.1.7

Synopsis

all( fun, array [,max])

collect( fun, array1, array2 [,max])

select( fun, array1, array2 [,max])

reject( fun, array1, array2 [,max])

detect( fun, array [,max])

inject( fun, array, carry [,max])

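collect, select, and reject return the size of array2. detect returns the first matching item (or Fail), inject returns the final carry, and all returns nothing.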

Description

An interesting new feature in Gawk 3.1.7 is indirect functions. This allows a function name to be stored in a variable, passed as an argument to another function, and called using the syntax

@fun(arg1,arg2,...)    

This enables a new kind of functional programming style in Gawk. For example, generic enumeration patterns can be coded once, then called many different ways with different function names passed as arguments.

This document illustrates this style of programming.
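
As a small warm-up, the following sketch shows a function name travelling through an ordinary argument before being called. It assumes a gawk with indirect function calls; twice, shout, and whisper are hypothetical names used only here.

```awk
# Sketch only: assumes a gawk with indirect function calls.
# twice(), shout(), and whisper() are made-up names for illustration.
function shout(s)   { return toupper(s) "!" }
function whisper(s) { return tolower(s) "..." }

# Apply the function named by fun two times in a row.
function twice(fun, x,   tmp) {
    tmp = @fun(x)
    return @fun(tmp)
}

BEGIN {
    print twice("shout", "awk")      # AWK!!
    print twice("whisper", "AWK")    # awk......
}
```

The caller decides which function runs simply by passing a different string, which is the idea the enumerators below build on.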

Enumerators

For example, here are some standard enumeration functions:

all(fun,array [,max])

Applies the function fun to every item in the array. If max is given, the items are visited in the order i=1 .. max; otherwise we use for(i in a), which visits them in an unspecified order.

collect(fun,array1,array2 [,max])

Applies fun to each item in array1 and collects the results in array2.

select(fun,array1,array2 [,max])

Find all the items in array1 that satisfy fun and add them to array2.

reject(fun,array1,array2 [,max])

Find all the items in array1 that do not satisfy fun and add them to array2.

detect(fun,array [,max])

Return the first item found in array that satisfies fun. If no such item is found, then return the magic global value Fail.

inject(fun,array,carry [,max])

(This one is a little tricky.) The result of applying fun to each item in array is carried into the processing of the next item. Initially, the carried value is carry. This function returns the final carry.
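
To make the carrying behaviour of inject more concrete, here is a hedged sketch that prints the carry at each step. It assumes a gawk with indirect function calls; mult_trace is a hypothetical tracing variant of a multiply function, and this cut-down inject always expects max.

```awk
# Sketch only: a cut-down inject (max is required here) and a tracing
# multiply function; both are illustrative, not the library versions.
function mult_trace(x, y) {
    print "item=" x " carry=" y " -> " (x * y)
    return x * y
}

function inject(fun, a, carry, max,   i) {
    for (i = 1; i <= max; i++)
        carry = @fun(a[i], carry)    # result becomes the next carry
    return carry
}

BEGIN {
    n = split("1 2 3 4 5", arr)
    print "result: " inject("mult_trace", arr, 1, n)
}
```

Starting from a carry of 1, the trace lines show the carry growing through 1, 2, 6, 24, and finally 120, which is the value inject returns.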

Sample Functions

To illustrate the above, consider the following functions. Each is applied to one array item at a time (mult also receives the carried value).

function odd(x)    { return (x % 2) == 1 }
function show(x)   { print "[" x "]" }
function mult(x,y) { return x * y }
function halve(x)  { return x/2 }

Using the Functions

  • All-ing...
  • function do_all(   arr) { 
        split("22 23 24 25 26 27 28",arr)
        all("show",arr)
    }
    

    When we run this ...

    eg/enum1

    gawk317="$HOME/opt/gawk/bin/gawk"
    $gawk317 -f ../enumerate.awk --source 'BEGIN { do_all() }'
    

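    we see every item in arr printed using the above show function. Because do_all calls all without max, the items are visited with for(i in a), so their order is not guaranteed ...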

    eg/enum1.out

    [25]
    [26]
    [27]
    [28]
    [22]
    [23]
    [24]
    
  • Collect-ing...
  • function do_collect(        max,arr1,arr2,i) {
        max=split("22 23 24 25 26 27 28",arr1)
        collect("halve",arr1,arr2,max)
        for(i=1;i<=max;i++) print arr2[i]
    }
    

    When we run this ...

    eg/enum2

    gawk317="$HOME/opt/gawk/bin/gawk"
    $gawk317 -f ../enumerate.awk --source 'BEGIN { do_collect() }'
    

    we see every item in arr1 halved ...

    eg/enum2.out

    11
    11.5
    12
    12.5
    13
    13.5
    14
    
  • Select-ing...
  • function do_select(        all,less,arr1,arr2,i) {
        all  = split("22 23 24 25 26 27 28",arr1)
        less = select("odd",arr1,arr2,all)
        for(i=1;i<=less;i++) print arr2[i]
    }
    

    When we run this ...

    eg/enum3

    gawk317="$HOME/opt/gawk/bin/gawk"
    $gawk317 -f ../enumerate.awk --source 'BEGIN { do_select() }'
    

    we see every item in arr1 that satisfies odd....

    eg/enum3.out

    23
    25
    27
    
  • Reject-ing...
  • function do_reject(        all,less,arr1,arr2,i) {
        all  = split("22 23 24 25 26 27 28",arr1)
        less = reject("odd",arr1,arr2,all)
        for(i=1;i<=less;i++) print arr2[i]
    }
    

    When we run this ...

    eg/enum4

    gawk317="$HOME/opt/gawk/bin/gawk"
    $gawk317 -f ../enumerate.awk --source 'BEGIN { do_reject() }'
    

    we see every item in arr1 that does not satisfy odd....

    eg/enum4.out

    22
    24
    26
    28
    
  • Detect-ing
  • function do_detect(        all,arr1) {
        all  = split("22 23 24 25 26 27 28",arr1)
        print detect("odd",arr1,all)   
    }
    

    When we run this ...

    eg/enum5

    gawk317="$HOME/opt/gawk/bin/gawk"
    $gawk317 -f ../enumerate.awk --source 'BEGIN { do_detect() }'
    

    we see the first item in arr1 that satisfies odd....

    eg/enum5.out

    23
    
  • Inject-ing...
  • function do_inject(        all,less,arr1,arr2,i) {
        split("1 2 3 4 5",arr1)
        print inject("mult",arr1,1)
    }
    

    When we run this ...

    eg/enum6

    gawk317="$HOME/opt/gawk/bin/gawk"
    $gawk317 -f ../enumerate.awk --source 'BEGIN { do_inject() }'
    

    we see the product of all the items in arr1.

    eg/enum6.out

    120
    

Code

Note one design principle in the following: any newly generated arrays have indexes 1..max where max is the number of elements in that array.

all

function all (fun,a,max,   i) {
	if (max) 
		for(i=1;i<=max;i++) @fun(a[i]) 
	else  
		for(i in a) @fun(a[i])
}

collect

function collect (fun,a,b,max,   i,n) {
	if (max)
	    for(i=1;i<=max;i++) {n++; b[i]= @fun(a[i]) }
	else
	    for(i in a) {n++; b[i]= @fun(a[i])}
	return n
}

select

function select (fun,a,b,max,   i,n) {
	if (max)
		for(i=1;i<=max;i++) {
		    if (@fun(a[i])) {n++; b[n]= a[i] }}
	else
		for(i in a) {
		    if (@fun(a[i])) {n++; b[n]= a[i] }}
	return n
}

reject

function reject (fun,a,b,max,   i,n) {
	if (max)
		for(i=1;i<=max;i++) {
		    if (! @fun(a[i])) {n++; b[n]= a[i] }}
	else
		for(i in a) {
		    if (! @fun(a[i])) {n++; b[n]= a[i] }}
	return n
}

detect

BEGIN {Fail="someUnLIKELYSymbol"}
function detect (fun,a,max,   i) {
	if (max)
		for(i=1;i<=max;i++) {
			if (@fun(a[i])) return a[i] }
	else	
		for(i in a) {
			if (@fun(a[i])) return a[i] }
	return Fail
}
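
Since detect signals "nothing found" by returning Fail, callers may want to test for that value before using the result. A minimal sketch, assuming a gawk with indirect function calls; report is a hypothetical helper and this detect is a shortened copy that always expects max.

```awk
# Sketch only: report() is a made-up helper; detect() is a shortened copy
# of the function above that requires max.
BEGIN { Fail = "someUnLIKELYSymbol" }    # same sentinel as above

function odd(x) { return (x % 2) == 1 }

function detect(fun, a, max,   i) {
    for (i = 1; i <= max; i++)
        if (@fun(a[i])) return a[i]
    return Fail
}

function report(list,   arr, n, found) {
    n = split(list, arr)
    found = detect("odd", arr, n)
    if (found == Fail)
        print "no odd item in: " list
    else
        print "first odd item: " found
}

BEGIN {
    report("22 24 26")    # no odd item in: 22 24 26
    report("22 23 24")    # first odd item: 23
}
```

The sentinel only works if no real array item can equal it, which is why the library picks an unlikely string.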

inject

function inject (fun,a,carry,max,   i) {
	if (max)
		for(i=1;i<=max;i++)
			 carry = @fun(a[i],carry) 
	else
		for(i in a)
			 carry = @fun(a[i],carry) 
	return carry
}

Bugs

The above code does not pass around any state information that the fun functions can use. So all their deliberations are either with the current array values (integers or strings) or with global state. It might be worthwhile writing new versions of the above with one more argument, to carry that state.
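
One possible shape for such a version is sketched below. It assumes a gawk with indirect function calls; select_with and bigger_than are hypothetical names that are not part of the library, and max is required to keep the sketch short.

```awk
# Sketch only: select_with() and bigger_than() are hypothetical names,
# not part of the library above.
function bigger_than(x, limit) { return (x > limit) }

# Like select, but passes one extra "state" argument through to fun.
function select_with(fun, a, b, state, max,   i, n) {
    n = 0
    for (i = 1; i <= max; i++)
        if (@fun(a[i], state)) b[++n] = a[i]
    return n
}

BEGIN {
    all  = split("22 23 24 25 26 27 28", arr1)
    less = select_with("bigger_than", arr1, arr2, 25, all)
    for (i = 1; i <= less; i++) print arr2[i]    # 26, 27, 28
}
```

Here the state is a single number, but the same slot could carry an array if a predicate needs more context.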

Author

Tim Menzies