"Cause a little auk awk
goes a long way."


Mar 01: Michael Sanders demos an X-windows GUI for AWK.

Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK

Feb 28: Tim Menzies asks this community to write an AWK cookbook.

Feb 28: Arnold Robbins announces a new debugger for GAWK.

Feb 28: Awk100#23: Premysl Janouch offers an IRC bot, in AWK

Feb 28: Updated: the AWK FAQ

Feb 28: Tim Menzies offers a tiny content management system, in Awk.

Jan 31: Comment system added. For example, see the discussion at the bottom of ?keys2awk

Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).

Jan 31: The AWK FAQ is being updated. For comments/ corrections/ extensions, please mail

Jan 31: Martin Cohen finds Awk on the Android platform.

Jan 31: Aleksey Cheusov released a new version of runawk.

Jan 31: Hirofumi Saito contributes a candidate Awk mascot.

Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.

Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.


categories: Runawk,Project,Tools,Sept,2009,AlexC

New release: RUNAWK 0.17

What is RUNAWK?

RUNAWK is a small wrapper for the AWK interpreter that helps one write standalone AWK scripts. Its main feature is a module/library system for AWK which is somewhat similar to Perl's "use" command. It also allows you to select a preferred AWK interpreter and to set up the environment for your scripts. RUNAWK makes programming AWK easy and efficient. RUNAWK also provides many useful AWK modules.


Major Changes

Version 0.17.0, by Aleksey Cheusov, Sat, 12 Sep 2009


  • ADDED: new option for runawk for #use'ing modules: -f. runawk can also be used for one-liners! ;-)
          runawk -f abs.awk -e 'BEGIN {print abs(-123); exit}'
  • In multiline code passed to runawk using option -e, spaces are allowed before #directives.
  • Now that the alt_getopt.awk module exists, there is no reason for the heuristic that detects whether to add `-' to AWK arguments or not, so I've removed it. Use the alt_getopt.awk module or another "smart" module to handle options correctly!

alt_getopt.awk and power_getopt.awk:

  • FIX: for "abc:" short options specifier BSD and GNU getopt(3) accept "-acb" and understand it as "-a -cb", they also accept "-ac b" and also translate it to "-a -cb". Now alt_getopt.awk and power_getopt.awk work the same way.


  • -h option doesn't print usage information, --help (and its short synonym) does.

New modules:

  • shquote.awk, implementing shquote() function.
      `shquote' transforms the string `str' by adding shell escape and quoting characters so that it can be passed as an argument to the system() and popen() functions and still have the correct value after being evaluated by the shell.
    Inspired by NetBSD's shquote(3) from libc.
  • runcmd.awk, implementing functions runcmd1() and xruncmd1()
    runcmd1(CMD, OPTS, FILE):
      wrapper for function system() that runs a command CMD with options OPTS and one filename FILE. Unlike system(CMD " " OPTS " " FILE), the function runcmd1() correctly handles FILE and CMD containing spaces, single quotes, double quotes, tildes, etc.
    xruncmd1(FILE):
      safe wrapper for runcmd1(). awk exits with an error if running the command failed.
  • isnum.awk, implementing trivial isnum() function, see the source code.
  • alt_join.awk, implementing the following functions:
    join_keys(HASH, SEP):
      returns string consisting of all keys from HASH separated by SEP.
    join_values(HASH, SEP):
      returns string consisting of all values from HASH separated by SEP.
    join_by_numkeys (ARRAY, SEP [, START [, END]]):
      returns string consisting of all values from ARRAY separated by SEP. Indices from START (default: 1) to END (default: +inf) are analysed. Collection stops at the first index absent from ARRAY.
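
To make the quoting problem concrete, here is a minimal sketch of the single-quote technique in plain awk. The function sq() and the shell wrapper shquote_demo are hypothetical names invented for this illustration; this is not the code shipped in shquote.awk, only an approximation of the idea described above. It wraps the string in single quotes, rewrites each embedded single quote, and then asks the shell to echo the result back as a round-trip check.

```sh
# sq() is a hypothetical stand-in, not the shquote() shipped with runawk.
shquote_demo() {
    awk -v str="$1" '
    # Wrap s in single quotes; each embedded single quote (\047)
    # becomes the four characters: quote, backslash, quote, quote.
    function sq(s,    n, i, parts, out) {
        n = split(s, parts, "\047")
        out = parts[1]
        for (i = 2; i <= n; i++)
            out = out "\047\\\047\047" parts[i]
        return "\047" out "\047"
    }
    BEGIN {
        cmd = "printf %s " sq(str)
        cmd | getline result      # let the shell evaluate the quoted word
        close(cmd)
        print sq(str)
        print (result == str) ? "round trip ok" : "round trip FAILED"
    }'
}
shquote_demo "it's a \$HOME test"
```

Note that awk's -v option interprets backslash escapes in the value, so this demo is only suitable for test strings without backslashes.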

categories: Mascot,Sept,2009,PanosP

Killer Awk Snake

Panos Papadopoulos offers the latest entry in our Awk mascot competition:

Scary, yes?

categories: Tips,Sept,2009,EdM

The Secret WHINY_USERS Flag

(Editor's note: On Nov 30'09, Hermann Peifer found and fixed a bug in an older version of the test code at the end of this file.)

Writing in comp.lang.awk, Ed Morton reveals the secret WHINY_USERS flag.

"Nag" asked:


    I am creating a file like...

    awk '{
     printf"%4s %4s\n",$1,$2 > "file1"
    }' input

    How can I sort file1 within awk code?

Ed Morton writes:

    There's also the undocumented WHINY_USERS flag for GNU awk that allows for sorted processing of arrays:
    $ cat file
    $ gawk '{a[$0]}END{for (i in a) print i}' file
    $ WHINY_USERS=1 gawk '{a[$0]}END{for (i in a) print i}' file
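
For awks without such a flag, one portable way to get sorted output from inside an awk program is to pipe the print statements through sort(1). The sketch below assumes a POSIX awk and sort on the PATH; the wrapper name sorted_keys is made up for this example.

```sh
sorted_keys() {
    awk '
    { seen[$0] }                  # remember each distinct line as an array key
    END {
        for (k in seen)
            print k | "sort"      # traversal order is arbitrary; sort(1) orders it
        close("sort")
    }' "$@"
}
printf 'pear\napple\nfig\napple\n' | sorted_keys
```

The same pipe trick addresses Nag's original question: replacing the > "file1" redirection with | "sort > file1" would leave a sorted file behind.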

Execution Cost

Your editor coded up the following test for the runtime costs of WHINY_USERS. The following code is called twice (once with, and once without setting WHINY_USERS):

runWhin() {
WHINY_USERS=1 gawk -v M=1000000 --source '
        BEGIN { 
                M = M ? M : 50
                N = M
                print N
                while(N-- > 0) {
                        key = rand()" "rand()" "rand()" "rand()" "rand() 
                        A[key] = M - N
                }
                # The loop body is missing from this copy of the page;
                # an empty statement still forces the full traversal.
                for(i in A)
                        ;
        }'
}
runNoWhin() {
gawk -v M=1000000 --source '
        BEGIN { 
                M = M ? M : 50
                N = M
                print N
                while(N-- > 0) {
                        key = rand()" "rand()" "rand()" "rand()" "rand() 
                        A[key] = M - N
                }
                # Same traversal, without WHINY_USERS set.
                for(i in A)
                        ;
        }'
}
time runWhin
time runNoWhin

And the results? Sorting added about 15% to runtimes:

% bash

real    0m18.897s
user    0m15.826s
sys     0m2.445s

real    0m16.345s
user    0m13.469s
sys     0m2.435s

categories: Graphics,Sept,2009,TedD




My boss wants to put NOAA weather radar images in a looping presentation that is displayed as 720 video on the 1040 LCD TV in the atrium. He couldn't figure out how to download the various layers needed, so he gave me the task. Of course, I had a sample composite image for him in half an hour. It looked terrible on the TV: the writing came out as just a blur and the county and state lines (single pixel mostly) were essentially invisible. Obviously, I could make my own 'cities' overlay, but no tools I had would convert the 'counties' image to any usable vector format for line resizing.

That afternoon, I wrote a gawk script that widens the lines in a 256 color BMP version of the image - I can convert it back to a transparent background GIF later.

That script was presented on July 30, 2009. This is an updated and extended version.

The script widens lines in .bmp files to make them more visible when converted to TV video images. For the complete conversion, it is also necessary to mung the line colors to get rid of interpolated colors and to give some lines more contrast, but that is done elsewhere.

This script is gawk specific.



This function converts byte strings (binary numbers) into their corresponding numeric strings so that they can be processed as gawk numbers. The lookup table (CharString) is a global variable. This code assumes that binary numbers are big-endian (most significant byte first) - it is up to the calling program to order the bytes.

On the first use, the (global) LUT is created, then left for later use. It consists of a list of characters from \000 to \377 in order - the (index value minus 1) of a character multiplied by the power of 256 corresponding to its position in the string is the byte's numerical weight. The function doesn't care about the length of the byte string (within the integer limits of the gawk version and port).

function Bytes2Number( String,  x, y, z, Number ) {
	# Build the lookup table once: character N sits at index N + 1.
	if( !CharString ) {
		for( x = 0; x <= 255; x++ ) CharString = CharString sprintf( "%c", x )
	}
	x = split( String, Scratch, "" )
	Number = 0
	for( y = 1; y <= x; y++ ) {
		z = index( CharString, Scratch[ y ] ) -1
		# Weight each byte by its position (most significant first).
		Number = Number + z * (256^(x - y))
	}
	return Number	# Note that Number is a regular gawk scalar variable.
}
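
The byte-ordering point deserves a worked example. BMP header fields are stored little-endian (least significant byte first), so a caller has to hand the bytes over in reverse to a big-endian converter. The sketch below is not the author's lookup-table method: it swaps in od(1) to obtain each byte as a decimal number, then folds the bytes together from last to first. The name le_bytes_to_number is invented for this illustration.

```sh
le_bytes_to_number() {
    # od prints each input byte as an unsigned decimal field.
    od -An -tu1 | awk '
    {
        n = 0
        # Walk the fields from last to first so the most significant
        # byte is folded in first: n = n * 256 + byte.
        for (i = NF; i >= 1; i--)
            n = n * 256 + $i
        print n
    }'
}
# The four bytes 80 02 00 00 (hex), as a width field might appear in a file.
printf '\200\002\000\000' | le_bytes_to_number
```

Here the bytes are 128, 2, 0, 0, and 128 + 2 * 256 = 640.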


RealSize() uses a brute-force approach to factor the image size into width and height numbers that actually match the real image size. It searches around the nominal values for a pair of numbers that, when multiplied together, produce the known size of the image in pixels. The results are stored in the global variables Width and Height.

function RealSize( Wide, High, Pixels,  x, y ) {
	for( x = Wide - 5; x <= Wide +5; x++ ) {
		for( y = High - 5; y <= High + 5; y++ ) {
			if( x * y == Pixels ) {
				Width = x
				Height = y
			}
		}
	}
}


It is necessary to tell gawk to read/write the file as binary, especially under Windows where ^Z in files is a killer. Setting BINMODE to 3 will also work, but it throws error messages.

Setting FS to null causes gawk to make each byte a separate field.

Testing indicates that, in Windows at least, it is necessary to specify RS, even though it would appear redundant to set it to \n - not doing so results in the CR LF pair (0D 0A) being replaced with 0A in the output, with the loss of one byte for each occurrence. The value is arbitrary - it has been tested using one of the line colors.

BEGIN {
	BINMODE = "rw"
	FS= ""
    # The next two lines are not strictly necessary;
    # they are here for clarity.
	Header = ""
	ByteCount = 0
	RS = "\n"
}

For Each Record...

Read the file into an array. If there are multiple lines, that is, if RS appears in the file, insert the record separator back into the array at the end of each line for which RT exists.

{
	for( x = 1; x <= NF; x++ ) Bytes[ ++ByteCount ] = $(x)	
	if( RT ) { Bytes[ ++ByteCount ] = RT }
}


Closing FILENAME here allows overwriting the original file - if that is desired, comment out the next line (which creates a new filename for the output).

Regarding image parameters: Width and Height are in pixels; Depth is the number of bytes per pixel; Data is the zero based index of the actual image in the file; Size refers to the bytes in the file, not the image; ImgSize is the number of pixels in the image. Unfortunately, Width and Height may be wrong: RealSize() calculates the actual values as found from the data block.

Once the image parameters are set, the two arrays for the image can be built: one to contain an unmodified copy (A) and one to contain a copy to be modified (B). These arrays are indexed by line and dots (Height, Width); data are complete pixels. The C array is used to determine the background color: it uses the pixel data as indexes and the count of the number of copies of that pixel as values - the largest value represents the most common color, and assuming that the image is mostly background, therefore the background color. This assumption will be true for almost all line art.

When performing line widening: for each pixel that is not part of the background, copy its color to the eight surrounding pixels (the four edge neighbors and the four diagonal ones), provided that they are background. This approach prevents one line from encroaching on another, but does not prevent the ends of lines that do not intersect other lines from growing by one pixel on each pass through the program for each free end. u, v, w, and z (z has been reused) are the row and column coordinates of the pixels surrounding the one in work (defined by x and y).

END {
	if( !OutFile ) OutFile = FILENAME
	close( FILENAME )
	sub( /[bB][mM][pP]$/, "widened.bmp" Arr[1], OutFile )
	Width = Bytes2Number( Bytes[ 22 ] Bytes[ 21 ] Bytes[ 20 ] Bytes[ 19 ] )
	Height = Bytes2Number( Bytes[ 26 ] Bytes[ 25 ] Bytes[ 24 ] Bytes[ 23 ] )
	Data = Bytes2Number( Bytes[ 14 ] Bytes[ 13 ] Bytes[ 12 ] Bytes[ 11 ] )
	Size = Bytes2Number( Bytes[ 6 ] Bytes[ 5 ] Bytes[ 4 ] Bytes[ 3 ] )
	Depth = Bytes2Number( Bytes[ 30 ] Bytes[ 29 ] ) / 8
	ImgSize = Bytes2Number( Bytes[ 38 ] Bytes[ 37 ] Bytes[ 36 ] Bytes[ 35 ] )
	RealSize( Width, Height, ImgSize / Depth )
    # Output the header in its original form to the target file.
	for( x = 1; x <= Data; x++ ) Header = Header Bytes[ x ]
	printf( "%s", Header ) > OutFile
    # Build the two arrays
	for( x = 1; x <= Height; x++) {
		for( y = 1; y <= Width; y++ ) {
			S = ""
            # Values for the A & B array entries are strings of 
            # bytes representing the color of the pixel, either directly or 
            # as a pointer into a palette.
			for( z = 1; z <= Depth; z++ ) S = S Bytes[ ++Data ]
			A[x,y] = S
			B[x,y] = S
			C[ S ]++
		}
	}
	z = 0
    # Bkg is the (assumed) background color.  
    # The code is a simple maximum value loop.
	for( x in C ) {
		y = C[x]
		if( y > z ) {
			Bkg = x
			z = y
		}
	}
   # Begin the actual line widening code.
   # Pixels are compared with == and != rather than regex matching, so that
   # pixel bytes that happen to be regex metacharacters are handled safely.
	for( x = 1; x <= Height; x++) {
		for( y = 1; y <= Width; y++ ) {
			if( A[x,y] != Bkg ) {
					u = x + 1
					v = x - 1
					w = y + 1
					z = y - 1
					if( B[u,y] == Bkg ) B[u,y] = A[x,y]
					if( B[v,y] == Bkg ) B[v,y] = A[x,y]
					if( B[x,w] == Bkg ) B[x,w] = A[x,y]
					if( B[x,z] == Bkg ) B[x,z] = A[x,y]
					if( B[u,w] == Bkg ) B[u,w] = A[x,y]
					if( B[u,z] == Bkg ) B[u,z] = A[x,y]
					if( B[v,w] == Bkg ) B[v,w] = A[x,y]
					if( B[v,z] == Bkg ) B[v,z] = A[x,y]
			}
		}
	}
	for( x = 1; x <= Height; x++) {
		for( y = 1; y <= Width; y++ ) {
			printf( "%s", B[x,y] ) > OutFile
		}
	}
}

Note the final nested for loops in the above code. After the B array has been modified, the target file can be completed by reading that array out to the file pixel by pixel. The array cannot be output during processing because pixels that have already been through the processor can still be changed.
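
The widening pass is easier to see on a toy image. The following sketch applies the same two-copy idea to a small rectangular grid of text characters rather than a BMP file: A stays untouched, B is modified, and B is printed only once the pass has finished. The wrapper name widen, the default background character "." and the text-grid input are all assumptions made for this illustration; it is not a replacement for the script above.

```sh
widen() {
    awk -v bkg="${1:-.}" '
    {
        # Keep an untouched copy (A) and a working copy (B), one cell per character.
        w = length($0)            # assumes every line has the same length
        for (c = 1; c <= w; c++) A[NR, c] = B[NR, c] = substr($0, c, 1)
    }
    END {
        for (r = 1; r <= NR; r++)
            for (c = 1; c <= w; c++)
                if (A[r, c] != bkg)
                    for (dr = -1; dr <= 1; dr++)
                        for (dc = -1; dc <= 1; dc++)
                            # Only overwrite in-range background cells.
                            if (((r + dr, c + dc) in B) && B[r + dr, c + dc] == bkg)
                                B[r + dr, c + dc] = A[r, c]
        # Output happens last, because earlier cells can still change mid-pass.
        for (r = 1; r <= NR; r++) {
            line = ""
            for (c = 1; c <= w; c++) line = line B[r, c]
            print line
        }
    }'
}
printf '.....\n..#..\n.....\n' | widen
```

A single "#" in the middle of a 3 by 5 grid of dots grows into a 3 by 3 block of "#" characters, which is the one-pixel line turning into a three-pixel line.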


Ted Davis

categories: Sept,2009,Admin

Ethiopian Multiplication

Here is some Awk code from the Rosetta Code wiki that multiplies integers using only addition, doubling, and halving.


  1. Take two numbers to be multiplied and write them down at the top of two columns.
  2. In the left-hand column repeatedly halve the last number, discarding any remainders, and write the result below the last in the same column, until you write a value of 1.
  3. In the right-hand column repeatedly double the last number and write the result below. Stop when you add a result in the same row as where the left-hand column shows 1.
  4. Examine the table produced and discard any row where the value in the left column is even.
  5. Sum the values in the right-hand column that remain to produce the result of multiplying the original two numbers together.

For example: 17 x 34

       17    34
Halving the first column:
       17    34
        8
        4
        2
        1
Doubling the second column:
       17    34
        8    68
        4   136 
        2   272
        1   544
Strike-out rows whose first cell is even:
       17    34
        8    -- 
        4   --- 
        2   --- 
        1   544
Sum the remaining numbers in the right-hand column:
       17    34
        8    -- 
        4   --- 
        2   --- 
        1   544
           ====
            578
So 17 multiplied by 34, by the Ethiopian method, is 578.

The task is to define three functions/methods/procedures/subroutines:

  1. one to halve an integer,
  2. one to double an integer, and
  3. one to state if an integer is even.


function halve(x)  { return(int(x/2)) }
function double(x) { return(x*2) }
function iseven(x) { return((x%2) == 0) }

# r is declared as an extra parameter so that it is local to the function.
function ethiopian(plier, plicand,    r) {
  r = 0
  while(plier >= 1) {
    if ( !iseven(plier) ) {
      r += plicand
    }
    plier = halve(plier)
    plicand = double(plicand)
  }
  return(r)
}

BEGIN { print ethiopian(17, 34) }
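
To watch the table from the worked example being built, here is a small tracing variant. It is a sketch only: the wrapper name ethiopian_trace and the "kept"/"struck" labels are invented for this illustration, and the halving, doubling, and parity test are written inline rather than as the three required functions.

```sh
ethiopian_trace() {
    awk -v plier="$1" -v plicand="$2" '
    BEGIN {
        sum = 0
        while (plier >= 1) {
            kept = (plier % 2 != 0)          # odd left-hand value: row is kept
            printf "%6d %6d %s\n", plier, plicand, (kept ? "kept" : "struck")
            if (kept) sum += plicand
            plier = int(plier / 2)           # halve, discarding the remainder
            plicand = plicand * 2            # double
        }
        print sum
    }'
}
ethiopian_trace 17 34
```

For 17 and 34, the rows starting with 17 and 1 are kept, and the final line printed is 578 (34 + 544).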

categories: Sept,2009,Admin

A Tale of Two TAWKs

In the Awk-verse, there are two TAWKs.

TAWK #1 is the TAWK Compiler from Thompson Automation Software (no longer trading)

  • Is 100% compatible with Awk.
  • Generates executables.
  • Comes with an interactive debugger
  • Code can be written 4 to 15 times faster than in "C" and, in some test cases, runs as fast as "C", or better.

TAWK #2 was an ultra-cut-down version of AWK written in C++ by Bruce Eckel in 1989. Eckel writes:

  • The program is called TAWK for "tiny awk," since the problem it solves is vaguely reminiscent of the "awk" pattern-matching language found on Unix (versions have also been created for DOS).
  • It demonstrates one of the thornier problems in computer science: parsing and executing a programming language.
  • The data-encapsulation features of C++ prove most useful here, and a recursive-descent technique is used to read arbitrarily long fields and records.