"Cause a little auk awk
goes a long way."

 »  table of contents
 »  featured topics
 »  page tags

About Awk
 »  advocacy
 »  learning
 »  history
 »  Wikipedia entry
 »  mascot
 »  Awk (rarely used)
 »  Nawk (the-one-true, old)
 »  Gawk (widely used)
 »  Mawk
 »  Xgawk (gawk + xml + ...)
 »  Spawk (SQL + awk)
 »  Jawk (Awk in Java JVM)
 »  QTawk (extensions to gawk)
 »  Runawk (a runtime tool)
 »  platform support
 »  one-liners
 »  ten-liners
 »  tips
 »  the Awk 100
 »  read our blog
 »  read/write the awk wiki
 »  discussion news group

 »  Gawk
 »  Xgawk
 »  the Lawker library
Online doc
 »  reference card
 »  cheat sheet
 »  manual pages
 »  FAQ

 »  articles
 »  books:


Mar 01: Michael Sanders demos an X-windows GUI for AWK.

Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK

Feb 28: Tim Menzies asks this community to write an AWK cookbook.

Feb 28: Arnold Robbins announces a new debugger for GAWK.

Feb 28: Awk100#23: Premysl Janouch offers a IRC bot, In AWK

Feb 28: Updated: the AWK FAQ

Feb 28: Tim Menzies offers a tiny content management system, in Awk.

Jan 31: Comment system added to For example, see discussion bottom of ?keys2awk

Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).

Jan 31: The AWK FAQ is being updated. For comments/ corrections/ extensions, please mail

Jan 31: Martin Cohen finds Awk on the Android platform.

Jan 31: Aleksey Cheusov released a new version of runawk.

Jan 31: Hirofumi Saito contributes a candidate Awk mascot.

Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.

Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.

[More ...]

Bookmark and Share

categories: Databases,Jul,2009,Admin

Awk and Databases

These pages focus on databases and Awk.

categories: Databases,Tips,Jul,2009,VictorA

Using Awk for Databases



Download all the following example code and support data files from LAWKER

General Information


This page contains a set of sample Awk scripts to manage different kinds of databases. In all cases, we'll use a text editor such as edit.exe to create and edit the data files, and Awk scripts will be used to query and manipulate the data.

OK, so it's not a fancy GUI-based system, but this method is flexible and the scripts execute relatively quickly. Also, your data won't be locked in some company's proprietary binary file format. There is also the benefit of portability: If your PC can run DOS, you can also run these scripts on your PC. Awk is also available on Linux and on other operating systems.

This page assumes that you are already familiar with database terms like 'record', 'field', and 'search keyword'.

Introduction to Awk

Awk is an interpreted programming language that is designed for managing and converting data files and generating reports from the data.

Awk will automatically read an input file and parse it into records and fields, one record at a time. A typicall Awk script will then manipulate the fields using predefined variables like $1 (the first field), $2 (the second field), etc.

To use Awk, you create an Awk script, and then run it with the Awk program (gawk.exe in this case). Many Awk scripts are small, and it lends itself to writing "one-time use" programs.

Using the Scripts

All the files on this page are available in the ZIP archive at this link. Feel free to reuse and customize them.

You will need the GNU Awk program gawk.exe to be installed on your QuickPAD Pro. See the programming page for instructions on installing GNU Awk.

Here is the general format of a gawk command line:

where SCRIPT is the name of the file that contains the Awk script and DATAFILE is the name of the text file that contains the input data.

That command line will not modify the input file and all the output will be directed to the screen.

If a script creates a new data file (for example, a sort script), the command line will be:

where NEWFILE is the name of the new data file that will be created.

If you use a particular script often and get tired of typing in a long command line, you can create a batch file to execute the long command line for you.

are currently limited to 64K files for our data. We can work around this restriction by using the chop utility program that is described in the software page.

Index Card Databases

Card File

In this section we demonstrate some Awk scripts to manage This type of database can be used for any type of simple text lists, like lists of books, music CDs, recipes, quotations, etc.

Our information will be stored into 'cards'. Each card will have a 'title' and a 'body':

	Title of Card
	Free-formatted field of 
	information about this 
	particular card, but
	without any blank lines.
Let's take this information and store it in a text file. To keep things simple, the cards within the file are separated with a blank line, and the first line of each card will be the title.

For example, let's create a sample card file called 'cards.txt' and use it to store a list of our goals.

	Write a book and become famous
	This is a long range
	goal. I need a good book
	idea first. And writing

	Solve the problems of society
	This might take
	a little longer
	than expected.

	Take out the garbage
	It's stinking up
	the garage.

Let's begin with an Awk script to print out the titles of all the cards in the file. Here is the script called 'titles':

	# titles - Print the titles of all the cards in the
	# index card file.

	BEGIN { RS = ""; FS = "\n" }
	        { print $1 }

Here is a sample run:

	[B:\] gawk -f titles cards.txt
	Write a book and become famous
	Solve the problems of society
	Take out the garbage

Another useful script is one that can be used for searching the data file, ignoring uppercase and lowercase distinctions. The following script called 'search' will display the cards that contain the keyword 'write'.

	# search - Print the index card that contains a string

	BEGIN   { RS = ""; FS = "\n"; IGNORECASE=1 }

	/write/ { print $0, "\n" }

Here is a sample run:

	[B:\] gawk -f search cards.txt
	Write a book and become famous
	This is a long range
	goal. I need a good book
	idea first. And writing


To search for other strings, edit the 'search' script and replace 'write' with another search keyword.

Sorting the cards based on the titles would also be a useful operation. Here is a script called 'sort' which reads the entire data file into and array and then uses the QuickSort algorithm to sort it:

	# sort - Sort index card file by the card titles

	BEGIN { RS = ""; FS = "\n" }

	      { A[NR] = $0 } 

	END   {
		qsort(A, 1, NR)
		for (i = 1; i <= NR; i++) {
			print A[i]
			if (i == NR) break
			print ""

	# QuickSort
	# Source: "The AWK Programming Language", by Aho,, p.161
	function qsort(A, left, right,   i, last) {
		if (left >= right)
		swap(A, left, left+int((right-left+1)*rand()))
		last = left
		for (i = left+1; i <= right; i++)
			if (A[i] < A[left])
				swap(A, ++last, i)
		swap(A, left, last)
		qsort(A, left, last-1)
		qsort(A, last+1, right)
	function swap(A, i, j,   t) {
		t = A[i]; A[i] = A[j]; A[j] = t

And here is a sample run:

	[B:\] awk -f sort cards.txt > new.txt
	[B:\] rename cards.txt cards.bak
	[B:\] rename new.txt cards.txt
	[B:\] type cards.txt
	Solve the problems of society
	This might take
	a little longer
	than expected.

	Take out the garbage
	It's stinking up
	the garage.

	Write a book and become famous
	This is a long range
	goal. I need a good book
	idea first. And writing
Note that we renamed our old data file to cards.bak, instead of deleting the file. It's always good to keep backups of old databases.

However, the 'sort' script had some trouble with large files because it reads in all the cards into an array in RAM. In my tests, the largest file I was able to sort was only about 100K.

"Flash Cards" for Memorization

Index cards can also be used for memorization. The title of the card can contain a question and the body of the card contains the answer that you want to memorize.

Let's write a program that randomly chooses a card from our 'cards.txt' file, displays its title, asks the user to press the 'Enter' key, and then displays the body of that card.

First, we need a text file which contains the questions and answers that we want to memorize. Let's name the file 'question.txt'. Note that the answer can contain multiple lines:

	What is your name?
	My name is
	Sir Lancelot
	of Camelot.

	What is your quest?
	To seek the
	Holy Grail.

	What is your favorite color?

Here is the Awk script called 'memorize'. It will read the data file into an array, randomly shuffle the array, and then it will loop through the array and display each question and answer.

	# memorize - randomly display an index card title, ask user to
	# press return, then display the corresponding body of the card

	BEGIN { RS=""; FS="\n" }

	      { A[NR] = $0 } 

	END   {
		RS="\n"; FS=" "
		shuffle(A, NR)
		for (i = 1; i <= NR; i++) {
			print "\nQUESTION: ", substr(A[i], 1, index(A[i], "\n")-1)
			printf "\nPress return for the answer: "
			getline < "-"
			print "\nANSWER: "
			print substr(A[i], index(A[i], "\n")+1)
			if (i == NR) break
			printf "\nPress return to continue, or 'q' to quit: "
			getline < "-"
			if ($1 == "q") break

	# Shuffle the array
	function shuffle(A, n,   t) {
		# Moses/Oakford shuffle algorithm
		for (i = n; i > 1; i--) {
			j = int((i-1) * rand()) + 1
			t = A[j]; A[j] = A[i]; A[i] = t

Here is a sample run. The script will randomly choose cards until it either finishes going through all the cards, or until the user enters a 'q' to quit.

	[B:\] gawk -f memorize question.txt

	QUESTION:  What is your quest?

	Press return for the answer:

	To seek the
	Holy Grail.

	Press return to continue, or 'q' to quit:

	QUESTION:  What is your favorite color?

	Press return for the answer:


	Press return to continue, or 'q' to quit:

	QUESTION:  What is your name?

	Press return for the answer:

	My name is
	Sir Lancelot
	of Camelot.
	[B:\] gawk -f memorize question.txt
	QUESTION:  What is your favorite color?
	Press return for the answer:


	Press return to continue, or 'q' to quit: q

Custom Databases

Address Book

The databases above used a simple 'index card' analogy. That data model works fine for simple lists with free form data, but there are also cases where we need to manage records with specialized data fields.

Let's create a data file and some scripts for an 'address book' database. Our data file will be a text file where every line is one record. Within a line of the file, the data will be separated into fields.

When choosing a delimiter for our fields, we need to make sure that it won't appear accidentally within a field itself. For example, an address book has fields like name, company name, address, etc., and in this case, each of those fields can contain spaces within them (e.g. "ACME Mail Order Company"). Therefore, we can't use a space to separate the fields of the line.

Instead, let's use commas to separate the fields, and we'll need a rule that commas cannot appear within a field.

Here is a sample data file called 'address.txt':

John Robinson,Koren Inc.,978 4th Ave,Boston,MA 01760,617-696-0987
Phyllis Chapman,GVE Corp.,34 Sea Drive,Amesbury,MA 01881,781-879-0900
Here is the script called 'labels' which will print all the data and format it like mailing labels:
	# labels - Format the addresses for printing labels
	# Source: blocklist.awk from "Sed & Awk", by Dale Dougherty, p.148

	BEGIN { FS = "," }

	        print ""        # blank line
	        print $1        # name
	        print $2        # company
	        print $3        # street
	        print $4, $5    # city, state zip
This is the sample run:
	[B:\] gawk -f labels address.txt
	John Robinson
	Koren Inc.
	978 4th Ave
	Boston MA 01760
	Phyllis Chapman
	GVE Corp.
	34 Sea Drive
	Amesbury MA 01881	

It may also be useful to extract just the phone numbers from our data file. Here is the script called 'phones' which will extract only the names and phone numbers from the data file:

	# phones
	# Source: phonelist.awk, from "Sed & Awk", by Dale Dougherty, p.148

	BEGIN { FS="," }

	{ print $1 ", " $6 }
Here is a sample run:
	[B:\] gawk -f phones address.txt
	John Robinson, 617-696-0987
	Phyllis Chapman, 781-879-0900
We'll also need a script to search our data file for a name. Here is a script called 'searchad' with will search for the string 'robinson':
	# searchad - Return the record that matches a string


	/robinson/ {
	        print ""        # blank line
	        print $1        # name
	        print $2        # company
	        print $3        # street
	        print $4, $5    # city, state zip

Here is a sample run:

	[B:\] gawk -f searchad address.txt

	John Robinson
	Koren Inc.
	978 4th Ave
	Boston MA 01760

Grading Program

Awk can also be used for mathematical computation of fields. Let's demonstrate this with a data file called 'grades.txt' that contains grades of students.

	Allen Mona 70 77 85 83 70 89
	Baker John 85 92 78 94 88 91
	Jones Andrea 89 90 85 94 90 95
	Smith Jasper 84 88 80 92 84 82
	Turner Dunce 64 80 60 60 61 62
	Wells Ellis 90 98 89 96 96 92

Here is a longer script that will take all the grades, average them equally, and compute the final average and the final grade for each student. At the end, it will compute some statistics about the entire class. Here is the script called 'grades'.

	# grades -- average student grades and determine
	# letter grade as well as class averages
	# Source: "Sed & Awk", by Dale Dougherty, p.192

	# set output field separator to tab.
	BEGIN { OFS = "\t" }

	# action applied to all input lines
		# add up the grades
		total = 0
		for (i = 3; i <= NF; ++i)
			total += $i
		# calculate average
		avg = total / (NF - 2)
		# assign student's average to element of array
		class_avg[NR] = avg
		# determine letter grade
		if (avg >= 90) grade="A"
		else if (avg >= 80) grade="B"
		else if (avg >= 70) grade="C"
		else if (avg >= 60) grade="D"
		else grade="F"
		# increment counter for letter grade array
		# print student name, average, and letter grade
		print $1 " " $2, avg, grade

	# print out class statistics
	END  {
		# calculate class average
		for (x = 1; x <= NR; x++)
			class_avg_total += class_avg[x]
		class_average = class_avg_total / NR
		# determine how many above/below average
		for (x = 1; x <= NR; x++)
			if (class_avg[x] >= class_average)
		# print results
		print ""
		print "Class Average: ", class_average
		print "At or Above Average: ", above_average
		print "Below Average: ", below_average
		# print number of students per letter grade
		for (letter_grade in class_grade)
			print letter_grade ":", class_grade[letter_grade]

Here is a sample run:

	[B:\] gawk -f grades grades.txt
	Allen Mona      79      C
	Baker John      88      B
	Jones Andrea    90.5    A
	Smith Jasper    85      B
	Turner Dunce    64.5    D
	Wells Ellis     93.5    A

	Class Average:  83.4167
	At or Above Average:    4
	Below Average:  2
	A:      2
	B:      2
	C:      1
	D:      1

Another useful script is the following program that computes a histogram of the grades. It is hardcoded to only read the third column ($3), but you can edit it and change it to read any of the columns in the input file. Here is the script called 'histo':

	# histogram
	# Source: "The AWK Programming Language", by Aho,, p.70

	     { x[int($3/10)]++ } # use the third column of input data

	END  {
	        for (i = 0; i < 10; i++)
	                printf(" %2d - %2d: %3d %s\n",
	                       10*i, 10*i+9, x[i], rep(x[i],"*"))
	        printf("100:      %3d %s\n", x[10], rep(x[10],"*"))

	function rep(n, s,   t) {   # return string of n s's
	        while (n--> 0)
	                t = t s
	        return t
And here is the sample run:
	[B:\] gawk -f histo grades.txt
	  0 -  9:   0
	 10 - 19:   0
	 20 - 29:   0
	 30 - 39:   0
	 40 - 49:   0
	 50 - 59:   0
	 60 - 69:   1 *
	 70 - 79:   1 *
	 80 - 89:   3 ***
	 90 - 99:   1 *
	100:        0	

The output shows that there were six grades, and most of them were in the 80-89 range.

Checkbook Program

This program takes a data file which lists your checkbook entries and your deposits, and calculates the totals.

Here is what a sample input file called 'checks.txt' looks like:

	check	1021
	to	Champagne Unlimited
	amount	123.10
	date	1/1/87

	amount	500.00
	date	1/1/87

	check	1022
	date	1/2/87
	amount	45.10
	to	Getwell Drug Store
	tax	medical

	check	1023
	amount	125.00
	to	International Travel
	date	1/3/87

	check	1024
	amount	50.00
	to	Carnegie Hall
	date	1/3/87
	tax	charitable contribution

	check	1025
	to	American Express
	amount	75.75
	date	1/5/87

Here is the script called 'check' which will calculate the totals:

	# check - print total deposits and checks
	# Source: "The AWK Programming Language", by Aho,, p.87

	BEGIN { RS=""; FS="\n" }

	/(^|\n)deposit/ { deposits += field("amount"); next }
	/(^|\n)check/   { checks += field("amount"); next }

	END   { printf("Deposits: $%.2f, Checks: $%.2f\n", 
		       deposits, checks)

	function field(name,   i, f) {
		for (i = 1; i <= NF; i++) {
			split($i, f, "\t")
			if (f[1] == name)
				return f[2]
		printf("Error: no field %s in record\n%s\n", name, $0)

And this is a sample run:

	[B:\] gawk -f check checks.txt
	Deposits: $500.00, Checks: $418.95

Importing and Exporting Data

Importing Data for use by Awk

Awk works well with data files that are stored in text files. Awk assumes that the data file is organized into records, within each record the data is divided into fields, and there are unique characters in the file that are used as the field separators and record separators.

By default, Awk assumes that newline characters are the record separators and whitespace characters (spaces and tabs) are the field separators. It is also possible to redefine the field separators to other characters, like a comma or a tab character, which means that Awk can process the commonly used "comma separated" and "tab separated" format for data files.

But note that if a file uses newline characters as record separators, it means that a newline cannot appear within a field. For example, a data file file with one record per line cannot contain a text field (e.g. a "notes" field) that contains free form text with newline characters within it. That would confuse Awk unless we added special code to handle that notes field.

The same restrictions apply to the field separators. If a file is defined to be comma separated, it means that no field is allowed to contain comma characters within it (e.g. a Name field that contains "Alvarado, Victor") because Awk would parse that as two fields, not one.

That is why tab separated files tend to be used more often. That way, the fields are allowed to contain spaces and commas.

Another way to format data for use by Awk is to use the "multiline" format, which is what we used for our index card databases above. Awk will treat each line as a field, and a blank line is the record separator.

Exporting Data to Microsoft Excel

To export data to Excel, all we need to do is to convert the data file into tab-delimited format, and store it in a text file with a *.xls extension. When that file is opened in Microsoft Windows, Excel will open it automatically as if it were a spreadsheet.

As an example, let's export our grades.txt file to Excel. Here is our 'grades.txt' file:

	Allen Mona 70 77 85 83 70 89
	Baker John 85 92 78 94 88 91
	Jones Andrea 89 90 85 94 90 95
	Smith Jasper 84 88 80 92 84 82
	Turner Dunce 64 80 60 60 61 62
	Wells Ellis 90 98 89 96 96 92

The file uses spaces as the field separator, so we'll need a script that will convert the field separators into tabs. Here is a script called 'conv2xls':

	# conv2xls - Convert a data file into tab-separated format

	        IFS=" "    # input field separator is a space
	        OFS="\t"   # output field separator is a tab

	      { print $1, $2, $3, $4, $5, $6, $7, $8 }

And here is the sample run, where we store the tab-delimited output into a text file called grades.xls:

	[B:\] gawk -f conv2xls grades.txt > grades.xls
Here is the contents of the 'grades.xls' text file:
	Allen   Mona    70      77      85      83      70      89
	Baker   John    85      92      78      94      88      91
	Jones   Andrea  89      90      85      94      90      95
	Smith   Jasper  84      88      80      92      84      82
	Turner  Dunce   64      80      60      60      61      62
	Wells   Ellis   90      98      89      96      96      92

We can then copy the grades.xls text file to a Windows PC, double-click on it, and Excel will open it as if it were a spreadsheet:

You can then do a "Save As" in Excel to save it as the regular Excel binary format.

Exporting Data to a Web Page

To export our data to a web page, we will need a script that will input our data file and generate HTML.

Let's start with our 'grades.txt' data file:

	Allen Mona 70 77 85 83 70 89
	Baker John 85 92 78 94 88 91
	Jones Andrea 89 90 85 94 90 95
	Smith Jasper 84 88 80 92 84 82
	Turner Dunce 64 80 60 60 61 62
	Wells Ellis 90 98 89 96 96 92

Here is a script called 'html' that will do the conversion. Note that the data will appear as rows of a table in HTML.

	# html - Convert a data file into an HTML web page with a table
		print "<HTML><HEAD><TITLE>Grades Database</TITLE></HEAD>"
		print "<BODY BGOLOR=\"#ffffff\">"
		print "<CENTER><H1>Grades Database</H1></CENTER>"
		print "<HR noshade size=4 width=75%>"
		printf "<TR><TH>Last<TH>First"
		print "<TH>G1<TH>G2<TH>G3<TH>G4<TH>G5<TH>G6"
	      { # Print the data in table rows
		printf "<TR><TD>" $1 "<TD>" $2 
		printf "<TD>" $3 "<TD>" $4 "<TD>" $5 
		print  "<TD>" $6 "<TD>" $7 "<TD>" $8 
	END   {
		print "</TABLE></CENTER><P>"
		print "<HR noshade size=4 width=75%>"
		print "</BODY></HTML>"

Here is the sample run. The output will be placed in a file called 'grades.htm'.

	[B:\] gawk -f html grades.txt > grades.htm

This is what the resulting 'grades.htm' file looks like:

	<HTML><HEAD><TITLE>Grades Database</TITLE></HEAD>
	<BODY BGOLOR="#ffffff">
	<CENTER><H1>Grades Database</H1></CENTER>
	<HR noshade size=4 width=75%>
	<HR noshade size=4 width=75%>

And here is a link to the grades.htm file so you can see what the web page looks like in your browser.

Exporting Data to a Palm Pilot

First, we will need to install a database program on the Palm. There are several database programs to choose from, but let's use the freeware database program called Pilot-DB (available here from PalmGear).

Next, we will need the freeware DOS tools that come with Pilot-DB to help us create the PDB data file. The DB-tools package is available here at PalmGear. You can download it and install it on your Windows PC. Those are DOS tools, but they were compiled to run in DOS under Windows, so we can't run them on the QuickPAD Pro. (Note: DB-tools is an open source project, so the source code is available.)

The DB-tools package contains a program called 'csv2pdb.exe'. It will do the conversion into a Palm PDB file.

Let's use the 'grades.txt' data file as an example:

	Allen Mona 70 77 85 83 70 89
	Baker John 85 92 78 94 88 91
	Jones Andrea 89 90 85 94 90 95
	Smith Jasper 84 88 80 92 84 82
	Turner Dunce 64 80 60 60 61 62
	Wells Ellis 90 98 89 96 96 92

Before we can run the 'csv2pdb.exe' program we first need to convert our data into "csv" (comma separated values) format. We can do that with the following awk script called 'conv2csv':

	# conv2csv - Convert a data file into comma-separated format

	        IFS=" "    # input field separator is a space
	        OFS=","    # output field separator is a comma

	      { print $1, $2, $3, $4, $5, $6, $7, $8 }

Here is the command line to create the comma-delimited data file, which we will call 'grades.csv':

	[B:\] gawk -f conv2csv grades.txt > grades.csv

This is what the 'grades.csv' file looks like:


Next, we need to create an "info" file which will describe the format of our data. The 'csv2pdb.exe' program will need this information for the conversion to Palm format.

The info file will give our database a title and describe the fields of each record. In grades.csv, the first field is the student's last name, the second field is the student's first name, and the other six fields are the grades. Here is the resulting info file called 'grades.ifo':

	title "GradesDB"
	field "Last" string 38
	field "First" string 38
	field "G1" integer 14
	field "G2" integer 14
	field "G3" integer 14
	field "G4" integer 14
	field "G5" integer 14
	field "G6" integer 14
	option backup on

The numbers at the end of the lines are the field widths in pixels; we can make a guess for the field widths, and then fine-tune them on the Palm Pilot. The last line will set the backup bit on the PDB file so that it will be backed up at every hotsync.

From this point on, the rest of the steps must be done on your Windows PC.

On Your Windows PC

Now we create the PDB file on our PC with this command line:

C:\> csv2pdb -i grades.ifo grades.csv grades.pdb C:\>

It will create a new file called 'grades.pdb' in the current directory. This is the Palm database file.

The last step is to install the PDB file to the Palm Pilot: in the Windows Explorer double-click on the PDB file and then hotsync your Palm Pilot as usual.

Here is a screen shot of the Palm Pilot running Pilot-DB with our grades database. (Make sure you have selected the blank unnamed view from menu at the top-right corner of the screen):

As you can see, storing data as text files gives you a lot of flexibility in manipulating the data and exporting it to other formats.


Victor Alvarado

categories: Databases,Spawk,Dec,2009,PanosP

Spawk for SUSE Linux

I've just installed the openSUSE Milestone 8 (11.2) in a virtual machine in my PC.

In about half an hour, I've also downloaded MySQL, gawk sources and SPAWK (SQL + AWK) sources, compiled and build the SPAWK libraries (/usr/lib/ and /usr/lib/

I've tested the module and worked just fine, so I've uploaded the binary tarball for this distro in SPAWK project (

Have a Happy New Year!
-- Panos Papadopoulos to tim

categories: Databases,Spawk,Jul,2009,PanosP

SPAWK moves to GoogleCode

Panos I. Papadopoulos reports that he has moved the SPAWK project (SQL and AWK) to Mercurial and

He has also written extensive tutorial notes at the SPAWK wiki.

categories: Spawk,Databases,Jul,2009,PanosP

SQL Powered AWK



Panos I. Papadopoulos (


SPAWK is an elegant collection of functions for accessing and updating MySQL databases from within GNU awk programs. The SPAWK module consists of a single awk extension library, namely, which may be loaded in awk programs using the standard extension awk function:

   extension("", "dlload")

A Short Example

Here's a short example of using SPAWK (for more details, see

When calling spawk_select, SPAWK sends the query already given (maybe some spawk_query calls preceeded the spawk_select) to the current server (remind you that "server" in SPAWK's point of view is a connection to the actual MySQL server mysqld). After calling spawk_select, the server is ready to return the results to the awk process via spawk_data, spawk_first or spawk_last calls. Alternatively, at any time we can clear the results' set and release the server with a spawk_clear function call.

The main data receiver is spawk_data function. This function is usually called with one or two arguments. The first argument is an array to be used as a data transfer vehicle, while the second argument may be used optionally to hold the null valued columns. spawk_data returns the number of columns of each returned data row or zero if there are no more data to return (EOD). spawk_first function's arguments and return values are exactly the same as those of spawk_data arguments and returns values, but the rest of the data will be lost, that is get the next available data row and release the server. Similar is the spawk_last function, but the row returned is the last row of the results' set. By the way, the spawk_last function is less efficient than spawk_first; actually, there is no particular reason to call spawk_last at all! Let's see some examples:

     extension("", "dlload")
     SPAWKINFO["database"] = "information_schema"
     while (spawk_data(data))
          print data[0]

Things need to be explained:

  • extension is used to load the SPAWK module.
  • SPAWKINFO array is used to specify the default database (schema) to connect. Index "database" denotes default database.
  • spawk_select is used to execute the desired query.
  • spawk_data is used repeatedly to get the results, row by row. spawk_data returns 2 as there are results to be retuned and 0 on EOD.

categories: Databases,Oct,2009,ScottS

A MySql Client



Download from LAWKER.


Set Up

    if (!mysql["path"]) {
        mysql["path"] = "/usr/bin/mysql"
    if (mysql["user"]) mysql["user"] = "-u" mysql["user"]
    if (mysql["pass"]) mysql["pass"] = "-p" mysql["pass"]

    if (!mysql["tempfile_command"]) {
        mysql["tempfile_command"] = "mktemp /tmp/__mysql.awk.XXXXXX"
    mysql["resource_id"] = 1
    __mysql_dequote["r"]  = "\r"
    __mysql_dequote["n"]  = "\n"
    __mysql_dequote["t"]  = "\t"
    __mysql_dequote["\\"] = "\\"

Main Functions

function mysql_db (db)      { mysql["database"] = db    }
function mysql_path (path)  { mysql["path"]     = path  }

function mysql_tempfile_command (command) {
    mysql["tempfile_command"] = command
function mysql_login (username, password, host, args) {
    mysql["user"] = "-u" username
    mysql["pass"] = "-p" password
        if (host) mysql["host"] = "-h" host
        if (args) mysql["args"] = args
function mysql_query (query    ,input,key,i,call,resource) {
    resource = mysql["resource_id"]++
    mysql["tempfile_command"] | getline mysql[resource]
    call = sprintf("%s %s %s %s %s %s > %s",
            mysql["path"], mysql["user"], mysql["pass"], mysql["host"],
                        mysql["args"], mysql["database"],
    print query | call
    if (getline input < mysql[resource]) {
        for (i = split(input, key, "\t"); i > 0; i--)
            mysql[resource, i] = key[i]
    return resource
function mysql_fetch_assoc (resource,row  ,input,i,fields) {
    fields = 0
    if (getline input < mysql[resource]) {
        fields = mysql_split(row, input)
        for (i = 1; i <= fields; i++)
            row[mysql[resource, i]] = row[i]
    return fields
function mysql_split (row, input,   r,i) {
     r = split(input, row, "\t")
     for (i = 0; i <= r; i++) {
         row[i] = mysql_dequote(row[i])
     return r
function mysql_fetch_row (resource,row  ,input,r,i) {
    if (getline input < mysql[resource]) {
        return mysql_split(row, input)
    return 0
function mysql_index (resource, id) {
    return mysql[resource, id]
function mysql_finish (resource, i) {
    system(sprintf("rm %s", mysql[resource]))
    delete mysql[resource]
    i = 1
    while (mysql[resource,i])
        delete mysql[resource, i++]
function mysql_cleanup (  i) {
    for (i = 1; i < mysql["resource_id"]; i++)
        if (mysql[i]) {
            system(sprintf("rm %s", mysql[i]))
            delete mysql[resource]
            i = 1
            while (mysql[resource,i])
                delete mysql[resource, i++]

Support Utils

Scan a string for mysql escaped tokens and replace them with the appropriate character. This is a fairly slow operation for large strings but it's necessary.

function mysql_dequote (string, result,i,l,c) {
    result = ""
    l = length(string)
    for (i = 1; i <= l; i++) {
        c = substr(string, i, 1)
        if (c == "\\") {
            # This simply shouldn't happen...
            ## if ((i + 1) == l) continue;
            c = substr(string, ++i, 1)
            result = result __mysql_dequote[c]
        else {
            result = result c
    return result
function mysql_quote (string,   result) {
    gsub(/\\/, "\\\\", string)
    gsub(/'/, "\\'", string)
    return "'" string "'"


"THE BEER-WARE LICENSE" (Revision 43) borrowed from FreeBSD's jail.c: wrote this file. As long as you retain this notice you can do whatever you want with this stuff. If we meet some day, and you think this stuff is worth it, you can buy me a beer in return.


Scott S. McCoy

categories: Databases,Jul,2009,CarloS


By Carlo Strozzi (

NoSQL is a fast, portable, relational database management system without arbitrary limits, (other than memory and processor speed) that runs under, and interacts with, the UNIX Operating System. It uses the "Operator-Stream Paradigm" described in Unix Review (March, 1991, page 24, "A 4GL Language") where there are a number of "operators" that each perform a unique function on the data. These operators are written in Awk and C, designed to be lightweight Operators will have to be lightweight ones (have a small memory footprint and allows fast startup of the command).

The main reason why NoSQL decided to turn an original RDB system into NoSQL is precisely that the former is entirely written in Perl. Perl is a good programming language for writing self-contained programs, but its pre-compilation phase and long start-up time are worth paying only if once the program has loaded it can do everything in one go. This contrasts sharply with the Operator-stream Paradigm, where operators are chained together in pipelines of two, three or more programs. The overhead associated with initializing Perl at every stage of the pipeline makes pipelining Perl inefficient. A better way of manipulating structured ASCII files is to use the AWK programming language, which is much smaller than Perl, is more specialized for this task, and is very fast at startup.

For more information on NoSQL, see the NoSQL home page.

blog comments powered by Disqus