Awk.Info

"Cause a little auk awk
goes a long way."

WHAT'S NEW?

Mar 01: Michael Sanders demos an X-windows GUI for AWK.

Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK

Feb 28: Tim Menzies asks this community to write an AWK cookbook.

Feb 28: Arnold Robbins announces a new debugger for GAWK.

Feb 28: Awk100#23: Premysl Janouch offers an IRC bot, in AWK

Feb 28: Updated: the AWK FAQ

Feb 28: Tim Menzies offers a tiny content management system, in Awk.

Jan 31: Comment system added to awk.info. For example, see the discussion at the bottom of ?keys2awk

Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).

Jan 31: The AWK FAQ is being updated. For comments/ corrections/ extensions, please mail tim@menzies.us

Jan 31: Martin Cohen finds Awk on the Android platform.

Jan 31: Aleksey Cheusov released a new version of runawk.

Jan 31: Hirofumi Saito contributes a candidate Awk mascot.

Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.

Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.


categories: Sitemap,Apr,2009,Admin

Featured Topics

These pages are grouped into the topics listed below (latest one shown first):


categories: Sitemap,Apr,2009,Admin

Table of Contents

210 pages.

300 million characters (in an Awk string)
99 bottles of beer
A Tale of Two Tawks
A Web Server in Awk
Advocacy
Amazing Awk Assembler
Amazing Awk Formatter
An Awk Dungeon Adventure Game
Argcol
Arnold Robbins
array.awk
Automated Results Verification
Awk + Ansi-C = OO
Awk and Mail
Awk Cookbook Project
Awk for AI
Awk for Chemical Engineers
Awk for Engineering
Awk for Mechanical Engineers
Awk for system programming
Awk Games
Awk Mug
Awk on Android
Awk snake
Awk's Equivalent to VI's J
Awk++
Awk-Linux
Awk.info
Awk.info Gaining Popularity
Awk100
Awkbot:
Awklisp
awkwords
Baseball Sim
Brainfuck to C
Building Interpreters with A*
Checkers
Coding
Columnate
Community
Contact
Convert Code Comments to Latex
Correlation between numbers
Credits
Databases
Davinci mascot
Debugger and Assertion Checker
Domain-Specific Languages
Ed Morton
Eliza
Errata: WHINY_USERS slows down Awk
Ethiopian Multiplication
Explaining Awk One Liners
Fast Clustering
Faster Hashing in Mawk
Finite State Machine Generator
Forloops
Format Shell Scripts
Four Keys to Awk
Functional Challenge
Functional Enumeration
Functional Gawk
Generating random sigs
Get YouTube Vids
Getline
GetXML.awk
Graph
Great Auks
Handy One Liners
Hiding Email Address
History
Holiday date routines
How to call Awk from "C" with Libmawk
How to contribute
How to Read Minds
In praise of scripting
Interpreters
Interview with Arnold Robbins
Intrusion alert normalization
IRC agent in AWK
Issue report mining
Jawk = Java + Awk
Jim Hart
join.awk
Language Analysis
Learning Awk
levenshtein.awk
Lexical and Grammar Analysis
List of Tags
m1 : simple macros
m5 : macro processor
Macro pre-processors
Mail sort
Markdown
Mascot
Mastermind
Mastermind (again)
Mawk: faster than C, C++, Java, Perl, Ruby....
md2html : Update to Markdown.awk
MicroTracer
Mike Langman
Monty Hall Problem
Moving Files with Awk
Music analysis workbench
Music tools and Awk
MySql
Naive Bayes Classifier
Negotiate
Network monitoring in Awk
New AWK debugger
New Awk Mascot ('AWK-eye the Dwarf)?
New mascot
NoSQL
One Liners
OO tools in Awk
Operating Systems and Awk
Parallel Awk
Parser Generator
Patent search for genes or proteins
Playing music
Postscript tricks
Predicting Gender
Pretty print
Print an Array
Print Some Postscript Pages
Printing ranges
Processing Bitmaps in Gawk
Project Tools
QSE: an embeddable Awk Interpreter
QTawk
Quicksort
Quicksort2
Random Numbers in Gawk
Reading RSS feeds
Regular Expression Matching Can Be Simple And Fast
Resistor Calculation
Reverse Postscript pages
Ronald Loui
Rot13 in Awk
runawk
Runawk 0.16
Runawk 0.17
Runawk 0.18
Runawk 0.19
Samples of AWK
Sed in Awk
Sed to Awk
Sed-clones (in Awk)
Shorten my pipes
Shuffle.awk
Simple Awk GUIs in Windows
Simple Stream Editor
Simulations Unicast Applications
Soccer
Sorting
Sorting Arrays Via the Shell
Spam Filtering
Spawk for SuSE Linux
Spawk in GoogleCode
spell.awk
spellcheck.awk
Spreadsheets and Awk.
SQL Powered AWK
Steffen Schuler
Sudoku
Super-For loops
Sys Admin tricks in Awk
SysAdmins: Awk is Your Friend
Table of Contents
Teaching Awk
Template-driven programming
Ten Liners
Tex-to-bilingual Dictionary
Text Mining
Text Munging
The Awk Book's Code
The Secret WHINY_USERS flag
The TinyTim Content Management System
Tic-tac-toe
Tim Menzies
Top 10 pages
Top 10 posters
Top 10 posters last year
Top 10 subjects
Top 10 subjects last year
Topics
Towers of Hanoi
UML: sequence diagrams
Unit tests
Using Awk for Databases
Using field names to reference columns
Verification
Visual Awk
Waclaw Sierpinski's Triangle
Why Gawk?
Widen bitmaps, using Gawk
Word-processing in Awk
Writing SciFi
Xgawk for Windows
XML and Awk
XML: Checking for Well-Formedness
XML: Dealing with DTDs
XML: Display components
XML: printing an outline
XML: pulling out data
XMLgawk
xmlparse.awk
Xmonth: Gawk+X-windows GUI
Yawk
Zipf's Law

categories: Sitemap,Apr,2009,Admin

Page Tags

Number Tag
42 Admin
24 Tools
24 Awk100
19 Timm
13 Tips
13 TenLiners
12 ArnoldR
11 Top10
11 Papers
10 XML
10 Ronl
10 Misc
10 June
10 Games
9 Who
9 Dsl
8 Learn
7 Wp
7 Project
7 EdM
7 Databases
6 TextMining
6 Sept
6 Interpreters
5 Xgawk
5 WhyAwk
5 Steffen
5 Runawk
5 Os
5 Mascot
5 JurgenK
5 AlexC
4 Verification
4 SysAdmin
4 Sorting
4 Sed
4 PanosP
4 Newsgroup
4 Jimh
4 HenryS
4 Funky
4 Engineering
3 Spawk
3 Sitemap
3 Ps
3 Oo
3 OneLiners
3 Music
3 Mawk
3 Mail
3 Macros
3 Contribute
3 Arrays
2 Ysa
2 TedD
2 Spell
2 Sigs
2 ScottS
2 MichealS
2 MichaelS
2 MartinC
2 JonB
2 JesusG
2 Graphics
2 GrantC
2 GUI
2 Function
2 Eliza
2 DonaldM
2 DavidL
2 DariusB
2 BrianK
2 AwkLisp
2 Anon
2 AaronH
1 Zazzle
1 YungC
1 Yawk
1 YasumasaS
1 WolfganZ
1 WmM
1 WimVB
1 WillW
1 WilhelmW
1 Web
1 WWW
1 VictorA
1 VenkatesanS
1 TimS
1 TimM
1 TiborP
1 TerryB
1 Sudoku
1 StevenH
1 SteveL
1 SteveJ
1 SteveC
1 StephenJ
1 Stats
1 ScottP
1 SallyF
1 RussC
1 Rss
1 PremyslJ
1 PierreG
1 PhilipB
1 PeterW
1 PeterK
1 PeterI
1 PPuri
1 OsamuA
1 News
1 NelsonB
1 Negotiate
1 Name
1 MikhailA
1 Mikel
1 MartinF
1 MarkB
1 M0J0
1 LotharS
1 Libmawk
1 KimD
1 KennyM
1 JuergenK
1 JohnF
1 JohnD
1 JiirL
1 JanisP
1 JanW
1 JamesL
1 JMellander
1 Irc
1 HyungC
1 HiroS
1 HermannP
1 GregoryG
1 Getline
1 GerardH
1 Forloop
1 Errata
1 EricP
1 EisaA
1 DickL
1 DebbieF
1 DavidH
1 Dates
1 DataMining
1 DanN
1 Dab
1 Cookbook
1 CarloS
1 CMS
1 BrianJ
1 BrendanO
1 Boris
1 BobO
1 BillP
1 Baseballsim
1 BalkhisB
1 Awk
1 Argcol
1 April
1 Android
1 AlfredA
1 AlexS
1 AlexR
1 AlanL
1 ALahm

categories: Awk100,Jan,2009,Admin

The Awk 100

Goals

Awk is being used all around the world for real programming problems, but the news is not getting out.

We are aiming to create a database of at least one hundred Awk programs which will:

  • Identify the tasks that Awk is really being used for
  • Enable analysis of the benefits of the language for practical programming
  • Serve as an information exchange for applications

Contribute

If you, or your colleagues or friends have written a program which has been used for purposes small or large, why not take five minutes to record the facts, so that others can see what you've done?

To contribute, fill in this template and mail it to mail@awk.info with the subject line Awk 100 contribution.

Current Listing

(Recent additions are shown first.)

  1. A. Lahm and E. de Rinaldis' Patent Matrix
    • PatentMatrix is an automated tool to survey patents related to large sets of genes or proteins. The tool allows a rapid survey of patents associated with genes or proteins in a particular area of interest as defined by keywords. It can be efficiently used to evaluate the IP-related novelty of scientific findings and to rank genes or proteins according to their IP position.
  2. P Janouch's AWK IRC agent:
    • VitaminA IRC bot is an experiment on what can be done with GNU AWK. It's a very simple though powerful scripting language. Using the coprocess feature, plugins can be implemented very easily and in a language-independent way as a side-effect. The project runs only on Unix-derived systems.
  3. Stephen Jungels' music player:
    • Plaiter (pronounced "player") is a command line front end to command line music players. What does Plaiter do that (say) mpg123 can't already? It queues tracks, first of all. Secondly, it understands commands like play, pause, stop, next and prev. Finally, unlike most of the command line music players out there, Plaiter can handle a play list with more than one type of audio file, selecting the proper helper app to handle each type of file you throw at it.
  4. Dan at sourceforge's Jawk system:
    • Awk, implemented in the Java virtual machine. Very useful for extending lightweight scripting in Awk with (e.g.) network and GUI facilities from Java.
  5. Axel T. Schreiner's OOC system:
    • ooc is an awk program which reads class descriptions and performs the routine coding tasks necessary to do object-oriented coding in ANSI C.
  6. Ladd and Ramming's Awk A-star system:
    • Programmers often take awk "as is", never thinking to use it as a lab in which we can explore other language extensions. This is, of course, only one way to treat the Awk code base. An alternate approach is to treat the Awk code base as a reusable library of parsers, regular expression engines, etc., and to make modifications to the language. This second approach was taken by David Ladd and J. Christopher Ramming in their A* system.
  7. Henry Spencer's Amazing Awk Syntax Language system:
    • Aaslg and aaslr implement the Amazing Awk Syntax Language, AASL (pronounced ``hassle''). Aaslg (pronounced ``hassling'') takes an AASL specification from the concatenation of the file(s) (default standard input) and emits the corresponding AASL table on standard output.
    • The AASL implementation is not large. The scanner is 78 lines of awk, the parser is 61 lines of AASL (using a fairly low-density paragraphing style and a good many comments), and the semantics pass is 290 lines of awk. The table interpreter is 340 lines, about half of which (and most of the complexity) can be attributed to the automatic error recovery.
    • As an experiment with a more ambitious AASL specification, one for ANSI C was written. This occupies 374 lines excluding comments and blank lines, and with the exception of the messy details of C declarators is mostly a fairly straightforward transcription of the syntax given in the ANSI standard.
  8. Jurgen Kahrs' (and others') XMLgawk system:
    • XMLgawk is an experimental extension of the GNU Awk interpreter. It includes a small XML parsing library which is built upon the Expat XML parser.
    • The same tool that can load the XML shared library can also add other libraries (e.g. PostgreSQL).
  9. Henry Spencer's Amazing Awk Assembler
    • "aaa" (the Amazing Awk Assembler) is a primitive assembler written entirely in awk and sed. It was done for fun, to establish whether it was possible. It is; it works. Using "aaa", it's very easy to adapt to a new machine, provided the machine falls into the generic "8-bit-micro" category.
  10. Ronald Loui's AI programming lab.
    • For many years, Ronald Loui has taught AI using Awk. He writes:
      • Most people are surprised when I tell them what language we use in our undergraduate AI programming class. That's understandable. We use GAWK.
      • A repeated observation in this class is that only the scripting programmers can generate code fast enough to keep up with the demands of the class. Even though students were allowed to choose any language they wanted, and many had to unlearn the Java ways of doing things in order to benefit from scripting, there were few who could develop ideas into code effectively and rapidly without scripting.
      • What I have found not only surprising but also hopeful, is that when I have approached the AI people who still enjoy programming, some of them are not the least bit surprised.
  11. Henry Spencer's Amazing Awk Formatter.
    • Awf may not be lightning fast, and it has certain restrictions, but it does a decent job on most manual pages and simple -ms documents, and isn't subject to AT&T's brain-damaged licensing that denies many System V users any text formatter at all. It is also a text formatter that is simple enough to be tinkered with, for people who want to experiment.
  12. Yung-Pin Cheng's Awk-Linux Course ware.
    • The stable and cross-platform nature of Awk enabled the simple creation of a robust toolkit for teaching operating system concepts to university students. The toolkit is much simpler/ easier to port to new platforms, than alternative and more elaborate course ware tools.
    • This work was the basis for a quite prestigious publication in the IEEE Transactions on Education journal, 2008, Vol 51, Issue 4. Who said Awk was an old-fashioned tool?
  13. Jon Bentley's m1 micro macro processor.
    • Supports the essential operations of defining strings and replacing strings in text by their definitions. All in 110 lines. A little awk goes a long way.
  14. Arnold Robbins and Nelson Beebe's classic spell checker
    • A powerful spell checker, and a case-study on how to best write applications using hundreds of lines of Awk.
  15. Jim Hart's awk++
    • An object-oriented Awk.
  16. Wolfgan Zekol's Yawk
    • WIKI written in Awk
  17. Darius Bacon: AwkLisp
    • LISP written in Awk
  18. Bill Poser: Name
    • Generate TeX code for a bilingual dictionary.
  19. Ronald Loui: Faster clustering
    • Demonstration to DoD of a clustering algorithm suitable for streaming data
  20. Peteris Krumins: Get YouTube videos
    • Download YouTube videos
  21. Jim Hart: Sudoku
    • Solve sudoku puzzles using the same strategies as a person would, not by brute force.
  22. Ronald Loui: Anne's Negotiation Game
    • Research on a model of negotiation incorporating search, dialogue, and changing expectations.
  23. Ronald Loui: Baseball Sim
    • A baseball simulator for investigating the efficiency of batting lineups.
  24. Ronald Loui: Argcol
    • A tool inspired by fmt that could be used while working in vi to maintain a multi-column pro-con argument format.

categories: SysAdmin,Oct,2009,Admin

Sys Admin

These pages focus on sys admin tools in Awk.


categories: Top10,Mar,2009,Admin

Top 10

The Awk.info Top 10 pages highlight the "best" (most impressive, most insightful, most fun, most visited) pages on this site.


categories: Who,Feb,2009,Admin

Credits

Awk.info is maintained by the international awk community. There are many ways you can contribute and get listed below.


categories: Who,Feb,2009,Admin

Great Auks: awk.info's ringmasters

Some must lead, some must follow, and some have to fix the typos.

A Great Auk is someone with write permission to our repository. Since the source for this web site is stored in that repository, it also means that they are webmasters of this site. So they try to:

  1. keep the code and pages in a (somewhat) consistent form,
  2. encourage code documentation and test suites,
  3. watch comp.lang.awk for cool stuff to add to this site,
  4. write little demo programs,
  5. handle queries about this site,
  6. work the issue reports,
  7. etc.

If you want to be a Great Auk, please start contributing to this site using any of the usual methods. Once it is clear that you know what you are doing and that you play nice with others, you should ask a current Great Auk to nominate you. Then all the current Great Auks will vote on giving you write access.

The current Great Auks are


categories: Misc,WhyAwk,Jan,2009,Admin

Awk Advocacy

"Because easy is not wrong." - Anon

From various sources:

Quotes:

  • "Listen to people who program, not to people who want to tell you how to program."
    - Ronald P. Loui
  • "Good design is as little design as possible."
    - Dieter Rams
  • "When we have on occasion rewritten an Awk program in a conventional programming language like C or C++, the result was usually much longer, and much harder to debug."
    - Arnold Robbins & Nelson Beebe

From Project Management Advice:

  • More programming theory does not make better programmers.
  • Don't let old/compiler people tell you what language to use.
  • If there is already a way of doing something, do not invent a harder way.

From Awk programming:

  • Awk is a simple and elegant pattern scanning and processing language.
  • Awk is also the most portable scripting language in existence.
  • But why use it rather than Perl (or PHP or Ruby or...):
    • Awk is simpler (especially important if deciding which to learn first);
    • Awk syntax is far more regular (another advantage for the beginner, even without considering syntax-highlighting editors);
    • You may already know Awk well enough for the task at hand;
    • You may have only Awk installed;
    • Awk can be smaller, thus much quicker to execute for small programs.

From Awk as a Major Systems Programming Language:

  • Effective use of its data structures and its stream-oriented structure takes some adjustment for C programmers, but the results can be quite striking.

According to Ramesh Natarajan:

  • AWK is a superb language for testing algorithms and applications with some complexity, especially where the problem can be broken into chunks which can be streamed as part of a pipe. It's an ideal tool for augmenting the features of shell programming as it is ubiquitous; found in some form on almost all Unix/Linux/BSD systems. Many problems dealing with text, log lines or symbol tables are handily solved or at the very least prototyped with awk along with the other tools found on Unix/Linux systems.

From the NoSQL pages:

  • (Other languages like Perl is) a good programming language for writing self-contained programs, but pre-compilation and long start-up time are worth paying only if once the program has loaded it can do everything in one go. This contrasts sharply with the Operator-stream Paradigm, where operators are chained together in pipelines of two, three or more programs. The overhead associated with initializing (say) Perl at every stage of the pipeline makes pipelining inefficient. A better way of manipulating structured ASCII files is to use the AWK programming language, which is much smaller, more specialized for this task, and is very fast at startup.

categories: Misc,Jan,2009,Admin

Community

To join our community, consider contributing to this site.

For a list of authors of this site, see our credits pages.

The Awk Wiki.

USENET discussion group: comp.lang.awk.


categories: Misc,Jan,2009,Admin

Contact

For discussions on Awk, see the Awk discussion group.

For comments/ complaints/ corrections/ extensions to this site, contact mail@awk.info.


categories: Misc,Jan,2009,Admin

Welcome to the Awk Community Portal

Awk is a stable, cross platform computer language named for its authors Alfred Aho, Peter Weinberger & Brian Kernighan. They write: "Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data-manipulation tasks".

In Classic Shell Scripting, Arnold Robbins & Nelson Beebe confess their Awk bias: "We like it. A lot. The simplicity and power of Awk often make it just the right tool for the job."

Besides the Bourne shell, Awk is the only other scripting language available in the standard Unix environment. Implementations of AWK exist as installed software for almost all other operating systems.

Awk is a mature language- it was first implemented in the 1970s. As a tool from the golden age, it is sometimes called primitive. It is more accurate to call it elemental, so tightly focused is the language on what it does best: quickly converting this into that.

Consequently, throughout history, Awk has been the language of choice for many famous scientists such as Leonardo daVinci.



categories: Misc,Jan,2009,Admin

Code

LAWKER is a repository of Awk code divided into:

fridge
Fresh code (for the current trunk). Best place to start is fridge/gawk.
block
Place to chop up and experiment with code. Usually, avoid this one.
freezer
Frozen code. A place to store tags. Currently empty, but we plan to grow this one.
wiki
Wiki pages. Useful for documentation but, where possible, use the in-line pretty print method, described below.

How to contribute to LAWKER

See How to Contribute.

How to report a bug

Use our issue tracking system.


categories: Mascot,Misc,Jan,2009,Admin

Mascot

Missing: the Awk Mascot

Many communities have a mascot, a banner that they proudly wave high. So where's the Awk mascot?

I made one up, but you gotta say, it is kinda lame:

So if you have any ideas for such a mascot, please email mail@awk.info with the subject line "suggestion for mascot".

Not to stifle anyone's creativity, but the mascot might be based on the mantra "less, but better" or "easy is not wrong" or "a little awk goes a long way".

Current Offerings

Chris Johnson

Chris writes "more of a logo rather than a mascot":

Other Mascots

Lisp: Aliens

Perl: Camel

Linux: Tux

Java: Duke


categories: Verification,Jul,2009,Admin

Awk and Verification

These pages focus on program verification tools, written in Awk.


categories: Databases,Jul,2009,Admin

Awk and Databases

These pages focus on databases and Awk.


categories: Games,Apr,2009,Admin

Awk Games

These pages focus on games, written in Awk.


categories: Nov,2009,Admin

Awk.info Gaining Popularity

Nov 28, 2009

This site is moving up the page rankings:

  • Four months ago, typing "awk" into Google resulted in pages of output where the first mention of awk.info did not appear till half-way down page five.
  • Today, the same query finds "awk.info" on page two (in position 18).

Other indicators also look good. Since the site was launched (Feb 15, 2009), the number of visits has been steadily increasing:

These 19,268 visits come from 2,765 cities:

Apart from Granville, West Virginia (where this site is administered), the three cities with the most visits are:
  • London, England: 389 visits;
  • Thessaloniki, Greece: 491 visits;
  • Athens, Greece: 640 visits.

(BTW: Anyone got any ideas why these cities visit here so often?)

In other news, Website Outlook reports that:

  • This site is now worth $1423.5 USD.
  • And the daily ad revenue stream from awk.info would be $1.95.

To put that report in perspective, the same source notes that:

  • rottentomatoes.com is worth $3,300,000.
  • And the daily revenue stream from that site would be $4523.19.

categories: News,Mar,2009,Admin

The Awk Book's Code

Brian Kernighan has granted permission for this site to host the code from the original Awk book:

  • The AWK Programming Language
  • by Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger,
  • Addison-Wesley, 1988.
  • ISBN 0-201-07981-X.

The code can be viewed here.


categories: Wp,Apr,2009,Admin

Word Processing in Awk

These pages focus on word processing tools in Awk.


categories: Interpreters,Apr,2009,Admin

Writing Interpreters

These pages focus on language interpreters, written in Awk.


categories: Oo,May,2009,Admin

OO tools in AWK

These pages focus on object-oriented tools in Awk.


categories: Dsl,Mar,2009,Admin

Domain-Specific Languages

These pages focus on domain-specific languages (a.k.a. "little languages") written in Awk.

These little languages can range from the simple to the quite intricate. For example, LAWKER contains code for

  • Simple:
    • Graph- a simple ascii graph generator;
    • Markdown- an ultra lightweight HTML markup language;
  • Intricate:
    • Awk++- enables object-oriented programming in Awk;
    • AwkLisp- a fully functioning LISP interpreter, written in Awk.

Interestingly, without comments, the LISP interpreter is only three times longer than the HTML markup language. This comments either on the power of Awk, the regularity of LISP's core semantics, or both.


categories: Sed,Tips,Apr,2009,Admin

Sed-clones (in Awk)

These pages focus on Sed-like stream editors, written in Awk.


categories: Tips,Jul,2009,Admin

Random Numbers in Gawk

(Summarized and extended from a recent discussion at comp.lang.awk.)

Background

A standard idiom in Gawk is to reset the random number generator in a BEGIN block.

BEGIN {srand() }

Sadly, when called with no arguments, this "reseeding" uses time-in-seconds. So if the same "random" task runs multiple times in the same second, it will get the same random number seed.
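
A minimal sketch of the collision, assuming any POSIX awk on the path: two runs given the same explicit seed (42 here is arbitrary) stand in for two srand() calls that land in the same clock second.

```shell
# Same seed in, same "random" number out.
first=$(awk 'BEGIN { srand(42); print rand() }')
second=$(awk 'BEGIN { srand(42); print rand() }')
if [ "$first" = "$second" ]; then
    echo "both runs printed $first"
fi
```

The exact number printed depends on the awk implementation, but within one implementation the two runs should agree.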

Houston, We Have a Problem

"Ben" writes:

I have a Gawk script that puts random comments into a file. It is run 3 times in a row in quick succession. I found that seeding the random number generator using gawk did not work because all 3 times it was run was done within the same second (and it uses the time).

I was wondering if anyone could give me some suggestions as to what can be done to get around this problem.

Solution #1: Persistent Memory

Kenny McCormack writes:

When last I ran into this problem, what I did was to save the last value returned by rand() to a file, then on the next run, read that in and use that value as the arg to srand(). Worked well.

(Editor's comment: Kenny's solution does work well but incurs the cost of maintaining and reading/writing that "last value" file.)
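
Kenny's idea might look like the sketch below. The function name next_rand and the seed-file argument are illustrative and not from his post. Since an awk implementation may truncate the argument of srand() to an integer, this version stores a scaled integer rather than the raw value between 0 and 1.

```shell
# next_rand FILE: print one random number, carrying the seed across runs in FILE.
next_rand() {
    awk -v file="$1" 'BEGIN {
        # Reuse the saved seed if there is one; otherwise seed from the clock.
        if ((getline seed < file) > 0) { srand(seed + 0) } else { srand() }
        close(file)

        print rand()                             # the value this run uses
        print int(rand() * 2147483647) > file    # seed for the next run
        close(file)
    }'
}

seedfile=$(mktemp)      # in real use this would be a fixed, writable path
next_rand "$seedfile"
next_rand "$seedfile"   # seeded from the file, so it does not depend on the clock
rm -f "$seedfile"
```

The second call reads the seed left behind by the first, so two runs in the same second no longer share a seed.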

Solution #2: Use Bash

Tim Menzies writes:

How about setting the seed using the BASH $RANDOM variable:

gawk -v Seed=$RANDOM --source 'BEGIN { srand(Seed ? Seed : 1) }' 

Each reference to $RANDOM yields a new pseudo-random integer, so runs started within the same second will almost always get different seeds.

In the above usage, if we have a seed, use it. Else, no seed so start all "random" at the same place. If you prefer to use the default "seed from time-in-seconds" then use:

BEGIN { if (Seed) { srand(Seed) } else { srand() } }

(Editor's comment: Tim's solution incurs the overhead of additional command-line syntax. However, it does allow the process calling Gawk to control the seed. This is important when trying to, say, debug code by recreating the sequence of random numbers that lead to the bug.)
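
To illustrate the debugging point, here is a small sketch in which the calling shell picks the seed, logs it, and replays it. The function name sample and the fallback to the shell's process id (for shells without $RANDOM) are assumptions made for this example.

```shell
# sample SEED: print three pseudo-random integers in 1..100 for the given seed.
sample() {
    awk -v Seed="$1" 'BEGIN {
        srand(Seed ? Seed : 1)
        for (i = 1; i <= 3; i++) printf "%d ", 1 + int(rand() * 100)
        print ""
    }'
}

seed=${RANDOM:-$$}      # $RANDOM in bash/ksh/zsh; fall back to the shell pid elsewhere
echo "seed=$seed"       # log it, so a failing run can be replayed later
sample "$seed"
sample "$seed"          # same seed, same three numbers
```

Rerunning sample with a seed taken from an old log line should reproduce the sequence that run saw, provided the same awk implementation is used.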

Solution #3: Query the OS

Thomas Weidenfeller writes:

Is that good enough (random enough) for your task?

BEGIN {
        "od -tu4 -N4 -A n /dev/random" | getline
        srand(0+$0)
}

(Editor's comment: Nice. Thomas' solution reminds us that "Gawk" can access a whole host of operating system facilities.)
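
A slightly more defensive variant of the same idea is sketched below. Two things are swapped in here and are not part of Thomas' post: it reads /dev/urandom instead of /dev/random (assumed to exist, and chosen so the read should not block), and it falls back to the clock if od prints nothing. The function name os_seed_rand is made up for the example.

```shell
# os_seed_rand: seed from 4 bytes of /dev/urandom, or from the clock
# if the od command produces no output.
os_seed_rand() {
    awk 'BEGIN {
        cmd = "od -An -N4 -tu4 /dev/urandom 2>/dev/null"
        if ((cmd | getline seed) > 0) { srand(seed + 0) } else { srand() }
        close(cmd)
        print rand()
    }'
}

os_seed_rand
```

Reading into a named variable (seed) rather than $0 keeps the current record untouched, which matters if the same trick is used outside a BEGIN block.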

Solution #4: Use the Process Id

Aharon Robbins writes:

You could do something like add PROCINFO["pid"] to the value of the time, or use that as the seed.

$ gawk 'BEGIN { srand(systime() + PROCINFO["pid"]); print rand() }'
0.405889
$ gawk 'BEGIN { srand(systime() + PROCINFO["pid"]); print rand() }'
0.671906

(Editor's comment: Aharon's solution is the fastest of all the ones shown here. For example, on Mac OS/X, his solution takes 6ms to run:

$ time gawk 'BEGIN { srand(systime() + PROCINFO["pid"]) }'

real    0m0.006s
user    0m0.002s
sys     0m0.004s

while Thomas' solution is somewhat slower:

$ time gawk 'BEGIN { "od -tu4 -N4 -A n /dev/random" | getline; srand($0+0) }'

real    0m0.039s
user    0m0.004s
sys     0m0.034s

Note that while Aharon's solution is the fastest, it does not let some master process set the seed for the Gawk process (e.g. as in Tim's approach).)

Conclusion

If you want raw speed, use Aharon's approach.

If you want seed control, see Tim's approach.


categories: Contribute,Jan,2009,Admin

How to Contribute

This web site is a front end to a repository of Awk code. The site, and the code, is maintained by the international awk community (which includes you) so there are many ways you can contribute:

Link to this site from your home page

Using this logo, link to http://awk.info:

(By the way, our current logo is pretty lame. Want to contribute a better one? Please, be our guest!)

Improve a Page

Found a Typo? A Rendering Problem? Want to clarify something?

Want to add some links?

See the above instructions.

How to Write Pages for this Site

  1. Write the page.
  2. Test the page by placing it on a publicly readable site, then see if it renders ok.
  3. Email the url of that page to mail@awk.info. Do NOT send the page.

When writing a page, please follow these guidelines:

  • Do not use <hr> tags: these are reserved for dividing pages in a multi-page view.
  • Use only one <h1> tag at the top of page. Everything else should be <h2> or below.
  • Try to avoid tricky CSS/HTML styling. Vanilla HTML is best.
  • The page you write will end up being rendered as the middle pane of this site (around 550 pixels wide). So don't write wide pages.
  • If you include code samples, note that our CSS wraps pre-formatted code if it gets too wide. For example, at the time of this writing, the following pre-formatted text gets ugly after about 75 characters:
          1         2         3         4         5         6         7
012345678901234567890123456789012345678901234567890123456789012345678901234567890

Contributing Code

To contribute code, zip up the directory and mail it to

Coding Standards

All function and file names are global to our code so please ensure your new function/file name does not clobber an old one.

Optionally, you might consider adding:

Add a Library Function File

In the language of this site, a function file is a 100% standalone file containing one or more functions with no dependencies on other files. Note that if your function file depends on other files, then it becomes a package (see below).

Functions are stored in a file called myfunc.awk.

Add a Package

In the language of this site, a package is a file that depends on other files (and the other files may depend on yet others, recursively).

Following a recent discussion in comp.lang.awk, we say that these dependencies are commented with

#use file.awk 

where file.awk is some file (e.g. a file in the current directory).

Note that file.awk will be loaded before the file containing the reference to #use file.awk.
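
One way to picture that load order is the sketch below. It is not the loader this site uses: the function name load_order is invented, and it assumes every #use line names a file relative to the current directory. It walks the #use comments depth-first and prints each file after the files it depends on.

```shell
# load_order FILE...: print the given files and everything they #use,
# one per line, with dependencies listed before the files that need them.
load_order() {
    awk '
    function visit(file,    line, deps, n, i, parts) {
        if (file in seen) return          # each file is listed only once
        seen[file] = 1
        n = 0
        while ((getline line < file) > 0)
            if (line ~ /^#use[ \t]+/) {
                split(line, parts, /[ \t]+/)
                deps[++n] = parts[2]
            }
        close(file)
        for (i = 1; i <= n; i++) visit(deps[i])
        print file                        # after all of its dependencies
    }
    BEGIN { for (k = 1; k < ARGC; k++) visit(ARGV[k]) }
    ' "$@"
}
```

For instance, if a.awk contains "#use b.awk" and b.awk contains "#use c.awk", then running load_order a.awk in that directory lists c.awk, then b.awk, then a.awk.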


categories: Learn,Jan,2009,Admin

Learning Awk

Short Overviews

The following list is sorted by newbie-ness (so best to start at the top):

Longer Tutorials

The following list is sorted by the number of times this material is tagged at delicious.com (most tagged at top):

Other Stuff


categories: OneLiners,Learn,Jan,2009,Admin

Awk one-liners

Awk is famous for how much it can do in one line.

This site has many samples of that capability. And if you have any more to add, please send them in.


categories: OneLiners,Learn,Jan,2009,Admin

Explaining Pement's One Liners

Peteris Krumins explains Eric Pement's Awk one-liners:


categories: TenLiners,Learn,Jan,2009,Admin

Awk ten-liners

Awk is famous for how much it can do in (around) 10 lines. Here are some samples of that capability.

(And if you have any more to add, please send them in.)


categories: Arrays,Function,Feb,2009,Admin

array

Synopsis

array(a)

Description

Ensure that an array is empty

Arguments

a
input array

Example

gawk/array/eg/array »

gawk -f array.awk --source '
BEGIN { array(A);
        A[1]=2;
        print length(A);
        array(A);
        print length(A);
}'

gawk/array/eg/array.out »

1
0

Source

function array(a) { split("",a,"") }

categories: Tools,Nov,2009,Admin

Columnate

Contents

Synopsis

Download

About

Code

Author

Synopsis

#e.g.
gawk -F: -f columnate.awk /etc/passwd

Download

Download from LAWKER.

About

This script columnates the input file so that the columns line up, as in the column(1) command. Its output is like that of column -t. First, awk reads the whole file, keeps track of the maximum width of each field, and saves all the lines/records. At the END, the lines are printed in columnated format. If your terminal is not too narrow, you'll get a handsome display of the file.

Code

{   line[NR] = $0    # saves the line
    for (f=1; f<=NF; f++) {
        len = length($f)
        if (len>max[f])
            max[f] = len }  # an array of maximum field widths
}
END {
    for(nr=1; nr<=NR; nr++) {
        nf = split(line[nr], fields)
        for (f=1; f<nf; f++)
            printf "%-*s", max[f]+2, fields[f]
        print fields[f] }     # the last field need not be padded
}

Author

h-67-101-152-180.nycmny83.dynamic.covad.net


categories: Music,Tools,June,2009,Admin

Music and Awk

These pages focus on music players and music analysis tools in Awk.


categories: Project,Tools,Mar,2009,Admin

Project Tools

These pages focus on tools for larger Gawk programs; e.g. ways to load multiple files or auto-generate documentation straight from the source code.


categories: Ps,Apr,2009,Admin

PostScript Tricks

These pages focus on PostScript tricks, written in Awk.


categories: Os,Apr,2009,Admin

Awk and Operating Systems

These pages focus on Awk and operating systems.


categories: XML,Apr,2009,Admin

XML

These pages focus on XML tools and Awk.


categories: Sept,2009,Admin

Ethiopian Multiplication

Here is some Awk code from the Rosetta Code wiki that multiplies integers using only addition, doubling, and halving.

How?

  1. Take two numbers to be multiplied and write them down at the top of two columns.
  2. In the left-hand column repeatedly halve the last number, discarding any remainders, and write the result below the last in the same column, until you write a value of 1.
  3. In the right-hand column repeatedly double the last number and write the result below. Stop when you add a result in the same row where the left-hand column shows 1.
  4. Examine the table produced and discard any row where the value in the left column is even.
  5. Sum the values in the right-hand column that remain to produce the result of multiplying the original two numbers together.

For example: 17 X 34

       17    34
Halving the first column:
       17    34
        8
        4
        2
        1
Doubling the second column:
       17    34
        8    68
        4   136 
        2   272
        1   544
Strike-out rows whose first cell is even:
       17    34
        8    -- 
        4   --- 
        2   --- 
        1   544
Sum the remaining numbers in the right-hand column:
       17    34
        8    -- 
        4   --- 
        2   --- 
        1   544
           ====
            578
So 17 multiplied by 34, by the Ethiopian method, is 578.

The task is to define three functions/methods/procedures/subroutines:

  1. one to halve an integer,
  2. one to double an integer, and
  3. one to state if an integer is even.

Code

function halve(x)  { return(int(x/2)) }
function double(x) { return(x*2) }
function iseven(x) { return((x%2) == 0) }

function ethiopian(plier, plicand,    r) {   # r is local, so no global is clobbered
  r = 0
  while(plier >= 1) {
    if ( !iseven(plier) ) {
      r += plicand
    }
    plier = halve(plier)
    plicand = double(plicand)
  }
  return(r)
}

BEGIN { print ethiopian(17, 34) }
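
To connect the code back to the worked example, here is a sketch of a variant that prints the table as it goes. The function name trace() and the column widths are assumptions made for this illustration; they are not part of the Rosetta Code task. The three helper functions are repeated so that the sketch runs on its own.

```awk
# Sketch: Ethiopian multiplication that also prints its working table.
function halve(x)  { return int(x / 2) }
function double(x) { return x * 2 }
function iseven(x) { return (x % 2) == 0 }

# trace() is a hypothetical name; r is a local accumulator.
function trace(plier, plicand,    r) {
  r = 0
  while (plier >= 1) {
    if (iseven(plier))
      printf "%9d  %6s\n", plier, "---"        # struck-out row (left value is even)
    else {
      printf "%9d  %6d\n", plier, plicand      # kept row, added to the total
      r += plicand
    }
    plier   = halve(plier)
    plicand = double(plicand)
  }
  printf "%9s  %6s\n", "", "===="
  printf "%9s  %6d\n", "", r
  return r
}

BEGIN { trace(17, 34) }
```

For 17 and 34 the kept rows are 17/34 and 1/544, so the total printed on the last line is 578, as in the table above.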

categories: Sept,2009,Admin

A Tale of Two TAWKs

In the Awk-verse, there are two TAWKs.

TAWK #1 is the TAWK Compiler from Thompson Automation Software (no longer trading)

  • Is 100% compatible with Awk.
  • Generates executables.
  • Comes with an interactive debugger.
  • In some test cases, code that is written 4 to 15 times faster than the equivalent "C" runs as fast as "C", or better.

TAWK #2 was an ultra-cut-down version of AWK written in C++ by Bruce Eckel in 1989. Eckel writes:

  • The program is called TAWK for "tiny awk," since the problem it solves is vaguely reminiscent of the "awk" pattern-matching language found on Unix (versions have also been created for DOS).
  • It demonstrates one of the thornier problems in computer science: parsing and executing a programming language.
  • The data-encapsulation features of C++ prove most useful here, and a recursive-descent technique is used to read arbitrarily long fields and records.

categories: TextMining,Mar,2009,Admin

Text Mining

Some of the code at awk.info is somewhat historical in nature. For example, Scott Pakin's gender predictor was written in 1991. Given that, it might be mistakenly concluded that Awk is somehow old-fashioned and not suitable for modern tasks.

Text mining, on the other hand, could be the killer app for Awk in the 21st century. The language excels at creating one-off reports that handle the quirks of a particular file format.
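
As a small illustration of that kind of one-off report, here is a minimal sketch that counts word frequencies in free text. It is not taken from the toolkits discussed on this page: the script name wordfreq.awk and the input file notes.txt are hypothetical, and the sketch assumes plain ASCII text.

```awk
# wordfreq.awk (hypothetical name): count how often each word occurs.
# Usage sketch: awk -f wordfreq.awk notes.txt | sort -rn
{
    $0 = tolower($0)                 # fold case so "Awk" and "awk" match
    gsub(/[^a-z0-9]+/, " ")          # treat punctuation as white space
    for (i = 1; i <= NF; i++)        # fields are re-split after the edits to $0
        count[$i]++
}
END {
    for (word in count)              # order is unspecified, so sort afterwards
        print count[word], word
}
```

Handling the quirks of a particular format usually means changing only the gsub() pattern or the field separator, which is why such reports stay short.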

There is a growing interest in using Awk for this kind of work. All the examples presented below come from work conducted in 2007 and 2008:

Why Text Mining?

If we could properly understand unstructured text, this would be a result of tremendous practical importance. A recent study concluded that:

  • 80 percent of business is conducted on unstructured information;
  • 85 percent of all data stored is held in an unstructured format;
  • Unstructured data doubles every three months.

That is, if we can tame the text mining problem, it would be possible to reason and learn from a much wider range of business data than ever before.

Results (with Awk)

Note that, in the Menzies/Marcus and Schmitt/Christianson tool kits, Awk by itself was not enough. Both of the data mining toolkits mentioned above were intricate combinations of Awk, sed, bash, and other tools. Within that combination, Awk was very useful for handling the specifics not managed by the other tools.


categories: Mail,Apr,2009,Admin

Awk and Mail

These pages focus on using Awk to implement filters on Unix mail files.


categories: Engineering,June,2009,Admin

Awk for Engineering

These pages focus on using Awk for analysis in engineering domains.
