About awk.info
Mar 01: Michael Sanders demos an X-windows GUI for AWK.
Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK
Feb 28: Tim Menzies asks this community to write an AWK cookbook.
Feb 28: Arnold Robbins announces a new debugger for GAWK.
Feb 28: Awk100#23: Premysl Janouch offers an IRC bot, in AWK
Feb 28: Updated: the AWK FAQ
Feb 28: Tim Menzies offers a tiny content management system, in Awk.
Jan 31: Comment system added to awk.info. For example, see the discussion at the bottom of ?keys2awk
Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).
Jan 31: The AWK FAQ is being updated. For comments/ corrections/ extensions, please mail tim@menzies.us
Jan 31: Martin Cohen finds Awk on the Android platform.
Jan 31: Aleksey Cheusov released a new version of runawk.
Jan 31: Hirofumi Saito contributes a candidate Awk mascot.
Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.
Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.
These pages are grouped into the topics listed below (latest one shown first):
210 pages.
Awk is being used all around the world for real programming problems, but the news is not getting out.
We are aiming to create a database of at least one hundred Awk programs which will:
If you, or your colleagues or friends have written a program which has been used for purposes small or large, why not take five minutes to record the facts, so that others can see what you've done?
To contribute, fill in this template and mail it to mail@awk.info with the subject line Awk 100 contribution.
(Recent additions are shown first.)
These pages focus on sys admin tools in Awk.
The Awk.info Top 10 pages highlight the "best" (most impressive, most insightful, most fun, most visited) pages on this site.
Awk.info is maintained by the international awk community. There are many ways you can contribute and get listed below.
Some must lead, some must follow, and some have to fix the typos.
A Great Auk is someone with write permission to our repository. Since the source for this web site is stored in that repository, it also means that they are webmasters of this site. So they (try) to:
If you want to be a Great Auk, please start contributing to this site using any of the usual methods. Once it is clear that you know what you are doing and that you play nice with others, then you should ask a current Great Auk to nominate you. Then, all the current Great Auks will vote on giving you write access.
The current Great Auks are
"Because easy is not wrong." - Anon
From various sources:
Quotes:
From Project Management Advice:
From Awk programming:
From Awk as a Major Systems Programming Language:
According to Ramesh Natarajan:
From the NoSQL pages:
To join our community, consider contributing to this site.
For a list of authors of this site, see our credits pages.
The Awk Wiki.
USENET discussion group: comp.lang.awk.
For discussions on Awk, see the Awk discussion group.
For comments/ complaints/ corrections/ extensions to this site, contact mail@awk.info.
Awk is a stable, cross platform computer language named for its authors Alfred Aho, Peter Weinberger & Brian Kernighan. They write: "Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data-manipulation tasks".
In Classic Shell Scripting, Arnold Robbins & Nelson Beebe confess their Awk bias: "We like it. A lot. The simplicity and power of Awk often make it just the right tool for the job."
Besides the Bourne shell, Awk is the only other scripting language available in the standard Unix environment. Implementations of AWK exist as installed software for almost all other operating systems.
Awk is a mature language: it was first implemented in the 1970s. As a tool from the golden age, it is sometimes called primitive. It is more accurate to call it elemental, so tightly focused is the language on what it does best: quickly converting this into that.
Consequently, throughout history, Awk has been the language of choice for many famous scientists such as Leonardo daVinci.
LAWKER is a repository of Awk code divided into:
See How to Contribute.
Use our issue tracking system.
Many communities have a mascot, a banner that they proudly wave high. So where's the Awk mascot?
I made one up, but you gotta say, it is kinda lame:
So if you have any ideas for such a mascot, please email mail@awk.info with the subject line "suggestion for mascot".
Not to stifle anyone's creativity, but the mascot might be based on the mantra "less, but better" or "easy is not wrong" or "a little awk goes a long way".
Chris writes "more of a logo rather than a mascot":
These pages focus on program verification tools, written in Awk.
These pages focus on databases and Awk.
These pages focus on games, written in Awk.
Nov 28, 2009
This site is moving up the page rankings:
Other indicators also look good. Since the site was launched (Feb 15, 2009), the number of visits has been steadily increasing:
These 19,268 visits come from 2,765 cities:
(BTW: Anyone got any ideas why these cities visit here so often?)
In other news, Website Outlook reports that:
To put that report in perspective, the same source notes that:
Brian Kernighan has granted permission for this site to host the code from the original Awk book:
The code can be viewed here.
These pages focus on word processing tools in Awk.
These pages focus on language interpreters, written in Awk.
These pages focus on object-oriented tools in Awk.
These pages focus on domain-specific languages (a.k.a. "little languages") written in Awk.
These little languages can range from the simple to the quite intricate. For example, LAWKER contains code for
Interestingly, without comments, the LISP interpreter is only three times longer than the HTML markup language. This comments either on the power of Awk, the regularity of LISP's core semantics, or both.
These pages focus on Sed-like stream editors, written in Awk.
(Summarized and extended from a recent discussion at comp.lang.awk.)
A standard idiom in Gawk is to reset the random number generator in a BEGIN block.
BEGIN {srand() }
Sadly, when called with no arguments, this "reseeding" uses time-in-seconds. So if the same "random" task runs multiple times in the same second, it will get the same random number seed.
"Ben" writes:
I have a Gawk script that puts random comments into a file. It is run 3 times in a row in quick succession. I found that seeding the random number generator using gawk did not work because all 3 times it was run was done within the same second (and it uses the time).
I was wondering if anyone could give me some suggestions as to what can be done to get around this problem.
Kenny McCormack writes:
When last I ran into this problem, what I did was to save the last value returned by rand() to a file, then on the next run, read that in and use that value as the arg to srand(). Worked well.
(Editor's comment: Kenny's solution does work well but incurs the cost of maintaining and reading/writing that "last value" file.)
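(Editor's sketch: here is one way Kenny's idea might look. This is not Kenny's code. The seed file location and the next_random name are made up for this example. Also, since many Awks truncate the argument of srand() to an integer, this sketch saves a scaled-up integer rather than the raw value of rand(); that scaling is our assumption, not part of Kenny's description.)

```shell
# Hypothetical seed file; any writable path would do.
seedfile="${TMPDIR:-/tmp}/awk-seed.$$"

next_random() {
    awk -v seedfile="$seedfile" 'BEGIN {
        if ((getline seed < seedfile) > 0)
            srand(seed)            # reuse the value saved by the last run
        else
            srand()                # first run: fall back to time-in-seconds
        close(seedfile)
        print rand()               # the "random" work of this run
        # Save an integer seed for the next run.
        print int(rand() * 2147483647) > seedfile
    }'
}

first=$(next_random)
second=$(next_random)              # same second, but a different seed
echo "$first $second"
rm -f "$seedfile"
```

Two runs in quick succession now start from different seeds, at the cost of one small file read and one write per run.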
Tim Menzies writes:
How about setting the seed using the BASH $RANDOM variable:
gawk -v Seed=$RANDOM --source 'BEGIN { srand(Seed ? Seed : 1) }'
If referenced multiple times in a second, it nearly always generates a different number (BASH's $RANDOM yields an integer between 0 and 32767, so repeats are possible but rare).
In the above usage, if we have a seed, use it. Else, no seed so start all "random" at the same place. If you prefer to use the default "seed from time-in-seconds" then use:
BEGIN { if (Seed) { srand(Seed) } else { srand() } }
(Editor's comment: Tim's solution incurs the overhead of additional command-line syntax. However, it does allow the process calling Gawk to control the seed. This is important when trying to, say, debug code by recreating the sequence of random numbers that lead to the bug.)
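(Editor's sketch: to see why seed control helps with debugging, here is a small check that the same Seed replays the same numbers. The run function is just a name invented for this example; any Awk with srand() and rand() should behave this way, though the actual numbers printed will differ between implementations.)

```shell
# Print three "random" numbers for a given seed.
run() {
    awk -v Seed="$1" 'BEGIN { srand(Seed); print rand(), rand(), rand() }'
}

a=$(run 42)
b=$(run 42)      # same seed: the sequence is replayed
c=$(run 43)      # another seed: usually a different sequence

echo "seed 42: $a"
echo "seed 42: $b"
echo "seed 43: $c"
```

The first two lines of output should be identical, which is what lets a calling process recreate a buggy run.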
Thomas Weidenfeller writes:
Is that good enough (random enough) for your task?
BEGIN {
"od -tu4 -N4 -A n /dev/random" | getline
srand(0+$0)
}
(Editor's comment: Nice. Thomas' solution reminds us that "Gawk" can access a whole host of operating system facilities.)
Aharon Robbins writes:
You could do something like add PROCINFO["pid"] to the value of the time, or use that as the seed.
$ gawk 'BEGIN { srand(systime() + PROCINFO["pid"]); print rand() }'
0.405889
$ gawk 'BEGIN { srand(systime() + PROCINFO["pid"]); print rand() }'
0.671906
(Editor's comment: Aharon's solution is the fastest of all the ones shown here. For example, on Mac OS/X, his solution takes 6ms to run:
$ time gawk 'BEGIN { srand(systime() + PROCINFO["pid"]) }'
real 0m0.006s
user 0m0.002s
sys 0m0.004s
while Thomas' solution is somewhat slower:
$ time gawk 'BEGIN { "od -tu4 -N4 -A n /dev/random" | getline; srand($0+0) }'
real 0m0.039s
user 0m0.004s
sys 0m0.034s
Note that while Aharon's solution is the fastest, it does not let some master process set the seed for the Gawk process (e.g. as in Tim's approach).)
If you want raw speed, use Aharon's approach.
If you want seed control, see Tim's approach.
This web site is a front end to a repository of Awk code. The site, and the code, is maintained by the international awk community (which includes you) so there are many ways you can contribute:
Using this logo, link to http://awk.info:
(By the way, our current logo is pretty lame. Want to contribute a better one? Please, be our guest!)
When writing a page, please follow these guidelines:
          1         2         3         4         5         6         7
012345678901234567890123456789012345678901234567890123456789012345678901234567890
To contribute code, zip up the directory and mail it to
All function and file names are global to our code so please ensure your new function/file name does not clobber an old one.
Optionally, you might consider adding:
In the language of this site, a function file is a 100% standalone file containing one or more functions with no dependencies on other files. Note that if your function file depends on other files, then it becomes a package (see below).
Functions are stored in a file called myfunc.awk.
In the language of this site, a package is a file that depends on other files (and the other files may depend on yet others, recursively).
Following a recent discussion in comp.lang.awk, we say that these dependencies are commented with
#use file.awk
where file.awk is some file (e.g. a file in the current directory).
Note that file.awk will be loaded before the file containing the reference to #use file.awk.
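As a rough illustration of that load order, here is a minimal sketch that walks the #use comments and prints each file after the files it depends on. It is not the loader used by this site. The files a.awk and b.awk, the uses() function, and the assumption that each #use line names exactly one file in the current directory are all invented for this example.

```shell
# Build two tiny example files in a scratch directory.
dir=$(mktemp -d)
printf '#use b.awk\nfunction a() { return b() + 1 }\n' > "$dir/a.awk"
printf 'function b() { return 41 }\n' > "$dir/b.awk"

order=$(cd "$dir" && awk '
function uses(file,    line, deps, n, i) {
    if (seen[file]++) return                  # load each file only once
    while ((getline line < file) > 0)
        if (line ~ /^#use[ \t]+/) {
            sub(/^#use[ \t]+/, "", line)      # keep just the file name
            deps[++n] = line
        }
    close(file)
    for (i = 1; i <= n; i++) uses(deps[i])    # dependencies come first
    print file
}
BEGIN { uses("a.awk") }')

echo $order
rm -rf "$dir"
```

Here b.awk is listed before a.awk, so passing the files to Awk in that order (one -f per file) would load each dependency before the file that uses it.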
The following list is sorted by newbie-ness (so best to start at the top):
The following list is sorted by the number of times this material is tagged at delicious.com (most tagged at top):
Awk is famous for how much it can do in one line.
This site has many samples of that capability. And if you have any more to add, please send them in.
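For a taste, here is a small one-liner of our own (not one of the samples hosted on this site) that sums the first column of its input. The two input lines are made up for the example.

```shell
# Sum the numbers in column one; END runs once all lines are read.
total=$(printf '3 apples\n4 pears\n' | awk '{ sum += $1 } END { print sum }')
echo "$total"
```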
Peteris Krumins explaining Eric Pement's Awk one-liners:
Awk is famous for how much it can do in (around) 101 lines. Here are some samples of that capability.
(And if you have any more to add, please send them in.)
array(a)
Ensure that an array is empty
gawk -f array.awk --source '
BEGIN { array(A);
A[1]=2;
print length(A);
array(A);
print length(A);
}'
1
0
function array(a) { split("",a,"") }
#e.g. gawk -F: -f columnate.awk /etc/passwd
{ line[NR] = $0                 # saves the line
  for (f=1; f<=NF; f++) {
    len = length($f)
    if (len>max[f])
      max[f] = len }            # an array of maximum field widths
}
END {
  for(nr=1; nr<=NR; nr++) {
    nf = split(line[nr], fields)
    for (f=1; f<nf; f++)
      printf "%-*s", max[f]+2, fields[f]
    print fields[f] }           # the last field need not be padded
}
These pages focus on music players and music analysis tools in Awk.
These pages focus on tools for larger Gawk programs; e.g. ways to load multiple files or auto-generate documentation straight from the source code.
These pages focus on postscript tricks, written in Awk.
These pages focus on Awk and operating systems.
These pages focus on XML tools and Awk.
Here is some Awk code from the Rosetta Code wiki that multiplies integers using only addition, doubling, and halving.
For example: 17 X 34
17 34
Halving the first column:
17 34
8
4
2
1
Doubling the second column:
17 34
8 68
4 136
2 272
1 544
Strike-out rows whose first cell is even:
17 34
8 --
4 ---
2 ---
1 544
Sum the remaining numbers in the right-hand column:
17 34
8 --
4 ---
2 ---
1 544
====
578
So 17 multiplied by 34, by the Ethiopian method is 578.
The task is to define three functions/methods/procedures/subroutines:
function halve(x)  { return(int(x/2)) }
function double(x) { return(x*2) }
function iseven(x) { return((x%2) == 0) }
function ethiopian(plier, plicand,    r) {   # r is a local variable
  r = 0
  while(plier >= 1) {
    if ( !iseven(plier) ) {    # keep only rows whose first cell is odd
      r += plicand
    }
    plier   = halve(plier)
    plicand = double(plicand)
  }
  return(r)
}
BEGIN { print ethiopian(17, 34) }
In the Awk-verse, there are two TAWKs.
TAWK #1 is the TAWK Compiler from Thompson Automation Software (no longer trading)
TAWK #2 was an ultra-cut-down version of AWK written in C++ by Bruce Eckel in 1989. Eckel writes:
Some of the code at awk.info is somewhat historical in nature. For example, Scott Pakin's gender predictor was written in 1991. Given that, it might be mistakenly concluded that Awk is somehow old-fashioned and not suitable for modern tasks.
Text mining, on the other hand, could be the killer app for Awk in the 21st century. The language excels at creating one-off reports that handle the quirks of a particular file format.
There is a growing interest in using Awk for this kind of work. All the examples presented below come from work conducted in 2007 and 2008:
If we could properly understand unstructured text, this would be a result of tremendous practical importance. A recent study concluded that:
That is, if we can tame the text mining problem, it would be possible to reason and learn from a much wider range of business data than ever before.
Note that, in the Menzies/Marcus and Schmitt/Christianson tool kits, Awk by itself was not enough. The two data mining toolkits mentioned above were both intricate combinations of Awk, sed, bash, and other tools. Within that combination, Awk was very useful for handling the specifics not managed by the other tools.
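To give a flavor of that kind of combination, here is a minimal word-frequency sketch in which Awk does the counting and sort and head do the ranking. It is not taken from either of the toolkits mentioned above; the sample text and the choice to treat anything that is not a letter as a separator are assumptions made for this example.

```shell
top=$(printf 'The cat sat.\nThe hat sat on the mat.\n' |
awk '{
    $0 = tolower($0)
    gsub(/[^a-z]+/, " ")                  # drop punctuation and digits
    for (i = 1; i <= NF; i++) count[$i]++ # tally each word
}
END { for (w in count) print count[w], w }' |
sort -k1,1nr -k2 | head -n 2)             # most frequent words first

echo "$top"
```

On this input the two most frequent words are "the" (3 times) and "sat" (2 times).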
These pages focus on using Awk to implement filters on Unix mail files.
These pages focus on using Awk for analysis in engineering domains.