Mar 01: Michael Sanders demos an X-windows GUI for AWK.
Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK
Feb 28: Tim Menzies asks this community to write an AWK cookbook.
Feb 28: Arnold Robbins announces a new debugger for GAWK.
Feb 28: Awk100#23: Premysl Janouch offers an IRC bot, in AWK
Feb 28: Updated: the AWK FAQ
Feb 28: Tim Menzies offers a tiny content management system, in Awk.
Jan 31: Comment system added to awk.info. For example, see the discussion at the bottom of ?keys2awk
Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).
Jan 31: The AWK FAQ is being updated. For comments/corrections/extensions, please mail email@example.com
Jan 31: Martin Cohen finds Awk on the Android platform.
Jan 31: Aleksey Cheusov released a new version of runawk.
Jan 31: Hirofumi Saito contributes a candidate Awk mascot.
Jan 31: Michael Sanders shows how to quickly build an AWK GUI for Windows.
Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.
Aharon Robbins, the GNU Awk maintainer, answers some questions from Tim Menzies.
Q: What is your favorite programming language (besides gawk)? And why?
A: It depends for what. A long time ago I was a big Korn shell junkie, although these days I would do most high-level things in a mixture of bash and awk, with awk doing the heavy lifting.
For lower-level things I prefer C++, although I have something of a love/hate relationship with the language. It's possible to write completely unreadable and unmaintainable code in it. It's also possible to write beautiful, clear, absolutely amazing code in it.
I find that going back to C after working daily in C++ is hard, although I do it for gawk maintenance. For new programs I would work in C++, not C. For something big, I'd use the Qt framework for support and portability.
Recently I've been living in the C# world for my day job. The development environment is very addictive, but C# hasn't seduced me away from C++.
Q: The open source world is a fascinating development paradigm, so I'm very curious: what prompted you to write gawk?
A: I didn't write it from scratch. I got involved shortly after picking up and reading the Aho, Weinberger & Kernighan book in late 1987 when it came out.
New awk wasn't widely available. I had been involved with USENET since around 1983, and knew about the GNU project. I also had a strong interest in compilers and interpreters, so I got in touch with the GNU project to see if they had an awk clone and to see if I could get involved in upgrading it to "new" awk.
It turned out that they already had a volunteer, David Trueman, who was working on it, but he was happy to have help. He and I worked together until circa 1993 or 1994 when he had to stop being involved, and I became the sole maintainer.
It was a lot of fun. The number of emails of the "I could not get my work done without gawk" sort was amazing; Unix awk would often roll over and die on some of the data sets people were running through gawk.
Things really got shaken down when gawk became part of GNU/Linux distributions; then people were using it as the only awk, instead of alongside Unix awk.
Q: In retrospect, what are the best/worst features of gawk?
A: The best feature is the pattern/action paradigm. The implicit read-a-record loop is wonderful. This is the language's data-driven nature, as opposed to the imperative nature of most languages.
Associative arrays rank second; they are quite powerful.
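For readers who have not used awk, here is a minimal sketch of the two features just named, run from a POSIX shell. The program never opens a file or writes a loop: awk reads each record itself and runs every action whose pattern matches, and the counters live in an associative array keyed by a string. The helper name count_errors_by_host and the log lines are made up for illustration.

```shell
# Hypothetical helper: count ERROR lines per host (field 2 of each record).
count_errors_by_host() {
    awk '
        /ERROR/ { errors[$2]++ }    # pattern: a regex; action: bump a per-host counter
        END {                       # END runs once, after the last record is read
            for (host in errors)    # iteration order is unspecified in awk
                print host, errors[host]
        }
    '
}

printf 'INFO web1 start\nERROR web1 disk full\nERROR db1 timeout\nERROR web1 retry failed\n' |
    count_errors_by_host | sort
```

Because for-in order is not defined, the output is piped through sort to make it stable (db1 1, then web1 2).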
There are some warts inherited from Unix awk and left unspecified by POSIX. These are relatively minor.
The lack of an explicit concatenation operator is an obvious one.
The lack of real multi-dimensional arrays is another.
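A small sketch of those two warts, assuming only POSIX awk features (the helper name grid_demo is hypothetical). Strings are joined by writing them side by side, and because concatenation binds more loosely than +, mixed expressions can surprise. A subscript like m[1, 2] is not a true two-dimensional cell: awk joins the subscripts with SUBSEP into a single string key of an ordinary associative array.

```shell
# Hypothetical helper; everything runs in BEGIN, so no input is read.
grid_demo() {
    awk 'BEGIN {
        s = "gawk" " " "4"          # no operator: adjacent strings are concatenated
        print s                     # prints: gawk 4
        print "a" 1 + 2             # parsed as "a" (1 + 2), so this prints a3

        m[1, 2] = "x"               # a single key: "1" SUBSEP "2"
        if (("1" SUBSEP "2") in m)
            print "found as a flat key"

        for (k in m) {              # recover the subscripts by splitting the key
            split(k, idx, SUBSEP)
            print idx[1], idx[2], m[k]
        }
    }'
}

grid_demo
```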
There are features just in gawk that in retrospect seem to have been a waste of time, such as exposing program internationalization at the awk level. I don't think anyone uses that.
IGNORECASE was a huge pain to get right; if I'd known how long it would take, I wouldn't have bothered.
The biggest "lack" is that there isn't an easy, standard way to provide extensibility; there are way too many things in the C library today (and even yesterday) that the awk programmer just can't get to. (Like the chdir system call!) I hope to eventually provide some better mechanisms for this, but I don't know how much of the actual filling-in I can do as well.
Q: Under what circumstances would you recommend/not recommend it?
A: Gawk is good for small to medium level programs that have to process text and/or do simple numeric work (summing up columns, averaging, VERY simple statistics work). It has a central place in traditional Unix / Linux shell scripting when portability is a must.
But I wouldn't care to try to write a military air traffic command and control system in gawk, for example. :-)
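As an illustration of the "summing up columns, averaging" kind of job mentioned above, here is a minimal sketch. The helper name col_stats and the sample data are hypothetical; the column number is passed in with -v, and non-numeric fields simply count as 0.

```shell
# Hypothetical helper: sum and average one column of whitespace-separated input.
# Usage: some_command | col_stats COLUMN_NUMBER
col_stats() {
    awk -v col="$1" '
        { sum += $col; n++ }        # no pattern: the action runs for every record
        END {
            if (n > 0)
                printf "sum=%g avg=%g\n", sum, sum / n
            else
                print "no data"
        }
    '
}

printf 'alice 10\nbob 20\ncarol 30\n' | col_stats 2
```

With the sample input this prints sum=60 avg=20.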
Q: Gawk has a reputation of being slow...
A: "Slow" compared to what? As far as I've seen, gawk is always faster than Unix awk. Michael Brennan's mawk is even faster, but until recently it has been unmaintained, and it lacks many important, modern features.
Relative to C? Of course. So what? You have to write 5-10 times as much C as you do awk to do the same or less. (I remember one program I wrote in C at around 1200 lines and then rewrote in under 300 lines of awk; the awk version was clearer and did more.)
Relative to perl? It depends. I have had emails telling me that gawk was faster than perl for what the users were doing. And if not, do I care? Not really - perl is a write-only language, and don't get me started on Perl 6. :-)
All that said, this got me to thinking about a possible bottleneck that I'll be investigating in the near future.
Q: Awk also has a reputation of not being suitable for "real" projects. Is that reputation deserved?
A: I don't think that contention is true. It may be that scripting languages in general have such a reputation (Ronald Loui has written about this), but I don't think it is true for scripting languages either.
As is always the case, the answer is "it depends". What is the scale of what you're trying to do? Who is the customer? When Rick Adams was still running UUNET, he used a suite of awk programs to do his accounting. That's as "real" a project as you can get: billing your (hundreds or thousands of) customers for their resource usage. And he used gawk, since Unix awk would just roll over and die. (Unix awk has gotten better as a result of the "competition", but that's a different story. :-)
Q: Are you aware of any landmark projects that use gawk?
A: GNU/Linux. :-)
Not really. Gawk "just works", and that in and of itself is a testimony to its quality and value.
Q: Looking a decade into the future, can you see gawk disappearing? Why (not)?
A: I don't think so. The bigger question is will I still be involved with it 10 years from now? I don't know.
I still have some things I'd like to see happen with it that are interesting and valuable and may even end up being relatively unique. I just have to find the time (or some other volunteers :-) to work on them.
Q: Currently, how are you filling your time?
A: I have a full time job as a software engineer with Intel. I have a wife and four wonderful children, as well as a dog. That's enough right there to keep me busy.
I am the series editor for the Prentice Hall Open Source Software Development Series, which also takes some of my time.
And I still try to do some gawk work in between everything else!