"Cause a little auk awk
goes a long way."

categories: TextMining,Mar,2009,LotharS

Awk and Sed for Language Analysis


Lothar M. Schmitt and Kiel T. Christianson:


The authors show how to construct tools for language analysis in research and teaching using awk, the Bourne shell, and sed under UNIX. Applications include the following:
  • searches for words, phrases, grammatical patterns and phonemic patterns in text;
  • statistical evaluation of texts in regard to such searches;
  • transformation of phonetic, phonemic or typographic transcriptions;
  • comparison of texts in various respects;
  • lexical-etymological analysis;
  • concordance;
  • assistance in translating text;
  • assistance in learning languages;
  • assistance in teaching languages;
  • and text processing and formatting. The latter includes generating online dictionaries for the Internet from files produced with what-you-see-is-what-you-get editors, which represent only the linear structure of the dictionary (i.e., the book).
All of the above can be achieved with particularly simple and short code. In doing so, the authors illustrate how sed and awk can be combined through UNIX pipes to create very powerful processing tools.
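
As a rough illustration of that kind of pipeline (a minimal sketch, not taken from the authors' notes), the shell function below counts word frequencies in a text. The name word_freq and the sample sentence are hypothetical; sed strips everything that is not an ASCII letter, awk folds case and tallies each word, and sort orders the result.

```shell
#!/bin/sh
# word_freq: hypothetical helper name, not from the authors' notes.
# Reads text on stdin and prints "count word" lines, most frequent first.
# Assumption: words consist of ASCII letters only.
word_freq() {
    # sed: turn every non-letter into a space so awk sees clean fields.
    sed 's/[^A-Za-z]/ /g' |
    # awk: lowercase each field and tally it in an associative array.
    awk '{ for (i = 1; i <= NF; i++) count[tolower($i)]++ }
         END { for (w in count) print count[w], w }' |
    # sort: by count (descending), then alphabetically for ties.
    sort -k1,1nr -k2,2
}

# Usage on a made-up sample text:
printf 'The cat saw the dog.\nThe dog ran.\n' | word_freq
```

Each stage does one small job, so a stage can be swapped out (for example, a different sed expression to keep apostrophes or accented letters) without touching the others.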

Their notes include a short introduction to Bourne shell programming and brief but complete descriptions of sed and awk, tailored to language analysis.