Awk.Info

"Cause a little auk awk
goes a long way."

categories: Top10,Boris,Awk100,Feb,2009,Ronl

Boris

Purpose

Demonstration to DoD of a clustering algorithm suitable for streaming data.

Source code

gawk/awk100/boris

Live demo

http://www.cse.wustl.edu/~loui/boris.cgi

Developers

Ronald Loui and a programmer named Boris.

Organization

Washington University in St. Louis, CS Dept.

Country

USA

Domain

An evolutionary clustering algorithm, with a visualization, whose running time could be cut from O(n^4) to O(n log n) with a few judicious uses of constants. Later developments added other interactive devices, including progress meters and point-and-click mouse behavior.

Contact

Ronald Loui

Email

r.p.loui@gmail.com

Description

The code is an excellent example of the power of Awk as a prototyping tool. Once the code was running, which took very little development time, a quirk was observed that allowed a reduction from O(n^4) to O(n log n):

  • Two of the n's are lost (n^2) by noticing that, when there is a swap, the delta in the scoring function falls off with the squared distance from the point of the swap. So if you set a constant, such as 10, 20, or 100, based on the expected size of your clusters, you can stop calculating the scoring function once you get past that constant.
  • The other n is lost either by fixing the size of the matrix and occasionally flushing new candidates in and out, or by sampling over a subset of the n when you calculate the score.
  • The n log n remains because there is a sort every now and then.
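
The first point above can be sketched roughly as follows. This is not the Boris source (which is Gawk), only a Python illustration under stated assumptions: the scoring function (pairwise similarity weighted by 1/distance^2 between positions in an ordering), the similarity matrix, and every name in it (`pair_weight`, `full_score`, `windowed_swap_delta`, `make_similarity`) are hypothetical stand-ins chosen to show the idea of cutting the score calculation off at a constant window.

```python
import random


def pair_weight(i, j):
    """Assumed weight for a pair of positions: falls off with squared distance."""
    d = i - j
    return 1.0 / (d * d)


def full_score(order, sim):
    """Score an ordering by visiting every pair of positions: O(n^2) work."""
    n = len(order)
    return sum(sim[order[i]][order[j]] * pair_weight(i, j)
               for i in range(n) for j in range(i + 1, n))


def windowed_swap_delta(order, sim, p, q, window):
    """Estimate the change in score from swapping positions p and q.

    Only positions within `window` of p or q are visited, so the cost
    depends on the constant `window`, not on n.
    """
    n = len(order)
    a, b = order[p], order[q]
    delta = 0.0
    # After the swap, item b sits at position p and item a at position q.
    for centre, old, new in ((p, a, b), (q, b, a)):
        lo, hi = max(0, centre - window), min(n - 1, centre + window)
        for k in range(lo, hi + 1):
            if k == p or k == q:
                continue  # the p-q pair keeps the same weight and similarity
            w = pair_weight(centre, k)
            delta += (sim[new][order[k]] - sim[old][order[k]]) * w
    return delta


def make_similarity(n, seed=0):
    """Random symmetric similarity matrix with entries in [0, 1) (demo data only)."""
    rng = random.Random(seed)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = rng.random()
    return sim
```

With similarities in [0, 1], the terms skipped by a window of size W each weigh less than 1/W^2 and sum to less than 4/W in this sketch, which is why a small constant tied to the expected cluster size can be enough. Setting the window to n recovers the exact delta.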

Awk

Gawk

Platform

Intended for fast servers, 1+ GHz.

Uses

HTML.

Lines

158.

Development Effort

One weekend.

Maintenance Effort

None.

Current

2=Evaluation.

Use

2=In-house use.

Users

5

DateDeployed

2004.

Dated

Feb 2009.

References

Looks, M.; Levine, A.; Covington, G.A.; Loui, R.P.; Lockwood, J.W.; Cho, Y.H. "Streaming Hierarchical Clustering for Concept Mining." 2007 IEEE Aerospace Conference, 3-10 March 2007, pp. 1-12. DOI: 10.1109/AERO.2007.352792
