Awk.Info

"Cause a little auk awk
goes a long way."

About awk.info
 »  table of contents
 »  featured topics
 »  page tags


About Awk
 »  advocacy
 »  learning
 »  history
 »  Wikipedia entry
 »  mascot
Implementations
 »  Awk (rarely used)
 »  Nawk (the-one-true, old)
 »  Gawk (widely used)
 »  Mawk
 »  Xgawk (gawk + xml + ...)
 »  Spawk (SQL + awk)
 »  Jawk (Awk in Java JVM)
 »  QTawk (extensions to gawk)
 »  Runawk (a runtime tool)
 »  platform support
Coding
 »  one-liners
 »  ten-liners
 »  tips
 »  the Awk 100
Community
 »  read our blog
 »  read/write the awk wiki
 »  discussion news group

Libraries
 »  Gawk
 »  Xgawk
 »  the Lawker library
Online doc
 »  reference card
 »  cheat sheet
 »  manual pages
 »  FAQ

Reading
 »  articles
 »  books:

WHAT'S NEW?

Mar 01: Michael Sanders demos an X-windows GUI for AWK.

Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK

Feb 28: Tim Menzies asks this community to write an AWK cookbook.

Feb 28: Arnold Robbins announces a new debugger for GAWK.

Feb 28: Awk100#23: Premysl Janouch offers a IRC bot, In AWK

Feb 28: Updated: the AWK FAQ

Feb 28: Tim Menzies offers a tiny content management system, in Awk.

Jan 31: Comment system added to awk.info. For example, see discussion bottom of ?keys2awk

Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).

Jan 31: The AWK FAQ is being updated. For comments/ corrections/ extensions, please mail tim@menzies.us

Jan 31: Martin Cohen finds Awk on the Android platform.

Jan 31: Aleksey Cheusov released a new version of runawk.

Jan 31: Hirofumi Saito contributes a candidate Awk mascot.

Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.

Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.

[More ...]

Bookmark and Share

categories: Games,Top10,TenLiners,Mar,2009,BrianK

Story.awk

Contents

Synopsis

echo Goal | gawk -f story.awk [ -v Grammar=FILE ] [ -v Seed=NUMBER ] 
echo Goal | gawk -f storyp.awk [ -v Grammar=FILE ] [ -v Seed=NUMBER ] 

Download

Download from LAWKER.

Description

This code inputs a set of productions and outputs a string of words that satisfy the production rules.

This page describes two versions of that system: story.awk and storyp.awk. The former selects productions at random with equal probability. The latter allows the user to bias the selection by adding weights at the end of line, after each production.

Options

-v Grammar=FILE
Sets the FILE containing the productions. Defaults to "grammar".
-v Seed=NUM
Sets the seed for the random number generator. Defaults to "1". A useful idiom for generating random text is to use Seed=$RANDOM

Examples

A Short Example

This grammar..

Sentence -> Nounphrase Verbphrase   
Nounphrase -> the boy              
Nounphrase -> the girl           
Verbphrase -> Verb Modlist Adverb 
Verb -> runs                    
Verb -> walks                  
Modlist ->                    
Modlist -> very Modlist      
Adverb -> quickly           
Adverb -> slowly           
... and this input ...
for i in 1 2 3 4 5 6 7 8 9 10;do
	echo Sentence | 
	gawk -f ../story.awk -v Grammar=english.rules -v Seed=$i | 
	fmt
done
... generates these sentences:
the boy runs very slowly
the girl runs slowly
the boy runs very slowly
the girl walks very very quickly
the boy runs quickly
the girl walks very very slowly
the boy walks very very very very very very quickly
the boy walks very quickly
the girl runs slowly
the girl runs very quickly

A Longer Example

Here is Gahan Wilson's sci-fi plot generator ...

Using the above, we can generate the following stories:


 Earth scientists invent giant bugs who want Our Women,  And Take
 A Few And Leave

 Earth is Attacked By tiny lunar superbeings who  Under Stand and
 Are Not radioactive and can not be killed by the Navy but They Die
 From Catching A Cold

 Earth scientists invent enormous bugs who are Friendly and and
 They Get Married And Live Happily Forever After

 Earth is Struck By A Giant cloud and Magically Saved

 Earth scientists invent giant bugs who  Under Stand and Are Not
 radioactive and can not be killed by the Air Force so They Kill
 Us

 Earth is Attacked By enormous extra Galactic blobs who  Under Stand
 and Are Not radioactive and can be killed by the Air Force

 Earth scientists discover enormous blobs who  Under Stand and Are
 Not radioactive and can be killed by a Crowd Of Peasants

 Earth falls Into Sun and  Some  Resuced

 Earth is Struck By A Giant comet but Is Saved

 Earth is Struck By A Giant comet and Is Destroyed

This is generated from the following code:

for i in 1 2 3 4 5 6 7 8 9 10;do
	echo
	echo Start | 
	gawk -f ../story.awk -v Grammar=scifi.rules -v Seed=$i | 
	fmt
done

running on the following grammar:

Start      -> Earth IsStressed
IsStressed -> Catestrophes 
IsStressed -> Science 
IsStressed -> Attack 
IsStressed -> Collision

Catestrophes -> Catestrophe and PossibleMegaDeath

Catestrophe -> burnsUp 
Catestrophe -> freezes
Catestrophe -> fallsIntoSun

Collision -> isStruckByAGiant Floater AndThen

Floater -> comet
Floater -> asteroid
Floater -> cloud

AndThen -> butIsSaved
AndThen -> andIsDestroyed
AndThen -> andMagicallySaved


PossibleMegaDeath -> everybodyDies
PossibleMegaDeath -> Some GoOn 

SomeSaved ->  somePeople
SomeSaved ->  everybody
SomeSaved ->  almostEverybody
  
GoOn -> dies
GoOn -> Resuced
GoOn -> Saved
 
Rescued -> isRescuedBy Sizes Extraterestrial Beings
Saved   -> butIsSavedBy SomeOne scientists the  Science

SomeOne -> earth
SomeOne -> extraterestrial

Science -> scientists DoSomething Sizes Beings Whichetc

DoSomething -> invent
DoSomething -> discover

Attack -> isAttackedBy Sizes Extraterestrial Beings Whichetc

Sizes -> tiny 
Sizes -> giant 
Sizes -> enormous
 
Extraterestrial -> martian
Extraterestrial -> lunar
Extraterestrial -> extraGalactic

Beings -> bugs
Beings -> reptiles
Beings -> blobs
Beings -> superbeings

Whichetc -> who WantSomething

WantSomething -> WantWomen
WantSomething -> areFriendly  and DenoumentOrHappyEnding
WantSomething -> UnderStand ButEtc

Understand -> areFriendly butMisunderstood
Understand -> misunderstandUs
Understand -> understandUsAllTooWell
Understand -> hungry

DenoumentOrHappyEnding -> Denoument
DenoumentOrHappyEnding -> HappyEnding
 
Dine -> Hungry and eat us Denoument?

WhichEtc -> 
Hungry -> lookUponUsAsASourceOfNourishment

WantWomen -> wantOurWomen, AndTakeAFewAndLeave

ButEtc -> AndAre radioactive and TryToKill

AndAre -> andAre
AndAre -> andAreNot

Killers -> Killer 
Killers -> Killer and Killer

Killer -> aCrowdOfPeasants
Killer -> theArmy
Killer -> theNavy
Killer -> theAirForce
Killer -> theMarines
Killer -> theCoastGuard
Killer -> theAtomBomb

TryToKill -> can be killed by Killers
TryToKill -> can not be killed by Killers SoEtc

SoEtc -> butTheyDieFromCatchingACold
SoEtc -> soTheyKillUs
SoEtc -> soTheyPutUsUnderABenignDictatorShip
SoEtc -> soTheyEatUs
SoEtc -> soScientistsInventAWeapon Which
SeEtc -> but Denoument

Which -> whichTurnsThemIntoDisgustingLumps
Which -> whichKillsThem
Which -> whichFails SoEtc

Denomument? ->  
Denomument? -> Denoument  

Denoument ->  aCuteLittleKidConvincesThemPeopleAreOk Ending
Denoument -> aPriestTalksToThemOfGod Ending
Denoument -> theyFallInLoveWithThisBeautifulGirl EndSadOrHappy

EndSadOrHappy -> Ending
EndSadOrHappy -> HappyEnding

Ending -> andTheyDie
Ending -> andTheyLeave
Ending -> andTheyTurnIntoDisgustingLumps

HappyEnding -> andTheyGetMarriedAndLiveHappilyForeverAfter

Biasing the Story

Here is a grammar suitable for storyp.awk. Note that number at end of line that biases how often a production is selected. For example, "runs" and "slowly" are nine times more likely than other Verbs and Adverbs.

Sentence -> Nounphrase Verbphrase   1
Nounphrase -> the boy               0.75
Nounphrase -> the girl              0.25
Verbphrase -> Verb Modlist Adverb   1
Verb -> runs                        0.9
Verb -> walks                       0.1
Modlist ->                          0.5
Modlist -> very Modlist             0.5
Adverb -> quickly                   0.1
Adverb -> slowly                    0.9
The following code executes the biases story generation:
for((i=1;i<=10;i++)); do echo Sentence ;  done |
gawk -f ../storyp.awk -v Grammar=englishp.rules 

This produces the following output. Note that, usually, we run slowly.

the boy runs very slowly 
the boy runs slowly 
the girl runs very slowly 
the boy runs slowly 
the boy runs slowly 
the girl walks very slowly 
the boy walks slowly 
the girl runs slowly 
the boy runs slowly 
the boy runs slowly 

Code

Story.awk

BEGIN { 
    srand(Seed ? Seed : 1) 
	Grammar = Grammar ? Grammar : "grammar"
	while (getline < Grammar > 0)
	    if ($2 == "->") {
		    i = ++lhs[$1]              # count lhs
		    rhscnt[$1, i] = NF-2       # how many in rhs
		    for (j = 3; j <= NF; j++)  # record them
		        rhslist[$1, i, j-2] = $j
	    } else
		     if ($0 !~ /^[ \t]*$/)
        	    print "illegal production: " $0
}
{   if ($1 in lhs) {  # nonterminal to expand
        gen($1)
        printf("\n")
    } else 
        print "unknown nonterminal: " $0   
}
function gen(sym,    i, j) {
    if (sym in lhs) {       # a nonterminal
        i = int(lhs[sym] * rand()) + 1   # random production
        for (j = 1; j <= rhscnt[sym, i]; j++) # expand rhs's
            gen(rhslist[sym, i, j])
    } else {
        gsub(/[A-Z]/," &",sym)
        printf("%s ", sym) }
}

Storyp.awk

Storyp.awk is almost the same as story.awk but it is assumed that each line ends in a number that will bias how often that production gets selected.

BEGIN {
    srand(Seed ? Seed : 1) 
    Grammar = Grammar ? Grammar : "grammar"
    while ((getline < Grammar) > 0)
        if ($2 == "->") {
            i = ++lhs[$1]              # count lhs
            rhsprob[$1, i] = $NF       # 0 <= probability <= 1
            rhscnt[$1, i] = NF-3       # how many in rhs
            for (j = 3; j < NF; j++)   # record them
               rhslist[$1, i, j-2] = $j
        } else
            print "illegal production: " $0
    for (sym in lhs)
         for (i = 2; i <= lhs[sym]; i++)
            rhsprob[sym, i] += rhsprob[sym, i-1]
}
{   if ($1 in lhs) {  # nonterminal to expand
         gen($1)
         printf("\n")
     } else 
         print "unknown nonterminal: " $0   
}
function gen(sym,    i, j) {
    if (sym in lhs) {       # a nonterminal
        j = rand()          # random production
        for (i = 1; i <= lhs[sym] && j > rhsprob[sym, i]; i++) ;       
        for (j = 1; j <= rhscnt[sym, i]; j++) # expand rhs's
            gen(rhslist[sym, i, j])
    } else
        printf("%s ", sym)
}

Author

The code comes from Alfred Aho, Brian Kernighan, and Peter Weinberger from the book "The AWK Programming Language", Addison-Wesley, 1988.

The scifi grammar was written by Tim Menzies, 2009, and is based on Gahan Wilson's sci-fi plot generator: "The Science Fiction Horror Movie Pocket Computer" ( in "The Year's Best Science Fiction No. 5", edited by Harry Harrison and Brian Aldiss, Sphere, London, 1972).


categories: TenLiners,Mar,2009,DonaldM

The Monty Hall Problem

Donald 'Paddy' McCarthy has a nice Awk solution to the Monty Hall Problem, which he describes as follow:

  • The contestant in in front of three doors that he cannot see behind..
  • The three doors conceal one prize and the rest being booby prizes, arranged randomly.
  • The Host asks the contestant to choose a door.
  • The host then goes behind the doors where only he can see what is concealed, then always opens one door, out of the other s not chosen by the contestant, that must reveal a booby prize to the contestant.
  • The host then asks the contestant if he would like either to stick with his previous choice, or switch and choose the other remaining closed door.

It turns out that if the contestant follows a strategy of always switching when asked, then he will maximise his chances of winning. Donald's simulator shows that:

  • A strategy of never switching wins 1/3rd of the time.
  • A strategy of randomly switching wins 1/2 of the time.
  • A strategy of always switching wins 2/3rds of the time.

Code

BEGIN {
	srand()
	doors = 3
	iterations = 10000
	# Behind a door: 
	EMPTY = "empty"; PRIZE = "prize"
	# Algorithm used
    KEEP = "keep"; SWITCH="switch"; RAND="random"; 
}
function monty_hall( choice, algorithm ) { # Set up doors
  for ( i=0; i<doors; i++ ) {
		door[i] = EMPTY
	}
	door[int(rand()*doors)] = PRIZE # One door with prize

  chosen = door[choice]
  del door[choice]

  #if you didn't choose the prize first time around then
  # that will be the alternative
	alternative = (chosen == PRIZE) ? EMPTY : PRIZE 

	if( algorithm == KEEP) {
		return chosen
	} 
	if( algorithm == SWITCH) {
		return alternative
	} 
	return rand() <0.5 ? chosen : alternative
}
function simulate(algo){
	prizecount = 0
	for(j=0; j< iterations; j++){
		if( monty_hall( int(rand()*doors), algo) == PRIZE) { 
			prizecount ++ 
		}
	}
	printf "  Algorithm %7s: prize count = %i, = %6.2f%%\n", \
		algo, prizecount,prizecount*100/iterations
}
BEGIN {
	print "\nMonty Hall problem simulation:"
	print doors, "doors,", iterations, "iterations.\n"
	simulate(KEEP)
	simulate(SWITCH)
	simulate(RAND)
}

Sample Output

gawk -f montyHall.awk

Monty Hall problem simulation:
3 doors, 10000 iterations.

  Algorithm    keep: prize count = 3411, =  34.11%
  Algorithm  switch: prize count = 6655, =  66.55%
  Algorithm  random: prize count = 4991, =  49.91%

categories: Top10,TenLiners,Mar,2009,ScottP

Predicting Gender

Contents

Synopsis

echo name | gawk -f gender.awk

Download

Download from LAWKER

Description

The following code predicts gender, given a first name.

This code is an excellent example of rule-based programming in Awk.

For a full description of the code, see

Code

                                          { sex = "m" } # Assume male.

/^.*[aeiy]$/                              { sex = "f" }  # Female names endng in a/e/i/y.
/^All?[iy]((ss?)|z)on$/                   { sex = "f" }  # Allison (and variations)
/^.*een$/                                 { sex = "f" }  # Cathleen, Eileen, Maureen,...
/^[^S].*r[rv]e?y?$/                       { sex = "m" }  # Barry, Larry, Perry,...
/^[^G].*v[ei]$/                           { sex = "m" }  # Clive, Dave, Steve,...
/^[^BD].*(b[iy]|y|via)nn?$/               { sex = "f" }  # Carolyn,Gwendolyn,Vivian,...
/^[^AJKLMNP][^o][^eit]*([glrsw]ey|lie)$/  { sex = "m" }  # Dewey, Stanley, Wesley,...
/^[^GKSW].*(th|lv)(e[rt])?$/              { sex = "f" }  # Heather, Ruth, Velvet,...
/^[CGJWZ][^o][^dnt]*y$/                   { sex = "m" }  # Gregory, Jeremy, Zachary,...
/^.*[Rlr][abo]y$/                         { sex = "m" }  # Leroy, Murray, Roy,...
/^[AEHJL].*il.*$/                         { sex = "f" }  # Abigail, Jill, Lillian,...
/^.*[Jj](o|o?[ae]a?n.*)$/                 { sex = "f" }  # Janet, Jennifer, Joan,...
/^.*[GRguw][ae]y?ne$/                     { sex = "m" }  # Duane, Eugene, Rene,...
/^[FLM].*ur(.*[^eotuy])?$/                { sex = "f" }  # Fleur, Lauren, Muriel,...
/^[CLMQTV].*[^dl][in]c.*[ey]$/            { sex = "m" }  # Lance, Quincy, Vince,...
/^M[aei]r[^tv].*([^cklnos]|([^o]n))$/     { sex = "f" }  # Margaret, Marylou, Miriam,...
/^.*[ay][dl]e$/                           { sex = "m" }  # Clyde, Kyle, Pascale,...
/^[^o]*ke$/                               { sex = "m" }  # Blake, Luke, Mike,...
/^[CKS]h?(ar[^lst]|ry).+$/                { sex = "f" }  # Carol, Karen, Sharon,...
/^[PR]e?a([^dfju]|qu)*[lm]$/              { sex = "f" }  # Pam, Pearl, Rachel,...
/^.*[Aa]nn.*$/                            { sex = "f" }  # Annacarol, Leann, Ruthann,...
/^.*[^cio]ag?h$/                          { sex = "f" }  # Deborah, Leah, Sarah,...
/^[^EK].*[grsz]h?an(ces)?$/               { sex = "f" }  # Frances, Megan, Susan,...
/^[^P]*([Hh]e|[Ee][lt])[^s]*[ey].*[^t]$/  { sex = "f" }  # Ethel, Helen, Gretchen,...
/^[^EL].*o(rg?|sh?)?(e|ua)$/              { sex = "m" }  # George, Joshua, Theodore,..
/^[DP][eo]?[lr].*se$/                     { sex = "f" }  # Delores, Doris, Precious,...
/^[^JPSWZ].*[denor]n.*y$/                 { sex = "m" }  # Anthony, Henry, Rodney,...
/^K[^v]*i.*[mns]$/                        { sex = "f" }  # Karin, Kim, Kristin,...
/^Br[aou][cd].*[ey]$/                     { sex = "m" }  # Bradley, Brady, Bruce,...
/^[ACGK].*[deinx][^aor]s$/                { sex = "f" }  # Agnes, Alexis, Glynis,...
/^[ILW][aeg][^ir]*e$/                     { sex = "m" }  # Ignace, Lee, Wallace,...
/^[^AGW][iu][gl].*[drt]$/                 { sex = "f" }  # Juliet, Mildred, Millicent,...
/^[ABEIUY][euz]?[blr][aeiy]$/             { sex = "m" }  # Ari, Bela, Ira,...
/^[EGILP][^eu]*i[ds]$/                    { sex = "f" }  # Iris, Lois, Phyllis,...
/^[ART][^r]*[dhn]e?y$/                    { sex = "m" }  # Randy, Timothy, Tony,...
/^[BHL].*i.*[rtxz]$/                      { sex = "f" }  # Beatriz, Bridget, Harriet,...
/^.*oi?[mn]e$/                            { sex = "m" }  # Antoine, Jerome, Tyrone,...
/^D.*[mnw].*[iy]$/                        { sex = "m" }  # Danny, Demetri, Dondi,...
/^[^BG](e[rst]|ha)[^il]*e$/               { sex = "m" }  # Pete, Serge, Shane,...
/^[ADFGIM][^r]*([bg]e[lr]|il|wn)$/        { sex = "f" }  # Angel, Gail, Isabel,...

                                          { print sex }  # Output prediction

Author

by Scott Pakin, August 1991

categories: Games,TenLiners,Eliza,Apr,2009,JuergenK

Eliza

Contents

Synopsis

gawk -f eliza.awk

Download

Download from LAWKER.

Description

From Wikipedia: ELIZA was a computer program and an early example of (by modern standards) primitive natural language processing. ELIZA operated by processing users' responses to scripts, the most famous of which was DOCTOR, a simulation of a Rogerian psychotherapist. In this mode, ELIZA mostly rephrased the user's statements as questions and posed those to the 'patient.' ELIZA was written by Joseph Weizenbaum between 1964 to 1966.

Example dialogue

This is a Gawk implementation of the famous ELIZA program by Joseph Weizenbaum.
To end, type SHUT UP

HI, IM ELIZA, TELL ME YOUR PROBLEM
I hate dogs
CAN YOU ELABORATE ON THAT ?
Dogs remind me of Mylie Cyrus
CAN YOU ELABORATE ON THAT ?
You're kidding, right? Mylie Cyrus is one of the biggest stars on the planet
WHAT MAKES YOU THINK I AM KIDDING, RIGHT? MYLIE CYRUS IS ONE OF THE BIGGEST STARS ON THE PLANET ?
Cause you did not know about Mylie
WE WERE DISCUSSING YOU -- NOT ME
No,  I was discussing Mylie
Oh, I ID NOT KNOW ABOUT MYLIE
Well aren't you stupid
Oh, I TUPID
can you get her number?
WE WERE DISCUSSING YOU -- NOT ME
sorry
PLEASE DON'T APOLOGIZE !
but you sound so angry
YOU'RE NOT REALLY TALKING ABOUT ME, ARE YOU ?
i think you are  a robot
Oh, I RE  A ROBOT
yes
I UNDERSTAND
no you don't
YOU'RE NOT REALLY TALKING ABOUT ME, ARE YOU ?
hey- is this eliza?
WHAT'S ABOUT ME ?
I want to take you to dinner
YOU'RE NOT REALLY TALKING ABOUT ME, ARE YOU ?
I feel you are really stupid
WE WERE DISCUSSING YOU -- NOT ME
SHUT UP
WELL, PLEASE PAY YOUR BILL. ITS EXACTLY ... $101

Code

Set up

BEGIN {
	SetUpEliza() 
	print "This is a Gawk implementation of the "\
          "famous ELIZA program by Joseph Weizenbaum. "\
	      "To end, type SHUT UP\n";
	print ElizaSays("");
}
{ print ElizaSays($0) }

ElizaSays

function ElizaSays(YouSay) {
 if (YouSay == "") {
   cost = 0
   answer = "HI, IM ELIZA, TELL ME YOUR PROBLEM"
 } else {
   q = toupper(YouSay)
   gsub("'", "", q)
   if(q == qold) {
     answer = "PLEASE DONT REPEAT YOURSELF !"
   } else {
     if (index(q, "SHUT UP") > 0) {
       answer = "WELL, PLEASE PAY YOUR BILL. ITS EXACTLY ... $"\
                int(100*rand()+30+cost/100)
		1;
     } else {
       qold = q
       w = "-"                 # no keyword recognized yet
       for (i in k) {          # search for keywords
         if (index(q, i) > 0) {
           w = i
           break
         }
       }
       if (w == "-") {         # no keyword, take old subject
         w    = wold
         subj = subjold
       } else {                # find subject
         subj = substr(q, index(q, w) + length(w)+1)
         wold = w
         subjold = subj        #  remember keyword and subject
       }
       for (i in conj)
          gsub(i, conj[i], q)   # conjugation
       # from all answers to this keyword, select one randomly
       answer = r[indices[int(split(k[w], indices) * rand()) + 1]]
       # insert subject into answer
       gsub("_", subj, answer)
     }
   }
 }
 cost += length(answer) # for later payment : 1 cent per character
 return answer
}

SetUpEliza

function SetUpEliza() {
 srand()
 wold = "-"
 subjold = " "

 # table for conjugation
 conj[" ARE "     ] = " AM "
 conj["WERE "     ] = "WAS "
 conj[" YOU "     ] = " I "
 conj["YOUR "     ] = "MY "
 conj[" IVE "     ] =\
 conj[" I HAVE "  ] = " YOU HAVE "
 conj[" YOUVE "   ] =\
 conj[" YOU HAVE "] = " I HAVE "
 conj[" IM "      ] =\
 conj[" I AM "    ] = " YOU ARE "
 conj[" YOURE "   ] =\
 conj[" YOU ARE " ] = " I AM "

 # table of all answers
 r[1]   = "DONT YOU BELIEVE THAT I CAN  _"
 r[2]   = "PERHAPS YOU WOULD LIKE TO BE ABLE TO _ ?"
 r[3]   = "YOU WANT ME TO BE ABLE TO _ ?"
 r[4]   = "PERHAPS YOU DONT WANT TO _ "
 r[5]   = "DO YOU WANT TO BE ABLE TO _ ?"
 r[6]   = "WHAT MAKES YOU THINK I AM _ ?"
 r[7]   = "DOES IT PLEASE YOU TO BELIEVE I AM _ ?"
 r[8]   = "PERHAPS YOU WOULD LIKE TO BE _ ?"
 r[9]   = "DO YOU SOMETIMES WISH YOU WERE _ ?"
 r[10]  = "DONT YOU REALLY _ ?"
 r[11]  = "WHY DONT YOU _ ?"
 r[12]  = "DO YOU WISH TO BE ABLE TO _ ?"
 r[13]  = "DOES THAT TROUBLE YOU ?"
 r[14]  = "TELL ME MORE ABOUT SUCH FEELINGS"
 r[15]  = "DO YOU OFTEN FEEL _ ?"
 r[16]  = "DO YOU ENJOY FEELING _ ?"
 r[17]  = "DO YOU REALLY BELIEVE I DONT _ ?"
 r[18]  = "PERHAPS IN GOOD TIME I WILL _ "
 r[19]  = "DO YOU WANT ME TO _ ?"
 r[20]  = "DO YOU THINK YOU SHOULD BE ABLE TO _ ?"
 r[21]  = "WHY CANT YOU _ ?"
 r[22]  = "WHY ARE YOU INTERESTED IN WHETHER OR NOT I AM _ ?"
 r[23]  = "WOULD YOU PREFER IF I WERE NOT _ ?"
 r[24]  = "PERHAPS IN YOUR FANTASIES I AM _ "
 r[25]  = "HOW DO YOU KNOW YOU CANT _ ?"
 r[26]  = "HAVE YOU TRIED ?"
 r[27]  = "PERHAPS YOU CAN NOW _ "
 r[28]  = "DID YOU COME TO ME BECAUSE YOU ARE _ ?"
 r[29]  = "HOW LONG HAVE YOU BEEN _ ?"
 r[30]  = "DO YOU BELIEVE ITS NORMAL TO BE _ ?"
 r[31]  = "DO YOU ENJOY BEING _ ?"
 r[32]  = "WE WERE DISCUSSING YOU -- NOT ME"
 r[33]  = "Oh, I _"
 r[34]  = "YOU'RE NOT REALLY TALKING ABOUT ME, ARE YOU ?"
 r[35]  = "WHAT WOULD IT MEAN TO YOU, IF YOU GOT _ ?"
 r[36]  = "WHY DO YOU WANT _ ?"
 r[37]  = "SUPPOSE YOU SOON GOT _"
 r[38]  = "WHAT IF YOU NEVER GOT _ ?"
 r[39]  = "I SOMETIMES ALSO WANT _"
 r[40]  = "WHY DO YOU ASK ?"
 r[41]  = "DOES THAT QUESTION INTEREST YOU ?"
 r[42]  = "WHAT ANSWER WOULD PLEASE YOU THE MOST ?"
 r[43]  = "WHAT DO YOU THINK ?"
 r[44]  = "ARE SUCH QUESTIONS IN YOUR MIND OFTEN ?"
 r[45]  = "WHAT IS IT THAT YOU REALLY WANT TO KNOW ?"
 r[46]  = "HAVE YOU ASKED ANYONE ELSE ?"
 r[47]  = "HAVE YOU ASKED SUCH QUESTIONS BEFORE ?"
 r[48]  = "WHAT ELSE COMES TO MIND WHEN YOU ASK THAT ?"
 r[49]  = "NAMES DON'T INTEREST ME"
 r[50]  = "I DONT CARE ABOUT NAMES -- PLEASE GO ON"
 r[51]  = "IS THAT THE REAL REASON ?"
 r[52]  = "DONT ANY OTHER REASONS COME TO MIND ?"
 r[53]  = "DOES THAT REASON EXPLAIN ANYTHING ELSE ?"
 r[54]  = "WHAT OTHER REASONS MIGHT THERE BE ?"
 r[55]  = "PLEASE DON'T APOLOGIZE !"
 r[56]  = "APOLOGIES ARE NOT NECESSARY"
 r[57]  = "WHAT FEELINGS DO YOU HAVE WHEN YOU APOLOGIZE ?"
 r[58]  = "DON'T BE SO DEFENSIVE"
 r[59]  = "WHAT DOES THAT DREAM SUGGEST TO YOU ?"
 r[60]  = "DO YOU DREAM OFTEN ?"
 r[61]  = "WHAT PERSONS APPEAR IN YOUR DREAMS ?"
 r[62]  = "ARE YOU DISTURBED BY YOUR DREAMS ?"
 r[63]  = "HOW DO YOU DO ... PLEASE STATE YOUR PROBLEM"
 r[64]  = "YOU DON'T SEEM QUITE CERTAIN"
 r[65]  = "WHY THE UNCERTAIN TONE ?"
 r[66]  = "CAN'T YOU BE MORE POSITIVE ?"
 r[67]  = "YOU AREN'T SURE ?"
 r[68]  = "DON'T YOU KNOW ?"
 r[69]  = "WHY NO _ ?"
 r[70]  = "DON'T SAY NO, IT'S ALWAYS SO NEGATIVE"
 r[71]  = "WHY NOT ?"
 r[72]  = "ARE YOU SURE ?"
 r[73]  = "WHY NO ?"
 r[74]  = "WHY ARE YOU CONCERNED ABOUT MY _ ?"
 r[75]  = "WHAT ABOUT YOUR OWN _ ?"
 r[76]  = "CAN'T YOU THINK ABOUT A SPECIFIC EXAMPLE ?"
 r[77]  = "WHEN ?"
 r[78]  = "WHAT ARE YOU THINKING OF ?"
 r[79]  = "REALLY, ALWAYS ?"
 r[80]  = "DO YOU REALLY THINK SO ?"
 r[81]  = "BUT YOU ARE NOT SURE YOU _ "
 r[82]  = "DO YOU DOUBT YOU _ ?"
 r[83]  = "IN WHAT WAY ?"
 r[84]  = "WHAT RESEMBLANCE DO YOU SEE ?"
 r[85]  = "WHAT DOES THE SIMILARITY SUGGEST TO YOU ?"
 r[86]  = "WHAT OTHER CONNECTION DO YOU SEE ?"
 r[87]  = "COULD THERE REALLY BE SOME CONNECTIONS ?"
 r[88]  = "HOW ?"
 r[89]  = "YOU SEEM QUITE POSITIVE"
 r[90]  = "ARE YOU SURE ?"
 r[91]  = "I SEE"
 r[92]  = "I UNDERSTAND"
 r[93]  = "WHY DO YOU BRING UP THE TOPIC OF FRIENDS ?"
 r[94]  = "DO YOUR FRIENDS WORRY YOU ?"
 r[95]  = "DO YOUR FRIENDS PICK ON YOU ?"
 r[96]  = "ARE YOU SURE YOU HAVE ANY FRIENDS ?"
 r[97]  = "DO YOU IMPOSE ON YOUR FRIENDS ?"
 r[98]  = "PERHAPS YOUR LOVE FOR FRIENDS WORRIES YOU"
 r[99]  = "DO COMPUTERS WORRY YOU ?"
 r[100] = "ARE YOU TALKING ABOUT ME IN PARTICULAR ?"
 r[101] = "ARE YOU FRIGHTENED BY MACHINES ?"
 r[102] = "WHY DO YOU MENTION COMPUTERS ?"
 r[103] = "WHAT DO YOU THINK MACHINES HAVE TO DO WITH YOUR PROBLEMS ?"
 r[104] = "DON'T YOU THINK COMPUTERS CAN HELP PEOPLE ?"
 r[105] = "WHAT IS IT ABOUT MACHINES THAT WORRIES YOU ?"
 r[106] = "SAY, DO YOU HAVE ANY PSYCHOLOGICAL PROBLEMS ?"
 r[107] = "WHAT DOES THAT SUGGEST TO YOU ?"
 r[108] = "I SEE"
 r[109] = "IM NOT SURE I UNDERSTAND YOU FULLY"
 r[110] = "COME COME ELUCIDATE YOUR THOUGHTS"
 r[111] = "CAN YOU ELABORATE ON THAT ?"
 r[112] = "THAT IS QUITE INTERESTING"
 r[113] = "WHY DO YOU HAVE PROBLEMS WITH MONEY ?"
 r[114] = "DO YOU THINK MONEY IS EVERYTHING ?"
 r[115] = "ARE YOU SURE THAT MONEY IS THE PROBLEM ?"
 r[116] = "I THINK WE WANT TO TALK ABOUT YOU, NOT ABOUT ME"
 r[117] = "WHAT'S ABOUT ME ?"
 r[118] = "WHY DO YOU ALWAYS BRING UP MY NAME ?"
 # table for looking up answers that
 # fit to a certain keyword
 k["CAN YOU"]      = "1 2 3"
 k["CAN I"]        = "4 5"
 k["YOU ARE"]      =\
 k["YOURE"]        = "6 7 8 9"
 k["I DONT"]       = "10 11 12 13"
 k["I FEEL"]       = "14 15 16"
 k["WHY DONT YOU"] = "17 18 19"
 k["WHY CANT I"]   = "20 21"
 k["ARE YOU"]      = "22 23 24"
 k["I CANT"]       = "25 26 27"
 k["I AM"]         =\
 k["IM "]          = "28 29 30 31"
 k["YOU "]         = "32 33 34"
 k["I WANT"]       = "35 36 37 38 39"
 k["WHAT"]         =\
 k["HOW"]          =\
 k["WHO"]          =\
 k["WHERE"]        =\
 k["WHEN"]         =\
 k["WHY"]          = "40 41 42 43 44 45 46 47 48"
 k["NAME"]         = "49 50"
 k["CAUSE"]        = "51 52 53 54"
 k["SORRY"]        = "55 56 57 58"
 k["DREAM"]        = "59 60 61 62"
 k["HELLO"]        =\
 k["HI "]          = "63"
 k["MAYBE"]        = "64 65 66 67 68"
 k[" NO "]         = "69 70 71 72 73"
 k["YOUR"]         = "74 75"
 k["ALWAYS"]       = "76 77 78 79"
 k["THINK"]        = "80 81 82"
 k["LIKE"]         = "83 84 85 86 87 88 89"
 k["YES"]          = "90 91 92"
 k["FRIEND"]       = "93 94 95 96 97 98"
 k["COMPUTER"]     = "99 100 101 102 103 104 105"
 k["-"]            = "106 107 108 109 110 111 112"
 k["MONEY"]        = "113 114 115"
 k["ELIZA"]        = "116 117 118"
}

Author

Juergen Kahrs


categories: TenLiners,Apr,2009,AlanL

Towers of Hanoi

Contents

Synopsis

Description

Options

Example

Details

Globals

Code

Author

Synopsis

gawk -f hanoi.awk [-n Disks]

Description

The objective is to move N discks from stack 0 to stack 1, always putting a smaller disc on top of a larger one. or on an empty stack

Options

-n
Number of disks, defaults to 5.

Example

gawk -f hanoi.awk -n 4
0 4321
1 
2 

0 432
1 
2 1

0 43
1 2
2 1

0 43
1 21
2 

0 4
1 21
2 3

0 41
1 2
2 3

0 41
1 
2 32

0 4
1 
2 321

0 
1 4
2 321

0 
1 41
2 32

0 2
1 41
2 3

0 21
1 4
2 3

0 21
1 43
2 

0 2
1 43
2 1

0 
1 432
2 1

0 
1 4321
2 

Details

Globals

sp[i]
stack pointer for the ith stack = next free space
stack[i,j]
value of stack i at position j

Code

Main:

BEGIN {
  n = arg("-n",5)
  for (j=0; j<n; j++) push(0,n-j)
  showstacks()
  hanoi(n,0,1,2)
}

function hanoi(n,a,b,c) {
  if (n==1) {
    move(a,b)
  } else {
    hanoi(n-1,a,c,b)
    move(a,b)
    hanoi(n-1,c,b,a)
  }
}
function move(i,j) {
  push(j,pop(i))
  showstacks()
}

Showing the stack:

function showstacks(  i,j) {
  for (i=0; i<=2; i++) {
    printf "%s ", i
    for (j=0; j<sp[i]; j++) printf "%s", stack[i,j]
    print "" }
  print ""
}

Standard stuff:

function arg(tag,default) {
  for(i in ARGV) 
	if (ARGV[i] ~ tag) 
		return ARGV[i+1]
  return default
}
function push(i,v) { stack[i,sp[i]++]=v }
function pop(i)    { return stack[i,--sp[i]] }

Author

Alan Linton, 2001


categories: Games,TenLiners,Apr,2009,Anon

Mind-Reading Machine

Contents

Synposis

gawk -f readminds.awk

(then type "h" or "t").

Download

Download from LAWKER.

Description

Theory

Shannon's 1953 memo, A Mind-Reading(?) Machine, describes a machine built out of relays at Bell Labs.

    This machine is a somewhat simplified model of a machine designed by D.W. Hagelbarger. It plays what is essentially the old game of matching pennies or "odds and evens". This games has been discussed from the game theoretic angle by von Neumann and Morgenstern, and from the psychological point of view by Edgar Allen Poe in "The Purloined Letter". Oddly enough, the machine is aimed more nearly at Poe's method of play than von Neumann's.

The machine took advantage of the difficulty of generating truly random behavior in wetware by using a small (8-state) markov model to predict its human opponents.

Practice

We implement a 1970's version of this 1950's algorithm, using AWK instead of mechanical relays.

Our markov model is based on behavior over the last two rounds, with hpa and hpb recording the history of the player's plays, and hca and hcb recording the history of the computer's guesses. The possible cases are: the player won or lost two rounds ago, changed plays or stayed with the same play, and won or lost the last round, for a total of 23 = 8 histories, with any bias towards changing or staying in the upcoming round kept in the tally array.

If the player has repeated their behavior for a given history at least twice, we guess according to their predicted behavior. After the first observation, we guess with a 75%/25%, split, weighted towards the bias. If the player hasn't shown any bias (or during the first two rounds of the game), we guess at hazard.

Code

Begin

BEGIN	{
 print "+---------------------------------------------------------+"
 print "| An AWKward mind-reading machine                         |"
 print "|         (this retrogame inspired by the Bell Labs Memo: |"
 print "|          Shannon, 1953, 'A Mind-Reading (?) Machine')   |"
 print "+---------------------------------------------------------+"
 print "Shall we play a game?"
 print "Tell me either 'heads' or 'tails'."
 print "If I guess what you picked, I win.  Otherwise, you win."
 print "The match goes for 100 rounds, or someone gets 20."
 printf "your play? "
}

set seed

BEGIN	{ "date +%s" | getline seed; srand(seed) }

consult model

BEGIN	{ t = 0 }
NR > 2	{
	case = (hpa!=hca)"/"(hpa!=hpb)"/"(hpb!=hcb)
	t = tally[case]
	}

t < -1	{ guess=!hpb }
t == -1 { guess=(int(rand()+.75)?!hpb:hpb) }
t == 0	{ guess=int(rand()+.5) }
t == 1  { guess=(int(rand()+.75)?hpb:!hpb) }
t > 1	{ guess= hpb }

get input

/^[hH]/		{ play=1 }
/^[tT]/		{ play=0 }
/^[^hHtT]/	{ printf "heads or tails? "; next }

report results

We also report the results of the round to the player (in case they wish to update their internal models). En passant, we update pw and cw, the number of player (resp. computer) wins.

	{
	printf "You played " (play?"heads":"tails")
	printf "; I guessed " (guess?"heads":"tails")
	printf ".  "(play==guess?"I":"You")" win. "
	print "("(pw+=(play!=guess))"-"(cw+=(play==guess))")"
	}

update model

After finishing a round, we update the history with the results, including updating tally according to the player's behavior. Again, we wait for two rounds before touching the tally counters, at which point the history will have been fully initialized.

NR > 2	{ tally[case] += (hpb == play ? 1 : -1) }
	{
	hpa = hpb; hpb = play
	hca = hcb; hcb = guess
	}

check for victory

At the end of each round, if we haven't met a victory condition, we prompt for the next round.

cw+pw==100	{ printf (cw>pw?"I":"You")" won the match "
		  print  "by "(cw>pw?cw-pw:pw-cw)" games."
		  exit }
pw-cw==20	{ print "You win -- up by 20"; exit }
cw-pw==20	{ print "I win -- up by 20"; exit }
		{ printf "? " }

end

END	{ 
	print " T H A N K   Y O U   F O R   P L A Y I N G "
	}

Copyright

Copyright (c) 2009 the authors listed at the following URL, and/or the authors of referenced articles or incorporated external code: http://en.literateprograms.org/Mind_reading_machine_(AWK)?action=history&offset=20070207160312

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.


categories: Games,TenLiners,Apr,2009,Ysa

mastermind.awk

Contents

Synopsis

Download

Description

Example

Code

See Also

Author

Synopsis

gawk -f mastermind.awk

Download

Download from LAWKER.

Description

The aim of the game is to guess 4 numbers from 0,1,2,3,4,5,6,7,8,9. A "hit" is the right number in the right position and a "blow" is the right number in a wrong position.

You lose the game if you fail to guess after 10 rounds.

Example

 +++  Hit & Blow  +++   <Push Enter>

[ 1] >> 1234
              ##  1 Hit  2 Blow
[ 2] >> 1256
              ##  1 Hit  1 Blow
[ 3] >> 1789
              ##  1 Hit  0 Blow
[ 4] >> 1243 
              ##  1 Hit  2 Blow
[ 5] >> 1340
              ##  3 Hit  0 Blow
[ 6] >> 1320

  Congratulations !!  (1320)

Code

BEGIN{ 
	srand();  
	c=1;  
	print "\n\n +++  Hit & Blow  +++   <Push Enter>\n";
	q[z=p=int(9*rand())+1]=1;  
	for(i=2; i<=4;) 
		if(q[p=int(10*rand())]<1){ 
			q[p]=i++;  
			z=z*10+p; }
}

Note that the range 1023 ... 9876 are the smallest and largerst 4 digit integers with no repeates.

{ if((n=int($0+0))>=1023 && n<=9876) { 
		++c;
   		v=0;  
		for(i=4; i>0; n=int(n/10)) 
			v+=(q[p=n%10]==i--)?10:(q[p]>0)?1:0;
    			if	(v==40) exit; 
				else printf("%16s %2d Hit %2d Blow\n", "##", v/10, v%10);
 	}
 	if	(c>10) exit; 
	else printf("[%2d] >> ", c);
}
END{ 
	printf("\n  %s  (%d)\n", (v==40)?"Congratulations !!":"Over times", z); 
}

See Also

mastermind2.awk.

Author

The author's name is YSA.


categories: Sorting,TenLiners,Apr,2009,Ysa

Quicksort.awk

Contents

Synopsis

cat numbers | gawk -f quicksort.awk

Download

Download from LAWKER.

Description

Some Awk implementations come with built in sort routines (e.g. Gawk's asort and asorti functions). But it can be useful to code these yourself, especially in you are doing data structure tricks.

Quicksort selects a pivot and divides the data into values above and below the pivot. Sorting then recurses on these sub-lists.

Code

Loading the data

BEGIN { RS = ""; FS = "\n" }
      { A[NR] = $0 } 
END {
	qsort(A, 1, NR)
	for (i = 1; i <= NR; i++) {
		print A[i]
		if (i == NR) break
		print ""
	}
}

Sorting the data

function qsort(A, left, right,   i, last) {
	if (left >= right)
		return
	swap(A, left, left+int((right-left+1)*rand()))
	last = left
	for (i = left+1; i <= right; i++)
		if (A[i] < A[left])
			swap(A, ++last, i)
	swap(A, left, last)
	qsort(A, left, last-1)
	qsort(A, last+1, right)
}
function swap(A, i, j,   t) {
	t = A[i]; A[i] = A[j]; A[j] = t
}

See also

quicksort2.awk

Authors

Alfred Aho, Peter Weinberger, Brian Kernighan, 1988.


categories: TenLiners,Learn,Jan,2009,Admin

Awk ten-liners

Awk is famous for how much it can do in (around) 101 lines. Here are some samples of that capability.

(And if you have any more to add, please send them in.)


categories: TenLiners,Learn,Jan,2009,Ronl

Some Gawk (and PERL) Samples

by R. Loui

Here are a few short programs that do the same thing in each language. When reading these examples, the question to ask is `how many language features do I need to understand in order to understand the syntax of these examples'.

Some of these are longer than they need to be since they don't exploit some (e.g.) command line trick to wrap the code in for each line do X. And that is the point- for teach-ability, the preferred language is the one you need to know LESS about before you can be useful in it.

hello world

PERL:

 print "hello world\n"

GAWK:

 BEGIN { print "hello world" }

One plus one

PERL

 $x= $x+1;

GAWK

 x= x+1

Printing

PERL

 print $x, $y, $z;

GAWK

 print x,y,z

Printing the first field in a file

PERL

 while (<>) { 
   split(/ /);
   print "@_[0]\n" 
 }

GAWK

 { print $1 }

Printing lines, reversing fields

PERL

 while (<>) { 
  split(/ /);
  print "@_[1] @_[0]\n" 
 }

GAWK

 { print $2, $1 }

Concatenation of variables

PERL

 command = "cat $fname1 $fname2 > $fname3"

GAWK

 command = "cat " fname1 " " fname2 " > " fname3

Looping

PERL:

 for (1..10) { print $_,"\n" }

GAWK:

 BEGIN { 
  for (i=1; i<=10; i++) print i
 }

Pairs of numbers

PERL:

 for (1..10) { print "$_ ",$_-1 }
 print "\n"

GAWK:

 BEGIN { 
  for (i=1; i<=10; i++) printf i " " i-1
  print ""
 }

List of words into a hash

PERL

  foreach $x ( split(/ /,"this is not stored linearly") ) 
  { print "$x\n" }

GAWK

 BEGIN { 
  split("this is not stored linearly",temp)
  for (i in temp) print temp[i]
 }

Printing a hash in some key order

PERL

 $n = split(/ /,"this is not stored linearly");
 for $i (0..$n-1) { print "$i @_[$i]\n" }
 print "\n";
 for $i (@_) { print ++$j," ",$i,"\n" }

AWK

 BEGIN { 
  n = split("this is not stored linearly",temp)
  for (i=1; i<=n; i++) print i, temp[i]
  print ""
  for (i in temp) print i, temp[i]
 }

Printing all lines in a file

PERL

 open file,"/etc/passwd";
 while (<file>) { print $_ }

GAWK

  BEGIN { 
  while (getline < "/etc/passwd") print
 }

Printing a string

PERL

 $x = "this " . "that " . "\n";
 print $x

GAWK

 BEGIN {
  x = "this " "that " "\n" ; printf x
 }

Building and printing an array

PERL

 $assoc{"this"} = 4;
 $assoc{"that"} = 4;
 $assoc{"the other thing"} = 15;
 for $i (keys %assoc) { print "$i $assoc{$i}\n" }

GAWK

 BEGIN {
   assoc["this"] = 4
   assoc["that"] = 4
   assoc["the other thing"] = 15
   for (i in assoc) print i,assoc[i]
 }

Sorting an array

PERL

 split(/ /,"this will be sorted once in an array");
 foreach $i (sort @_) { print "$i\n" }

GAWK

 BEGIN {
  split("this will be sorted once in an array",temp," ")
  for (i in temp) print temp[i] | "sort"
  while ("sort" | getline) print
 }

Sorting an array (#2)

GAWK

 BEGIN {
  split("this will be sorted once in an array",temp," ")
  n=asort(temp)
  for (i=1;i<=n;i++) print temp[i] 
 }

Print all lines, vowels changed to stars

PERL

 while (<STDIN>) {
  s/[aeiou]/*/g;
  print $_
 }

GAWK

 {gsub(/[aeiou]/,"*"); print }

Report from file

PERL

 #!/pkg/gnu/bin/perl
 # this is a comment
 #
 open(stream1,"w | ");
 while ($line = <stream1>) {
   ($user, $tty, $login, $junk) = split(/ +/, $line, 4);
   print "$user $login ",substr($line,49)
 }

GAWK

#!/pkg/gnu/bin/gawk -f
 # this is a comment
 #
 BEGIN {
   while ("w" | getline) {
     user = $1; tty = $2; login = $3
     print user, login, substr($0,49)
   }
 }

Web Slurping

PERL

 open(stream1,"lynx -dump 'cs.wustl.edu/~loui' | ");
 while ($line = <stream1>) {
   if ($flag && $line =~ /[0-9]/) { print $line }
   if ($line =~ /References/) { $flag = 1 }
 }

GAWK

 BEGIN {
  com = "lynx -dump 'cs.wustl.edu/~loui' &> /dev/stdout"
  while (com | getline line) {
    if (flag && line ~ /[0-9]/) { print line }
    if (line ~ /References/) { flag = 1 }
  }
 }

categories: Sorting,TenLiners,Apr,2009,Awk

quicksort2.awk

Contents

Synopsis

Download

Description

Code

Bugs

See also

Copyright

Author

Synopsis

cat numbers | gawk -f quicksort2.awk

Download

Download from LAWKER.

Description

Quicksort divides the input data around a randomly selected pivot, then recurses on the divided data.

In quicksort2, the pivot is selected from the first line of input. Each data division is handled by a different UNIX pipe and recursive gawk processes are called on the divided data.

Yes, this is not the fastest way to do it but (in theory anyway) it should be able to handle very big data sets.

Code

BEGIN   { 
         recurse1 = "gawk -f quicksort2.awk #" rand()
         recurse2 = "gawk -f quicksort2.awk #" rand()
        }
NR == 1 { pivot=$0; next }
NR > 1  { if($0 < pivot) print | recurse1
          if($0 > pivot) print | recurse2
        }
END     { close(recurse1)
          if(NR > 0) print pivot
	      close(recurse2)
        }

Bugs

The output ignores repeated input values. I thought it was a problem with repeating the name of the pipes (hence the "rand()" labelling) but that did not fix the issues.

See also

quicksort.awk

Copyright

Copyright (c) 2009 by David Long.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Author

Original version: David Long, 2004. Tim Menzies added some modifications in 2009 to call recursive Gawk pipes on both sides of the pivot.


categories: TenLiners,Tools,June,2009,Timm

shuffle.awk

Contents

Synopsis

To rearrange the items in the input list:

 nshuffle(Array)

To rearrange the items in a copy of the input list:

 shuffle(Array,Copy)

The above calls assumes that array item zero stores the length of the array. If this is not the case, use:

 shuffles(Array,Copy)

Download

Download from LAWKER.

Description

Suppose we want to shuffle items an array into a random order. This shuffle sort do so in linear time and memory.

The algorithm comes from the dawn of computer time but I first heard of it from Bart Massey (at Portland State). Thank Bart for the clarity of the explanation and blame me for any silliness in the implementation.

The Slow Way

A simple way to shuffle an input array of elements is to:

  • Allocate an output array of the same size.
  • Copy items selected at random from the input to the output array.
  • Compact the input array by sliding the first part of the array down to fill the hole left by the removed item.

This algorithm is clearly correct. However, the algorithm requires time quadratic in the size of the list, and 2x space.

The Better Way

We can easily reduce the time complexity to O(N). The only thing done with the input array is to select random elements from it, the order of the elements in it is irrelevant. Therefore, instead of closing the hole left by a removed element by shifting elements, we'll close it by moving the first remaining element of the input array to fill the gap.

Note an important invariant of the algorithm:

   the number of elements left in the input array 
 + the number of elements in the output array 
 ------------------------------------------------
 = the number of elements initially passed in.  

This means that once an element is removed from the input array and the hole filled, there is a fresh hole created right at the beginning of the input array. Let us put the newly removed element in that hole. Now we can dispense with the output array altogether, and just return the input array. Now the space complexity is just x+1.

Code

This code assumes that the array "a" stores its size at "a[0]".

function nshuffle(a,  i,j,n,tmp) {
  n=a[0]; # a has items at 1...n
  for(i=1;i<=n;i++) {
    j=i+round(rand()*(n-i));
    tmp=a[j];
    a[j]=a[i];
    a[i]=tmp;
  };
  return n;
}
function round(x) { return int(x + 0.5) }

nshuffle is fast, but rearranges the order of items in the original list. shuffle generates a new copy of the list with the items in a random order.

function shuffle(a,b) {
  for(i in a) b[i]=a[i];
  nshuffle(b);
}

nshuffle also assumes that the list is stores the list size at position zero. If this is not the case, use shuffles.

function shuffles(a,b,   c,n) {
  for(i in a) {n++; c[i]=a[i]};
  c[0]=n;
  shuffle(c,b);
}

Correctness proof

By number of loop iterations

Base case:
When i = 0 the 0 array elements in a below i form a shuffled list of 0 elements. All remaining elements are candidates for append.
Inductive case:
Assume that i = k and that the sequence of elements in a below k are a random subsequence of the input values of length k. Now every possible remaining candidate is equally likely to occur at position k in this iteration. Thus at the end of the iteration i = k + 1 and the sequence of elements in a below k + 1 are a random subsequence of the input values of length k + 1.

Examples

Random orders

One way to use the above is to run down a list in a random order. For example:

BEGIN {
  if (ShuffleDemo) {
  		if (Seed) { srand(Seed) } else { srand() };
  		s2i(ShuffleDemo,L1," ");
  		shuffles(L1,L2);
  		while(Item =pop(L2)) print Item;
  }
}
function s2i(str,a,sep,   n,i,tmp) {
  n=split(str,tmp,sep);
  for(i=1;i<=n;i++) a[i]=tmp[i];
  return n;
}
function pop(a,   x,i) {
  i=a[0]--;  
  if (!i) {return ""} else {x=a[i]; delete a[i]; return x}
} 

The above can be run using

 gawk -f shuffle.awk  -v ShuffleDemo="aa bb cc dd"

If you run this twice, you'll see two different orderings. Here's one:

 cc
 aa
 dd
 bb

And here's another:

 dd
 bb
 cc
 aa

Fast sampling

If you are generating the above lists very quickly, then be aware that srand() initializes its random number generator using CPU time in seconds. So, if you are calling the above command line many times per second, you can get repeated outputs.

The fix is to supply a seed from the Bash $RANDOM variable:

 gawk -f shuffle.awk -v ShuffleDemo="aa bb cc dd" -v Seed=$RANDOM

much faster than once a second, the above call will generate (far) fewer repeats.

Repeats

If you want to repeat some prior run (say, during debugging), set the Seed variable on the command line using (e.g.)

 gawk -f shuffle.awk -v ShuffleDemo="aa bb cc dd" -v Seed=23

This will always print out the same ordering.

Author

Tim Menzies


categories: TenLiners,Apr,2009,PhilipB

indent.awk

Contents

Synopsis

gawk -f indent.awk file.sh

Download

Download from LAWKER

Description

This is part of Phil's AWK tutorial at http://www.bolthole.com/AWK.html. This program adjusts the indentation level based on which keywords are found in each line it encounters.

Code

doindent

function doindent(){
	tmpindent=indent;
	if(indent<0){
		print "ERROR; indent level == " indent
	}
	while(tmpindent >0){
		printf("    ");
		tmpindent-=1;
	}
}

Out-denting

$1 == "done" 	{ indent -=1; }
$1 == "fi" 	{ indent -=1; }
$0 ~ /}/	{ if(indent!=0) indent-=1;  }

Worker

This is the 'default' action, that actually prints a line out. This gets called AS WELL AS any other matching clause, in the order they appear in this program. An "if" match is run AFTER we run this clause. A "done" match is run BEFORE we run this clause.

		{ 
		  doindent();
		  print $0;
		}

In-denting

$0 ~ /if.*;[ ]*then/	{ indent+=1; }
$0 ~ /for.*;[ ]*do/	{ indent+=1; }
$0 ~ /while.*;[ ]*do/	{ indent+=1; }

$1 == "then"		{ indent+=1; }
$1 == "do"		{ indent+=1; }
$0 ~ /{$/		{ indent+=1; }

Author

Philip Brown phil@bolthole.com

blog comments powered by Disqus