Awk.Info

"Cause a little auk awk
goes a long way."


categories: Awk100,May,2009,Dab

Jawk: Awk in Java

Download

Download from SourceForge.

Description

Jawk parses, analyzes, and interprets and/or compiles AWK scripts. Compilation targets the JVM.

Jawk runs on any platform that supports J2SE 5 or later.

Usage

To use Jawk, download the application, copy the release jar to jawk.jar, and execute the following command:

java -jar jawk.jar {command-line-arguments}

To view the command-line usage summary, execute:

java -jar jawk.jar -h

The output of this command is shown below:
java ... org.jawk.Awk [-F fs_val] [-f script-filename] 
                      [-o output-filename] [-c] [-z] [-Z] 
                      [-d dest-directory] [-S] [-s] [-x] [-y] [-r] 
                      [-ext] [-ni] [-t] [-v name=val]... 
                      [script] [name=val | input_filename]...

 -F fs_val = Use fs_val for FS.
 -f filename = Use contents of filename for script.
 -v name=val = Initial awk variable assignments.

 -t = (extension) Maintain array keys in sorted order.
 -c = (extension) Compile to intermediate file. (default: a.ai)
 -o = (extension) Specify output file.
 -z = (extension) | Compile for JVM. (default: AwkScript.class)
 -Z = (extension) | Compile for JVM and execute it. (default: AwkScript.class)
 -d = (extension) | Compile to destination directory.  (default: pwd)
 -S = (extension) Write the syntax tree to file. (default: syntax_tree.lst)
 -s = (extension) Write the intermediate code to file. (default: avm.lst)
 -x = (extension) Enable _sleep, _dump as keywords, and exec as a builtin func.
                  (Note: exec enabled only in interpreted mode.)
 -y = (extension) Enable _INTEGER, _DOUBLE, and _STRING casting keywords.
 -r = (extension) Do NOT hide IllegalFormatExceptions for [s]printf.
-ext= (extension) Enable user-defined extensions. (default: not enabled)
-ni = (extension) Do NOT process stdin or ARGC/V through input rules.
                  (Useful for blocking extensions.)
                  (Note: -ext & -ni available only in interpreted mode.)

 -h or -? = (extension) This help screen.

Extensions

Jawk addresses a drawback of standard Awk. For example, in standard Awk it is impossible to create a socket or display a simple GUI without external assistance, either from the shell or via extensions to Awk itself (e.g., gawk). To overcome this limitation, Jawk adds an extension facility.

The Jawk extension facility allows arbitrary Java code to be called as Awk functions in a Jawk script. These extensions can come from the user (developer) or from third-party providers (e.g., the Jawk project team). Jawk extensions are opt-in: the -ext flag is required to use them, and extensions must be explicitly registered with the Jawk instance via the -Djawk.extensions property (except for core extensions bundled with Jawk).

Jawk extensions also support blocking. You can think of blocking as a tool for extension event management. A Jawk script can block on a collection of blockable services, such as socket input availability, database triggers, user input, GUI dialog input response, or a simple fixed timeout. Together with the -ni option, action rules can then act on block events instead of input text. This reuses a powerful AWK construct, originally intended for text processing, to process blockable events. A sample enhanced echo server script is included in this article. It uses blocking to handle socket events, standard input from the user, and timeout events, all within a 47-line script (including comments).
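As a rough illustration of rules acting on events rather than text, the sketch below runs in any standard awk, with no Jawk required. It simulates block events by feeding event strings on stdin, whereas in Jawk they would come from the blocking extensions. The helper name dispatch_events and the handle values 7 and 12 are made up for this example.

```shell
# Sketch only: simulate Jawk block events as ordinary input lines.
# dispatch_events is a hypothetical helper name, not part of Jawk.
dispatch_events() {
    awk '
    $1 == "SocketAccept" { print "accept on server socket " $2 }
    $1 == "SocketInput"  { print "input ready on socket " $2 }
    $0 == "Timeout"      { ticks++ }
    END                  { print "timeouts seen: " (ticks + 0) }
    '
}

# Three simulated events; the handles 7 and 12 are arbitrary.
printf 'SocketAccept 7\nTimeout\nSocketInput 12\n' | dispatch_events
```

Each rule fires only for its own event label, which is the same pattern/action dispatch the echo server relies on.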

Example

The example script implements a simple echo server that also allows broadcast messaging via stdin from the server process:
## to run: java ... -jar jawk.jar -ext -ni -f {filename}
BEGIN {
	css = CServerSocket(7777);
	print "(echo server socket created)"
}
## note: default input processing disabled by -ni
$0 = SocketAcceptBlock(css,
	SocketInputBlock(sockets,
		SocketCloseBlock(css, sockets,
			StdinBlock(
				Timeout(1000)))));
## note: default action { print } disabled by -ni

# $1 = "SocketAccept", $2 = socket handle
$1 == "SocketAccept" {
	socket = SocketAccept($2)
	sockets[socket] = 1
}

# $1 = "SocketInput", $2 = socket handle
$1 == "SocketInput" {
	## echo server action:

	socket = $2
	line = SocketRead(socket)
	SocketWrite(socket, line)
}

# $1 = "SocketClose", $2 = socket handle
$1 == "SocketClose" {
	socket = $2
	SocketClose(socket)
	delete sockets[socket]
}
## display a . for every second the server is running
$0 == "Timeout" {
	printf "."
}
## stdin block is last because StdinGetline writes directly to $0
## $0 == "Stdin"
$0 == "Stdin" {
	## broadcast message to all sockets
	retcode = StdinGetline()
	if (retcode != 1)
		exit
	for (socket in sockets)
		SocketWrite(socket, "From server : " $0)
	print "(message sent)"
}

Each extension function used in the script above is covered in some detail below:

  • CServerSocket - Creates a character-based server socket. SocketRead returns lines of text (with newlines stripped) for character-based sockets, and returns blocks of bytes (converted to a String) for sockets accepted by ServerSocket. Use character-based sockets for interactive or line-based input, and use ordinary sockets to achieve high throughput, since arbitrary byte blocks are returned. To create a client socket, use CSocket for character-based sockets, or Socket for byte-block-based sockets.
  • SocketAcceptBlock/SocketInputBlock/SocketCloseBlock/StdinBlock/Timeout - Each of these is a blocking extension that blocks for a particular event, such as a server socket being ready to accept an incoming connection, a connected socket having input to be read, or a certain amount of time having elapsed. The Socket*Block extension functions come from SocketExtension, StdinBlock comes from StdinExtension, and Timeout comes from CoreExtension. Each Socket*Block extension returns a string of the format:
    extension-label-prefix OFS parameter
    
    while StdinBlock and Timeout return
    extension-label-prefix
    
  • SocketAccept/SocketRead/SocketWrite/SocketClose - Socket operations, as the names of the extension functions suggest. Each will block until it is able to complete the operation.
  • StdinGetline - Gets a line of input from stdin. If no input is pending, it blocks until input is available. This is why blocking is a valuable tool: the script can wait for other events while waiting for stdin, taking AWK beyond focused text processing and into event processing.
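The two return formats can be pictured with standard awk, where assigning a string to $0 re-splits it into fields. This sketch builds the strings by hand instead of calling any Jawk extension. The handle value 42 and the helper name show_event_fields are made up, and real Jawk socket handles may look different.

```shell
# Sketch only: hand-built event strings, no Jawk extensions involved.
# show_event_fields is a hypothetical helper name.
show_event_fields() {
    awk 'BEGIN {
        # Socket*Block style: label, OFS, parameter
        $0 = "SocketInput" OFS "42"
        print NF ": " $1 " / " $2
        # StdinBlock/Timeout style: label only
        $0 = "Timeout"
        print NF ": " $1
    }'
}

show_event_fields
```

This is why the example script tests $1 for socket events, which carry a handle in $2, but can test $0 directly for Stdin and Timeout.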

As stated by the comments, -ni disables stdin processing (as provided by Jawk itself, not the StdinExtension) and the default blank rule of { print }. Disabling stdin processing is essential to extension processing because, otherwise, it would be confusing, if not completely impossible, to multiplex extension blocking with Jawk's default stdin processing. Disabling the default blank rule allows for easy-to-read blocking statements (like the one in the sample script) without the weird side effect of printing the result.
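The printing side effect is easy to reproduce in standard awk, where a pattern without an action prints the current record. The sketch below uses a plain string in place of a blocking call, so it only mimics the shape of the statement in the sample script. The helper name assign_pattern_demo is hypothetical.

```shell
# Sketch only: a string assignment stands in for a Jawk blocking call.
# assign_pattern_demo is a hypothetical helper name.
assign_pattern_demo() {
    awk '
    # Assignment used as a pattern: a non-empty string is true,
    # so the default { print } action fires and prints $0.
    $0 = "Timeout"
    $0 == "Timeout" { print "tick" }
    '
}

printf 'any input line\n' | assign_pattern_demo
```

In standard awk this prints the assigned value before the second rule runs. According to the description above, -ni removes that default print in Jawk, so only the explicit action rules produce output.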

Author

Dan: ddaglas at users.sourceforge.net.
