Awk.Info

"Cause a little auk awk
goes a long way."

About awk.info
 »  table of contents
 »  featured topics
 »  page tags


About Awk
 »  advocacy
 »  learning
 »  history
 »  Wikipedia entry
 »  mascot
Implementations
 »  Awk (rarely used)
 »  Nawk (the-one-true, old)
 »  Gawk (widely used)
 »  Mawk
 »  Xgawk (gawk + xml + ...)
 »  Spawk (SQL + awk)
 »  Jawk (Awk in Java JVM)
 »  QTawk (extensions to gawk)
 »  Runawk (a runtime tool)
 »  platform support
Coding
 »  one-liners
 »  ten-liners
 »  tips
 »  the Awk 100
Community
 »  read our blog
 »  read/write the awk wiki
 »  discussion news group

Libraries
 »  Gawk
 »  Xgawk
 »  the Lawker library
Online doc
 »  reference card
 »  cheat sheet
 »  manual pages
 »  FAQ

Reading
 »  articles
 »  books

WHAT'S NEW?

Mar 01: Michael Sanders demos an X-windows GUI for AWK.

Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK

Feb 28: Tim Menzies asks this community to write an AWK cookbook.

Feb 28: Arnold Robbins announces a new debugger for GAWK.

Feb 28: Awk100#23: Premysl Janouch offers an IRC bot, in AWK

Feb 28: Updated: the AWK FAQ

Feb 28: Tim Menzies offers a tiny content management system, in Awk.

Jan 31: Comment system added to awk.info. For example, see the discussion at the bottom of ?keys2awk

Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).

Jan 31: The AWK FAQ is being updated. For comments/corrections/extensions, please mail tim@menzies.us

Jan 31: Martin Cohen finds Awk on the Android platform.

Jan 31: Aleksey Cheusov released a new version of runawk.

Jan 31: Hirofumi Saito contributes a candidate Awk mascot.

Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.

Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.

[More ...]


categories: Top10,Papers,Misc,WhyAwk,Jan,2009,Ronl

GAWK for AI

by R. Loui

ACM Sigplan Notices, Volume 31, Number 8, August 1996

Most people are surprised when I tell them what language we use in our undergraduate AI programming class. That's understandable. We use GAWK. GAWK, GNU's version of Aho, Weinberger, and Kernighan's old pattern scanning language, isn't even viewed as a programming language by most people. Like PERL and TCL, most prefer to view it as a `scripting language.' It has no objects; it is not functional; it does no built-in logic programming. Their surprise turns to puzzlement when I confide that (a) while the students are allowed to use any language they want, (b) with a single exception, the best work consistently results from those working in GAWK. (footnote: The exception was a PASCAL programmer who is now an NSF graduate fellow getting a Ph.D. in mathematics at Harvard.) Programmers in C, C++, and LISP haven't even been close (we have not seen work in PROLOG or JAVA).

There are some quick answers that have to do with the pragmatics of undergraduate programming. Then there are more instructive answers that might be valuable to those who debate programming paradigms or to those who study the history of AI languages. And there are some deep philosophical answers that expose the nature of reasoning and symbolic AI. I think the answers, especially the last ones, can be even more surprising than the observed effectiveness of GAWK for AI.

First it must be confessed that PERL programmers can cobble together AI projects well, too. Most of GAWK's attractiveness is reproduced in PERL, and the success of PERL forebodes some of the success of GAWK. Both are powerful string-processing languages that allow the programmer to exploit many of the features of a UNIX environment. Both provide powerful constructions for manipulating a wide variety of data in reasonably efficient ways. Both are interpreted, which can reduce development time. Both have short learning curves. The GAWK manual can be consumed in a single lab session and the language can be mastered by the next morning by the average student. GAWK's automatic initialization, implicit coercion, I/O support and lack of pointers forgive many of the mistakes that young programmers are likely to make. Those who have seen C but not mastered it are happy to see that GAWK retains some of the same sensibilities while adding what must be regarded as a spoonful of syntactic sugar. Some will argue that PERL has superior functionality, but for quick AI applications, the additional functionality is rarely missed. In fact, PERL's terse syntax is not friendly when regular expressions begin to proliferate and strings contain fragments of HTML, WWW addresses, or shell commands. PERL provides new ways of doing things, but not necessarily ways of doing new things.

In the end, despite minor differences, both PERL and GAWK minimize programmer time. Neither really provides the programmer the setting in which to worry about minimizing run-time.

There are further simple answers. Probably the best is the fact that, increasingly, undergraduate AI programming involves the Web. Oren Etzioni (University of Washington, Seattle) has for a while been arguing that the "softbot" is replacing the mechanical engineers' robot as the most glamorous AI test bed. If the artifact whose behavior needs to be controlled in an intelligent way is the software agent, then a language that is well-suited to controlling the software environment is the appropriate language. That would imply a scripting language. If the robot is KAREL, then the right language is turn left; turn right. If the robot is Netscape, then the right language is something that can generate netscape -remote 'openURL(http://cs.wustl.edu/~loui)' with élan.

Of course, there are deeper answers. Jon Bentley found two pearls in GAWK: its regular expressions and its associative arrays. GAWK asks the programmer to use the file system for data organization and the operating system for debugging tools and subroutine libraries. There is no issue of user-interface. This forces the programmer to return to the question of what the program does, not how it looks. There is no time spent programming a binsort when the data can be shipped to /bin/sort in no time. (footnote: I am reminded of my IBM colleague Ben Grosof's advice for Palo Alto: Don't worry about whether it's highway 101 or 280. Don't worry if you have to head south for an entrance to go north. Just get on the highway as quickly as possible.)
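
(Editor's note: to make those two pearls concrete, here is a small example of ours, not Loui's: a complete word-frequency counter in which a regular expression splits the input, an associative array does the counting, and /bin/sort does the sorting.)

# wordfreq.awk -- print the words in the input, most frequent first
{
	n = split(tolower($0), words, /[^a-z]+/)   # regular expression
	for (i = 1; i <= n; i++)
		if (words[i] != "")
			count[words[i]]++          # associative array
}
END {
	for (w in count)
		print count[w], w | "sort -rn"     # let /bin/sort do the work
}

Run it as: awk -f wordfreq.awk somefile.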

There are some similarities between GAWK and LISP that are illuminating. Both provided a powerful uniform data structure (the associative array implemented as a hash table for GAWK and the S-expression, or list of lists, for LISP). Both were well-supported in their environments (GAWK being a child of UNIX, and LISP being the heart of lisp machines). Both have trivial syntax and find their power in the programmer's willingness to use the simple blocks to build a complex approach.

Deeper still is the nature of AI programming. AI is about functionality and exploratory programming. It is about bottom-up design and the building of ambitions as greater behaviors can be demonstrated. Woe be to the top-down AI programmer who finds that the bottom-level refinements, `this subroutine parses the sentence,' cannot actually be implemented. Woe be to the programmer who perfects the data structures for that heap sort when the whole approach to the high-level problem needs to be rethought, and the code is sent to the junk heap the next day.

AI programming requires high-level thinking. There have always been a few gifted programmers who can write high-level programs in assembly language. Most however need the ambient abstraction to have a higher floor.

Now for the surprising philosophical answers. First, AI has discovered that brute-force combinatorics, as an approach to generating intelligent behavior, does not often provide the solution. Chess, neural nets, and genetic programming show the limits of brute computation. The alternative is clever program organization. (footnote: One might add that the former are the AI approaches that work, but that is easily dismissed: those are the AI approaches that work in general, precisely because cleverness is problem-specific.) So AI programmers always want to maximize the content of their program, not optimize the efficiency of an approach. They want minds, not insects. Instead of enumerating large search spaces, they define ways of reducing search, ways of bringing different knowledge to the task. A language that maximizes what the programmer can attempt rather than one that provides tremendous control over how to attempt it, will be the AI choice in the end.

Second, inference is merely the expansion of notation. No matter whether the logic that underlies an AI program is fuzzy, probabilistic, deontic, defeasible, or deductive, the logic merely defines how strings can be transformed into other strings. A language that provides the best support for string processing in the end provides the best support for logic, for the exploration of various logics, and for most forms of symbolic processing that AI might choose to call "reasoning" instead of "logic." The implication is that PROLOG, which saves the AI programmer from having to write a unifier, saves perhaps two dozen lines of GAWK code at the expense of strongly biasing the logic and representational expressiveness of any approach.
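
(Editor's note: a toy illustration of ours, not Loui's, of inference as string expansion. Facts and rules arrive as strings, and the END block rewrites the set of known strings to a fixpoint.)

# input:  "fact a"  asserts a;  "rule a b"  means a implies b
$1 == "fact" { known[$2] = 1 }
$1 == "rule" { conseq[$2] = conseq[$2] $3 " " }
END {
	changed = 1
	while (changed) {                  # expand the notation to a fixpoint
		changed = 0
		for (p in conseq) {
			if (!known[p]) continue
			n = split(conseq[p], out, " ")
			for (i = 1; i <= n; i++)
				if (out[i] != "" && !known[out[i]]) {
					known[out[i]] = 1
					changed = 1
				}
		}
	}
	for (a in known) print a           # every derivable atom
}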

I view these last two points as news not only to the programming language community, but also to much of the AI community that has not reflected on the past decade's lessons.

In the puny language, GAWK, which Aho, Weinberger, and Kernighan thought not much more important than grep or sed, I find lessons in AI's trends, AI's history, and the foundations of AI. What I have found not only surprising but also hopeful is that when I have approached the AI people who still enjoy programming, some of them are not the least bit surprised.


categories: Top10,Misc,Papers,WhyAwk,Apr,2009,Ronl

In Praise of Scripting: Real Programming Pragmatism

by Ronald P. Loui
Associate Professor of CSE
Washington University in St. Louis

(Pre-publication draft; copyright reserved by author. A subsequent version of this document appeared as IEEE Computer, vol. 41, no. 7, July 2008).

    This article's main purpose is to review the changes in programming practices known collectively as the "rise of scripting," as predicted in 1998 in IEEE COMPUTER by Ousterhout. The review attempts to be both brief and definitive, drawing on many of the essays that have appeared in online forums. The main new idea is that programming language theory needs to move beyond semantics and take language pragmatics more seriously.

To the credit of this journal, it had the courage to publish the signal paper on scripting, John Ousterhout's "Scripting: Higher Level Programming for the 21st Century" in 1998. Today, that document rolls forward with an ever-growing list of positive citations. More importantly, every major observation in that paper seems now to be entrenched in software practice today; every benefit claimed for scripting appears to be genuine (flexibility of typelessness, rapid turnaround of interpretation, higher level semantics, development speed, appropriateness for gluing components and internet programming, ease of learning and increase in amount of casual programming).

Interestingly, IEEE COMPUTER also just printed one of the most canonical attacks on scripting, by one Diomidis Spinellis, 2005, "Java Makes Scripting Languages Irrelevant?" Part of what makes this attack interesting is that the author seems unconvinced of his own title; the paper concludes with more text devoted to praising scripting languages than it expends in its declaration of Java's progress toward improved usability. It is unclear what is a better recommendation for scripting: the durability of Ousterhout's text or the indecisiveness of this recent critic's.

The real shock is that the academic programming language community continues to reject the sea change in programming practices brought about by scripting. Enamored of the object-oriented paradigm, especially in the undergraduate curriculum, unwilling to accept the LAMP (Linux-Apache-MySQL-Perl/Python/Php) tool set, and firmly believing that more programming theory leads to better programming practice, the academics seem blind to the facts on the ground. The ACM flagship, COMMUNICATIONS OF THE ACM for example, has never published a paper recognizing the scripting philosophy, and the references throughout the ACM Digital Library to scripting are not encouraging.

Part of the problem is that scripting has risen in the shadow of object-oriented programming and highly publicized corporate battles between Sun, Netscape, and Microsoft with their competing software practices. Scripting has been appearing language by language, including object-oriented scripting languages now. Another part of the problem is that scripting is only now mature enough to stand up against its legitimate detractors. Today, there are answers to many of the persistent questions about scripting: is there a scripting language appropriate for the teaching of CS1 (the first programming course for majors in the undergraduate computing curriculum)? Is there a scripting language for enterprise or real-time applications? Is there a way for scripting practices to scale to larger software engineering projects?

I intend to review the recent history briefly for those who have not yet joined the debate, then present some of the answers that scripting advocates now give to those nagging questions. Finally, I will describe how a real pragmatism of academic interest in programming languages would have better prepared the academic computing community to see the changes that have been afoot.

1996-1998 are perhaps the most interesting years in the phylogeny of scripting. In those years, perl "held the web together", and together with a new POSIX awk and GNU gawk, was shipping with every new Linux. Meanwhile javascript was being deployed furiously (javascript bearing no important relation to java, having been renamed from "livescript" for purely corporate purposes, apparently a sign of Netscape's solidarity with Sun, and even renamed "jscript" under Microsoft). Also, a handoff from tcl/tk to python was taking place as the language of choice for GUI developers who would not yield to Microsoft's VisualBasic. Php appeared in those years, though it would take another round of development before it would start displacing server-side perl, cold fusion, and asp. Every one of these languages is now considered a classic, even prototypical, scripting language.

Already by mid-decade, the shift from scheme to java as the dominant CS1 language was complete, and the superiority of c++ over c was unquestioned in industry. But java applets were not well supported in browsers, so the appeal of "write once, run everywhere" quickly became derided as "write once, debug everywhere." Web page forms, which used the common gateway interface (cgi) were proliferating, and systems programming languages like c became recognized as overkill for server-side programming. Developers quickly discovered the main advantage of perl for cgi forms processing, especially in the dot-com setting: it minimized the programmer's write-time. What about performance? The algorithms were simple, network latency masked small delays, and database performance was built into the database software. It turned out that the bottleneck was the programming. Even at run-time, the network and disk properties were the problems, not the cpu processing. What about maintenance? The developers and management were both happy to rewrite code for redesigned services rather than deal with legacy code. Scripting, it turns out, was so powerful and programmer-friendly that it was easier to create new scripts from scratch than to modify old programs. What about user interface? After all, by 1990, most of the programming effort had become the writing of the GUI, and the object-oriented paradigm had much of its momentum in the inheritance of interface widget behaviors. Surprisingly, the interface that most programmers needed could be had in a browser. The html/javascript/cgi trio became the GUI, and if more was needed, then ambitious client-side javascript was more reliable than the browser's java virtual machine. Moreover, the server-side program was simply a better way to distribute automation in a heterogeneous internet than the downloadable client-side program, regardless of whether the download was in binary or bytecode.

Although there was not agreement on what exact necessary and sufficient properties characterized scripting and distinguished it from "more serious" programming, several things were clear:

  • scripting permitted rapid development, often regarded as merely "rapid prototyping," but subsequently recognized as a kind of agile programming;
  • scripting was the kind of high-level programming that had always been envisioned, in the ascent from low-level assembly language programming to higher levels of abstraction: it was concise, and it removed programmers from concerning themselves with many performance and memory management details;
  • scripting was well suited to the majority of a programming task, usually the accumulation, extraction, and transformation of data, followed eventually by its presentation, so that only the performance-critical portion of a project had to be written in a more cumbersome, high-performance language;
  • it was easier to get things right when source code was short, when behavior was determined by code that fit on a page, all types were easily coerced into strings for trace-printing, code fragments could be interpreted, identifiers were short, and when the programmer could turn ideas into code quickly without losing focus.

This last point was extremely counterintuitive. Strong typing, naming regimen, and verbosity were motivated mainly by a desire to help the programmer avoid errors. But the programmer who had to generate too many keystrokes and consult too many pages, who had to search through many different files to discover semantics, and who had to follow too many rules, who had to sustain motivation and concentration over a long period of time, was a distracted and consequently inefficient programmer. Just as vast libraries did not deliver the promise of greater reusability, and virtual machines did not deliver the promise of platform-independence, the language's promise to discipline the programmer quite simply did not reduce the tendency of humans to err. It exchanged one kind of frequent error for another.

Scripting languages became the favorite tools of the independent-minded programmers: the "hackers" yes, but also the gifted and genius programmers who tended to drive a project's design and development. As Paul Graham noted (in a column reprinted in "Hackers and Painters"), one of the lasting and legitimate benefits of java is that it permits managers to level the playing field and extract considerable productivity from the less talented and less motivated (hence, more disposable) programmers. There was a corollary to this difference between the mundane and the liberating:

  • scripting was not enervating but was actually renewing: programmers who viewed code generation as tedious and tiresome in contrast viewed scripting as rewarding self-expression or recreation.

The distinct features of scripting languages that produce these effects are usually enumerated as semantic features, starting with low I/O specification costs, the use of implicit coercion and weak typing, automatic variable initialization with optional declaration, predominant use of associative arrays for storage and regular expressions for pattern matching, reduced syntax, and powerful control structures. But the main reason for the productivity gains may be found in the name "scripting" itself. To script an environment is to be powerfully embedded in that environment. In the same way that the dolphin reigns over the open ocean, lisp is a powerful language for those who would customize their emacs, javascript is feral among browsers, and gawk and perl rule the linux jungle.
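
(Editor's note: most of the semantic features just listed fit in two lines of awk. This example is ours: it sums per-user byte counts from log lines of the form "user bytes", with no declarations, no explicit input loop, and automatic coercion of $2 to a number.)

$2 ~ /^[0-9]+$/ { total[$1] += $2 }            # regexp guard; associative array
END             { for (u in total) print u, total[u] }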

There is even a hint of AI in the idea of scripting: the scripting language is the way to get high level control, to automate by capturing the intentions and routines normally provided by the human. If recording and replaying macros is a kind of autopilot, then scripting is a kind of proxy for human decisionmaking. Nowhere is this clearer than in simple server-side php, or in sysadmin shell scripting.

So where do we stand now? While it may have been risky for Ousterhout to proclaim scripting on the rise in 1998, it would be folly to dismiss the success of scripting today. It is even possible that java will yield its position of dominance in the near future. (By the time this essay is printed, LAMP and AJAX might be the new darlings of the tech press; see recent articles in Business Week, this IEEE COMPUTER, and even James Gosling's blog, where he concedes he had wanted to write a scripting language when he was handed the java project. Java is very much in full retreat.) Is scripting ready to fill the huge vacuum that would be produced?

I personally believe that CS1 java is the greatest single mistake in the history of computing curricula. I believe this because of the empirical evidence, not because I have an a priori preference (I too voted to shift from scheme to java in our CS1, over a decade ago, so I am complicit in the java debacle). I reported in SIGPLAN 1996 ("Why gawk for AI?") that only the scripting programmers could generate code fast enough to keep up with the demands of the artificial intelligence laboratory class. Even though students were allowed to choose any language they wanted, and many had to unlearn the java ways of doing things in order to benefit from scripting, there were few who could develop ideas into code effectively and rapidly without scripting. In the intervening decade, little has changed. We actually see more scripting, as students are happy to compress images so that they can script their computer vision projects rather than stumble around in c and c++. In fact, students who learn to script early are empowered throughout their college years, especially in the crucial UNIX and web environments. Those who learn only java are stifled by enterprise-sized correctness and the chimerae of just-in-time compilation, swing, JRE, JINI, etc. Young programmers need to practice and produce, and to learn through mistakes why discipline is needed. They need to learn design patterns by solving problems, not by reading interfaces to someone else's black box code. It is imperative that programmers learn to be creative and inventive, and they need programming tools that support code exploration rather than code production.

What scripting language could be used for CS1? My personal preferences are gawk, javascript, php, and asp, mainly because of their very gentle learning curves. I don't think perl would be a disaster; its imperfection would create many teaching moments. But there is emerging consensus in the scripting community that python is the right choice for freshman programming. Ruby would also be a defensible choice. Python and ruby have the enviable properties that almost no one dislikes them, and almost everyone respects them. Both languages support a wide variety of programming styles and paradigms and satisfy practitioners and theoreticians equally. Both languages are carefully enough designed that "correct" programming practices can be demonstrated and high standards of code quality can be enforced. The fact that Google stands by python is an added motivation for undergraduate majors.

But do scripting solutions scale? What about the performance gap when the polynomial, or worse the exponential, algorithm faces large n, and the algorithm is written in an interpreted or weakly compiled language? What about software engineering in the large, on big projects? There has been a lot of discussion about scalability of scripts recently. In the past, debates have simply ended with the concession that large systems would have to be rewritten in c++, or a similar language, once the scripting had served its prototyping duty.

The enterprise question is the easier of the two. Just as the individual programmer reaps benefits from a division of labor among tools, writing most of the code in scripts, and writing all bottleneck code in a highly optimizable language, the group of programmers benefits from the use of multiple paradigms and multiple languages. In a recent large project, we used vhdl for fpga's with a lot of gawk to configure the vhdl. We used python and php to generate dynamic html with svg and javascript for the interfaces. We used c and c++ for high performance communications wrappers, which communicated xml to higher level scripts that managed databases and processes. We saw sysadmin and report-generation in perl, ruby, and gawk, data scrubbing in perl and gawk, user scripting in bash, tcl, and gawk, and prototyping in perl and gawk. Only one module was written in java (because that programmer loved java): it was late, it was slow, it failed, and it was eventually rewritten in c++. In retrospect, neither the high performance components nor the lightweight code components were appropriate for the java language. Does scripting scale to enterprise software? I would not manage a project that did not include a lot of scripting, to minimize the amount of "hard" programming, to increase flexibility and reduce delivery time at all stages, to take testing to a higher level, and to free development resources for components where performance is actually critical. I nearly weep when I think about the text processing that was written in c under my managerial watch, because the programmer did not know perl. We write five hundred line scripts in gawk that would be ten thousand line modules in java or c++. In view of the fact that there are much better scripting tools for most of what gets programmed in java and c++, perhaps the question is whether java and c++ scale.

How about algorithmic complexity? Don't scripting languages take too long to perform nested loops? The answer here is that a cpu-bound tight loop such as a matrix multiplication is indeed faster in a language like c. But such bottlenecks are easy to identify and indeed easy to rewrite in c. True system bottlenecks are things like paging, chasing pointers on disk, process initialization, garbage collection, fragmentation, cache mismanagement, and poor data organization. Often, we see that better data organization was unimplemented because it would have required more code, code that would have been attempted in an "easier" programming language like a scripting language, but which was too difficult to attempt in a "harder" programming language. We saw this in the AI class with heuristic search and computer vision, where brute force is better in c, but complex heuristics are better than brute force, and scripting is better for complex heuristics. When algorithms are exponential, it usually doesn't matter what language you use because most practical n will incur too great a cost. Again, the solution is to write heuristics, and scripting is the top dog in that house. Cpu's are so much faster than disks these days that a single extra disk read can erase the CPU advantage of using compiled c instead of interpreted gawk. In any case, java is hardly the first choice for those who have algorithmic bottlenecks.

The real reason why academics were blindsided by scripting is their lack of practicality. Academic computing was generally late to adopt Wintel architectures, late to embrace cgi programming, and late to accept Linux in the same decade that brought scripting's rise. Academia understandably holds industry at a distance. Still, there is a purely intellectual reason why programming language courses are only now warming to scripting. The historical concerns of programming language theory have been syntax and semantics. Java's amazing contribution to computer science is that it raised so many old-fashioned questions that tickled the talents of existing programming language experts: e.g., how can it be compiled? But there are new questions that can be asked, too, such as what a particular language is well-suited to achieve inexpensively, quickly, or elegantly, especially with the new mix of platforms. The proliferation of scripting languages represents a new age of innovation in programming practice.

Linguists recognize something above syntax and semantics, and they call it "pragmatics". Pragmatics has to do with more abstract social and cognitive functions of language: situations, speakers and hearers, discourse, plans and actions, and performance. We are entering an era of comparative programming language study when the issues are higher-level, social, and cognitive too.

My old friend, Michael Scott, has a popular textbook called PROGRAMMING LANGUAGE PRAGMATICS. But it is a fairly traditional tome concerned with parameter passing, types, and bindings (it's hard to see why it merits "pragmatics" in its title, even as it goes to second edition with a chapter on scripting added!). A real programming pragmatics would ask questions like:

  • how well does each language mate to the other UNIX tools?
  • what is the propensity in each language for programmers at various expertise levels to produce a memory leak?
  • what is the likelihood in each language that unmodified code will still function in five years?
  • what is the demand of a programmer's concentration, what is the load on her short-term memory of ontology, and what is the support for visual metaphor in each language?

There have been programming language "shootouts" and "scriptometers" on the internet that have sought to address some of the questions that are relevant to the choice of scripting language, but they have been just first steps. For example, one site reports on the shortest script in each scripting language that can perform a simple task. But absolute brevity for trivial tasks, such as "print hello world" is not as illuminating as typical brevity for real tasks, such as xml parsing.

Pragmatic questions are not the easiest questions for mathematically-inclined computer scientists to address. They refer by their nature to people, their habits, their sociology, and the technological demands of the day. But it is the importance of such questions that makes programmers choose scripting languages. Ousterhout declared scripting on the rise, but perhaps so too are programming language pragmatics.

Acknowledgements

I have to thank Charlie Comstock for contributing many ideas and references over the past two years that have shaped my views, especially the commitment to the idea of pragmatics.

About the Author

Prof. Dr. Loui and his students are the usual winners of the department programming contest and have contributed to current gnu releases of gawk and malloc. He has lectured on AI for two decades on five continents, taught AI programming for two decades, and is currently funded on a project delivering hardware and software on U.S. government contracts.

categories: SysAdmin,Papers,WhyAwk,Apr,2009,HenryS

Awk: A Systems Programming Language?

In the Proceedings of the Winter Usenix Conference (Dallas, 1991), Henry Spencer wrote in Awk As A Major Systems Programming Language that...

    ...even experienced Unix programmers often don't know awk, or know it but view it as a counterpart of sed: useful "glue" for sticking things together in shell programming, but quite unsuited for major programming tasks. This is a major underestimate of a very powerful tool, and has hampered the development of support software that would make awk much more useful.

    There is no fundamental reason why awk programs have to be small "glue" programs: even the "old" awk is a powerful programming language in its own right. Effective use of its data structures and its stream-oriented structure takes some adjustment for C programmers, but the results can be quite striking.

    On the other hand, getting there can be a bit painful, and improvements in both the language and its support tools would help.

In 2009, Arnold Robbins comments:

    The paper is still interesting, although some bits are outdated (we now have a profiler, for instance).

categories: Papers,Jul,2009,JiirL

Visual Awk

Reference: Visual AWK: A Model for Text Processing by Demonstration, by Jürgen Landauer and Masahito Hirakawa. 11th International IEEE Symposium on Visual Languages, 1995.

Download

Download from LAWKER.

Abstract

Programming by Demonstration (PBD) systems often have problems with control structure inference and user-intended generalization. We propose a new solution for these weaknesses based on concepts of AWK and present a prototype system for text processing. It utilizes vertical demonstration, extensive visual feedback, and program visualization via spreadsheets to achieve improved usability and expressive power.

Introduction

In text editing users are often confronted with reformatting tasks which involve large portions of text, sometimes consisting of hundreds of lines. For example, let us assume we want to create mailing labels out of a given address list. The task seems to be easy to automate since all paragraphs are similarly structured, containing a name, an address, and a phone number each. However, both the built-in find and replace function and the macro recorder of the editor prove to be not flexible enough to handle the task, because their facilities for specifying search patterns and for dealing with special cases and exceptions are limited.

On the other hand, most current end-users consider solving such tasks with one of today's programming languages too difficult for them. Programming by Demonstration (PBD) is a promising remedy here since, by contrast, it promises nearly unlimited programming power through ease of learning and usage. Therefore, a variety of PBD systems were proposed for this application domain in the past. But PBD is not yet very widespread in commercial text editors because of some serious weaknesses.

This paper examines these weaknesses and presents a new approach to remedying the deficiencies of PBD. We introduce Visual AWK, a prototype text processing system developed at the Information Systems Lab of Hiroshima University, based on the programming language AWK, which incorporates the new design approach. Extensive visual feedback and program visualization via spreadsheets improve both usability and expressive power.

Visual AWK is aimed at users without previous knowledge of programming, but with experience in text editor use. The application domain is semi-structured texts; that is, texts that consist of equally structured entities, for instance lines or paragraphs, but may contain a few syntactically classifiable sets of exceptions with a different structure.


categories: Papers,Verification,Jul,2009,GerardH

MicroTrace

by Gerard Holzmann

Description

Micro-tracer is a little awk-script for verifying state machines; quite possibly the world's smallest working verifier. Some comments on the working of the script, plus a sample input for the X.21 protocol, are given below.

Reproduce and use freely, at your own risk of course. The micro-tracer was first described in this report:

  • Gerard Holzmann, X.21 Analysis Revisited: the Micro-Tracer, AT&T Bell Laboratories, Technical Memorandum 11271-8710230-12, October 23, 1987. (PDF)

Code

This script was written to show how little code is needed to write a working verifier for safety properties. The hard problem in writing a practical verifier is to make the search efficient, to support a useful logic, and to provide a sensible specification language... (see the Spin homepage.)

$1 == "init"	{	proc[$2] = $3	}
$1 == "inp"	{	move[$2,$3]=move[$2,$3] $1 "/" $4 "/" $5 "/" $6 "/;" }
$1 == "out"	{	move[$2,$3]=move[$2,$3] $1 "/" $4 "/" $5 "/" $6 "/;" }
END		{	verbose=0; for (i in proc) signal[i] = "-"
			run(mkstate(state))
			for (i in space) nstates++;
			print nstates " states, " deadlocks " deadlocks"
		}

function run(state,  i,str,moved)	# 1 parameter, 3 local vars
{
	if (space[state]++) return	# been here before

	level++; moved=0
	for (i in proc)
	{	str = move[i,proc[i]]
		while (str)
		{	v = substr(str, 1, index(str, ";"))
			sub(v, "", str)
			split(v, arr, "/")
			if (arr[1] == "inp" && arr[3] == signal[arr[4]])
			{	Level[level] = i " " proc[i] " -> " v
				proc[i] = arr[2]
				run(mkstate(k))
				unwrap(state); moved=1
			} else if (arr[1] == "out")
			{	Level[level] = i " " proc[i] " -> " v
				proc[i] = arr[2]; signal[arr[4]] = arr[3]
				run(mkstate(k))
				unwrap(state); moved=1
	}	}	}
	if (!moved)
	{	deadlocks++
		print "deadlock " deadlocks ":"
		for (i in proc) print "\t" i, proc[i], signal[i]
		if (verbose)
			for (i = 1; i < level; i++) print i, Level[i]
	}
	level--
}
function mkstate(state, m)
{	state = ""
	for (m in proc) state = state " " proc[m] " " signal[m]
	return state
}
function unwrap(state, m)
{	split(state, arr, " "); nxt=0
	for (m in proc) { proc[m] = arr[++nxt]; signal[m] = arr[++nxt] }
}

The first three lines of the script deal with the input. Data are stored in two arrays. The initial state of machine A is stored in array element proc[A]. The transitions that machine A can make from state s are stored in move[A,s]. All data are stored as strings, and most arrays are also indexed with strings. All valid moves for A in state s, for instance, are concatenated into the same array element move[A,s], and later unwound as needed in function run().

The line starting with END is executed when the end of the input file has been reached and the complete protocol specification has been read. It initializes the signals and calls the symbolic execution routine run().

The program contains three function definitions: run(), mkstate(), and unwrap(). The global system state, state, is represented as a concatenation of strings encoding process and signal states. The function mkstate() creates the composite, and the function unwrap() restores the arrays proc and signal to the contents that correspond to the description in state. (The recursive step in run() alters their contents.) Function run() uses three local variables, but only one real parameter state that is passed by the calling routine.

The analyzer runs by inspecting the possible moves for each process in turn, checking for valid inp or out moves, and performing a complete depth-first search. Any state that has no successors is flagged as a deadlock. A backtrace of transitions leading into a deadlock is maintained in array Level and can be printed when a deadlock is found.

The first line in run() is a complete state space handler. The composite state is used to index a large array, space. If the array element was indexed before, the expression returns a count larger than zero: the state was analyzed earlier, and the search can be truncated.

After the analysis completes, the contents of array space is available for other types of probing. In this case, the micro tracer just counts the number of states and prints it as a statistic, together with the number of deadlocks found.
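
(Editor's note: to try the micro-tracer, save the script above in a file, say microtracer.awk, concatenate the transition rules and initialization lines below into an input file, and run it; the file names here are ours, the report does not prescribe any.)

awk -f microtracer.awk x21.rules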

A Sample Application -- X21

The transition rules are based on the classic two-process model for the call establishment phase of CCITT Recommendation X.21. Interface signal pairs T, C and R, I are combined. Each possible combination of values on these line pairs is represented by a distinct lower-case ASCII character below. Note that since the lines are modeled as true signals, the receiving process can indeed miss signals if the sending process changes them rapidly and does not wait for the peer process to respond.

Transition rules for the `dte' process.

inp dte state01 state08 u dte
inp dte state01 state18 m dte
inp dte state02 state03 v dte
inp dte state02 state15 u dte
inp dte state02 state19 m dte
inp dte state04 state19 m dte
inp dte state05 state19 m dte
inp dte state05 state6A r dte
inp dte state07 state19 m dte
inp dte state07 state6B r dte
inp dte state08 state19 m dte
inp dte state09 state10B q dte
inp dte state09 state19 m dte
inp dte state10 state19 m dte
inp dte state10 state6C r dte
inp dte state10B state19 m dte
inp dte state10B state6C r dte
inp dte state11 state12 n dte
inp dte state11 state19 m dte
inp dte state12 state19 m dte
inp dte state14 state19 m dte
inp dte state15 state03 v dte
inp dte state15 state19 m dte
inp dte state16 state17 m dte
inp dte state17 state21 l dte
inp dte state18 state01 l dte
inp dte state18 state19 m dte
inp dte state20 state21 l dte
inp dte state6A state07 q dte
inp dte state6A state19 m dte
inp dte state6B state07 q dte
inp dte state6B state10 q dte
inp dte state6B state19 m dte
inp dte state6C state11 l dte
inp dte state6C state19 m dte
out dte state01 state02 d dce
out dte state01 state14 i dce
out dte state01 state21 b dce
out dte state02 state16 b dce
out dte state03 state04 e dce
out dte state04 state05 c dce
out dte state04 state16 b dce
out dte state05 state16 b dce
out dte state07 state16 b dce
out dte state08 state09 c dce
out dte state08 state15 d dce
out dte state08 state16 b dce
out dte state09 state16 b dce
out dte state10 state16 b dce
out dte state10B state16 b dce
out dte state11 state16 b dce
out dte state12 state16 b dce
out dte state14 state01 a dce
out dte state14 state16 b dce
out dte state15 state16 b dce
out dte state18 state16 b dce
out dte state19 state20 b dce
out dte state21 state01 a dce
out dte state6A state16 b dce
out dte state6B state16 b dce
out dte state6C state16 b dce

Transition rules for the `dce' process.

inp dce state01 state02 d dce
inp dce state01 state14 i dce
inp dce state01 state21 b dce
inp dce state02 state16 b dce
inp dce state03 state04 e dce
inp dce state04 state05 c dce
inp dce state04 state16 b dce
inp dce state05 state16 b dce
inp dce state07 state16 b dce
inp dce state08 state09 c dce
inp dce state08 state15 d dce
inp dce state08 state16 b dce
inp dce state09 state16 b dce
inp dce state10 state16 b dce
inp dce state10B state16 b dce
inp dce state11 state16 b dce
inp dce state12 state16 b dce
inp dce state14 state01 a dce
inp dce state14 state16 b dce
inp dce state15 state16 b dce
inp dce state18 state16 b dce
inp dce state19 state20 b dce
inp dce state21 state01 a dce
inp dce state6A state16 b dce
inp dce state6B state16 b dce
inp dce state6C state16 b dce
out dce state01 state08 u dte
out dce state01 state18 m dte
out dce state02 state03 v dte
out dce state02 state15 u dte
out dce state02 state19 m dte
out dce state04 state19 m dte
out dce state05 state19 m dte
out dce state05 state6A r dte
out dce state07 state19 m dte
out dce state07 state6B r dte
out dce state08 state19 m dte
out dce state09 state10B q dte
out dce state09 state19 m dte
out dce state10 state19 m dte
out dce state10 state6C r dte
out dce state10B state19 m dte
out dce state10B state6C r dte
out dce state11 state12 n dte
out dce state11 state19 m dte
out dce state12 state19 m dte
out dce state14 state19 m dte
out dce state15 state03 v dte
out dce state15 state19 m dte
out dce state16 state17 m dte
out dce state17 state21 l dte
out dce state18 state01 l dte
out dce state18 state19 m dte
out dce state20 state21 l dte
out dce state6A state07 q dte
out dce state6A state19 m dte
out dce state6B state07 q dte
out dce state6B state10 q dte
out dce state6B state19 m dte
out dce state6C state11 l dte
out dce state6C state19 m dte

Initialization

init dte state01
init dce state01

Error Listings (verbose mode)

The error listings give, with each step number, the name of the executing machine followed by its state and an arrow. Behind the arrow is the transition rule: inp or out, the new state, the required or provided signal value, and the signal name.

deadlock 1:
	dce state21 b
	dte state16 l
1 dce state01 -> out/state08/u/dte/;
2 dce state08 -> out/state19/m/dte/;
3 dte state01 -> inp/state18/m/dte/;
4 dte state18 -> inp/state19/m/dte/;
5 dte state19 -> out/state20/b/dce/;
6 dce state19 -> inp/state20/b/dce/;
7 dce state20 -> out/state21/l/dte/;
8 dte state20 -> inp/state21/l/dte/;
9 dte state21 -> out/state01/a/dce/;
10 dce state21 -> inp/state01/a/dce/;
11 dce state01 -> out/state08/u/dte/;
12 dce state08 -> out/state19/m/dte/;
13 dte state01 -> inp/state18/m/dte/;
14 dte state18 -> out/state16/b/dce/;
15 dce state19 -> inp/state20/b/dce/;
16 dce state20 -> out/state21/l/dte/;
deadlock 2:
	dce state03 b
	dte state16 v
1 dce state01 -> out/state08/u/dte/;
2 dce state08 -> out/state19/m/dte/;
3 dte state01 -> inp/state18/m/dte/;
4 dte state18 -> inp/state19/m/dte/;
5 dte state19 -> out/state20/b/dce/;
6 dce state19 -> inp/state20/b/dce/;
7 dce state20 -> out/state21/l/dte/;
8 dte state20 -> inp/state21/l/dte/;
9 dte state21 -> out/state01/a/dce/;
10 dce state21 -> inp/state01/a/dce/;
11 dce state01 -> out/state08/u/dte/;
12 dce state08 -> out/state19/m/dte/;
13 dte state01 -> out/state21/b/dce/;
14 dce state19 -> inp/state20/b/dce/;
15 dte state21 -> out/state01/a/dce/;
16 dte state01 -> inp/state18/m/dte/;
17 dce state20 -> out/state21/l/dte/;
18 dce state21 -> inp/state01/a/dce/;
19 dce state01 -> out/state18/m/dte/;
20 dte state18 -> inp/state19/m/dte/;
21 dce state18 -> out/state01/l/dte/;
22 dte state19 -> out/state20/b/dce/;
23 dte state20 -> inp/state21/l/dte/;
24 dce state01 -> out/state08/u/dte/;
25 dce state08 -> inp/state16/b/dce/;
26 dte state21 -> out/state01/a/dce/;
27 dte state01 -> inp/state08/u/dte/;
28 dce state16 -> out/state17/m/dte/;
29 dce state17 -> out/state21/l/dte/;
30 dce state21 -> inp/state01/a/dce/;
31 dce state01 -> out/state08/u/dte/;
32 dte state08 -> out/state15/d/dce/;
33 dce state08 -> inp/state15/d/dce/;
34 dce state15 -> out/state03/v/dte/;
35 dte state15 -> inp/state03/v/dte/;
36 dte state03 -> out/state04/e/dce/;
37 dte state04 -> out/state05/c/dce/;
38 dte state05 -> out/state16/b/dce/;
deadlock 3:
	dce state03 b
	dte state20 v
1 dce state01 -> out/state08/u/dte/;
2 dce state08 -> out/state19/m/dte/;
3 dte state01 -> inp/state18/m/dte/;
4 dte state18 -> inp/state19/m/dte/;
5 dte state19 -> out/state20/b/dce/;
6 dce state19 -> inp/state20/b/dce/;
7 dce state20 -> out/state21/l/dte/;
8 dte state20 -> inp/state21/l/dte/;
9 dte state21 -> out/state01/a/dce/;
10 dce state21 -> inp/state01/a/dce/;
11 dce state01 -> out/state08/u/dte/;
12 dce state08 -> out/state19/m/dte/;
13 dte state01 -> out/state21/b/dce/;
14 dce state19 -> inp/state20/b/dce/;
15 dte state21 -> out/state01/a/dce/;
16 dte state01 -> inp/state18/m/dte/;
17 dce state20 -> out/state21/l/dte/;
18 dce state21 -> inp/state01/a/dce/;
19 dce state01 -> out/state18/m/dte/;
20 dte state18 -> inp/state19/m/dte/;
21 dce state18 -> out/state01/l/dte/;
22 dte state19 -> out/state20/b/dce/;
23 dte state20 -> inp/state21/l/dte/;
24 dce state01 -> out/state08/u/dte/;
25 dce state08 -> inp/state16/b/dce/;
26 dte state21 -> out/state01/a/dce/;
27 dte state01 -> inp/state08/u/dte/;
28 dce state16 -> out/state17/m/dte/;
29 dce state17 -> out/state21/l/dte/;
30 dce state21 -> inp/state01/a/dce/;
31 dce state01 -> out/state18/m/dte/;
32 dte state08 -> out/state15/d/dce/;
33 dte state15 -> inp/state19/m/dte/;
34 dce state18 -> out/state01/l/dte/;
35 dce state01 -> inp/state02/d/dce/;
36 dce state02 -> out/state03/v/dte/;
37 dte state19 -> out/state20/b/dce/;
deadlock 4:
	dce state21 b
	dte state16 -
1 dte state01 -> out/state02/d/dce/;
2 dte state02 -> out/state16/b/dce/;
3 dce state01 -> inp/state21/b/dce/;
307 states, 4 deadlocks

categories: Papers,Verification,Jul,2009,MikhailA

An AWK Debugger and Assertion Checker

From "AUI - the Debugger and Assertion Checker for the Awk Programming Language" by Mikhail Auguston, Subhankar Banerjee, Manish Mamnani, Ghulam Nabi, Juris Reinfelds, Ugis Sarkans, and Ivan Strnad . Proceedings of the 1996 International Conference on Software Engineering: Education and Practice (SE:EP '96)

Download from LAWKER.

Abstract

This paper describes the design of Awk User Interface (AUI). AUI is a graphical programming environment for editing, running, testing and debugging of Awk programs. The AUI environment supports tracing of Awk programs, setting breakpoints, and inspection of variable values.

An assertion language to describe the relationship between the input and output of an Awk program is provided. Assertions can be checked after the program run and, if violated, informative and readable messages can be generated. The assertions and debugging rules for the Awk program are written in a separate text file. Assertions are useful not only for testing and debugging but can also be considered a means of formal program specification and documentation.

Example

The input file contains a list of all the states of the U.S.A. There are 50 records separated by newlines, one for each of the states. The number of fields in a record is variable. The first field is the name of the state, and the subsequent fields are the names of neighboring states. Fields are separated by tabs. For example, the first two records in the database are

Alabama Mississippi Tennessee Georgia Florida 
Alaska 

The task is to color the U.S.A. map in such a way that any two neighboring states are in different colors. We will do it in a greedy manner (without backtracking), assigning to every state the first possible color. The Awk program for this task is the following:

# Greedy map coloring 
BEGIN { FS= "\t"; OFS= "\t" # fields separated by tabs 
		color[0]= "yellow"  # color names 
		color[1]= "blue" 
		color[2]= "red" 
		color[3]= "green" 
		color[4]= "black" 
} 
{ 		i=0 
		while (a[$1,i] ) i++ # find first acceptable color for 
		                     # state $1 
		print $1"\t" color[i] # assign that color 
		for (j=2; j<=NF; j++) a[$j,i]=1	# make that color 
                                            # unacceptable for 
                                            # states $2..$NF 
} 
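
(Editor's note: to run the program, put the tab-separated state list in a file and invoke awk with the script above. The file names are ours, not the paper's.)

awk -f color.awk states.txt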

We can check the correctness of the coloring using the following assertion:

/* Checks the correctness of map coloring - any two neighbor
   states should be colored in different colors */
	FOREACH r1: RECORD FROM FILE input 
		(EXISTS r2: RECORD FROM FILE output 
			(r1.$1 == r2.$1 AND 
 			FOREACH i IN 2..FIELD_NUM(r1) 
				(EXISTS r3: RECORD FROM FILE output 
 				(r3.$1 == r1.$i AND r3.$2 != r2.$2) 
				) 
			) 
		)		 
SAY "Map colored correctly" 
ONFAIL  SAY r1.$1 "and" r1.$i "are of the same color" 
        SAY "although they are neighboring states" 

categories: Papers,Verification,Jul,2009,BalkhisB

Automated Result Verification with Awk

Source

From B.A. Bakar and T. Janowski, Automated Result Verification with AWK, p. 188, Sixth IEEE International Conference on Complex Computer Systems (ICECCS'00), 2000

Download

Download from LAWKER.

Abstract

The goal of result-verification is to prove that one execution run of a program satisfies its specification. Compared with implementation-verification, result-verification has a larger scope for applications in practice, gives more opportunities for automation and, being based on the execution record rather than the implementation, is particularly suitable for complex systems.

This paper proposes a technical framework to apply this technique in practice. We show how to write formal result-based specifications, how to generate a verifier program to check a given specification and to carry out result-verification according to the generated program.

The execution result is written as a text file, the verifier is written in AWK (a special-purpose language for text processing), and verification is done automatically by the AWK interpreter, given the verifier and the execution result as inputs.

In this paper...

In this paper we propose a technical framework to carry out automated result-verification in practice. Its main features are:
  • The execution result is a simple text file. Many programs produce such (log) files during their normal operations, for administrative purposes. A general technique to record exactly the information needed for verification, is to introduce a program wrapper.
  • The execution result is given as input to the verifier program, which does the actual verification. Given the execution result in a text file, we consider result-verification a text-processing task. Accordingly, the verifier is written in AWK, which is a special-purpose language for text processing, implemented for most computing platforms. Verification is done by the AWK interpreter, given the execution result and the verifier program as inputs.
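
(Editor's note: as a hedged sketch of what a small result verifier in AWK can look like; this example is ours, not the paper's generated verifier. It checks one safety property over an execution log: every "acquire R" line is balanced by a "release R".)

$1 == "acquire" { held[$2]++ }
$1 == "release" { if (held[$2]-- <= 0) { print "FAIL: spurious release of " $2; bad = 1 } }
END {
	for (r in held)
		if (held[r] > 0) { print "FAIL: " r " never released"; bad = 1 }
	print (bad ? "verification failed" : "verification passed")
	exit bad
}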

categories: Top10,Awk100,Papers,Os,Apr,2009,YungC

Awk-Linux

Awk-Linux Educational Operating Systems

Purpose

Teaching operating systems.

Developers

Yung-Pin Cheng

Email

ypc@csie.ntnu.edu.tw

Organization

Software Engineering Lab. Department of Computer Science and Information Engineering National Taiwan Normal University

Country

TAIWAN

Domain

Educators of Operating Systems

Description

Most well-known instructional operating systems are complex, particularly if their companion software is taken into account. It takes considerable time and effort to craft these systems, and their complexity may introduce maintenance and evolution problems. In this project, a courseware called Awk-Linux is proposed. The basic hardware functions provided by Awk-Linux include timer interrupt and page-fault interrupt, which are simulated through program instrumentation over user programs.

A major advantage of the use of Awk for this tool is platform independence. Awk-Linux can be crafted relatively easily, and it does not depend on any hardware simulator or platform. Stable Awk versions run on many platforms, so this tool can be readily and easily ported to other machines. The same cannot be said for other, more complex operating systems courseware that may be much harder to port to new environments.
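
(Editor's note: the courseware's instrumentation is not reproduced here, but its flavor is easy to sketch in awk. This naive example is ours: it inserts a call to a hypothetical timer-check routine after every line of C that ends a statement.)

# insert a simulated timer-interrupt check after each C statement
/;[ \t]*$/ { print; print "\tawk_linux_timer_tick();  /* hypothetical hook */"; next }
           { print }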

In practice, using Awk-Linux is very simple for the instructor and students:

  • Course projects based on Awk-Linux provide source code extracted and simplified from a Linux kernel.
  • Results of our study indicate that the projects helped students better understand the inner workings of operating systems.

Awk

Gawk under cygwin or Linux

Platform

Windows (CYGWIN required) or Linux

Uses

C programming language

Current Status

3 (Released)

Use

3 (Free/public domain)

DateDeployed

2004

References

Yung-Pin Cheng, Janet Mei-Chuen Lin, Awk-Linux: A Lightweight Operating Systems Courseware, IEEE Transactions on Education, vol. 51, no. 4, pp. 461-467, 2008.

Url

www.csie.ntnu.edu.tw/~ypc/awklinux.htm


categories: Papers,Os,Apr,2009,SallyF

Simulations for Equation-Based Congestion Control for Unicast Applications

(Editor's Note: This page is a mirror of the original web site. It describes a collection of shell/awk/tcl scripts used for modeling complex domains. This code illustrates how language choice is not a matter of "awk" vs "X". Rather, systems can be a menagerie of different languages, including Awk.)

Description

This page has pointers to the simulation scripts for the Equation-Based Congestion Control for Unicast Applications by Sally Floyd, Mark Handley, Jitendra Padhye, and Joerg Widmer, May 2000, SIGCOMM 2000.

Download

These simulation scripts are also available from LAWKER.

To test the code:

  • Unpack this zip file:

    contents.zip
    cd contents

  • To use these scripts, you must first do the following:

    gcc bwcnt2.c -o bwcnt2
    gcc bwcnt2a.c -o bwcnt2a

  • Then, put a copy of "ns" in the current directory, for example:

    ln -s ~/vint/ns-2/ns ns

  • To run the tests:

    ./single.com
    ./tfrm12.com
    ./queue2.com
    ./increase.com
    ./reduce.com
    ./reduce1.com

Details

These scripts are quick amalgams of shell scripts, awk, tcl, and whatever else was handy at the time, so they are not intended as an example of good programming style. They are run in a directory with a "graphs" subdirectory for saved output and *.mf files (gnuplot command files), and an "awk" subdirectory for awk files. Some of these scripts use supporting *.awk files that are available in the awk directory, but are not listed separately below. Some of the scripts (tfrm12.run) also use "bwcnt" C programs for processing output data; the C code for these is in the scripts directory. Possibly one day we will clean this all up to reduce the proliferation of scripts and languages involved.

The implementation of TFRC in the NS simulator is still occasionally being modified, so the precise results of simulations can change with different versions of NS.

Some of these simulations must be run with SBSIZE in scoreboard.h set to 10000 instead of to 1024, to allow larger TCP congestion windows.

From Scripts to Figures

The simulation for Figure 2 on "Illustration of the Average Loss Interval" can be run with "contents/single.com", with supporting files "contents/single.run", "contents/single.tcl", and "contents/queueSize.tcl". Generating the postscript file also uses the following files:
"contents/graphs/s0.interval.mf", "contents/graphs/s0.loss.mf", and "contents/graphs/s0.rate.mf".

The simulations for Figure 5 on "TCP flow sending rate" can be run with "contents/tfrm-full.CA.DropTail.run", "contents/tfrm-full.CA.RED.run" with supporting files "contents/tfrm-full.CA.tcl", "contents/queueSize.tcl", "contents/getmean-full.tcl". These scripts will produce data files called

    graphs/s-full-RED.CA.tcpmean
    graphs/s-full-DropTail.CA.tcpmean

There are three values for each data point (from three runs) in these output files. To merge them, use "contents/merge2.tcl":

    merge2.tcl graphs/graphs/s-full-RED.CA.tcpmean > graphs/s-full-RED.CA.tcp
    merge2.tcl graphs/graphs/s-full-DropTail.CA.tcpmean > graphs/s-full-DropTail.CA.tcp

Unfortunately, we no longer have the *.mf gnuplot script for generating the postscript from "s-full-RED.CA.tcp" and "s-full-DropTail.CA.tcp". BTW, on a 450MHz Xeon, each graph takes about 7 hours to generate.

The simulations for Figure 6 on can be run with "contents/tfrm12.com", with supporting files "contents/tfrm12.run", "contents/tfrm12.tcl", "contents/awk/plotdrops.awk" and "contents/queueSize.tcl". The supporting programs "bwcnt2" and "bwcnt2a" for processing the output data are compiled from "contents/bwcnt2.c" and "contents/bwcnt2a.c". FYI: On Sally's computer, this simulation set took 13 minutes. The following supporting files were also required for generating the postscript file "contents/tfrm12.run1", "contents/graphs/getmean.tcl", "contents/graphs/s0.12.mf", "contents/graphs/s0.loss3.mf".

The simulations for Figure 7 on "Coefficient of variation of throughput between flows" can be run with "contents/tfrmvar.run" with supporting files "contents/tfrmvar.tcl", "contents/queueSize.tcl", and "contents/graphs/getvar.tcl". The scripts "contents/fixcov.tcl" combines the many output files together, and gnuplot requires "contents/graphs/s3xxx.mf" to generate the postscript.

When we have collected the scripts for Figure 8, we will put them on-line.

The simulations for Figures 9 and 10 can be run with the script "contents/long/doit". The supporting scripts are in the tar file. The simulation takes perhaps one hour.

The simulations for Figures 11-13 can be run with the script "contents/short/doit". The simulation takes up to three days.

The simulations for Figure 14 on 40 long-lived flows can be run with "contents/queue2.com", with supporting files "contents/queue.run", "contents/queue.tcl", "contents/queueSize.tcl", "contents/tracequeue.tcl", "contents/awk/plotaveq.awk", and "contents/awk/plotqueue.awk". Generating the postscript file also uses the following file: "contents/graphs/s0.queue.mf".

Figures 15-18 are from experiments.

The simulations for Figure 19 on "A TFRC flow with an end to congestion" can be run with "contents/increase.com", with supporting files "contents/increase.run", "contents/increase.tcl", "contents/queueSize.tcl", "contents/awk/increase.awk", and "scriptsTR/graphs/s0.packetrate.mf".

The simulations for Figure 20 on "A TFRC flow with persistent congestion" can be run with "contents/reduce.com", with supporting files "contents/reduce.run", "contents/reduce.tcl", "contents/queueSize.tcl", "contents/awk/reduce.awk", and "contents/awk/reduce1.awk". Generating the postscript file also uses the following file: "contents/graphs/s0.rate1.mf".

The simulations for Figure 21 on "Number of round-trip times to reduce the sending rate" can be run with "contents/reduce1.com", with supporting files "contents/reduce1.run", "contents/reduce.tcl", "contents/queueSize.tcl", "contents/awk/reduce1.awk", and "contents/awk/reduce2.awk". Generating the postscript file also uses the following file: "contents/graphs/s0.half.mf".


categories: Papers,May,2009,JonB

Template-Driven Interfaces for Numerical Subroutines

Jon L. Bentley, Mary F. Fernandez, Brian W. Kernighan, and Norman L. Schryer, ACM Transactions on Mathematical Software, Vol. 19, No. 3, September 1993, Pages 265-287

This paper describes a set of interfaces for numerical subroutines. Typing a short (often one-line) description allows one to solve problems in application domains including least-squares data fitting, differential equations, minimization, root finding, and integration. Our approach of "template-driven programming" makes it easy to build such an interface: a simple one takes a few hours to construct, while a few days suffice to build the most complex program we describe.

It is straightforward to implement this approach on many systems. We have tailored our implementation to our computing environment: our numerical routines are from the Port library, we call the routines from Fortran programs, and our interfaces are implemented in Awk.

An appendix to the paper describes "L2fit". This program performs only the least-squares regression to calculate the parameters; it does not prepare the graphical summary. It is implemented as a 50-line Awk program and a 40-line Fortran template. The complete L2fit is a 330-line Awk program that uses a 45-line Fortran template; it also uses a 60-line Troff and Grap template to produce the output.
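
(Editor's note: the paper's Fortran templates are not reproduced here, but the flavor of template-driven programming is easy to sketch in Awk. The marker names and usage below are ours, not the paper's.)

# expand.awk -- replace %FUNC%, %LO%, %HI% markers in a template
# usage: awk -v FUNC=f -v LO=0 -v HI=1 -f expand.awk template.f
{
	gsub(/%FUNC%/, FUNC)
	gsub(/%LO%/, LO)
	gsub(/%HI%/, HI)
	print
}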

Download pdf.


categories: Papers,Os,Apr,2009,KimD

Intrusion Alert Normalization with Awk

From Intrusion Alert Normalization method using AWK scripts and attack name database. Dongyoung Kim, HyoChan Bang, Jung-Chan Na, Advanced Communication Technology, 2005, ICACT 2005. The 7th International Conference on Publication Date: 21-23 Feb. 2005 Volume: 1, On page(s): 608- 611 Vol. 1

The current classes of intrusion alerts have various formats and semantics, and they are transferred using a variety of protocols, such as IDXP, SNMP traps, and the SYSLOG protocol. This variety of alert formats makes it difficult to use the alerts together. Intrusion alert normalization converts the various alerts to a common structure and common semantics; this normalization is needed to unify alerts from a variety of security equipment. This paper describes how to normalize alerts from several IDSes and other security devices.
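
(Editor's note: a minimal sketch of the idea; the record formats here are ours, not the paper's. Two differently ordered alert formats are mapped onto one canonical time|sensor|attack layout.)

BEGIN { OFS = "|" }
$1 == "SYSLOG" { print $2, $3, $4 }   # SYSLOG <time> <sensor> <attack>
$1 == "TRAP"   { print $4, $2, $3 }   # TRAP <sensor> <attack> <time>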
