About awk.info
Mar 01: Michael Sanders demos an X-windows GUI for AWK.
Mar 01: Awk100#24: A. Lahm and E. de Rinaldis' patent search, in AWK
Feb 28: Tim Menzies asks this community to write an AWK cookbook.
Feb 28: Arnold Robbins announces a new debugger for GAWK.
Feb 28: Awk100#23: Premysl Janouch offers an IRC bot, in AWK
Feb 28: Updated: the AWK FAQ
Feb 28: Tim Menzies offers a tiny content management system, in Awk.
Jan 31: Comment system added to awk.info. For example, see the discussion at the bottom of ?keys2awk
Jan 31: Martin Cohen shows that Gawk can handle massively long strings (300 million characters).
Jan 31: The AWK FAQ is being updated. For comments, corrections, or extensions, please mail tim@menzies.us
Jan 31: Martin Cohen finds Awk on the Android platform.
Jan 31: Aleksey Cheusov released a new version of runawk.
Jan 31: Hirofumi Saito contributes a candidate Awk mascot.
Jan 31: Michael Sanders shows how to quickly build an AWK GUI for windows.
Jan 31: Hyung-Hwan Chung offers QSE, an embeddable Awk Interpreter.
RUNAWK is a small wrapper for the AWK interpreter that helps one write standalone AWK scripts. Its main feature is a module/library system for AWK, somewhat similar to Perl's "use" command. It also allows you to select a preferred AWK interpreter and to set up the environment for your scripts. RUNAWK makes programming AWK easy and efficient. RUNAWK also provides many useful AWK modules.
Version 0.17.0, by Aleksey Cheusov, Sat, 12 Sep 2009
runawk:
runawk -f abs.awk -e 'BEGIN {print abs(-123); exit}'
alt_getopt.awk and power_getopt.awk:
power_getopt.awk:
New modules:
Panos Papadopoulos offers the latest entry in our Awk mascot competition:
Scary, yes?
(Editor's note: On Nov 30'09, Hermann Peifer found and fixed a bug in an older version of the test code at the end of this file.)
Writing in comp.lang.awk, Ed Morton reveals the secret WHINY_USERS flag.
"Nag" asked:
Hi,
I am creating a file like...
awk '{
....
...
..
printf"%4s %4s\n",$1,$2 > "file1"
}' input
How can I sort file1 within awk code?
Ed Morton writes:
$ cat file
2
1
4
3
$ gawk '{a[$0]}END{for (i in a) print i}' file
4
1
2
3
$ WHINY_USERS=1 gawk '{a[$0]}END{for (i in a) print i}' file
1
2
3
4
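For Nag's original question, an alternative that does not rely on WHINY_USERS is to pipe the output through the system sort from inside awk. (Newer gawk releases, 4.0 and later, also offer PROCINFO["sorted_in"] for controlling the order of "for (i in a)".) The following is only a minimal sketch, assuming a POSIX awk and a sort command on the PATH; the helper name sorted_copy and its two arguments are made up for illustration:

```shell
# sorted_copy IN OUT (hypothetical helper, not from the original thread):
# print the first two fields of IN, formatted as in the question,
# and let sort write them to OUT in sorted order.
sorted_copy() {
  awk -v out="$2" '
    BEGIN { cmd = "sort > \"" out "\"" }    # output file is quoted for the shell
    { printf "%4s %4s\n", $1, $2 | cmd }    # every record goes down one pipe
    END { close(cmd) }                      # sort writes OUT when the pipe closes
  ' "$1"
}
```

Calling sorted_copy input file1 would leave a sorted file1 behind. The sorting happens in the external sort process, so the awk program itself stays free of array-ordering tricks.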
Your editor coded up the following test for the runtime costs of WHINY_USERS. The following code is called twice (once with, and once without setting WHINY_USERS):
runWhin() {
    WHINY_USERS=1 gawk -v M=1000000 --source '
        BEGIN {
            M = M ? M : 50
            N = M
            print N
            while(N-- > 0) {
                key = rand()" "rand()" "rand()" "rand()" "rand()
                A[key] = M - N
            }
            for(i in A)
                N++
        }'
}
runNoWhin() {
    gawk -v M=1000000 --source '
        BEGIN {
            M = M ? M : 50
            N = M
            print N
            while(N-- > 0) {
                key = rand()" "rand()" "rand()" "rand()" "rand()
                A[key] = M - N
            }
            for(i in A)
                N++
        }'
}
time runWhin
time runNoWhin
And the results? Sorting added about 15% to the runtime:
% bash whiny.sh
1000000
real 0m18.897s
user 0m15.826s
sys 0m2.445s
1000000
real 0m16.345s
user 0m13.469s
sys 0m2.435s
That afternoon, I wrote a gawk script that widens the lines in a 256 color BMP version of the image - I can convert it back to a transparent background GIF later.
That script was presented in awk.info on July 30, 2009. This is an updated and extended version.
The script widens lines in .bmp files to make them more visible when converted to TV video images. For the complete conversion, it is also necessary to munge the line colors to get rid of interpolated colors and to give some lines more contrast, but that is done elsewhere.
This function converts byte strings (binary numbers) into their corresponding numeric strings so that they can be processed as gawk numbers. The lookup table (CharString) is a global variable. This code assumes that binary numbers are big-endian (most significant byte first) - it is up to the calling program to order the bytes.
On the first use, the (global) LUT is created, then left for later use. It consists of a list of characters from \000 to \377 in order - the (index value minus 1) of a character multiplied by the power of 256 corresponding to its position in the string is the byte's numerical weight. The function doesn't care about the length of the byte string (within the integer limits of the gawk version and port).
function Bytes2Number( String, x, y, z, Number ) {
    if( !CharString ) {
        for( x = 0; x <= 255; x++ ) CharString = CharString sprintf( "%c", x )
    }
    x = split( String, Scratch, "" )
    Number = 0
    for( y = 1; y <= x; y++ ) {
        z = index( CharString, Scratch[ y ] ) -1
        Number = Number + z * (256^(x - y))
    }
    return Number # Note that Number is a regular gawk scalar variable.
}
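As a worked example of the byte weighting described above, the two-byte string \001\002 is 1*256 + 2 = 258 when read most significant byte first, and the reversed string \002\001 is 2*256 + 1 = 513. The sketch below is not the author's function: it is a variant that uses an array lookup and running multiplication by 256 in place of index() into CharString, and it assumes a POSIX awk run in the C locale so that characters are single bytes. The names be2num, be2num_demo, and ORD are made up for illustration.

```shell
# be2num_demo: hypothetical name; prints the values of two sample byte strings.
be2num_demo() {
  LC_ALL=C awk '
    # Big-endian byte string to number. ORD is a global lookup table
    # (character -> byte value), built on first use.
    function be2num(s,    i, n, c, num) {
      if (!("\001" in ORD))
        for (i = 1; i <= 255; i++) ORD[sprintf("%c", i)] = i
      n = length(s)
      num = 0
      for (i = 1; i <= n; i++) {
        c = substr(s, i, 1)
        # Assumption: a byte missing from ORD is \000 and counts as zero.
        num = num * 256 + ((c in ORD) ? ORD[c] : 0)
      }
      return num
    }
    BEGIN {
      print be2num("\001\002")   # 1*256 + 2
      print be2num("\002\001")   # 2*256 + 1
    }'
}
be2num_demo
```

BMP headers store their multi-byte fields least significant byte first, which is why the calling code has to hand the bytes over in reverse order.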
This function uses a brute-force approach to factor the image size into width and height numbers that actually match the real image size. It searches around the nominal values for a pair of numbers that, when multiplied together, produce the known size of the image in pixels.
function RealSize( Wide, High, Pixels, x, y ) {
    for( x = Wide - 5; x <= Wide +5; x++ ) {
        for( y = High - 5; y <= High + 5; y++ ) {
            if( x * y == Pixels ) {
                Width = x
                Height = y
            }
        }
    }
}
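The search can be tried on its own with small numbers. The following sketch, assuming a POSIX awk, runs the same plus-or-minus 5 search but prints every matching pair instead of storing the last one; the helper name real_size is made up for illustration.

```shell
# real_size WIDE HIGH PIXELS (hypothetical helper): list every width/height
# pair within 5 of the nominal values whose product equals PIXELS.
real_size() {
  awk -v wide="$1" -v high="$2" -v pixels="$3" 'BEGIN {
    for (x = wide - 5; x <= wide + 5; x++)
      for (y = high - 5; y <= high + 5; y++)
        if (x * y == pixels) print x, y
  }'
}
real_size 100 50 5151   # prints: 101 51
```

When the nominal width and height are close to each other, more than one pair can match: real_size 10 10 99 prints both 9 11 and 11 9. RealSize() keeps the last match it finds in such a case.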
It is necessary to tell gawk to read/write the file as binary, especially under Windows where ^Z in files is a killer. Setting BINMODE to 3 will also work, but it throws error messages.
Setting FS to null causes gawk to make each byte a separate field.
Testing indicates that, in Windows at least, it is necessary to specify RS, even though it would appear redundant to set it to \n - not doing so results in 0A0D being replaced with 0A in the output, with the loss of one byte for each occurrence. The value is arbitrary - it has been tested using one of the line colors.
BEGIN{
    BINMODE = "rw"
    FS = ""
    # The next two lines are not strictly necessary -
    # they are here for clarity.
    Header = ""
    ByteCount = 0
    RS = "\n"
}
Read the file into an array. If there are multiple lines, that is, if RS appears in the file, insert the record separator back into the array at the end of each line for which RT exists.
{
    for( x = 1; x <= NF; x++ ) Bytes[ ++ByteCount ] = $(x)
    if( RT ) { Bytes[ ++ByteCount ] = RT }
}
Closing FILENAME here allows overwriting the original file - if that is desired, comment out the next line (which creates a new filename for the output).
Regarding image parameters: Width and Height are in pixels; Depth is the number of bytes per pixel; Data is the zero based index of the actual image in the file; Size refers to the bytes in the file, not the image; ImgSize is the size of the image data as read from the header, which is divided by Depth to give the number of pixels in the image. Unfortunately, Width and Height may be wrong: RealSize() calculates the actual values as found from the data block.
Once the image parameters are set, the two arrays for the image can be built: one to contain an unmodified copy (A) and one to contain a copy to be modified (B). These arrays are indexed by line and dots (Height, Width); data are complete pixels. The C array is used to determine the background color: it uses the pixel data as indexes and the count of the number of copies of that pixel as values - the largest value represents the most common color, and assuming that the image is mostly background, therefore the background color. This assumption will be true for almost all line art.
When performing line widening: for each pixel that is not part of the background, copy its color to the eight surrounding pixels, provided that they are background. This approach prevents one line from encroaching on another, but does not prevent the free ends of lines (ends that do not intersect other lines) from growing by one pixel on each pass through the program. u, v, w, and z (z has been reused) are the row and column coordinates of the pixels surrounding the one in work (defined by x and y).
END{
    if( !OutFile ) OutFile = FILENAME
    close( FILENAME )
    sub( /[bB][mM][pP]$/, "widened.bmp" Arr[1], OutFile )
    Width = Bytes2Number( Bytes[ 22 ] Bytes[ 21 ] Bytes[ 20 ] Bytes[ 19 ] )
    Height = Bytes2Number( Bytes[ 26 ] Bytes[ 25 ] Bytes[ 24 ] Bytes[ 23 ] )
    Data = Bytes2Number( Bytes[ 14 ] Bytes[ 13 ] Bytes[ 12 ] Bytes[ 11 ] )
    Size = Bytes2Number( Bytes[ 6 ] Bytes[ 5 ] Bytes[ 4 ] Bytes[ 3 ] )
    Depth = Bytes2Number( Bytes[ 30 ] Bytes[ 29 ] ) / 8
    ImgSize = Bytes2Number( Bytes[ 38 ] Bytes[ 37 ] Bytes[ 36 ] Bytes[ 35 ] )
    RealSize( Width, Height, ImgSize / Depth )
    # Output the header in its original form to the target file.
    for( x = 1; x <= Data; x++ ) Header = Header Bytes[ x ]
    printf( "%s", Header ) > OutFile
    # Build the two arrays
    for( x = 1; x <= Height; x++) {
        for( y = 1; y <= Width; y++ ) {
            S = ""
            # Values for the A & B array entries are strings of
            # bytes representing the color of the pixel, either directly or
            # as a pointer into a palette.
            for( z = 1; z <= Depth; z++ ) S = S Bytes[ ++Data ]
            A[x,y] = S
            B[x,y] = S
            C[ S ]++
        }
    }
    z = 0
    # Bkg is the (assumed) background color.
    # The code is a simple maximum value loop.
    for( x in C ) {
        y = C[x]
        if( y > z ) {
            Bkg = x
            z = y
        }
    }
    # Begin the actual line widening code.
    # Pixels are compared as strings (== and !=) rather than matched
    # with ~, because a pixel byte such as "." or "*" would otherwise
    # be interpreted as a regular expression.
    for( x = 1; x <= Height; x++) {
        for( y = 1; y <= Width; y++ ) {
            if( A[x,y] != Bkg ) {
                u = x + 1
                v = x - 1
                w = y + 1
                z = y - 1
                if( B[u,y] == Bkg ) B[u,y] = A[x,y]
                if( B[v,y] == Bkg ) B[v,y] = A[x,y]
                if( B[x,w] == Bkg ) B[x,w] = A[x,y]
                if( B[x,z] == Bkg ) B[x,z] = A[x,y]
                if( B[u,w] == Bkg ) B[u,w] = A[x,y]
                if( B[u,z] == Bkg ) B[u,z] = A[x,y]
                if( B[v,w] == Bkg ) B[v,w] = A[x,y]
                if( B[v,z] == Bkg ) B[v,z] = A[x,y]
            }
        }
    }
    for( x = 1; x <= Height; x++) {
        for( y = 1; y <= Width; y++ ) {
            printf( "%s", B[x,y] ) > OutFile
        }
    }
}
Note the final nested for loops in the above code. After the B array has been modified, the target file can be completed by reading that array out to the file pixel by pixel. The array cannot be output during processing because pixels that have already been through the processor can still be changed.
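The widening pass is easier to follow on a toy image. The sketch below is not the BMP script: it assumes a POSIX awk and treats each character of a text grid as one pixel, with the most common character taken as the background. It reads from an unmodified copy (A), writes into a second copy (B), and only prints B after the whole pass, as in the END block above. The helper name widen is made up for illustration.

```shell
# widen (hypothetical helper): read a character grid on stdin and print it
# with every non-background character grown by one cell in all directions.
widen() {
  awk '
    {
      H = NR; W = length($0)
      for (y = 1; y <= W; y++) {
        p = substr($0, y, 1)
        A[NR, y] = B[NR, y] = p    # A stays untouched, B gets modified
        C[p]++                     # count how often each "color" occurs
      }
    }
    END {
      # The most common character is assumed to be the background.
      for (p in C) if (C[p] > max) { max = C[p]; bkg = p }
      for (x = 1; x <= H; x++)
        for (y = 1; y <= W; y++)
          if (A[x, y] != bkg)
            for (dx = -1; dx <= 1; dx++)
              for (dy = -1; dy <= 1; dy++)
                # Only cells inside the grid that are still background change.
                if (((x + dx, y + dy) in B) && B[x + dx, y + dy] == bkg)
                  B[x + dx, y + dy] = A[x, y]
      for (x = 1; x <= H; x++) {
        row = ""
        for (y = 1; y <= W; y++) row = row B[x, y]
        print row
      }
    }'
}
```

For example, printf '..ab..\n' | widen prints .aabb. : each line grows outward into the background, but neither overwrites the other.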
Ted Davis tdavis@mst.edu.
Here is some Awk code from the Rosetta Code wiki that multiplies integers using only addition, doubling, and halving.
For example: 17 X 34
17 34
Halving the first column:
17 34
8
4
2
1
Doubling the second column:
17 34
8 68
4 136
2 272
1 544
Strike-out rows whose first cell is even:
17 34
8 --
4 ---
2 ---
1 544
Sum the remaining numbers in the right-hand column:
17 34
8 --
4 ---
2 ---
1 544
====
578
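So 17 multiplied by 34, by the Ethiopian method, is 578.
The task is to define three functions/methods/procedures/subroutines: one to halve an integer, one to double an integer, and one to test whether an integer is even. These are then used to build the multiplication function: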
function halve(x) { return(int(x/2)) }
function double(x) { return(x*2) }
function iseven(x) { return((x%2) == 0) }
# r is listed as an extra parameter so that it is local to the function.
function ethiopian(plier, plicand,    r) {
    r = 0
    while(plier >= 1) {
        if ( !iseven(plier) ) {
            r += plicand
        }
        plier = halve(plier)
        plicand = double(plicand)
    }
    return(r)
}
BEGIN { print ethiopian(17, 34) }
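To see the table from the worked example being built, the loop can be made to print each row as it goes. This is only a sketch, assuming a POSIX awk; the helper name ethiopian_trace and the words "kept" and "struck" are made up for illustration.

```shell
# ethiopian_trace A B (hypothetical helper): print one line per row of the
# halving/doubling table, then the sum of the rows that were kept.
ethiopian_trace() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    total = 0
    while (a >= 1) {
      if (a % 2) { total += b; mark = "kept" } else mark = "struck"
      printf "%d %d %s\n", a, b, mark
      a = int(a / 2)    # halve the left column
      b *= 2            # double the right column
    }
    print total
  }'
}
ethiopian_trace 17 34
```

For 17 and 34 this prints the rows 17 34 kept, 8 68 struck, 4 136 struck, 2 272 struck, 1 544 kept, and finally 578, matching the table above.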
In the Awk-verse, there are two TAWKs.
TAWK #1 is the TAWK Compiler from Thompson Automation Software (no longer trading)
TAWK #2 was an ultra-cut-down version of AWK written in C++ by Bruce Eckel in 1989. Eckel writes: