Yet Another CLI Trick: simple awk reports
Jeffrey Wilson
Network Architect | Expert in Scalable Infrastructure, Cybersecurity, and Automation
If you read more than one or two of my posts, you might get the vibe that I'm a bit of an awk fanboy. You're not wrong. Once a sysadmin picks up one or two aspects of awk, she becomes unstoppable. So many possibilities with just a couple of simple tricks! Let's get into the details! (Requisite link to the online man page. Can't say I'm not consistent!)
substr
If you have experience with C, C++, or any of the scripting languages that lean on C as inspiration for syntax, you will be familiar with the perennial indexing question: start at 0 or start at 1?
Spoiler alert: awk starts at 1.
The substring function substr has two required arguments: the original string and the offset. If the third argument is specified, it defines the maximum length of the return value. If the third argument is not specified, the return value is the remainder of the string, from the offset through the end.
For example: substr("Yet Another CLI Trick", 5, 7) will return "Another".
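Here's a minimal sketch you can paste into a terminal to see both forms (the BEGIN block just lets awk run without any input):

awk 'BEGIN {
    s = "Yet Another CLI Trick"
    print substr(s, 5, 7)   # prints "Another"
    print substr(s, 13)     # no third argument: prints "CLI Trick"
}'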
split
This is almost a mini-awk within awk. The entire premise of awk is splitting rows into fields using some designated separator (typically whitespace, but sometimes a comma or semicolon). The split function does the same thing, but on a designated string. What I've always found odd about this function is that the result comes back through an argument: you pass in an array, split fills it, and the function's actual return value is the number of pieces.
For example: split("my-log-file.txt",arr,"-") writes the following entries into arr: arr[1] = "my", arr[2] = "log", and arr[3] = "file.txt".
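A quick sketch to see both the filled array and the numeric return value:

awk 'BEGIN {
    n = split("my-log-file.txt", arr, "-")
    print n " pieces"       # 3 pieces
    for (i = 1; i <= n; i++)
        print i, arr[i]
}'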
The accumulator pattern
Time and again, I come back to this one aspect of awk. It is shaped a bit differently in Perl, but the same feature exists there too. I think of it as the labeled accumulator.
I'm assuming you have a passing understanding of how awk scripts are put together, so I won't review the basics just now.
I want to find lines that match a pattern. If that pattern exists, I want to sum up the quantity in the first column, grouped by the second column as a label.
/pattern to match/ {
    accum[$2] += $1
}
END {
    for (e in accum)
        print e, accum[e]
}
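Here's that skeleton run against a couple of invented lines first (the fruit data is made up purely to show the grouping):

printf '3 apples\n2 oranges\n4 apples\n' | awk '
/apple|orange/ {
    accum[$2] += $1
}
END {
    for (e in accum)
        print e, accum[e]
}'

This prints apples 7 and oranges 2 (the order of a for-in loop over an awk array is unspecified).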
For example, take this listing of a directory:
[log]$ ls -1s *log*
648 Nudge.log
0 alf.log
16 fsck_apfs.log
8 fsck_apfs_error.log
8 fsck_hfs.log
33368 install.log
392 jamf.log
8 shutdown_monitor.log
16 system.log
8 system.log.0.gz
8 system.log.1.gz
8 system.log.2.gz
8 system.log.3.gz
8 system.log.4.gz
8 system.log.5.gz
1800 wifi.log
232 wifi.log.0.bz2
232 wifi.log.1.bz2
248 wifi.log.2.bz2
232 wifi.log.3.bz2
224 wifi.log.4.bz2
184 wifi.log.5.bz2
What happens if we pipe this output into awk? With default settings, awk splits each line into two fields. The first field is the "number of blocks used in the file system by each file." The second field is the name of the file.
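A quick sketch of that split on a single line, including awk's built-in $NF ("last field"), which I'll lean on in a moment:

echo "648 Nudge.log" | awk '{ print $1; print $NF }'

This prints 648 and then Nudge.log.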
For some crazy reason, I've decided that the first portion of the log file name has significance, and I will be using that as my label for the accumulator. Here's my approach:
ls -s *log* | awk '{
    split($NF, arr, "[^A-Za-z]")
    sum[arr[1]] += $1
}
END {
    for (s in sum)
        print sum[s], s
}' | sort -rn
Key point: regular expressions allow you to define a character class. If I want to match all the letters of the alphabet, upper and lower case, I would write that class as [A-Za-z]. Preceding the characters inside the square brackets with a caret ^ negates the class: it matches anything BUT a letter of the alphabet.
By using this ad hoc character class as the separator for my split function, I've cleanly broken each filename apart at every non-alphabetic character, so the first element of the array is always the leading run of letters.
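A one-off sketch of that split against a single made-up filename:

echo "wifi.log.0.bz2" | awk '{
    split($0, arr, "[^A-Za-z]")
    print arr[1]   # wifi
}'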
Here's the output of the full pipeline:
33368 install
3152 wifi
648 Nudge
392 jamf
64 system
32 fsck
8 shutdown
0 alf
This is roughly equivalent to the following SQL query (imagining a dir_ls table where label holds the alphabetic prefix extracted from each filename):

select sum(blocks) as s, label
from dir_ls
where filename like '%log%'
group by label
order by s desc
Room to grow
This brief tutorial is by no means an exhaustive overview of awk, but it should give you a taste of what's possible. Dive in, play around, and figure out the useful report that's waiting for YOU to craft it out of the log files!
Happy hunting!