Exploring basic input field separators
- One of AWK's most fundamental concepts is that its input and output consist of records, and each record is divided into fields. But how does AWK define a record and a field? It's actually very flexible, which is good, because you'll encounter a wide variety of file formats in the real world. By default, AWK considers each line of its text input to be a record, and white space, meaning any string of spaces and/or tabs, marks the end of one field and the beginning of the next.

For example, this AWK program, print $2, tells AWK to print the second field on each line. Now, if we separate our fields with a space, one space two space three, it prints two. If we use a tab character, one tab two tab three, it also prints two. You can also use several spaces or tabs in a row: one, space space space, two, space space space tab, three. Or you can use a mix of spaces and tabs: one, space tab space, two, tab tab space tab tab, three. In every case, it prints two.

As you may recall from the last chapter, the -F (capital F) command-line flag tells AWK to use the following argument as the field separator. For example, if our input uses commas to separate fields, we can tell AWK about it using the -F flag: awk -F, '{print $2}'. In this case, it uses the comma to separate the line into fields and recognizes that two is the second field on the line. Once you have specified a new field separator, white space is considered part of the field rather than a separator between fields. For example, again with the comma as the field separator, if you type one space one, comma, two space two, comma, three space three, it does not recognize those spaces as field separators anymore. It recognizes two space two as one field, separated from the fields on either side by commas. This is actually very useful, because it's common to encounter tab-separated text files that have space characters within the fields.
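To make the examples so far concrete, here is a minimal sketch of what they might look like at a shell prompt. It assumes a POSIX shell with printf and any standard awk. The sample lines are piped in with printf rather than typed interactively, which is a choice made for this sketch so that the tabs are visible as \t.

```shell
# Default separator: any run of spaces and/or tabs splits fields.
printf 'one two three\n'              | awk '{ print $2 }'   # prints: two
printf 'one\ttwo\tthree\n'            | awk '{ print $2 }'   # prints: two
printf 'one   two   \tthree\n'        | awk '{ print $2 }'   # prints: two
printf 'one \t two\t\t \t\tthree\n'   | awk '{ print $2 }'   # prints: two

# -F, makes the comma the field separator.
printf 'one,two,three\n'              | awk -F, '{ print $2 }'   # prints: two

# With -F, in effect, spaces are ordinary characters inside a field.
printf 'one one,two two,three three\n' | awk -F, '{ print $2 }'  # prints: two two
```

The last command is the one to notice: the second field comes back as two two, spaces included, because only the comma separates fields now.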
With the default field separator, which is any amount of white space, it is not possible to have a field that is empty, because two spaces or tabs in a row are considered a single field separator. But with a user-specified field separator, two field separators next to each other create a field whose value is the empty string. So in this case, again using the comma as the field separator, we can type one comma comma three. Here the second field exists, but it's empty. Or we can type comma two comma three, in which case it's the first field on the line that's empty.

Now, the field separator doesn't have to be a single character. It can actually be any string of characters. For example, we can specify that the field separator is ABC. If we print the second field on the line, we can type one ABC two ABC three, and it recognizes that two is the second field on the line.

Finally, the field separator can actually be any regular expression. In this case, I'll put it in quotes, because it contains characters that are special to the shell. We can specify a regular expression indicating that either a comma or an exclamation point is the field separator. So we can type one, exclamation point, two, comma, three, and it still recognizes that two is the second field on the line. We'll talk about regular expressions in detail later on, but for now, I'll just point out that if the regular expression contains characters special to the shell, it must be enclosed in quotes.
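The three cases just described can be sketched the same way, again assuming a POSIX shell with printf and a standard awk. Two details are assumptions made for this sketch rather than something taken from the narration: the square brackets printed around the field, which are only there to make an empty value visible, and the bracket expression '[,!]', which is one common way to write "a comma or an exclamation point" as a regular expression.

```shell
# Adjacent separators produce an empty field when -F is given.
printf 'one,,three\n'  | awk -F, '{ print "[" $2 "]" }'   # prints: []
printf ',two,three\n'  | awk -F, '{ print "[" $1 "]" }'   # prints: []

# The empty fields still count: NF is the number of fields on the line.
printf 'one,,three\n'  | awk -F, '{ print NF }'           # prints: 3

# A multi-character separator.
printf 'oneABCtwoABCthree\n' | awk -F ABC '{ print $2 }'  # prints: two

# A regular expression as the separator: comma or exclamation point.
# Single quotes keep the shell from interpreting the brackets and the "!".
printf 'one!two,three\n' | awk -F '[,!]' '{ print $2 }'   # prints: two
```

Single quotes are used around the regular expression because, in an interactive bash session, an exclamation point inside double quotes can still trigger history expansion.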