登录查看更多内容

Swapping columns of a text file and then joining its contents with another text file via Command Line

Maria Campbell

发布日期: 2024年7月9日

Note: This article was originally published on my personal website mariadcampbell.com.

I came across a problem which I wanted to solve. It could even prove quite practical. It involved joining two text files. However, the order of the columns were not the same, and in order for the join command to work, they had to be the same.

I called the first file file1.txt, and it contained the following:

John 1

Jane 2

Mary 3

I called the second file file2.txt, and it contained the following:

1 Doe

2 Doe

3 Sue

What differs between the two files? For one, in file1.txt, column 1 is populated by people's first names followed by a number column. In file2.txt, column 1 is populated by numbers, and column 2 contains people's last names. So if I were to try to join the two files without manipulating their content with the following command:

join file1.txt file2.txt

it would result in the following output in Terminal:

Exactly. Nothing would output to the Termninal. It took me a little while to figure out what to do, but I did know that I would have to switch the order of the columns in file1.txt so that it would match the order of the columns in file2.txt. I initially couldn't figure it out with join (I subsequently did later on), but I did come across the awk command, which even now just seems more intuitive and much more flexible. I ended up doing the following:

awk '{ t=$1; $1=$2; $2=t; print }' file1.txt > file1a.txt
join file1a.txt file2.txt

The first line of the script resulted in a third file called file1a.txt, and it contained the following:

1 John

2 Jane

3 Mary

The order of file1a.txt matched the order of file2.txt. This made it easy to join the contents of the two files represented in line 2 of the code. join file1a.txt file2.txt resulted in the following in Terminal:

1 John Doe

2 Jane Doe

3 Mary Sue

This is exactly what I wanted to achieve.

So what is the awk command and why is it so powerful? According to man awk (awk manual built into Terminal),

awk is a pattern-directed scanning and processing language. wk scans each input file for lines that match any of a set of patterns specified literally in prog or in one or more files specified as -f progfile. With each pattern there can be an associated action that will be performed when a line of a file matches the pattern.

awk was perfect for the task at hand:

An input line is normally made up of fields separated by white space, or by the regular expression FS. The fields are denoted $1, $2, ..., while $0 refers to the entire line.

Now let's break down the code. awk is a macOS built-in command, which is convenient, and it dates back to early Linux (and Unix). It is part of the POXIX standard, which makes application programming interfaces provided by Unix-y operating systems (and ancillary issues, such as command line shell utilities) more comprehensible and consistent. This then allows for code portability across various Unix derivatives, including Linux and macOS. awk is great for solving issues in our data files.

After the awk command is:

领英推荐

?? Top LLM Papers of the Week (December Week 1, 2024)

Kalyan KS 3 个月前

No-Code LLM Fine-Tuning and Debugging in Real Time:…

Vincent Granville 5 个月前

Data Science #18

Andriy Burkov 1 年前

'{ t=$1; $1=$2; $2=t; print }'

'{}' denotes encapsulating the complete code to manipulate the file that follows and that it should be executed all together. t denotes an arbitrarily chosen variable and $1 represents field one (aka column). $1=$2 means that column 1 now equals column 2, and $2=t means column 2 now equals t, meaning that column 2 now is replacing column one, so column 1 now positioned where column 2 originally was positioned, and column 1 is now positioned where column 2 was originally positioned. Column 1 has switched positions with column 2, resulting in the following in Command Line:

# file1.txt

1 John

2 Jane

3 Mary

print simply means to print the output of the command. file1.txt represents the file to manipulate.

I also wanted to make sure to be able to preserve the file manipulation on file1.txt, so I redirected the manipulation into a new file called file1a.txt, which was created at the same time as the redirection, represented by >. The contents of file1a.txt was the following;

1 John

2 Jane

3 Mary

which is exactly what I wanted, as previously indicated. Which made it then possible to run the following command in Terminal:

join file1atxt > file2.txt

which resulted in the following:

1 John Doe

2 Jane Doe

3 Mary Sue

I subsequently found out that there is an even shorter version of this awk command, very similar to the join command (which I will discuss shortly):

awk '{ print $2, $1 }' file1.txt

In this case, print has to precede $2, $1. Why? Because print has to precede the fields it will print. In the original example, the fields were being saved into variables first, so print was placed at the end. I still had to add > file1a.txt after awk '{ print $2, $1 }' in order to preserve the temporary changes made to file1.txt output to the Command Line.

The similar join command which changes the column/field order of one file to match the other file is the following:

join -1 2 -2 1 file1.txt file2.txt

which results in the following in Terminal:

1 John Doe

2 Jane Doe

3 Mary Sue

-1 refers to file1.txt and -2 refers to file2.txt. And the non minus number that follows represents the column position it will be switched to.

In this case, it is definitely shorter and therefore easier to use the join command, but the original awk command definitely clarified what was going on under the hood, which then made it very easy to figure out the join command.

I'm glad I had a chance to use awk, and will continue to explore it. But I am also glad I got to know join better!

Related Resources

How to Use Awk to Print Fields and Columns in File: tecmint Linux blog
Awk One-Liners Explained, Part II: Text Conversion and Substitution: catonmat

要查看或添加评论，请登录

Maria Campbell的更多文章

The su command in Linux and Unix (macOS)

2024年7月24日

The su command in Linux and Unix (macOS)

This article was first published on my personal blog mariadcampbell.com.
The script command in Linux and Unix (macOS)

2024年7月15日

The script command in Linux and Unix (macOS)

In my article The pipe (|) and tee commands in Linux and Unix (macOS), I talk about how one can save the stdout…
stdin stdout and stderr in Linux and Unix (macOS)

2024年7月12日

stdin stdout and stderr in Linux and Unix (macOS)

Note: this article was first published on my personal blog mariadcampbell.com.
Redirect operators in Linux and Unix (macOS)

2024年7月12日

Redirect operators in Linux and Unix (macOS)

Note: this post was originally published on my personal blog mariadcampbell.com.
How to find a folder on your computer

2023年10月19日

How to find a folder on your computer

I have a lot of folders containing projects on my computer. Sometimes I don't remember where I have placed them! This…
Data Recovery with Git

2016年10月24日

Data Recovery with Git

So I just did my first data recovery with Git. Git really is an amazing tool.
Celebrating 10 Years of Wordpress

2015年1月16日

Celebrating 10 Years of Wordpress

I just found out this past Friday, May 24, 2013, that there's going to be a 10 year anniversary celebration of…
Behind the highlight reel: "National Singles Day" at The Shakespeare

2015年1月12日

Behind the highlight reel: "National Singles Day" at The Shakespeare

Behind the highlight reel: "National Singles Day" at The Shakespeare Yesterday was a bit surreal for me, and I felt…
The Latest (and the coolest) on the Tempo Smart Calendar App

2014年9月23日

The Latest (and the coolest) on the Tempo Smart Calendar App

For those of you that are acquainted with the Tempo Smart Calendar App, you know that it is contextual and intuitive…
Why I'm Still Single (and not a Mom)

2014年9月4日

Why I'm Still Single (and not a Mom)

Life is a journey. Being an entrepreneur is a journey.

See all articles

社区洞察

Machine Learning

How can you clean text data effectively for sentiment analysis?

Swapping columns of a text file and then joining its contents with another text file via Command Line

Maria Campbell

领英推荐

Related Resources

Maria Campbell的更多文章

社区洞察

其他会员也浏览了

Fastest way to finetune and deploy Large Language Model without writing any code

Limitations of the Linear Kalman Filter

The Implications of Prompt Interfaces (Vol. 7)

Geek out time: Creating an Agent with Customized Skill in AutoGen Builder

DeciCoder-6B and DeciDiffusion 2.0: Models Built for Accuracy, Speed, and Cost-Efficiency

Running LLMs Locally: A Comprehensive Guide

Game of Thrones Book Analysis

Text Classification with Hugging Face's BERT Model in Langchain

A Better Model? Or a Smarter Approach to Problem Solving?

Linear Discriminant Analysis (LDA)

领英推荐

Related Resources

Maria Campbell的更多文章

The su command in Linux and Unix (macOS)

The script command in Linux and Unix (macOS)

stdin stdout and stderr in Linux and Unix (macOS)

Redirect operators in Linux and Unix (macOS)

How to find a folder on your computer

Data Recovery with Git

Celebrating 10 Years of Wordpress

Behind the highlight reel: "National Singles Day" at The Shakespeare

The Latest (and the coolest) on the Tempo Smart Calendar App

Why I'm Still Single (and not a Mom)

社区洞察

其他会员也浏览了

Fastest way to finetune and deploy Large Language Model without writing any code

Limitations of the Linear Kalman Filter

The Implications of Prompt Interfaces (Vol. 7)

Geek out time: Creating an Agent with Customized Skill in AutoGen Builder

DeciCoder-6B and DeciDiffusion 2.0: Models Built for Accuracy, Speed, and Cost-Efficiency

Running LLMs Locally: A Comprehensive Guide

Game of Thrones Book Analysis

Text Classification with Hugging Face's BERT Model in Langchain

A Better Model? Or a Smarter Approach to Problem Solving?

Linear Discriminant Analysis (LDA)