Coding Challenge #9 - grep
John Crickett
Helping you become a better software engineer by building real-world applications.
This week your challenge is to build your own version of the Unix command line tool grep.
The Unix command line tools are a great metaphor for good software engineering and they follow the Unix Philosophies of:
- Writing simple parts connected by clean interfaces - each tool does just one thing and provides a simple CLI that handles text input from either files or file streams.
- Design for simplicity; add complexity only where you must.
- Design programs to be connected to other programs - each tool can be easily connected to other tools to create incredibly powerful compositions.
Following these philosophies has made the simple Unix command line tools some of the most widely used software engineering tools which can be chained together to create far more complex and powerful set of tools that you’d expect.
You can read more about the Unix Philosophy on the Coding Challenges blog.
Following these philosophies has made the simple unix command line tools some of the most widely used software engineering tools - allowing us to create very complex text data processing pipelines from simple command line tools. There’s even a Coursera course on Data Engineering with Bash!
The Challenge - Building grep
The functional requirements for grep are concisely described by it’s man page - give it a go in your local terminal now:
% man grep
The grep utility searches any given input files, selecting lines that
match one or more patterns. By default, a pattern matches an input line
if the regular expression (RE) in the pattern matches the input line
without its trailing newline. An empty expression matches every line.
Each input line that matches at least one of the patterns is written to
the standard output.
N.B. grep can support basic regular expression and extended regular expressions. In GNU grep there is no difference between the two, both are extended. For none GNU versions of grep the -E option enables the extended pattern matching which is what we are going to look at in this challenge.
Difficulty - You Choose!
You can tackle this challenge at two different levels of difficulty:
- Easy - use the regular expression library from your programming language.
- Hard - implement your own regex engine.
You can do it either way, or even both! For both, build it the easy way then come back and re-implement the regex with your own engine. If you decided to do that, there’s a pretty good article explaining it at Let’s Build a Regex Engine.
领英推è
Step Zero
Like most programming languages we’re zero indexed!
For this step, I’ll leave you to setup your IDE / editor of choice and programming language of choice. After that here’s what I’d like you to do to be ready to test your solution.
Download the following text: [<https://www.gutenberg.org/cache/epub/132/pg132.txt>](<https://www.gutenberg.org/cache/epub/132/pg132.txt>) and save it as test.txt. After that please download the additional test data from my dropbox here ****and unzip them, you should then have the following in your test directory:
% tree
.
├── rockbands.txt
├── symbols.txt
├── test-subdir
│ └── BFS1985.txt
└── test.txt
If you’re on Windows you can use GoW or GitBash to get a bash console, or if you’re a PowerShell user I’m sure you’re capable of translating the commands! Well better than I can! ??
Step 1
In this step your goal is to implement support for an empty expression. An empty expression matches every line so the command grep "" test.txt will write every line of the file test.txt.
Implement your solution, run the command and check you get every line written out, you can automate this test with the Unix command line tools:
% grep "" test.txt | diff test.txt -
%
Shows that your grep output the same contents as the file test.txt because no diff is output. If your grep is not correct for this step there will be a diff output.
Step 2
You can find Step 2 and beyond on the Coding Challenges website as Write You Own grep.
Or if you'd rather get the whole challenge delivered to you inbox every week, you can subscribe on the Coding Challenges Substack.
Software Tech Lead @Zemetric | ex Co-Founder @Evy Energy (Acquired)
1 å¹´Here's my solution using the inbuilt RegExp: https://github.com/jainmohit2001/coding-challenges/tree/master/src/9 Too lazy to implement the whole RegEx considering we already build 3 parsers till now xD.
Writing about Engineering Management | Director of Engineering
1 å¹´Nice challenge :) I'm less interested in building my coding skills at this stage, but your challenges are very well executed!
Software Engineer | AI Assisted Coding Champ @Blend | Ex-Googler
1 å¹´Just a friendly tip, Check the code on the final step, I can't see the -i.
Helping you become a better software engineer by building real-world applications.
1 å¹´If you'd like it emailed to you, you can subscribe here: https://codingchallenges.substack.com/