Reuse and the wisdom of “grep” and “awk”

We all want our code to be reused, yet GitHub is full of code that is only used by one person. Some of it is mine. What makes reuse so difficult? Software goes unused for a number of reasons, but I’ve seen one particular problem often enough that I think it is worth some discussion. I call it the “Uber framework” fixation.

An Uber framework aspires to take care of everything for you. In particular, it does a lot of orchestration tasks. When it does what you need, it’s a beautiful thing. The trouble arises when your notion of “everything” differs a bit from that of the Uber framework’s creator. When that happens, the effort required to make even a small change can be disproportionately large.

For the purpose of illustration, let’s look at a simple example. Suppose I want to search all of the subdirectories of a folder on my hard drive for “.yaml” files that contain the word “password” and save the name of each file to “passwords.csv”. Each line of “passwords.csv” should have 2 comma-separated fields: the first is the name of the file and the second is the number of times that the word “password” appears in the file.

The Uber framework approach would be to make the ultimate file finder/text searcher/output formatter thing. It would take parameters about what to find and what to search for and perhaps feature a pattern language for formatting the output. As long as I stay within the original scope of the framework, life is good. However, once I want something not envisioned by the author, the picture becomes less rosy. Now I need to extend the framework.

Since the focus was on “taking care of everything”, the author may have spent all their effort on documenting the usage and relatively little on describing how the framework is structured. Doubtless the author would know exactly how to make the required change, but for me to make the change I need to “grok” the whole framework. Even if I do “grok” it, structural changes may be required, because my problem statement doesn’t match the author’s original vision. As the would-be user, I now have to make a disproportionate investment of mental energy to go from a 90% fit to a 100% fit.

Now let’s look at what I will call the “Unix” approach (so called because it predates Linux). The following command will do the trick:

find . -name "*.yaml" | xargs grep -c -H password 2>/dev/null | awk -F: '$2 > 0 { print $1 "," $2 }' > passwords.csv

This solution involves 4 different tools: “find”, “grep”, “awk” and the wonderful “xargs”. The tools are strung together with a small amount of orchestration code. Note that each tool is very focused on one broadly useful thing. Note also that the original authors knew nothing about my finding/searching/formatting project, but those tools were useful to me nonetheless. This “tools and orchestration” approach is under-appreciated.
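To make the division of labor concrete, here is a minimal sketch that runs the pipeline one stage at a time against a throwaway directory. The directory layout and file contents are made up for illustration. The “-H” flag (always print the file name) and the “$2 > 0” filter (drop files with no match) are assumptions of this sketch, and “-H” is a GNU/BSD grep option rather than a POSIX one.

```shell
#!/bin/sh
# Build a throwaway directory tree to search (all names here are invented).
demo=$(mktemp -d)
mkdir -p "$demo/app" "$demo/db"
printf 'user: admin\npassword: hunter2\n' > "$demo/app/config.yaml"
printf 'password: a\nreplica_password: b\n' > "$demo/db/settings.yaml"
printf 'name: demo\n' > "$demo/app/plain.yaml"
printf 'password: ignored\n' > "$demo/notes.txt"
cd "$demo" || exit 1

# Stage 1: find only knows how to list paths that match a name pattern.
find . -name "*.yaml" | sort

# Stage 2: xargs turns those paths into arguments for grep.
# grep -c prints "path:count", where count is the number of matching LINES.
find . -name "*.yaml" | xargs grep -c -H password | sort

# Stage 3: awk reshapes "path:count" into "path,count" and skips zero counts.
find . -name "*.yaml" | xargs grep -c -H password |
  awk -F: '$2 > 0 { print $1 "," $2 }' | sort > passwords.csv

cat passwords.csv
```

With this sample data, “passwords.csv” ends up with one line for “./app/config.yaml” (count 1) and one for “./db/settings.yaml” (count 2). Because “grep -c” counts matching lines, a line that mentions “password” twice still counts once.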

One possible objection is that I had to learn about 4 tools instead of 1. That sounds like a bad deal. But is it really? I’m not optimizing over this one project. I’ll have a new project next week or next month. When I learn something new, I’m always cognizant of whether it is likely to be useful down the road. I’m optimizing for the long haul.

Tools that are smaller often have broader applicability. When I know I’ll be able to reuse something in many different situations, I’m more willing to invest my mental energy into learning it. As good as it may be, the Uber framework is only useful in one context. On the other hand, something like “grep” applies to many different problems. I get a much better return on my investment. Giving developers a good return on their investment of time and energy is one of the keys to writing a component that will be re-used.

Also, the Unix approach is less brittle. Imagine that, in the example above, I want to find only files that have been modified in the last 3 days. The “find” command actually does this, but let’s pretend for a moment it doesn’t. I need to extend the solution. Rather than being offered an “all or nothing” proposition, I have incremental options. I don’t have to crack open “awk” or “grep”. I don’t even have to modify “find” if I don’t want to. I can just add functionality to check the modification date and use orchestration to weave it in. Even if I end up replacing “find” entirely, I still get to use “grep” and “awk”.
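As a sketch of what “weaving it in” might look like, the hypothetical “recent_only” filter below reads paths on standard input and passes along only those modified within the last N days. The function name is invented for this example, and it assumes GNU “stat” and GNU “touch” (the “-c %Y” and “-d” options), so it would need adjusting on BSD or macOS. The “-H” flag and the “$2 > 0” filter are also assumptions of the sketch.

```shell
#!/bin/sh
# recent_only (hypothetical): keep only paths modified in the last N days.
# Assumes GNU stat, where "stat -c %Y" prints the mtime in epoch seconds.
recent_only() {
  days=$1
  now=$(date +%s)
  cutoff=$((now - days * 86400))
  while IFS= read -r path; do
    mtime=$(stat -c %Y "$path") || continue
    if [ "$mtime" -ge "$cutoff" ]; then
      printf '%s\n' "$path"
    fi
  done
}

# Sample data: one fresh file and one back-dated file (names are made up).
demo=$(mktemp -d)
printf 'password: new\n' > "$demo/fresh.yaml"
printf 'password: old\n' > "$demo/stale.yaml"
touch -d '10 days ago' "$demo/stale.yaml"   # GNU touch syntax
cd "$demo" || exit 1

# The new step slots in between find and xargs; nothing else changes.
find . -name "*.yaml" | recent_only 3 | xargs grep -c -H password |
  awk -F: '$2 > 0 { print $1 "," $2 }' > passwords.csv

cat passwords.csv
```

Only “./fresh.yaml” survives the filter here. In practice “find . -name "*.yaml" -mtime -3” does the same job; the point is that the extension is one extra stage rather than a rewrite.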

Interestingly, a successful Uber framework does sometimes evolve from smaller, more focused components. One example is “git”, which started off as a collection of small commands that had to be orchestrated. Those commands still exist and have come to be called the “plumbing” layer. An orchestration layer now sits on top of the plumbing and that “porcelain” layer is what we all now use. It’s hard to imagine a better thing for a developer to invest in than learning “git”. Perhaps “return on investment” and “broad applicability” are more fundamental principles.

Also, at an earlier stage in git’s evolution, everyone used the plumbing commands. After those lower-level commands had proven their usefulness, certain patterns crystallized and were codified into the porcelain layer. The plumbing came first!
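For readers who have never touched the plumbing, here is a rough sketch that makes one commit with plumbing commands and a second with the porcelain. It assumes git is installed. The repository location, file name, and identity are placeholders, and this is only one of several ways to drive the plumbing.

```shell
#!/bin/sh
# Throwaway repository; the path, file name and identity below are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# commit-tree needs an identity; set it via the environment for this demo.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "hello" > file.txt

# Plumbing: each command does one small job and hands back an object id.
blob=$(git hash-object -w file.txt)                         # store contents as a blob
git update-index --add --cacheinfo 100644 "$blob" file.txt  # record it in the index
tree=$(git write-tree)                                      # snapshot the index as a tree
commit=$(echo "first commit" | git commit-tree "$tree")     # wrap the tree in a commit
git update-ref HEAD "$commit"                               # point the current branch at it

# Porcelain: the same kind of steps, orchestrated for you.
echo "world" >> file.txt
git add file.txt
git commit -q -m "second commit"

# Both commits end up in the same history.
git log --format=%s
```

The five plumbing lines and the two porcelain lines do essentially the same work. The porcelain is the orchestration that became common enough to be worth codifying.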

I do want to point out that this doesn’t apply to large, professionally produced frameworks. With a sufficiently large development team you often can make a framework that “takes care of everything”. This article is about an obstacle to reuse at a smaller scale.

So, if you want your code to be re-used, don’t try to make an Uber framework. Instead, follow the Unix example and write small, focused components like “grep” and “awk”. The key is to be cognizant of the user’s return on investment of time and mental energy.

  • Make small, focused components that do one generally useful job. Smaller components can have broader applicability.
  • Leave the orchestration to the end user.

The intent of this article is to help us all make tools and components that our colleagues will want to use. I hope it generates some interesting dialog. Please let me know your thoughts.


