Simple is not easy.

SRP: The UNIX Way

Chaining together a set of UNIX’s small, simple command line utilities gives the operator the ability to solve complex problems in a concise way. Many of these tools are over 30 years old, and yet they continue to be used to solve modern problems.

Specifically, this week at work we were focusing on optimizing our Rails application’s database queries. We were looking for controller actions that made the same database query several times so that we could we eliminate unnecessary database I/O. The application log files contained all the data necessary to find these duplicated queries, we just needed a way to scrape the data and group it in a meaningful way.

We created the following set of commands:

sed '1,/CurrentItemsController#show/d' log/test.log | \
sed '/Completed/,$d' | sed 's/^.*SELECT/SELECT/' | \
sort | uniq -c | sort -r | head -5

A quick explanation of each command:

  1. Read the log file and skip all the lines before the controller action.

    sed '1,/CurrentItemsController#show/d' log/test.log
    
  2. Skip all the lines after the controller action.

    sed '/Completed/,$d' 
    
  3. Remove all the text on each line that precedes ‘SELECT’.

    sed 's/^.*SELECT/SELECT/'
    
  4. Sort the rows so duplicate queries will be on adjacent lines.

    sort
    
  5. Group the duplicate lines and count them.

    uniq -c
    
  6. Sort the queries by how many times they were duped.

    sort
    
  7. Return the top 5.

    head -5
    

And it generated this output:

7 SELECT COUNT(*) FROM "activity_attemptables" WHERE "activity_attemptables"."activity_id" = 1 AND "activity_attemptables"."completed" = 'f'
7 SELECT "activities".* FROM "activities" WHERE "activities"."lesson_id" = 1 AND "activities"."completed" = 'f' ORDER BY activities.position LIMIT 1
4 SELECT "lesson".* FROM "lesson" WHERE "lesson"."id" = 1 AND (lesson.passed IS NULL) ORDER BY lesson.position LIMIT 1
3 SELECT COUNT(*) FROM "activity_attemptables" WHERE "activity_attemptables"."activity_id" = 1 AND "activity_attemptables"."completed" = 'f'
3 SELECT "activities".* FROM "activities" WHERE "activities"."lesson_id" = 1 AND "activities"."completed" = 'f' ORDER BY activities.position LIMIT 1

Armed with the sorted list of the duplicated queries, we were able to find the sources of the duplication and optimize them.

While I hope you found the concrete example above to be useful, I used the example to make a larger point. Each of the utilities used above has a single responsibility. Because of this, stringing them together to work in ways the original authors never intended was relatively easy.

As we write software, one of the hardest things we have to do is build the right thing for today while making it easy to accommodate the needs of the future. In my experience, this kind of flexibility comes from decomposing objects and functions into small components that have a single responsibility. As requirements change, if a system is decomposed well, then only a few components will need to be added or changed to accommodate the change.

So, before adding new behavior to existing objects, consider this: Did the UNIX designers give grep the ability to count lines? No! They made wc. But, grep was modified to support different kinds of regular expressions.

The UNIX command line toolset is a great example of decomposition into SRP. I hope you will use it not only to solve problems directly, but also to inform your own object design so that your system will achieve some of the same flexibility as UNIX.