Note: I’ve discovered MathJax… as will be obvious, lol
In this post, we’ll take a look at the pipes and filters metaphor for describing data processing pipelines.
The “successive application of functions” view
If one thinks of a particular process in terms of a data transformation, say a function transforming some set of parameters into some type of output:
Then you could think of \(f\) in terms of a composition of its steps, something like:
Where the functions \(f_1, f_2, f_3\) are successively applied to the input, eventually yielding the expected result, \(y\). You can see that the data is successively transformed by each step until the desired result is achieved. I often try to think of a process in terms of a data transformation. For instance, a web-service login process could be imagined as a transformation from a tuple of credentials to a session token. Thinking in these terms helps one write code that avoids side effects and maintains a narrow focus. Additionally, if these steps are invertible, the code you write lends itself nicely to supported the revered undo operation :-)
But what does this have to do with pipes and filters?
The “data flow” view
The same process described above could also be imagined as a data flow, with the provided parameters flowing from one step in the process to the next.
In this model, the arrows connecting the steps are called pipes (as one would channel a fluid through) and the process steps (\(f_1 \cdots f_n\)) are called filters (as one might use to dephlogisticate a foul elastic fluid). Basically, we break the process down into discrete (ideally stateless) steps and get all Mario on this beast.
That, in a nutshell, is the concept being pipes and filters. It arises naturally from the data flow view of the world, and is useful in terms of messaging systems for a number of reasons:
- The “flow” is facilitated via messages over the message system
- The filters may be co-located, or physically separated
- Steps that may be performed in parallel can be(*)
- ‘Wire taps’ can be inserted in the pipeline to monitor progress
- The pipeline is inherently dynamic! Strategy Pattern anyone?
I’ve starred an item above, as I think it’s particularly useful. In order to make use
of several components, we’ll need to cheat and mention an as-yet described component,
splitter and the
aggregator. Let’s say we have a commerce site and the user has
a shopping wagon (why limit ourselves to a mere cart?) with one or more items in it.
Additionally, let’s say that the processing of an item is rather complex and requires a
number of steps. If we process these items in series, item 1, then item 2, then item 3,
etc, then processing an order will take linear time. We can do better by processing all
of the items in parallel! Then the request will take as long as the lengthiest part, plus
the overhead of forking and joining:
As shown above, all of the orders are sent through the identical order processing pipelines, but without waiting on each other.
Are there any other benefits?
Why of course there are! I’m so glad you asked!
Being a newly minted continuous delivery junkie, tests are an important concept to me. Test coverage must be kept high, and tests should meaningfully exercise components. When a process is modeled in terms of pipes and filters, each filter is a perfectly isolated black box, ready to be fully exercised in a unit test. Additionally, performance characteristics can be easily measured thanks to the same basic input-output structure of a filter.
As a quick aside, it’s work noting that the pipe and filter scheme can also be seen as a version of the previously buzz-wordy map-reduce. The filters map their inputs to their outputs, and the final filter could perform a reduction. Bam! Map reduce.
Is there a hands on part to this post?
Why yes there is!
In the git repo I’ve updated the sample code to include a contrived filter factory and a couple of filters. Try making your own filter or exploring the possible combinations.
Be sure to have RabbitMQ running locally - otherwise the poor pipelines will have no place to connect ;-(
Until next time.