Some thoughts on composability


Two days ago I released an article on the complexities of the mv command, which got some attention in the discussion on Hacker News.

The core insight was that mv is an example of packaging up a supposedly useful and basic behaviour which is actually complex enough that the current GNU implementation in some cases destructively affects data even in the absence of race conditions.[1]

Briefly, mv can be explained as doing a safe rename(2), and, if that fails because the move is across devices, falling back to a dangerous recursive copy followed by a dangerous recursive deletion.

But what is really missing is a trivial shell interface doing only the safe rename(2), failing if that fails.
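As a rough illustration of how little such an interface needs, here is a minimal sketch in Python. The names rename_only and main are made up for this example; os.rename is the standard library call which, on Unix, performs rename(2). Like rename(2) itself, it silently replaces an existing target file.

```python
import os
import sys


def rename_only(src, dst):
    """Call rename(2) via os.rename and do nothing else.

    Any failure, including errno.EXDEV for a cross-device move,
    propagates to the caller as OSError. There is no copy fallback.
    """
    os.rename(src, dst)


def main(argv):
    """Hypothetical command-line entry point: SOURCE TARGET, nothing more."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} SOURCE TARGET", file=sys.stderr)
        return 2
    try:
        rename_only(argv[1], argv[2])
    except OSError as err:
        # Report the failure and let the caller decide what to do next.
        print(f"{argv[0]}: {err}", file=sys.stderr)
        return 1
    return 0
```

A script would pass the result of main(sys.argv) to sys.exit. The point is that there are no switches and no fallback: the exit status says whether the rename happened.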

A few people in the discussion pointed out that mv is a much-needed command in practice, and that there would be no use for a command which refuses to move across devices.

I won't go deeply into the question of practicality. Just a note from my personal experience: I don't remember a single time where I really meant to use mv to move across devices. I do remember a few times where that happened by accident, though. The consequences were sometimes time lost waiting for completion, and feeling unsure about the filesystem state after I had decided to interrupt or after I/O errors such as a full filesystem had occurred. In practice, when I really need to move something across devices (which is seldom enough), I use cp, then do whatever it takes to ensure the copy was successful[2], then use rm to delete the source (Exercise: find out how that is related to the following topic).

Anyway, there was more interesting input from the discussion.

Composability vs composition

Quoting a discussion thread somewhat liberally:

> > mv was “a stupid system call interface to rename(2)”. However, some
> > users didn't like having to think about whether the move was
> > across devices, so eventually it was made to fall back to copying
> That's actually a good thing, because it makes scripts more composable.

Which leads me into a rant about composability.

I disagree with the implicit premise of the first, doubly-quoted statement. “Not having to think” is actually the benefit of a rename(2)-only interface, not of the more complex version. In the common case of a move which is not supposed to cross devices, a rename-only interface reliably reports the cross-device condition, taking that burden from the user.

I strongly disagree with the second, singly-quoted statement: The “composability” argument here expresses the philosophy that packaging up flowcharts in binaries or libraries, in the way mv does it, makes program development more tractable. What is meant for use in a larger number of settings is deemed more composable.

While this popular way of packaging can be useful and pragmatic, it's actually backwards from the idea of composability. Packaging fixed flowcharts inside binaries is a kind of composition, and it results in uncomposable artifacts. An artifact is composable if it is unassuming about the context in which it will be used.

Baked in compositional strategies

Let's restate this in a concrete way: We have looked at the mv idiom, which in pseudocode might be implemented as mv a b = try { rename a b } catch EXDEV { unsafe-cp a b ; unsafe-rm a }. This way of composing the idiom has a disadvantage: it bakes in a compositional strategy, namely “If the safe way fails, unconditionally take the dangerous route”. The result is a sealed building block. Its composability coefficient is 0 because it can't adapt to different compositional contexts at all. For example, what if we want to reuse this building block in a context where permission should be requested from the user before executing unsafe operations?
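One way to picture the unsealed alternative is a sketch in Python where the reaction to EXDEV is passed in by the caller instead of being hard-wired. All names here (move, on_cross_device, refuse, ask_first) are hypothetical and chosen for illustration; the rename parameter exists only so the behaviour can be exercised without a second device.

```python
import errno
import os


def move(src, dst, on_cross_device, rename=os.rename):
    """Try rename(2); on EXDEV, hand the decision to the caller.

    on_cross_device(src, dst) is a caller-supplied strategy: it may
    copy, ask the user, log, or raise. Nothing dangerous is baked in.
    """
    try:
        rename(src, dst)
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise  # unrelated failures are not ours to interpret
        return on_cross_device(src, dst)
    return None


def refuse(src, dst):
    """One possible strategy: never fall back, just report."""
    raise OSError(errno.EXDEV, "refusing cross-device move", src)


def ask_first(fallback, ask=input):
    """Wrap another strategy so the user must agree before it runs."""
    def strategy(src, dst):
        answer = ask(f"{src} -> {dst} crosses devices, copy? [y/N] ")
        if answer.strip().lower() != "y":
            return refuse(src, dst)
        return fallback(src, dst)
    return strategy
```

With this shape, the traditional mv behaviour would be move(a, b, copy_then_delete) for some copy_then_delete function of the caller's choosing (not shown here), and the permission-requesting variant is move(a, b, ask_first(copy_then_delete)). Neither variant requires changing move itself.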

Command-line switches hell

And that's why your average Unix command provides loads and loads of ad-hoc command-line switches. Many of these switches are meant to recreate the composability which was lost by compiling small functions into a fixed binary. Switches improve composability because passing them as command-line arguments influences how some building blocks inside the binary are composed. For example, mv has an -i switch to let it request the user's confirmation when an existing target would be overwritten.

Wherever switches appear, I get wary. One problem is that they are still very inflexible: The possible usage scenarios have to be more or less figured out in advance by the designer of the program. There is still no possibility to include a custom compositional strategy. For a concrete example, what if we want to augment the behaviour of mv -i by ringing a bell when the user's attention is needed? What if we don't want to consider replacing files of a given type?
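To make the two questions above concrete, here is a sketch in Python in which the confirmation step is an ordinary function handed to the move operation. Every name in it (move_confirming, with_bell, never_replace, ask_user) is invented for this example and is not part of mv or of any library.

```python
import os


def move_confirming(src, dst, confirm):
    """Rename src to dst, consulting confirm(dst) if dst already exists.

    Returns True if the rename happened. The exists-then-rename check is
    racy; this sketch is about where the decision lives, not about safety.
    """
    if os.path.lexists(dst) and not confirm(dst):
        return False
    os.rename(src, dst)
    return True


def with_bell(confirm, ring=lambda: print("\a", end="", flush=True)):
    """Ring the terminal bell, then defer to another policy."""
    def policy(dst):
        ring()
        return confirm(dst)
    return policy


def never_replace(suffix, otherwise):
    """Refuse to replace targets with the given suffix; else defer."""
    def policy(dst):
        if dst.endswith(suffix):
            return False
        return otherwise(dst)
    return policy


def ask_user(dst):
    """Interactive policy, roughly what mv -i does."""
    return input(f"overwrite {dst}? [y/N] ").strip().lower() == "y"


# Policies compose like ordinary functions:
policy = never_replace(".db", with_bell(ask_user))
```

Neither the bell nor the file-type rule had to be anticipated by the author of move_confirming. The caller assembles them, which is exactly what a fixed set of switches cannot offer.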

Also, switches don't enable the propagation of compositional strategy from the highest layers down to the individual building blocks. Environment variables are another approach, but it's usually unclear what building blocks respect them and whether they will be propagated (to other processes).

The other significant problem is complexity. I suspect that composability-recreating code is responsible for a major share of program bugs. Programs no longer have simple specifications that map classifications of environmental state to outcomes. There is now another dimension to handle, namely the acceptable sets of command-line switches. Of course most of these switches are implemented as special cases all over the code, which significantly increases complexity. (If no special casing were needed, there would also be less need to share code: functionality could be split into major modes or even separate binaries, and no switches would be required to begin with.)

Composability is possible

If you haven't already, look at Haskell. Really, I mean it. And by the way, this article is an introduction to why functors and monads are very useful.

The Unix way

Having worked out what a bad historical accident mv is, and having looked at the inflexibility of the concept of combining binaries, it could be concluded that the Unix model is too limited for many applications. In a way, it clearly is.

But if we take the idea of “small programs which are responsible for small and well-defined tasks” seriously, functional methodologies like monads are only its logical continuation. It's all about making things as small as you can. Avoid overspecification where possible and leave the rest to the users of your code to decide.

[1] That's not to blame the implementation. If anything, it's to blame the specification.
[2] True copies don't exist for various technical reasons. For example, you can't copy the identity of a file. What's a good enough copy differs from case to case.

Created: 2015-05-23
Last Updated: 2015-11-14