Unix filesystems: How mv can be dangerous

Today I played a little with btrfs subvolumes and learned about another rusty corner of Unix: how the mv command can be dangerous, and some of the complexities of the POSIX file system API.

(This article applies to Debian Wheezy. It might not apply to other systems, particularly non-GNU ones.)

Try this little test script:

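#!/bin/sh -x
# Run as root on a scratch machine: this mounts a filesystem over /mnt.
# Create a sparse 1 GiB image file, put a filesystem on it and mount it.
dd of=/tmp/testfs bs=1G seek=1 count=0
mkfs.btrfs /tmp/testfs
mount /tmp/testfs /mnt
touch /mnt/bar
# Try to move the mount point into itself, then inspect the result.
mv /mnt /mnt/foo
find /mnt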

(This is not specific to btrfs; mkfs.ext4, for example, would do as well.)

Output ends with:

+ mv /mnt /mnt/foo
mv: cannot copy a directory, ‘/mnt’, into itself, ‘/mnt/foo’
+ find /mnt

mv created an additional directory between the mount point and its former children! This was unexpected. And although mv failed overall, the operation was destructive to the source directory hierarchy. Too bad.

Putting strace in front of the mv /mnt /mnt/foo line and running the script again, we find the following:

stat("/mnt/foo", 0x7fff8f50b780)        = -1 ENOENT (No such file or directory)
lstat("/mnt", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/mnt/foo", 0x7fff8f50b450)       = -1 ENOENT (No such file or directory)
rename("/mnt", "/mnt/foo")              = -1 EXDEV (Invalid cross-device link)
rmdir("/mnt/foo")                       = -1 ENOENT (No such file or directory)
mkdir("/mnt/foo", 0700)                 = 0
lstat("/mnt/foo", {st_mode=S_IFDIR|0700, st_size=0, ...}) = 0
getdents(3, /* 3 entries */, 32768)     = 72
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
lstat("/mnt/foo", {st_mode=S_IFDIR|0700, st_size=0, ...}) = 0
stat("/mnt", {st_mode=S_IFDIR|0755, st_size=6, ...}) = 0
stat("/mnt", {st_mode=S_IFDIR|0755, st_size=6, ...}) = 0

As can be seen, mv first tries rename("/mnt", "/mnt/foo"), and only after that fails with EXDEV does it decide to mkdir("/mnt/foo"). The intent is to recursively copy the directory structure below the source /mnt to the target /mnt/foo; mv does not notice in time that the (not yet existing) target will be a subdirectory of the source.
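
The kind of check that would have to happen before the mkdir can be sketched in shell. This is only an illustration, not what GNU mv does internally: the function name dst_inside_src is made up for this example, and it assumes that the source and the parent directory of the target both exist.

```shell
#!/bin/sh
# Hypothetical helper (not part of any standard tool):
# returns 0 if DST would end up inside directory SRC, 1 if not,
# and 2 if SRC or the parent directory of DST cannot be entered.
dst_inside_src() {
    # Resolve both paths physically, so symlinks and ".." cannot fool us.
    _src=$(cd -P -- "$1" && pwd -P) || return 2
    _dst_parent=$(cd -P -- "$(dirname -- "$2")" && pwd -P) || return 2
    case "$_dst_parent/" in
        "${_src%/}"/*) return 0 ;;  # DST's parent is SRC itself or below it
        *)             return 1 ;;
    esac
}
```

With this, dst_inside_src /mnt /mnt/foo succeeds regardless of whether /mnt is a mount point, because the comparison looks only at path names and never at devices. It is still racy: the hierarchy can change between the check and the move.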

Once we factor in all possible mount setups and error conditions, this algorithm can quickly become very complex, especially for something we explain as "rename or move a to b".

Our simple test script illustrates how easily the algorithm can fail in an ungraceful, destructive way. Maybe it is even possible to come up with setups where mv never halts, forever creating new subdirectories and sub-subdirectories?

For completeness, let's now watch what happens when /mnt is not a mountpoint:

$ umount /mnt
$ touch /mnt/bar
$ mv /mnt /mnt/foo
mv: cannot move ‘/mnt’ to a subdirectory of itself, ‘/mnt/foo’
$ find /mnt
$ strace mv /mnt /mnt/foo
stat("/mnt/foo", 0x7fff209f3240)        = -1 ENOENT (No such file or directory)
lstat("/mnt", {st_mode=S_IFDIR|0755, st_size=6, ...}) = 0
lstat("/mnt/foo", 0x7fff209f2f10)       = -1 ENOENT (No such file or directory)
rename("/mnt", "/mnt/foo")              = -1 EINVAL (Invalid argument)

This time the rename(2) call fails with EINVAL instead of EXDEV, and mv gives up instantly. From a look at the rename(2) manpage we find that EINVAL is meant for exactly this move-to-subdirectory condition.

Sadly, it seems we lack a simple rename/move shell command that guarantees it will not go to great lengths with copy operations, inviting all sorts of race conditions, identity problems and unpredictable running time.

(Another thing that bugs me, and this by the way applies to btrfs subvolumes as well, is how the semantics of an mv invocation depend on whether the target exists as a directory. For GNU mv, there is a cure: use -t/-T.)
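
To make that dependence concrete, here is a small demonstration that runs in a throwaway directory and needs no root. It assumes GNU mv; the directory names a to f are arbitrary.

```shell
#!/bin/sh
# Throwaway working directory; the names a..f carry no meaning.
work=$(mktemp -d)

mkdir "$work/a" "$work/b" "$work/c"

mv "$work/a" "$work/b"      # b exists as a directory: a becomes b/a
mv "$work/c" "$work/d"      # d does not exist:        c becomes d

mkdir "$work/e" "$work/f"
touch "$work/f/keep"
# With -T the target is never treated as a directory to move into, so this
# tries to rename e onto the non-empty f and fails, leaving both in place.
mv -T "$work/e" "$work/f" 2>/dev/null || echo "mv -T refused"

# With -t the target directory is named explicitly and must already exist.
mv -t "$work/f" "$work/e"   # e becomes f/e
```

The first two mv lines are the same command with very different results. With -T or -t the outcome no longer depends on what happens to exist at the target.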

For most cases, all we need is a stupid command-line interface to the rename(2) system call, to rename a large directory structure atomically. Only when that fails might one consider a costly and error-prone copy-then-remove-old, and that would be the job of cp and rm.
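
Short of such a command, a rough approximation can be built from GNU tools. The sketch below is a heuristic under stated assumptions, not a guarantee: rename_only is a made-up name, it relies on GNU stat (-c %d) and GNU mv (-T), and it only predicts EXDEV by comparing the device numbers of the two parent directories. Setups such as bind mounts can defeat that prediction, and there is a race between the check and the mv.

```shell
#!/bin/sh
# Hypothetical wrapper: rename SRC to DST only if both parent directories
# report the same device number, so that mv should have no reason to fall
# back to copying. Returns 1 on a device mismatch, 2 if stat fails.
rename_only() {
    _src_dev=$(stat -L -c %d -- "$(dirname -- "$1")") || return 2
    _dst_dev=$(stat -L -c %d -- "$(dirname -- "$2")") || return 2
    if [ "$_src_dev" != "$_dst_dev" ]; then
        echo "rename_only: '$1' and '$2' appear to be on different devices" >&2
        return 1
    fi
    # -T: treat DST as the new name itself, never as a directory to move into.
    mv -T -- "$1" "$2"
}
```

For the test script above, the parent of /mnt is / and the parent of /mnt/foo is the mounted filesystem, so the device numbers differ and rename_only /mnt /mnt/foo refuses before anything is created. When /mnt is a plain directory, the devices match, mv calls rename(2), gets EINVAL and gives up as shown earlier.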

Created: 2015-01-21
Last Update: 2015-05-22