Wednesday, December 10, 2008

Solved: weird problem with special characters in filenames

After moving my iTunes library from my Mac to a directory on an Ubuntu Linux server mounted via CIFS, iTunes could no longer find a good number of songs. All of the problematic songs had special characters such as accents in their filenames.

After trying a few different things over the course of several weeks, I finally noticed that the problematic names had the plain characters followed by so-called combining accents.

For example, instead of a simple Ñ they had an N followed by a combining tilde. CIFS clients, including the OS X Finder and smbclient, could not handle these decomposed versions, and the files were inaccessible.
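
The difference is easy to see in a few lines of Python. This is only an illustrative sketch using the standard unicodedata module, not something I ran while troubleshooting; the variable names are made up for the example.

```python
import unicodedata

# Precomposed form: one code point, U+00D1 (LATIN CAPITAL LETTER N WITH TILDE).
composed = "\u00d1"
# Decomposed form: a plain "N" followed by U+0303 (COMBINING TILDE).
decomposed = "N\u0303"

# Both render as the same glyph, but they are different code point
# sequences, so a plain string comparison treats them as different names.
print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 1 2

# NFC composes the pair into one code point; NFD splits it back apart.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```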

Fortunately, the convmv utility can fix this problem in bulk for a whole directory tree:

convmv -r -f utf8 -t utf8 --nfc .

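It appears that the key was to convert the filenames into the NFC Unicode normalization form (the --nfc argument), which is typically used on Linux. It does not seem to work with NFD, the form typically used on Mac OS. Note that convmv only prints what it would rename by default; add --notest to actually perform the renames.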
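
To get a feel for how many names are affected before renaming anything, something like the following Python sketch can list them. It is not part of convmv and I did not use it at the time; find_non_nfc is a hypothetical helper name, and the sketch assumes the filesystem returns names exactly as stored (as Linux filesystems generally do).

```python
import os
import unicodedata


def find_non_nfc(root):
    """Return (directory, name, nfc_name) for each entry whose name is not in NFC."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check directory names as well as file names, since either
        # can contain decomposed characters.
        for name in dirnames + filenames:
            nfc_name = unicodedata.normalize("NFC", name)
            if nfc_name != name:
                hits.append((dirpath, name, nfc_name))
    return hits
```

Calling find_non_nfc(".") from the top of the library returns the entries whose names would change under NFC normalization, without touching anything on disk.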

Update on Monday, August 17, 2009, 20:36

This problem just came back to haunt me when I tried to synchronize the recently modified iTunes library on OS X (filenames in NFD) with the copy on the Linux server (filenames in NFC) using the Unison file synchronizer over ssh (not over CIFS, for better reliability). It turns out that the released and beta versions of Unison do not yet solve this problem. After I found this discussion, I downloaded and built the most recent revision from the repository and am now running it with the unicode=true option, which makes it handle this situation correctly. The non-GUI version built out of the box on both OS X and Ubuntu with the most recent OCaml. That's one more loose end tied up...