Version 1
---------

The following sections explain the internals of the metastore file (.metadata), version 1.

### Data types

    SIGNATURE = Magic signature, 10 bytes long = "MeTaSt00r3"
    VERSION   = Format version string, 8 bytes long. This version = "00000001"
    URLSTRING = URL-encoded string. Chars 0x00 to 0x20 (inclusive), 0x7F,
                and 0x25 (%) *must* be encoded. Terminated by any character
                that must be encoded but appears unencoded (e.g. "\t", "\n").
    INTSTRING = ASCII-encoded integer, in a pre-specified base from 2 (binary)
                to 16 (hexadecimal). May be preceded by "-" for negative values.

### File layout

    SIGNATURE VERSION "\n"
    n * ENTRY

Each ENTRY ends with its own terminating "\n", as listed in the ENTRY format.

### ENTRY format

    URLSTRING - Path (absolute or relative)
    "\t"
    URLSTRING - Owner (owner name, not uid)
    "\t"
    URLSTRING - Group (group name, not gid)
    "\t"
    INTSTRING - Mode (base 8, of struct stat.st_mode & 0177777,
                i.e. file type and mode, as per inode(7))
    "\t"
    URLSTRING - Mtime (including nanoseconds) in ISO-8601 format, UTC:
                "YYYY-mm-ddTHH:MM:SS.nnnnnnnnnZ"
    m * ("\t" URLSTRING "\t" URLSTRING) - xattr name/value pairs. `m` may be 0.
    "\n" - Entry-terminating newline.

### Discussion

This format is designed to work with version control systems, specifically `git(1)`. To fit in with `git` and its related tooling, this format is a line-based text file. Each record is a bunch of text fields separated by tabs, terminated by a newline.

This means records should be identifiable and somewhat understandable to readers, and should work with `diff(1)` and `patch(1)` (and their `git`ified descendants). Merge conflicts should produce files that are resolvable with any ordinary text editor. (Even `ed(1)`, if you insist!)

This format is generally slightly larger than Format 0, but shouldn't be significantly so for most use cases, and this is a reasonable trade-off for readability and diff/patch/merge-ability.
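As a rough illustration of the URLSTRING and ENTRY definitions above, here is a minimal Python sketch that encodes and parses a single entry. It is not the metastore implementation: the function names are made up for this example, the uppercase hex digits in `%XX` escapes and the unpadded octal mode are assumptions (the format description does not pin either down), and error handling is deliberately thin.

```python
# Bytes that must be percent-encoded in a URLSTRING: 0x00-0x20, 0x7F and "%".
MUST_ENCODE = frozenset(range(0x21)) | {0x7F, 0x25}


def url_encode(raw: bytes) -> bytes:
    """Percent-encode only the bytes the format requires (hypothetical helper)."""
    out = bytearray()
    for b in raw:
        if b in MUST_ENCODE:
            out += b"%%%02X" % b  # assumption: uppercase hex digits
        else:
            out.append(b)
    return bytes(out)


def url_decode(enc: bytes) -> bytes:
    """Reverse url_encode(); accepts upper- or lowercase hex digits."""
    out = bytearray()
    i = 0
    while i < len(enc):
        if enc[i] == 0x25:  # "%"
            hex_pair = enc[i + 1:i + 3]
            if len(hex_pair) != 2:
                raise ValueError("truncated %XX escape")
            out.append(int(hex_pair, 16))
            i += 3
        else:
            out.append(enc[i])
            i += 1
    return bytes(out)


def format_entry(path, owner, group, mode, mtime, xattrs=()):
    """Build one ENTRY line, including its terminating newline."""
    fields = [
        url_encode(path),
        url_encode(owner),
        url_encode(group),
        b"%o" % (mode & 0o177777),  # INTSTRING, base 8
        url_encode(mtime),
    ]
    for name, value in xattrs:
        fields += [url_encode(name), url_encode(value)]
    return b"\t".join(fields) + b"\n"


def parse_entry(line: bytes):
    """Split one ENTRY line back into its decoded fields."""
    fields = line.rstrip(b"\n").split(b"\t")
    path, owner, group = (url_decode(f) for f in fields[:3])
    mode = int(fields[3], 8)
    mtime = url_decode(fields[4])
    rest = [url_decode(f) for f in fields[5:]]
    xattrs = list(zip(rest[0::2], rest[1::2]))
    return path, owner, group, mode, mtime, xattrs


entry = format_entry(
    b"./my file", b"alice", b"users", 0o100644,
    b"2024-01-02T03:04:05.000000006Z",
    [(b"user.note", b"50% done\n")],
)
print(entry)
```

Because raw tabs and newlines can never appear inside an encoded field, splitting on `"\t"` after removing the trailing `"\n"` is enough to recover the fields in this sketch.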
The format could be significantly larger if files have large amounts of binary data in xattrs: 35 of the 256 possible byte values must be URL-encoded as a 3-byte sequence, so uniformly random binary data grows by about 27.3% (`(221 + 3 * 35) / 256 ≈ 1.273`). This clearly isn't ideal, but this author suspects that the proportion of files with large binary xattrs is fairly small, so this should not cause an issue in practice.

If a user does have large amounts of binary xattr data but can't handle the 27% size increase this format incurs, they can still use Format 0 to store it instead. If *you* have large amounts of binary xattr data that you have to store in git in a way that's diff/patch/merge-able - well, feel free to submit patches for Format 2 yourself ;-)

If you do update this format, remember to change the man page as well as this document! I've tried to keep the info in the man page as short as possible, and to only include what a user should need to work with the resulting files. Extended musings and notes for implementors go here (or in the `git commit` log :-)

### UTF-8 cleanliness

Note that because bytes >= 0x80 are not required to be URL-encoded, binary xattr data is very unlikely to be UTF-8 clean. If this is a problem for the editor you use to resolve conflicts... I dunno. Get a better editor maybe?

We could URL-encode all high bytes, but that would triple the size of half the bytes in binary data, and of all non-ASCII byte sequences in UTF-8 text.

I suppose it might be possible to URL-encode only those sequences of high bytes that are *not* UTF-8 clean (and that would be backwards-compatible with the existing format), but I don't want to add that much complexity at the moment. Also, it might not be "enough", as you'd probably want to encode non-printable Unicode control characters such as the LTR/RTL marks (U+200E/U+200F) to prevent the possibility of "Trojan Source" type attacks.
(See for more info on "Trojan Source")

### Sorting

To generate stable metadata files that do not depend on the order in which `readdir()` returns files (which would produce spurious diffs), sort entries by path ASCIIbetically, as with `strcmp(3)`. Metadata files are not required to be sorted when they are read, however. If two entries for the same path are read, the result is currently unspecified.

Users should probably avoid sorting Format 1 files with standard tools like `sort(1)`: the header line (SIGNATURE VERSION) must always come first, and URL-encoded characters will throw off the sort order. Also, tools like `sort` will typically sort according to the current locale, e.g. using `strcoll(3)` rather than `strcmp()`.
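The effect of URL-encoding on sort order can be seen in a small Python sketch. It assumes that "sort by path" means comparing the raw (decoded) path bytes, which is how `strcmp(3)` would see them before encoding; the helper name and sample paths are invented for the example. Python compares `bytes` objects lexicographically by unsigned byte value, which matches `strcmp()` for strings without embedded NUL bytes.

```python
# Bytes that must be percent-encoded in a URLSTRING: 0x00-0x20, 0x7F and "%".
MUST_ENCODE = frozenset(range(0x21)) | {0x7F, 0x25}


def url_encode(raw: bytes) -> bytes:
    """Percent-encode only the required bytes (hypothetical helper)."""
    return b"".join(
        b"%%%02X" % b if b in MUST_ENCODE else bytes([b]) for b in raw
    )


# Sample paths in arbitrary, readdir()-like order.
paths = [b"a!b", b"a b", b"\xc3\xa9", b"Z", b"a"]

# Order by raw path bytes, as strcmp() would.
by_raw = sorted(paths)

# Order by the encoded text, as a byte-wise line sort of the file would.
by_encoded = sorted(paths, key=url_encode)

print(by_raw)      # [b'Z', b'a', b'a b', b'a!b', b'\xc3\xa9']
print(by_encoded)  # [b'Z', b'a', b'a!b', b'a b', b'\xc3\xa9']
```

The two orders differ because a raw space (0x20) sorts before `!` (0x21), while its encoded form `%20` starts with `%` (0x25) and sorts after it.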