Diffstat (limited to 'FILEFORMAT_1')
-rw-r--r-- FILEFORMAT_1 88
1 files changed, 88 insertions, 0 deletions
diff --git a/FILEFORMAT_1 b/FILEFORMAT_1
new file mode 100644
index 0000000..b16d85a
--- /dev/null
+++ b/FILEFORMAT_1
@@ -0,0 +1,88 @@
+Version 1
+---------
+
+The following sections describe the internals of the metastore file
+(.metadata) format, version 1.
+
+
+### Data types
+
+ SIGNATURE = Magic signature, 10 bytes long = "MeTaSt00r3"
+ VERSION = Format version string, 8 bytes long. This version = "00000001"
+ URLSTRING = URL-encoded string. Chars 0x00 to 0x20 (inclusive), 0x7F, and
+ 0x25 (%) *must* be encoded. Terminated by the first unencoded
+ character from that set (e.g. a literal "\t" or "\n").
+ INTSTRING = ASCII-encoded integer, in a pre-specified base from 2 (binary)
+ to 16 (hexadecimal). May be preceded by "-" for negative values
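
Reviewer's sketch (not part of the patch): the URLSTRING rule above could be implemented roughly as follows. The function names and the choice of uppercase hex digits are assumptions made for illustration; the spec text only says which bytes must be encoded.

```python
# Bytes that must be percent-encoded in a URLSTRING, per the spec text:
# 0x00-0x20 inclusive, 0x7F, and 0x25 ("%").
MUST_ENCODE = frozenset(range(0x00, 0x21)) | {0x7F, 0x25}
HEX_DIGITS = b"0123456789abcdefABCDEF"


def urlstring_encode(raw: bytes) -> bytes:
    """Percent-encode only the bytes the format requires (hypothetical helper)."""
    out = bytearray()
    for b in raw:
        if b in MUST_ENCODE:
            # Uppercase hex is an assumption; the spec does not state a case.
            out += b"%%%02X" % b
        else:
            out.append(b)
    return bytes(out)


def urlstring_decode(enc: bytes) -> bytes:
    """Reverse urlstring_encode, rejecting malformed or unencoded input."""
    out = bytearray()
    i = 0
    while i < len(enc):
        b = enc[i]
        if b == 0x25:
            pair = enc[i + 1:i + 3]
            if len(pair) != 2 or any(c not in HEX_DIGITS for c in pair):
                raise ValueError("bad escape at offset %d" % i)
            out.append(int(pair, 16))
            i += 3
        elif b in MUST_ENCODE:
            # A raw tab or newline here would really be a field terminator.
            raise ValueError("unencoded byte 0x%02X at offset %d" % (b, i))
        else:
            out.append(b)
            i += 1
    return bytes(out)
```

Bytes at or above 0x80 pass through untouched, which matches the trade-off discussed in the UTF-8 section further down.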
+
+### File layout
+
+ SIGNATURE VERSION "\n"
+ n * (ENTRY "\n")
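
Reviewer's sketch (not part of the patch): a reader could validate the first line like this. It assumes SIGNATURE and VERSION are adjacent with no separator, since the layout line shows none; `read_version` is a hypothetical name.

```python
SIGNATURE = b"MeTaSt00r3"   # 10 bytes, from the Data types table
VERSION_LEN = 8             # VERSION is an 8-byte string


def read_version(first_line: bytes) -> bytes:
    """Return the VERSION field of a header line, or raise ValueError."""
    if not first_line.startswith(SIGNATURE):
        raise ValueError("missing metastore signature")
    rest = first_line[len(SIGNATURE):]
    # Assumption: VERSION follows the signature directly, then one "\n".
    if len(rest) != VERSION_LEN + 1 or not rest.endswith(b"\n"):
        raise ValueError("malformed header line")
    return rest[:VERSION_LEN]
```

A version 1 reader would then compare the result against `b"00000001"` before parsing any entries.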
+
+
+### ENTRY format
+
+ URLSTRING - Path (absolute or relative)
+ "\t" URLSTRING - Owner (owner name, not uid)
+ "\t" URLSTRING - Group (group name, not gid)
+ "\t" INTSTRING - Mode (base 8) of struct stat.st_mode & 0177777,
+ i.e. file type and mode, as per inode(7)
+ "\t" URLSTRING - Mtime (including nanoseconds) in ISO-8601 format, UTC.
+ "YYYY-mm-ddTHH:MM:SS.nnnnnnnnnZ"
+
+ m * ("\t" URLSTRING "\t" URLSTRING)
+ - xattr name/value pairs. `m` may be 0.
+
+ "\n" - Entry-terminating newline.
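
Reviewer's sketch (not part of the patch): one way to split an entry line into its fields and to produce the Mtime string. `Entry`, `parse_entry` and `format_mtime` are hypothetical names, and decoding leans on the standard library's `urllib.parse.unquote_to_bytes`, which is more lenient about malformed escapes than a strict reader might want.

```python
import time
from typing import NamedTuple
from urllib.parse import unquote_to_bytes


class Entry(NamedTuple):
    path: bytes
    owner: bytes
    group: bytes
    mode: int      # st_mode & 0o177777
    mtime: str     # "YYYY-mm-ddTHH:MM:SS.nnnnnnnnnZ"
    xattrs: list   # list of (name, value) byte pairs


def parse_entry(line: bytes) -> Entry:
    """Parse one tab-separated entry line; trailing newlines are ignored."""
    fields = line.rstrip(b"\n").split(b"\t")
    # Five fixed fields, then zero or more name/value pairs.
    if len(fields) < 5 or (len(fields) - 5) % 2 != 0:
        raise ValueError("malformed entry: %d fields" % len(fields))
    path, owner, group = (unquote_to_bytes(f) for f in fields[:3])
    mode = int(fields[3], 8)  # Mode is an INTSTRING in base 8
    mtime = unquote_to_bytes(fields[4]).decode("ascii")
    rest = [unquote_to_bytes(f) for f in fields[5:]]
    xattrs = list(zip(rest[0::2], rest[1::2]))
    return Entry(path, owner, group, mode, mtime, xattrs)


def format_mtime(sec: int, nsec: int) -> str:
    """Format seconds and nanoseconds since the epoch as the Mtime field."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(sec))
    return "%s.%09dZ" % (stamp, nsec)
```

Because raw tabs and newlines can never appear inside a URLSTRING, splitting on `"\t"` before decoding is safe.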
+
+
+### Discussion
+
+This format is designed to work with version control systems, specifically
+`git(1)`.
+
+To fit in with `git` and its related tooling, this format is a line-based text
+file. Each record is a bunch of text fields separated by tabs, terminated by a
+newline. This means records should be identifiable and somewhat understandable
+to readers, and should work with `diff(1)` and `patch(1)` (and their `git`ified
+descendants). Merge conflicts should produce files that are resolvable with any
+ordinary text editor. (Even `ed(1)`, if you insist!)
+
+This format is generally slightly larger than Format 0, but shouldn't be
+significantly so for most use cases, and this is a reasonable trade-off for
+readability and diff/patch/merge-ability.
+
+The format could be significantly larger if files have large amounts of binary
+data in xattrs, as 35 out of the 256 possible bytes require URL-encoding as a
+3-byte sequence, giving a 27.3% increase (by my calculations). This clearly
+isn't ideal, but this author suspects that the proportion of files with large
+binary xattrs is fairly small, and this should not cause an issue in practice.
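
Reviewer's sketch (not part of the patch): the 27.3% figure can be reproduced with a few lines of arithmetic, assuming every byte value is equally likely in the binary data.

```python
# Bytes that must be encoded: 0x00-0x20 inclusive, 0x7F, and 0x25.
must_encode = set(range(0x00, 0x21)) | {0x7F, 0x25}

# Each encoded byte becomes a 3-byte "%XX" sequence; all others stay 1 byte.
encoded_size = sum(3 if b in must_encode else 1 for b in range(256))

# Relative growth over 256 raw bytes, one of each value.
overhead = encoded_size / 256 - 1

print(len(must_encode), encoded_size, "%.1f%%" % (overhead * 100))
```

That is 35 encoded bytes, 326 output bytes for 256 input bytes, and an increase of about 27.3%.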
+
+If a user does have large amounts of binary xattr data but can't handle the 27%
+size increase this format incurs, they can still use Format 0 to store it
+instead. If *you* have large amounts of binary xattr data that you have to store
+in git in a way that's diff/patch/merge-able - well, feel free to submit patches
+for Format 2 yourself ;-)
+
+If you do update this format, remember to change the man page as well as this
+document! I've tried to keep the info in the man page as short as possible, and
+to only include what a user should need to work with the resulting files.
+Extended musings and notes for implementors go here (or in the `git commit`
+log :-)
+
+
+### UTF-8 cleanliness
+
+Note that because bytes >= 0x80 are not required to be URL-encoded, binary xattr
+data is very unlikely to be UTF-8 clean. If this is a problem for the editor
+you use to resolve conflicts... I dunno. Get a better editor maybe? We could
+URL-encode all high bytes, but that would triple the size of half the bytes in
+binary data, and of all non-ASCII byte sequences in UTF-8 text. I suppose it
+might be possible to URL-encode all sequences of high bytes that are *not* UTF-8
+clean (and that would be backwards-compatible with the existing format) but I
+don't want to add that much complexity at the moment. Also, it might not be
+"enough" as you'd probably want to encode non-printable Unicode control
+characters such as the LTR/RTL marks (U+200E/U+200F) to prevent the possibility
+of "Trojan Source" type attacks.
+
+(See <https://lwn.net/Articles/874951/> for more info on "Trojan Source")