The byte representation of record IDs is designed to give a meaningful row sort order in HBase and to be usable for scan operations. The encoding guarantees that when one ID in unencoded form is a prefix of another ID, it remains a prefix once encoded as bytes. This makes it possible to prefix-scan a range of records. (This only applies to user-specified IDs, not to UUIDs.)
The format for a master record ID is as follows:
{identifier byte}{basic byte representation}
Where the identifier byte is (byte)0 for a USER record ID and (byte)1 for a UUID record ID.
The {identifier byte} comes first because otherwise UUIDs and USER IDs would be intermingled, preventing meaningful scan operations on USER IDs.
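As an illustration of the layout above, here is a minimal Python sketch. It is not the actual implementation: the function names are hypothetical, and the text does not define the "basic byte representation", so the sketch assumes UTF-8 bytes for USER IDs and the 16 raw bytes for UUIDs. Only the identifier byte values (0 and 1) come from the description.

```python
import uuid

USER_ID_BYTE = b"\x00"  # identifier byte for USER record IDs (from the text)
UUID_ID_BYTE = b"\x01"  # identifier byte for UUID record IDs (from the text)


def encode_user_id(user_id: str) -> bytes:
    # Assumption: the basic byte representation of a USER ID is its UTF-8 encoding.
    return USER_ID_BYTE + user_id.encode("utf-8")


def encode_uuid_id(value: uuid.UUID) -> bytes:
    # Assumption: the basic byte representation of a UUID is its 16 raw bytes.
    return UUID_ID_BYTE + value.bytes


if __name__ == "__main__":
    short = encode_user_id("order-")
    long_ = encode_user_id("order-2024")
    # A string prefix stays a byte prefix, so a prefix scan on "order-" finds both.
    assert long_.startswith(short)
    # Every USER ID sorts before every UUID ID because 0x00 < 0x01.
    assert long_ < encode_uuid_id(uuid.uuid4())
```

Under these assumptions, a scan that starts at the encoded prefix and stops at the first key that no longer begins with it would cover exactly the USER IDs sharing that prefix.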
In case there are variant properties:
The variant properties themselves are written as:
({key string utf8 length}{key string in utf8}{value string utf8 length}{value string in utf8})*
No separator is needed between the key-value pairs, because every string is preceded by its length. The key-value pairs are always sorted by key.
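The variant property layout can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, the width of the length field is not stated in the text (a 4-byte big-endian unsigned integer is assumed here), and the sort is assumed to be Python's default string ordering.

```python
import struct
from typing import Dict


def encode_variant_properties(props: Dict[str, str]) -> bytes:
    out = bytearray()
    # Pairs are always written in key order (assumed: default string ordering).
    for key in sorted(props):
        for text in (key, props[key]):
            raw = text.encode("utf-8")
            # Assumption: the length is a 4-byte big-endian unsigned integer
            # counting UTF-8 bytes, not characters.
            out += struct.pack(">I", len(raw))
            out += raw
    return bytes(out)


def decode_variant_properties(data: bytes) -> Dict[str, str]:
    strings = []
    pos = 0
    # The length prefixes alone delimit the strings; no separator is read.
    while pos < len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        strings.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    # Strings alternate key, value, key, value, ...
    return dict(zip(strings[0::2], strings[1::2]))


if __name__ == "__main__":
    encoded = encode_variant_properties({"lang": "en", "branch": "dev"})
    assert decode_variant_properties(encoded) == {"branch": "dev", "lang": "en"}
```

Because the keys are sorted before writing, two property sets with the same content always yield the same bytes, regardless of the order in which the properties were supplied.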