Layout of a Fiji table.
Fiji uses the term layout to describe the structure of a table.
To avoid confusion with Avro schemas or XML schemas, Fiji does not use the term schema.
FijiTableLayout wraps a layout descriptor represented as a
com.moz.fiji.schema.avro.TableLayoutDesc Avro record.
FijiTableLayout provides strict validation and accessors to navigate through the layout.
FijiTableLayouts can be created via one of two methods: from a concrete layout with
#newLayout(TableLayoutDesc), or as a layout update from a preexisting
FijiTableLayout, with
#createUpdatedLayout(TableLayoutDesc,FijiTableLayout).
For the format requirements of layout descriptors for these methods, see the
"Layout descriptors" section below.
Overall structure
At the top level, a table contains:
- the table name and description;
- how row keys are encoded;
- the table locality groups.
Each locality group has:
- a primary name, unique within the table, a description and some name aliases;
- whether the data is to be stored in memory or on disk;
- data retention lifetime;
- maximum number of versions to keep;
- type of compression;
- column families stored in this locality group.
Each column family has:
- a primary name, globally unique within the table,
a description and some name aliases;
- for map-type families, the Avro schema of the cell values;
- for group-type families, the collection of columns in the group.
Each column in a group-type family has:
- a primary name, unique within the family, a description and some name aliases;
- an Avro schema.
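The hierarchy above can be pictured as a small set of nested types. The following is only a
minimal sketch: the class and field names are hypothetical and are not the Fiji Avro records
(TableLayoutDesc and friends), and aliases and descriptions are omitted for brevity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified model of the layout hierarchy (not the Fiji Avro records). */
final class LayoutModelSketch {
    /** A column of a group-type family: a name plus the Avro schema of its cells. */
    static final class Column {
        final String name;
        final String avroSchema;
        Column(String name, String avroSchema) {
            this.name = name;
            this.avroSchema = avroSchema;
        }
    }

    /** A family carries a value schema (map-type) or a list of columns (group-type). */
    static final class Family {
        final String name;
        final String mapValueSchema;  // non-null for map-type families only
        final List<Column> columns;   // used by group-type families only
        Family(String name, String mapValueSchema, List<Column> columns) {
            this.name = name;
            this.mapValueSchema = mapValueSchema;
            this.columns = columns;
        }
        boolean isMapType() {
            return mapValueSchema != null;
        }
    }

    /** Storage settings shared by every family placed in the group. */
    static final class LocalityGroup {
        final String name;
        final boolean inMemory;
        final int ttlSeconds;
        final int maxVersions;
        final String compression;
        final List<Family> families;
        LocalityGroup(String name, boolean inMemory, int ttlSeconds, int maxVersions,
                      String compression, List<Family> families) {
            this.name = name;
            this.inMemory = inMemory;
            this.ttlSeconds = ttlSeconds;
            this.maxVersions = maxVersions;
            this.compression = compression;
            this.families = families;
        }
    }

    static final class Table {
        final String name;
        final String rowKeyEncoding;
        final List<LocalityGroup> localityGroups;
        Table(String name, String rowKeyEncoding, List<LocalityGroup> localityGroups) {
            this.name = name;
            this.rowKeyEncoding = rowKeyEncoding;
            this.localityGroups = localityGroups;
        }
    }

    /** Builds a table with one locality group holding a group-type and a map-type family. */
    static Table example() {
        Family info = new Family("info", null, Arrays.asList(
            new Column("name", "\"string\""),
            new Column("address", "\"string\"")));
        Family clicks = new Family("clicks", "\"long\"", Collections.<Column>emptyList());
        LocalityGroup group = new LocalityGroup(
            "default", false, Integer.MAX_VALUE, 1, "NONE", Arrays.asList(info, clicks));
        return new Table("users", "HASH_PREFIX", Collections.singletonList(group));
    }
}
```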
Layout descriptors
Layout descriptors are represented using
com.moz.fiji.schema.avro.TableLayoutDesc Avro records.
Layout descriptors come in two flavors: concrete layouts and layout updates.
Concrete layout descriptors
A concrete layout descriptor is an absolute, standalone description of a table layout, which
does not reference or build upon any previous version of the table layout. Column IDs have
been assigned to all locality groups, families and columns.
Names of tables, locality groups, families and column qualifiers must be valid identifiers.
Name validation occurs in
com.moz.fiji.schema.util.FijiNameValidator.
Validation rules
- Table names, locality group names, family names, and column names in a group-type family
must be valid identifiers (no punctuation or symbols).
Note: map-type family qualifiers are free-form, but never appear in a table layout.
- Locality group names and aliases must be unique within the table.
- Family names and aliases must be unique within the table.
- Group-type family qualifiers must be unique within the family.
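As an illustration of the rules above, the sketch below checks identifiers and uniqueness.
It is not the code of FijiNameValidator: the class name is hypothetical, and the identifier
pattern (letters, digits and underscores, not starting with a digit) is an assumption; the
authoritative rules are the ones implemented by FijiNameValidator.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical checks mirroring the validation rules; not Fiji's validator. */
final class LayoutNameChecks {
    // Assumed identifier pattern; see FijiNameValidator for the real rules.
    private static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    /** Returns true if the name is non-null and matches the assumed identifier pattern. */
    static boolean isValidIdentifier(String name) {
        return name != null && IDENTIFIER.matcher(name).matches();
    }

    /** Returns the first name or alias that appears twice, or null if all are unique. */
    static String firstDuplicate(List<String> namesAndAliases) {
        Set<String> seen = new HashSet<>();
        for (String name : namesAndAliases) {
            if (!seen.add(name)) {
                return name;
            }
        }
        return null;
    }
}
```

To check uniqueness within a table, a caller would pass the primary names and aliases of all
locality groups (or of all families) as one list.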
Layout update descriptors
A table layout update descriptor builds on a reference table layout, and describes the layout
modifications to apply to the reference layout.
The reference table layout is specified by writing the ID of the reference layout
(TableLayoutDesc#layout_id) into the TableLayoutDesc#reference_layout field.
This mechanism prevents race conditions when updating the layout of a table.
The first layout of a newly created table has no reference layout.
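One way to see why naming the reference layout prevents races is to treat the update as a
compare-and-set on the current layout ID. The sketch below is an assumption about the general
technique (optimistic concurrency), not a description of how Fiji stores or checks layouts;
LayoutStoreSketch and its methods are hypothetical.

```java
import java.util.Objects;

/** Hypothetical in-memory holder of the current layout ID. */
final class LayoutStoreSketch {
    private String currentLayoutId;  // null until the first layout is installed

    /**
     * Installs newLayoutId only if the update was built against the layout that is still
     * current. A null referenceLayoutId stands for "no reference layout" (first layout).
     */
    synchronized boolean tryUpdate(String referenceLayoutId, String newLayoutId) {
        if (!Objects.equals(currentLayoutId, referenceLayoutId)) {
            return false;  // another update won; the caller must rebuild its update
        }
        currentLayoutId = newLayoutId;
        return true;
    }

    synchronized String currentLayoutId() {
        return currentLayoutId;
    }
}
```

With this scheme, two updates prepared against the same reference layout cannot both succeed:
the second one is rejected because its reference is no longer the current layout.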
During a layout update, the user may delete or declare new locality groups, families and/or
columns, or modify existing entities, by specifying the new layout. Update validation rules
are enforced to ensure compatibility (see Validation rules for updates below).
Entities may also be renamed, as long as uniqueness requirements are met.
Primary name updates must be explicitly annotated by setting the renamedFrom field of
the entity being renamed.
The name of a table cannot be changed.
For example, suppose the reference layout contained one family Info, containing a column Name,
and the user wishes to add a new Address column to the Info family.
To perform this update, the user would create a layout update by starting with the existing
layout, setting the reference_layout field to the layout_id of the current layout, and adding
a new ColumnDesc record describing the Address column to the columns field of the FamilyDesc
for the Info family.
The result of applying a layout update on top of a concrete reference layout is a new
concrete layout.
Validation rules for updates
Updates are subject to the same restrictions as concrete layout descriptors.
In addition:
- The type of a family (map-type or group-type) cannot be changed.
- A family cannot be moved into a different locality group.
- The encoding of Fiji cells (hash, UID, final) cannot be modified.
- The schema of a Fiji cell can only be changed to a schema that is compatible with
all the former schemas of the column. Schema compatibility requires that the new schema
allows decoding all former schemas associated with the column or the map-type family.
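The first two rules can be sketched as a comparison between a family in the reference layout
and the same family in the update. The types below are hypothetical and only illustrate the
checks; they are not Fiji's validator, and the cell encoding and schema compatibility rules
are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checker for two of the update rules; not Fiji's validator. */
final class UpdateRuleSketch {
    /** The attributes of a family that an update is not allowed to change. */
    static final class FamilyInfo {
        final String name;
        final boolean mapType;
        final String localityGroup;

        FamilyInfo(String name, boolean mapType, String localityGroup) {
            this.name = name;
            this.mapType = mapType;
            this.localityGroup = localityGroup;
        }
    }

    /** Returns the violated rules; an empty list means the update passes these checks. */
    static List<String> violations(FamilyInfo reference, FamilyInfo updated) {
        List<String> errors = new ArrayList<>();
        if (reference.mapType != updated.mapType) {
            errors.add("family '" + reference.name + "' changed type");
        }
        if (!reference.localityGroup.equals(updated.localityGroup)) {
            errors.add("family '" + reference.name + "' moved to another locality group");
        }
        return errors;
    }
}
```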
Row key encoding
A row in a Fiji table is identified by its Fiji row key. Fiji row keys are converted into HBase
row keys according to the row key encoding specified in the table layout:
- Raw encoding: the user has direct control over the encoding of row keys in the HBase
table. In other words, the HBase row key is exactly the Fiji row key. These are used
when the user would like to use arrays of bytes as row keys.
- Hashed: Deprecated! The HBase row key is computed as a hash of a single String or
byte array component.
- Hash-prefixed: the HBase row key is computed as the concatenation of the hash of a
single String or byte array component and the component itself.
- Formatted: the row key is comprised of one or more components. Each component can be
a string, a number or a hash of another component. The user will specify the size
of this hash. The user also specifies the actual order of the components in the key.
Hashing spreads the rows evenly across all the regions in the table. Specifying the size
of the hash gives the user fine-grained control over how the data is distributed.
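The hash-prefixed idea can be sketched as follows. This is an illustration under stated
assumptions, not Fiji's encoder: the class and method names are hypothetical, and the choice
of MD5 and of keeping the leading hashSize bytes of the digest are assumptions made here.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical hash-prefixed key builder; the hash function is an assumption. */
final class HashPrefixKeySketch {
    /** Returns the first hashSize bytes of MD5(component) followed by the component. */
    static byte[] hashPrefixed(byte[] component, int hashSize) {
        if (hashSize < 1 || hashSize > 16) {
            throw new IllegalArgumentException("hashSize must be in [1, 16] for MD5");
        }
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5").digest(component);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
        byte[] key = new byte[hashSize + component.length];
        System.arraycopy(digest, 0, key, 0, hashSize);
        System.arraycopy(component, 0, key, hashSize, component.length);
        return key;
    }
}
```

A larger hashSize spreads keys over more distinct prefixes, at the cost of longer row keys;
the original component stays readable after the prefix.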
Cell schema
Fiji cells are encoded according to a schema specified via
com.moz.fiji.schema.avro.CellSchema Avro records.
Fiji provides various cell encoding schemes:
- Hash: each Fiji cell is encoded as a hash of the Avro schema, followed by the binary
encoding of the Avro value.
- UID: each Fiji cell is encoded as the unique ID of the Avro schema, followed by the
binary encoding of the Avro value.
- Final: each Fiji cell is encoded as the binary encoding of the Avro value.
See com.moz.fiji.schema.impl.AvroCellEncoder and
com.moz.fiji.schema.impl.AvroCellDecoder for more implementation details.
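The three encoding schemes differ only in what precedes the Avro-encoded value. The sketch
below frames an already encoded value accordingly; it is not AvroCellEncoder, and the class
name, the Scheme enum and the schemaHash/schemaUid byte arrays are hypothetical stand-ins for
whatever identifies the schema.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical framing of an Avro-encoded value; not Fiji's AvroCellEncoder. */
final class CellEncodingSketch {
    enum Scheme { HASH, UID, FINAL }

    /** Prepends the schema hash or schema UID to the value, or nothing for FINAL. */
    static byte[] frame(Scheme scheme, byte[] schemaHash, byte[] schemaUid, byte[] avroValue) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        switch (scheme) {
            case HASH:
                out.write(schemaHash, 0, schemaHash.length);
                break;
            case UID:
                out.write(schemaUid, 0, schemaUid.length);
                break;
            case FINAL:
                break;  // no schema identifier is prepended
        }
        out.write(avroValue, 0, avroValue.length);
        return out.toByteArray();
    }
}
```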
Column IDs
Fiji allows the column names to be represented on HBase in multiple modes via the
com.moz.fiji.schema.avro.ColumnNameTranslator Avro enumeration.
By default, Fiji uses the SHORT column name translation for space efficiency.
Depending on compatibility requirements with other HBase tools, it may be desirable to use the
IDENTITY or HBASE_NATIVE column name translators.
SHORT Fiji column name translation:
For storage efficiency purposes, Fiji family and column names are translated into short
HBase column names by default.
This translation happens in
com.moz.fiji.schema.layout.impl.hbase.ShortColumnNameTranslator and relies on
com.moz.fiji.schema.layout.impl.ColumnId.
Column IDs are assigned automatically by FijiTableLayout.
The user may specify column IDs manually. FijiTableLayout checks the consistency of column IDs.
Column IDs cannot be changed (a column ID change is equivalent to deleting the existing column
and then re-creating it as a new empty column).
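Automatic assignment combined with a consistency check on manual IDs can be sketched as
below. This is an assumed policy for illustration only (0 means "unassigned", missing IDs
get the smallest unused positive value, duplicate manual IDs are rejected); the class is
hypothetical and the actual assignment logic lives in FijiTableLayout.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical column ID assignment; the policy shown here is an assumption. */
final class ColumnIdAssignmentSketch {
    /** Keeps manual IDs, fills in IDs equal to 0, and rejects negative or duplicate IDs. */
    static Map<String, Integer> assign(Map<String, Integer> requested) {
        Set<Integer> used = new HashSet<>();
        for (int id : requested.values()) {
            if (id < 0) {
                throw new IllegalArgumentException("negative column ID " + id);
            }
            if (id > 0 && !used.add(id)) {
                throw new IllegalArgumentException("duplicate column ID " + id);
            }
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        int next = 1;
        for (Map.Entry<String, Integer> entry : requested.entrySet()) {
            int id = entry.getValue();
            if (id == 0) {
                while (used.contains(next)) {
                    next++;
                }
                id = next;
                used.add(id);
            }
            result.put(entry.getKey(), id);
        }
        return result;
    }
}
```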
IDENTITY Fiji column name translation:
For compatibility with other HBase tools, Fiji family and column names can be written to HBase
directly.
This translation happens in
com.moz.fiji.schema.layout.impl.hbase.IdentityColumnNameTranslator. In this mode:
- Fiji locality groups are translated into HBase families.
- Fiji column families and qualifiers are combined to form the HBase
qualifier ("family:qualifier").
HBASE_NATIVE Fiji column name translation:
For compatibility with existing HBase tables, the notion of a Fiji locality group can be
ignored, mapping Fiji family and column names directly to their HBase equivalents.
This translation happens in
com.moz.fiji.schema.layout.impl.hbase.HBaseNativeColumnNameTranslator. In this mode:
- Fiji locality groups and column families are translated into HBase families.
- Additionally, Fiji locality groups must match the Fiji column families. This has the
side effect of requiring a one-to-one mapping between the Fiji locality groups and column
families.
- Fiji column qualifiers are used directly as HBase qualifiers.
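The IDENTITY and HBASE_NATIVE mappings described above can be sketched side by side. The
class, the HBaseColumn type and the method names are hypothetical and only restate the rules
in code; they are not the Fiji translator classes.

```java
/** Hypothetical restatement of the IDENTITY and HBASE_NATIVE mappings. */
final class ColumnNameTranslationSketch {
    /** An HBase column name: a family and a qualifier. */
    static final class HBaseColumn {
        final String family;
        final String qualifier;

        HBaseColumn(String family, String qualifier) {
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    /** IDENTITY: locality group -> HBase family, "family:qualifier" -> HBase qualifier. */
    static HBaseColumn identity(String localityGroup, String family, String qualifier) {
        return new HBaseColumn(localityGroup, family + ":" + qualifier);
    }

    /** HBASE_NATIVE: the locality group must equal the family; names map directly. */
    static HBaseColumn hbaseNative(String localityGroup, String family, String qualifier) {
        if (!localityGroup.equals(family)) {
            throw new IllegalArgumentException(
                "locality group '" + localityGroup + "' must match family '" + family + "'");
        }
        return new HBaseColumn(family, qualifier);
    }
}
```

For a column info:name in locality group default, the IDENTITY sketch yields HBase family
default with qualifier info:name, while the HBASE_NATIVE sketch requires the locality group
itself to be named info.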