An indexer that saves block information for the indexed terms. Block information is usually recorded in terms of relative term positions (position 1, position 2, etc.);
however, since version 2.2, Terrier also supports "marker terms" during indexing, which are used to increment the block counter.
Properties:
- blocks.size - How many terms should be in one block. If you want to use phrasal search, this needs to be 1 (default).
- blocks.max - Maximum number of blocks in a document. After this number of blocks, all subsequent terms will be in the same block. Defaults to 100,000.
- block.indexing - This class should only be used if the block.indexing property is set.
- indexing.max.encoded.documentindex.docs - the number of documents after which the DocumentIndexEncoded is dropped in favour of the DocumentIndex (on-disk implementation).
- See Also: Properties in org.terrier.indexing.Indexer and org.terrier.indexing.BasicIndexer
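The interaction of blocks.size and blocks.max can be illustrated with a small sketch. The class FixedSizeBlocks and its method assignBlocks are hypothetical and not part of Terrier's API; the sketch assumes block ids start at 0 and that, once the cap is reached, every remaining term stays in the last block. With blocks.size set to 1, each term gets its own block, so block ids coincide with term positions, which is presumably why phrasal search requires that setting.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Terrier's API: assigns a block id to every term
 * position of a document using a fixed block size (cf. blocks.size) and a
 * cap on the number of blocks (cf. blocks.max).
 */
public class FixedSizeBlocks {

    static int[] assignBlocks(int numTerms, int blockSize, int maxBlocks) {
        int[] blockIds = new int[numTerms];
        int blockId = 0;       // assumption: block ids start at 0
        int termsInBlock = 0;  // terms already placed in the current block
        for (int pos = 0; pos < numTerms; pos++) {
            blockIds[pos] = blockId;
            termsInBlock++;
            // Open a new block when the current one is full, unless the cap
            // has been reached; after that, all remaining terms share the last block.
            if (termsInBlock >= blockSize && blockId < maxBlocks - 1) {
                blockId++;
                termsInBlock = 0;
            }
        }
        return blockIds;
    }

    public static void main(String[] args) {
        // blocks.size = 1: block ids coincide with term positions.
        System.out.println(Arrays.toString(assignBlocks(5, 1, 100_000)));
        // blocks.size = 2: pairs of consecutive terms share a block.
        System.out.println(Arrays.toString(assignBlocks(5, 2, 100_000)));
        // blocks.max = 3: terms from the third block onwards all land in block 2.
        System.out.println(Arrays.toString(assignBlocks(5, 1, 3)));
    }
}
```

Whether Terrier counts the capped block as the last of blocks.max blocks or as one extra block is not stated above; the sketch picks the former.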
Markered Blocks
Markers are terms (artificially inserted into the term stream or otherwise) that are used to denote when the block counter should
be incremented. This functionality is enabled using the block.delimiters.enabled property, while the marker terms are specified as a comma-delimited list in the
block.delimiters property. The relevant properties are:
- block.delimiters.enabled - enables markered blocks. Defaults to false; set to true to enable.
- block.delimiters - comma-delimited list of terms that are markers. Defaults to empty. Terms are lowercased if lowercase is set to true (default).
- block.delimiters.index.terms - set to true if marker terms should actually be indexed. Defaults to false.
- block.delimiters.index.doclength - set to true if marker terms should contribute to document length. Defaults to false; only has an effect if
block.delimiters.index.terms is set.
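A minimal sketch of markered blocks follows, assuming the term stream has already been tokenised and lowercased. MarkerBlocks, parseDelimiters and index are hypothetical names, not Terrier's API. The sketch also assumes that an indexed marker is recorded in the block it closes, before the counter is incremented; Terrier's actual choice may differ.

```java
import java.util.*;

/** Hypothetical sketch, not Terrier's API: marker-driven block counting. */
public class MarkerBlocks {

    /** Per-document result: term -> block ids, plus document length. */
    static final class Result {
        final Map<String, List<Integer>> blocks = new LinkedHashMap<>();
        int docLength = 0;
    }

    /** Parses a comma-delimited marker list (cf. block.delimiters). */
    static Set<String> parseDelimiters(String property, boolean lowercase) {
        Set<String> markers = new HashSet<>();
        for (String s : property.split(",")) {
            String t = s.trim();
            if (t.isEmpty()) continue; // default is an empty list
            markers.add(lowercase ? t.toLowerCase(Locale.ROOT) : t);
        }
        return markers;
    }

    /**
     * indexMarkers mirrors block.delimiters.index.terms;
     * markersInDocLength mirrors block.delimiters.index.doclength and,
     * as described above, only matters when indexMarkers is true.
     */
    static Result index(List<String> terms, Set<String> markers,
                        boolean indexMarkers, boolean markersInDocLength) {
        Result r = new Result();
        int blockId = 0;
        for (String term : terms) {
            if (markers.contains(term)) {
                if (indexMarkers) {
                    // Assumption: the marker belongs to the block it closes.
                    r.blocks.computeIfAbsent(term, k -> new ArrayList<>()).add(blockId);
                    if (markersInDocLength) r.docLength++;
                }
                blockId++; // a marker always advances the block counter
                continue;
            }
            r.blocks.computeIfAbsent(term, k -> new ArrayList<>()).add(blockId);
            r.docLength++;
        }
        return r;
    }

    public static void main(String[] args) {
        Set<String> markers = parseDelimiters("SENT, para", true);
        List<String> doc = Arrays.asList("the", "cat", "sent", "sat", "down");
        Result plain = index(doc, markers, false, false);
        System.out.println(plain.blocks + " length=" + plain.docLength);
        Result withMarkers = index(doc, markers, true, true);
        System.out.println(withMarkers.blocks + " length=" + withMarkers.docLength);
    }
}
```

In this sketch the marker "sent" splits the document into two blocks, so "the" and "cat" share block 0 while "sat" and "down" share block 1, regardless of how many terms each sentence contains.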