Utility method to (re-)build the subject-based full text index. This is a
high latency operation for a database of any significant size. You must
be using the unisolated view of the AbstractTripleStore for this
operation. AbstractTripleStore.Options#TEXT_INDEX must be enabled. This
operation is only supported when the ITextIndexer uses the FullTextIndex
class.
The subject-based full text index rolls up the normal object-based full
text index into a similarly structured index that captures relevancy
across subjects. Instead of keying each entry on a literal, it keys each
entry on a subject:
(t,s) => s.len, termWeight
where t is the token and s is the subject's IV. The term weight has the same interpretation,
but it is across all literals which are linked to that subject and which
contain the given token. This index basically pre-computes the (?s ?p ?o)
join that sometimes follows the (?o bd:search "xyz") request.
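The roll-up described above can be sketched with plain in-memory maps. This is only an illustration, not the FullTextIndex implementation: the Stmt and Weight classes, the tokenize helper, and the weighting (token count divided by the total token count across the subject's literals) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SubjectIndexSketch {

    /** Hypothetical stand-in for a statement whose object is a literal. */
    static final class Stmt {
        final String s, p, literal;
        Stmt(String s, String p, String literal) {
            this.s = s; this.p = p; this.literal = literal;
        }
    }

    /** Hypothetical index value: (s.len, termWeight). */
    static final class Weight {
        final int len;
        final double termWeight;
        Weight(int len, double termWeight) {
            this.len = len; this.termWeight = termWeight;
        }
    }

    /** Builds token -> subject -> (len, termWeight) over all literals of each subject. */
    static Map<String, Map<String, Weight>> build(List<Stmt> stmts) {
        // Gather the tokens of every literal linked to each subject.
        Map<String, List<String>> tokensBySubject = new HashMap<>();
        for (Stmt st : stmts) {
            tokensBySubject.computeIfAbsent(st.s, k -> new ArrayList<>())
                    .addAll(tokenize(st.literal));
        }
        Map<String, Map<String, Weight>> index = new HashMap<>();
        for (Map.Entry<String, List<String>> e : tokensBySubject.entrySet()) {
            List<String> tokens = e.getValue();
            Map<String, Integer> counts = new HashMap<>();
            for (String t : tokens) {
                counts.merge(t, 1, Integer::sum);
            }
            for (Map.Entry<String, Integer> c : counts.entrySet()) {
                // Assumed weighting: relative frequency of the token for this subject.
                double w = (double) c.getValue() / tokens.size();
                index.computeIfAbsent(c.getKey(), k -> new HashMap<>())
                        .put(e.getKey(), new Weight(tokens.size(), w));
            }
        }
        return index;
    }

    /** Naive tokenizer (assumption): lower-case, split on non-word characters. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

With such an index, a lookup by token yields the matching subjects directly, without first finding the matching literals and then joining back to their subjects.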
Truth Maintenance
We will need to perform truth maintenance on the subject-centric text
index. That is, the index will need to be updated as statements are
added and removed (to the extent that those statements involve a
literal in the object position). Adding a statement is the easier case:
because we will never need to remove entries from the index, we can
simply write over them with new relevance values. All that is involved
with truth maintenance for adding a statement is taking a post-commit
snapshot of the subject in the statement and running it through the
indexer (a "subject-refresh").
The same "subject-refresh" will be necessary for truth maintenance on
removal, but an additional step will be necessary beforehand: the index
entries associated with the deleted subject/object (tokens+subject) will
need to be removed in case a token appears only in the removed literal.
After this pruning step the subject can be refreshed in the index exactly
the same as for truth maintenance on add.
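The add and remove cases can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the actual indexer: the class, its maps, the tokenize helper, and the relative-frequency weighting are all hypothetical, and the map of literals stands in for the post-commit snapshot of a subject.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SubjectRefreshSketch {

    // subject -> literals currently linked to it (stand-in for the committed statements)
    final Map<String, List<String>> literalsBySubject = new HashMap<>();

    // token -> subject -> termWeight (stand-in for the subject-centric text index)
    final Map<String, Map<String, Double>> index = new HashMap<>();

    /** Add: nothing has to be removed, so a subject-refresh is sufficient. */
    void addStatement(String s, String literal) {
        literalsBySubject.computeIfAbsent(s, k -> new ArrayList<>()).add(literal);
        refresh(s);
    }

    /** Remove: prune the (token, subject) entries of the removed literal, then refresh. */
    void removeStatement(String s, String literal) {
        List<String> lits = literalsBySubject.get(s);
        if (lits == null || !lits.remove(literal)) {
            return; // statement was not present
        }
        for (String t : tokenize(literal)) {
            Map<String, Double> bySubject = index.get(t);
            if (bySubject != null) {
                bySubject.remove(s);
                if (bySubject.isEmpty()) {
                    index.remove(t);
                }
            }
        }
        // Tokens still present in other literals of s are restored here.
        refresh(s);
    }

    /** Subject-refresh: re-index every literal of s, overwriting existing entries. */
    void refresh(String s) {
        List<String> tokens = new ArrayList<>();
        for (String lit : literalsBySubject.getOrDefault(s, Collections.<String>emptyList())) {
            tokens.addAll(tokenize(lit));
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> c : counts.entrySet()) {
            // Assumed weighting: relative frequency of the token for this subject.
            index.computeIfAbsent(c.getKey(), k -> new HashMap<>())
                    .put(s, (double) c.getValue() / tokens.size());
        }
    }

    /** Naive tokenizer (assumption): lower-case, split on non-word characters. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Without the pruning loop, a token that occurred only in the removed literal would keep a stale entry for the subject, because the refresh only overwrites entries for tokens that are still present.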
It looks like the right place to hook in truth maintenance for add is
AbstractTripleStore#addStatements(AbstractTripleStore,boolean,IChunkedOrderedIterator,com.bigdata.relation.accesspath.IElementFilter)
after the ISPOs are added to the SPORelation. Likewise, the place to hook
in truth maintenance for delete is
AbstractTripleStore#removeStatements(IChunkedOrderedIterator,boolean)
after the ISPOs are removed from the SPORelation.