A class that manages the execution of concurrent queries against a local
IIndexManager.
Design notes
Much of the complexity of the current approach stems from having to run a
separate task for each join for each shard in order to hold the appropriate
lock when running against the unisolated shard view. This also means that the
join task runs inside of the concurrency manager and hence has the local view
of the shard.
The main, and perhaps the only, reason why we run unisolated rules is
closure, when we query against the unisolated indices and then write the
entailments back onto the unisolated indices.
Supporting closure has always been complicated. This complexity is mostly
handled by ProgramTask#executeMutation() and
AbstractTripleStore#newJoinNexusFactory(), which play games with the
timestamps used to read and write on the database: commit points are created
to make tuples written by a mutation rule visible, and the read timestamp for
the query is automatically advanced in each closure pass so that newly
committed tuples are visible to subsequent rounds of closure. For scale-out,
we do shard-wise auto-commits, so we always have a commit point that makes
each write visible, and the read timestamp is actually a read-only
transaction, which prevents the historical data we need during a closure
round from being released while we are driving updates onto the federation.
For the RWStore, we have a similar problem (in the HA branch, since that is
where we are working on the RWStore): historically allocated records were
being released as writes drove updates on the indices. Again, we "solved" the
problem for the RWStore using a commit point followed by a read-only
transaction reading on that commit point to hold onto the view on which the
next closure round needs to read. (This uncovered a problem with the
interaction between the RWStore and the transaction service, which Martyn is
currently working to resolve through a combination of shadow allocators and
deferred deletes that are processed once the release time is advanced by the
transaction service.)
The WORM does not have some of these problems with closure because we never
delete history, so we do not need to create a commit point and a read-behind
transaction. However, the WORM would have problems with concurrent access to
the unisolated indices, except that we work around that problem through the
transparent use of the UnisolatedReadWriteIndex, which allows multiple
threads to access the same unisolated index view using a read/write lock
pattern (concurrent readers are allowed, but there is only one writer, and it
has exclusive access while it is running). This works out because we never
run closure operations against the WORM through the concurrency manager. If
we did, we would have to create a commit point after each mutation and use a
read-behind transaction to prevent concurrent access to the unisolated index.
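The read/write lock pattern behind the UnisolatedReadWriteIndex can be
sketched as follows. This is a minimal illustration only (the class name and
the map-backed "index" are hypothetical, not the real implementation):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the read/write lock pattern: many concurrent readers,
// but at most one writer, which holds the lock exclusively.
public class GuardedIndexView {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> index = new HashMap<>();

    // Readers share the read lock and may run concurrently.
    public String lookup(String key) {
        lock.readLock().lock();
        try {
            return index.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The writer takes the write lock and excludes all readers and writers.
    public void insert(String key, String value) {
        lock.writeLock().lock();
        try {
            index.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The same pattern generalizes to any unisolated view that multiple threads
must share without going through the concurrency manager.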
The main advantage that I can see in the current complexity is that it allows
us to do load+closure as a single operation on the WORM, resulting in a
single commit point. This makes that operation ACID without having to use
full read/write transactions, and it is how we gain the ACID contract for the
standalone Journal in the SAIL for the WORM. Of course, the SAIL does not
have that contract for the RWStore, because we have to do the commit and
read-behind transaction in order to have visibility and avoid concurrent
access to the unisolated index (by reading behind on the last commit point).
I think that the reality is even one step more complicated. When doing truth
maintenance (incremental closure), we bring the temporary graph to a fixed
point (the rules write on the temp store) and then apply the delta to the
database in a single write. That suggests that incremental truth maintenance
would remain ACID, but that database-at-once-closure would be ACID only
round-wise.
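The incremental truth maintenance flow can be sketched like this: run the
rules to a fixed point on the temp store, then apply the closed delta to the
database in one write. The class name and the toy "rules" below are
illustrative placeholders, not real entailment rules:

```java
import java.util.Set;

// Sketch of incremental truth maintenance: close the temp store, then
// make the whole delta visible on the database in a single write.
public class TruthMaintenanceSketch {

    // Toy rules: "a" entails "b", and "b" entails "c".
    static Set<String> fixedPoint(Set<String> tempStore) {
        boolean changed = true;
        while (changed) {
            changed = false;
            if (tempStore.contains("a") && tempStore.add("b")) changed = true;
            if (tempStore.contains("b") && tempStore.add("c")) changed = true;
        }
        return tempStore;
    }

    // The single write that makes the entire closed delta visible at once;
    // this is what keeps the incremental update ACID.
    static void applyDelta(Set<String> database, Set<String> tempStore) {
        database.addAll(tempStore);
    }
}
```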
So, I would like to suggest that we break ACID for database-at-once-closure
and always follow the pattern of (1) doing a commit before each round of
closure, and (2) creating a read-behind transaction to prevent the release of
that commit point while we drive writes onto the indices. If we follow this
pattern, then we can write on the unisolated indices without conflict and
read on the historical views without conflict. Since there will be a commit
point before each mutation rule runs (each of which corresponds to a closure
round), database-at-once-closure will be atomic within a round, but it will
not be a single atomic operation. Per the above, I think that we would retain
the ACID property for incremental truth maintenance against a WORM or RW mode
Journal.
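The proposed round-wise pattern could be sketched as the following loop. All
of the names here (Journal, integer commit points, the snapshot standing in
for a read-only transaction, the toy rule) are illustrative assumptions, not
the real bigdata API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: (1) commit before each closure round so prior writes become
// visible, then (2) read behind on that commit point while the mutation
// rule writes on the unisolated index.
public class RoundWiseClosure {

    static class Journal {
        final List<String> tuples = new ArrayList<>();
        private int lastCommitPoint = 0;

        // Makes buffered writes visible at a new commit point.
        int commit() {
            return ++lastCommitPoint;
        }

        // A read-only tx pins this view; modeled here as a snapshot copy.
        List<String> readOnlyView(int commitPoint) {
            return new ArrayList<>(tuples);
        }
    }

    // Toy rule: derive t+"+" from each tuple t, capped at length 3.
    static int closure(Journal journal) {
        int rounds = 0;
        boolean changed = true;
        while (changed) {
            int commitPoint = journal.commit();                    // step (1)
            List<String> view = journal.readOnlyView(commitPoint); // step (2)
            changed = false;
            for (String t : view) {
                String derived = t + "+";
                if (derived.length() <= 3
                        && !journal.tuples.contains(derived)) {
                    journal.tuples.add(derived); // unisolated write
                    changed = true;
                }
            }
            rounds++; // each round is atomic; the whole closure is not
        }
        return rounds;
    }
}
```

Note how each round reads only from the pinned view while writing on the
live tuples, so reads and writes never conflict within a round.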
----
The advantage of this proposal (commit before each mutation rule and run the
query against a read-behind transaction) is that it could enormously
simplify how we execute joins.
Right now, we use a factory pattern to create a join task on each node for
each shard for which that node receives binding sets for a query. The main
reason for doing this is to gain the appropriate lock for the unisolated
index. If we never run a query against the unisolated index, then we can
bypass the concurrency manager and run a single "query manager" task for all
joins for all shards for all queries. This has some great benefits, which I
will go into below.
That "query manager" task would be responsible for accepting buffers
containing elements or binding sets from other nodes and scheduling the
consumption of those data based on various criteria (order of arrival,
priority, buffer resource requirements, timeout, etc.). The manager task
could use a fork/join pool to execute lightweight operations (NIO,
formulation of access paths from binding sets, mapping of binding sets onto
shards, joining a chunk already read from an access path against a binding
set, etc.). Operations which touch the disk need to run in their own thread
(until we get Java 7 asynchronous file I/O, which is already available in a
preview library). We could handle that by queuing those operations against a
fixed-size thread pool for reads.
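The threading model could be sketched as below: a fork/join pool for the
lightweight operations and a small fixed pool reserved for operations that
touch the disk. The class name, pool sizes, and the toy "chunk join" are
illustrative assumptions, not the real bigdata API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

// Sketch of the query-manager threading model: disk-bound work on a
// fixed-size pool, cheap CPU-bound work on a fork/join pool.
public class QueryManagerSketch {

    // Fixed-size pool reserved for operations that touch the disk.
    static final ExecutorService DISK_READS = Executors.newFixedThreadPool(4);

    // Shared fork/join pool for lightweight operations.
    static final ForkJoinPool LIGHT_OPS = ForkJoinPool.commonPool();

    // Simulate reading a chunk from an access path on the disk-read pool.
    static List<Integer> readChunk() {
        try {
            return DISK_READS.submit(() -> List.of(1, 2, 3)).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Toy join on the fork/join pool: keep chunk values above the binding.
    static List<Integer> joinChunk(List<Integer> chunk, int bound) {
        try {
            return LIGHT_OPS.submit(() -> {
                List<Integer> out = new ArrayList<>();
                for (int v : chunk) {
                    if (v > bound) out.add(v);
                }
                return out;
            }).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Terminating a join under this model would amount to purging its queued
chunks and bindings from the manager, rather than interrupting per-shard
tasks.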
This is a radical change in how we handle distributed query execution, but I
think that it could have a huge payoff: it would reduce the complexity of the
join logic, make it significantly easier to execute different kinds of join
operations, reduce the overhead of acquiring locks for the unisolated index
views, reduce the #of threads consumed by joins (from one per shard per join
per query to a fixed pool of N threads for reads), etc. It would also
centralize the management of resources on each node and make it possible for
us to handle things like join termination by simply purging the terminated
join's data from the query manager task.