Methods for static analysis of a query. There is one method which looks "up".
This corresponds to how we actually evaluate things (left to right in the
query plan). There are two methods which look "down". This corresponds to the
bottom-up evaluation semantics of SPARQL.
When determining the "known" bound variables on entry to a node we have to
look "up" the tree until we reach the outermost group. Note that named
subqueries DO NOT receive bindings from the places where they are INCLUDEd
into the query.
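A minimal sketch of the upward walk, assuming a simplified, hypothetical `Scope` record (not the real AST classes) in which each scope records the variables the enclosing group has definitely bound before that scope runs. Exogenous bindings are always visible; the walk stops at a named subquery root because it does not see bindings from its INCLUDE sites.

```java
import java.util.HashSet;
import java.util.Set;

final class KnownBoundSketch {

    /**
     * Hypothetical, simplified scope. boundToTheLeft holds the variables the
     * enclosing group has definitely bound before this scope is evaluated.
     */
    record Scope(Scope parent, Set<String> boundToTheLeft, boolean namedSubqueryRoot) {}

    /** Variables known to be bound on entry to start, found by looking "up". */
    static Set<String> knownBoundOnEntry(Scope start, Set<String> exogenous) {
        // Exogenous bindings are visible everywhere, including named subqueries.
        Set<String> known = new HashSet<>(exogenous);
        // Climb toward the outermost group, but never past a named subquery
        // root: it does not receive bindings from the places it is INCLUDEd.
        for (Scope s = start; s != null && !s.namedSubqueryRoot(); s = s.parent()) {
            known.addAll(s.boundToTheLeft());
        }
        return known;
    }

    public static void main(String[] args) {
        Scope outer = new Scope(null, Set.of(), false);
        Scope child = new Scope(outer, Set.of("s"), false);
        Scope grandchild = new Scope(child, Set.of("o"), false);
        System.out.println(knownBoundOnEntry(grandchild, Set.of()));
    }
}
```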
Analysis of Incoming "Known" Bound Variables (Looking Up)
Static analysis of the incoming "known" bound variables does NOT reflect
bottom-up evaluation semantics. If a variable binding would not be observed
under bottom-up evaluation semantics due to a badly designed left join pattern,
then the AST MUST be rewritten to lift the badly designed left join into a
named subquery where it will enjoy effective bottom-up evaluation semantics.
Analysis of "must" and "maybe" Bound Variables (Looking Down).
The following classes are producers of bindings and need to be handled by
static analysis when looking down the AST tree:
QueryBase
The static analysis of the definitely and maybe bound variables depends
on the projection and where clauses.
SubqueryRoot
SPARQL 1.1 subquery. This is just the static analysis of the QueryBase
for that subquery.
NamedSubqueryRoot
This is just the static analysis of the QueryBase for that named
subquery. Named subqueries are run without any visible bindings EXCEPT those
which are exogenous.
NamedSubqueryInclude
The static analysis of the INCLUDE is really the static analysis of the
NamedSubqueryRoot which produces that solution set. The incoming known
variables are ignored when doing the static analysis of the named subquery
root.
ServiceNode
The static analysis of the definitely and maybe bound variables depends
on the graph pattern for that service call. This is analyzed like a normal
graph pattern. Everything visible in the graph pattern is considered to be
projected. As far as I can tell, ServiceNodes are not run "as-bound" and
their static analysis is as if they were named subqueries (they have no known
bound incoming variables other than those communicated by their
BindingsClause).
StatementPatternNode
All variables are definitely bound UNLESS StatementPatternNode#isOptional()
is true.
Note: we sometimes attach a simple optional join to the parent group for
efficiency, at which point it becomes an "optional" statement pattern. An
optional statement pattern may also have zero or more FilterNodes associated
with it.
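The rule for statement patterns can be sketched as follows. This assumes a hypothetical `Pattern` record carrying only variable names and the optional flag, and the convention that the "maybe" set is a superset of the "definitely" set. Filters attached to an optional pattern are ignored here: they can only reject optional matches, so they cannot promote a "maybe" variable to "definitely".

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StatementPatternSketch {

    /** Hypothetical pattern: only its variable names and the optional flag matter here. */
    record Pattern(List<String> vars, boolean optional) {}

    /** Assumed convention: maybe includes everything in definitely. */
    record Bindings(Set<String> definitely, Set<String> maybe) {}

    static Bindings analyze(List<Pattern> patterns) {
        Set<String> definitely = new HashSet<>();
        Set<String> maybe = new HashSet<>();
        for (Pattern p : patterns) {
            maybe.addAll(p.vars());              // any pattern might bind its variables
            if (!p.optional()) {
                definitely.addAll(p.vars());     // a required pattern must bind all of them
            }
        }
        return new Bindings(definitely, maybe);
    }

    public static void main(String[] args) {
        Bindings b = analyze(List.of(
                new Pattern(List.of("s", "o"), false),
                new Pattern(List.of("s", "label"), true)));
        System.out.println(b);
    }
}
```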
JoinGroupNode
UnionNode
The definitely bound variables are the intersection of the definitely
bound variables in the child join groups. The maybe bound variables are the
union of the maybe bound variables in the child join groups.
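A minimal sketch of the UNION rule, assuming each branch has already been reduced to a hypothetical `Bindings` pair of sets: intersect the "definitely" sets, union the "maybe" sets.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UnionSketch {

    /** Hypothetical per-branch result; maybe is assumed to include definitely. */
    record Bindings(Set<String> definitely, Set<String> maybe) {}

    static Bindings union(List<Bindings> branches) {
        if (branches.isEmpty()) {
            return new Bindings(Set.of(), Set.of());
        }
        // Definitely bound only if EVERY branch definitely binds it.
        Set<String> definitely = new HashSet<>(branches.get(0).definitely());
        // Maybe bound if ANY branch might bind it.
        Set<String> maybe = new HashSet<>();
        for (Bindings b : branches) {
            definitely.retainAll(b.definitely());
            maybe.addAll(b.maybe());
        }
        return new Bindings(definitely, maybe);
    }

    public static void main(String[] args) {
        Bindings left = new Bindings(Set.of("s", "a"), Set.of("s", "a", "c"));
        Bindings right = new Bindings(Set.of("s", "b"), Set.of("s", "b"));
        System.out.println(union(List.of(left, right)));
    }
}
```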
AssignmentNode
BIND(expr AS var) in a group will not bind the variable if there is an
error when evaluating the value expression, and the error does not fail the
solution.
Thus BIND() in a group contributes to "maybe" bound variables.
Note: BIND() in a PROJECTION is handled differently as it is non-optional (if
the value expression results in an error the solution is dropped).
Projections are handled when we do the analysis of a QueryBase node since we
can see both the WHERE clause and the PROJECTION clauses at the same time.
See the SPARQL 1.1 specification: "If the evaluation of the expression
produces an error, the variable remains unbound for that solution."
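The difference between the two BIND placements can be sketched on solutions modeled as plain maps. This is a hypothetical illustration: the expression is any `Function` that throws on error, and the projection variant follows this comment's description (an error drops the solution).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class BindSemanticsSketch {

    /** BIND in a group: an evaluation error leaves the variable unbound; the solution survives. */
    static List<Map<String, Object>> bindInGroup(List<Map<String, Object>> solutions,
            String variable, Function<Map<String, Object>, Object> expr) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> solution : solutions) {
            Map<String, Object> copy = new HashMap<>(solution);
            try {
                copy.put(variable, expr.apply(solution));
            } catch (RuntimeException error) {
                // Leave the variable unbound: this is why it is only "maybe" bound.
            }
            out.add(copy);
        }
        return out;
    }

    /** BIND in a projection, as described above: an evaluation error drops the solution. */
    static List<Map<String, Object>> bindInProjection(List<Map<String, Object>> solutions,
            String variable, Function<Map<String, Object>, Object> expr) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> solution : solutions) {
            Map<String, Object> copy = new HashMap<>(solution);
            try {
                copy.put(variable, expr.apply(solution));
                out.add(copy); // only solutions that bind the variable survive
            } catch (RuntimeException error) {
                // Drop the solution entirely.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> input = List.of(
                Map.<String, Object>of("x", 6, "y", 2),
                Map.<String, Object>of("x", 1, "y", 0)); // 1 / 0 throws ArithmeticException
        Function<Map<String, Object>, Object> divide =
                s -> (Integer) s.get("x") / (Integer) s.get("y");
        System.out.println(bindInGroup(input, "z", divide));
        System.out.println(bindInProjection(input, "z", divide));
    }
}
```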
IF()
IF() semantics: If evaluating the first argument raises an error, then an
error is raised for the evaluation of the IF expression. (This greatly
simplifies the analysis of the EBV of the IF value expressions, but there is
still uncertainty concerning whether the THEN or the ELSE is executed for a
given solution.) However, IF is not allowed to conditionally bind a variable
in the THEN/ELSE expressions, so we do not have to consider it here.
BOUND(var)
Filters which use BOUND() cannot be pruned unless we can prove that the
variable is (or is not) bound and can also collapse the filter to a constant
after substituting either true or false in for the BOUND() expression.
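A minimal sketch of the substitute-and-collapse step, assuming a hypothetical expression tree with only BOUND, NOT, AND, and constants. A variable maps to `true` (provably bound) or `false` (provably not bound); unknown variables are left as BOUND(var). If the result is a constant, the filter can be decided statically; otherwise it must stay.

```java
import java.util.Map;

final class BoundFilterSketch {

    // Hypothetical, minimal filter expression tree (not a real value-expression API).
    interface Expr {}
    record Bound(String name) implements Expr {}
    record Not(Expr arg) implements Expr {}
    record And(Expr left, Expr right) implements Expr {}
    record Const(boolean value) implements Expr {}

    /** Substitute provable BOUND() results, then fold constants. */
    static Expr simplify(Expr e, Map<String, Boolean> provablyBound) {
        if (e instanceof Bound b) {
            Boolean known = provablyBound.get(b.name());
            return known == null ? e : new Const(known);
        }
        if (e instanceof Not n) {
            Expr arg = simplify(n.arg(), provablyBound);
            return arg instanceof Const c ? new Const(!c.value()) : new Not(arg);
        }
        if (e instanceof And a) {
            Expr l = simplify(a.left(), provablyBound);
            Expr r = simplify(a.right(), provablyBound);
            // true && r is r; false && r is false (even if r would raise an error)
            if (l instanceof Const c) return c.value() ? r : c;
            if (r instanceof Const c) return c.value() ? l : c;
            return new And(l, r);
        }
        return e; // Const is already folded
    }

    public static void main(String[] args) {
        Expr filter = new And(new Bound("x"), new Not(new Bound("y")));
        System.out.println(simplify(filter, Map.of("x", true)));
        System.out.println(simplify(filter, Map.of("x", true, "y", false)));
    }
}
```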
FILTERs
FILTERs are grouped based on whether they can run before any required joins
(pre-), with the required joins (join-), or after all joins (post-).
pre-
The pre-filters have all their required variables bound on entry to the
join group. They should be lifted into the parent join group.
join-
The join-filters will have all their required variables bound by the time
the required joins are done. These filters will wind up attached to the
appropriate required join. The specific filter/join attachments depend on the
join evaluation order.
post-
The post-filters might not have all of their required variables bound. We
have to wait until the last of the optional joins has been evaluated before
we can evaluate any post-filters, so they run "last".
prune-
The prune-filters are those whose required variables CAN NOT be bound.
They should be pruned from the AST.
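The four categories can be sketched as a single classification over three variable sets: known on entry (looking up), definitely bound by the required joins, and maybe bound (looking down). This is a hypothetical helper; it reads "CAN NOT be bound" as "at least one required variable can never be bound" and ignores the BOUND() special case described above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FilterPlacementSketch {

    enum Placement { PRE, JOIN, POST, PRUNE }

    static Placement place(Set<String> required, Set<String> knownOnEntry,
            Set<String> definitely, Set<String> maybe) {
        if (knownOnEntry.containsAll(required)) {
            return Placement.PRE;   // all bound on entry: lift into the parent group
        }
        Set<String> afterRequiredJoins = new HashSet<>(knownOnEntry);
        afterRequiredJoins.addAll(definitely);
        if (afterRequiredJoins.containsAll(required)) {
            return Placement.JOIN;  // attach to the appropriate required join
        }
        Set<String> possiblyBound = new HashSet<>(afterRequiredJoins);
        possiblyBound.addAll(maybe);
        if (possiblyBound.containsAll(required)) {
            return Placement.POST;  // run after the last optional join
        }
        return Placement.PRUNE;     // some required variable can never be bound
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("s");
        Set<String> definitely = Set.of("s", "o");
        Set<String> maybe = Set.of("s", "o", "label");
        for (Set<String> required : List.of(Set.of("s"), Set.of("o"), Set.of("label"), Set.of("zzz"))) {
            System.out.println(required + " -> " + place(required, known, definitely, maybe));
        }
    }
}
```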
TODO We can probably cache the heck out of things on this class. There is no
reason to recompute the SA of the known or maybe/must bound variables until
there is an AST change, and the caller can build a new SA when that happens.
However, note that we must make the cached sets unmodifiable since there are a
lot of patterns which rely on computing the difference between two sets and
those cannot have a side-effect on the cache.
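One way the cache could look, as a sketch under stated assumptions: nodes are identified by a hypothetical string id, each result is computed once and wrapped with `Collections.unmodifiableSet`, and set difference always copies into a fresh set so a cached input is never mutated.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class CachedAnalysisSketch {

    // Hypothetical cache keyed by a node id; a real cache would key on AST nodes.
    private final Map<String, Set<String>> definitelyBoundCache = new HashMap<>();
    private final Function<String, Set<String>> computeDefinitelyBound;
    private int computations = 0;

    CachedAnalysisSketch(Function<String, Set<String>> computeDefinitelyBound) {
        this.computeDefinitelyBound = computeDefinitelyBound;
    }

    /** Computed once per node; the returned set is unmodifiable so callers cannot corrupt the cache. */
    Set<String> definitelyBound(String nodeId) {
        return definitelyBoundCache.computeIfAbsent(nodeId, id -> {
            computations++;
            return Collections.unmodifiableSet(new HashSet<>(computeDefinitelyBound.apply(id)));
        });
    }

    int computations() {
        return computations;
    }

    /** Set difference into a fresh set, leaving both (possibly cached) inputs untouched. */
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        CachedAnalysisSketch sa = new CachedAnalysisSketch(id -> Set.of("s", "o"));
        Set<String> first = sa.definitelyBound("group1");
        System.out.println(first == sa.definitelyBound("group1"));
        System.out.println(difference(first, Set.of("o")));
        System.out.println(sa.computations());
    }
}
```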
We could also attach the StaticAnalysis as an annotation on the QueryRoot and
provide a factory method for accessing it. That way we would have reuse of the
cached static analysis data. Each AST optimizer (or the ASTOptimizerList) would
have to clear the cached StaticAnalysis when producing a new QueryRoot. Do this
when we add an ASTContainer to provide a better home for the queryStr, the parse
tree, the original AST, and the optimized AST.
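A sketch of the annotation idea, using hypothetical stand-ins (`QueryRootSketch`, `AnalysisSketch`, and a string in place of the AST). The analysis is computed lazily and reused per root; because an optimizer returns a new root, the new root starts with an empty slot, which plays the role of clearing the cache. An optimizer that mutated a root in place would instead have to reset the slot explicitly. This sketch is not thread-safe.

```java
import java.util.function.UnaryOperator;

final class QueryRootSketch {

    /** Stand-in for StaticAnalysis; a real one would hold the cached sets. */
    record AnalysisSketch(String analyzedAst) {}

    private final String ast;        // stand-in for the AST
    private AnalysisSketch cached;   // hypothetical annotation slot

    QueryRootSketch(String ast) {
        this.ast = ast;
    }

    /** Hypothetical factory method: analyze lazily, then reuse the result. */
    AnalysisSketch staticAnalysis() {
        if (cached == null) {
            cached = new AnalysisSketch(ast);
        }
        return cached;
    }

    /** An optimizer that changes the AST hands back a NEW root with no cached analysis. */
    QueryRootSketch optimize(UnaryOperator<String> optimizer) {
        return new QueryRootSketch(optimizer.apply(ast));
    }

    public static void main(String[] args) {
        QueryRootSketch root = new QueryRootSketch("original AST");
        System.out.println(root.staticAnalysis() == root.staticAnalysis());
        QueryRootSketch optimized = root.optimize(a -> "optimized AST");
        System.out.println(optimized.staticAnalysis().analyzedAst());
    }
}
```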