Attempt to translate the join group using a merge join.
We recognize a merge join when there is an INCLUDE followed by either a
series of INCLUDEs -or- a series of OPTIONAL {INCLUDE}s in the group. The
initial INCLUDE becomes the primary source for the merge join (the hub).
Each INCLUDE after the first must have the same join variables as the hub.
If the OPTIONAL {INCLUDE}s pattern is recognized, then the MERGE JOIN is
itself OPTIONAL. The sequence of such INCLUDEs in this group is then
translated into a single MERGE JOIN operator.
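A minimal Java sketch of the recognition rule, under stated assumptions: the group is modelled as a hypothetical list of `Include` records (named solution set, join variables, whether it is wrapped in OPTIONAL), and `recognize` and `MergeJoinPlan` are illustrative names, not the real AST classes. The INCLUDE at position `start` is treated as the hub (it must not be OPTIONAL), and the run is extended while the join variables match and the required/OPTIONAL kind does not change. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MergeJoinPatternSketch {

    /** Hypothetical stand-in for a group member: an INCLUDE, possibly wrapped in OPTIONAL {}. */
    record Include(String namedSet, Set<String> joinVars, boolean optional) {}

    /** Hypothetical result: the hub, the INCLUDEs merged into it, and whether the join is OPTIONAL. */
    record MergeJoinPlan(Include hub, List<Include> others, boolean optional) {}

    /**
     * Returns a plan if members[start] is a required INCLUDE followed by at least one
     * INCLUDE with the hub's join variables, all required or all OPTIONAL.
     * Returns null when the pattern does not apply (fall back to 2-way hash joins).
     */
    static MergeJoinPlan recognize(List<Include> members, int start) {
        if (start >= members.size() || members.get(start).optional()) {
            return null; // the hub must be a required INCLUDE
        }
        Include hub = members.get(start);
        List<Include> others = new ArrayList<>();
        Boolean optional = null; // fixed by the first INCLUDE after the hub
        for (int i = start + 1; i < members.size(); i++) {
            Include next = members.get(i);
            if (!next.joinVars().equals(hub.joinVars())) break; // must join on the same variables
            if (optional == null) optional = next.optional();
            else if (optional != next.optional()) break;       // do not mix required and OPTIONAL
            others.add(next);
        }
        return others.isEmpty() ? null : new MergeJoinPlan(hub, others, optional);
    }

    public static void main(String[] args) {
        List<Include> group = List.of(
            new Include("%a", Set.of("x"), false),  // hub
            new Include("%b", Set.of("x"), true),
            new Include("%c", Set.of("x"), true),
            new Include("%d", Set.of("y"), true));  // different join variables: ends the run
        MergeJoinPlan plan = recognize(group, 0);
        System.out.println(plan.others().size() + " " + plan.optional()); // 2 true
    }
}
```

Stopping at the first mismatch keeps the run contiguous, so the remaining members can still be handled by the ordinary join translation.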
Note: The critical pattern for a merge join is that we have a hash index
against which we may join several other hash indices. To join, of course,
the hash indices must all have the same join variables. The pattern
recognized here is based on an initial INCLUDE (defining the first hash
index) followed by several INCLUDEs, either all required or all OPTIONAL,
which are the additional hash indices. The merge join itself is the N-way
solution set hash join and replaces the sequence of 2-way solution set
hash joins which we would otherwise do for those solution sets. It is
more efficient because, in a sequence of 2-way joins, the output of each
2-way join is fed into the next 2-way join, which blows up the input
cardinality for that next 2-way hash join. While the merge join will
consider the same combinations of solutions, it does so with a very
efficient linear pass over the hash indices.
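The N-way join can be sketched as one forward pass with a cursor per source. This sketch assumes each solution set is already ordered by a single integer join key and models solutions as hypothetical `Row(key, value)` records; `mergeJoin` is an illustrative name, not the operator's API. For each hub key it gathers the matching run from every source and emits their cross product, which is the same set of combinations a chain of 2-way joins would produce, without feeding any intermediate result into another join.

```java
import java.util.ArrayList;
import java.util.List;

class NWayMergeJoinSketch {

    /** Hypothetical solution: a join-key value plus one payload. Each source is sorted by key. */
    record Row(int key, String value) {}

    /**
     * One linear pass over N sources sorted by key; sources.get(0) is the hub.
     * If optional is true, a source with no match contributes null instead of
     * eliminating the hub solution.
     */
    static List<List<String>> mergeJoin(List<List<Row>> sources, boolean optional) {
        int n = sources.size();
        int[] pos = new int[n]; // one cursor per source; cursors only move forward
        List<List<String>> out = new ArrayList<>();
        List<Row> hub = sources.get(0);
        while (pos[0] < hub.size()) {
            int key = hub.get(pos[0]).key();
            List<List<String>> runs = new ArrayList<>();
            boolean allMatched = true;
            for (int s = 0; s < n; s++) {
                List<Row> src = sources.get(s);
                // Skip rows whose key is smaller than the current hub key.
                while (pos[s] < src.size() && src.get(pos[s]).key() < key) pos[s]++;
                // Collect the run of rows whose key equals the hub key.
                List<String> run = new ArrayList<>();
                while (pos[s] < src.size() && src.get(pos[s]).key() == key) {
                    run.add(src.get(pos[s]++).value());
                }
                if (run.isEmpty()) {
                    allMatched = false;
                    run.add(null); // used only in OPTIONAL mode: this source stays unbound
                }
                runs.add(run);
            }
            if (allMatched || optional) cross(runs, 0, new ArrayList<>(), out);
        }
        return out;
    }

    /** Emits the cross product of the matching runs, one output row per combination. */
    private static void cross(List<List<String>> runs, int i, List<String> acc, List<List<String>> out) {
        if (i == runs.size()) {
            out.add(new ArrayList<>(acc));
            return;
        }
        for (String v : runs.get(i)) {
            acc.add(v);
            cross(runs, i + 1, acc, out);
            acc.remove(acc.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<List<Row>> sources = List.of(
            List.of(new Row(1, "a1"), new Row(2, "a2"), new Row(3, "a3")), // hub
            List.of(new Row(1, "b1"), new Row(1, "b1x"), new Row(3, "b3")),
            List.of(new Row(1, "c1"), new Row(2, "c2"), new Row(3, "c3")));
        System.out.println(mergeJoin(sources, false)); // [[a1, b1, c1], [a1, b1x, c1], [a3, b3, c3]]
        System.out.println(mergeJoin(sources, true));  // [[a1, b1, c1], [a1, b1x, c1], [a2, null, c2], [a3, b3, c3]]
    }
}
```

Because every cursor only moves forward, the work is proportional to the total input size plus the number of emitted solutions.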
For the JVM Merge Join operator we have to do a SORT first, but the HTree
imposes a consistent ordering by the hash bits, so we can go directly to
the linear pass. (It is actually log linear due to the tree structure of
the HTree, which we presume has basically the same cost as a function of
depth as a B+Tree. However, the HTree is in main memory and the pass is a
sequential scan of the index, so it should be effectively linear.)
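The point about ordering can be illustrated with a comparator. A merge pass only needs every source to share one total order in which equal join keys compare equal; ordering by hash bits (compared as unsigned) and then by the key itself, to separate hash collisions, is such an order. In this sketch `String.hashCode()` stands in for the index's hash function and the explicit `List.sort` calls play the role of the JVM operator's SORT; it does not model the HTree's actual bit layout. `HASH_ORDER`, `hashBits` and `matchingKeys` are hypothetical names, and each list is assumed to hold distinct keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class HashOrderSketch {

    /** Hypothetical stand-in for the 32-bit hash code the index is keyed on. */
    static int hashBits(String key) {
        return key.hashCode();
    }

    /** Shared total order: unsigned hash bits first, then the key itself to break collisions. */
    static final Comparator<String> HASH_ORDER =
        Comparator.<String, Integer>comparing(HashOrderSketch::hashBits, Integer::compareUnsigned)
                  .thenComparing(Comparator.<String>naturalOrder());

    /** Linear pass over two key lists that are both sorted by the same comparator. */
    static List<String> matchingKeys(List<String> a, List<String> b, Comparator<String> order) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int c = order.compare(a.get(i), b.get(j));
            if (c < 0) i++;          // a is behind: advance a
            else if (c > 0) j++;     // b is behind: advance b
            else {                   // same key in both sources
                out.add(a.get(i));
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(List.of("z", "x", "y", "w"));
        List<String> b = new ArrayList<>(List.of("x", "q", "z"));
        // JVM variant: an explicit O(n log n) sort of each source before the merge.
        // A hash-ordered index would already deliver its entries in this order.
        a.sort(HASH_ORDER);
        b.sort(HASH_ORDER);
        System.out.println(matchingKeys(a, b, HASH_ORDER)); // [x, z]
    }
}
```

The hash order is meaningless to a human reader, but that does not matter: the merge only relies on both sides agreeing on it.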