See if we should actually SEEK or rather just SKIP to the next Cell (see HBASE-13109).
ScanQueryMatcher may issue SEEK hints, such as seek to next column, next row,
or seek to an arbitrary seek key. This method decides whether an actual seek is the most
efficient way to get us to the requested cell (a SEEK is more expensive than a few SKIPs
inside the current, already loaded block).
It does this by looking at the next indexed key of the current HFile. This key
is then compared with the _SEEK_ key, where a SEEK key is an artificial 'last possible key
on the row'. (In here we avoid actually creating a SEEK key; the compare works with the
current Cell but treats it as though it were a seek key; see
matcher.compareKeyForNextRow, etc.) If the compare shows that the seek would land us in the
next block we SEEK, otherwise we just SKIP to the next requested cell.
Other notes:
- Rows can straddle block boundaries
- Versions of columns can straddle block boundaries (i.e. column C1 at T1 might be in a
different block than column C1 at T2)
- We want to SKIP if the chance is high that we'll find the desired Cell after a
few SKIPs...
- We want to SEEK when the chance is high that we'll be able to seek
past many Cells, especially if we know we need to go to the next block.
A good proxy (best effort) to determine whether SKIP is better than SEEK is whether
we'll likely end up seeking to the next block (or past the next block) to get our next column.
Example:
|    BLOCK 1              |     BLOCK 2                   |
|  r1/c1, r1/c2, r1/c3    |    r1/c4, r1/c5, r2/c1        |
                               ^            ^
                               |            |
                        Next Index Key   SEEK_NEXT_ROW (before r2/c1)

|    BLOCK 1                       |     BLOCK 2                      |
|  r1/c1/t5, r1/c1/t4, r1/c1/t3    |    r1/c1/t2, r1/c1/t1, r1/c2/t3  |
                                        ^                  ^
                                        |                  |
                                 Next Index Key      SEEK_NEXT_COL
Now imagine we want columns c1 and c3 (see the first diagram above). Once we have r1/c3, a
SEEK_NEXT_ROW is issued. The 'Next Index Key', r1/c4, is > r1/c3 and still on row r1, so the
seek would land in the next block and we should SEEK to get to c1 on the next row, r2. In the
second case, say we only want one version of c1; after we have it, a SEEK_NEXT_COL will be
issued to get to c2. Looking at the 'Next Index Key', the seek would land us in the next block,
so we should SEEK. In other scenarios, where the SEEK would not land us in the next block, it
is very likely better to issue a series of SKIPs.
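
The decision described above can be sketched in a few lines. This is a simplified model, not
the actual StoreScanner or ScanQueryMatcher code: the class SeekOrSkipSketch, the Key type, the
Hint and Action enums, and the methods compareToSeekKey and decide are all hypothetical names
introduced for illustration. Keys are reduced to row and column strings, and timestamps are
ignored because the artificial seek key sorts after every version on the row (or column).

```java
// Hypothetical, simplified model of the SEEK-versus-SKIP decision.
// None of these types are HBase classes; they only illustrate the comparison.
final class SeekOrSkipSketch {

    /** Assumed simplified key: row and column only (timestamps ignored). */
    static final class Key {
        final String row;
        final String column;

        Key(String row, String column) {
            this.row = row;
            this.column = column;
        }
    }

    enum Hint { SEEK_NEXT_ROW, SEEK_NEXT_COL }

    enum Action { SEEK, SKIP }

    /**
     * Compares the next indexed key with the artificial "last possible key" on the
     * current cell's row (or row and column), without building that seek key.
     * A negative result means the next block starts before the seek target.
     */
    static int compareToSeekKey(Key nextIndexedKey, Key current, Hint hint) {
        int rowCmp = nextIndexedKey.row.compareTo(current.row);
        if (rowCmp != 0) {
            return rowCmp;
        }
        if (hint == Hint.SEEK_NEXT_ROW) {
            // Same row: anything on this row sorts before the last possible key on it.
            return -1;
        }
        int colCmp = nextIndexedKey.column.compareTo(current.column);
        // Same row and column: sorts before the last possible key on that column.
        return colCmp != 0 ? colCmp : -1;
    }

    /** SEEK if the target lies in (or past) the next block, otherwise SKIP. */
    static Action decide(Key nextIndexedKey, Key current, Hint hint) {
        return compareToSeekKey(nextIndexedKey, current, hint) < 0 ? Action.SEEK : Action.SKIP;
    }

    public static void main(String[] args) {
        // First diagram: at r1/c3, next block starts at r1/c4, hint is next row.
        System.out.println(decide(new Key("r1", "c4"), new Key("r1", "c3"), Hint.SEEK_NEXT_ROW));
        // Second diagram: at r1/c1, next block starts at r1/c1 (older version), hint is next column.
        System.out.println(decide(new Key("r1", "c1"), new Key("r1", "c1"), Hint.SEEK_NEXT_COL));
        // Counter-case: next block starts on r2, so the rest of r1 is in the loaded block.
        System.out.println(decide(new Key("r2", "c1"), new Key("r1", "c1"), Hint.SEEK_NEXT_ROW));
    }
}
```

Under these assumptions the two diagrams both yield SEEK, while the counter-case, where the next
block begins on a later row, yields SKIP because the requested cell must still be in the
current block.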