- <init>
- explodeRecordRDDWithFileComparisons
For each incoming record, produce N output records, one for each file against which the record's …
- loadInvolvedFiles
Load all involved files as a pair RDD.
- autoComputeParallelism
The index lookup can be skewed in three dimensions: #files, #partitions, and #records. To be able to …
- fetchRecordLocation
- findMatchingFilesForRecordKeys
Find the matching (record key, file) pairs. All work is grouped at the file level. Join PairRDD(PartitionPath, RecordKey) and PairRDD…
- lookupIndex
Look up the location for each record key and return the (record key, location) pair for all record keys already present, and …
- shouldCompareWithFile
If we don't have key ranges, we still need to compare against the file; there is no other choice. If we do, …
- tagLocation
- tagLocationBacktoRecords
Tag the (record key, file) pairs back to the original HoodieRecord RDD.
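The range check described under shouldCompareWithFile and the per-file fan-out described under explodeRecordRDDWithFileComparisons can be sketched with plain Java collections in place of Spark RDDs. This is a minimal sketch, not the indexed library's implementation: the `FileInfo` class, its optional `minKey`/`maxKey` fields, and the `RangePruningSketch` methods are hypothetical names, and key ranges are assumed to be compared lexicographically as strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RangePruningSketch {

    /** Hypothetical file descriptor; minKey/maxKey are null when no key range is known. */
    static final class FileInfo {
        final String fileId;
        final String minKey;
        final String maxKey;

        FileInfo(String fileId, String minKey, String maxKey) {
            this.fileId = fileId;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }

        boolean hasKeyRange() {
            return minKey != null && maxKey != null;
        }
    }

    /** Without a key range the file must always be compared; otherwise only if the key is in [minKey, maxKey]. */
    static boolean shouldCompareWithFile(FileInfo file, String recordKey) {
        if (!file.hasKeyRange()) {
            return true; // no range information: no other choice
        }
        return recordKey.compareTo(file.minKey) >= 0 && recordKey.compareTo(file.maxKey) <= 0;
    }

    /** Emits one (fileId, recordKey) comparison per file that survives range pruning. */
    static List<Map.Entry<String, String>> explodeWithFileComparisons(List<String> recordKeys,
                                                                      List<FileInfo> files) {
        List<Map.Entry<String, String>> comparisons = new ArrayList<>();
        for (String key : recordKeys) {
            for (FileInfo file : files) {
                if (shouldCompareWithFile(file, key)) {
                    comparisons.add(Map.entry(file.fileId, key));
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        List<FileInfo> files = List.of(
            new FileInfo("f1", "2021-01-01", "2021-01-31"),
            new FileInfo("f2", "2021-02-01", "2021-02-28"),
            new FileInfo("f3", null, null)); // no key range recorded
        for (Map.Entry<String, String> c :
                explodeWithFileComparisons(List.of("2021-01-15", "2021-02-10"), files)) {
            System.out.println(c.getKey() + " <- " + c.getValue());
        }
    }
}
```

With these inputs the January key is compared only against `f1` and `f3`, and the February key only against `f2` and `f3`: range pruning removes half of the six possible comparisons, while the file without a key range is always compared.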
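The join described under findMatchingFilesForRecordKeys and lookupIndex, where (partition path, record key) pairs are joined with (partition path, file) pairs, work is grouped per file, and only keys actually present are returned, can be sketched in plain Java. This is a hedged illustration under stated assumptions: `DataFile` and `LookupIndexSketch` are hypothetical names, Java maps stand in for pair RDDs, and each file's exact key set stands in for the bloom-filter check followed by verification against the file's real keys that a bloom index typically performs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LookupIndexSketch {

    /** Hypothetical data file: its partition, id, and the exact set of keys it contains. */
    static final class DataFile {
        final String partitionPath;
        final String fileId;
        final Set<String> keys; // stands in for bloom filter + key verification

        DataFile(String partitionPath, String fileId, Set<String> keys) {
            this.partitionPath = partitionPath;
            this.fileId = fileId;
            this.keys = keys;
        }
    }

    /** Returns recordKey -> fileId for keys found in some file; keys found nowhere are dropped. */
    static Map<String, String> lookupIndex(List<Map.Entry<String, String>> partitionAndKeys,
                                           List<DataFile> files) {
        // (partitionPath, file) pairs, indexed by partition path.
        Map<String, List<DataFile>> filesByPartition = new HashMap<>();
        for (DataFile f : files) {
            filesByPartition.computeIfAbsent(f.partitionPath, p -> new ArrayList<>()).add(f);
        }
        // Join on partition path and group the candidate keys by file.
        Map<DataFile, List<String>> candidatesByFile = new LinkedHashMap<>();
        for (Map.Entry<String, String> pk : partitionAndKeys) {
            for (DataFile f : filesByPartition.getOrDefault(pk.getKey(), List.of())) {
                candidatesByFile.computeIfAbsent(f, x -> new ArrayList<>()).add(pk.getValue());
            }
        }
        // Per file, keep only the candidate keys the file really contains.
        Map<String, String> locations = new HashMap<>();
        for (Map.Entry<DataFile, List<String>> e : candidatesByFile.entrySet()) {
            for (String key : e.getValue()) {
                if (e.getKey().keys.contains(key)) {
                    locations.put(key, e.getKey().fileId);
                }
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        List<DataFile> files = List.of(
            new DataFile("2021/01", "f1", Set.of("a", "b")),
            new DataFile("2021/02", "f2", Set.of("c")));
        List<Map.Entry<String, String>> input = List.of(
            Map.entry("2021/01", "a"), Map.entry("2021/02", "c"), Map.entry("2021/02", "z"));
        // expected: a maps to f1, c maps to f2, z is absent
        System.out.println(lookupIndex(input, files));
    }
}
```

Grouping by file before checking keys mirrors the "all work grouped at the file level" idea: each file's filter and keys would be loaded once per group rather than once per record.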
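Tagging the lookup result back onto the incoming records, as described under tagLocationBacktoRecords, amounts to a left outer join: every incoming record is kept, and it receives a location only if the lookup found one. The sketch below is an assumption-laden illustration, not the HoodieRecord API: `Record` is a hypothetical stand-in for HoodieRecord, and treating a null location as "new record (insert)" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TagLocationSketch {

    /** Hypothetical stand-in for HoodieRecord: a key and an optional location. */
    static final class Record {
        final String key;
        final String location; // null: key not found by the lookup (assumed insert)

        Record(String key, String location) {
            this.key = key;
            this.location = location;
        }

        @Override
        public String toString() {
            return key + "@" + (location == null ? "new" : location);
        }
    }

    /** Left-outer-joins incoming keys with the key -> fileId lookup result. */
    static List<Record> tagLocationBackToRecords(List<String> incomingKeys,
                                                 Map<String, String> keyToFile) {
        List<Record> tagged = new ArrayList<>();
        for (String key : incomingKeys) {
            // Keys missing from the lookup map get a null location but are not dropped.
            tagged.add(new Record(key, keyToFile.get(key)));
        }
        return tagged;
    }

    public static void main(String[] args) {
        // prints [a@f1, z@new]
        System.out.println(tagLocationBackToRecords(List.of("a", "z"), Map.of("a", "f1")));
    }
}
```

A left outer join (rather than an inner join) matters here: dropping untagged records would silently lose the new data that the write still has to insert.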
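The description of autoComputeParallelism is truncated, so the following is only a hypothetical heuristic for sizing the lookup join, not the library's formula. It assumes the per-file comparison counts are known (for example from the range-pruned fan-out) and picks enough tasks that no task exceeds an assumed per-task comparison budget, never going below the input parallelism. `ParallelismSketch`, `computeParallelism`, and `maxComparisonsPerTask` are illustrative names.

```java
import java.util.Map;

public class ParallelismSketch {

    /**
     * Hypothetical heuristic: enough tasks that none handles more than
     * maxComparisonsPerTask (file, key) comparisons, and at least inputParallelism.
     */
    static int computeParallelism(Map<String, Long> comparisonsPerFile,
                                  int inputParallelism,
                                  long maxComparisonsPerTask) {
        if (maxComparisonsPerTask <= 0) {
            throw new IllegalArgumentException("maxComparisonsPerTask must be positive");
        }
        long total = 0;
        for (long c : comparisonsPerFile.values()) {
            total += c;
        }
        // Ceiling division: number of tasks needed to stay within the per-task budget.
        long needed = (total + maxComparisonsPerTask - 1) / maxComparisonsPerTask;
        return (int) Math.max(inputParallelism, Math.max(1L, needed));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = Map.of("f1", 900_000L, "f2", 50_000L, "f3", 50_000L);
        // 1,000,000 comparisons at 100,000 per task: prints 10
        System.out.println(computeParallelism(counts, 4, 100_000L));
    }
}
```

Summing comparisons across files accounts for the #files and #records dimensions, but a single hot file (like `f1` above) would still land in one task unless its comparisons are also split across tasks, which this sketch does not attempt.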