@Override public List<ByteArray> reEncodeDictionary(List<ByteArray> value, MeasureDesc measureDesc, Map<TblColRef, Dictionary<String>> oldDicts, Map<TblColRef, Dictionary<String>> newDicts) { TblColRef colRef = getRawColumn(measureDesc.getFunction()); Dictionary<String> sourceDict = oldDicts.get(colRef); Dictionary<String> mergedDict = newDicts.get(colRef); int valueSize = value.size(); byte[] newIdBuf = new byte[valueSize * mergedDict.getSizeOfId()]; int bufOffset = 0; for (ByteArray c : value) { int oldId = BytesUtil.readUnsigned(c.array(), c.offset(), c.length()); int newId; String v = sourceDict.getValueFromId(oldId); if (v == null) { newId = mergedDict.nullId(); } else { newId = mergedDict.getIdFromValue(v); } BytesUtil.writeUnsigned(newId, newIdBuf, bufOffset, mergedDict.getSizeOfId()); c.reset(newIdBuf, bufOffset, mergedDict.getSizeOfId()); bufOffset += mergedDict.getSizeOfId(); } return value; } };
public List<T> enumeratorValues() { List<T> ret = Lists.newArrayListWithExpectedSize(getSize()); for (int i = getMinId(); i <= getMaxId(); i++) { ret.add(getValueFromId(i)); } return ret; }
public static void serialize(Dictionary<?> dict, OutputStream outputStream) { try { DataOutputStream out = new DataOutputStream(outputStream); out.writeUTF(dict.getClass().getName()); dict.write(out); out.flush(); } catch (IOException e) { throw new RuntimeException(e); } }
public ShrunkenDictionary<T> build(ValueSerializer<T> valueSerializer) { return new ShrunkenDictionary<>(valueSerializer, fullDict.getMinId(), fullDict.getMaxId(), fullDict.getSizeOfId(), fullDict.getSizeOfValue(), valueToIdMap); } }
private TupleFilter translateFunctionTupleFilter(BuiltInFunctionTupleFilter builtInFunctionTupleFilter) { if (!builtInFunctionTupleFilter.isValid()) return null; TblColRef columnRef = builtInFunctionTupleFilter.getColumn(); Dictionary<?> dict = dimEncMap.getDictionary(columnRef); if (dict == null) return null; CompareTupleFilter translated = new CompareTupleFilter(builtInFunctionTupleFilter.isReversed() ? FilterOperatorEnum.NOTIN : FilterOperatorEnum.IN); translated.addChild(new ColumnTupleFilter(columnRef)); try { int translatedInClauseMaxSize = KylinConfig.getInstanceFromEnv().getTranslatedInClauseMaxSize(); for (int i = dict.getMinId(); i <= dict.getMaxId(); i++) { Object dictVal = dict.getValueFromId(i); if ((Boolean) builtInFunctionTupleFilter.invokeFunction(dictVal)) { translated.addChild(new ConstantTupleFilter(dictVal)); if (translated.getChildren().size() > translatedInClauseMaxSize) { return null; } } } logger.debug("Translated to an IN clause with {} children", translated.getChildren().size()); } catch (Exception e) { logger.debug("Failed to translate function filter to an IN clause", e); return null; } return translated; }
@Override public void serialize(Object value, ByteBuffer buf) { int id = dict.getIdFromValue(value == null ? null : value.toString(), roundingFlag); BytesUtil.writeUnsigned(id, dict.getSizeOfId(), buf); }
Dictionary<?> mergedDict = dictMgr.getDictionary(mergedCubeSegment.getDictResPath(col)); while (sourceDict.getSizeOfValue() > newKeyBuf.length - bufOffset || mergedDict.getSizeOfValue() > newKeyBuf.length - bufOffset) { byte[] oldBuf = newKeyBuf; newKeyBuf = new byte[2 * newKeyBuf.length]; int idInMergedDict; int size = sourceDict.getValueBytesFromId(idInSourceDict, newKeyBuf, bufOffset); if (size < 0) { idInMergedDict = mergedDict.nullId(); } else { idInMergedDict = mergedDict.getIdFromValueBytes(newKeyBuf, bufOffset, size); BytesUtil.writeUnsigned(idInMergedDict, newKeyBuf, bufOffset, mergedDict.getSizeOfId()); bufOffset += mergedDict.getSizeOfId(); } else {
public boolean isNullId(int id) { int nullId = NULL_ID[getSizeOfId()]; return (nullId & id) == nullId; }
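// A minimal sketch of the mask test used by isNullId() above. The contents of NULL_ID are not shown in this listing, so the table below is an assumption: index = ID width in bytes, value = all bits set for that width. The class name NullIdSketch and the extra sizeOfId parameter are hypothetical, introduced only to keep the example self-contained.

```java
final class NullIdSketch {
    // ASSUMPTION: one all-ones pattern per ID width (0..4 bytes); not taken from the source.
    static final int[] NULL_ID = { 0, 0xff, 0xffff, 0xffffff, 0xffffffff };

    // Same test as isNullId(int) above, with the ID width passed in explicitly.
    static boolean isNullId(int id, int sizeOfId) {
        int nullId = NULL_ID[sizeOfId];
        // True only when every bit of the width-specific null pattern is set in id.
        return (nullId & id) == nullId;
    }

    public static void main(String[] args) {
        System.out.println(isNullId(0xff, 1));   // all 8 bits set for a 1-byte ID
        System.out.println(isNullId(0x7f, 1));   // high bit missing
        System.out.println(isNullId(0xff, 2));   // only the low byte set for a 2-byte ID
    }
}
```

// Under that assumption, an ID counts as null only when all bits of its declared width are set, so the same integer can be null for one width and a regular ID for a wider one.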
/** * Convenience form of <code>getIdFromValue(value, 0)</code> */ public final int getIdFromValue(T value) throws IllegalArgumentException { return getIdFromValue(value, 0); }
@Override public Object deserialize(ByteBuffer in) { int id = BytesUtil.readUnsigned(in, dict.getSizeOfId()); return dict.getValueFromId(id); }
buf.append(", "); buf.append(s.toString()).append("=>").append(dict.getIdFromValue(s)); logger.debug("Dictionary cardinality: " + dict.getSize()); logger.debug("Dictionary builder class: " + builder.getClass().getName()); logger.debug("Dictionary class: " + dict.getClass().getName());
DataOutputStream dos = new DataOutputStream(bos); shrunkenDict.write(dos); dShrunkenDict.readFields(dis); Assert.assertEquals(dict.getIdFromValue(value), dShrunkenDict.getIdFromValue(value));
dict.readFields(in); int[] rowIndex = new int[n]; for (int i = 0; i < n; i++) { rowIndex[i] = dict.getIdFromValue(row[i]);
@Override public String[] getRow() { int[] rowIndex = rowIndices.get(i); String[] row = new String[rowIndex.length]; for (int x = 0; x < row.length; x++) { row[x] = dict.getValueFromId(rowIndex[x]); } return row; }
private void saveDictionaryInfo(CubeSegment cubeSeg, TblColRef col, DictionaryInfo dictInfo) throws IOException { if (dictInfo == null) return; /* work on a copy instead of cached objects */ CubeInstance cubeCopy = cubeSeg.getCubeInstance().latestCopyForWrite(); /* get the latest copy */ CubeSegment segCopy = cubeCopy.getSegmentById(cubeSeg.getUuid()); Dictionary<?> dict = dictInfo.getDictionaryObject(); segCopy.putDictResPath(col, dictInfo.getResourcePath()); segCopy.getRowkeyStats().add(new Object[] { col.getIdentity(), dict.getSize(), dict.getSizeOfId() }); CubeUpdate update = new CubeUpdate(cubeCopy); update.setToUpdateSegs(segCopy); updateCube(update); }
/** * Returns the integer ID of the given value. If the value is not found: * <p> * - if roundingFlag=0, throws IllegalArgumentException; <br> * - if roundingFlag&lt;0, returns the closest smaller ID, if one exists; <br> * - if roundingFlag&gt;0, returns the closest bigger ID, if one exists. <br> * <p> * The implementation often has a cache, so it is usually faster than the byte[] version, getIdFromValueBytes(). * * @throws IllegalArgumentException * if the value is not found in the dictionary and rounding is off; * or if rounding cannot find a smaller or bigger ID */ public final int getIdFromValue(T value, int roundingFlag) throws IllegalArgumentException { if (isNullObjectForm(value)) return nullId(); int id = getIdFromValueImpl(value, roundingFlag); if (id == -1) { throw new IllegalArgumentException("Value '" + value + "' does not exist"); } return id; }
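// A minimal sketch of the roundingFlag contract documented above, assuming a dictionary whose IDs are simply positions in a sorted array of values. The class RoundingLookupSketch and its binary-search lookup are hypothetical and stand in for getIdFromValueImpl(), whose real implementation is not shown in this listing.

```java
import java.util.Arrays;

final class RoundingLookupSketch {
    private final String[] sortedValues;

    RoundingLookupSketch(String... values) {
        sortedValues = values.clone();
        Arrays.sort(sortedValues); // ASSUMPTION: ID == index in sorted order
    }

    int getIdFromValue(String value, int roundingFlag) {
        int pos = Arrays.binarySearch(sortedValues, value);
        if (pos >= 0) {
            return pos; // exact match, rounding is irrelevant
        }
        int insertion = -(pos + 1); // index of the first element greater than value
        int id;
        if (roundingFlag < 0) {
            id = insertion - 1;     // closest smaller value, if any
        } else if (roundingFlag > 0) {
            id = insertion;         // closest bigger value, if any
        } else {
            id = -1;                // rounding is off
        }
        if (id < 0 || id >= sortedValues.length) {
            throw new IllegalArgumentException("Value '" + value + "' does not exist");
        }
        return id;
    }

    public static void main(String[] args) {
        RoundingLookupSketch dict = new RoundingLookupSketch("apple", "cherry", "mango");
        System.out.println(dict.getIdFromValue("cherry", 0));  // exact match
        System.out.println(dict.getIdFromValue("banana", -1)); // rounds down to "apple"
        System.out.println(dict.getIdFromValue("banana", 1));  // rounds up to "cherry"
    }
}
```

// In this sketch a miss with roundingFlag=0, or a rounded lookup that falls off either end of the array, raises IllegalArgumentException, mirroring the @throws clause above.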
void writeData(DataOutput out) throws IOException { out.writeInt(rowIndices.size()); if (rowIndices.size() > 0) { int n = rowIndices.get(0).length; out.writeInt(n); if (this.useDictionary) { dict.write(out); for (int i = 0; i < rowIndices.size(); i++) { int[] row = rowIndices.get(i); for (int j = 0; j < n; j++) { out.writeInt(row[j]); } } } else { for (int i = 0; i < rowIndices.size(); i++) { int[] row = rowIndices.get(i); for (int j = 0; j < n; j++) { /* NULL_STR is tricky, but we don't want to break the current snapshots */ String v = dict.getValueFromId(row[j]); out.writeUTF(v == null ? NULL_STR : v); } } } } }
@Override public boolean moveNext() throws IOException { String minValue = null; int curDictIndex = 0; /* multi-merge dictionary forest */ for (int i = 0; i < dictionaryList.size(); i++) { Dictionary<String> dict = dictionaryList.get(i); if (dict == null) continue; int curKey = curKeys.get(i); if (curKey > dict.getMaxId()) continue; String curValue = dict.getValueFromId(curKey); if (minValue == null || dataType.compare(minValue, curValue) > 0) { minValue = curValue; curDictIndex = i; } } if (minValue == null) { curValue = null; return false; } curValue = minValue; curKeys.set(curDictIndex, curKeys.get(curDictIndex) + 1); return true; }
@Override public boolean moveNext() throws IOException { if (curDictIndex < dictionaryList.size() && curKey <= curDict.getMaxId()) { byte[] buffer = new byte[curDict.getSizeOfValue()]; int size = curDict.getValueBytesFromId(curKey, buffer, 0); curValue = Bytes.copy(buffer, 0, size); if (++curKey > curDict.getMaxId()) { if (++curDictIndex < dictionaryList.size()) { curDict = dictionaryList.get(curDictIndex); curKey = curDict.getMinId(); } } return true; } curValue = null; return false; }