@Override
public AnalyzingSuggester getSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer,
        ShardSuggestService.FieldType fieldType) throws Exception {
    AnalyzingSuggester analyzingSuggester = new AnalyzingSuggester(indexAnalyzer, queryAnalyzer,
            AnalyzingSuggester.EXACT_FIRST, 256, -1, fieldType.preservePositionIncrements());
    analyzingSuggester.build(dictCache.getUnchecked(fieldType.field()));
    return analyzingSuggester;
}
}
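The `dictCache.getUnchecked(...)` call above implies a Guava `LoadingCache` keyed by field name, so each field's dictionary is built once and reused. A minimal stdlib sketch of that memoization pattern, using `ConcurrentHashMap.computeIfAbsent` (the class and names here are illustrative stand-ins, not the plugin's code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the Guava LoadingCache behind dictCache:
// computeIfAbsent builds the value once per key and reuses it afterwards.
class FieldCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    FieldCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Analogous to LoadingCache.getUnchecked(key).
    V getUnchecked(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        int[] loads = {0};
        FieldCache<String, String> cache = new FieldCache<>(field -> {
            loads[0]++;
            return "dictionary-for-" + field;
        });
        String a = cache.getUnchecked("title");
        String b = cache.getUnchecked("title");
        System.out.println(a.equals(b) && loads[0] == 1); // prints "true"
    }
}
```

The trade-off mirrors the plugin's design: suggester construction (building an FST) is expensive, so it is paid once per field rather than per request.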
final Automaton toAutomaton(final BytesRef surfaceForm, final TokenStreamToAutomaton ts2a) throws IOException {
    // Analyze surface form:
    Automaton automaton;
    try (TokenStream ts = indexAnalyzer.tokenStream("", surfaceForm.utf8ToString())) {
        // Create corresponding automaton: labels are bytes
        // from each analyzed token, with byte 0 used as
        // separator between tokens:
        automaton = ts2a.toAutomaton(ts);
    }

    automaton = replaceSep(automaton);
    automaton = convertAutomaton(automaton);

    // TODO: LUCENE-5660 re-enable this once we disallow massive suggestion strings
    // assert SpecialOperations.isFinite(automaton);

    // Get all paths from the automaton (there can be
    // more than one path, eg if the analyzer created a
    // graph using SynFilter or WDF):
    return automaton;
}
final Automaton toLookupAutomaton(final CharSequence key) throws IOException {
    // TODO: is there a Reader from a CharSequence?
    // Turn tokenstream into automaton:
    Automaton automaton = null;
    try (TokenStream ts = queryAnalyzer.tokenStream("", key.toString())) {
        automaton = getTokenStreamToAutomaton().toAutomaton(ts);
    }

    automaton = replaceSep(automaton);

    // TODO: we can optimize this somewhat by determinizing
    // while we convert
    automaton = Operations.determinize(automaton, DEFAULT_MAX_DETERMINIZED_STATES);
    return automaton;
}
// Excerpt from lookup(): intersect the analyzed key with the suggest FST.
Automaton lookupAutomaton = toLookupAutomaton(key);
List<FSTUtil.Path<Pair<Long,BytesRef>>> prefixPaths =
        FSTUtil.intersectPrefixPaths(convertAutomaton(lookupAutomaton), fst);

// EXACT_FIRST pass: emit the completion whose surface form matches the key verbatim.
if (sameSurfaceForm(utf8Key, output2)) {
    results.add(getLookupResult(completion.output.output1, output2, spare));
    break;
}

// Otherwise expand to the full prefix paths and build results from each completion.
prefixPaths = getFullPrefixPaths(prefixPaths, lookupAutomaton, fst);
LookupResult result = getLookupResult(completion.output.output1, completion.output.output2, spare);
// Excerpt from build(): enumerate the analyzed forms of each surface form
// (bounded by maxGraphExpansions) and write one weighted entry per finite path.
BytesRefBuilder scratch = new BytesRefBuilder();
TokenStreamToAutomaton ts2a = getTokenStreamToAutomaton();
LimitedFiniteStringsIterator finiteStrings =
        new LimitedFiniteStringsIterator(toAutomaton(surfaceForm, ts2a), maxGraphExpansions);
output.writeInt(encodeWeight(iterator.weight()));
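The `encodeWeight` call above stores each suggestion's weight as an FST cost. Because the FST surfaces minimal costs first, AnalyzingSuggester inverts the weight (`Integer.MAX_VALUE - weight`) so that higher-weight suggestions sort first. A self-contained sketch of that transform:

```java
// Sketch of the weight <-> cost transform: the FST returns minimal costs
// first, so a large weight must map to a small cost.
class WeightCodec {
    static int encodeWeight(long value) {
        if (value < 0 || value > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException("cannot encode value: " + value);
        }
        return Integer.MAX_VALUE - (int) value;
    }

    static long decodeWeight(long encoded) {
        return Integer.MAX_VALUE - encoded;
    }

    public static void main(String[] args) {
        // Higher weight -> lower cost -> surfaces earlier in lookup.
        System.out.println(encodeWeight(100) < encodeWeight(5)); // prints "true"
        System.out.println(decodeWeight(encodeWeight(42)));      // prints "42"
    }
}
```

This is why `getLookupResult` below calls `decodeWeight(output1)`: the FST output is the cost, not the weight itself.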
@Override
TokenStreamToAutomaton getTokenStreamToAutomaton() {
    final TokenStreamToAutomaton tsta = super.getTokenStreamToAutomaton();
    tsta.setUnicodeArcs(unicodeAware);
    return tsta;
}
public ShardSuggestStatisticsResponse getStatistics() {
    ShardSuggestStatisticsResponse shardSuggestStatisticsResponse = new ShardSuggestStatisticsResponse(shardId());

    for (FieldType fieldType : analyzingSuggesterCache.asMap().keySet()) {
        long sizeInBytes = analyzingSuggesterCache.getIfPresent(fieldType).ramBytesUsed();
        FstStats.FstIndexShardStats fstIndexShardStats =
                new FstStats.FstIndexShardStats(shardId, "analyzingsuggester", fieldType, sizeInBytes);
        shardSuggestStatisticsResponse.getFstIndexShardStats().add(fstIndexShardStats);
    }

    for (FieldType fieldType : fuzzySuggesterCache.asMap().keySet()) {
        long sizeInBytes = fuzzySuggesterCache.getIfPresent(fieldType).ramBytesUsed();
        FstStats.FstIndexShardStats fstIndexShardStats =
                new FstStats.FstIndexShardStats(shardId, "fuzzysuggester", fieldType, sizeInBytes);
        shardSuggestStatisticsResponse.getFstIndexShardStats().add(fstIndexShardStats);
    }

    return shardSuggestStatisticsResponse;
}
private LookupResult getLookupResult(Long output1, BytesRef output2, CharsRefBuilder spare) {
    LookupResult result;
    if (hasPayloads) {
        int sepIndex = -1;
        for (int i = 0; i < output2.length; i++) {
            if (output2.bytes[output2.offset + i] == PAYLOAD_SEP) {
                sepIndex = i;
                break;
            }
        }
        assert sepIndex != -1;
        spare.grow(sepIndex);
        final int payloadLen = output2.length - sepIndex - 1;
        spare.copyUTF8Bytes(output2.bytes, output2.offset, sepIndex);
        BytesRef payload = new BytesRef(payloadLen);
        // Include output2.offset so the copy stays correct even when offset != 0,
        // matching the offset-aware reads above.
        System.arraycopy(output2.bytes, output2.offset + sepIndex + 1, payload.bytes, 0, payloadLen);
        payload.length = payloadLen;
        result = new LookupResult(spare.toString(), decodeWeight(output1), payload);
    } else {
        spare.grow(output2.length);
        spare.copyUTF8Bytes(output2);
        result = new LookupResult(spare.toString(), decodeWeight(output1));
    }
    return result;
}
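The payload branch above decodes the `surface|PAYLOAD_SEP|payload` byte layout that `build()` writes into the FST outputs. A self-contained sketch of that byte-level split, assuming the separator value `0x1f` used by AnalyzingSuggester (the `split` helper itself is illustrative, not the plugin's code):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the surface|SEP|payload layout decoded by getLookupResult:
// everything before the first PAYLOAD_SEP byte is the UTF-8 surface form,
// everything after it is the opaque payload.
class PayloadSplit {
    static final byte PAYLOAD_SEP = 0x1f; // separator byte, as in AnalyzingSuggester

    static String[] split(byte[] encoded) {
        int sepIndex = -1;
        for (int i = 0; i < encoded.length; i++) {
            if (encoded[i] == PAYLOAD_SEP) {
                sepIndex = i;
                break;
            }
        }
        if (sepIndex == -1) {
            throw new IllegalArgumentException("no separator byte found");
        }
        String surface = new String(encoded, 0, sepIndex, StandardCharsets.UTF_8);
        String payload = new String(encoded, sepIndex + 1, encoded.length - sepIndex - 1,
                StandardCharsets.UTF_8);
        return new String[] { surface, payload };
    }

    public static void main(String[] args) {
        byte[] encoded = "wi-fi\u001fsome-payload".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(split(encoded))); // prints "[wi-fi, some-payload]"
    }
}
```

Using a reserved control byte as the separator is why the suggester must reject surface forms containing `0x1f`; otherwise the split would be ambiguous.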
private Collection<String> getSuggestions(ShardSuggestRequest shardSuggestRequest) throws IOException {
    List<LookupResult> lookupResults = Lists.newArrayList();

    if ("full".equals(shardSuggestRequest.suggestType())) {
        AnalyzingSuggester analyzingSuggester =
                analyzingSuggesterCache.getUnchecked(new FieldType(shardSuggestRequest));
        lookupResults.addAll(analyzingSuggester.lookup(shardSuggestRequest.term(), false,
                shardSuggestRequest.size()));
    } else if ("fuzzy".equals(shardSuggestRequest.suggestType())) {
        lookupResults.addAll(fuzzySuggesterCache.getUnchecked(new FieldType(shardSuggestRequest))
                .lookup(shardSuggestRequest.term(), false, shardSuggestRequest.size()));
    } else {
        lookupResults.addAll(lookupCache.getUnchecked(shardSuggestRequest.field())
                .lookup(shardSuggestRequest.term(), true, shardSuggestRequest.size() + 1));

        Collection<String> suggestions = Collections2.transform(lookupResults, new LookupResultToStringFunction());

        float similarity = shardSuggestRequest.similarity();
        if (similarity < 1.0f && suggestions.size() < shardSuggestRequest.size()) {
            suggestions = Lists.newArrayList(suggestions);
            suggestions.addAll(getSimilarSuggestions(shardSuggestRequest));
        }

        return suggestions;
    }

    return Collections2.transform(lookupResults, new LookupResultToStringFunction());
}
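The default branch above tops up sparse prefix results with similar suggestions when `similarity` is below 1.0. A minimal stdlib sketch of that merge step (the `merge` method is an illustrative stand-in; `getSimilarSuggestions` and the request fields are taken as given):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the fallback merge in getSuggestions(): keep the prefix matches,
// and append similar suggestions only when the result set is sparse and a
// fuzzy fallback is enabled (similarity below 1.0).
class SuggestionMerge {
    static List<String> merge(List<String> prefixMatches, List<String> similar,
            float similarity, int requestedSize) {
        List<String> out = new ArrayList<>(prefixMatches);
        if (similarity < 1.0f && prefixMatches.size() < requestedSize) {
            out.addAll(similar);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> prefix = List.of("lucene");
        List<String> similar = List.of("lucerne");
        // Sparse results plus fuzzy fallback: similar terms are appended.
        System.out.println(merge(prefix, similar, 0.8f, 5)); // prints "[lucene, lucerne]"
        // similarity == 1.0 disables the fallback entirely.
        System.out.println(merge(prefix, similar, 1.0f, 5)); // prints "[lucene]"
    }
}
```

The guard keeps the cheap prefix lookup on the hot path; the more expensive similar-terms lookup only runs when it can actually add value.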