/**
 * Builds an n-gram filter over {@code input} using this factory's configured
 * min/max gram sizes and preserve-original flag.
 */
@Override
public TokenFilter create(TokenStream input) {
    final NGramTokenFilter grams =
        new NGramTokenFilter(input, minGramSize, maxGramSize, preserveOriginal);
    return grams;
  }
}
// NOTE(review): fragment — the enclosing method (likely incrementToken()) is not
// visible here, and the trailing `if (hasIllegalOffsets) {` is truncated.
// Visible behavior: reset all token attributes, then copy the current n-gram
// (curGramSize chars starting at curPos of curTermBuffer) into the term attribute.
// Presumably hasIllegalOffsets guards an alternate offset computation — TODO confirm
// against the full method.
clearAttributes(); termAtt.copyBuffer(curTermBuffer, curPos, curGramSize); if (hasIllegalOffsets) {
// NOTE(review): jumbled statement residue from one or more incrementToken()
// implementations — the enclosing control flow is not visible, so statement
// order here cannot be trusted. Visible operations: capture/restore of stream
// state; codepoint-aware start/end computation via Character.offsetByCodePoints
// over curTermBuffer; a zero position increment (token at the same position as
// the previous one); and copying the whole original term into the term
// attribute. Reconstruct from the full file before editing — TODO confirm.
return false; state = captureState(); restoreState(state); final int start = Character.offsetByCodePoints(curTermBuffer, 0, curTermLength, 0, curPos); final int end = Character.offsetByCodePoints(curTermBuffer, 0, curTermLength, start, curGramSize); restoreState(state); posIncrAtt.setPositionIncrement(0); termAtt.copyBuffer(curTermBuffer, 0, curTermLength);
/** * Creates NGramTokenFilter with given min and max n-grams. * @param input {@link TokenStream} holding the input to be tokenized * @param minGram the smallest n-gram to generate * @param maxGram the largest n-gram to generate */ public NGramTokenFilter(TokenStream input, int minGram, int maxGram) { super(new CodepointCountFilter(input, minGram, Integer.MAX_VALUE)); this.charUtils = CharacterUtils.getInstance(); if (minGram < 1) { throw new IllegalArgumentException("minGram must be greater than zero"); } if (minGram > maxGram) { throw new IllegalArgumentException("minGram must not be greater than maxGram"); } this.minGram = minGram; this.maxGram = maxGram; posIncAtt = addAttribute(PositionIncrementAttribute.class); posLenAtt = addAttribute(PositionLengthAttribute.class); }
// NOTE(review): fragment from inside a loop of incrementToken() — the enclosing
// method is not visible. Resets token attributes, then computes the start/end
// character indices of the current n-gram in codepoint terms (so surrogate
// pairs count as one unit) via the filter's CharacterUtils instance.
clearAttributes(); final int start = charUtils.offsetByCodePoints(curTermBuffer, 0, curTermLength, 0, curPos); final int end = charUtils.offsetByCodePoints(curTermBuffer, 0, curTermLength, start, curGramSize);
/**
 * Wraps the analysis chain so the existing tokenizer feeds an
 * {@link NGramTokenFilter} configured with this wrapper's min/max sizes.
 */
@Override
protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
    // Keep the original tokenizer; only the downstream token stream changes.
    final TokenStream grams =
        new NGramTokenFilter(components.getTokenStream(), this.min, this.max);
    return new TokenStreamComponents(components.getTokenizer(), grams);
  }
}
/** Wraps {@code tokenStream} in an {@link NGramTokenFilter} with the configured gram bounds. */
@Override
public TokenStream create(TokenStream tokenStream) {
    final TokenStream filtered = new NGramTokenFilter(tokenStream, minGram, maxGram);
    return filtered;
  }
}
/**
 * Builds a default-configured {@link NGramTokenFilter}; the {@code version}
 * argument is intentionally unused by this variant.
 */
@Override
public TokenStream create(TokenStream tokenStream, Version version) {
    final TokenStream result = new NGramTokenFilter(tokenStream);
    return result;
  }
},
/**
 * Creates an {@link NGramTokenFilter} over {@code input} using this factory's
 * configured gram sizes. (Covariant return: callers get the concrete type.)
 */
public NGramTokenFilter create(TokenStream input) {
    final NGramTokenFilter filter = new NGramTokenFilter(input, minGramSize, maxGramSize);
    return filter;
  }
}
/** Wraps {@code tokenStream} in an {@link NGramTokenFilter} using the configured bounds. */
@Override
public TokenStream create(TokenStream tokenStream) {
  final TokenStream ngrams = new NGramTokenFilter(tokenStream, minGram, maxGram);
  return ngrams;
}
/**
 * Creates the n-gram filter appropriate for the configured Lucene match
 * version: the legacy 4.3 implementation for old indexes, the current one
 * otherwise.
 */
@Override
public TokenFilter create(TokenStream input) {
  // Guard clause: pre-4.4 analysis must stay byte-compatible with old indexes.
  if (!luceneMatchVersion.onOrAfter(Version.LUCENE_4_4_0)) {
    return new Lucene43NGramTokenFilter(input, minGramSize, maxGramSize);
  }
  return new NGramTokenFilter(input, minGramSize, maxGramSize);
  }
}
// NOTE(review): fragment — the opening of the deprecation message and the
// surrounding lambda/registration call are outside this view. Visible behavior:
// finish a deprecation message steering users to the [ngram] name while still
// returning a plain NGramTokenFilter over `reader`, then register the
// pre-configured "persian_normalization" token filter.
+ "Please change the filter name to [ngram] instead."); return new NGramTokenFilter(reader); })); filters.add(PreConfiguredTokenFilter.singleton("persian_normalization", true, PersianNormalizationFilter::new));