How to use PatternTokenizer in org.apache.solr.analysis

Best Java code snippets using org.apache.solr.analysis.PatternTokenizer (Showing top 4 results out of 315)

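 // Presumably a method of an anonymous Analyzer subclass; the enclosing "new Analyzer() {" is not shown.
 @Override
 public TokenStream tokenStream(String fieldName, Reader reader) {
  try {
   // group 0: each whole regex match becomes one token
   return new PatternTokenizer(reader, pattern, 0);
  } catch (IOException e) {
   // The constructor declares IOException; rethrow instead of silently returning null
   throw new RuntimeException(e);
  }
 }
};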

@Override
public void end() throws IOException {
 // At end of stream, set both offsets to the corrected length of the whole input
 final int ofs = correctOffset(str.length());
 offsetAtt.setOffset(ofs, ofs);
}

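 // group >= 0: emit the selected group of the current match as a token
 termAtt.setTermBuffer(match);
 index = matcher.start(group);
 offsetAtt.setOffset(correctOffset(index), correctOffset(matcher.end(group)));
 return true;

 // group == -1 (split mode): emit the text between the previous match and this one
 offsetAtt.setOffset(correctOffset(index), correctOffset(matcher.start()));
 index = matcher.end();
 return true;

 // split mode, after the last match: the remaining text becomes the final token
 offsetAtt.setOffset(correctOffset(index), correctOffset(str.length()));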
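The fragments above compute token text and offsets in two modes. The following is a minimal, stdlib-only sketch of that logic, not the Solr source: it assumes correctOffset is the identity (as when no CharFilter adjusts offsets), and the class PatternTokenSketch and its Token record are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch (hypothetical, not Solr code) of the matching loop.
// Offsets are returned unchanged, i.e. correctOffset is assumed to be the identity.
final class PatternTokenSketch {

    // Hypothetical token holder: text plus [start, end) character offsets.
    record Token(String text, int start, int end) { }

    static List<Token> tokenize(String str, Pattern pattern, int group) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = pattern.matcher(str);
        if (group >= 0) {
            // Group mode: each non-empty match of the selected group is a token.
            while (matcher.find()) {
                String match = matcher.group(group);
                if (match == null || match.isEmpty()) continue; // skip zero-length tokens
                tokens.add(new Token(match, matcher.start(group), matcher.end(group)));
            }
            return tokens;
        }
        // Split mode (group == -1): tokens are the text between matches.
        int index = 0;
        while (matcher.find()) {
            if (matcher.start() > index) {
                tokens.add(new Token(str.substring(index, matcher.start()), index, matcher.start()));
            }
            index = matcher.end(); // continue after the delimiter
        }
        // Whatever follows the last match becomes the final token, if non-empty.
        if (index < str.length()) {
            tokens.add(new Token(str.substring(index), index, str.length()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("aaa, bbb,,ccc", Pattern.compile(",\\s*"), -1));
    }
}
```

Running main yields three tokens: aaa at [0, 3), bbb at [5, 8) and ccc at [10, 13). The empty gap between the two adjacent commas produces no token, in line with the rule that zero-length tokens are not output.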

/**
 * Splits the input using the configured pattern and group.
 */
public Tokenizer create(final Reader in) {
 try {
  return new PatternTokenizer(in, pattern, group);
 } catch( IOException ex ) {
  throw new SolrException( SolrException.ErrorCode.SERVER_ERROR, ex );
 }
}

Javadoc

This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".

"pattern" is the regular expression.
"group" says which group to extract into tokens.

group=-1 (the default) is equivalent to "split": the tokens are the same as the output of String#split(java.lang.String), except that empty tokens are dropped.
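The "empty tokens are dropped" qualification matters because String#split keeps leading and interior empty strings and only removes trailing ones. This sketch uses plain JDK calls to show the difference; the class SplitEquivalence and its helper splitWithoutEmpty are hypothetical names, and the code illustrates the documented equivalence rather than reproducing Solr's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrates what "String#split without empty tokens" means for group = -1.
final class SplitEquivalence {

    // Split on the regex, then drop the empty strings that String.split can keep
    // (a leading empty string and empties between adjacent delimiters).
    static List<String> splitWithoutEmpty(String input, String regex) {
        return Arrays.stream(input.split(regex))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = ",aaa,,bbb,";
        // Plain String.split keeps the leading and inner empty strings;
        // the trailing empty string is already removed by split itself.
        System.out.println(Arrays.toString(input.split(","))); // [, aaa, , bbb]
        System.out.println(splitWithoutEmpty(input, ","));      // [aaa, bbb]
    }
}
```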

Using group >= 0 selects the matching group as the token. For example, if you have:

pattern = '([^']+)'
group = 0
input = aaa 'bbb' 'ccc'

the output will be two tokens, 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output would be bbb and ccc (without the ' marks).
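The group example can be reproduced with java.util.regex alone. In this sketch the GroupExample class and its extract helper are hypothetical; it collects group 0 or group 1 of every match and skips zero-length results, consistent with the note that the tokenizer does not output zero-length tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reproduces the Javadoc example with plain java.util.regex (no Solr classes).
final class GroupExample {

    // Collect the chosen group of every match, skipping null or zero-length results.
    static List<String> extract(String input, Pattern pattern, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            String token = m.group(group);
            if (token != null && !token.isEmpty()) out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("'([^']+)'");
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(extract(input, p, 0)); // ['bbb', 'ccc']
        System.out.println(extract(input, p, 1)); // [bbb, ccc]
    }
}
```

Group 0 is always the whole match, so the quote marks are included; group 1 is the parenthesized part inside the quotes.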

NOTE: This Tokenizer does not output tokens that are of zero length.

Most used methods

<init>
Creates a new PatternTokenizer that returns tokens from the given group (-1 for split functionality)
correctOffset
