/**
 * Create a new span that starts in the given tokenizer state.
 *
 * <p>Only the ordinal of {@code s} is recorded; callers compare spans to
 * tokenizer states by ordinal value (see {@code stateIndex()} usage).
 *
 * @param s the state at the start.
 */
State(TokenizerState s) { this.stateindex = s.ordinal(); }
/**
 * Create a new span that starts in the given tokenizer state.
 *
 * <p>Only the ordinal of {@code s} is recorded; callers compare spans to
 * tokenizer states by ordinal value (see {@code stateIndex()} usage).
 *
 * @param s the state at the start.
 */
State(TokenizerState s) { this.stateindex = s.ordinal(); }
/**
 * Create a new span that starts in the given tokenizer state.
 *
 * <p>Only the ordinal of {@code s} is recorded; callers compare spans to
 * tokenizer states by ordinal value (see {@code stateIndex()} usage).
 *
 * @param s the state at the start.
 */
State(TokenizerState s) { this.stateindex = s.ordinal(); }
@Override public Pair<String[], IntPair[]> tokenizeSentence(String sentence) { // parse the test TokenizerStateMachine tsm = new TokenizerStateMachine(splitOnDash, splitOnSecondNewline); tsm.parseText(sentence); // construct the data needed for the tokenization. int words = 0; for (State s : tsm.completed) { int idx = s.stateIndex(); if (idx != TokenizerState.IN_SENTENCE.ordinal()) words++; } IntPair[] wordOffsets = new IntPair[words]; String[] tokens = new String[words]; int wordIndex = 0; for (State s : tsm.completed) { State ms = (State) s; if (s.stateIndex() != TokenizerState.IN_SENTENCE.ordinal()) { tokens[wordIndex] = new String(tsm.text, ms.start, ms.end - ms.start); wordOffsets[wordIndex++] = new IntPair(ms.start, ms.end); } } return new Pair<>(tokens, wordOffsets); }
@Override public Pair<String[], IntPair[]> tokenizeSentence(String sentence) { // parse the test TokenizerStateMachine tsm = new TokenizerStateMachine(splitOnDash, splitOnSecondNewline); tsm.parseText(sentence); // construct the data needed for the tokenization. int words = 0; for (State s : tsm.completed) { int idx = s.stateIndex(); if (idx != TokenizerState.IN_SENTENCE.ordinal()) words++; } IntPair[] wordOffsets = new IntPair[words]; String[] tokens = new String[words]; int wordIndex = 0; for (State s : tsm.completed) { State ms = (State) s; if (s.stateIndex() != TokenizerState.IN_SENTENCE.ordinal()) { tokens[wordIndex] = new String(tsm.text, ms.start, ms.end - ms.start); wordOffsets[wordIndex++] = new IntPair(ms.start, ms.end); } } return new Pair<>(tokens, wordOffsets); }