/**
 * Normalizes free text: lower-cases it, converts it to canonical Unicode,
 * re-tokenizes and re-joins it, strips punctuation, and collapses newlines
 * into spaces.
 *
 * @param text raw input text (must be non-null)
 * @return the normalized text
 */
public String normalize(String text) {
    // Locale.ROOT gives locale-independent case mapping (avoids e.g. the
    // Turkish dotless-i problem with the default-locale toLowerCase()).
    text = text.toLowerCase(java.util.Locale.ROOT);
    text = TextNormalizer.convertToUnicode(text);
    // Tokenize and re-join so token boundaries are normalized to single spaces.
    text = TextNormalizer.joinStrings(tokenizer.tokenize(text));
    text = TextNormalizer.removePunctuations(text);
    // Literal newline replacement — replace(char, char) avoids compiling a regex.
    text = text.replace('\n', ' ').trim();
    return text;
}
/**
 * Normalizes text for the given language: lower-cases it, converts it to
 * canonical Unicode, tokenizes it with a per-language tokenizer (created
 * lazily and cached in {@code langTokenizerMap}), strips punctuation,
 * collapses newlines, and finally removes diacritics.
 *
 * @param text     raw input text (must be non-null)
 * @param language language used to select (and cache) the tokenizer
 * @return the normalized, de-accented text
 */
private String normalize(String text, Language language) {
    // Locale.ROOT gives locale-independent case mapping (avoids e.g. the
    // Turkish dotless-i problem with the default-locale toLowerCase()).
    text = text.toLowerCase(java.util.Locale.ROOT);
    text = TextNormalizer.convertToUnicode(text);
    // computeIfAbsent replaces the containsKey/put check-then-act pair:
    // the tokenizer for each language is created once and reused.
    List<String> tokens = langTokenizerMap
            .computeIfAbsent(language, TextNormalizer::getTokenizer)
            .tokenize(text);
    text = TextNormalizer.joinStrings(tokens);
    text = TextNormalizer.removePunctuations(text);
    // Literal newline replacement — replace(char, char) avoids compiling a regex.
    text = text.replace('\n', ' ').trim();
    text = TextNormalizer.deAccent(text);
    return text;
}
/**
 * Normalizes text for the given language: lower-cases it, converts it to
 * canonical Unicode, tokenizes it with a per-language tokenizer (created
 * lazily and cached in {@code langTokenizerMap}), strips punctuation,
 * collapses newlines, and finally removes diacritics.
 *
 * @param text     raw input text (must be non-null)
 * @param language language used to select (and cache) the tokenizer
 * @return the normalized, de-accented text
 */
private String normalize(String text, Language language) {
    // Locale.ROOT gives locale-independent case mapping (avoids e.g. the
    // Turkish dotless-i problem with the default-locale toLowerCase()).
    text = text.toLowerCase(java.util.Locale.ROOT);
    text = TextNormalizer.convertToUnicode(text);
    // computeIfAbsent replaces the containsKey/put check-then-act pair:
    // the tokenizer for each language is created once and reused.
    List<String> tokens = langTokenizerMap
            .computeIfAbsent(language, TextNormalizer::getTokenizer)
            .tokenize(text);
    text = TextNormalizer.joinStrings(tokens);
    text = TextNormalizer.removePunctuations(text);
    // Literal newline replacement — replace(char, char) avoids compiling a regex.
    text = text.replace('\n', ' ').trim();
    text = TextNormalizer.deAccent(text);
    return text;
}
// Fragment of a larger method (start/end not visible in this chunk):
// for every label of the source language, lower-case it, tokenize it, and
// accumulate the tokens into ontoDocTokens.
// NOTE(review): toLowerCase() uses the default locale here, unlike nothing
// visible that pins a locale — presumably intentional, but confirm against
// the rest of the file (the Turkish dotless-i issue applies).
if (labels.containsKey(srcLang)) { for (String label : labels.get(srcLang)) { for (String token : tokenizer.tokenize(label.toLowerCase())) { ontoDocTokens.add(token);
/**
 * Decodes candidate translations for a single entity label.
 *
 * <p>The source label is chunked, candidate translations for every chunk are
 * collected from all configured sources into a phrase table, the table is
 * re-ranked by each featurizer in turn (best-effort), and the decoder produces
 * the n-best translations — via the fast path when the
 * {@code OntologyTranslator.DECODE_FAST} option bit is set.
 *
 * @param setup   translator configuration (languages, chunker, sources, featurizers)
 * @param decoder decoder used to produce translations
 * @param el      the entity label to translate
 * @param options bit flags; {@code DECODE_FAST} selects {@code decodeFast}
 * @return the decoded n-best translations
 */
private List<Translation> doDecoding(TranslatorSetup setup, Decoder decoder, EntityLabel el, int options) {
    final ChunkList chunkList = setup.chunker(el.entity).chunk(tokenizer.tokenize(el.srcLabel));
    final PhraseTableImpl pt =
            new PhraseTableImpl(setup.sourceLanguage(), setup.targetLanguage(), "mert_table");
    // Gather candidate translations for every chunk from every configured source.
    for (Chunk chunk : chunkList) {
        for (TranslationSource source : setup.sources()) {
            pt.addAll(source.candidates(chunk));
        }
    }
    // Apply each featurizer in turn. A failing featurizer is skipped so that
    // decoding stays best-effort, but the failure is no longer silently
    // swallowed — it is reported for diagnosis.
    PhraseTable rerankedTable = pt;
    for (TranslationFeaturizer featurizer : setup.featurizers(el.entity)) {
        try {
            rerankedTable = featurizer.featurize(rerankedTable, el.entity);
        } catch (Exception x) {
            // NOTE(review): was an empty catch; kept non-fatal by design but
            // surfaced. Replace with the project's logger if one is available.
            System.err.println("Featurizer failed for label '" + el.srcLabel + "': " + x);
        }
    }
    // Hoist the whitespace split so it is computed once for either decode path.
    final List<String> srcTokens = Arrays.asList(el.srcLabel.split("\\s+"));
    return (options & OntologyTranslator.DECODE_FAST) == 0
            ? decoder.decode(srcTokens, rerankedTable, setup.featureNames(), nBest)
            : decoder.decodeFast(srcTokens, rerankedTable, setup.featureNames(), nBest);
}