/**
 * Normalizes {@code input} for matching: lower-cases with the Turkish locale {@code TR},
 * unifies apostrophe variants, then keeps only characters present in {@code letterMap}
 * plus '.' and '-'; every other character is replaced with '?'.
 *
 * @param input raw text to normalize; must not be null.
 * @return normalized string of the same length as the normalized input.
 */
public String normalize(String input) {
  // Normalize first, then size the builder from the string actually iterated.
  String normalized = TextUtil.normalizeApostrophes(input.toLowerCase(TR));
  StringBuilder sb = new StringBuilder(normalized.length());
  for (char c : normalized.toCharArray()) {
    if (letterMap.containsKey(c) || c == '.' || c == '-') {
      sb.append(c);
    } else {
      // char append avoids allocating a String per unknown character.
      sb.append('?');
    }
  }
  return sb.toString();
}
/**
 * Normalizes a word for morphological analysis: Turkish-locale lower-casing,
 * circumflex normalization, dot removal, then apostrophe normalization.
 *
 * @param word surface form to normalize; must not be null.
 * @return normalized word; dots are kept if removing them would leave an empty string
 *     (e.g. the word consisted only of dots).
 */
public static String normalizeForAnalysis(String word) {
  String s = word.toLowerCase(Turkish.LOCALE);
  s = TurkishAlphabet.INSTANCE.normalizeCircumflex(s);
  String noDot = s.replace(".", "");
  if (noDot.isEmpty()) {
    // Removing dots erased everything; fall back to the dotted form.
    noDot = s;
  }
  return TextUtil.normalizeApostrophes(noDot);
}
/**
 * Extracts surface-form features for one token and appends them to {@code features},
 * each prefixed with {@code featurePrefix}. Emits capitalization, length-1,
 * all-caps, and apostrophe-split (stem/ending) features. No-op when word is null.
 */
void wordFeatures(String word, String featurePrefix, List<String> features) {
  if (word == null) {
    return;
  }
  // First character capitalization.
  features.add(featurePrefix + "Upper:" + Character.isUpperCase(word.charAt(0)));
  // NOTE(review): "Punct" is true for ANY length-1 token, including single letters
  // and digits — confirm this is the intended punctuation proxy.
  features.add(featurePrefix + "Punct:" + (word.length() == 1));
  // True only when every character is upper case.
  boolean allCap = true;
  for (char c : word.toCharArray()) {
    if (!Character.isUpperCase(c)) {
      allCap = false;
      break;
    }
  }
  features.add(featurePrefix + "AllCap:" + allCap);
  // Apostrophe is searched in the normalized copy, but stem/ending substrings are
  // taken from the ORIGINAL word at the same index. This assumes
  // normalizeApostrophes is length-preserving — TODO confirm.
  String s = TextUtil.normalizeApostrophes(word);
  int apostropheIndex = s.indexOf('\'');
  features.add(featurePrefix + "Apost:" + (apostropheIndex >= 0));
  if (apostropheIndex >= 0) {
    String stem = word.substring(0, apostropheIndex);
    String ending = word.substring(apostropheIndex + 1);
    features.add(featurePrefix + "Stem:" + stem);
    features.add(featurePrefix + "Ending:" + ending);
  }
}
// Closing brace of the enclosing class (class header is outside this view).
}
// Standard text cleanup chain: unify apostrophe variants, unify quote/hyphen
// variants, then normalize spaces and strip soft hyphens.
line = TextUtil.normalizeApostrophes(line); line = TextUtil.normalizeQuotesHyphens(line); line = TextUtil.normalizeSpacesAndSoftHyphens(line);
// Same cleanup chain as elsewhere in this file: apostrophes, quotes/hyphens,
// then spaces and soft hyphens.
s = TextUtil.normalizeApostrophes(s); s = TextUtil.normalizeQuotesHyphens(s); s = TextUtil.normalizeSpacesAndSoftHyphens(s);
/**
 * Reads the input file, splits it into sentences, runs the perceptron NER model on
 * each normalized sentence, and writes the annotated sentences to
 * {@code <input name>.ne} in the output directory. Logs counts and throughput.
 *
 * @throws Exception if argument checks fail or any I/O / model loading step fails.
 */
@Override
public void run() throws Exception {
  initializeOutputDir();
  IOUtil.checkDirectoryArgument(modelRoot, "Model Root");
  IOUtil.checkFileArgument(inputPath, "Input File");

  Path outputPath = outDir.resolve(inputPath.toFile().getName() + ".ne");

  List<String> inputLines = Files.readAllLines(inputPath, StandardCharsets.UTF_8);
  List<String> sentenceList = TurkishSentenceExtractor.DEFAULT.fromParagraphs(inputLines);
  Log.info("There are %d lines and about %d sentences", inputLines.size(), sentenceList.size());

  TurkishMorphology morphology = TurkishMorphology.createWithDefaults();
  PerceptronNer nerModel = PerceptronNer.loadModel(modelRoot, morphology);

  Stopwatch timer = Stopwatch.createStarted();
  int totalTokens = 0;
  // try-with-resources guarantees the writer is closed even on failure.
  try (PrintWriter writer = new PrintWriter(outputPath.toFile(), "UTF-8")) {
    for (String raw : sentenceList) {
      // Normalize before tokenization so tokens match training-time preprocessing.
      String sentence = TextUtil.normalizeApostrophes(raw);
      sentence = TextUtil.normalizeQuotesHyphens(sentence);
      sentence = TextUtil.normalizeSpacesAndSoftHyphens(sentence);
      List<String> tokens = TurkishTokenizer.DEFAULT.tokenizeToStrings(sentence);
      totalTokens += tokens.size();
      NerSentence nerResult = nerModel.findNamedEntities(sentence, tokens);
      writer.println(nerResult.getAsTrainingSentence(annotationStyle));
    }
  }
  double elapsedSeconds = timer.elapsed(TimeUnit.MILLISECONDS) / 1000d;
  Log.info("Token count = %s", totalTokens);
  Log.info("File processed in %.4f seconds.", elapsedSeconds);
  Log.info("Speed = %.2f tokens/sec", totalTokens / elapsedSeconds);
  Log.info("Result is written in %s", outputPath);
}