edu.illinois.cs.cogcomp.annotation.XmlTextAnnotationMaker java code examples

return new XmlTextAnnotationMaker(textAnnotationBuilder, xmlProcessor);

/**
 * Given an entry from the corpus file list generated by {@link #getFileListing()}, parse its
 * contents and get zero or more XmlTextAnnotation objects. This allows for the case where corpus
 * annotations are provided in standoff format in one or more files separate from the source
 * document. In such cases, the first file in the list should contain the source document
 * and the rest should be the corresponding markup files.
 *
 * In this default implementation, it is assumed that a single file contains both source and markup.
 *
 * @param corpusFileListEntry a list of files, the first of which is a source file.
 * @return List of XmlTextAnnotation objects extracted from the corpus file.
 */
@Override
public List<XmlTextAnnotation> getAnnotationsFromFile(List<Path> corpusFileListEntry) throws Exception {
  Path sourceTextAndAnnotationFile = corpusFileListEntry.get(0);
  fileId =
      sourceTextAndAnnotationFile.getName(sourceTextAndAnnotationFile.getNameCount() - 1)
          .toString();
  logger.debug("read source file {}", fileId);
  numFiles++;
  String fileText = LineIO.slurp(sourceTextAndAnnotationFile.toString());
  List<XmlTextAnnotation> xmlTaList = new ArrayList<>(1);
  XmlTextAnnotation xmlTa = xmlTextAnnotationMaker.createTextAnnotation(fileText, this.corpusName, fileId);
  if (null != xmlTa) {
    xmlTaList.add(xmlTa);
    numTextAnnotations++;
  }
  return xmlTaList;
}
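
The file-list convention described in the Javadoc above (source document first, standoff markup files after it) can be illustrated with a small standalone sketch. The class and method names here (`CorpusEntrySketch`, `sourceFile`, `markupFiles`, `fileId`) are hypothetical and not part of the library; the sketch only splits an entry and derives a file id, and does not read or parse anything.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of cogcomp-nlp: illustrates the entry convention only.
class CorpusEntrySketch {

    /** By convention, the first file in the entry is the source document. */
    static Path sourceFile(List<Path> entry) {
        return entry.get(0);
    }

    /** Any remaining files are treated as standoff markup; empty for single-file entries. */
    static List<Path> markupFiles(List<Path> entry) {
        return entry.subList(1, entry.size());
    }

    /** Uses the last path element as the file id (same result as getName(getNameCount() - 1)). */
    static String fileId(Path file) {
        return file.getFileName().toString();
    }

    public static void main(String[] args) {
        List<Path> entry = Arrays.asList(
                Paths.get("corpus", "doc1.xml"),
                Paths.get("corpus", "doc1.ann"));
        System.out.println(fileId(sourceFile(entry)));
        System.out.println(markupFiles(entry).size());
    }
}
```

In the default implementation shown above, only the first file is used, so the markup list would simply be ignored.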

// xmlProcessor, data and docid are defined elsewhere in the original source.
StatefulTokenizer st = new StatefulTokenizer();
TokenizerTextAnnotationBuilder taBuilder = new TokenizerTextAnnotationBuilder(st);
XmlTextAnnotationMaker xtam = new XmlTextAnnotationMaker(taBuilder, xmlProcessor);
XmlTextAnnotation xta = xtam.createTextAnnotation(data, "OntoNotes 5.0", docid);
// The cleaned, tokenized text and the markup spans extracted from the xml source.
TextAnnotation ta = xta.getTextAnnotation();
List<SpanInfo> xmlMarkup = xta.getXmlMarkup();

// Same pattern with a fixed document id; xmlProcessor and document are defined elsewhere.
StatefulTokenizer st = new StatefulTokenizer();
TokenizerTextAnnotationBuilder taBuilder = new TokenizerTextAnnotationBuilder(st);
XmlTextAnnotationMaker xtam = new XmlTextAnnotationMaker(taBuilder, xmlProcessor);
XmlTextAnnotation xta = xtam.createTextAnnotation(document, "OntoNotes 5.0", "test");
TextAnnotation ta = xta.getTextAnnotation();
List<SpanInfo> xmlMarkup = xta.getXmlMarkup();

Javadoc

Instantiates an XmlTextAnnotation object from XML text. The XML is parsed into body text (which is further cleaned up as needed), and this cleaned text is used to create a TextAnnotation. Additional information is extracted from the XML source, and the mapping between character offsets in the XML source and in the cleaned text is also derived. The goal is to provide text that an NLP pipeline can process easily, without a lot of hacks to work around ill-formatted text. The annotations so produced can then be mapped back to offsets in the original XML text and combined with supplementary information extracted from the XML markup.
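
To make the offset-mapping idea concrete, here is a minimal standalone sketch. It is not the library's implementation: it assumes a naive tag stripper (everything between `<` and `>` is dropped, with no handling of entities, comments or CDATA), and the names `OffsetMappingSketch`, `CleanedText`, `stripTags` and `toOriginalSpan` are hypothetical. It only shows how a span found in cleaned text can be mapped back to the original XML.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of cogcomp-nlp.
class OffsetMappingSketch {

    /** Cleaned text plus, for each cleaned character, its offset in the original xml. */
    static final class CleanedText {
        final String text;
        final int[] originalOffsets;

        CleanedText(String text, int[] originalOffsets) {
            this.text = text;
            this.originalOffsets = originalOffsets;
        }

        /** Maps a non-empty [start, end) span in the cleaned text to a span in the original xml. */
        int[] toOriginalSpan(int start, int end) {
            return new int[] {originalOffsets[start], originalOffsets[end - 1] + 1};
        }
    }

    /** Drops everything between '<' and '>' and records where each kept character came from. */
    static CleanedText stripTags(String xml) {
        StringBuilder cleaned = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        boolean insideTag = false;
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '<') {
                insideTag = true;
            } else if (c == '>' && insideTag) {
                insideTag = false;
            } else if (!insideTag) {
                cleaned.append(c);
                offsets.add(i);
            }
        }
        int[] map = new int[offsets.size()];
        for (int i = 0; i < map.length; i++) {
            map[i] = offsets.get(i);
        }
        return new CleanedText(cleaned.toString(), map);
    }

    public static void main(String[] args) {
        String xml = "<doc><p>Hello world</p></doc>";
        CleanedText ct = stripTags(xml);
        int start = ct.text.indexOf("world");
        int[] span = ct.toOriginalSpan(start, start + "world".length());
        System.out.println(ct.text);                          // cleaned body text
        System.out.println(xml.substring(span[0], span[1]));  // same word, located in the xml
    }
}
```

A per-character offset array is the simplest possible representation; a real implementation would more likely store only the edits between the two strings.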

Most used methods

<init>
Specifies the behavior of the XmlTextAnnotationMaker: tokenization (via the TextAnnotationBuilder),
createTextAnnotation
A method for creating TextAnnotation by tokenizing the given text string.

How to use XmlTextAnnotationMaker in edu.illinois.cs.cogcomp.annotation

Best Java code snippets using edu.illinois.cs.cogcomp.annotation.XmlTextAnnotationMaker (Showing top 10 results out of 315)
