How to use the getContent method in de.l3s.boilerpipe.document.TextDocument

Best Java code snippets using de.l3s.boilerpipe.document.TextDocument.getContent

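  /**
   * Extracts text from the given {@link TextDocument} object.
   * 
   * @param doc The {@link TextDocument}.
   * @return The extracted text.
   * @throws BoilerpipeProcessingException if processing the document fails.
   */
  public String getText(TextDocument doc)
      throws BoilerpipeProcessingException {
    // process(doc) classifies the document's TextBlocks in place.
    process(doc);
    // getContent() then returns the text of the blocks marked as content.
    return doc.getContent();
  }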

@Override
public void execute(Tuple input)
{
  String url = input.getStringByField("url");
  String html = input.getStringByField("html");
  Object date = input.getValueByField("date");
  if (html == null)
  {
    // Nothing to extract: log it and ack so the tuple is not replayed.
    logger.error("No content for : {}", url);
    collector.ack(input);
    return;
  }
  try
  {
    // Parse the raw HTML into boilerpipe's block-based document model.
    TextDocument td = new BoilerpipeSAXInput(new InputSource(
        new StringReader(html))).getTextDocument();
    // Classify the blocks in place; getContent() then returns content text only.
    ArticleSentencesExtractor.INSTANCE.process(td);
    collector.emit(input, new Values(td.getContent(), url, date));
    collector.ack(input);
    logger.info("extracted text for {}", url);
  }
  catch (Exception e)
  {
    collector.fail(input);
    // Pass the exception last, without a placeholder, so the stack trace is logged.
    logger.error("error extracting text from {}", url, e);
    collector.reportError(e);
  }
}

Javadoc

Returns the TextDocument's content.

Popular methods of TextDocument

getTextBlocks
Returns the TextBlocks of this document.
<init>
Creates a new TextDocument with given TextBlocks, and no title.
getText
Returns the TextDocument's content, non-content, or both.
getTitle
Returns the "main" title for this document, or null if no such title has been set.
debugString
Returns detailed debugging information about the contained TextBlocks.
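
The relationship between getContent and getText described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: TextDocumentSketch and Block are hypothetical stand-ins written for this example, not the boilerpipe classes, and the newline separator between blocks is an assumption. It shows the idea that a document holds blocks flagged as content or non-content, and that getContent is the "content only" case of getText.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for illustration only; NOT the boilerpipe implementation.
final class TextDocumentSketch {

    // Hypothetical minimal block: a piece of text plus a content flag.
    static final class Block {
        final String text;
        final boolean isContent;

        Block(String text, boolean isContent) {
            this.text = text;
            this.isContent = isContent;
        }
    }

    private final List<Block> blocks;

    TextDocumentSketch(List<Block> blocks) {
        this.blocks = new ArrayList<>(blocks);
    }

    // Returns content, non-content, or both, depending on the two flags.
    String getText(boolean includeContent, boolean includeNonContent) {
        StringBuilder sb = new StringBuilder();
        for (Block b : blocks) {
            boolean wanted = b.isContent ? includeContent : includeNonContent;
            if (wanted) {
                sb.append(b.text).append('\n'); // separator is an assumption
            }
        }
        return sb.toString();
    }

    // Content only: the text of blocks flagged as content.
    String getContent() {
        return getText(true, false);
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("Home | About | Login", false));
        blocks.add(new Block("The article body.", true));
        blocks.add(new Block("Copyright notice", false));
        TextDocumentSketch doc = new TextDocumentSketch(blocks);
        System.out.print(doc.getContent());
    }
}
```

In the real library the content flags are set by an extractor's process call, as in the snippets above; here they are set by hand so the example runs without any dependency.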
