How to use the newExtractingInstance method in de.l3s.boilerpipe.sax.HTMLHighlighter

Best Java code snippets using de.l3s.boilerpipe.sax.HTMLHighlighter.newExtractingInstance (showing the top 1 result out of 315)

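import java.net.URI;

import org.xml.sax.InputSource;

import de.l3s.boilerpipe.BoilerpipeExtractor;
import de.l3s.boilerpipe.document.TextDocument;
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput;
import de.l3s.boilerpipe.sax.HTMLDocument;
import de.l3s.boilerpipe.sax.HTMLHighlighter;

/**
 * Returns the article from a document together with its basic HTML structure.
 *
 * @param htmlDoc   the HTML document to extract the article from
 * @param docUri    the URI of the document, used to resolve relative anchors to absolute anchors
 * @param extractor the Boilerpipe extractor that decides which text blocks are article content
 * @return the article HTML after removeNotAllowedTags filtering, or null if parsing or extraction fails
 */
public String process(HTMLDocument htmlDoc, URI docUri, final BoilerpipeExtractor extractor) {
  // Highlighter that returns only the extracted HTML, including enclosed markup.
  final HTMLHighlighter hh = HTMLHighlighter.newExtractingInstance();
  // Per the Javadoc, the extracting instance already outputs only extracted HTML; this makes it explicit.
  hh.setOutputHighlightOnly(true);
  final String text;
  try {
    // Pass 1: segment the HTML into text blocks and let the extractor mark the content blocks.
    final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
    extractor.process(doc);
    // Pass 2: re-parse the original HTML and emit only the blocks marked as content.
    final InputSource is = htmlDoc.toInputSource();
    text = hh.process(doc, is);
  } catch (Exception ex) {
    return null; // any parsing or extraction failure yields null
  }
  // removeNotAllowedTags is a helper defined elsewhere in the enclosing class (not shown here).
  return removeNotAllowedTags(text, docUri);
}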
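For experimenting without a live page, the same two-pass pipeline can be run on an in-memory string. This is a minimal sketch, assuming boilerpipe 1.2.x and its NekoHTML/Xerces dependencies are on the classpath and that `HTMLDocument` offers a `String` constructor (which, as far as I know, encodes the text as UTF-8). The class name `OfflineExtraction` is hypothetical, and the page's own `removeNotAllowedTags` helper is left out because its implementation is not shown.

```java
import de.l3s.boilerpipe.BoilerpipeExtractor;
import de.l3s.boilerpipe.document.TextDocument;
import de.l3s.boilerpipe.extractors.ArticleExtractor;
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput;
import de.l3s.boilerpipe.sax.HTMLDocument;
import de.l3s.boilerpipe.sax.HTMLHighlighter;

// Hypothetical demo class; not part of boilerpipe.
class OfflineExtraction {

  /** Extracts content HTML from an in-memory page; returns null if Boilerpipe fails. */
  static String extract(String html, BoilerpipeExtractor extractor) {
    // Assumption: HTMLDocument(String) wraps the page so it can be read twice via toInputSource().
    HTMLDocument htmlDoc = new HTMLDocument(html);
    HTMLHighlighter hh = HTMLHighlighter.newExtractingInstance();
    try {
      // Pass 1: segment the page into text blocks; the extractor marks content blocks.
      TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
      extractor.process(doc);
      // Pass 2: re-parse the same HTML and keep only the marked blocks with their markup.
      return hh.process(doc, htmlDoc.toInputSource());
    } catch (Exception ex) {
      return null; // mirrors the snippet: any parsing or extraction failure yields null
    }
  }

  public static void main(String[] args) {
    String html = "<html><head><title>Demo</title></head><body>"
        + "<p>Boilerpipe keeps the main article text of a page.</p>"
        + "</body></html>";
    // ArticleExtractor may discard everything on a page this small, so the output can be empty.
    System.out.println(extract(html, ArticleExtractor.INSTANCE));
  }
}
```

Passing the extractor as a parameter keeps the sketch close to the original method signature; swapping in `KeepEverythingExtractor.INSTANCE` is a quick way to confirm the plumbing works before tuning extraction quality.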

Javadoc

Creates a new HTMLHighlighter that is set up to return only the extracted HTML text, including enclosed markup.
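To see what "only the extracted HTML text" means in practice, the sketch below marks text blocks as content by hand instead of running a real extractor, then lets the extracting instance render them. It assumes boilerpipe 1.2.x on the classpath, the `HTMLDocument(String)` constructor, and that each `<p>` element becomes its own text block (Boilerpipe's default handling of block-level tags). The class `ExtractOnlyDemo` and the keyword rule are hypothetical stand-ins, not how any shipped extractor decides.

```java
import de.l3s.boilerpipe.document.TextBlock;
import de.l3s.boilerpipe.document.TextDocument;
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput;
import de.l3s.boilerpipe.sax.HTMLDocument;
import de.l3s.boilerpipe.sax.HTMLHighlighter;

// Hypothetical demo class; not part of boilerpipe.
class ExtractOnlyDemo {

  /** Marks blocks whose text contains keyword as content; returns only those blocks as HTML. */
  static String keepBlocksContaining(String html, String keyword) {
    try {
      HTMLDocument htmlDoc = new HTMLDocument(html);
      TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
      // Hypothetical stand-in for a real BoilerpipeExtractor: a plain keyword test per block.
      for (TextBlock block : doc.getTextBlocks()) {
        block.setIsContent(block.getText().contains(keyword));
      }
      // The extracting instance emits only the flagged blocks, with their enclosing markup.
      return HTMLHighlighter.newExtractingInstance().process(doc, htmlDoc.toInputSource());
    } catch (Exception ex) {
      throw new IllegalStateException("Boilerpipe processing failed", ex);
    }
  }

  public static void main(String[] args) {
    String html = "<html><body>"
        + "<p>Keep this paragraph about the article.</p>"
        + "<p>Drop this navigation text.</p>"
        + "</body></html>";
    System.out.println(keepBlocksContaining(html, "Keep"));
  }
}
```

The second paragraph is not flagged, so it should not appear in the returned HTML.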

Popular methods of HTMLHighlighter

<init> (constructor)
process: Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified Boilerp…
setExtraStyleSheet: Sets the extra stylesheet definition that will be inserted in the HEAD element.
setOutputHighlightOnly: Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
setPostHighlight: Sets the string that will be inserted after any highlighted HTML block.
setPreHighlight: Sets the string that will be inserted prior to any highlighted HTML block.
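The setters listed above can be combined on an extracting instance. This sketch is illustrative only: it assumes boilerpipe 1.2.x on the classpath and the `HTMLDocument(String)` constructor, uses `KeepEverythingExtractor` so every block counts as content, and picks `<mark>` tags purely as an example. Exactly where the library places the pre/post strings relative to the emitted markup is an implementation detail not described on this page. The class `HighlighterSettingsDemo` is hypothetical.

```java
import de.l3s.boilerpipe.document.TextDocument;
import de.l3s.boilerpipe.extractors.KeepEverythingExtractor;
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput;
import de.l3s.boilerpipe.sax.HTMLDocument;
import de.l3s.boilerpipe.sax.HTMLHighlighter;

// Hypothetical demo class; not part of boilerpipe.
class HighlighterSettingsDemo {

  /** Builds an extracting highlighter that wraps highlighted HTML in <mark> tags. */
  static HTMLHighlighter newMarkingExtractor() {
    HTMLHighlighter hh = HTMLHighlighter.newExtractingInstance();
    hh.setOutputHighlightOnly(true); // return only highlighted content, not the whole document
    hh.setPreHighlight("<mark>");    // inserted prior to any highlighted HTML block
    hh.setPostHighlight("</mark>");  // inserted after any highlighted HTML block
    hh.setExtraStyleSheet("");       // an empty string inserts no extra stylesheet into HEAD
    return hh;
  }

  static String extractWithMarks(String html) {
    try {
      HTMLDocument htmlDoc = new HTMLDocument(html);
      TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
      KeepEverythingExtractor.INSTANCE.process(doc); // marks every text block as content
      return newMarkingExtractor().process(doc, htmlDoc.toInputSource());
    } catch (Exception ex) {
      throw new IllegalStateException("Boilerpipe processing failed", ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(extractWithMarks("<html><body><p>Marked sentence text.</p></body></html>"));
  }
}
```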
