HTMLHighlighter

How to use HTMLHighlighter in de.l3s.boilerpipe.sax

Best Java code snippets using de.l3s.boilerpipe.sax.HTMLHighlighter (Showing top 20 results out of 315)

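  • Common ways to obtain HTMLHighlighter
private void myMethod () {
  // The constructor is private, so code outside the class uses the static factories.
  HTMLHighlighter extracting = HTMLHighlighter.newExtractingInstance();     // same as new HTMLHighlighter(true)
  HTMLHighlighter highlighting = HTMLHighlighter.newHighlightingInstance(); // same as new HTMLHighlighter(false)
}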
origin: com.syncthemall/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return the full
 * HTML text, with the extracted text portion <b>highlighted</b>.
 */
public static HTMLHighlighter newHighlightingInstance() {
  return new HTMLHighlighter(false);
}
origin: de.l3s.boilerpipe/boilerpipe

private HTMLHighlighter(final boolean extractHTML) {
  if (extractHTML) {
    setOutputHighlightOnly(true);
    setExtraStyleSheet("");
    setPreHighlight("");
    setPostHighlight("");
  }
}
origin: Netbreeze-GmbH/boilerpipe

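/**
 * Returns the article from a document, keeping its basic HTML structure.
 * 
 * @param htmlDoc the HTML document to process
 * @param docUri the URI of the document, used to resolve relative anchors in the document to absolute anchors
 * @param extractor the extractor used to mark the article's content blocks
 * @return the article HTML, or null if extraction fails
 */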
public String process(HTMLDocument htmlDoc, URI docUri, final BoilerpipeExtractor extractor) {
  final HTMLHighlighter hh = HTMLHighlighter.newExtractingInstance();
  hh.setOutputHighlightOnly(true);
  TextDocument doc;
  String text = "";
  try {
    doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
    extractor.process(doc);
    final InputSource is = htmlDoc.toInputSource();
    text = hh.process(doc, is);
  } catch (Exception ex) {
    return null;
  }
  return removeNotAllowedTags(text, docUri);
}
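The process method above passes its result to removeNotAllowedTags, whose body is not shown; according to its Javadoc, docUri is used to turn relative anchors into absolute ones. A minimal sketch of only that resolution step, assuming href values are double-quoted and using java.net.URI.resolve. AnchorResolver and absolutizeHrefs are hypothetical names, not part of boilerpipe or of that project.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites double-quoted href values to absolute URIs
// resolved against the page URI. Not the project's removeNotAllowedTags.
final class AnchorResolver {
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    static String absolutizeHrefs(String html, URI docUri) {
        Matcher m = HREF.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String href = m.group(1);
            String resolved;
            try {
                // URI.resolve returns absolute URIs unchanged and normalizes "../" segments.
                resolved = docUri.resolve(href).toString();
            } catch (IllegalArgumentException malformed) {
                resolved = href; // keep values that are not valid URIs as they were
            }
            m.appendReplacement(out, Matcher.quoteReplacement("href=\"" + resolved + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A regex keeps the sketch short; on real pages, rewriting attributes on a parsed DOM is more robust against single-quoted or unquoted attributes.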
origin: de.l3s.boilerpipe/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
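This overload wraps the HTML string in an org.xml.sax.InputSource, and process(URL, BoilerpipeExtractor) calls htmlDoc.toInputSource() twice: once for BoilerpipeSAXInput and once for the highlighter. A plausible reason, sketched here under the assumption that the source wraps a one-shot character stream: once a parser has read it to the end, a second parse sees no data, so each parse needs its own InputSource. InputSources, freshSource and drain are hypothetical helpers; drain stands in for a SAX parser consuming the stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.xml.sax.InputSource;

final class InputSources {
    // Wraps an HTML string the same way process(TextDocument, String) does.
    static InputSource freshSource(String html) {
        return new InputSource(new StringReader(html));
    }

    // Reads the source's character stream to the end, as a parser would.
    static String drain(InputSource source) {
        Reader reader = source.getCharacterStream();
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Reusing one InputSource for both passes would leave the second reader with nothing to read, which is why a fresh source per parse is the safer pattern.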
origin: com.syncthemall/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
origin: pvdlg/boilerpipe

private HTMLHighlighter(final boolean extractHTML) {
  if (extractHTML) {
    setOutputHighlightOnly(true);
    setExtraStyleSheet("\n<style type=\"text/css\">\n"
        + "A:before { content:' '; } \n" //
        + "A:after { content:' '; } \n" //
        + "SPAN:before { content:' '; } \n" //
        + "SPAN:after { content:' '; } \n" //
        + "</style>\n");
    setPreHighlight("");
    setPostHighlight("");
  }
}
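Unlike the de.l3s.boilerpipe constructor shown earlier, which sets an empty stylesheet, this version adds CSS rules that insert a space before and after every A and SPAN element, presumably so inline elements in the extracted fragment do not run their words together when rendered. setExtraStyleSheet places that definition in the HEAD element; the sketch below shows one way to perform such an insertion on a raw HTML string. StyleInjector and injectIntoHead are hypothetical names, and the sketch assumes the page contains a literal closing head tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not boilerpipe's implementation.
final class StyleInjector {
    private static final Pattern HEAD_CLOSE =
            Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    // Inserts the stylesheet just before the closing head tag.
    static String injectIntoHead(String html, String styleSheet) {
        if (styleSheet.isEmpty()) {
            return html; // an empty string disables the extra stylesheet
        }
        Matcher m = HEAD_CLOSE.matcher(html);
        if (!m.find()) {
            return html; // no HEAD element found; leave the page untouched
        }
        return html.substring(0, m.start()) + styleSheet + html.substring(m.start());
    }
}
```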
origin: Netbreeze-GmbH/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return only the
 * extracted HTML text, including enclosed markup.
 */
public static HTMLHighlighter newExtractingInstance() {
  return new HTMLHighlighter(true);
}
origin: pvdlg/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
origin: Netbreeze-GmbH/boilerpipe

private HTMLHighlighter(final boolean extractHTML) {
  if (extractHTML) {
    setOutputHighlightOnly(true);
    setExtraStyleSheet("\n<style type=\"text/css\">\n"
        + "A:before { content:' '; } \n" //
        + "A:after { content:' '; } \n" //
        + "SPAN:before { content:' '; } \n" //
        + "SPAN:after { content:' '; } \n" //
        + "</style>\n");
    setPreHighlight("");
    setPostHighlight("");
  }
}
origin: pvdlg/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return the full
 * HTML text, with the extracted text portion <b>highlighted</b>.
 */
public static HTMLHighlighter newHighlightingInstance() {
  return new HTMLHighlighter(false);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
origin: com.syncthemall/boilerpipe

private HTMLHighlighter(final boolean extractHTML) {
  if (extractHTML) {
    setOutputHighlightOnly(true);
    setExtraStyleSheet("\n<style type=\"text/css\">\n"
        + "A:before { content:' '; } \n" //
        + "A:after { content:' '; } \n" //
        + "SPAN:before { content:' '; } \n" //
        + "SPAN:after { content:' '; } \n" //
        + "</style>\n");
    setPreHighlight("");
    setPostHighlight("");
  }
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return the full
 * HTML text, with the extracted text portion <b>highlighted</b>.
 */
public static HTMLHighlighter newHighlightingInstance() {
  return new HTMLHighlighter(false);
}
origin: de.l3s.boilerpipe/boilerpipe

public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: de.l3s.boilerpipe/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return only the
 * extracted HTML text, including enclosed markup.
 */
public static HTMLHighlighter newExtractingInstance() {
  return new HTMLHighlighter(true);
}
origin: com.syncthemall/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
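 * @param url
 *            The URL of the page to fetch.
 * @param extractor
 *            The {@link BoilerpipeExtractor} that marks the content blocks.
 * @return The highlighted HTML.
 * @throws IOException
 * @throws BoilerpipeProcessingException
 * @throws SAXException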
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: com.syncthemall/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return only the
 * extracted HTML text, including enclosed markup.
 */
public static HTMLHighlighter newExtractingInstance() {
  return new HTMLHighlighter(true);
}
origin: pvdlg/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
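 * @param url
 *            The URL of the page to fetch.
 * @param extractor
 *            The {@link BoilerpipeExtractor} that marks the content blocks.
 * @return The highlighted HTML.
 * @throws IOException
 * @throws BoilerpipeProcessingException
 * @throws SAXException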
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: de.l3s.boilerpipe/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return the full
 * HTML text, with the extracted text portion <b>highlighted</b>.
 */
public static HTMLHighlighter newHighlightingInstance() {
  return new HTMLHighlighter(false);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param url
 *            The URL of the page to fetch.
 * @param extractor
 *            The {@link BoilerpipeExtractor} that marks the content blocks.
 * @return The highlighted HTML.
 * @throws IOException
 * @throws BoilerpipeProcessingException
 * @throws SAXException
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
de.l3s.boilerpipe.sax.HTMLHighlighter

Javadoc

Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.

Most used methods

  • <init> (constructor)
  • process
    Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
  • setExtraStyleSheet
    Sets the extra stylesheet definition that will be inserted in the HEAD element. To disable, set it to the empty string.
  • setOutputHighlightOnly
    Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
  • setPostHighlight
    Sets the string that will be inserted after any highlighted HTML block. To disable, set it to the empty string.
  • setPreHighlight
    Sets the string that will be inserted prior to any highlighted HTML block. To disable, set it to the empty string.
  • newExtractingInstance
    Creates a new HTMLHighlighter, which is set-up to return only the extracted HTML text, including enclosed markup.
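The Javadoc and setters above describe the core behavior: blocks marked as content are wrapped in the pre/post highlight strings, and setOutputHighlightOnly decides whether the remaining blocks are kept. A simplified model of that decision, assuming the page has already been split into blocks carrying a content flag. boilerpipe itself works on SAX events and a TextDocument, so HighlightSketch, Block and render are illustrative names only (the record requires Java 16 or later).

```java
import java.util.List;

final class HighlightSketch {
    // One fragment of the page plus the content/boilerplate decision for it.
    record Block(String html, boolean isContent) {}

    static String render(List<Block> blocks, boolean outputHighlightOnly,
                         String preHighlight, String postHighlight) {
        StringBuilder out = new StringBuilder();
        for (Block b : blocks) {
            if (b.isContent()) {
                // Content is always emitted, wrapped in the highlight markers.
                out.append(preHighlight).append(b.html()).append(postHighlight);
            } else if (!outputHighlightOnly) {
                // Boilerplate is kept only when returning the whole document.
                out.append(b.html());
            }
        }
        return out.toString();
    }
}
```

With outputHighlightOnly set to true and empty markers, the model behaves like the extracting preset; with false and non-empty markers, it behaves like the highlighting preset.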
