How to use ImageExtractor in de.l3s.boilerpipe.sax

Best Java code snippets using de.l3s.boilerpipe.sax.ImageExtractor (Showing top 6 results out of 315)

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return A List of enclosed {@link Image}s
 * @throws BoilerpipeProcessingException
 */
public List<Image> process(final TextDocument doc,
    final String origHTML) throws BoilerpipeProcessingException {
  return process(doc, new InputSource(
      new StringReader(origHTML)));
}
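
The overload above only wraps the HTML string in a `StringReader` and an `InputSource` before delegating. A minimal sketch of that wrapping step, using only the JDK's own SAX parser, might look like the following. The class `ImgSrcCollector` and its `collect` methods are hypothetical names made up for this illustration and are not part of boilerpipe. The sketch also assumes well-formed XHTML input, because the JDK parser is an XML parser, and it collects every `img` element rather than only those enclosed by extracted content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not boilerpipe's implementation.
class ImgSrcCollector {

    /** Wraps the markup in an InputSource (as the String overload does) and delegates. */
    static List<String> collect(String xhtml) throws Exception {
        return collect(new InputSource(new StringReader(xhtml)));
    }

    /** Collects the src attribute of every img element reported by the SAX parser. */
    static List<String> collect(InputSource is) throws Exception {
        final List<String> srcs = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default JDK parser is not namespace-aware, so match on qName.
                if ("img".equalsIgnoreCase(qName)) {
                    String src = atts.getValue("src");
                    if (src != null) {
                        srcs.add(src);
                    }
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(is, handler);
        return srcs;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>Text</p><img src=\"a.png\"/>"
                + "<div><img src=\"b.jpg\" alt=\"b\"/></div></body></html>";
        System.out.println(collect(xhtml));
    }
}
```

Keeping the `InputSource` overload as the single place where parsing happens means the String variant stays a one-line adapter, which mirrors the delegation in the snippet above.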

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param url
 *            The URL of the HTML document to fetch.
 * @param extractor
 *            The {@link BoilerpipeExtractor} to apply to the fetched document.
 * @return A List of enclosed {@link Image}s
 * @throws IOException
 * @throws BoilerpipeProcessingException
 * @throws SAXException
 */
public List<Image> process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  // First pass: parse the fetched HTML into a TextDocument and run the extractor on it.
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  // Second pass: read the same HTML again to find the images enclosed by the extracted content.
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}

Javadoc

Extracts the images that are enclosed by extracted content.

Most used methods

process
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
