HTMLFetcher.fetch

How to use the fetch method in de.l3s.boilerpipe.sax.HTMLFetcher

Best Java code snippets using de.l3s.boilerpipe.sax.HTMLFetcher.fetch (Showing top 15 results out of 315)

origin: Netbreeze-GmbH/boilerpipe

/**
 * Returns the article from a URL with its basic HTML structure.
 */
public String process(final BoilerpipeExtractor extractor, final URL url)
    throws IOException, BoilerpipeProcessingException, SAXException, URISyntaxException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  return process(htmlDoc, url.toURI(), extractor);
}
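This snippet declares URISyntaxException because it calls url.toURI(), and java.net.URI parsing is stricter than java.net.URL: a URL whose path contains, for example, an unescaped space can be constructed but not converted. A minimal JDK-only sketch of that difference follows; the class and method names are hypothetical and not part of boilerpipe.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlToUriCheck {

    /** Returns true if url.toURI() succeeds for the given spec, false if it throws URISyntaxException. */
    static boolean convertsToUri(String spec) throws MalformedURLException {
        URL url = new URL(spec); // java.net.URL does not reject an unescaped space in the path
        try {
            url.toURI(); // java.net.URI parsing is strict and rejects illegal characters
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(convertsToUri("http://example.com/news/article.html"));    // true
        System.out.println(convertsToUri("http://example.com/news/my article.html")); // false
    }
}
```

A caller that feeds crawled links into process(extractor, url) may therefore want to percent-encode such URLs first, or handle URISyntaxException separately from network errors.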

origin: com.syncthemall/boilerpipe

/**
 * Extracts text from the HTML code available from the given {@link URL}.
 * NOTE: This method is mainly intended for showcase purposes. If you are
 * going to crawl the Web, consider using {@link #getText(InputSource)}
 * instead.
 * 
 * @param url  The URL pointing to the HTML code.
 * @return  The extracted text.
 * @throws BoilerpipeProcessingException  If the HTML cannot be fetched or processed.
 */
public String getText(final URL url) throws BoilerpipeProcessingException {
  try {
    return getText(HTMLFetcher.fetch(url).toInputSource());
  } catch (IOException e) {
    throw new BoilerpipeProcessingException(e);
  }
}
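The NOTE above steers crawlers toward getText(InputSource), which leaves fetching to the caller. One plausible reason (our reading, not stated in the source) is that a crawler usually wants control over the connection, for example connect and read timeouts, which getText(URL) does not expose. A JDK-only sketch of such a caller-side fetch follows; the helper names, timeout values, and fixed encoding are assumptions. The resulting InputSource is the kind of object you would then pass to an extractor's getText(InputSource).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import org.xml.sax.InputSource;

public class CrawlerFetch {

    /** Downloads the raw bytes behind a URL, failing instead of hanging if the server is slow. */
    static byte[] fetchBytes(URL url, int connectTimeoutMs, int readTimeoutMs) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(connectTimeoutMs); // limit time spent establishing the connection
        conn.setReadTimeout(readTimeoutMs);       // limit time spent waiting for data
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes();             // Java 9+
        }
    }

    /** Wraps fetched bytes in a SAX InputSource, the input type getText(InputSource) takes. */
    static InputSource toInputSource(byte[] html, String encoding) {
        InputSource is = new InputSource(new ByteArrayInputStream(html));
        is.setEncoding(encoding); // tells the parser how to decode the bytes
        return is;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java CrawlerFetch <url>");
            return;
        }
        URL url = URI.create(args[0]).toURL();
        byte[] html = fetchBytes(url, 5_000, 10_000);  // assumed timeout values
        InputSource is = toInputSource(html, "UTF-8"); // assumed encoding; detect it in real code
        System.out.println("Fetched " + html.length + " bytes, encoding " + is.getEncoding());
    }
}
```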
origin: Netbreeze-GmbH/boilerpipe

/**
 * Extracts text from the HTML code available from the given {@link URL}.
 * NOTE: This method is mainly intended for showcase purposes. If you are
 * going to crawl the Web, consider using {@link #getText(InputSource)}
 * instead.
 * 
 * @param url  The URL pointing to the HTML code.
 * @return  The extracted text.
 * @throws BoilerpipeProcessingException  If the HTML cannot be fetched or processed.
 */
public String getText(final URL url) throws BoilerpipeProcessingException {
  try {
    return getText(HTMLFetcher.fetch(url).toInputSource());
  } catch (IOException e) {
    throw new BoilerpipeProcessingException(e);
  }
}
origin: de.l3s.boilerpipe/boilerpipe

/**
 * Extracts text from the HTML code available from the given {@link URL}.
 * NOTE: This method is mainly intended for showcase purposes. If you are
 * going to crawl the Web, consider using {@link #getText(InputSource)}
 * instead.
 * 
 * @param url  The URL pointing to the HTML code.
 * @return  The extracted text.
 * @throws BoilerpipeProcessingException  If the HTML cannot be fetched or processed.
 */
public String getText(final URL url) throws BoilerpipeProcessingException {
  try {
    return getText(HTMLFetcher.fetch(url).toInputSource());
  } catch (IOException e) {
    throw new BoilerpipeProcessingException(e);
  }
}
origin: pvdlg/boilerpipe

/**
 * Extracts text from the HTML code available from the given {@link URL}.
 * NOTE: This method is mainly intended for showcase purposes. If you are
 * going to crawl the Web, consider using {@link #getText(InputSource)}
 * instead.
 * 
 * @param url  The URL pointing to the HTML code.
 * @return  The extracted text.
 * @throws BoilerpipeProcessingException  If the HTML cannot be fetched or processed.
 */
public String getText(final URL url) throws BoilerpipeProcessingException {
  try {
    return getText(HTMLFetcher.fetch(url).toInputSource());
  } catch (IOException e) {
    throw new BoilerpipeProcessingException(e);
  }
}
origin: de.l3s.boilerpipe/boilerpipe

public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
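In this snippet htmlDoc.toInputSource() is called twice: once to build the TextDocument and again for the second pass in process(doc, is). That presumably works because the fetched document keeps its bytes in memory and hands out a fresh stream on each call, whereas a single InputSource's byte stream is used up after one read. The sketch below illustrates the pattern with a hypothetical BufferedHtml class; it is not boilerpipe's HTMLDocument.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

/** Hypothetical stand-in for an HTML document held in memory. */
public class BufferedHtml {
    private final byte[] data;
    private final Charset charset;

    BufferedHtml(byte[] data, Charset charset) {
        this.data = data.clone(); // defensive copy so callers cannot mutate the buffer
        this.charset = charset;
    }

    /** Each call returns an InputSource over a new stream, so every parse pass starts at byte 0. */
    InputSource toInputSource() {
        InputSource is = new InputSource(new ByteArrayInputStream(data));
        is.setEncoding(charset.name());
        return is;
    }

    /** Reads an InputSource's byte stream to the end and returns how many bytes were read. */
    static int drain(InputSource is) throws IOException {
        InputStream in = is.getByteStream();
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        BufferedHtml doc = new BufferedHtml("<p>Hello</p>".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        InputSource first = doc.toInputSource();
        System.out.println(drain(first));                // 12: the first pass consumes the stream
        System.out.println(drain(first));                // 0: the same InputSource is now exhausted
        System.out.println(drain(doc.toInputSource()));  // 12: a fresh InputSource starts over
    }
}
```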
origin: com.syncthemall/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
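 * @param url
 *            The URL of the document to fetch.
 * @param extractor
 *            The extractor to apply to the fetched document.
 * @return The highlighted HTML.
 * @throws IOException
 *            If the document cannot be fetched.
 * @throws BoilerpipeProcessingException
 *            If text extraction fails.
 * @throws SAXException
 *            If the HTML cannot be parsed.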
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: pvdlg/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
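 * @param url
 *            The URL of the document to fetch.
 * @param extractor
 *            The extractor to apply to the fetched document.
 * @return The highlighted HTML.
 * @throws IOException
 *            If the document cannot be fetched.
 * @throws BoilerpipeProcessingException
 *            If text extraction fails.
 * @throws SAXException
 *            If the HTML cannot be parsed.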
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: com.syncthemall/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
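 * @param url
 *            The URL of the document to fetch.
 * @param extractor
 *            The extractor to apply to the fetched document.
 * @return A List of enclosed {@link Image}s
 * @throws IOException
 *            If the document cannot be fetched.
 * @throws BoilerpipeProcessingException
 *            If text extraction fails.
 * @throws SAXException
 *            If the HTML cannot be parsed.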
 */
public List<Image> process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}

origin: pvdlg/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
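 * @param url
 *            The URL of the document to fetch.
 * @param extractor
 *            The extractor to apply to the fetched document.
 * @return A List of enclosed {@link Image}s
 * @throws IOException
 *            If the document cannot be fetched.
 * @throws BoilerpipeProcessingException
 *            If text extraction fails.
 * @throws SAXException
 *            If the HTML cannot be parsed.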
 */
public List<Image> process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}

origin: com.syncthemall/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the retrieved HTML using the specified
 * {@link BoilerpipeExtractor}.
 * 
 * @param url the URL of the document to fetch
 * @param extractor extractor to use
 * 
 * @return A List of enclosed {@link Media} items
 * @throws IOException
 * @throws BoilerpipeProcessingException
 * @throws SAXException
 */
@SuppressWarnings("javadoc")
public List<Media> process(final URL url, final BoilerpipeExtractor extractor) throws IOException,
    BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
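 * @param url
 *            The URL of the document to fetch.
 * @param extractor
 *            The extractor to apply to the fetched document.
 * @return The highlighted HTML.
 * @throws IOException
 *            If the document cannot be fetched.
 * @throws BoilerpipeProcessingException
 *            If text extraction fails.
 * @throws SAXException
 *            If the HTML cannot be parsed.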
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
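 * @param url
 *            The URL of the document to fetch.
 * @param extractor
 *            The extractor to apply to the fetched document.
 * @return A List of enclosed {@link Image}s
 * @throws IOException
 *            If the document cannot be fetched.
 * @throws BoilerpipeProcessingException
 *            If text extraction fails.
 * @throws SAXException
 *            If the HTML cannot be parsed.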
 */
public List<Image> process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}

origin: pvdlg/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the retrieved HTML using the specified
 * {@link BoilerpipeExtractor}.
 * 
 * @param url the URL of the document to fetch
 * @param extractor extractor to use
 * 
 * @return A List of enclosed {@link Media} items
 * @throws IOException
 * @throws BoilerpipeProcessingException
 * @throws SAXException
 */
@SuppressWarnings("javadoc")
public List<Media> process(final URL url, final BoilerpipeExtractor extractor) throws IOException,
    BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 *
 * @param url the URL of the document to fetch
 * @param extractor extractor to use
 *
 * @return A List of enclosed {@link Media} items
 * @throws IOException
 * @throws BoilerpipeProcessingException
 * @throws SAXException
 */
@SuppressWarnings("javadoc")
public List<Media> process(final URL url, final BoilerpipeExtractor extractor)
        throws IOException, BoilerpipeProcessingException, SAXException {
    final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
    final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
            .getTextDocument();
    extractor.process(doc);
    final InputSource is = htmlDoc.toInputSource();
    return process(doc, is);
}

Javadoc

Fetches the document at the given URL, using URLConnection.
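The Javadoc only states that fetch uses URLConnection. As a rough JDK-only sketch of such a fetch, the class below reads the charset from the Content-Type header and decodes the body with it. The ISO-8859-1 fallback, the header parsing, and returning a decoded String (rather than an HTMLDocument) are all assumptions of this sketch, not a description of what HTMLFetcher actually does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

public class SimpleHtmlFetch {

    // Assumed fallback when no usable charset is declared: ISO-8859-1,
    // the historical HTTP/1.1 (RFC 2616) default for text/* media types.
    static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    /** Picks the charset from a Content-Type value such as "text/html; charset=UTF-8". */
    static Charset charsetFromContentType(String contentType) {
        if (contentType == null) {
            return FALLBACK;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return FALLBACK; // unknown or malformed name: keep the fallback
                }
            }
        }
        return FALLBACK;
    }

    /** Fetches the document via URLConnection and decodes it with the detected charset. */
    static String fetch(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        Charset cs = charsetFromContentType(conn.getContentType());
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), cs); // readAllBytes requires Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java SimpleHtmlFetch <url>");
            return;
        }
        String html = fetch(URI.create(args[0]).toURL());
        System.out.println(html.length() + " characters fetched");
    }
}
```

For example, charsetFromContentType("text/html; charset=UTF-8") returns UTF-8, while a header without a charset parameter, or with an unknown one, falls back to ISO-8859-1.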
