Codota Logo
Tika.detect
Code IndexAdd Codota to your IDE (free)

How to use
detect
method
in
org.apache.tika.Tika

Best Java code snippets using org.apache.tika.Tika.detect (Showing top 20 results out of 405)

  • Common ways to obtain Tika
private void myMethod () {
Tika t =
  • Codota Iconnew Tika()
  • Codota IconTikaConfig config;new Tika(config)
  • Smart code suggestions by Codota
}
origin: stackoverflow.com

Tika tika = new Tika();
File file = ...
String mimeType = tika.detect(file);
origin: BroadleafCommerce/BroadleafCommerce

protected void getMimeType(InputStream inputStream, String fileName, StaticAsset newAsset) {
  Tika tika = new Tika();
  String tikaMimeType = tika.detect(fileName);
  if (tikaMimeType == null) {
    try {
      tikaMimeType = tika.detect(inputStream);
    } catch (IOException e) {
      //if tika can't resolve, don't throw exception
    }
  }
  if (tikaMimeType != null) {
    newAsset.setMimeType(tikaMimeType);
  }
}
origin: apache/tika

/**
 * Detects the media type of a document with the given file name.
 * The type detection is based on known file name extensions.
 * <p>
 * The given name can also be a URL or a full file path. In such cases
 * only the file name part of the string is used for type detection. 
 *
 * @param name the file name of the document
 * @return detected media type
 */
public String detect(String name) {
  try {
    return detect((InputStream) null, name);
  } catch (IOException e) {
    throw new IllegalStateException("Unexpected IOException", e);
  }
}
origin: apache/tika

@Override
public String probeContentType(Path path) throws IOException {
  // Try to detect based on the file name only for efficiency
  String fileNameDetect = tika.detect(path.toString());
  if(!fileNameDetect.equals(MimeTypes.OCTET_STREAM)) {
    return fileNameDetect;
  }
  
  // Then check the file content if necessary
  String fileContentDetect = tika.detect(path);
  if(!fileContentDetect.equals(MimeTypes.OCTET_STREAM)) {
    return fileContentDetect;
  }
  
  // Specification says to return null if we could not 
  // conclusively determine the file type
  return null;
}

origin: apache/tika

  public static void main(String[] args) throws Exception {
    Tika tika = new Tika();

    for (String file : args) {
      String type = tika.detect(new File(file));
      System.out.println(file + ": " + type);
    }
  }
}
origin: apache/tika

/**
 * Detects the media type of the given document. The type detection is
 * based on the content of the given document stream.
 * <p>
 * If the document stream supports the
 * {@link InputStream#markSupported() mark feature}, then the stream is
 * marked and reset to the original position before this method returns.
 * Only a limited number of bytes are read from the stream.
 * <p>
 * The given document stream is <em>not</em> closed by this method.
 *
 * @param stream the document stream
 * @return detected media type
 * @throws IOException if the stream can not be read
 */
public String detect(InputStream stream) throws IOException {
  return detect(stream, new Metadata());
}
origin: apache/tika

/**
 * Detects the media type of the given document. The type detection is
 * based on the first few bytes of a document.
 * <p>
 * For best results at least a few kilobytes of the document data
 * are needed. See also the other detect() methods for better
 * alternatives when you have more than just the document prefix
 * available for type detection.
 *
 * @since Apache Tika 0.9
 * @param prefix first few bytes of the document
 * @return detected media type
 */
public String detect(byte[] prefix) {
  try {
    try (InputStream stream = TikaInputStream.get(prefix)) {
      return detect(stream);
    }
  } catch (IOException e) {
    throw new IllegalStateException("Unexpected IOException", e);
  }
}
origin: apache/tika

public static String customMimeInfo() throws Exception {
  String path = "file:///path/to/prescription-type.xml";
  MimeTypes typeDatabase = MimeTypesFactory.create(new URL(path));
  Tika tika = new Tika(typeDatabase);
  String type = tika.detect("/path/to/prescription.xpd");
  return type;
}
origin: apache/tika

public static String detectWithCustomConfig(String name) throws Exception {
  String config = "/org/apache/tika/mime/tika-mimetypes.xml";
  Tika tika = new Tika(MimeTypesFactory.create(config));
  return tika.detect(name);
}
origin: apache/tika

/**
 * Find the Mime Content Type of a document stored in the given file.
 * Returns application/octet-stream if no better match is found.
 *
 * @deprecated Use {@link Tika#detect(File)} instead
 * @param file file to analyze
 * @return the Mime Content Type of the specified document
 * @throws MimeTypeException if the type can't be detected
 * @throws IOException if the file can't be read
 */
public MimeType getMimeType(File file)
    throws MimeTypeException, IOException {
  return forName(new Tika(this).detect(file));
}
origin: apache/tika

/**
 * Detects the media type of the file at the given path. The type
 * detection is based on the document content and a potential known
 * file extension.
 * <p>
 * Use the {@link #detect(String)} method when you want to detect the
 * type of the document without actually accessing the file.
 *
 * @param path the path of the file
 * @return detected media type
 * @throws IOException if the file can not be read
 */
public String detect(Path path) throws IOException {
  Metadata metadata = new Metadata();
  try (InputStream stream = TikaInputStream.get(path, metadata)) {
    return detect(stream, metadata);
  }
}
origin: apache/tika

/**
 * Detects the media type of the resource at the given URL. The type
 * detection is based on the document content and a potential known
 * file extension included in the URL.
 * <p>
 * Use the {@link #detect(String)} method when you want to detect the
 * type of the document without actually accessing the URL.
 *
 * @param url the URL of the resource
 * @return detected media type
 * @throws IOException if the resource can not be read
 */
public String detect(URL url) throws IOException {
  Metadata metadata = new Metadata();
  try (InputStream stream = TikaInputStream.get(url, metadata)) {
    return detect(stream, metadata);
  }
}
origin: apache/tika

/**
 * Detects the media type of the given file. The type detection is
 * based on the document content and a potential known file extension.
 * <p>
 * Use the {@link #detect(String)} method when you want to detect the
 * type of the document without actually accessing the file.
 *
 * @param file the file
 * @return detected media type
 * @throws IOException if the file can not be read
 * @see #detect(Path)
 */
public String detect(File file) throws IOException {
  Metadata metadata = new Metadata();
  try (@SuppressWarnings("deprecation")
  InputStream stream = TikaInputStream.get(file, metadata)) {
    return detect(stream, metadata);
  }
}
origin: apache/tika

private static void benchmark(File file) throws Exception {
  if (file.isHidden()) {
    // ignore
  } else if (file.isFile()) {
    try (InputStream input = new FileInputStream(file)) {
      byte[] content = IOUtils.toByteArray(input);
      String type =
          tika.detect(new ByteArrayInputStream(content));
      long start = System.currentTimeMillis();
      for (int i = 0; i < 1000; i++) {
        tika.detect(new ByteArrayInputStream(content));
      }
      System.out.printf(
          Locale.ROOT,
          "%6dns per Tika.detect(%s) = %s%n",
          System.currentTimeMillis() - start, file, type);
    }
  } else if (file.isDirectory()) {
    for (File child : file.listFiles()) {
      benchmark(child);
    }
  }
}
origin: apache/tika

public static String customCompositeDetector() throws Exception {
  String path = "file:///path/to/prescription-type.xml";
  MimeTypes typeDatabase = MimeTypesFactory.create(new URL(path));
  Tika tika = new Tika(new CompositeDetector(typeDatabase,
      new EncryptedPrescriptionDetector()));
  String type = tika.detect("/path/to/tmp/prescription.xpd");
  return type;
}
origin: apache/tika

  public static String detectWithCustomDetector(String name) throws Exception {
    String config = "/org/apache/tika/mime/tika-mimetypes.xml";
    Detector detector = MimeTypesFactory.create(config);

    Detector custom = new Detector() {
      private static final long serialVersionUID = -5420638839201540749L;

      public MediaType detect(InputStream input, Metadata metadata) {
        String type = metadata.get("my-custom-type-override");
        if (type != null) {
          return MediaType.parse(type);
        } else {
          return MediaType.OCTET_STREAM;
        }
      }
    };

    Tika tika = new Tika(new CompositeDetector(custom, detector));
    return tika.detect(name);
  }
}
origin: apache/tika

@Test
public void testByteOrderMark() throws Exception {
  assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
      new ByteArrayInputStream("\ufefftest".getBytes(UTF_16LE)),
      new Metadata()));
  assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
      new ByteArrayInputStream("\ufefftest".getBytes(UTF_16BE)),
      new Metadata()));
  assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
      new ByteArrayInputStream("\ufefftest".getBytes(UTF_8)),
      new Metadata()));
}
origin: apache/tika

private void testStream(String expected, String urlOrFileName,
    InputStream in) throws IOException {
  assertNotNull("Test stream: [" + urlOrFileName + "] is null!", in);
  if (!in.markSupported()) {
    in = new java.io.BufferedInputStream(in);
  }
  try {
    Metadata metadata = new Metadata();
    // String mime = this.proDetector.detect(in, metadata).toString();
    String mime = tika.detect(in, metadata).toString();
    assertEquals(
        urlOrFileName + " is not properly detected: detected.",
        expected, mime);
    // Add resource name and test again
    metadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, urlOrFileName);
    // mime = this.proDetector.detect(in, metadata).toString();
    mime = tika.detect(in, metadata).toString();
    assertEquals(urlOrFileName
        + " is not properly detected after adding resource name.",
        expected, mime);
  } finally {
    in.close();
  }
}
origin: apache/tika

/**
 * Test for things like javascript files whose content is enclosed in XML
 * comment delimiters, but that aren't actually XML.
 * 
 * @see <a
 *      href="https://issues.apache.org/jira/browse/TIKA-426">TIKA-426</a>
 */
@Test
public void testNotXML() throws IOException {
  assertEquals(MediaType.TEXT_PLAIN.toString(), tika.detect(
      new ByteArrayInputStream("<!-- test -->".getBytes(UTF_8)),
      new Metadata()));
}
origin: apache/tika

/**
 * Test for type detection of empty documents.
 * 
 * @see <a
 *      href="https://issues.apache.org/jira/browse/TIKA-483">TIKA-483</a>
 */
@Test
public void testEmptyDocument() throws IOException {
  assertEquals(MediaType.OCTET_STREAM.toString(), tika.detect(
      new ByteArrayInputStream(new byte[0]), new Metadata()));
  Metadata namehint = new Metadata();
  namehint.set(TikaCoreProperties.RESOURCE_NAME_KEY, "test.txt");
  assertEquals(MediaType.TEXT_PLAIN.toString(),
      tika.detect(new ByteArrayInputStream(new byte[0]), namehint));
  Metadata typehint = new Metadata();
  typehint.set(Metadata.CONTENT_TYPE, "text/plain");
  assertEquals(MediaType.TEXT_PLAIN.toString(),
      tika.detect(new ByteArrayInputStream(new byte[0]), typehint));
}
org.apache.tikaTikadetect

Javadoc

Detects the media type of the given file. The type detection is based on the document content and a potential known file extension.

Use the #detect(String) method when you want to detect the type of the document without actually accessing the file.

Popular methods of Tika

  • <init>
    Creates a Tika facade using the given detector, parser, and translator instances.
  • parseToString
    Parses the file at the given path and returns the extracted text content. To avoid unpredictable exc
  • parse
    Parses the file at the given path and returns the extracted text content. Metadata information extr
  • toString
  • getParser
    Returns the parser instance used by this facade.
  • setMaxStringLength
    Sets the maximum length of strings returned by the parseToString methods.

Popular in Java

  • Creating JSON documents from java classes using gson
  • findViewById (Activity)
  • getResourceAsStream (ClassLoader)
    Returns a stream for the resource with the specified name. See #getResource(String) for a descriptio
  • onCreateOptionsMenu (Activity)
  • FileInputStream (java.io)
    A FileInputStream obtains input bytes from a file in a file system. What files are available depends
  • InetAddress (java.net)
    This class represents an Internet Protocol (IP) address. An IP address is either a 32-bit or 128-bit
  • ServerSocket (java.net)
    This class represents a server-side socket that waits for incoming client connections. A ServerSocke
  • Properties (java.util)
    The Properties class represents a persistent set of properties. The Properties can be saved to a st
  • ExecutorService (java.util.concurrent)
    An Executor that provides methods to manage termination and methods that can produce a Future for tr
  • Pattern (java.util.regex)
    A compiled representation of a regular expression. A regular expression, specified as a string, must
Codota Logo
  • Products

    Search for Java codeSearch for JavaScript codeEnterprise
  • IDE Plugins

    IntelliJ IDEAWebStormAndroid StudioEclipseVisual Studio CodePyCharmSublime TextPhpStormVimAtomGoLandRubyMineEmacsJupyter
  • Company

    About UsContact UsCareers
  • Resources

    FAQBlogCodota Academy Plugin user guide Terms of usePrivacy policyJava Code IndexJavascript Code Index
Get Codota for your IDE now