How to use PDFEncodedStringDecoder in org.apache.tika.parser.pdf

Java code snippets using org.apache.tika.parser.pdf.PDFEncodedStringDecoder

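import org.apache.tika.parser.pdf.PDFEncodedStringDecoder;

private String decode(String value) {
  // shouldDecode is static: it checks for an octal-encoded UTF BOM.
  if (PDFEncodedStringDecoder.shouldDecode(value)) {
    // decode assumes shouldDecode has already returned true for this value.
    PDFEncodedStringDecoder d = new PDFEncodedStringDecoder();
    return d.decode(value);
  }
  // No encoded BOM found: return the string unchanged.
  return value;
}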

Javadoc

In fairly rare cases, a PDF's XMP will contain a string that has incorrectly been encoded with PDFEncoding: octal escapes for non-ASCII bytes and plain ASCII for ASCII characters, e.g. "\376\377\000M\000i\000c\000r\000o\000s\000o\000f\000t\000"

This class can be used to decode those strings.

See TIKA-1678. Many thanks to Andrew Jackson for raising this issue and Tilman Hausherr for the solution.

As of this writing, we are only handling strings that start with an encoded BOM. Andrew Jackson found a handful of other examples (e.g. this ISO-8859-7 string: "Microsoft Word - \\323\\365\\354\\354\\345\\364\\357\\367\\336 \\364\\347\\362 PRAKSIS \\363\\364\\357") that we aren't currently handling.
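
To make the encoding described above concrete, here is a minimal, hand-rolled sketch of what such a string contains. It is not Tika's implementation: the class `OctalBomSketch` and its methods `looksEncoded`, `decodeSketch` and `isOctal` are hypothetical names introduced for illustration. The sketch assumes the string starts with the octal-escaped UTF-16BE BOM (`\376\377`), that every escape has exactly three octal digits, and that every unescaped character is plain ASCII. A dangling trailing byte, like the final `\000` in the example above, is dropped.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class OctalBomSketch {
    // Octal escapes for the UTF-16BE byte order mark, bytes 0xFE 0xFF.
    static final String ESCAPED_BOM = "\\376\\377";

    // Hypothetical stand-in for a "should I decode this?" check.
    static boolean looksEncoded(String s) {
        return s != null && s.startsWith(ESCAPED_BOM);
    }

    // True if the len characters starting at index from are all octal digits.
    static boolean isOctal(String s, int from, int len) {
        for (int k = from; k < from + len; k++) {
            char d = s.charAt(k);
            if (d < '0' || d > '7') {
                return false;
            }
        }
        return true;
    }

    // Assumes looksEncoded(s) returned true.
    static String decodeSketch(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && isOctal(s, i + 1, 3)) {
                // Backslash plus three octal digits stands for one raw byte.
                bytes.write(Integer.parseInt(s.substring(i + 1, i + 4), 8));
                i += 4;
            } else {
                // Assumption: anything else is an ASCII character, one byte.
                bytes.write(c & 0xFF);
                i++;
            }
        }
        byte[] raw = bytes.toByteArray();
        // Skip the two BOM bytes and ignore an unpaired trailing byte.
        int len = (raw.length - 2) & ~1;
        return new String(raw, 2, len, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String in = "\\376\\377\\000M\\000i\\000c\\000r\\000o\\000s\\000o\\000f\\000t\\000";
        if (looksEncoded(in)) {
            System.out.println(decodeSketch(in)); // prints "Microsoft"
        }
    }
}
```

In real code, prefer PDFEncodedStringDecoder itself, guarded by shouldDecode as in the snippet at the top of this page. Strings without an encoded BOM, such as the ISO-8859-7 example, are outside what this sketch handles.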

Most used methods

<init>
decode
This assumes that #shouldDecode(String) has been called and has returned true. If you run this on a
shouldDecode
Does this string contain an octal-encoded UTF BOM? Call this statically to determine if you should b
