- setExtractInlineImages
If true, extract inline embedded OBXImages. Beware: some PDF documents of modest
size (~4MB) can cont
- <init>
Loads properties from InputStream and then tries to close InputStream. If there
is an IOException, t
- setExtractUniqueInlineImagesOnly
Multiple pages within a PDF file might refer to the same underlying image. If
#extractUniqueInlineIm
- setOcrStrategy
Selects which strategy to use for OCR.
- setSuppressDuplicateOverlappingText
If true, the parser should try to remove duplicated text over the same region.
This is needed for so
- configure
Configures the given pdf2XHTML.
- setEnableAutoSpace
If true (the default), the parser should estimate where spaces should be
inserted between words. For
- setExtractAcroFormContent
If true (the default), extract content from AcroForms at the end of the
document. If an XFA is found
- setExtractAnnotationText
If true (the default), text in annotations will be extracted.
- setSortByPosition
If true, sort text tokens by their x/y position before extracting text. This may
be necessary for so
- getAccessChecker
- getAverageCharTolerance
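
The <init> entry above describes loading properties from an InputStream and then closing that stream. Below is a minimal sketch of that pattern using only java.util.Properties, not the actual constructor. The class name, the getBoolean helper, and the property keys are hypothetical, and the handling of an IOException is an assumption, since the description is cut off at that point.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesLoadSketch {

    // Loads key/value pairs from the stream, then closes the stream.
    // Assumption: on an IOException we keep whatever was read so far and
    // carry on; the real constructor may behave differently.
    public static Properties loadAndClose(InputStream in) {
        Properties props = new Properties();
        // try-with-resources closes the stream whether or not load() succeeds.
        try (InputStream stream = in) {
            props.load(stream);
        } catch (IOException e) {
            // Hypothetical fallback: ignore the error and return what we have.
        }
        return props;
    }

    // Hypothetical helper: reads a boolean setting, falling back to a default
    // when the key is absent.
    public static boolean getBoolean(Properties props, String key, boolean defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        // Hypothetical keys, named after the setters listed above.
        String text = "sortByPosition=true\nenableAutoSpace=false\n";
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1));

        Properties props = loadAndClose(in);
        System.out.println("sortByPosition = " + getBoolean(props, "sortByPosition", false));
        System.out.println("enableAutoSpace = " + getBoolean(props, "enableAutoSpace", true));
    }
}
```

Closing the stream inside the loader means the caller does not have to track it afterwards, which matches the "tries to close InputStream" wording in the description.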