- normalizeApostrophes
This method converts different apostrophe symbols to a unified form.
- normalizeQuotesHyphens
This method converts different single and double quote symbols to a unified
form. It also reduces tw
- normalizeSpacesAndSoftHyphens
Replaces all Unicode space-like characters with " " and replaces soft hyphens
(U+00AD).
- convertAmpersandStrings
Replaces all special HTML strings such as (&....; or &#dddd;) with their original
characters.
- cleanHtmlTagsAndComments
- containsCombiningDiacritics
Returns true iff the input contains combining diacritic symbols. These characters
sometimes appear in d
- getAttributes
Returns a map with the attributes of an XML line. For example, if [content] is `` and
[element] is `Foo`
- getHtmlBody
- removeAmpresandStrings
This method removes all &....; type strings from HTML.
- separateWords
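
The behavior described for several of the methods above can be illustrated with a minimal Python sketch. This is not the library's implementation: the snake_case function names, the set of apostrophe variants, the choice to delete soft hyphens outright, and the entity regular expression are all assumptions made for illustration.

```python
import html
import re

# Assumed set of apostrophe look-alikes; the list the library really uses is not given above.
_APOSTROPHE_TABLE = {ord(ch): "'" for ch in "\u2018\u2019\u02bc\u00b4\u0060"}


def normalize_apostrophes(text: str) -> str:
    """Map the assumed apostrophe variants to the ASCII apostrophe."""
    return text.translate(_APOSTROPHE_TABLE)


def normalize_spaces_and_soft_hyphens(text: str) -> str:
    """Turn every Unicode whitespace character into " " and drop U+00AD.

    Assumption: soft hyphens are deleted rather than replaced by another character.
    """
    text = text.replace("\u00ad", "")
    return "".join(" " if ch.isspace() else ch for ch in text)


def convert_ampersand_strings(text: str) -> str:
    """Decode named and numeric character references via the standard library."""
    return html.unescape(text)


def remove_ampersand_strings(text: str) -> str:
    """Delete &....; style references instead of decoding them (assumed pattern)."""
    return re.sub(r"&#?\w+;", "", text)


def contains_combining_diacritics(text: str) -> bool:
    """Check for characters in the Combining Diacritical Marks block (U+0300..U+036F)."""
    return any("\u0300" <= ch <= "\u036f" for ch in text)
```

Under these assumptions, `convert_ampersand_strings("Tom &amp; Jerry")` yields `Tom & Jerry`, while `remove_ampersand_strings` applied to the same input drops the reference entirely. A precomposed letter such as `\u00e9` does not trigger `contains_combining_diacritics`, but the decomposed pair `e` + `\u0301` does.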