Represents either a StartTag or EndTag in a specific Source document.
Take the following HTML segment as an example:
<p>This is a sample paragraph.</p>
The "<p>" is represented by a StartTag object, and the "</p>" is represented by an EndTag object,
both of which are subclasses of the Tag class.
The whole segment, including the start tag, its corresponding end tag and all of the content in between, is represented by an Element object.
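To make the relationship between the two tags and the element concrete, the sketch below treats each of them as a half-open character range [begin, end) of the source text. The `SpanModel` class and its `locate` helper are hypothetical illustrations, not part of the parser's API, and the plain string search only handles a literal `<name>...</name>` pair with no attributes or nesting.

```java
// Hypothetical model, not the parser's own classes: tags and elements are
// represented only by their half-open character ranges [begin, end).
final class SpanModel {
    record Span(int begin, int end) {
        String text(String source) {
            return source.substring(begin, end);
        }
    }

    // Finds the first literal <name>...</name> pair and returns
    // {start tag, end tag, element}, or null if either tag is missing.
    static Span[] locate(String source, String name) {
        String open = "<" + name + ">";
        String close = "</" + name + ">";
        int startBegin = source.indexOf(open);
        if (startBegin < 0) return null;
        int endBegin = source.indexOf(close, startBegin + open.length());
        if (endBegin < 0) return null;
        Span startTag = new Span(startBegin, startBegin + open.length());
        Span endTag = new Span(endBegin, endBegin + close.length());
        // The element runs from the start of the start tag to the end of the end tag.
        Span element = new Span(startTag.begin(), endTag.end());
        return new Span[] {startTag, endTag, element};
    }
}
```

For the example segment, the start tag covers [0, 3), the end tag [30, 34) and the element [0, 34); the paragraph text is the range between the two tags, [3, 30).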
The following process describes how each tag is identified by the parser:
- Every '<' character found in the source document is considered to be the start of a tag.
  The characters following it are compared with the start delimiters (TagType#getStartDelimiter()) of all registered (TagType#register()) tag types,
  and a list of matching tag types is determined.
- A more detailed analysis of the source is performed according to the features of each matching tag type from the first step,
  in order of precedence, until a valid tag can be constructed.
  The analysis performed for each candidate tag type is a two-stage process:
  - The position of the tag is checked to determine whether it is valid (TagType#isValidPosition(Source,int,int[])).
    In theory, a server tag (TagType#isServerTag()) is valid in any position, but a non-server tag is not valid inside any other tag,
    nor inside elements with CDATA content such as SCRIPT (HTMLElementName#SCRIPT) and STYLE (HTMLElementName#STYLE) elements.
    Theory therefore dictates that comments (StartTagType#COMMENT) and explicit CDATA sections (StartTagType#CDATA_SECTION) inside script elements should not be recognised as tags.
    In practice, however, the parser does not always adhere strictly to the theory, both to maintain compatibility with major browsers and for efficiency reasons.
    The TagType#isValidPosition(Source, int pos, int[] fullSequentialParseData) method is responsible for this check
    and has a common default implementation for all tag types (although custom tag types can override it if necessary).
    Its behaviour differs depending on whether or not a full sequential parse (Source#fullSequentialParse()) is performed.
    See the documentation of the TagType#isValidPosition(Source,int,int[]) method for full details.
  - A final analysis is performed by the TagType#constructTagAt(Source, int pos) method of the candidate tag type.
    This method returns a valid Tag object if all conditions of the candidate tag type are met; otherwise it returns null
    and the process continues with the next candidate tag type.
If the source does not match the start delimiter or syntax of any registered tag type, the segment spanning from the '<' character to the next
'>' character is taken to be an unregistered tag (#isUnregistered()).
Some tag search methods ignore unregistered tags; see the #isUnregistered() method for more information.
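The identification steps above can be sketched as a single left-to-right scan. This is a simplified, hypothetical model rather than the parser's implementation: the `TagScanner` class and its `TagType` record are invented for illustration, list order stands in for precedence, a return value of -1 plays the role of `constructTagAt` returning null, and the position check only enforces the rule that a non-server tag may not appear inside another tag (CDATA content such as SCRIPT and STYLE elements is not modelled).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of tag identification; not the parser's code.
final class TagScanner {
    // A stand-in tag type: a start delimiter, a closing delimiter, a server flag,
    // and whether a letter must follow the start delimiter (as in "<p" or "</p").
    record TagType(String name, String startDelimiter, String endDelimiter,
                   boolean isServerTag, boolean requiresLetter) {
        // Plays the role of constructTagAt: returns the end offset of a valid
        // tag beginning at pos, or -1 where the real method would return null.
        int constructTagAt(String source, int pos) {
            int after = pos + startDelimiter.length();
            if (requiresLetter
                    && (after >= source.length() || !Character.isLetter(source.charAt(after)))) {
                return -1;
            }
            int close = source.indexOf(endDelimiter, after);
            return close < 0 ? -1 : close + endDelimiter.length();
        }
    }

    record Found(int begin, int end, String type) {}

    // Treats every '<' as a potential tag; 'registered' is assumed sorted by precedence.
    static List<Found> scan(String source, List<TagType> registered) {
        List<Found> found = new ArrayList<>();
        int enclosingEnd = 0; // positions before this lie inside an accepted tag
        for (int pos = source.indexOf('<'); pos >= 0; pos = source.indexOf('<', pos + 1)) {
            boolean insideTag = pos < enclosingEnd;
            Found tag = null;
            for (TagType type : registered) {
                // Step 1: the start delimiter must match at this position.
                if (!source.startsWith(type.startDelimiter(), pos)) continue;
                // Step 2a (simplified position check): only server tags may sit inside another tag.
                if (insideTag && !type.isServerTag()) continue;
                // Step 2b: final analysis; -1 means "try the next candidate type".
                int end = type.constructTagAt(source, pos);
                if (end >= 0) {
                    tag = new Found(pos, end, type.name());
                    break;
                }
            }
            // Outside any tag, if no registered type matched, the span from '<'
            // to the next '>' becomes an unregistered tag.
            if (tag == null && !insideTag) {
                int gt = source.indexOf('>', pos + 1);
                if (gt >= 0) tag = new Found(pos, gt + 1, "unregistered");
            }
            if (tag != null) {
                found.add(tag);
                enclosingEnd = Math.max(enclosingEnd, tag.end());
            }
        }
        return found;
    }
}
```

With four types registered in the order comment (`<!--`/`-->`), server (`<%`/`%>`), end tag (`</`/`>`) and start tag (`<`/`>`), scanning `<p>x<!-- <b> --></p><% y %><@x>` yields tags of type start, comment, end, server and unregistered. The `<b>` inside the comment is rejected by the position check, and `<@x>` falls through to the unregistered case because no letter follows its `<`.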
See the documentation of the TagType class for more details on how tags are recognised.
Methods that get tags in a source document are collectively referred to as Tag Search Methods.
They are found mostly in the Source and Segment classes, and can generally be categorised as follows:
Open Search:
These methods search for tags of any name (#getName()) and tag type (#getTagType()).
- Tag#getNextTag()
- Tag#getPreviousTag()
- Segment#getAllElements()
- Segment#getFirstElement()
- Source#getTagAt(int pos)
- Source#getPreviousTag(int pos)
- Source#getNextTag(int pos)
- Source#getEnclosingTag(int pos)
- Segment#getAllTags()
- Segment#getAllStartTags()
- Segment#getFirstStartTag()
- Source#getPreviousStartTag(int pos)
- Source#getNextStartTag(int pos)
- Source#getPreviousEndTag(int pos)
- Source#getNextEndTag(int pos)
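Several of the open search methods above take a position argument. One plausible way to answer "next tag" and "previous tag" queries quickly, assuming the begin offsets of all tags are already known (for example after a full parse), is a binary search over those offsets. The `TagIndex` class below is a hypothetical sketch: it assumes "next" means the tag beginning at or after `pos` and "previous" the tag beginning at or before it, and it ignores enclosing tags entirely.

```java
import java.util.Arrays;

// Hypothetical sketch: sorted tag begin offsets turn position-based
// searches into binary searches instead of rescans of the source.
final class TagIndex {
    private final int[] begins; // sorted begin offsets of all tags

    TagIndex(int[] begins) {
        this.begins = begins.clone();
        Arrays.sort(this.begins);
    }

    // Begin offset of the first tag at or after pos, or -1 if there is none.
    int nextTagBegin(int pos) {
        int i = lowerBound(pos);
        return i < begins.length ? begins[i] : -1;
    }

    // Begin offset of the last tag at or before pos, or -1 if there is none.
    int previousTagBegin(int pos) {
        int i = lowerBound(pos + 1) - 1; // last index with begins[i] <= pos
        return i >= 0 ? begins[i] : -1;
    }

    // Index of the first offset >= key (begins.length if there is none).
    private int lowerBound(int key) {
        int lo = 0;
        int hi = begins.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (begins[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

With tags beginning at offsets 0, 4, 16, 20 and 27, `nextTagBegin(5)` returns 16 and `previousTagBegin(5)` returns 4; each query costs O(log n) rather than a scan of the text.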
Named Search:
These methods include a parameter called name, which specifies the name (#getName()) of the tag to search for.
Specifying a name that ends in a colon (:) searches for all elements or tags in the specified XML namespace.
- Segment#getAllElements(String name)
- Segment#getFirstElement(String name)
- Segment#getAllStartTags(String name)
- Segment#getFirstStartTag(String name)
- Source#getPreviousStartTag(int pos, String name)
- Source#getNextStartTag(int pos, String name)
- Source#getPreviousEndTag(int pos, String name)
- Source#getNextEndTag(int pos, String name)
- Source#getNextEndTag(int pos, String name, EndTagType)
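The namespace rule for the name parameter amounts to a prefix test on the tag name. A minimal sketch, in which `NameFilter` is a hypothetical helper rather than a library class and case-insensitive comparison is an assumption of this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper illustrating the name test of a named search.
final class NameFilter {
    // A search name ending in ':' matches every tag name with that namespace
    // prefix; any other search name must equal the whole tag name.
    // Case-insensitive comparison is an assumption of this sketch.
    static boolean matches(String tagName, String searchName) {
        String tag = tagName.toLowerCase(Locale.ROOT);
        String search = searchName.toLowerCase(Locale.ROOT);
        return search.endsWith(":") ? tag.startsWith(search) : tag.equals(search);
    }

    // Keeps the tag names accepted by matches(), preserving their order.
    static List<String> filter(List<String> tagNames, String searchName) {
        List<String> result = new ArrayList<>();
        for (String name : tagNames) {
            if (matches(name, searchName)) result.add(name);
        }
        return result;
    }
}
```

Because the colon is part of the prefix, a search for `svg:` selects `svg:rect` but not an unrelated name such as `svgx`.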
Tag Type Search:
These methods typically include a parameter called tagType, which specifies the tag type (#getTagType()) of the tag to search for.
In some methods the search parameter is restricted to the StartTagType or EndTagType subclass of TagType.
- Segment#getAllElements(StartTagType)
- Segment#getAllTags(TagType)
- Segment#getAllStartTags(StartTagType)
- Segment#getFirstStartTag(StartTagType)
- Source#getPreviousTag(int pos, TagType)
- Source#getPreviousStartTag(int pos, StartTagType)
- Source#getPreviousEndTag(int pos, EndTagType)
- Source#getNextTag(int pos, TagType)
- Source#getNextStartTag(int pos, StartTagType)
- Source#getNextEndTag(int pos, EndTagType)
- Source#getEnclosingTag(int pos, TagType)
- Source#getNextEndTag(int pos, String name, EndTagType)
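Restricting a tagType parameter to a subclass lets the Java compiler reject a mismatched search. The sketch below mirrors that idea with hypothetical stand-in classes nested in `TypeSearch` (they are not the library's `TagType`, `StartTagType` and `EndTagType`), and it assumes tag types are singleton objects that can be compared by identity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of tag type searches with restricted parameter types.
final class TypeSearch {
    // Stand-ins for a tag type hierarchy with two subclasses.
    abstract static class TagType {
        final String description;
        TagType(String description) { this.description = description; }
    }
    static final class StartTagType extends TagType {
        StartTagType(String description) { super(description); }
    }
    static final class EndTagType extends TagType {
        EndTagType(String description) { super(description); }
    }

    record Tag(int begin, TagType type) {}

    // Accepts any tag type; types are compared by identity (assumed singletons).
    static List<Tag> allTags(List<Tag> tags, TagType type) {
        List<Tag> result = new ArrayList<>();
        for (Tag tag : tags) {
            if (tag.type() == type) result.add(tag);
        }
        return result;
    }

    // Restricted parameter: only a StartTagType compiles here, so passing an
    // end tag type to a start tag search is caught at compile time.
    static List<Tag> allStartTags(List<Tag> tags, StartTagType type) {
        return allTags(tags, type);
    }
}
```

Calling `allStartTags(tags, endType)` with an `EndTagType` argument does not compile, which is the practical benefit of the narrower parameter.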
Attribute Search:
These methods perform the search based on an attribute name and value.
- Segment#getAllElements(String attributeName, String value, boolean valueCaseSensitive)
- Segment#getFirstElement(String attributeName, String value, boolean valueCaseSensitive)
- Segment#getAllStartTags(String attributeName, String value, boolean valueCaseSensitive)
- Segment#getFirstStartTag(String attributeName, String value, boolean valueCaseSensitive)
- Segment#getAllElements(String attributeName, Pattern valueRegexPattern)
- Segment#getFirstElement(String attributeName, Pattern valueRegexPattern)
- Segment#getAllStartTags(String attributeName, Pattern valueRegexPattern)
- Segment#getFirstStartTag(String attributeName, Pattern valueRegexPattern)
- Segment#getAllElementsByClass(String className)
- Segment#getFirstElementByClass(String className)
- Segment#getAllStartTagsByClass(String className)
- Segment#getFirstStartTagByClass(String className)
- Source#getElementById(String id)
- Source#getNextElement(int pos, String attributeName, Pattern valueRegexPattern)
- Source#getNextElement(int pos, String attributeName, String value, boolean valueCaseSensitive)
- Source#getNextElementByClass(int pos, String className)
- Source#getNextStartTag(int pos, String attributeName, Pattern valueRegexPattern)
- Source#getNextStartTag(int pos, String attributeName, String value, boolean valueCaseSensitive)
- Source#getNextStartTagByClass(int pos, String className)
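The three flavours of attribute matching listed above (exact value, regular expression, class name) can be sketched as simple predicates over a start tag's attributes. `AttributeSearch` and its `StartTag` record are hypothetical. The sketch assumes attribute names are stored in lower case, that a regular expression must match the whole value (`Matcher.matches()` rather than `find()`), and that a class search looks for one whitespace-separated token of the `class` attribute, compared case-sensitively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of attribute-based matching; not the library's code.
final class AttributeSearch {
    // A start tag reduced to its name and attributes (names assumed lower case).
    record StartTag(String name, Map<String, String> attributes) {}

    // Exact value match, optionally ignoring case.
    static boolean hasValue(StartTag tag, String attributeName, String value,
                            boolean valueCaseSensitive) {
        String actual = tag.attributes().get(attributeName);
        if (actual == null) return false;
        return valueCaseSensitive ? actual.equals(value) : actual.equalsIgnoreCase(value);
    }

    // Regular-expression match; requiring the whole value to match is an assumption.
    static boolean hasValue(StartTag tag, String attributeName, Pattern valueRegexPattern) {
        String actual = tag.attributes().get(attributeName);
        return actual != null && valueRegexPattern.matcher(actual).matches();
    }

    // Class match: className must equal one whitespace-separated token of "class".
    static boolean hasClass(StartTag tag, String className) {
        String classes = tag.attributes().get("class");
        if (classes == null) return false;
        for (String token : classes.trim().split("\\s+")) {
            if (token.equals(className)) return true;
        }
        return false;
    }

    // Keeps the start tags carrying the given class, in their original order.
    static List<StartTag> byClass(List<StartTag> tags, String className) {
        List<StartTag> result = new ArrayList<>();
        for (StartTag tag : tags) {
            if (hasClass(tag, className)) result.add(tag);
        }
        return result;
    }
}
```

Token matching means a tag with `class="navigation"` is not selected by a search for the class `nav`, whereas a plain substring test would select it.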