Represents an element in a specific Source document, which encompasses a start tag (#getStartTag()), an optional end tag (#getEndTag()) and all content (#getContent()) in between.
Take the following HTML segment as an example:
<p>This is a sample paragraph.</p>
The whole segment is represented by an Element object. It comprises the StartTag "<p>", the EndTag "</p>", and the text in between.
An element may also contain other elements between its start and end tags.
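As a rough illustration of the three parts described above, the following sketch splits the sample segment using plain string operations. It does not use the parser at all: the class name `ElementPartsSketch` and its `split` method are hypothetical, and the code assumes a single, non-nested element whose segment begins with its start tag, ends with its end tag, and has no `>` character inside attribute values.

```java
final class ElementPartsSketch {

    /**
     * Splits a segment such as "<p>text</p>" into start tag, content and end tag.
     * Hypothetical helper for illustration only; a real parser handles nesting,
     * attributes, comments and missing end tags, which this sketch does not.
     */
    static String[] split(String segment) {
        int startTagEnd = segment.indexOf('>') + 1;   // offset just past the start tag
        int endTagBegin = segment.lastIndexOf("</");  // offset where the end tag begins
        return new String[] {
            segment.substring(0, startTagEnd),           // start tag
            segment.substring(startTagEnd, endTagBegin), // content in between
            segment.substring(endTagBegin)               // end tag
        };
    }

    public static void main(String[] args) {
        String[] parts = split("<p>This is a sample paragraph.</p>");
        System.out.println("start tag: " + parts[0]);
        System.out.println("content:   " + parts[1]);
        System.out.println("end tag:   " + parts[2]);
    }
}
```

The offsets computed here play the same role as the begin and end positions of the start tag and end tag within the source document.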
The term normal element refers to an element having a start tag (#getStartTag()) with a type (StartTag#getStartTagType()) of StartTagType#NORMAL.
This comprises all HTML elements (see HTMLElements) and non-HTML elements.
Element instances are obtained using one of the following methods:
- StartTag#getElement()
- EndTag#getElement()
- Segment#getAllElements()
- Segment#getAllElements(String name)
- Segment#getAllElements(StartTagType)
See also the HTMLElements class, and the XML 1.0 specification for elements.
The three possible structures of an element are listed below:
Single Tag Element:
Example:
<img src="mypicture.jpg">
The element consists only of a single start tag (#getStartTag()) and has no element content (#getContent()), although the start tag itself may have tag content (StartTag#getTagContent()).
#getEndTag()==null
#isEmpty()==true
#getEnd()==#getStartTag().getEnd()
This occurs in the following situations:
- An HTML element for which the end tag is forbidden (HTMLElements#getEndTagForbiddenElementNames()).
- An HTML element for which the end tag is required (HTMLElements#getEndTagRequiredElementNames()), but the end tag is not present in the source document.
- An HTML element for which the end tag is optional (HTMLElements#getEndTagOptionalElementNames()), where the implicitly terminating tag is situated immediately after the element's start tag (#getStartTag()).
- An empty-element tag (#isEmptyElementTag()).
- A non-HTML element that is not an empty-element tag (#isEmptyElementTag()) but is missing its end tag.
- An element with a start tag of a type (StartTag#getStartTagType()) that does not define a corresponding end tag type (StartTagType#getCorrespondingEndTagType()).
- An element with a start tag of a type (StartTag#getStartTagType()) that does define a corresponding end tag type (StartTagType#getCorrespondingEndTagType()) but is missing its end tag.
Explicitly Terminated Element:
Example:
<p>This is a sample paragraph.</p>
The element consists of a start tag (#getStartTag()), content (#getContent()), and an end tag (#getEndTag()).
#getEndTag()!=null
#isEmpty()==false (provided the end tag doesn't immediately follow the start tag)
#getEnd()==#getEndTag().getEnd()
This occurs in the following situations, assuming the start tag's matching end tag is present in the source document:
- An HTML element for which the end tag is either required (HTMLElements#getEndTagRequiredElementNames()) or optional (HTMLElements#getEndTagOptionalElementNames()).
- A non-HTML element that is not an empty-element tag (#isEmptyElementTag()).
- An element with a start tag of a type (StartTag#getStartTagType()) that defines a corresponding end tag type (StartTagType#getCorrespondingEndTagType()).
Implicitly Terminated Element:
Example:
<p>This text is included in the paragraph element even though no end tag is present.
<p>This is the next paragraph.
The element consists of a start tag (#getStartTag()) and content (#getContent()), but no end tag (#getEndTag()).
#getEndTag()==null
#isEmpty()==false
#getEnd()!=#getStartTag().getEnd()
This only occurs in an HTML element for which the end tag is optional (HTMLElements#getEndTagOptionalElementNames()).
The element ends at the start of a tag which implies the termination of the element, called the implicitly terminating tag.
If the implicitly terminating tag is situated immediately after the element's start tag (#getStartTag()), the element is classed as a single tag element.
See the element parsing rules for HTML elements with optional end tags for details on which tags can implicitly terminate a given element.
See also the documentation of the HTMLElements#getEndTagOptionalElementNames() method.
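The distinguishing conditions of the three structures can be summarised in a small decision function. The sketch below is not part of the library: the class `ElementStructureSketch`, its `Structure` enum and the `classify` method are hypothetical names, and the three parameters stand in for the results of `getEndTag() != null`, `getStartTag().getEnd()` and `getEnd()` respectively.

```java
final class ElementStructureSketch {

    /** The three structures described above (hypothetical enum, for illustration). */
    enum Structure { SINGLE_TAG, EXPLICITLY_TERMINATED, IMPLICITLY_TERMINATED }

    /**
     * Classifies an element from the conditions listed for each structure.
     *
     * @param hasEndTag   stands in for getEndTag() != null
     * @param startTagEnd stands in for getStartTag().getEnd()
     * @param elementEnd  stands in for getEnd()
     */
    static Structure classify(boolean hasEndTag, int startTagEnd, int elementEnd) {
        if (hasEndTag) {
            // An end tag is present, so the element is explicitly terminated.
            return Structure.EXPLICITLY_TERMINATED;
        }
        // No end tag: the element either stops at its own start tag or extends past it.
        return elementEnd == startTagEnd
                ? Structure.SINGLE_TAG
                : Structure.IMPLICITLY_TERMINATED;
    }

    public static void main(String[] args) {
        String img = "<img src=\"mypicture.jpg\">";
        // The element ends where its start tag ends and has no end tag.
        System.out.println(classify(false, img.length(), img.length()));
    }
}
```

Note that the check on the end tag comes first: an explicitly terminated element whose end tag immediately follows its start tag is still explicitly terminated, even though it has no content.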
The following rules describe the algorithm used in the StartTag#getElement() method to construct an element.
The detection of the start tag's matching end tag or other terminating tags always takes into account the possible nesting of elements.
- If the start tag has a type (StartTag#getStartTagType()) of StartTagType#NORMAL:
  - If the name (StartTag#getName()) of the start tag matches one of the recognised HTML element names (HTMLElementName), indicating an HTML element:
    - If the end tag for an element of this name is forbidden (HTMLElements#getEndTagForbiddenElementNames()), the parser does not conduct any search for an end tag and a single tag element is created.
    - If the end tag for an element of this name is required (HTMLElements#getEndTagRequiredElementNames()), the parser searches for the start tag's matching end tag.
      - If the matching end tag is found, an explicitly terminated element is created.
      - If no matching end tag is found, the source document is not valid HTML and the incident is logged (Source#getLogger()) as a missing required end tag. In this situation a single tag element is created.
    - If the end tag for an element of this name is optional (HTMLElements#getEndTagOptionalElementNames()), the parser searches not only for the start tag's matching end tag, but also for any other tag that implicitly terminates the element. For each tag (T2) following the start tag (ST1) of this element (E1):
      - If T2 is a start tag:
        - If the name (StartTag#getName()) of T2 is in the list of non-terminating element names (HTMLElements#getNonterminatingElementNames(String)) for E1, then continue evaluating tags from the end (Element#getEnd()) of T2's corresponding element (StartTag#getElement()).
        - If the name (StartTag#getName()) of T2 is in the list of terminating start tag names (HTMLElements#getTerminatingStartTagNames(String)) for E1, then E1 ends at the beginning (StartTag#getBegin()) of T2. If T2 follows immediately after ST1, a single tag element is created, otherwise an implicitly terminated element is created.
      - If T2 is an end tag:
        - If T2 is the end tag matching ST1, an explicitly terminated element is created.
      - If no more tags are present in the source document, then E1 ends at the end of the file, and an implicitly terminated element is created.
    Note that the syntactical indication of an empty-element tag (StartTag#isSyntacticalEmptyElementTag()) in the start tag is ignored when determining the end of HTML elements. See the documentation of the #isEmptyElementTag() method for more information.
  - If the name (StartTag#getName()) of the start tag does not match one of the recognised HTML element names (HTMLElementName), indicating a non-HTML element:
    - If the start tag is a syntactical empty-element tag (StartTag#isSyntacticalEmptyElementTag()), the parser does not conduct any search for an end tag and a single tag element is created.
    - Otherwise, section 3.1 of the XML 1.0 specification states that a matching end tag MUST be present, and the parser searches for the start tag's matching end tag.
      - If the matching end tag is found, an explicitly terminated element is created.
      - If no matching end tag is found, the source document is not valid XML and the incident is logged (Source#getLogger()) as a missing required end tag. In this situation a single tag element is created.
- If the start tag has any type (StartTag#getStartTagType()) other than StartTagType#NORMAL:
  - If the start tag's type does not define a corresponding end tag type (StartTagType#getCorrespondingEndTagType()), the parser does not conduct any search for an end tag and a single tag element is created.
  - If the start tag's type does define a corresponding end tag type (StartTagType#getCorrespondingEndTagType()), the parser assumes that a matching end tag is required and searches for it.
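The rules above can be condensed into a decision table. The following is a minimal sketch under stated assumptions, not the parser's actual implementation: the class `ElementConstructionSketch` and its enums are hypothetical, the kind of start tag is assumed to have been determined already, and the search itself (including nesting) is reduced to a precomputed `SearchOutcome` value. For a non-NORMAL start tag type whose end tag is missing, the sketch follows the single tag element situations listed earlier.

```java
final class ElementConstructionSketch {

    enum Structure { SINGLE_TAG, EXPLICITLY_TERMINATED, IMPLICITLY_TERMINATED }

    /** Hypothetical classification of the start tag, mirroring the branches of the rules. */
    enum StartTagKind {
        HTML_END_TAG_FORBIDDEN,
        HTML_END_TAG_REQUIRED,
        HTML_END_TAG_OPTIONAL,
        NON_HTML_EMPTY_ELEMENT_TAG,
        NON_HTML_OTHER,
        NON_NORMAL_WITHOUT_END_TAG_TYPE,
        NON_NORMAL_WITH_END_TAG_TYPE
    }

    /** Hypothetical summary of what a search after the start tag found. */
    enum SearchOutcome {
        MATCHING_END_TAG,
        TERMINATING_TAG_IMMEDIATELY_AFTER,
        TERMINATING_TAG_LATER,
        END_OF_FILE
    }

    static Structure construct(StartTagKind kind, SearchOutcome outcome) {
        switch (kind) {
            case HTML_END_TAG_FORBIDDEN:
            case NON_HTML_EMPTY_ELEMENT_TAG:
            case NON_NORMAL_WITHOUT_END_TAG_TYPE:
                // No search is conducted, so the outcome argument is ignored.
                return Structure.SINGLE_TAG;
            case HTML_END_TAG_OPTIONAL:
                if (outcome == SearchOutcome.MATCHING_END_TAG) {
                    return Structure.EXPLICITLY_TERMINATED;
                }
                if (outcome == SearchOutcome.TERMINATING_TAG_IMMEDIATELY_AFTER) {
                    return Structure.SINGLE_TAG;
                }
                // Terminated by a later tag, or by the end of the file.
                return Structure.IMPLICITLY_TERMINATED;
            default:
                // End tag required (HTML, non-HTML, or non-NORMAL type with an end tag type):
                // only the matching end tag counts; a missing one yields a single tag element.
                return outcome == SearchOutcome.MATCHING_END_TAG
                        ? Structure.EXPLICITLY_TERMINATED
                        : Structure.SINGLE_TAG;
        }
    }

    public static void main(String[] args) {
        System.out.println(construct(StartTagKind.HTML_END_TAG_OPTIONAL, SearchOutcome.TERMINATING_TAG_LATER));
    }
}
```

Separating the classification of the start tag from the outcome of the search keeps the table small; a fuller model would replace `SearchOutcome` with an actual scan over the tags that follow the start tag.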