net.htmlparser.jericho.Source.charAt Java code examples

/**
 * Returns the character at the specified index.
 * <p>
 * This is logically equivalent to <code>toString().charAt(index)</code>
 * for valid argument values <code>0 &lt;= index &lt; length()</code>.
 * <p>
 * However, because this implementation works directly on the underlying document source string,
 * it should not be assumed that an <code>IndexOutOfBoundsException</code> is thrown
 * for an invalid argument value.
 *
 * @param index  the index of the character.
 * @return the character at the specified index.
 */
public char charAt(final int index) {
  return source.charAt(begin+index);
}
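
The Javadoc above warns that an out-of-range index may not throw, because the call is forwarded to the underlying source string with an offset. A minimal sketch of that effect follows. `SegmentView` and both of its accessor methods are hypothetical names invented for illustration; they are not part of the Jericho API.

```java
// Hypothetical stand-in for a segment over a shared source string (not a Jericho class).
final class SegmentView {
    private final String source; // the whole underlying document text
    private final int begin;     // inclusive start of this view
    private final int end;       // exclusive end of this view

    SegmentView(String source, int begin, int end) {
        this.source = source;
        this.begin = begin;
        this.end = end;
    }

    int length() {
        return end - begin;
    }

    // Same shape as the snippet above: forwards to the source without checking index against length().
    char charAtUnchecked(int index) {
        return source.charAt(begin + index);
    }

    // Checked variant for callers that want an exception for any index outside the view.
    char charAtChecked(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index + " is outside the view");
        }
        return source.charAt(begin + index);
    }

    public static void main(String[] args) {
        SegmentView view = new SegmentView("<p>Hello</p>", 3, 8); // covers "Hello"
        System.out.println(view.charAtUnchecked(0)); // first character of the view
        System.out.println(view.charAtUnchecked(5)); // past the view, but still inside the source
    }
}
```

In this sketch the unchecked call with index 5 reads a character that belongs to the source but not to the view, so code that relies on an exception to detect the end of a segment would need the checked variant.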

/**
 * Indicates whether this character reference is terminated by a semicolon (<code>;</code>).
 * <p>
 * Conversely, this library defines an <i><a name="Unterminated">unterminated</a></i> character reference as one which does
 * not end with a semicolon.
 * <p>
 * The SGML specification allows unterminated character references in some circumstances, and because the
 * HTML 4.01 specification states simply that
 * "<a target="_blank" href="http://www.w3.org/TR/REC-html40/charset.html#entities">authors may use SGML character references</a>",
 * it follows that they are also valid in HTML documents, although their use is strongly discouraged.
 * <p>
 * Unterminated character references are not allowed in <a target="_blank" href="http://www.w3.org/TR/xhtml1/">XHTML</a> documents.
 *
 * @return <code>true</code> if this character reference is terminated by a semicolon, otherwise <code>false</code>.
 * @see #decode(CharSequence encodedText, boolean insideAttributeValue)
 */
public boolean isTerminated() {
  return source.charAt(end-1)==';';
}
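
The same last-character test can be tried on plain strings. This is only a sketch: `CharRefCheck` is a hypothetical helper, it assumes the argument is the complete text of one character reference, and it adds an empty-input guard that the snippet above does not need.

```java
// Hypothetical helper, not part of Jericho: checks the final character of a reference.
final class CharRefCheck {
    // True when the reference text ends with ';', mirroring the end-1 check above.
    static boolean isTerminated(CharSequence reference) {
        int length = reference.length();
        return length > 0 && reference.charAt(length - 1) == ';';
    }

    public static void main(String[] args) {
        System.out.println(isTerminated("&amp;")); // terminated reference
        System.out.println(isTerminated("&amp"));  // unterminated reference
    }
}
```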

/**
 * Indicates whether this start tag is syntactically an <a target="_blank" href="http://www.w3.org/TR/REC-xml#dt-eetag">empty-element tag</a>.
 * <p>
 * This is signified by the characters "/&gt;" at the end of the start tag.
 * <p>
 * Only a {@linkplain StartTagType#NORMAL normal} start tag can be syntactically an empty-element tag.
 * <p>
 * This property simply reports whether the syntax of the start tag is consistent with that of an empty-element tag;
 * it does not guarantee that this start tag's {@linkplain #getElement() element} is actually {@linkplain Element#isEmpty() empty}.
 * <p>
 * This possible discrepancy reflects the way major browsers interpret illegal empty element tags used in
 * <a href="HTMLElements.html#HTMLElement">HTML elements</a>, and is explained further in the documentation of the
 * {@link #isEmptyElementTag()} property.
 *
 * @return <code>true</code> if this start tag is syntactically an <a target="_blank" href="http://www.w3.org/TR/REC-xml#dt-eetag">empty-element tag</a>, otherwise <code>false</code>.
 * @see #isEmptyElementTag()
 */
public boolean isSyntacticalEmptyElementTag() {
  return startTagType==StartTagType.NORMAL && source.charAt(end-2)=='/';
}
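
A purely syntactic check of this kind can be sketched on a raw tag string. `EmptyElementSyntax` is a hypothetical class; it assumes the input is the full text of a normal start tag from `<` to `>`, and it says nothing about whether the element is actually empty.

```java
// Hypothetical helper, not part of Jericho: tests only the "/>" ending of a tag string.
final class EmptyElementSyntax {
    // Assumes tagText is the complete text of a normal start tag, from '<' to '>'.
    static boolean looksLikeEmptyElementTag(String tagText) {
        int end = tagText.length();
        return end >= 3
            && tagText.charAt(end - 1) == '>'
            && tagText.charAt(end - 2) == '/'; // same position the snippet above inspects
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmptyElementTag("<br/>"));
        System.out.println(looksLikeEmptyElementTag("<p>"));
    }
}
```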

/**
 * Indicates whether this segment consists entirely of {@linkplain #isWhiteSpace(char) white space}.
 * @return <code>true</code> if this segment consists entirely of {@linkplain #isWhiteSpace(char) white space}, otherwise <code>false</code>.
 */
public final boolean isWhiteSpace() {
  for (int i=begin; i<end; i++)
    if (!isWhiteSpace(source.charAt(i))) return false;
  return true;
}

/**
 * Returns the character used to quote the value.
 * <p>
 * The return value is either a double-quote (<code>"</code>), a single-quote (<code>'</code>), or a space.
 *
 * @return the character used to quote the value, or a space if the value is not quoted or this attribute has no value.
 */
public char getQuoteChar() {
  if (valueSegment==valueSegmentIncludingQuotes) return ' '; // no quotes
  return source.charAt(valueSegmentIncludingQuotes.getBegin());
}
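
The convention of returning a space when no quote is present can be illustrated without the library. In this sketch `QuoteCharSketch` is a hypothetical class, and it assumes the raw value text is available as a plain string including any surrounding quotes; the snippet above compares two segments instead.

```java
// Hypothetical helper, not part of Jericho: reports the quote character of a raw attribute value.
final class QuoteCharSketch {
    // Returns '"' or '\'' when the raw value is wrapped in matching quotes, otherwise a space.
    static char quoteChar(String rawValue) {
        if (rawValue.length() >= 2) {
            char first = rawValue.charAt(0);
            char last = rawValue.charAt(rawValue.length() - 1);
            if ((first == '"' || first == '\'') && first == last) return first;
        }
        return ' '; // unquoted or empty value
    }

    public static void main(String[] args) {
        System.out.println(quoteChar("\"main\""));
        System.out.println(quoteChar("main"));
    }
}
```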

static RowColumnVector[] getCacheArray(final Source source) {
  if (source.isStreamed()) return STREAMED;
  final int lastSourcePos=source.end-1;
  final ArrayList<RowColumnVector> list=new ArrayList<RowColumnVector>();
  int pos=0;
  list.add(FIRST);
  int row=1;
  while (pos<=lastSourcePos) {
    final char ch=source.charAt(pos);
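    // a new row starts after '\n', or after a '\r' that is not followed by '\n', so "\r\n" counts as one line break
    if (ch=='\n' || (ch=='\r' && (pos==lastSourcePos || source.charAt(pos+1)!='\n'))) list.add(new RowColumnVector(++row,1,pos+1));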
    pos++;
  }
  return list.toArray(new RowColumnVector[list.size()]);
}
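
The method above records the position at which each row starts. One plausible use of such a table is converting a character position into a row and column with a binary search; the sketch below shows that idea under stated assumptions. `LineIndex`, `row` and `column` are hypothetical names, and this is not necessarily how Jericho performs the lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical line index, not a Jericho class: maps character positions to 1-based rows and columns.
final class LineIndex {
    private final int[] lineStarts; // lineStarts[r-1] is the position of the first character of row r

    LineIndex(CharSequence text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0); // row 1 always starts at position 0
        int last = text.length() - 1;
        for (int pos = 0; pos <= last; pos++) {
            char ch = text.charAt(pos);
            // a '\r' only ends a line by itself when it is not the first half of "\r\n"
            boolean loneCr = ch == '\r' && (pos == last || text.charAt(pos + 1) != '\n');
            if (ch == '\n' || loneCr) starts.add(pos + 1);
        }
        lineStarts = new int[starts.size()];
        for (int i = 0; i < lineStarts.length; i++) lineStarts[i] = starts.get(i);
    }

    // 1-based row containing pos (pos must be >= 0).
    int row(int pos) {
        int idx = Arrays.binarySearch(lineStarts, pos);
        if (idx < 0) idx = -idx - 2; // index of the greatest line start that is <= pos
        return idx + 1;
    }

    // 1-based column of pos within its row.
    int column(int pos) {
        return pos - lineStarts[row(pos) - 1] + 1;
    }

    public static void main(String[] args) {
        LineIndex index = new LineIndex("ab\r\ncd\ne\rf");
        System.out.println(index.row(5) + ":" + index.column(5));
    }
}
```

Building the table once costs a single pass over the text, after which each lookup is logarithmic in the number of rows.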

private void appendNonPreformattedSegment(final int begin, final int end) throws IOException {
  assert begin<end;
  assert begin>=renderedIndex;
  final String text=CharacterReference.decodeCollapseWhiteSpace(source.subSequence(begin,end),convertNonBreakingSpaces);
  if (text.length()==0) {
    // collapsed text is zero length but original segment wasn't, meaning it consists purely of white space.
    if (!ignoreInitialWhiteSpace) lastCharWhiteSpace=true;
    return;
  }
  appendNonPreformattedText(text,Segment.isWhiteSpace(source.charAt(begin)),Segment.isWhiteSpace(source.charAt(end-1)));
}

/**
 * Parses the attributes specified in this start tag, regardless of the type of start tag.
 * This method is only required in the unusual situation where attributes exist in a start tag whose 
 * {@linkplain #getStartTagType() type} doesn't {@linkplain StartTagType#hasAttributes() have attributes}.
 * <p>
 * See the documentation of the {@link #parseAttributes()} method for more information.
 *
 * @param maxErrorCount  the maximum number of minor errors allowed while parsing
 * @return the attributes specified in this start tag, or <code>null</code> if too many errors occur while parsing.
 * @see #getAttributes()
 */
public Attributes parseAttributes(final int maxErrorCount) {
  if (attributes!=null) return attributes;
  final int maxEnd=end-startTagType.getClosingDelimiter().length();
  int attributesBegin=begin+1+name.length();
  // skip any non-name characters directly after the name (which are quite common)
  while (!isXMLNameStartChar(source.charAt(attributesBegin))) {
    attributesBegin++;
    if (attributesBegin==maxEnd) return null;
  }
  Attributes attributes=Attributes.construct(source,begin,attributesBegin,maxEnd,startTagType,name,maxErrorCount);
  if (attributes!=null) attributes.setStartTag(this);
  return attributes;
}

final int nameSegmentEnd=i+name.length();
// copy the name segment from the source one character at a time
while (i<nameSegmentEnd) {
  sb.append(source.charAt(i));
  i++;
}

boolean unterminated=false;
while (true) {
  final char ch=source.charAt(x);
  if (ch==';') {
    end=x+1;
    // ... (snippet is cut off here in the source)

protected Tag constructTagAt(final Source source, final int pos) {
  final Tag tag=super.constructTagAt(source,pos);
  if (tag==null) return null;
  // A mason named block does not have a '%' before its closing '>' delimiter and requires a matching end tag.
  if (source.charAt(tag.getEnd()-2)=='%') return null; // this is a common server tag, not a named block
  if (source.getNextEndTag(tag.getEnd(),tag.getName(),getCorrespondingEndTagType())==null) return null;
  return tag;
}
