Generic sequence pattern for regular expressions.
Similar to Java's java.util.regex.Pattern, except that it is for sequences over arbitrary types T instead of just characters.
A regular expression must first be compiled into an instance of this class. The resulting pattern can then be used to create a SequenceMatcher object that can match arbitrary sequences of type T against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.
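The same split between pattern and matcher exists in java.util.regex, which this class is modeled on. The sketch below uses only java.util.regex (not SequencePattern itself) to illustrate the idea: one compiled pattern, two matchers advancing independently over different inputs. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SharedPatternDemo {
    // Compiled once. The Pattern holds no per-match state, so it can be shared.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Advances two independent matchers in turn over two different inputs.
    static List<String> interleave(String a, String b) {
        Matcher ma = DIGITS.matcher(a);
        Matcher mb = DIGITS.matcher(b);
        List<String> out = new ArrayList<>();
        boolean moreA = true;
        boolean moreB = true;
        while (moreA || moreB) {
            if (moreA) {
                moreA = ma.find();
                if (moreA) out.add("a:" + ma.group());
            }
            if (moreB) {
                moreB = mb.find();
                if (moreB) out.add("b:" + mb.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a:1, b:7, a:2, b:8, b:9]
        System.out.println(interleave("1 2", "7 8 9"));
    }
}
```

Each matcher keeps its own position, so advancing one never disturbs the other.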
To support sequence matching on a new type T, the following is needed:
- Implement a NodePattern
- Optionally define a language for node matches and implement SequencePattern.Parser to compile a regular expression into a SequencePattern.
- Optionally implement a MultiPatternMatcher.NodePatternTrigger for optimizing matches across multiple patterns
- Optionally implement a NodesMatchChecker to support backreferences
See TokenSequencePattern for an example of how this class can be extended to support a specific type T.
To use:
  SequencePattern p = SequencePattern.compile("....");
To support a new type T:
- For a type T to be matchable, it has to have a corresponding NodePattern that indicates whether a node is matched or not (see CoreMapNodePattern for example)
- To compile a string into the corresponding pattern, you will need to create a parser (see inner class Parser, TokenSequencePattern and TokenSequenceParser.jj)
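The role of a node-level pattern can be illustrated with a minimal, self-contained sketch. It does not use the library's NodePattern class: the NodeTest interface, findFirst, and evenThenBig below are hypothetical names invented for this example, and the matcher handles only plain concatenation of node tests.

```java
import java.util.Arrays;
import java.util.List;

public class NodeSequenceSketch {
    /** Hypothetical stand-in for a node-level pattern: decides whether one node matches. */
    interface NodeTest<T> {
        boolean matches(T node);
    }

    /** Returns the start index of the first window in which every node passes its test, or -1. */
    static <T> int findFirst(List<NodeTest<T>> tests, List<T> nodes) {
        for (int start = 0; start + tests.size() <= nodes.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < tests.size() && ok; i++) {
                ok = tests.get(i).matches(nodes.get(start + i));
            }
            if (ok) {
                return start;
            }
        }
        return -1;
    }

    /** Example over Integer nodes: an even number followed by a number greater than 10. */
    static int evenThenBig(List<Integer> nodes) {
        NodeTest<Integer> even = n -> n % 2 == 0;
        NodeTest<Integer> big = n -> n > 10;
        List<NodeTest<Integer>> tests = Arrays.asList(even, big);
        return findFirst(tests, nodes);
    }

    public static void main(String[] args) {
        // prints 3 (the window [8, 20])
        System.out.println(evenThenBig(Arrays.asList(3, 4, 5, 8, 20, 1)));
    }
}
```

Because the matching loop only calls the node test, the same loop works for any node type once a test for that type exists. This is the reason a node-level pattern is the one required piece.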
SequencePattern supports the following standard regex features:
- Concatenation
- Or
- Groups (capturing / noncapturing)
- Quantifiers (greedy / nongreedy)
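For readers less familiar with these features, the character-level equivalents in java.util.regex behave as sketched below. This shows java.util.regex only; the concrete syntax for a SequencePattern depends on the parser used for the node type, and the helper names here are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFeaturesDemo {
    /** Returns the given group of the first match, or null if there is no match. */
    static String firstMatch(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    /** Number of capturing groups in the pattern (noncapturing groups are not counted). */
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Greedy quantifier: takes as much as possible.
        System.out.println(firstMatch("<.+>", "<a><b>", 0));       // <a><b>
        // Nongreedy quantifier: takes as little as possible.
        System.out.println(firstMatch("<.+?>", "<a><b>", 0));      // <a>
        // Or inside a capturing group, concatenated with a literal.
        System.out.println(firstMatch("(cat|dog)s", "two dogs", 1)); // dog
        // A noncapturing group does not get a group number.
        System.out.println(groupCount("(?:cat|dog)(s)"));          // 1
    }
}
```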
SequencePattern also supports the following less standard features:
- Environment (see Env) with respect to which the patterns are compiled
- Binding of variables
  - Use Env to bind variables for use when compiling patterns
  - Can also bind names to groups (see SequenceMatchResult for accessor methods to retrieve matched groups)
- Backreference matches: need to specify how backreferences are to be matched using NodesMatchChecker
- Multinode matches: for matching of multiple nodes using non-regex (at least not regex over nodes) patterns (need to have a corresponding MultiNodePattern, see MultiCoreMapNodePattern for example)
- Conjunctions: conjunctions of sequence patterns (works for some cases)
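Backreferences over arbitrary nodes need a caller-supplied notion of "the same span", since two nodes may count as equal under looser rules than Object.equals. The sketch below illustrates that idea only. SpanChecker and repeatsAt are hypothetical names for this example and are not the library's NodesMatchChecker interface.

```java
import java.util.Arrays;
import java.util.List;

public class BackrefSketch {
    /** Hypothetical checker: decides whether a candidate span repeats a captured span. */
    interface SpanChecker<T> {
        boolean same(List<T> captured, List<T> candidate);
    }

    /** True if nodes[start, start+len) is immediately followed by a span the checker accepts. */
    static <T> boolean repeatsAt(List<T> nodes, int start, int len, SpanChecker<T> checker) {
        int next = start + len;
        if (start < 0 || len <= 0 || next + len > nodes.size()) {
            return false;
        }
        return checker.same(nodes.subList(start, next), nodes.subList(next, next + len));
    }

    /** Compares two string spans element by element, ignoring case. */
    static boolean sameIgnoringCase(List<String> a, List<String> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).equalsIgnoreCase(b.get(i))) {
                return false;
            }
        }
        return true;
    }

    /** Runs the check on a fixed example with either a strict or a case-insensitive checker. */
    static boolean demo(boolean ignoreCase) {
        List<String> nodes = Arrays.asList("New", "York", "new", "york");
        SpanChecker<String> checker;
        if (ignoreCase) {
            checker = BackrefSketch::sameIgnoringCase;
        } else {
            checker = (a, b) -> a.equals(b);
        }
        return repeatsAt(nodes, 0, 2, checker);
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // true
        System.out.println(demo(false)); // false
    }
}
```

The same input counts as a repeat under one checker and not under the other, which is why the matching rule has to be supplied per node type.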
Note that this class and the classes that inherit from it do not implement custom equals and hashCode methods.
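In practice this means such objects compare by identity, which matters when they are used as set members or map keys. The sketch below shows the general Java behavior with a hypothetical CompiledSketch class that, like the classes described here, does not override equals or hashCode. It is not the library's code.

```java
import java.util.HashSet;
import java.util.Set;

public class IdentityEqualityDemo {
    /** Hypothetical stand-in for a compiled pattern with no custom equals or hashCode. */
    static final class CompiledSketch {
        final String source;

        CompiledSketch(String source) {
            this.source = source;
        }
    }

    /** Two objects built from the same source are still distinct set members. */
    static int distinctObjects(String source) {
        Set<CompiledSketch> set = new HashSet<>();
        set.add(new CompiledSketch(source));
        set.add(new CompiledSketch(source));
        return set.size();
    }

    /** Keying on the source string instead collapses the duplicates. */
    static int distinctSources(String source) {
        Set<String> set = new HashSet<>();
        set.add(new CompiledSketch(source).source);
        set.add(new CompiledSketch(source).source);
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects("a b"));  // 2
        System.out.println(distinctSources("a b"));  // 1
    }
}
```

To deduplicate or cache compiled patterns, one option is to key on the pattern string (plus whatever compile-time settings apply) rather than on the pattern object.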