Process some input data according to the expected layout above. This is not a
dumb lexer, however: it processes the data intelligently, allowing for some
errors (which are reported but recovered from), and it automatically handles
some constructs so that the real parser built on top of this lexer does not
have to (i.e. it identifies ; = , := in the values).
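As a rough illustration of that last point, the sketch below marks where the
; = , := separators occur inside a value string. It is an assumption made for
this example only: the class name ValueSeparators, the method findSeparators
and the regular expression used are not part of the real lexer.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  // Hypothetical sketch only: identifies ; = , := inside a VALUE string.
  public class ValueSeparators {
      // ":=" is tried first so it is not reported as a bare "=".
      private static final Pattern SEPARATOR = Pattern.compile(":=|[;=,]");

      // Returns each separator with its offset, e.g. "9:;" for a ';' at index 9.
      static List<String> findSeparators(String value) {
          List<String> found = new ArrayList<>();
          Matcher m = SEPARATOR.matcher(value);
          while (m.find()) {
              found.add(m.start() + ":" + m.group());
          }
          return found;
      }

      public static void main(String[] args) {
          // Prints [3::=, 9:;, 15:=, 17:,]
          System.out.println(findSeparators("key:=true; flag=1, next"));
      }
  }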
The end result of a 'lex' is that the data is broken into these tokens:
NAME, COLON, VALUE, NEWLINE. Importantly:
- value continuations are dealt with during the lex stage so only complete
VALUE tokens are in the lex token stream
- tokens are inserted where they will help the next stage cope with the
data. For example, if the input data is apparently missing a COLON to
terminate a value, this lexer will report the problem but will also insert
the COLON
- According to the specification for processing this data, two newlines are
appended to the data before processing - this allows for a file that is
missing a final newline and cleanly marks the end of a section. These two
NEWLINE tokens will appear in the token stream output by this lexer
- Because the lexer automatically handles values that commence on the line
after the name, and values that are spread across several lines, any NEWLINEs
encountered in the input while lexing a "NAME: VALUE" sequence are not
included in the output token stream. Only the NEWLINE at the end of a value,
and the NEWLINEs for blank lines, are included. A sketch illustrating these
rules appears after this list.
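The sketch below is a minimal, hypothetical rendering of the rules above. It
is not the real lexer: the class name SketchLexer, the token spellings, and
the continuation rule (a continuation line starts with whitespace) are all
assumptions made for illustration.

  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical sketch only, NOT the real lexer. It follows the rules listed
  // above: continuations are merged into a single VALUE, two newlines are
  // appended to the data, a missing COLON is inserted, and NEWLINEs inside a
  // NAME: VALUE sequence are dropped.
  public class SketchLexer {

      static List<String> lex(String data) {
          data = data + "\n\n";                      // the two appended newlines
          String[] lines = data.split("\n", -1);     // keep trailing empty lines
          int last = lines.length - 1;               // ignore fragment after final '\n'
          List<String> tokens = new ArrayList<>();
          int i = 0;
          while (i < last) {
              String line = lines[i];
              if (line.isEmpty()) {                  // blank line: its NEWLINE is kept
                  tokens.add("NEWLINE");
                  i++;
                  continue;
              }
              int colon = line.indexOf(':');
              // The real lexer reports a problem when the ':' is missing but
              // still inserts a COLON; this sketch inserts it silently.
              String name = colon >= 0 ? line.substring(0, colon) : line.trim();
              StringBuilder value = new StringBuilder(
                      colon >= 0 ? line.substring(colon + 1).trim() : "");
              i++;
              // Continuation lines: their NEWLINEs never reach the token stream.
              while (i < last && !lines[i].isEmpty()
                      && Character.isWhitespace(lines[i].charAt(0))) {
                  if (value.length() > 0) {
                      value.append(' ');
                  }
                  value.append(lines[i].trim());
                  i++;
              }
              tokens.add("NAME(" + name + ")");
              tokens.add("COLON");
              tokens.add("VALUE(" + value + ")");
              tokens.add("NEWLINE");                 // only the value-terminating NEWLINE survives
          }
          return tokens;
      }

      public static void main(String[] args) {
          String input = "Subject: a value that\n continues here\nFlag: yes";
          System.out.println(String.join(" ", lex(input)));
          // Prints (wrapped here for readability):
          //   NAME(Subject) COLON VALUE(a value that continues here) NEWLINE
          //   NAME(Flag) COLON VALUE(yes) NEWLINE NEWLINE
      }
  }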
Hopefully building the above knowledge into the lexer doesn't make it too
difficult to understand...
Concurrent Semantics
This class is thread safe.