Analysis Components are the primitive "building blocks" from which UIMA solutions are built. This
is the common superinterface for all user-developed components that take a CAS as input and may
produce CASes as output.
Typically, developers do not implement this interface directly. There are several abstract
classes that you can inherit from depending on the function that your component performs and
which CAS interface it uses:
- Annotator: Receives an input CAS and updates it
  - JCasAnnotator_ImplBase: Uses JCas interface
  - CasAnnotator_ImplBase: Uses CAS interface
- org.apache.uima.collection.CasConsumer_ImplBase: Receives an input CAS but does not
  update it. May update a data structure based on information in the CASes it receives.
- CasMultiplier: Receives an input CAS and, in addition to updating it, may output new CASes.
  One common use of this is to split a CAS into pieces, emitting each piece as a separate output
  CAS.
  - JCasMultiplier_ImplBase: Uses JCas interface
  - CasMultiplier_ImplBase: Uses CAS interface
- org.apache.uima.collection.CollectionReader_ImplBase: A special type of
  CasMultiplier that, for historical reasons, does not take an input CAS.
The framework interacts with AnalysisComponents as follows:
1. The framework calls the AnalysisComponent's #process(AbstractCas) method with an input CAS.
2. The framework then calls the AnalysisComponent's #hasNext() method, which should return true
   if the AnalysisComponent intends to produce new output CASes, or false if the AnalysisComponent
   will not produce new output CASes.
3. If the AnalysisComponent returns true, the framework will then call the #next() method.
4. The AnalysisComponent, in its next method, can create a new CAS by calling
   UimaContext#getEmptyCas(Class) (or instead, one of the helper methods in the ImplBase
   class that it extended). It then populates the empty CAS and returns it.
5. Steps 2 & 3 continue for each subsequent output CAS, until hasNext() returns false.
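The call sequence above can be sketched as a small driver loop. This is a simplified illustration only: SimpleComponent, TokenSplitter, and drive are hypothetical stand-ins invented for this example, a CAS is modeled as a plain String, and none of the real UIMA types or the framework's actual scheduling are used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class CallSequenceSketch {

    // Hypothetical stand-in for the real interface; a "CAS" is just a String here.
    public interface SimpleComponent {
        void process(String inputCas);
        boolean hasNext();
        String next();
    }

    // Toy multiplier: splits the input on whitespace and emits one output per token.
    public static class TokenSplitter implements SimpleComponent {
        private final Deque<String> pending = new ArrayDeque<>();

        @Override
        public void process(String inputCas) {
            pending.clear();
            for (String token : inputCas.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    pending.add(token);
                }
            }
        }

        @Override
        public boolean hasNext() {
            return !pending.isEmpty();
        }

        @Override
        public String next() {
            return pending.remove();
        }
    }

    // Plays the role of the framework for a single input.
    public static List<String> drive(SimpleComponent component, String inputCas) {
        List<String> outputs = new ArrayList<>();
        component.process(inputCas);        // step 1: hand over the input
        while (component.hasNext()) {       // step 2: ask whether more output is coming
            outputs.add(component.next());  // steps 3 and 4: fetch the next output
        }                                   // step 5: repeat until hasNext() is false
        return outputs;
    }

    public static void main(String[] args) {
        // prints [alpha, beta, gamma]
        System.out.println(drive(new TokenSplitter(), "alpha beta gamma"));
    }
}
```

A component that never produces output CASes (an annotator, in the terms used above) would simply return false from hasNext() on the first call, so the loop body would never run.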
From the time when process is called until the time when hasNext returns false, the
AnalysisComponent "owns" the CAS that was passed to process. The AnalysisComponent is permitted
to make changes to this CAS. Once hasNext returns false, the AnalysisComponent releases control
of the initial CAS. This means that the AnalysisComponent must finish all updates to the initial
CAS prior to returning false from hasNext.
However, if the process method is called a second time before hasNext has returned false, this
is a signal to the AnalysisComponent to cancel all processing of the previous CAS and begin
processing the new CAS instead.
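One way a component might honor the cancellation rule is to discard any queued work at the start of every process call. The sketch below assumes this approach. ChunkEmitter is a hypothetical class written for this example, not part of UIMA, and a CAS is again modeled as a String.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CancelOnReprocessSketch {

    // Hypothetical component, not a UIMA class; emits one output per comma-separated part.
    public static class ChunkEmitter {
        private final Deque<String> pending = new ArrayDeque<>();

        // A new process() call drops whatever was still queued for the previous input.
        public void process(String inputCas) {
            pending.clear();
            for (String part : inputCas.split(",")) {
                pending.add(part);
            }
        }

        public boolean hasNext() {
            return !pending.isEmpty();
        }

        public String next() {
            return pending.remove();
        }
    }

    public static void main(String[] args) {
        ChunkEmitter emitter = new ChunkEmitter();

        emitter.process("a1,a2,a3");
        if (emitter.hasNext()) {
            System.out.println(emitter.next());  // a1
        }

        // process() arrives before hasNext() has returned false:
        // the remaining outputs for the first input (a2, a3) are abandoned.
        emitter.process("b1,b2");
        while (emitter.hasNext()) {
            System.out.println(emitter.next());  // b1, then b2
        }
    }
}
```

Clearing state at the top of process keeps the cancellation logic in one place. A real component would also need to release any resources tied to the abandoned input.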