An Analysis Engine is a component responsible for analyzing unstructured information, discovering and representing semantic content. Unstructured information includes, but is not restricted to, text documents.
An AnalysisEngine operates on an "analysis structure" (implemented by {@link org.apache.uima.cas.CAS}). The CAS contains the artifact to be processed as well as semantic information already inferred from that artifact. The AnalysisEngine analyzes this information and adds new information to the CAS.
To create an instance of an Analysis Engine, an application should call {@link org.apache.uima.UIMAFramework#produceAnalysisEngine(ResourceSpecifier)}.
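As a minimal sketch of that call, the following helper parses an XML descriptor into a ResourceSpecifier and hands it to the factory method. The class name `AnalysisEngineFactoryExample`, the method name `createEngine`, and the idea of loading the descriptor from a file path are assumptions made for illustration; an application may obtain its ResourceSpecifier in other ways.

```java
import java.io.IOException;

import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.InvalidXMLException;
import org.apache.uima.util.XMLInputSource;

/** Hypothetical helper class; not part of the UIMA API. */
public class AnalysisEngineFactoryExample {

    /**
     * Parses the descriptor at the given path (supplied by the caller) and
     * produces an AnalysisEngine from the resulting ResourceSpecifier.
     */
    public static AnalysisEngine createEngine(String descriptorPath)
            throws IOException, InvalidXMLException, ResourceInitializationException {
        XMLInputSource input = new XMLInputSource(descriptorPath);
        try {
            // Turn the XML descriptor into a ResourceSpecifier object.
            ResourceSpecifier specifier =
                    UIMAFramework.getXMLParser().parseResourceSpecifier(input);
            // Ask the framework for an AnalysisEngine matching the specifier.
            return UIMAFramework.produceAnalysisEngine(specifier);
        } finally {
            // Release the underlying stream whether or not parsing succeeded.
            input.close();
        }
    }
}
```

When the engine is no longer needed, the application would typically call its `destroy()` method to release resources.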
A typical application interacts with the Analysis Engine interface as follows:
- Call {@link #newCAS()} to create a new Common Analysis System (CAS) appropriate for this AnalysisEngine.
- Use the {@link CAS} interface to populate the CAS with the artifact to be analyzed and any information known about this document (e.g. the language of a text document).
- Optionally, create a {@link org.apache.uima.analysis_engine.ResultSpecification} that identifies the results you would like this AnalysisEngine to generate (e.g. people, places, and dates), and call the {@link #setResultSpecification(ResultSpecification)} method.
- Call {@link #process(CAS)} - the AnalysisEngine will perform its analysis.
- Retrieve the results from the {@link CAS}.
- Call {@link CAS#reset()} to clear out the CAS and prepare for processing a new artifact.
- Repeat steps 2 through 6 for each artifact to be processed.
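The steps above can be sketched roughly as follows for plain-text artifacts. The class name `AnalysisLoopExample`, the method name `analyzeAll`, the language code `"en"`, and the choice to report each annotation as "type name: covered text" are all assumptions for illustration. The optional ResultSpecification step is omitted, and the engine is assumed to have been created already.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.resource.ResourceInitializationException;

/** Hypothetical helper class; not part of the UIMA API. */
public class AnalysisLoopExample {

    /**
     * Runs the engine over each text and returns, per document, one
     * "typeName: coveredText" string for every annotation in the CAS.
     */
    public static List<List<String>> analyzeAll(AnalysisEngine engine, List<String> documents)
            throws ResourceInitializationException, AnalysisEngineProcessException {
        List<List<String>> results = new ArrayList<>();

        // Step 1: create the CAS once and reuse it for every document.
        CAS cas = engine.newCAS();

        for (String text : documents) {
            // Step 2: populate the CAS with the artifact and known metadata.
            cas.setDocumentText(text);
            cas.setDocumentLanguage("en"); // assumption: English input

            // Step 4: let the engine perform its analysis.
            engine.process(cas);

            // Step 5: retrieve results, here every annotation in the default index.
            List<String> found = new ArrayList<>();
            FSIterator<AnnotationFS> it = cas.getAnnotationIndex().iterator();
            while (it.hasNext()) {
                AnnotationFS annotation = it.next();
                found.add(annotation.getType().getName() + ": " + annotation.getCoveredText());
            }
            results.add(found);

            // Step 6: clear the CAS so it can hold the next artifact.
            cas.reset();
        }
        return results;
    }
}
```

Copying the results out of the CAS before calling `reset()` matters in this sketch, because the annotations are cleared along with the rest of the CAS contents.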
Important: It is highly recommended that you reuse CAS objects rather than calling newCAS() prior to each analysis. This is because CAS objects may be expensive to create and may consume a significant amount of memory.
Instead of using the {@link CAS} interface, applications may wish to use the Java-object-based {@link JCas} interface. In that case, the call to newCAS from step 1 above would be replaced by {@link #newJCas()}, and the {@link #process(JCas)} method would be used.
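A JCas-based variant of the same loop might look like the sketch below. The class name `JCasLoopExample` and the method name `collectSpans` are hypothetical, and reporting each annotation as a "begin-end" offset pair is just one illustrative way of reading the typed Annotation objects.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;
import org.apache.uima.resource.ResourceInitializationException;

/** Hypothetical helper class; not part of the UIMA API. */
public class JCasLoopExample {

    /**
     * Processes each text with a single reused JCas and returns, per
     * document, the "begin-end" offsets of every annotation found.
     */
    public static List<List<String>> collectSpans(AnalysisEngine engine, List<String> documents)
            throws ResourceInitializationException, AnalysisEngineProcessException {
        List<List<String>> results = new ArrayList<>();

        // newJCas() takes the place of newCAS() from step 1.
        JCas jcas = engine.newJCas();

        for (String text : documents) {
            jcas.setDocumentText(text);

            // process(JCas) takes the place of process(CAS).
            engine.process(jcas);

            // Annotations come back as Java objects with typed getters.
            List<String> spans = new ArrayList<>();
            FSIterator<Annotation> it = jcas.getAnnotationIndex().iterator();
            while (it.hasNext()) {
                Annotation annotation = it.next();
                spans.add(annotation.getBegin() + "-" + annotation.getEnd());
            }
            results.add(spans);

            // Clear the JCas before loading the next artifact.
            jcas.reset();
        }
        return results;
    }
}
```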
Analysis Engine implementations may or may not be capable of simultaneously processing multiple documents in a multithreaded environment. See the documentation associated with the implementation or factory method (e.g. {@link org.apache.uima.UIMAFramework#produceAnalysisEngine(ResourceSpecifier)}) that you are using.