From 996727f1ea6da67f0dd652fcdd2a1d1a7fb5c81f Mon Sep 17 00:00:00 2001
From: Richard Eckart de Castilho
Date: Sun, 4 Feb 2024 11:46:19 +0100
Subject: [PATCH] #250 - Convenience for setting the document language

- Update documentation
---
 README.rst | 41 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/README.rst b/README.rst
index 1604dc5..d9417bf 100644
--- a/README.rst
+++ b/README.rst
@@ -72,8 +72,9 @@ Usage
 
 Example CAS XMI and types system files can be found under :code:`tests\test_files`.
 
-Loading a CAS
-~~~~~~~~~~~~~
+Reading a CAS file
+~~~~~~~~~~~~~~~~~~
+.. _reading_a_cas_file:
 
 **From XMI:** A CAS can be deserialized from the UIMA CAS XMI (XML 1.0) format either
 by reading from a file or string using :code:`load_cas_from_xmi`.
@@ -98,8 +99,9 @@ Most UIMA JSON CAS files come with an embedded typesystem, so it is not necessar
     with open('cas.json', 'rb') as f:
         cas = load_cas_from_json(f)
 
-Writing a CAS
-~~~~~~~~~~~~~
+Writing a CAS file
+~~~~~~~~~~~~~~~~~~
+.. _writing_a_cas_file:
 
 **To XMI:** A CAS can be serialized to XMI either by writing to a file or be
 returned as a string using :code:`cas.to_xmi()`.
@@ -126,6 +128,29 @@ returned as a string using :code:`cas.to_xmi()`.
     # Written to file
     cas.to_json("my_cas.json")
 
+Creating a CAS
+~~~~~~~~~~~~~~
+.. _creating_a_cas:
+
+A CAS (Common Analysis System) object typically represents a (text) document. When using cassis,
+you will likely most often :ref:`reading ` existing CAS files, modify them and then
+:ref:`writing ` them out again. But you can also create CAS objects from scratch,
+e.g. if you want to convert some data into a CAS object in order to create a pre-annotated text.
+If you do not have a pre-defined typesystem to work with, you will have to :ref:`define one `.
+
+.. code:: python
+
+    typesystem = TypeSystem()
+
+    cas = Cas(
+        sofa_string = "Joe waited for the train . The train was late .",
+        document_language = "en",
+        typesystem = typesystem)
+
+    print(cas.sofa_string)
+    print(cas.sofa_mime)
+    print(cas.document_language)
+
 Adding annotations
 ~~~~~~~~~~~~~~~~~~
 
@@ -239,6 +264,7 @@ The same goes for setting:
 
 Creating types and adding features
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. _creating_a_typesystem:
 
 .. code:: python
 
@@ -269,12 +295,13 @@ properties of the Sofa can be read and written:
 
 .. code:: python
 
-    cas = Cas()
-    cas.sofa_string = "Joe waited for the train . The train was late ."
-    cas.sofa_mime = "text/plain"
+    cas = Cas(
+        sofa_string = "Joe waited for the train . The train was late .",
+        document_language = "en")
 
     print(cas.sofa_string)
     print(cas.sofa_mime)
+    print(cas.document_language)
 
 Array support
 ~~~~~~~~~~~~~