#250 - Convenience for setting the document language #304

Merged
45 changes: 38 additions & 7 deletions README.rst
@@ -72,8 +72,10 @@ Usage

Example CAS XMI and types system files can be found under :code:`tests\test_files`.

Loading a CAS
~~~~~~~~~~~~~
.. _reading_a_cas_file:

Reading a CAS file
~~~~~~~~~~~~~~~~~~

**From XMI:** A CAS can be deserialized from the UIMA CAS XMI (XML 1.0) format either
by reading from a file or string using :code:`load_cas_from_xmi`.
@@ -98,8 +100,10 @@ Most UIMA JSON CAS files come with an embedded typesystem, so it is not necessar
    with open('cas.json', 'rb') as f:
        cas = load_cas_from_json(f)

Writing a CAS
~~~~~~~~~~~~~
.. _writing_a_cas_file:

Writing a CAS file
~~~~~~~~~~~~~~~~~~

**To XMI:** A CAS can be serialized to XMI either by writing to a file or be
returned as a string using :code:`cas.to_xmi()`.
@@ -126,6 +130,30 @@ returned as a string using :code:`cas.to_xmi()`.
    # Written to file
    cas.to_json("my_cas.json")

.. _creating_a_cas:

Creating a CAS
~~~~~~~~~~~~~~

A CAS (Common Analysis System) object typically represents a (text) document. When using cassis,
you will most often be :ref:`reading <reading_a_cas_file>` existing CAS files, modifying them and then
:ref:`writing <writing_a_cas_file>` them out again. But you can also create CAS objects from scratch,
e.g. if you want to convert some data into a CAS object in order to create a pre-annotated text.
If you do not have a pre-defined typesystem to work with, you will have to :ref:`define one <creating_a_typesystem>`.

.. code:: python

    typesystem = TypeSystem()

    cas = Cas(
        sofa_string = "Joe waited for the train . The train was late .",
        document_language = "en",
        typesystem = typesystem)

    print(cas.sofa_string)
    print(cas.sofa_mime)
    print(cas.document_language)

Adding annotations
~~~~~~~~~~~~~~~~~~

@@ -237,6 +265,8 @@ The same goes for setting:
    assert lst["tail.tail.head"] == "newer_baz"


.. _creating_a_typesystem:

Creating types and adding features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -269,12 +299,13 @@ properties of the Sofa can be read and written:

.. code:: python

    cas = Cas()
    cas.sofa_string = "Joe waited for the train . The train was late ."
    cas.sofa_mime = "text/plain"
    cas = Cas(
        sofa_string = "Joe waited for the train . The train was late .",
        document_language = "en")

    print(cas.sofa_string)
    print(cas.sofa_mime)
    print(cas.document_language)

Array support
~~~~~~~~~~~~~
35 changes: 35 additions & 0 deletions cassis/cas.py
@@ -210,6 +210,7 @@ def __init__(
        lenient: bool = False,
        sofa_string: str = None,
        sofa_mime: str = None,
        document_language: str = None,
    ):
        """Creates a CAS with the specified typesystem. If no typesystem is given, then the default one
        is used which only contains UIMA-predefined types.
@@ -241,6 +242,9 @@ def __init__(
        else:
            self.sofa_mime = "text/plain"

        if document_language is not None:
            self.document_language = document_language

    @property
    def typesystem(self) -> TypeSystem:
        return self._typesystem
@@ -512,6 +516,19 @@ def get_sofa(self) -> Sofa:
        """
        return self._current_view.sofa

    def get_document_annotation(self) -> FeatureStructure:
        """Get the DocumentAnnotation feature structure associated with this CAS view. If none exists, one is created.

        Returns:
            The DocumentAnnotation associated with this CAS view.
        """
        try:
            return self.select(TYPE_NAME_DOCUMENT_ANNOTATION)[0]
        except IndexError:
            document_annotation = self.typesystem.get_type(TYPE_NAME_DOCUMENT_ANNOTATION)()
            self.add(document_annotation)
            return document_annotation

    @property
    def sofas(self) -> List[Sofa]:
        """Finds all sofas that this CAS manages
@@ -598,6 +615,24 @@ def sofa_array(self, value):
        """
        self.get_sofa().sofaArray = value

    @property
    def document_language(self) -> str:
        """The document language contains the language code for the document.

        Returns: The document language.

        """
        return self.get_document_annotation().get(FEATURE_BASE_NAME_LANGUAGE)

    @document_language.setter
    def document_language(self, value) -> None:
        """Sets document language.

        Args:
            value: The document language
        """
        self.get_document_annotation().set(FEATURE_BASE_NAME_LANGUAGE, value)
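
The property pair above delegates to ``get_document_annotation``, which follows a get-or-create pattern. The following is a minimal, self-contained sketch of that pattern, not the cassis implementation: ``MiniCas`` is a hypothetical stand-in that stores feature structures as plain dicts, and the two constant values are assumptions made for this illustration.

```python
from typing import Dict, List, Optional

# Assumed values for this sketch only; the diff imports these names
# without showing their definitions.
TYPE_NAME_DOCUMENT_ANNOTATION = "uima.tcas.DocumentAnnotation"
FEATURE_BASE_NAME_LANGUAGE = "language"

FeatureDict = Dict[str, Optional[str]]


class MiniCas:
    """Hypothetical, heavily simplified stand-in for the Cas class."""

    def __init__(self, sofa_string: Optional[str] = None, document_language: Optional[str] = None):
        self.sofa_string = sofa_string
        # Each "feature structure" is just a dict with a "type" key here.
        self._feature_structures: List[FeatureDict] = []
        # Only touch the document annotation if a language was actually given.
        if document_language is not None:
            self.document_language = document_language

    def select(self, type_name: str) -> List[FeatureDict]:
        return [fs for fs in self._feature_structures if fs["type"] == type_name]

    def get_document_annotation(self) -> FeatureDict:
        # Get-or-create: reuse the existing annotation, otherwise add a new one.
        try:
            return self.select(TYPE_NAME_DOCUMENT_ANNOTATION)[0]
        except IndexError:
            annotation: FeatureDict = {
                "type": TYPE_NAME_DOCUMENT_ANNOTATION,
                FEATURE_BASE_NAME_LANGUAGE: None,
            }
            self._feature_structures.append(annotation)
            return annotation

    @property
    def document_language(self) -> Optional[str]:
        return self.get_document_annotation()[FEATURE_BASE_NAME_LANGUAGE]

    @document_language.setter
    def document_language(self, value: Optional[str]) -> None:
        self.get_document_annotation()[FEATURE_BASE_NAME_LANGUAGE] = value


cas = MiniCas(sofa_string="Ich bin ein test!", document_language="de")

empty = MiniCas()
language_before = empty.document_language  # reading goes through get-or-create
count_after_read = len(empty.select(TYPE_NAME_DOCUMENT_ANNOTATION))
```

One consequence worth noting in this sketch is that merely reading ``document_language`` on an empty instance adds a document annotation, because the getter goes through the same get-or-create path as the setter. Whether that side effect matters for the real class depends on how callers treat an otherwise empty CAS.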

    def to_xmi(self, path: Union[str, Path, None] = None, pretty_print: bool = False) -> Optional[str]:
        """Creates a XMI representation of this CAS.

8 changes: 8 additions & 0 deletions tests/test_cas.py
@@ -127,6 +127,14 @@ def test_sofa_string_and_mime_type_can_be_set_using_constructor():
    assert cas.sofa_mime == "text/html"


def test_document_language_can_be_set_using_constructor():
    cas = Cas(sofa_string="Ich bin ein test!", document_language="de")

    assert cas.sofa_string == "Ich bin ein test!"
    assert cas.sofa_mime == "text/plain"
    assert cas.document_language == "de"


# Select


6 changes: 4 additions & 2 deletions tests/test_documentation.py
@@ -10,5 +10,7 @@ def test_readme_is_proper_rst():
    with path_to_readme.open() as f:
        rst = f.read()

    errors = list(rstcheck.check(rst))
    assert len(errors) == 0, "; ".join(str(e) for e in errors)
    errors = [str(e) for e in list(rstcheck.check(rst))]
    # https://github.com/rstcheck/rstcheck-core/issues/4
    errors = [s for s in errors if not ("Hyperlink target" in s and "is not referenced." in s)]
    assert len(errors) == 0, "; ".join(errors)
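
The filter added to the test keeps every reported problem except complaints about unreferenced hyperlink targets, which the new ``.. _creating_a_cas:`` style labels would otherwise trigger. A small sketch of the same substring filter in isolation follows; the helper name and the sample messages are made up for illustration, and real rstcheck output may be worded differently.

```python
from typing import Iterable, List


def drop_unreferenced_target_errors(messages: Iterable[str]) -> List[str]:
    """Keep every message except 'Hyperlink target ... is not referenced.' ones."""
    return [
        m for m in messages
        if not ("Hyperlink target" in m and "is not referenced." in m)
    ]


# Hypothetical messages written for this sketch, not captured from rstcheck.
sample_messages = [
    '(INFO/1) Hyperlink target "creating_a_cas" is not referenced.',
    '(ERROR/3) Unknown interpreted text role "foo".',
]

remaining = drop_unreferenced_target_errors(sample_messages)
print(remaining)
```

Because both substrings must match before a message is dropped, unrelated errors that merely mention a hyperlink target still fail the test.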