Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Python types to declare document fields #1845

Merged
merged 12 commits into from
Jun 21, 2024
175 changes: 169 additions & 6 deletions docs/persistence.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ layer for your application.
For more comprehensive examples have a look at the examples_ directory in the
repository.

.. _examples: https://github.com/elastic/elasticsearch-dsl-py/tree/master/examples
.. _examples: https://github.com/elastic/elasticsearch-dsl-py/tree/main/examples

.. _doc_type:

Expand Down Expand Up @@ -66,14 +66,14 @@ settings in elasticsearch (see :ref:`life-cycle` for details).
Data types
~~~~~~~~~~

The ``Document`` instances should be using native python types like
The ``Document`` instances use native python types like ``str`` and
``datetime``. In case of ``Object`` or ``Nested`` fields an instance of the
``InnerDoc`` subclass should be used just like in the ``add_comment`` method in
the above example where we are creating an instance of the ``Comment`` class.
``InnerDoc`` subclass is used, as in the ``add_comment`` method in the above
example where we are creating an instance of the ``Comment`` class.

There are some specific types that were created as part of this library to make
working with specific field types easier, for example the ``Range`` object used
in any of the `range fields
working with some field types easier, for example the ``Range`` object used in
any of the `range fields
<https://www.elastic.co/guide/en/elasticsearch/reference/current/range.html>`_:

.. code:: python
Expand Down Expand Up @@ -103,6 +103,169 @@ in any of the `range fields
# empty range is unbounded
Range().lower # None, False

Python Type Hints
~~~~~~~~~~~~~~~~~

Document fields can be defined using standard Python type hints if desired.
Here are some simple examples:

.. code:: python

from typing import Optional

class Post(Document):
title: str # same as title = Text(required=True)
created_at: Optional[datetime] # same as created_at = Date(required=False)
published: bool # same as published = Boolean(required=True)

It is important to note that when using ``Field`` subclasses such as ``Text``,
``Date`` and ``Boolean``, they must be given in the right-side of an assignment,
as shown in examples above. Using these classes as type hints will result in
errors.

Python types are mapped to their corresponding field type according to the
following table:

.. list-table:: Python type to DSL field mappings
:header-rows: 1

* - Python type
- DSL field
* - ``str``
- ``Text(required=True)``
* - ``bool``
- ``Boolean(required=True)``
* - ``int``
- ``Integer(required=True)``
* - ``float``
- ``Float(required=True)``
* - ``bytes``
- ``Binary(required=True)``
* - ``datetime``
- ``Date(required=True)``
* - ``date``
- ``Date(format="yyyy-MM-dd", required=True)``
Comment on lines +129 to +147
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very helpful, should we also add List and Optional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I'll add them to the table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you decide against adding Optional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't. Sorry, I added List, but left out Optional. I'm adding it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so I tried a couple of different formats and found that growing the table makes it too long and more difficult to visually parse. So in the end I kept the table small with just the required variants of the Python native types, and right after the table I show examples of Optional, List, Object and Nested. Let me know what you think.

* - ``InnerDocSubclass``
- ``Object(InnerDocSubclass)``
* - ``List(InnerDocSubclass)``
- ``Nested(InnerDocSubclass)``

As noted in the last two rows of the table, a field can also be given a type
hint of an ``InnerDoc`` subclass, in which case it becomes an ``Object`` field
of that class. When the ``InnerDoc`` subclass is wrapped with ``List``, a
``Nested`` field is created instead.

.. code:: python

from typing import List

class Address(InnerDoc):
...

class Comment(InnerDoc):
...

class Post(Document):
address: Address # same as address = Object(Address)
comments: List[Comment] # same as comments = Nested(Comment)

Unfortunately it is impossible to have Python type hints that uniquely
identify every possible Elasticsearch field type. To choose a field type that
is different than the ones in the table above, the field instance can be added
explicitly as a right-side assignment in the field declaration. The next
example creates a field that is typed as ``str``, but is mapped to ``Keyword``
instead of ``Text``:

.. code:: python

class MyDocument(Document):
category: str = Keyword(required=True)

This form can also be used when additional options need to be given to
initialize the field, such as when using custom analyzer settings:

.. code:: python

class Comment(InnerDoc):
content: str = Text(analyzer='snowball', required=True)

The standard ``Optional`` modifier from the Python ``typing`` package can be
used to change a typed field from required to optional. The ``List`` modifier
can be added to a field to convert it to an array, similar to using the
``multi=True`` argument on the field object.

When using type hints as above, subclasses of ``Document`` and ``InnerDoc``
inherit some of the behaviors associated with Python dataclasses, as defined by
`PEP 681 <https://peps.python.org/pep-0681/>`_ and the
`dataclass_transform decorator <https://typing.readthedocs.io/en/latest/spec/dataclasses.html#dataclass-transform>`_.
To add per-field dataclass options such as ``default`` or ``default_factory``,
the ``mapped_field()`` wrapper can be used on the right side of a typed field
declaration:

.. code:: python

class MyDocument(Document):
title: str = mapped_field(default="no title")
created_at: datetime = mapped_field(default_factory=datetime.now)
published: bool = mapped_field(default=False)
category: str = mapped_field(Keyword(required=True), default="general")

When using the ``mapped_field()`` wrapper function, an explicit field type
instance can be passed as a first positional argument, as the ``category``
field does in the example above.

Static type checkers such as `mypy <https://mypy-lang.org/>`_ and
`pyright <https://github.com/microsoft/pyright>`_ can use the type hints and
the dataclass-specific options added to the ``mapped_field()`` function to
improve type inference and provide better real-time suggestions in IDEs.

One situation in which type checkers can't infer the correct type is when
using fields as class attributes. Consider the following example:

.. code:: python

class MyDocument(Document):
title: str

doc = MyDocument()
# doc.title is typed as "str" (correct)
# MyDocument.title is also typed as "str" (incorrect)

To help type checkers correctly identify class attributes as such, the ``M``
generic must be used as a wrapper to the type hint, as shown in the next
examples:

.. code:: python

from elasticsearch_dsl import M

class MyDocument(Document):
title: M[str]
created_at: M[datetime] = mapped_field(default_factory=datetime.now)

doc = MyDocument()
# doc.title is typed as "str"
# doc.created_at is typed as "datetime"
# MyDocument.title is typed as "InstrumentedField"
# MyDocument.created_at is typed as "InstrumentedField"

Note that the ``M`` type hint does not provide any runtime behavior and its use
is not required, but it can be useful to eliminate spurious type errors in IDEs
or type checking builds.

The ``InstrumentedField`` objects returned when fields are accessed as class
attributes are proxies for the field instances that can be used anywhere a
field needs to be referenced, such as when specifying sort options in a
``Search`` object:

.. code:: python

# sort by creation date descending, and title ascending
s = MyDocument.search().sort(-MyDocument.created_at, MyDocument.title)
miguelgrinberg marked this conversation as resolved.
Show resolved Hide resolved

When specifying sorting order, the ``+`` and ``-`` unary operators can be used
on the class field attributes to indicate ascending and descending order.

Note on dates
~~~~~~~~~~~~~

Expand Down
4 changes: 3 additions & 1 deletion elasticsearch_dsl/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
from .aggs import A
from .analysis import analyzer, char_filter, normalizer, token_filter, tokenizer
from .document import AsyncDocument, Document
from .document_base import InnerDoc, MetaField
from .document_base import InnerDoc, M, MetaField, mapped_field
from .exceptions import (
ElasticsearchDslException,
IllegalOperation,
Expand Down Expand Up @@ -148,6 +148,7 @@
"Keyword",
"Long",
"LongRange",
"M",
"Mapping",
"MetaField",
"MultiSearch",
Expand Down Expand Up @@ -178,6 +179,7 @@
"char_filter",
"connections",
"construct_field",
"mapped_field",
"normalizer",
"token_filter",
"tokenizer",
Expand Down
4 changes: 3 additions & 1 deletion elasticsearch_dsl/_async/document.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,11 @@
import collections.abc

from elasticsearch.exceptions import NotFoundError, RequestError
from typing_extensions import dataclass_transform

from .._async.index import AsyncIndex
from ..async_connections import get_connection
from ..document_base import DocumentBase, DocumentMeta
from ..document_base import DocumentBase, DocumentMeta, mapped_field
from ..exceptions import IllegalOperation
from ..utils import DOC_META_FIELDS, META_FIELDS, merge
from .search import AsyncSearch
Expand Down Expand Up @@ -62,6 +63,7 @@ def construct_index(cls, opts, bases):
return i


@dataclass_transform(field_specifiers=(mapped_field,))
class AsyncDocument(DocumentBase, metaclass=AsyncIndexMeta):
"""
Model-like class for persisting documents in elasticsearch.
Expand Down
4 changes: 3 additions & 1 deletion elasticsearch_dsl/_sync/document.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,11 @@
import collections.abc

from elasticsearch.exceptions import NotFoundError, RequestError
from typing_extensions import dataclass_transform

from .._sync.index import Index
from ..connections import get_connection
from ..document_base import DocumentBase, DocumentMeta
from ..document_base import DocumentBase, DocumentMeta, mapped_field
from ..exceptions import IllegalOperation
from ..utils import DOC_META_FIELDS, META_FIELDS, merge
from .search import Search
Expand Down Expand Up @@ -60,6 +61,7 @@ def construct_index(cls, opts, bases):
return i


@dataclass_transform(field_specifiers=(mapped_field,))
class Document(DocumentBase, metaclass=IndexMeta):
"""
Model-like class for persisting documents in elasticsearch.
Expand Down
Loading
Loading