Use Python types to declare document fields #1845

Merged 12 commits on Jun 21, 2024
61 changes: 41 additions & 20 deletions docs/persistence.rst
@@ -114,9 +114,14 @@ Here are some simple examples:
     from typing import Optional

     class Post(Document):
-        title: str  # same as Text(required=True)
-        created_at: Optional[datetime]  # same as Date(required=False)
-        published: bool  # same as Boolean(required=True)
+        title: str  # same as title = Text(required=True)
+        created_at: Optional[datetime]  # same as created_at = Date(required=False)
+        published: bool  # same as published = Boolean(required=True)

+It is important to note that when using ``Field`` subclasses such as ``Text``,
+``Date`` and ``Boolean``, they must be given in the right-side of an assignment,
+as shown in examples above. Using these classes as type hints will result in
+errors.
+
 Python types are mapped to their corresponding field type according to the
 following table:
@@ -140,10 +145,14 @@ following table:
      - ``Date(required=True)``
    * - ``date``
      - ``Date(format="yyyy-MM-dd", required=True)``
-
-In addition to the above native types, a field can also be given a type hint
-of an ``InnerDoc`` subclass, in which case it becomes an ``Object`` field of
-that class. When the ``InnerDoc`` subclass is wrapped with ``List``, a
+   * - ``InnerDocSubclass``
+     - ``Object(InnerDocSubclass)``
+   * - ``List(InnerDocSubclass)``
+     - ``Nested(InnerDocSubclass)``
+
+As noted in the last two rows of the table, a field can also be given a type
+hint of an ``InnerDoc`` subclass, in which case it becomes an ``Object`` field
+of that class. When the ``InnerDoc`` subclass is wrapped with ``List``, a
 ``Nested`` field is created instead.

 .. code:: python
@@ -157,38 +166,40 @@ that class. When the ``InnerDoc`` subclass is wrapped with ``List``, a
         ...

     class Post(Document):
-        address: Address  # same as Object(Address)
-        comments: List[Comment]  # same as Nested(Comment)
+        address: Address  # same as address = Object(Address)
+        comments: List[Comment]  # same as comments = Nested(Comment)

 Unfortunately it is impossible to have Python type hints that uniquely
 identify every possible Elasticsearch field type. To choose a field type that
-is different thant the ones in the table above, the field instance can be added
+is different than the ones in the table above, the field instance can be added
 explicitly as a right-side assignment in the field declaration. The next
 example creates a field that is typed as ``str``, but is mapped to ``Keyword``
 instead of ``Text``:

 .. code:: python

     class MyDocument(Document):
-        category: str = Keyword()
+        category: str = Keyword(required=True)

 This form can also be used when additional options need to be given to
 initialize the field, such as when using custom analyzer settings:

 .. code:: python

     class Comment(InnerDoc):
-        content: str = Text(analyzer='snowball')
+        content: str = Text(analyzer='snowball', required=True)

 The standard ``Optional`` modifier from the Python ``typing`` package can be
 used to change a typed field from required to optional. The ``List`` modifier
 can be added to a field to convert it to an array, similar to using the
 ``multi=True`` argument on the field object.
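
For readers following the ``Optional`` and ``List`` modifiers described above, here is a minimal, standard-library-only sketch of how such wrappers can be detected on a type hint at runtime. It is not the library's implementation: `describe_hint` and the plain `Post` class are hypothetical names used only for illustration, and the sketch reports only whether each wrapper is present.

```python
from datetime import datetime
from typing import List, Optional, Union, get_args, get_origin, get_type_hints


def describe_hint(hint):
    """Return (base_type, is_optional, is_multi) for a hint (hypothetical helper)."""
    is_optional, is_multi = False, False
    # Optional[X] is shorthand for Union[X, None]: strip the None member
    if get_origin(hint) is Union and type(None) in get_args(hint):
        non_none = [arg for arg in get_args(hint) if arg is not type(None)]
        if len(non_none) == 1:
            hint, is_optional = non_none[0], True
    # List[X] signals a multi-valued (array) field
    if get_origin(hint) is list:
        hint, is_multi = get_args(hint)[0], True
    return hint, is_optional, is_multi


class Post:
    # a plain class standing in for a document, annotations only
    title: str
    created_at: Optional[datetime]
    tags: List[str]


summary = {name: describe_hint(hint) for name, hint in get_type_hints(Post).items()}
```

Here `summary` maps each field name to its unwrapped base type plus two flags, which is the kind of information a mapping layer would need before choosing a field class.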

 When using type hints as above, subclasses of ``Document`` and ``InnerDoc``
-inherit some of the behaviors associated with Python dataclasses. To add
-per-field dataclass options such as ``default`` or ``default_factory`` , the
-``mapped_field()`` wrapper can be used on the right side of a typed field
+inherit some of the behaviors associated with Python dataclasses, as defined by
+`PEP 681 <https://peps.python.org/pep-0681/>`_ and the
+`dataclass_transform decorator <https://typing.readthedocs.io/en/latest/spec/dataclasses.html#dataclass-transform>`_.
+To add per-field dataclass options such as ``default`` or ``default_factory``,
+the ``mapped_field()`` wrapper can be used on the right side of a typed field
 declaration:

 .. code:: python
@@ -197,7 +208,11 @@ declaration:
         title: str = mapped_field(default="no title")
         created_at: datetime = mapped_field(default_factory=datetime.now)
         published: bool = mapped_field(default=False)
-        category: str = mapped_field(Keyword(), default="general")
+        category: str = mapped_field(Keyword(required=True), default="general")
+
+When using the ``mapped_field()`` wrapper function, an explicit field type
+instance can be passed as a first positional argument, as the ``category``
+field does in the example above.
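
As a point of comparison for the ``default`` and ``default_factory`` options mentioned above, the following sketch shows the same two options on a plain standard-library dataclass. It assumes the options behave like their `dataclasses.field` counterparts; `PostDefaults` is a hypothetical class, and `mapped_field()` itself is not used here.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PostDefaults:
    # default: a fixed value shared by every instance that omits the argument
    title: str = field(default="no title")
    # default_factory: a callable invoked once per instance at construction time
    created_at: datetime = field(default_factory=datetime.now)
    published: bool = field(default=False)


first = PostDefaults()
second = PostDefaults(title="hello")
```

The factory form matters for values such as timestamps or mutable containers, where a single shared default would be wrong.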

 Static type checkers such as `mypy <https://mypy-lang.org/>`_ and
 `pyright <https://github.com/microsoft/pyright>`_ can use the type hints and
@@ -210,15 +225,15 @@ using fields as class attributes. Consider the following example:
 .. code:: python

     class MyDocument(Document):
-        title: str = mapped_field(default="no title")
+        title: str

     doc = MyDocument()
     # doc.title is typed as "str" (correct)
     # MyDocument.title is also typed as "str" (incorrect)

 To help type checkers correctly identify class attributes as such, the ``M``
 generic must be used as a wrapper to the type hint, as shown in the next
-example:
+examples:

 .. code:: python

@@ -230,10 +245,13 @@ example:

     doc = MyDocument()
     # doc.title is typed as "str"
+    # doc.created_at is typed as "datetime"
     # MyDocument.title is typed as "InstrumentedField"
+    # MyDocument.created_at is typed as "InstrumentedField"

-Note that the ``M`` type hint does not provide any runtime behavior, it just
-provides additional typing declarations for type checkers.
+Note that the ``M`` type hint does not provide any runtime behavior and its use
+is not required, but it can be useful to eliminate spurious type errors in IDEs
+or type checking builds.

 The ``InstrumentedField`` objects returned when fields are accessed as class
 attributes are proxies for the field instances that can be used anywhere a
@@ -245,6 +263,9 @@ field needs to be referenced, such as when specifying sort options in a
     # sort by creation date descending, and title ascending
     s = MyDocument.search().sort(-MyDocument.created_at, MyDocument.title)

+When specifying sorting order, the ``+`` and ``-`` unary operators can be used
+on the class field attributes to indicate ascending and descending order.
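
The unary-operator convention described above can be sketched with a small stand-in class. `SortKey` is a hypothetical name, not part of the library; the returned strings follow the behavior asserted in this PR's tests, where `+Doc.st == "st"` and `-Doc.st == "-st"`.

```python
class SortKey:
    """Hypothetical stand-in for a class-level field attribute."""

    def __init__(self, name):
        self._name = name

    def __pos__(self):
        # ascending order is expressed as the bare field name
        return self._name

    def __neg__(self):
        # descending order is expressed as the field name prefixed with "-"
        return f"-{self._name}"

    def __str__(self):
        return self._name


created_at = SortKey("created_at")
title = SortKey("title")

# newest first, then by title ascending
sort_keys = [-created_at, +title]
```

Because both operators return plain strings, the result can be passed to anything that already accepts string sort keys.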

Note on dates
~~~~~~~~~~~~~

16 changes: 9 additions & 7 deletions elasticsearch_dsl/document_base.py
@@ -62,14 +62,16 @@ def __init__(self, name, field):
         self._field = field

     def __getattr__(self, attr):
-        f = None
         try:
-            f = self._field[attr]
-        except KeyError:
-            pass
-        if isinstance(f, Field):
-            return InstrumentedField(f"{self._name}.{attr}", f)
-        return getattr(self._field, attr)
+            # first let's see if this is an attribute of this object
+            return super().__getattribute__(attr)
+        except AttributeError:
+            try:
+                # next we see if we have a sub-field with this name
+                return InstrumentedField(f"{self._name}.{attr}", self._field[attr])
+            except KeyError:
+                # lastly we let the wrapped field resolve this attribute
+                return getattr(self._field, attr)

     def __pos__(self):
         """Return the field name representation for ascending sort order"""
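
The three-step lookup order in the new `__getattr__` (own attribute, then sub-field, then the wrapped field) can be exercised in isolation with the sketch below. `StubField` and `FieldProxy` are hypothetical stand-ins written for this note; only the body of `__getattr__` mirrors the change in `document_base.py`.

```python
class StubField:
    """Hypothetical stand-in for a field that may own named sub-fields."""

    def __init__(self, type_name, subfields=None):
        self.type_name = type_name
        self._subfields = subfields or {}

    def __getitem__(self, name):
        # raises KeyError when no sub-field has this name
        return self._subfields[name]

    def to_dict(self):
        return {"type": self.type_name}


class FieldProxy:
    """Hypothetical proxy using the same lookup order as the diff above."""

    def __init__(self, name, field):
        self._name = name
        self._field = field

    def __getattr__(self, attr):
        try:
            # 1. an attribute of the proxy itself
            return super().__getattribute__(attr)
        except AttributeError:
            try:
                # 2. a sub-field, wrapped in a proxy with a dotted name
                return FieldProxy(f"{self._name}.{attr}", self._field[attr])
            except KeyError:
                # 3. anything else is delegated to the wrapped field
                return getattr(self._field, attr)

    def __str__(self):
        return self._name


ob = FieldProxy("ob", StubField("object", {"st": StubField("text")}))
```

With this order, `ob.st` resolves to a nested proxy named `ob.st`, `ob.to_dict()` is answered by the wrapped field, and an unknown name ends in the `AttributeError` raised by the final `getattr`.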
62 changes: 62 additions & 0 deletions tests/_async/test_document.py
@@ -38,6 +38,7 @@
     mapped_field,
     utils,
 )
+from elasticsearch_dsl.document_base import InstrumentedField
 from elasticsearch_dsl.exceptions import IllegalOperation, ValidationException


@@ -755,3 +756,64 @@ class TypedDoc(AsyncDocument):

     s = TypedDoc.search().sort(TypedDoc.st, -TypedDoc.dt, +TypedDoc.ob.st)
     assert s.to_dict() == {"sort": ["st", {"dt": {"order": "desc"}}, "ob.st"]}
+
+
+def test_instrumented_field():
+    class Child(InnerDoc):
+        st: M[str]
+
+    class Doc(AsyncDocument):
+        st: str
+        ob: Child
+        ns: List[Child]
+
+    doc = Doc(
+        st="foo",
+        ob=Child(st="bar"),
+        ns=[
+            Child(st="baz"),
+            Child(st="qux"),
+        ],
+    )
+
+    assert type(doc.st) is str
+    assert doc.st == "foo"
+
+    assert type(doc.ob) is Child
+    assert doc.ob.st == "bar"
+
+    assert type(doc.ns) is utils.AttrList
+    assert doc.ns[0].st == "baz"
+    assert doc.ns[1].st == "qux"
+    assert type(doc.ns[0]) is Child
+    assert type(doc.ns[1]) is Child
+
+    assert type(Doc.st) is InstrumentedField
+    assert str(Doc.st) == "st"
+    assert +Doc.st == "st"
+    assert -Doc.st == "-st"
+    assert Doc.st.to_dict() == {"type": "text"}
+    with raises(AttributeError):
+        Doc.st.something
+
+    assert type(Doc.ob) is InstrumentedField
+    assert str(Doc.ob) == "ob"
+    assert str(Doc.ob.st) == "ob.st"
+    assert +Doc.ob.st == "ob.st"
+    assert -Doc.ob.st == "-ob.st"
+    assert Doc.ob.st.to_dict() == {"type": "text"}
+    with raises(AttributeError):
+        Doc.ob.something
+    with raises(AttributeError):
+        Doc.ob.st.something
+
+    assert type(Doc.ns) is InstrumentedField
+    assert str(Doc.ns) == "ns"
+    assert str(Doc.ns.st) == "ns.st"
+    assert +Doc.ns.st == "ns.st"
+    assert -Doc.ns.st == "-ns.st"
+    assert Doc.ns.st.to_dict() == {"type": "text"}
+    with raises(AttributeError):
+        Doc.ns.something
+    with raises(AttributeError):
+        Doc.ns.st.something
62 changes: 62 additions & 0 deletions tests/_sync/test_document.py
@@ -38,6 +38,7 @@
     mapped_field,
     utils,
 )
+from elasticsearch_dsl.document_base import InstrumentedField
 from elasticsearch_dsl.exceptions import IllegalOperation, ValidationException


@@ -755,3 +756,64 @@ class TypedDoc(Document):

     s = TypedDoc.search().sort(TypedDoc.st, -TypedDoc.dt, +TypedDoc.ob.st)
     assert s.to_dict() == {"sort": ["st", {"dt": {"order": "desc"}}, "ob.st"]}
+
+
+def test_instrumented_field():
+    class Child(InnerDoc):
+        st: M[str]
+
+    class Doc(Document):
+        st: str
+        ob: Child
+        ns: List[Child]
+
+    doc = Doc(
+        st="foo",
+        ob=Child(st="bar"),
+        ns=[
+            Child(st="baz"),
+            Child(st="qux"),
+        ],
+    )
+
+    assert type(doc.st) is str
+    assert doc.st == "foo"
+
+    assert type(doc.ob) is Child
+    assert doc.ob.st == "bar"
+
+    assert type(doc.ns) is utils.AttrList
+    assert doc.ns[0].st == "baz"
+    assert doc.ns[1].st == "qux"
+    assert type(doc.ns[0]) is Child
+    assert type(doc.ns[1]) is Child
+
+    assert type(Doc.st) is InstrumentedField
+    assert str(Doc.st) == "st"
+    assert +Doc.st == "st"
+    assert -Doc.st == "-st"
+    assert Doc.st.to_dict() == {"type": "text"}
+    with raises(AttributeError):
+        Doc.st.something
+
+    assert type(Doc.ob) is InstrumentedField
+    assert str(Doc.ob) == "ob"
+    assert str(Doc.ob.st) == "ob.st"
+    assert +Doc.ob.st == "ob.st"
+    assert -Doc.ob.st == "-ob.st"
+    assert Doc.ob.st.to_dict() == {"type": "text"}
+    with raises(AttributeError):
+        Doc.ob.something
+    with raises(AttributeError):
+        Doc.ob.st.something
+
+    assert type(Doc.ns) is InstrumentedField
+    assert str(Doc.ns) == "ns"
+    assert str(Doc.ns.st) == "ns.st"
+    assert +Doc.ns.st == "ns.st"
+    assert -Doc.ns.st == "-ns.st"
+    assert Doc.ns.st.to_dict() == {"type": "text"}
+    with raises(AttributeError):
+        Doc.ns.something
+    with raises(AttributeError):
+        Doc.ns.st.something