
feat(DTO): Add DTO codegen backend #2388

Merged
merged 34 commits into main from dto-codegen on Oct 7, 2023
Conversation

@provinzkraut (Member) commented Sep 29, 2023


Add a new _DTOCodegenBackend that aims to improve performance by generating optimised transfer functions ahead of time.

Some performance impressions:

method                              backend    seconds
populate_from_raw                   standard   1.817
populate_from_raw                   codegen    0.487
populate_from_raw_collection        standard   2.376
populate_from_raw_collection        codegen    0.681
populate_from_builtins              standard   1.772
populate_from_builtins              codegen    0.470
populate_from_builtins_collection   standard   2.407
populate_from_builtins_collection   codegen    0.655
encode                              standard   1.539
encode                              codegen    0.285
encode_collection                   standard   1.897
encode_collection                   codegen    0.349

(times shown are for encoding the same data set 100,000 times)
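
For reference, a comparison along these lines can be reproduced with timeit. The sketch below is illustrative only; the transfer callable and payload are hypothetical placeholders rather than the actual benchmark harness used for the numbers above:

import timeit

def bench(transfer, payload, iterations=100_000):
    # Time `iterations` transfers of the same payload, mirroring the
    # "same data set 100,000 times" setup above. `transfer` stands in for
    # one of the compared backend methods (e.g. populate_from_raw or encode).
    return timeit.timeit(lambda: transfer(payload), number=iterations)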

What it does

  • Unrolling loops: Instead of iterating over the fields, instructions for each field are written out explicitly
  • Reduce function calls: Code for nested functions and nested transfers is inlined into the main transfer function, reducing the overall number of function calls and assignments
  • Reduce branching: A lot of the branching done at runtime can be avoided since we have all the information beforehand (e.g. different paths for Mapping vs. regular object types, or for specific field types), which also avoids calls to isinstance and friends
  • Optimised field access: Wrapping field access in a try/except block is faster if we expect no exception to be raised most of the time, while a <field name> in <object> / hasattr() check is faster when a field is frequently absent. Since we can guess which case we'll hit more often (by checking whether the field is optional/excluded/UNSET/Optional[]), we can use the most performant method for each individual field; see the sketch after this list
  • Zero cost renaming: Instead of checking which fields need to be renamed and using temporary values, renamed fields are assigned the same way as regular fields, since all information is known beforehand
  • Zero cost exclusion: No code is generated for excluded fields, avoiding all checks related to this at runtime
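
To illustrate the field-access point above, here is a rough sketch of the two access strategies the generated code chooses between for each field (names here are illustrative, not actual generated identifiers):

try:
    # field expected to be present: try/except is cheapest when no
    # exception is actually raised
    unstructured_data["a"] = source_instance["a"]
except KeyError:
    pass

# field that is frequently absent (e.g. optional or UNSET): an explicit
# membership check avoids the cost of raising and catching an exception
if "optional" in source_instance:
    unstructured_data["optional"] = source_instance["optional"]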

Examples

This is the core loop of the regular DTO backend; we perform these operations for each field, every time we transfer data:

should_use_serialization_name = not override_serialization_name and is_data_field
source_name = field_definition.serialization_name if should_use_serialization_name else field_definition.name

if not is_data_field:
    if field_definition.is_excluded:
        continue
elif not (
    source_name in source_instance
    if isinstance(source_instance, Mapping)
    else hasattr(source_instance, source_name)
):
    continue

transfer_type = field_definition.transfer_type
destination_name = field_definition.name if is_data_field else field_definition.serialization_name
source_value = (
    source_instance[source_name]
    if isinstance(source_instance, Mapping)
    else getattr(source_instance, source_name)
)

if field_definition.is_partial and is_data_field and source_value is UNSET:
    continue

unstructured_data[destination_name] = _transfer_type_data(
    source_value=source_value,
    transfer_type=transfer_type,
    nested_as_dict=destination_type is dict,
    is_data_field=is_data_field,
    override_serialization_name=override_serialization_name,
)

Assuming our source is a Mapping, and we don't expect an optional value, we can generate the following code for this:

try:
    source_value = source_instance["some_name"]
    if source_value is not UNSET:
        unstructured_data["desitination_name"] = _transfer_type_data(source_value)
except KeyError:
    pass

The actual code will differ from this; for example, _transfer_type_data will be inlined as well, using the appropriate function for the type, and the if branch might be skipped too. But this gives a good impression of the sort of things that can be done.
Below is a full example of the generated code with some formatting and comments added.

Full example
def func(source_instance_0):
  unstructured_data_0 = {}
  
  # we have two main branches, one for a mapping and one for a regular object, 
  # since we accept both as input data. Having two big branches allows us to 
  # only perform this check once
  if isinstance(source_instance_0, Mapping):

      # we use try/except here since we expect the item to be present
      try:
          unstructured_data_0["a"] = source_instance_0["a"]
      except KeyError:
          pass

      # this is an inlined function; a transfer for another - nested - type
      try:

          unstructured_data_1 = {}

          # if we access this more than once, we assign it to a variable. we could do some 
          # fine tuning here to find the sweet spot where the assignment is cheaper than 
          # the lookup but this depends on the data type and varies case by case and between 
          # python versions, so I kept it fairly simple for now
          source_instance_0_nested_0 = source_instance_0["nested"]

          if isinstance(source_instance_0_nested_0, Mapping):

              try:
                  unstructured_data_1["a"] = source_instance_0_nested_0["a"]
              except KeyError:
                  pass

              try:
                  unstructured_data_1["b"] = source_instance_0_nested_0["b"]
              except KeyError:
                  pass

          else:
              try:
                  unstructured_data_1["a"] = source_instance_0_nested_0.a
              except AttributeError:
                  pass

              try:
                  unstructured_data_1["b"] = source_instance_0_nested_0.b
              except AttributeError:
                  pass

          # call the transfer model type with the kwargs we've built. In case the destination
          # type is a dict as well, we would skip this step and just do
          # unstructured_data_0["nested"] = unstructured_data_1
          unstructured_data_0["nested"] = destination_type_1(**unstructured_data_1)

      except KeyError:
          pass

      try:
          # using a generator comprehension here for collections. in theory we could optimise
          # this for collections with a known size and unroll this loop as well
          unstructured_data_0["nested_list"] = origin_0(
              transfer_type_data_0(item) for item in source_instance_0["nested_list"]
          )
      except KeyError:
          pass

      try:
          unstructured_data_0["b"] = source_instance_0["b"]
      except KeyError:
          pass

      try:
          unstructured_data_0["c"] = origin_1(source_instance_0["c"])
      except KeyError:
          pass

      # using this check here because "optional" is optional, so we expect to hit the failure
      # case more frequently, which is more costly with a try/except 
      if "optional" in source_instance_0:
          unstructured_data_0["optional"] = source_instance_0["optional"]

  else:
      # this is the same as above, just for an object instead of a mapping

      try:
          unstructured_data_0["a"] = source_instance_0.a
      except AttributeError:
          pass

      try:
          unstructured_data_2 = {}
          source_instance_0_nested_1 = source_instance_0.nested

          if isinstance(source_instance_0_nested_1, Mapping):

              try:
                  unstructured_data_2["a"] = source_instance_0_nested_1["a"]
              except KeyError:
                  pass

              try:
                  unstructured_data_2["b"] = source_instance_0_nested_1["b"]
              except KeyError:
                  pass
          else:

              try:
                  unstructured_data_2["a"] = source_instance_0_nested_1.a
              except AttributeError:
                  pass

              try:
                  unstructured_data_2["b"] = source_instance_0_nested_1.b
              except AttributeError:
                  pass

          unstructured_data_0["nested"] = destination_type_2(**unstructured_data_2)

      except AttributeError:
          pass

      try:
          unstructured_data_0["nested_list"] = origin_2(
              transfer_type_data_1(item) for item in source_instance_0.nested_list
          )
      except AttributeError:
          pass

      try:
          unstructured_data_0["b"] = source_instance_0.b
      except AttributeError:
          pass

      try:
          unstructured_data_0["c"] = origin_3(source_instance_0.c)
      except AttributeError:
          pass

      if hasattr(source_instance_0, "optional"):
          unstructured_data_0["optional"] = source_instance_0.optional

  tmp_return_type_0 = destination_type_0(**unstructured_data_0)
  return tmp_return_type_0

Limitations

Inlining for nested collections isn't possible; instead, a separate function is generated for each collection, which is then called within a comprehension of the appropriate type. In theory this could still be optimised for collections of known size (e.g. tuples, or collections that have a constraint on their length).
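
A rough sketch of what this looks like (illustrative names only; the actual generated code uses numbered identifiers such as transfer_type_data_0 and origin_0, and destination_type_nested here is a hypothetical transfer model type):

def transfer_nested_item_0(item):
    # separately generated transfer function for a single collection element
    unstructured_data = {}
    try:
        unstructured_data["a"] = item["a"]
    except KeyError:
        pass
    return destination_type_nested(**unstructured_data)

# the generated function is then called within a comprehension of the
# appropriate origin type (here a list)
unstructured_data_0["nested_list"] = list(
    transfer_nested_item_0(item) for item in source_instance_0["nested_list"]
)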

Drawbacks

This implementation is a bit harder to reason about than the regular backend and therefore might increase the maintenance effort.

provinzkraut and others added 29 commits September 29, 2023 15:09
Apply some micro-optimizations to _transfer_instance_data
@provinzkraut provinzkraut requested a review from a team as a code owner September 29, 2023 14:03
@cofin (Member) commented Oct 7, 2023

Given that these are hidden behind a feature flag, is there any reason not to move forward with this now?

@JacobCoffee (Member) left a comment

dont trust me

@provinzkraut (Member, Author) replied:
> Given that these are hidden behind a feature flag, is there any reason not to move forward with this now?

Actually no, you're right (=

@provinzkraut provinzkraut enabled auto-merge (squash) October 7, 2023 07:53
@sonarcloud bot commented Oct 7, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0
Vulnerabilities: 0
Security Hotspots: 0
Code Smells: 0
Coverage: 99.1%
Duplication: 0.0%

@github-actions bot commented Oct 7, 2023

Documentation preview will be available shortly at https://litestar-org.github.io/litestar-docs-preview/2388

@provinzkraut provinzkraut merged commit 8874b03 into main Oct 7, 2023
19 checks passed
@provinzkraut provinzkraut deleted the dto-codegen branch October 7, 2023 08:13