Skip to content

Commit

Permalink
[refactor] Remove frozen{list,dict,tags} classes (#12293)
Browse files Browse the repository at this point in the history
### Summary & Motivation

Internal companion PR: dagster-io/internal#5239

Relevant discussion:
dagster-io/internal#4859 (comment)

Removes the `frozenlist`, `frozendict` and `frozentags` classes.
Reasons:

- They don't play nicely with static type-checking and are a source of
type-checker diagnostic noise
- Static type-checking with Sequence/Mapping sort of solves the same
problem (this is kind of like removing runtime check calls from internal
methods)
- They are used inconsistently in the codebase (most places where we
have the intention of creating an immutable list/dict they are not
used). Where they are used, it is not at all obvious why at first
glance.
- They generally complicate the code

The main purpose the `frozen*` classes were serving in our code was to
make a few select `NamedTuple` classes hashable. If a `NamedTuple`
contains mutable collection members, its default hash function will
fail. Replacing those mutable collections with immutable ones lets the
hash succeed, which in turns lets the `NamedTuple` be cached via
`lru_cache` or used as a dict key.

The set of classes that need to be made hashable for `lru_cache`
purposes are:

- `CustomPointer`
- `AssetsDefinitionCacheableData`
- `RepositoryLoadData`
- `ReconstructableRepository`

These are part of a tree of objects rooted on `ReconstructablePipeline`,
which uses `lru_cache` in its `get_definition` method.

The above classes can be made hashable in a more legible and
type-annotation-friendly way by defining a `__hash__` method. This PR
does that-- wherever we have a `NamedTuple` that needed to be hashed,
`frozen*` instantiations in the constructor were removed and a
`__hash__` method was added. All of the `__hash__` methods are the same,
they just call `dagster._utils.hash_named_tuple` and cache the result.
This function just converts mutable collections to tuples prior to
hashing, which allows hashing to succeed as tuples are immutable.

Aside from cases where frozen classes were used to achieve `NamedTuple`
hashability, the other uses were:

- random? uses in some places where whoever wrote the code thought the
structure should be immutable. I removed these cases for the reasons
above (most of the time we want structures to be immutable and in most
of the codebase we signal this with `Sequence`/`Mapping`.
- A few other cases where dicts need to be hashable to sort them or use
them in a cache. For these, I provided simple substitute solutions that
achieve the same effect-- e.g. instead of using `frozendict`, converting
a dict to a tuple of 2-tuples before hashing.

---

- Resolves #3008
- Resolves #3641

### How I Tested These Changes

Existing test suite.
  • Loading branch information
smackesey authored Mar 23, 2023
1 parent 6ff1cd9 commit d5db43f
Show file tree
Hide file tree
Showing 35 changed files with 306 additions and 385 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,7 @@ def _core_submit_execution(
run_config: Optional[Mapping[str, Any]] = None,
mode: Optional[str] = None,
preset: Optional[str] = None,
tags: Optional[Dict[str, Any]] = None,
tags: Optional[Mapping[str, str]] = None,
solid_selection: Optional[List[str]] = None,
is_using_job_op_graph_apis: Optional[bool] = False,
):
Expand Down
66 changes: 18 additions & 48 deletions python_modules/dagster/dagster/_check/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -245,11 +245,9 @@ def dict_param(
"""Ensures argument obj is a native Python dictionary, raises an exception if not, and otherwise
returns obj.
"""
from dagster._utils import frozendict

if not isinstance(obj, (frozendict, dict)):
if not isinstance(obj, dict):
raise _param_type_mismatch_exception(
obj, (frozendict, dict), param_name, additional_message=additional_message
obj, dict, param_name, additional_message=additional_message
)

if not (key_type or value_type):
Expand All @@ -268,12 +266,8 @@ def opt_dict_param(
"""Ensures argument obj is either a dictionary or None; if the latter, instantiates an empty
dictionary.
"""
from dagster._utils import frozendict

if obj is not None and not isinstance(obj, (frozendict, dict)):
raise _param_type_mismatch_exception(
obj, (frozendict, dict), param_name, additional_message
)
if obj is not None and not isinstance(obj, dict):
raise _param_type_mismatch_exception(obj, dict, param_name, additional_message)

if not obj:
return {}
Expand Down Expand Up @@ -312,12 +306,8 @@ def opt_nullable_dict_param(
additional_message: Optional[str] = None,
) -> Optional[Dict]:
"""Ensures argument obj is either a dictionary or None."""
from dagster._utils import frozendict

if obj is not None and not isinstance(obj, (frozendict, dict)):
raise _param_type_mismatch_exception(
obj, (frozendict, dict), param_name, additional_message
)
if obj is not None and not isinstance(obj, dict):
raise _param_type_mismatch_exception(obj, dict, param_name, additional_message)

if not obj:
return None if obj is None else {}
Expand Down Expand Up @@ -361,17 +351,15 @@ def dict_elem(
value_type: Optional[TypeOrTupleOfTypes] = None,
additional_message: Optional[str] = None,
) -> Dict:
from dagster._utils import frozendict

dict_param(obj, "obj")
str_param(key, "key")

if key not in obj:
raise CheckError(f"{key} not present in dictionary {obj}")

value = obj[key]
if not isinstance(value, (frozendict, dict)):
raise _element_check_error(key, value, obj, (frozendict, dict), additional_message)
if not isinstance(value, dict):
raise _element_check_error(key, value, obj, dict, additional_message)
else:
return _check_mapping_entries(value, key_type, value_type, mapping_type=dict)

Expand All @@ -383,16 +371,14 @@ def opt_dict_elem(
value_type: Optional[TypeOrTupleOfTypes] = None,
additional_message: Optional[str] = None,
) -> Dict:
from dagster._utils import frozendict

dict_param(obj, "obj")
str_param(key, "key")

value = obj.get(key)

if value is None:
return {}
elif not isinstance(value, (frozendict, dict)):
elif not isinstance(value, dict):
raise _element_check_error(key, value, obj, dict, additional_message)
else:
return _check_mapping_entries(value, key_type, value_type, mapping_type=dict)
Expand All @@ -405,16 +391,14 @@ def opt_nullable_dict_elem(
value_type: Optional[TypeOrTupleOfTypes] = None,
additional_message: Optional[str] = None,
) -> Optional[Dict]:
from dagster._utils import frozendict

dict_param(obj, "obj")
str_param(key, "key")

value = obj.get(key)

if value is None:
return None
elif not isinstance(value, (frozendict, dict)):
elif not isinstance(value, dict):
raise _element_check_error(key, value, obj, dict, additional_message)
else:
return _check_mapping_entries(value, key_type, value_type, mapping_type=dict)
Expand Down Expand Up @@ -446,10 +430,8 @@ def is_dict(
value_type: Optional[TypeOrTupleOfTypes] = None,
additional_message: Optional[str] = None,
) -> Dict:
from dagster._utils import frozendict

if not isinstance(obj, (frozendict, dict)):
raise _type_mismatch_error(obj, (frozendict, dict), additional_message)
if not isinstance(obj, dict):
raise _type_mismatch_error(obj, dict, additional_message)

if not (key_type or value_type):
return obj
Expand Down Expand Up @@ -768,12 +750,8 @@ def list_param(
of_type: Optional[TypeOrTupleOfTypes] = None,
additional_message: Optional[str] = None,
) -> List[Any]:
from dagster._utils import frozenlist

if not isinstance(obj, (frozenlist, list)):
raise _param_type_mismatch_exception(
obj, (frozenlist, list), param_name, additional_message
)
if not isinstance(obj, list):
raise _param_type_mismatch_exception(obj, list, param_name, additional_message)

if not of_type:
return obj
Expand All @@ -793,12 +771,8 @@ def opt_list_param(
If the of_type argument is provided, also ensures that list items conform to the type specified
by of_type.
"""
from dagster._utils import frozenlist

if obj is not None and not isinstance(obj, (frozenlist, list)):
raise _param_type_mismatch_exception(
obj, (frozenlist, list), param_name, additional_message
)
if obj is not None and not isinstance(obj, list):
raise _param_type_mismatch_exception(obj, list, param_name, additional_message)

if not obj:
return []
Expand Down Expand Up @@ -840,12 +814,8 @@ def opt_nullable_list_param(
If the of_type argument is provided, also ensures that list items conform to the type specified
by of_type.
"""
from dagster._utils import frozenlist

if obj is not None and not isinstance(obj, (frozenlist, list)):
raise _param_type_mismatch_exception(
obj, (frozenlist, list), param_name, additional_message
)
if obj is not None and not isinstance(obj, list):
raise _param_type_mismatch_exception(obj, list, param_name, additional_message)

if not obj:
return None if obj is None else []
Expand Down
12 changes: 5 additions & 7 deletions python_modules/dagster/dagster/_config/post_process.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
from typing import Any, Dict, List, Mapping, Optional, cast

import dagster._check as check
from dagster._utils import ensure_single_item, frozendict, frozenlist
from dagster._utils import ensure_single_item
from dagster._utils.error import serializable_error_info_from_exc_info

from .config_type import ConfigType, ConfigTypeKind
Expand Down Expand Up @@ -122,7 +122,7 @@ def _recurse_in_to_selector(
else incoming_field_value,
)
if field_evr.success:
return EvaluateValueResult.for_value(frozendict({field_name: field_evr.value}))
return EvaluateValueResult.for_value({field_name: field_evr.value})

return field_evr

Expand Down Expand Up @@ -183,7 +183,7 @@ def _recurse_in_to_shape(
return EvaluateValueResult.for_errors(errors)

return EvaluateValueResult.for_value(
frozendict({key: result.value for key, result in processed_fields.items()})
{key: result.value for key, result in processed_fields.items()}
)


Expand All @@ -210,7 +210,7 @@ def _recurse_in_to_array(context: TraversalContext, config_value: Any) -> Evalua
if errors:
return EvaluateValueResult.for_errors(errors)

return EvaluateValueResult.for_value(frozenlist([result.value for result in results]))
return EvaluateValueResult.for_value([result.value for result in results])


def _recurse_in_to_map(context: TraversalContext, config_value: Any) -> EvaluateValueResult[Any]:
Expand Down Expand Up @@ -243,6 +243,4 @@ def _recurse_in_to_map(context: TraversalContext, config_value: Any) -> Evaluate
if errors:
return EvaluateValueResult.for_errors(errors)

return EvaluateValueResult.for_value(
frozendict({key: result.value for key, result in results.items()})
)
return EvaluateValueResult.for_value({key: result.value for key, result in results.items()})
10 changes: 5 additions & 5 deletions python_modules/dagster/dagster/_config/validate.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
from typing import Any, Dict, List, Mapping, Optional, Sequence, Set, TypeVar, cast

import dagster._check as check
from dagster._utils import ensure_single_item, frozendict
from dagster._utils import ensure_single_item

from .config_type import ConfigScalarKind, ConfigType, ConfigTypeKind
from .errors import (
Expand Down Expand Up @@ -210,7 +210,7 @@ def validate_selector_config(

if child_evaluate_value_result.success:
return EvaluateValueResult.for_value( # type: ignore
frozendict({field_name: child_evaluate_value_result.value})
{field_name: child_evaluate_value_result.value}
)
else:
return child_evaluate_value_result
Expand Down Expand Up @@ -289,7 +289,7 @@ def _validate_shape_config(
if errors:
return EvaluateValueResult.for_errors(errors)
else:
return EvaluateValueResult.for_value(frozendict(config_value)) # type: ignore
return EvaluateValueResult.for_value(config_value) # type: ignore


def validate_permissive_shape_config(
Expand All @@ -304,7 +304,7 @@ def validate_permissive_shape_config(

def validate_map_config(
context: ValidationContext, config_value: object
) -> EvaluateValueResult[Mapping[str, object]]:
) -> EvaluateValueResult[Mapping[object, object]]:
check.inst_param(context, "context", ValidationContext)
check.invariant(context.config_type_snap.kind == ConfigTypeKind.MAP)
check.not_none_param(config_value, "config_value")
Expand All @@ -325,7 +325,7 @@ def validate_map_config(
if not result.success:
errors += cast(List, result.errors)

return EvaluateValueResult(not bool(errors), frozendict(config_value), errors)
return EvaluateValueResult(not bool(errors), config_value, errors)


def validate_shape_config(
Expand Down
19 changes: 11 additions & 8 deletions python_modules/dagster/dagster/_core/code_pointer.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
from dagster._core.errors import DagsterImportError, DagsterInvariantViolationError
from dagster._serdes import whitelist_for_serdes
from dagster._seven import get_import_error_message, import_module_from_path
from dagster._utils import alter_sys_path, frozenlist
from dagster._utils import alter_sys_path, hash_collection


class CodePointer(ABC):
Expand Down Expand Up @@ -296,13 +296,6 @@ def __new__(
),
)

# These are frozenlists, rather than lists, so that they can be hashed and the pointer
# stored in the lru_cache on the repository and pipeline get_definition methods
reconstructable_args = frozenlist(reconstructable_args)
reconstructable_kwargs = frozenlist(
[frozenlist(reconstructable_kwarg) for reconstructable_kwarg in reconstructable_kwargs]
)

return super(CustomPointer, cls).__new__(
cls,
reconstructor_pointer,
Expand All @@ -321,3 +314,13 @@ def describe(self) -> str:
return "reconstructable using {module}.{fn_name}".format(
module=self.reconstructor_pointer.module, fn_name=self.reconstructor_pointer.fn_name
)

# Allow this to be hashed for use in `lru_cache`. This is needed because:
# - `ReconstructablePipeline` uses `lru_cache`
# - `ReconstructablePipeline` has a `ReconstructableRepository` attribute
# - `ReconstructableRepository` has a `CodePointer` attribute
# - `CustomCodePointer` has collection attributes that are unhashable by default
def __hash__(self) -> int:
if not hasattr(self, "_hash"):
self._hash = hash_collection(self)
return self._hash
Loading

2 comments on commit d5db43f

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deploy preview for dagster ready!

✅ Preview
https://dagster-ezrv2zw0u-elementl.vercel.app

Built with commit d5db43f.
This pull request is being automatically deployed with vercel-action

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deploy preview for dagit-storybook ready!

✅ Preview
https://dagit-storybook-q7rfgew01-elementl.vercel.app

Built with commit d5db43f.
This pull request is being automatically deployed with vercel-action

Please sign in to comment.