feat: Content.form_with_key_path() #3311

jpivarski · 2024-11-20T17:52:41Z

As it stands, this PR lacks tests, but the new feature is:

>>> array = ak.Array([{"event": 123, "muons": [{"pt": 1.1}, {"pt": 2.2}]}])
>>> print(array.layout.form_with_key_path())
{
    "class": "RecordArray",
    "fields": [
        "event",
        "muons"
    ],
    "contents": [
        {
            "class": "NumpyArray",
            "primitive": "int64",
            "form_key": "('event',)"
        },
        {
            "class": "ListOffsetArray",
            "offsets": "i64",
            "content": {
                "class": "RecordArray",
                "fields": [
                    "pt"
                ],
                "contents": [
                    {
                        "class": "NumpyArray",
                        "primitive": "float64",
                        "form_key": "('muons', None, 'pt')"
                    }
                ],
                "form_key": "('muons', None)"
            },
            "form_key": "('muons',)"
        }
    ],
    "form_key": "()"
}

These form keys can be turned back into tuples with ast.literal_eval to get a full path from root to each node.

The new function takes root as an argument to put a prefix in the tuple, for uniqueness beyond the tree.

Maybe those None values should optionally be 0? They're needed for nodes to be unique, so that the form keys of this list are not all identical:

>>> array = ak.Array([[[[[1]]]]])
>>> print(array.layout.form_with_key_path())
{
    "class": "ListOffsetArray",
    "offsets": "i64",
    "content": {
        "class": "ListOffsetArray",
        "offsets": "i64",
        "content": {
            "class": "ListOffsetArray",
            "offsets": "i64",
            "content": {
                "class": "ListOffsetArray",
                "offsets": "i64",
                "content": {
                    "class": "NumpyArray",
                    "primitive": "int64",
                    "form_key": "(None, None, None, None)"
                },
                "form_key": "(None, None, None)"
            },
            "form_key": "(None, None)"
        },
        "form_key": "(None,)"
    },
    "form_key": "()"
}

But maybe we don't care about those extra 3 characters per nesting level. How much nesting is there going to be, anyway? Yeah, I think this is fine.

jpivarski · 2024-11-20T17:55:43Z

The type hint is wrong. One moment...

pfackeldey · 2024-11-21T09:09:08Z

This is great 🎉, and would help me with dask-contrib/dask-awkward#551 significantly!

Btw, I think 0 instead of None could be useful, because it allows to use the ast.literal_eval'd form_key directly for indexing into the corresponding buffer:

array[ast.literal_eval("('muons', 0, 'pt')")]
# >> <Array [1.1, 2.2] type='2 * float64'>

which is something that I'd like to do in dask-contrib/dask-awkward#551 for translating touched form_keys of a type tracer report into slices that I can use for explicit touching at a later stage (i.e. in a second function pass), something like:

def render_buffer_key(form: ak.forms.Form, form_key: str, attribute: str) -> str:
  return f"{form_key}"
  
tt, report = ak.typetracer.typetracer_with_report(
  array.layout.form_with_key_path(), 
  highlevel=True,
  buffer_key=render_buffer_key,
)

def fun(x):
  return x["muons"][0]["pt"]

_ = fun(tt)

...

# use the report later to touch the columns explicitly again,
# but don't run the full `fun` pass again
# ~something like:
if ak.backend(array) == "typetracer":
  for form_key in set(report.data_touched):
    array[ast.literal_eval(form_key)].layout._touch_data(recursive=True)
  
  for form_key in set(report.shape_touched):
    array[ast.literal_eval(form_key)].layout._touch_shape(recursive=True)
else:
  fun(array)

(PS: this PR reminds me of https://jax.readthedocs.io/en/latest/working-with-pytrees.html#explicit-key-paths. )

jpivarski · 2024-11-21T15:37:10Z

array[ast.literal_eval("('muons', 0, 'pt')")]
# >> <Array [1.1, 2.2] type='2 * float64'>

Careful: that only works if the lists are not empty. Also, what about the use of None for option-type and untyped indexes? Or the use of integers for UnionArrays (which are necessary, since UnionArrays branch). Similarly, tuple RecordArrays (RecordArrays in which fields is None) will need some sort of integer.

It won't be possible to use these as slices in general. But for ast.literal_eval, any constants or tuples/lists/dicts of constants are allowed:

>>> ast.literal_eval("('string', None, True, False, ['nested', {'key': 'value'}])")
('string', None, True, False, ['nested', {'key': 'value'}])

But Ellipsis/... are not. In the Python syntax, that qualifies as a ast.Name, rather than a constant known to the parser. Also, range is not allowed.

Any list could take [] (instead of None) as a slice item, since that's an empty slice. But there's no solution for IndexedArray, which needs something in the form_key to distinguish the IndexedArray from its own content, but this distinction is invisible to slicing.

pfackeldey · 2024-11-22T10:52:27Z

Hi @jpivarski,
I pushed the following updates to this PR: fixed type hints, enforced string-type path entries for records only, and added tests for each array type.
From my side this is ready to merge, what do you think?

feat: Content.form_with_key_path()

0dd0e73

jpivarski mentioned this pull request Nov 20, 2024

Using . as a delimiter in field names of ak.Records leads to wrong report of necessary columns dask-contrib/dask-awkward#552

Open

fixed type hint (tests would have caught it)

550c237

jpivarski temporarily deployed to docs November 20, 2024 18:07 — with GitHub Actions Inactive

pfackeldey and others added 5 commits November 22, 2024 09:46

gitignore: add ropeproject (IDE)

6007156

fix typing & enforce tuple record field names as strings

05a0fd2

style: pre-commit fixes

0f8ad9e

forgot one type hint...

24c0ffb

fix typo and sort import

5246841

pfackeldey temporarily deployed to docs November 22, 2024 09:06 — with GitHub Actions Inactive

add tests for for each array type

2ee9feb

pfackeldey deployed to docs November 22, 2024 09:58 — with GitHub Actions View deployment

jpivarski merged commit e359210 into main Nov 22, 2024
41 checks passed

jpivarski deleted the jpivarski/form-with-key-path branch November 22, 2024 15:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Content.form_with_key_path() #3311

feat: Content.form_with_key_path() #3311

jpivarski commented Nov 20, 2024

jpivarski commented Nov 20, 2024

pfackeldey commented Nov 21, 2024 •

edited

Loading

jpivarski commented Nov 21, 2024

pfackeldey commented Nov 22, 2024

feat: Content.form_with_key_path() #3311

feat: Content.form_with_key_path() #3311

Conversation

jpivarski commented Nov 20, 2024

jpivarski commented Nov 20, 2024

pfackeldey commented Nov 21, 2024 • edited Loading

jpivarski commented Nov 21, 2024

pfackeldey commented Nov 22, 2024

pfackeldey commented Nov 21, 2024 •

edited

Loading