feature: collection's path iteration functions #154

RP-pl · 2024-04-12T09:38:56Z

No description provided.

Suor · 2024-04-16T11:18:52Z

What's the purpose or use case for this?

RP-pl · 2024-04-19T07:06:27Z

When working with really complicated data formats, it may be easier to filter all possible paths instead of selecting them.
For example, if we have data formatted in the following way:

{
   "key1": [1, 2, 3, 4],
   "key2": [11, 22],
   "key3": 5,
   "key4": [6]
}

If we wanted to select only the second element of each array, it would be far more practical to filter the paths.

Suor · 2024-04-21T03:06:52Z

Wouldn't tree_nodes() or tree_leaves() do the job for you?

RP-pl · 2024-04-24T10:31:09Z

When using tree_leaves, how would you know that in the example above we are getting the second element from the last sequence? Using tree_nodes would make it possible, but it would make the code really ugly.

The difference between tree_leaves and get_end_paths is that while tree_leaves returns only the value of the path, get_end_paths returns the path to that value. In my opinion, these functions are fundamentally different in their concepts.
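To make the distinction concrete, here is a minimal pure-Python sketch. `leaf_values` and `leaf_paths` are hypothetical helpers written only for illustration; they are neither funcy's tree_leaves nor the get_end_paths implementation from this PR, whose exact behavior is not shown in this thread.

```python
def leaf_values(node):
    """Yield only the leaf values of nested dicts/lists (value-only view)."""
    if isinstance(node, dict):
        for child in node.values():
            yield from leaf_values(child)
    elif isinstance(node, list):
        for child in node:
            yield from leaf_values(child)
    else:
        yield node


def leaf_paths(node, prefix=()):
    """Yield the path (a tuple of keys and indexes) leading to every leaf."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        yield prefix
        return
    for key, child in items:
        yield from leaf_paths(child, prefix + (key,))


data = {"key1": [1, 2, 3, 4], "key2": [11, 22], "key3": 5, "key4": [6]}

values = list(leaf_values(data))  # positions inside each array are lost
paths = list(leaf_paths(data))    # positions are kept as the last path step

# Keep only paths that point at index 1 of a top-level array
second = [p for p in paths if len(p) == 2 and p[1] == 1]
```

With values alone there is no way to tell which array an element came from; with paths, the filter can refer to the position directly.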

Suor · 2024-04-25T10:01:44Z

from funcy import tree_leaves, is_mapping, lcat


data = {
    "key1": [1,2,3,4],
    "key2": [11,22],
    "key3": 5,
    "key4": [6],
}

def every_2nd(data):
    leaves = tree_leaves(data, follow=is_mapping, children=lambda x: x.values())
    return lcat(l[1::2] for l in leaves if isinstance(l, list))

Like this. Or you need to be more precise about what you are trying to achieve.

I really don't understand how paths will help you though. Do you plan to use get_in() with each path to get a value? Because that sounds inefficient.

RP-pl · 2024-04-30T14:24:37Z

In the code you provided, we are getting every even element in the array, which is not exactly what I am trying to achieve with this example (it's really simple for demonstration purposes). What I am aiming for with this example is to retrieve every second element of arrays embedded in a dictionary. You are right about using get_in to retrieve the value, and you are also correct about it being inefficient. However, this may still be the cleanest way to handle much more complicated data structures. Notice that if our arrays were more deeply embedded in dictionaries, the code to retrieve the second element of the array using tree_leaves would look really messy. The inefficiency would also not be a problem when the data we are looking for is sparse (e.g., out of 10,000 records, only 10 have the second element in the array).
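The trade-off discussed here can be sketched roughly as follows. `lookup` is a hypothetical stand-in that mimics what a path-based getter such as get_in() does conceptually; it is not funcy's code. The point is only that every lookup restarts at the root, so the total work grows with the number of paths kept after filtering.

```python
from functools import reduce
from operator import getitem


def lookup(coll, path):
    """Follow path one step at a time, starting from the root of coll."""
    return reduce(getitem, path, coll)


data = {"key1": [1, 2, 3, 4], "key2": [11, 22], "key3": 5, "key4": [6]}

# Paths that survived filtering (hard-coded here for the sake of the example)
paths = [("key1", 1), ("key2", 1)]

# Each lookup re-walks the structure: N paths of depth d cost about N * d steps
steps = sum(len(p) for p in paths)
values = [lookup(data, p) for p in paths]
```

When only a handful of paths pass the filter, the repeated walks are cheap; when most paths pass, a single traversal that yields values directly does less work.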

Suor · 2024-04-30T14:28:03Z

I still don't understand what you are trying to achieve or why tree_leaves() would be ugly; it doesn't depend in any way on the nesting of dicts.

Maybe you need some tree_transform()? Not enough info to say.

RP-pl · 2024-05-02T11:02:20Z

Maybe a more complex example would help.
For data given as:

data = {
    "key11": {
        "key1": [
            {
                "key111": [1, 2, 3, 4]
            },
            {
                "key222": [11, 22, 33, 44]
            },
            {
                "key333": [111, 222, 3333, 4444]
            },
            {
                "key444": [1111, 2222, 3333, 4444]

            }],
        "key2": [
            {
                "key111": [12, 23, 34, 45]
            },
            {
                "key222": [112, 223, 334, 445]
            },
            {
                "key333": [1112, 2223, 33334, 44445]
            },
            {
                "key444": [11112, 22223, 33334, 44445]

            }],
    },
    "key22": {
        "key1": 5,
        "key2": [6, [1, 2, 3, 4]],
    }
}

Suppose we want to get every second element of the most deeply nested lists (let's stick to the version from your last code snippet).
Then the code to get that would look like this (if you have a cleaner idea, please let me know):

# get_end_paths is the function added by this PR
from funcy import ltree_leaves, lcat, is_mapping, get_end_paths, get_in

def every_2nd(data):
    prev = lcat([data for data in ltree_leaves(data, follow=is_mapping, children=lambda x: x.values()) if isinstance(data, list)])
    all_leaves = [ltree_leaves(leaf, follow=is_mapping, children=lambda x: x.values()) for leaf in prev if isinstance(leaf, dict)]
    return lcat([it[1::2] for it in lcat(l for l in all_leaves if isinstance(l, list))])

def every_2nd_paths(data):
    return [
        get_in(data, path)
        for path in get_end_paths(data)
        # depth-5 paths whose last index is odd, i.e. every second list element
        if len(path) == 5 and (path[4] + 1) % 2 == 0
    ]

print(every_2nd(data)) #[2, 4, 22, 44, 222, 4444, 2222, 4444, 23, 45, 223, 445, 2223, 44445, 22223, 44445]
print(every_2nd_paths(data)) #[2, 4, 22, 44, 222, 4444, 2222, 4444, 23, 45, 223, 445, 2223, 44445, 22223, 44445]

I believe every_2nd_paths looks cleaner.

Suor · 2024-05-02T12:07:09Z

I would probably go with something like this:

from funcy import tree_leaves, lfilter, isa, cat

# Descend into dicts and into lists of dicts; everything else is a leaf
follow = lambda c: isinstance(c, dict) or isinstance(c, list) and isinstance(c[0], dict)
lists = tree_leaves(data, follow, children=lambda x: x.values() if isinstance(x, dict) else x)
nums = lfilter(isa(int), cat(l[1::2] for l in lists if isinstance(l, list)))

At least it won't contain utter magic like len(x) == 5 and (x[4]+1) %2 == 0 :)

But I would seriously look into structuring my data more properly; it should not require such elaborate effort to extract any semantically meaningful part of the data.

Talking about traversing dict/list structures, there might be an easier way than tree_leaves() and some custom follow/children, but iterating by paths is not it. It is not only inefficient but won't work without your magic filter, which knows way too much about seemingly chaotic data.
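As one possible illustration of a traversal that relies on a structural rule rather than on path length, here is a minimal pure-Python sketch. `innermost_lists` is a hypothetical helper, not part of funcy, and the rule it uses (a list counts as innermost when it holds no dict or list) is an assumption about what "most embedded" means.

```python
def innermost_lists(node):
    """Yield every list that contains no nested dict or list."""
    if isinstance(node, dict):
        for child in node.values():
            yield from innermost_lists(child)
    elif isinstance(node, list):
        if any(isinstance(item, (dict, list)) for item in node):
            # Mixed or container-only list: keep descending
            for item in node:
                yield from innermost_lists(item)
        else:
            yield node


# A reduced sample in the same shape as the data discussed above
sample = {
    "key11": {
        "key1": [{"key111": [1, 2, 3, 4]}, {"key222": [11, 22, 33, 44]}],
    },
    "key22": {"key1": 5, "key2": [6]},
}

# Every second element of each innermost list, gathered in a single pass
every_second = [x for lst in innermost_lists(sample) for x in lst[1::2]]
```

Note that for a mixed list such as [6, [1, 2, 3, 4]] this rule would descend into the nested list and include its elements, which differs from the snippets above.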

Adding functionalities for paths

41497e1

RP-pl closed this May 2, 2024
