Subset variable names from nodes #232

Closed
TomNicholas opened this issue Mar 29, 2023 · 7 comments
Labels
good first issue Good for newcomers

Comments

@TomNicholas
Member

I've routinely wanted something that says select these variable names from all nodes.

This is way too much typing for that:

dailies.map_over_subtree(lambda n: n[["KT", "eps", "chi"]])

Perhaps a DataTree.subset_nodes?

Originally posted by @dcherian in #79 (comment)
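
As a rough illustration of the convenience being asked for, here is a minimal sketch that uses a plain dict mapping node paths to dicts of variables as a stand-in for a DataTree, so it runs without xarray or datatree. The name `select_variables`, the sample data, and the choice to silently skip variables a node does not have are all assumptions for illustration, not an existing API.

```python
def select_variables(tree, names):
    """Keep only the requested variable names in every node.

    `tree` is a stand-in for a DataTree: {node_path: {var_name: values}}.
    Variables that a node does not have are skipped rather than raising.
    """
    wanted = list(names)
    return {
        path: {name: variables[name] for name in wanted if name in variables}
        for path, variables in tree.items()
    }


# Hypothetical data standing in for the `dailies` tree from the issue.
dailies = {
    "/": {"KT": [1.0], "eps": [2.0], "chi": [3.0], "temp": [10.0]},
    "/mooring": {"KT": [4.0], "salt": [35.0]},
}

subset = select_variables(dailies, ["KT", "eps", "chi"])
print(subset)
```

A real implementation would still need to decide what to do with nodes that lack some of the names: skip the missing ones (as here), drop the node, or raise.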

@TomNicholas added the "good first issue" (Good for newcomers) label Mar 29, 2023
@Jyotsna1304

I am an Outreachy applicant. Could you please assign this issue to me?

@TomNicholas
Member Author

Hi @Jyotsna1304 - we don't assign issues to people, but you are welcome to submit a Pull Request trying to address an issue you like!

@akanshajais

Hey @TomNicholas, I am commenting to record my contribution for this issue.
Yes, it would be more convenient to have a method like DataTree.subset_nodes that selects a subset of nodes based on their variable names. Here's a possible implementation I wrote for such a method:

def subset_nodes(self, var_names):
    """
    Returns a new DataTree object containing nodes that have all the given variable names.
    """
    var_names = list(var_names)
    new_data = {}
    # Walk every node; node.path is its location in the tree
    for node in self.subtree:
        ds = node.to_dataset()
        # Keep the node only if it has every requested variable
        if all(var_name in ds for var_name in var_names):
            new_data[node.path] = ds[var_names]
    # Rebuild a tree from the path -> Dataset mapping
    return DataTree.from_dict(new_data)

We can select nodes with specific variable names like this:

subset_tree = dailies.subset_nodes(["KT", "eps", "chi"])
We can also use the map_over_subtree method on this subset tree to perform operations on the selected nodes:

subset_tree.map_over_subtree(lambda n: n["KT"] + n["eps"] + n["chi"])

This will return, for each node in the subset tree, the sum of the KT, eps, and chi variables.

@moraraba

moraraba commented Apr 3, 2023

Hey @TomNicholas. I am commenting to record my contribution as an Outreachy applicant.

class DataTree:
    def subset_nodes(self, var_names):
        # Map each node's path to a Dataset holding only the selected variables
        subset = {}

        # Iterate over all nodes in the original tree
        for node in self.subtree:
            # Select the variables; this raises KeyError if a node lacks one of them
            subset[node.path] = node.to_dataset()[list(var_names)]

        # Build the subset tree from the path -> Dataset mapping
        return DataTree.from_dict(subset)

With this method, users could select specific variables by passing a list of variable names to the subset_nodes method:

subset_tree = data_tree.subset_nodes(["KT", "eps", "chi"])

This would create a new DataTree object that contains nodes with only the specified variables. This can make it easier to work with large DataTree objects and select only the variables that are of interest.
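
The two proposals above treat a node that lacks one of the requested variables differently: the first drops the node, while indexing every node with the full list would fail on it. The sketch below shows that difference using plain dicts as a stand-in for tree nodes, so it runs without xarray; the function names and sample data are hypothetical.

```python
def subset_require_all(tree, names):
    """Keep only nodes that contain every requested variable."""
    return {
        path: {name: variables[name] for name in names}
        for path, variables in tree.items()
        if all(name in variables for name in names)
    }


def subset_every_node(tree, names):
    """Index every node with the full list; raises KeyError on a missing variable."""
    return {
        path: {name: variables[name] for name in names}
        for path, variables in tree.items()
    }


tree = {
    "/a": {"KT": 1, "eps": 2, "chi": 3, "extra": 4},
    "/b": {"KT": 5},  # lacks "eps" and "chi"
}
names = ["KT", "eps", "chi"]

# Strict policy: "/b" is dropped because it is missing two variables.
strict = subset_require_all(tree, names)

# Index-everything policy: the same node makes the whole call fail.
try:
    subset_every_node(tree, names)
    raised = False
except KeyError:
    raised = True

print(strict, raised)
```

Which behaviour is preferable probably depends on whether trees with heterogeneous nodes are the common case.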

@TomNicholas
Member Author

Hi @akanshajais @moraraba, thanks for your input here.

To be clear, commenting a suggestion on the issue will not be counted as a "contribution" for the purposes of the Outreachy program. We can discuss potential approaches here, but a contribution means that you submit a pull request which meets the standard to be merged.

@Jyotsna1304

Hi @Jyotsna1304 - we don't assign issue to people, but you are welcome to submit a Pull Request trying to address an issue you like!

Thank you for your response.

@TomNicholas
Member Author

This is basically the same idea as the subset API suggested in pydata/xarray#9342, so as it is already tracked upstream, I'm closing this here.
