
[Proposal] let Minari Dataset's attribute ref_min/max score be attribute of environment #141

Closed
im-Kitsch opened this issue Sep 7, 2023 · 7 comments

Comments

@im-Kitsch
Contributor

Hi,

would it be better to let "ref_max_score" and "ref_min_score" be attributes of the environment rather than of the dataset, so that different datasets share a uniform normalization metric?

The D4RL implementation actually does this: for the environments backing different datasets, like "env-xxx-xxx", ref_max_score and ref_min_score come from the same macro definition. https://github.com/Farama-Foundation/D4RL/blob/71a9549f2091accff93eeff68f1f3ab2c0e0a288/d4rl/gym_mujoco/__init__.py#L23-L31

So why don't we make it an attribute of the environment, or a dictionary in the minari package?
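For illustration, the pattern I mean looks roughly like this (a hypothetical sketch; the constant names and values below are placeholders, not actual D4RL or Minari code):

# Hypothetical sketch: reference scores are defined once per base
# environment and every dataset of that environment reuses the same
# constants (placeholder values).
HOPPER_RANDOM_SCORE = -20.0   # placeholder
HOPPER_EXPERT_SCORE = 3234.0  # placeholder

DATASET_REF_SCORES = {
    dataset_id: (HOPPER_RANDOM_SCORE, HOPPER_EXPERT_SCORE)
    for dataset_id in ("hopper-random-v2", "hopper-medium-v2", "hopper-expert-v2")
}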

@im-Kitsch im-Kitsch changed the title [Proposal] Dataset's attribute ref_min/max score may need to be attribute of environment [Proposal] let Minari Dataset's attribute ref_min/max score be attribute of environment Sep 7, 2023
@balisujohn
Collaborator

@rodrigodelazcano What are your thoughts on this?

@younik
Member

younik commented Oct 12, 2023

Hello @im-Kitsch, we don't change/rewrite the environments the way D4RL does, so we cannot add these attributes as they did.

I agree, though, that it should be an attribute of the environment. Unfortunately, these values don't even represent a theoretical max/min, so their meaning is a bit vague to me. I remember we had a discussion on officially supporting them; we did so for backward compatibility with D4RL datasets.

What do you mean by "dictionary of minari package"?

@im-Kitsch
Contributor Author

Hi, @younik
thanks for the reply. Sorry, "dictionary of minari package" was not a good description. I think we could make an official utility function for normalization, minari.util.normalize_score(env: str | gym.Env, episode_return: float).

In this case we could maintain a dictionary of min/max scores for different environments, raising a warning for unknown environments. The main purpose is to have a unified min/max metric for the same environment, so that we can compare different algorithms trained on different datasets.

Currently we use the function minari.get_normalized_score(dataset: MinariDataset, returns: float | float32) → float | float32. I don't think MinariDataset is the proper place to store the min/max scores. For comparison, see D4RL's min/max scores: https://github.com/Farama-Foundation/D4RL/blob/71a9549f2091accff93eeff68f1f3ab2c0e0a288/d4rl/infos.py#L106
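For reference, the normalization itself is just a linear rescaling against the two reference scores; a minimal sketch (not the exact Minari implementation) would be:

def _normalized_score(returns: float, ref_min_score: float, ref_max_score: float) -> float:
    # 0.0 corresponds to the reference minimum (e.g. a random policy),
    # 1.0 to the reference maximum (e.g. an expert policy); values outside
    # [0, 1] are possible for policies worse than random or better than expert.
    return (returns - ref_min_score) / (ref_max_score - ref_min_score)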

Actually, different datasets' min/max scores are mostly the same for the same base environment. I think the min/max are the reference worst and best scores, i.e. from a random policy and an expert policy (it's discussed here: Farama-Foundation/D4RL#48 (comment)). So I would say it's better to replace the dataset argument of get_normalized_score with an environment.

Similarly to D4RL, I think we could build a dictionary, i.e. add a file like metrics.py to the minari package; that's what I meant by "dictionary of minari package". We could use environment names like "Ant-v5", "Pendulum-v1", ... as keys, rather than dataset ids like "ant-maze-medium-v1".

Another option is to keep the function call minari.get_normalized_score(dataset, score), but instead of reading the dataset's ref_min/max score metadata, look up the official reference max/min scores (random/expert scores). In this case we would still maintain an official dictionary of min/max scores.

So in conclusion, we would remove the ref_min_score and ref_max_score metadata from MinariDataset and maintain these values independently of the dataset.
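As a rough sketch of what such a metrics.py and utility function could look like (the environment ids and score values below are placeholders, not real reference scores):

import warnings

# metrics.py (hypothetical): reference scores keyed by environment id,
# maintained independently of any particular dataset.
REF_SCORES = {
    # env_id: (ref_min_score, ref_max_score) -- placeholder values
    "Hopper-v4": (-20.0, 3234.0),
    "Pendulum-v1": (-1200.0, -150.0),
}

def normalize_score(env_id: str, episode_return: float) -> float:
    # Warn (instead of failing) for environments without registered scores.
    if env_id not in REF_SCORES:
        warnings.warn(f"No reference scores registered for environment '{env_id}'.")
        return episode_return
    ref_min, ref_max = REF_SCORES[env_id]
    return (episode_return - ref_min) / (ref_max - ref_min)

A call like normalize_score("Pendulum-v1", -300.0) would then give the same normalized value regardless of which Pendulum dataset produced the return.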

@younik
Member

younik commented Oct 21, 2023

Similarly to D4RL, I think we could build a dictionary, i.e. add a file like metrics.py to the minari package; that's what I meant by "dictionary of minari package". We could use environment names like "Ant-v5", "Pendulum-v1", ... as keys, rather than dataset ids like "ant-maze-medium-v1".

Another option is to keep the function call minari.get_normalized_score(dataset, score), but instead of reading the dataset's ref_min/max score metadata, look up the official reference max/min scores (random/expert scores). In this case we would still maintain an official dictionary of min/max scores.

So in conclusion, we would remove the ref_min_score and ref_max_score metadata from MinariDataset and maintain these values independently of the dataset.

This can work, but what would you do if the dataset has no associated environment?

@im-Kitsch
Contributor Author

This can work, but what would you do if the dataset has no associated environment?

Good question, but anyway, MinariDataset.get_normalized_score() is an optional feature of the dataset: if it doesn't have ref_min/max_score, it throws an error. So if we make a function minari.get_normalized_score(env_name, score) or minari.get_normalized_score(dataset, score), we can throw an error/warning there too.

Recall that the main purpose is to have a common normalized evaluation across datasets of different quality. This could be helpful in research for comparisons across different environments/datasets/algorithms.

And currently there are two APIs for creating a dataset, create_dataset_from_buffers and create_dataset_from_collector_env, and both require an associated environment.

Minari/minari/utils.py

Lines 305 to 309 in dd8406e

def create_dataset_from_buffers(
    dataset_id: str,
    env: gym.Env,
    buffer: List[Dict[str, Union[list, Dict]]],
    algorithm_name: Optional[str] = None,

Minari/minari/utils.py

Lines 457 to 460 in dd8406e

def create_dataset_from_collector_env(
    dataset_id: str,
    collector_env: DataCollectorV0,
    algorithm_name: Optional[str] = None,

Also, it looks like MinariDataset currently needs an associated environment for initialization, since it assumes MinariStorage has an env_spec attribute.

@younik
Member

younik commented Oct 24, 2023


They will not need an environment soon; see #137.
My rationale is: if the ref scores must be associated with the environment instead of the dataset, then datasets without an environment cannot have ref scores.
Do you believe the ref scores change over time, so that it is useful to have them "centralized" across datasets? Because if they are fixed values, having them as an attribute of the dataset (and repeated across datasets with the same env) is not a big deal, is it?

@im-Kitsch
Contributor Author

Hi, hmm, I think that makes sense, so let's keep the current implementation.
