Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] No Preprocessors (part 2). #18468

Merged

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Sep 9, 2021

This PR was motivated in preparation for soon allowing individual observation components to be addressed by the trajectory view API, for example to enable frame-stacking for individual observation components within a complex observation space (Tuple|Dict). Also, soon soft-deprecating RLlib's Preprocessor API should increase transparency for the users and allow batched, model-based preprocessing of observations. Observations will arrive in the model exactly as they are returned by the env.

This is the second part of this (already merged) PR here:
#18367

  • New config key "_disable_preprocessor_api" (default=False). Set this to True to disable any preprocessing of your env's observations: All obs data from the env will then arrive in the model as-is. No default Preprocessors will then be used, such as TupleFlattening/DictFlattening, etc.., not even the NoPreprocessor class.
  • The main changes here are in the SimpleListCollector due to the fact that observations are now possibly nested dicts/tuples of data. Such data arriving in the collector is dm_tree.flattened and stored per-column in the collector: e.g.
obs: {a={aa=np.array((2,), float), ab=np.array((), int8)}, b=np.array((5,), float)}
stored in collector as: [np.array((2,), float), np.array((), int8)), np.array((5,), float)]
  • A new example script was added (and is per-PR tested) that demonstrates the functionality of preprocessor_pref=None using a deeply nested obs space RandomEnv.

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Sep 9, 2021
@yangysc
Copy link

yangysc commented Sep 10, 2021

If preprocessor_pref is None, there are two places for torch needed to be adjusted as well.

  1. policy.py
    if we call the compute_single_action function, the obs is not a TensorType anymore

    ray/rllib/policy/policy.py

    Lines 255 to 263 in dd096c8

    out = self.compute_actions(
    [obs],
    state_batch,
    prev_action_batch=prev_action_batch,
    prev_reward_batch=prev_reward_batch,
    info_batch=info_batch,
    episodes=episodes,
    explore=explore,
    timestep=timestep)

    Since the obs now is a dict, we cannot use [obs] to add the dimension.
    We can change it to
    out = self.compute_actions(
            tree.map_structure(lambda s: s[None], obs),
            state_batch,
            prev_action_batch=prev_action_batch,
            prev_reward_batch=prev_reward_batch,
            info_batch=info_batch,
            episodes=episodes,
            explore=explore,
            timestep=timestep)
  2. The compute_actions function in torch_policy.py should be adjusted as well in case obs_batch is a dict
    SampleBatch.CUR_OBS: np.asarray(obs_batch),

@yangysc
Copy link

yangysc commented Sep 10, 2021

Another possible problem is the _multi_gpu_parallel_grad_calc in torch_policy.py.

It seems that for dict observations and without any preprocessor, the input sample_batches does not have the get_interceptor function to transform np.array to torch.Tensor.

The get_interceptor function disappears after the following line.

lock = threading.Lock()

@sven1977
Copy link
Contributor Author

Thanks for your comments @yangysc ! These were invaluable. I added another test case covering single action calculations (and training) for all frameworks with the no-preprocessor setting and a complex observation space. Seems to be solid enough now to give this a go :)

@yangysc
Copy link

yangysc commented Sep 23, 2021

Thanks for your comments @yangysc ! These were invaluable. I added another test case covering single action calculations (and training) for all frameworks with the no-preprocessor setting and a complex observation space. Seems to be solid enough now to give this a go :)

Very glad to help. Also, thanks for your excellent work.

@sven1977 sven1977 merged commit 61a1274 into ray-project:master Sep 23, 2021
@sven1977 sven1977 deleted the poc_deprecate_preprocessors_part_2 branch June 2, 2023 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants