[RLlib] SAC crashes on the env having the dict observation space #18418
Comments
the problem seems to be this line:
@RuofanKong, can I ask when you last successfully used our SAC agent with a Dict obs space? |
@gjoliver the earliest version that I tried was 0.8.6, and it crashed with different errors.
So far I have not managed to get SAC working with a Dict obs space. |
ah, ok, just wanted to confirm that it's not a recent regression. This error actually just means you need to pip install tensorflow-probability. Don't know why it's not in the requirements.txt. |
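For anyone hitting the same error, here is a minimal sketch for checking whether the package is importable before launching a SAC run. The helper name `has_module` is hypothetical and not part of Ray or RLlib. The only assumption is that the pip package `tensorflow-probability` is imported as `tensorflow_probability`.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # `tensorflow_probability` is the import name of the pip package
    # `tensorflow-probability`.
    if not has_module("tensorflow_probability"):
        print("tensorflow-probability is missing; "
              "try: pip install tensorflow-probability")
```

This only confirms that the module can be located. It does not verify that the installed version is compatible with the installed TensorFlow.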
@gjoliver I actually have |
Hmm, there is a SAC "compilation" test case in agents/sac/tests/test_sac.py that uses the RandomEnv with a Dict obs space. Maybe we can also work with that and reproduce? |
Turns out, this has nothing to do with Dict space, Tuple space would cause the same problem.
I can probably make a fix for this. |
Had a quick chat with Sven. He is actually cleaning up our codebase to get rid of the preprocessor stuff, so this problem should go away with that bigger cleanup effort. In the meantime, Dict space should run with the workaround. |
Hey @RuofanKong , sorry for the long delay. I think the root cause here is a different one, namely:
|
I'm prepping a PR that fixes this. |
Here is a PR that will fix this problem. We will also gradually roll out support for running without preprocessing across all algos. Due to their specific model constraints, SAC and DQN currently don't support this experimental flag. |
Not sure if it's the correct place to ask, but #19101 only partly fixes the problem. When the observation space is something like Tuple([Box(...), Repeated(Box(...), max_len=4)]) and a custom model handles the Repeated observation, the flatten logic does not work as expected. My final hacky workaround is:

    class CustomTorchModel(TorchModelV2, nn.Module):
        def __init__(self, obs_space, action_space, num_outputs, model_config,
                     name):
            super().__init__(obs_space, action_space, num_outputs, model_config,
                             name)
            # Keep the original (nested) space instead of the flattened one.
            self.obs_space = obs_space.original_space
            ...

If it's not the correct place to ask, I could open another issue. |
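To illustrate why the original space matters here, below is a rough, library-free sketch of what flattening a nested Tuple/Dict observation does and why the nested layout is needed to undo it. It is not RLlib's preprocessor code. The helpers `flatten_obs`, `flat_size` and `unflatten_obs` and the toy space encoding (an int for a box of that many floats, a dict for a Dict space, a tuple for a Tuple space) are hypothetical. Visiting Dict keys in sorted order is an assumption of this sketch, and variable-length spaces such as Repeated are not covered.

```python
from typing import Any, List


def flat_size(space: Any) -> int:
    """Number of floats the flattened form of `space` occupies."""
    if isinstance(space, int):
        return space
    if isinstance(space, dict):
        return sum(flat_size(sub) for sub in space.values())
    return sum(flat_size(sub) for sub in space)


def flatten_obs(space: Any, obs: Any) -> List[float]:
    """Flatten a nested observation into a single list of floats."""
    if isinstance(space, int):
        assert len(obs) == space, "observation does not match the box size"
        return [float(x) for x in obs]
    out: List[float] = []
    if isinstance(space, dict):
        # Assumption of this sketch: keys are visited in sorted order.
        for key in sorted(space):
            out.extend(flatten_obs(space[key], obs[key]))
        return out
    for sub_space, sub_obs in zip(space, obs):
        out.extend(flatten_obs(sub_space, sub_obs))
    return out


def unflatten_obs(space: Any, flat: List[float]) -> Any:
    """Rebuild the nested observation; only possible with the nested space."""
    if len(flat) != flat_size(space):
        raise ValueError("flat vector does not match the space")
    if isinstance(space, int):
        return list(flat)
    offset = 0
    if isinstance(space, dict):
        result = {}
        for key in sorted(space):
            n = flat_size(space[key])
            result[key] = unflatten_obs(space[key], flat[offset:offset + n])
            offset += n
        return result
    parts = []
    for sub_space in space:
        n = flat_size(sub_space)
        parts.append(unflatten_obs(sub_space, flat[offset:offset + n]))
        offset += n
    return tuple(parts)


if __name__ == "__main__":
    # Toy stand-in for Tuple([Box(2), Dict(angle=Box(2), velocity=Box(1))]).
    original_space = (2, {"angle": 2, "velocity": 1})
    obs = ([0.1, 0.2], {"velocity": [3.0], "angle": [1.0, 2.0]})
    flat = flatten_obs(original_space, obs)
    print(flat)
    print(unflatten_obs(original_space, flat))
```

A model that only sees the flat vector and a flat space has no way to tell where one component ends and the next begins. That is the gap the `original_space` workaround above fills.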
Issue Description
I was using SAC to train an agent on an environment with a dictionary-type observation space, but it crashed.
System Info
Repro Steps
Run the following code with the system info above to reproduce the issue.
Running the code produces the following crash:
NOTE: The above "dictionary observation based" pendulum is a mock; the issue occurs on any environment with a dictionary observation space.