-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[drake_gym] Add info_handler callback #21900
[drake_gym] Add info_handler callback #21900
Conversation
I am not very familiar with Gym, so let me ask a few questions and then also invite others to comment. (1) Is the (2) If not, then for users who want this kind of metadata, why would the following not be a better solution (in their own code, with no changes to Drake)? observation, reward, terminated, truncated, info = env.step(action)
info["timestamp"] = env.simulator.get_context().get_time() With this technique, users can add any metadata they like. Perhaps the problem is that the user is not directly calling (3) If we do need to populate +@JoseBarreiros-TRI for discussion and feature review, please. |
Here are my thoughts:
|
@jwnimmer-tri This is a great suggestion and works, but I'd prefer to support multiple gym environments, some not using Since it is not a convention to provide time data in If this PR is dismissed, as the desired behavior is possible with the current setup (thanks @JoseBarreiros-TRI ). That's fine too. |
Ah, thanks for the pointer; I'd missed that detail. It sounds like everyone is happy with the "info handler" proposal, so that's probably the way to proceed. |
info
return value from DrakeGymEnv.step(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samzapo Jenkins is showing linter errors. You can run the linters locally via bazel test --config=lint //bindings/pydrake/...
.
@JoseBarreiros-TRI do you have time/interest in serving as the reviewer here? If not, I can recruit someone else.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 256 at r2 (raw file):
terminated = False reward = 0 info = dict() if self.info_handler is None else self.info_handler(self.simulator)
nit Instead of copy-pasting the if-None-else logic multiple times, we can probably do something less brittle, e.g.
(1) Don't allow the info handler be None
; instead, change the default value in the constructor args to bedict
(i.e., the callable constructor that does what we want as a default).
(2) Add a private helper method def _info(self):
on DrakeGymEnv
with the None-handling logic, and change this to info = self._info()
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy to review this
Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 256 at r2 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit Instead of copy-pasting the if-None-else logic multiple times, we can probably do something less brittle, e.g.
(1) Don't allow the info handler be
None
; instead, change the default value in the constructor args to bedict
(i.e., the callable constructor that does what we want as a default).(2) Add a private helper method
def _info(self):
onDrakeGymEnv
with the None-handling logic, and change this toinfo = self._info()
or similar.
Done. I've chosen to do (1), see assignment to self.info_handler
logic
Fixing this in a moment. Immediately validating why the complicated logic on each line is hard to maintain. Code quote: info = dict() if self.info_handler is None else self.info_handler(self.simulator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 256 at r2 (raw file):
Previously, samzapo (Samuel Zapolsky) wrote…
Done. I've chosen to do (1), see assignment to
self.info_handler
logic
BTW I expect that foo = lambda _: dict()
can be shortened to just foo = dict
. The constructor is already a callable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 256 at r2 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
BTW I expect that
foo = lambda _: dict()
can be shortened to justfoo = dict
. The constructor is already a callable.
Would this work if self.info_handler(simulator)
is called (i.e., with a parameter)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 256 at r2 (raw file):
Previously, samzapo (Samuel Zapolsky) wrote…
Would this work if
self.info_handler(simulator)
is called (i.e., with a parameter)?
Oops, nevermind. I forgot the callback took argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/examples/gym/envs/cart_pole.py
line 418 at r4 (raw file):
observation_port_id="observations", reset_handler=reset_handler, info_handler=lambda _: dict(),
why not set this to None or better provide a nice example of an info handler
bindings/pydrake/gym/_drake_gym_env.py
line 257 at r4 (raw file):
terminated = False reward = 0 info = self.info_handler(self.simulator)
I'm not sure about the expected behavior here. I think that if the run gets to this point it means that the simulator faulted which would cause an unexpected behavior when called by the info_handler(). Consider setting the info to an empty dict or info["sim_runtime_error"]=True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samzapo I'm happy to help land this PR, let me know if you have any questions.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @JoseBarreiros-TRI , I apologize for the delay, I was on vacation last week. I'll resume working on this today.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits)
bindings/pydrake/gym/_drake_gym_env.py
line 257 at r4 (raw file):
Previously, JoseBarreiros-TRI wrote…
I'm not sure about the expected behavior here. I think that if the run gets to this point it means that the simulator faulted which would cause an unexpected behavior when called by the info_handler(). Consider setting the info to an empty dict or info["sim_runtime_error"]=True.
I don't want to add anything to info that isn't standard.
This is what motivated adding an info handler over explicitly adding info["timestamp"]
I think it's likely that simulator
will still behave normally after most runtime errors, but this is a good point
I'd go for just returning an empty dict in this case, and adding documentation that says, e.g.,
"if the simulator emits a runtime error while advancing, the provided info handler will not be called and will return an empty dict instead".
Would that work?
…o add_timestamp_to_step_info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/examples/gym/envs/cart_pole.py
line 418 at r4 (raw file):
Previously, JoseBarreiros-TRI wrote…
why not set this to None or better provide a nice example of an info handler
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 257 at r4 (raw file):
Previously, samzapo (Samuel Zapolsky) wrote…
I don't want to add anything to info that isn't standard.
This is what motivated adding an info handler over explicitly addinginfo["timestamp"]
I think it's likely that
simulator
will still behave normally after most runtime errors, but this is a good point
I'd go for just returning an empty dict in this case, and adding documentation that says, e.g.,
"if the simulator emits a runtime error while advancing, the provided info handler will not be called and will return an empty dict instead".Would that work?
yes, that is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 257 at r4 (raw file):
Previously, JoseBarreiros-TRI wrote…
yes, that is reasonable.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
+@sammy-tri for platform review per schedule, please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r5, all commit messages.
Reviewable status: complete! all discussions resolved, LGTM from assignees JoseBarreiros-TRI,sammy-tri(platform)
Merge remote-tracking branch 'upstream/master' into gcs-preprocessing-parallel [tools] Suppress more Valgrind false positives in MOSEK (RobotLocomotion#22140) Update parsing infrastructure (RobotLocomotion#22102) 1. MultibodyPlant::RegisterVisualGeometry() can now take a GeometryInstance. - Clean up documentation. - Extend tests to include what is *actually* done. 2. SDF parser makes use of new API. - Stop returning optional<unique_ptr<>> (simply return the unique ptr). - Instead of constructing, deconstructing, and reconstructing geometry instances, we just pass the instance. [schema] Add Rotation::Sample (RobotLocomotion#22113) This bring it on par with Transform::Sample. Move the warning on small-size PSD matrix to each solver backend. (RobotLocomotion#22136) [solvers] Fix Gurobi console logging config to happen first (RobotLocomotion#22134) [bindings] Re-enable some gym test cases on macOS (RobotLocomotion#22112) [drake_gym] Add info_handler callback (RobotLocomotion#21900) Set gradient sparsity pattern in LorentzConeConstraint and RotatedLorentzConeConstraint. (RobotLocomotion#22125) Merge master. Drop a TODO. Fix bug when inadvertantly trying to preprocess edges into the start and goal vertex. Hold a limited number of programs in memory at one time. Move the construction of the preprocessing program to a separate function for better portability. Merge up-to-date version of SolveInParallel. Merge in preprocessing solver options behavior from branch. wip lint Use the preprocessing solver options even if a specific preprocessing solver is not specified. Fix another segfault. Bring back the parallelization, revealing the segfault. Update this test to reflect the new error message. First pass at parallelizing PreprocessShortestPath. Fix a segfault when initial_guesses is unset. Temporarily disable the parallelism for debugging. finalize interface wip
Closes #21899.
This change is