PEARLWorker #2310

braunjon · 2021-11-20T09:08:16Z

Hi, there are two things in the code that I do not quite understand.

The code overwrites the action a in the deterministic case:

Lines 743 to 746 in b4abe07

    
           a, agent_info = self.agent.get_action(self._prev_obs) 
        
           if self._deterministic: 
        
               a = agent_info['mean'] 
        
           a, agent_info = self.agent.get_action(self._prev_obs)

There is an open pull request about this here: Fix double action sampling in PEARLWorker #2275.
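To make the first point concrete, here is a minimal sketch with a toy agent. `ToyAgent`, `step_as_quoted`, and `step_single_call` are hypothetical names made up for illustration and are not part of garage. The single-call variant is only one possible fix and may not match what the pull request does.

```python
import random


class ToyAgent:
    """Hypothetical stand-in for a stochastic policy (not garage's API)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.calls = 0  # number of times get_action was invoked

    def get_action(self, obs):
        # Sample around a mean that depends only on the observation.
        self.calls += 1
        mean = 2.0 * obs
        action = mean + self._rng.gauss(0.0, 1.0)
        return action, {'mean': mean}


def step_as_quoted(agent, obs, deterministic):
    """Mirrors the four quoted lines, including the second call."""
    a, agent_info = agent.get_action(obs)
    if deterministic:
        a = agent_info['mean']
    # The second call replaces `a` with a fresh sample, so the mean is lost.
    a, agent_info = agent.get_action(obs)
    return a, agent_info


def step_single_call(agent, obs, deterministic):
    """One possible fix (an assumption): query the agent only once."""
    a, agent_info = agent.get_action(obs)
    if deterministic:
        a = agent_info['mean']
    return a, agent_info
```

With the quoted logic the returned action is a sample even when `deterministic` is true, and the agent is queried twice per step. With the single-call variant the returned action equals `agent_info['mean']`.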

I was also wondering whether the context is ever used in self.agent. As far as I understand, pearl.py never uses the context of self._policy, and it is also not used within the class ContextConditionedPolicy.

garage/src/garage/torch/algos/pearl.py

Lines 754 to 759 in b4abe07

    
           if self._accum_context: 
        
               s = TimeStep.from_env_step(env_step=es, 
        
                                          last_observation=self._prev_obs, 
        
                                          agent_info=agent_info, 
        
                                          episode_info=self._episode_info) 
        
               self.agent.update_context(s)
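To illustrate what I mean by the context being "used", here is a toy sketch. `ToyContextPolicy` is hypothetical: it borrows method names from the snippet above, but its signatures and behavior are assumptions and it is not garage code. It only shows that accumulating context has no effect on actions until some inference step reads it.

```python
class ToyContextPolicy:
    """Hypothetical policy for illustration only (not garage's class)."""

    def __init__(self):
        self.context = []  # accumulated (obs, action, reward) tuples
        self.z = 0.0       # toy latent task variable

    def update_context(self, obs, action, reward):
        # Accumulate a transition; this alone does not change z.
        self.context.append((obs, action, reward))

    def infer_posterior(self):
        # Toy "inference": the latent is the mean reward seen so far.
        if self.context:
            rewards = [r for _, _, r in self.context]
            self.z = sum(rewards) / len(rewards)

    def get_action(self, obs):
        # The action depends on the context only through z.
        return obs + self.z


policy = ToyContextPolicy()
policy.update_context(0.0, 0.0, 2.0)
policy.update_context(0.0, 0.0, 4.0)
before = policy.get_action(1.0)  # context stored but not yet read
policy.infer_posterior()
after = policy.get_action(1.0)   # context now reflected through z
```

In this toy, `before` and `after` differ only because `infer_posterior` reads the stored context. My question is whether there is an equivalent step in garage that reads what `update_context` accumulates in the worker.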

Any hints would be appreciated.
Thanks!
