Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace InstrumentedFunction by ExperimentFunction (different API) #421

Merged
merged 27 commits into from
Jan 6, 2020

Conversation

jrapin
Copy link
Contributor

@jrapin jrapin commented Dec 30, 2019

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Motivation and Context / Related issue

Improve reproducibility of random functions, simplifies API, solves bugs

How Has This Been Tested (if it applies)

Checklist

  • The documentation is up-to-date with the changes I made.
  • I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
  • All tests passed, and additional code has been covered with new tests.

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Dec 30, 2019
@jrapin jrapin changed the title Replace InstrumentedFunction by ParametrizedFunction (different API) [WIP] Replace InstrumentedFunction by ParametrizedFunction (different API) Dec 30, 2019
@jrapin jrapin changed the title [WIP] Replace InstrumentedFunction by ParametrizedFunction (different API) Replace InstrumentedFunction by ParametrizedFunction (different API) Dec 30, 2019
@jrapin
Copy link
Contributor Author

jrapin commented Dec 30, 2019

@teytaud this solves the reproducibility issues
Eventually (= as soon as I have time to update the code) InstrumentedFunction will disappear in favor of ExperimentFunction.
The main differences are:

  • the initialization takes 1 function and 1 instrumentation (instead of 1 function and possibly many variables)
  • __call__ directly forwards the arguments to the function, without converting to data space (while before it would take a np.ndarray from the data space
  • it has a copy method, which makes sure we can a new version with new random values we can seed (this is what makes everything work)

I also renamed instrumentation to parametrization ahead of the other PRs

@jrapin jrapin requested a review from teytaud December 30, 2019 21:13
@jrapin jrapin added Difficulty: Medium Priority: Critical Type: Breaking changes Breaking changes to the API, to be notified in changelogs Type: Bug Something isn't working labels Dec 30, 2019
@jrapin
Copy link
Contributor Author

jrapin commented Dec 30, 2019

TODO: update doc

Copy link
Contributor Author

@jrapin jrapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now ready to review.

@@ -35,7 +35,7 @@ def repeated_basic(seed: Optional[int] = None) -> Iterator[Experiment]:
optims: List[Union[str, OptimizerFamily]] = ["OnePlusOne", optimizers.DifferentialEvolution()]
for _ in range(5):
for optim in optims:
yield Experiment(function.duplicate(), optimizer=optim, num_workers=2, budget=4, seed=next(seedg))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more duplication here, since the function is now duplicated whenever it is actually running. This way seeding will be much safer (we seed right before duplication)

return x + y

# pylint: disable=unused-argument
def get_postponing_delay(self, args: Tuple[Any, ...], kwargs: Dict[str, Any], value: float) -> float:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a reminder, this is used to control "pseudo/mock" timing during the experiments (to compare steady and batch modes for instance)

@@ -21,8 +21,7 @@ def test_experiments_registry(name: str, maker: Callable[[], Iterator[experiment
with patch("shutil.which", return_value="here"): # do not check for missing packages
with datasets.mocked_data(): # mock mlda data that should be downloaded
check_maker(maker) # this is to extract the function for reuse if other external packages need it
if name not in {"realworld_oneshot", "mlda", "mldaas", "realworld", "powersystems", "powersystemsbig",
"fastgames", "bigfastgames"}:
if name not in {"realworld_oneshot", "mlda", "mldaas", "realworld"}:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seeding now works now (except mlda, but that's only because we cant download the data)

from ..optimization.optimizerlib import registry as optimizer_registry # import from optimizerlib so as to fill it
from . import execution

registry = decorators.Registry[Callable[..., Iterator['Experiment']]]()


class IFuncWrapper(execution.PostponedObject):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more need for this wrapper, which is good :)

self.function = function
# Conjecture on the noise level.
if not self.function.instrumentation.probably_noisy:
if hasattr(self.function, "_parameters") and "noise_level" in self.function._parameters and self.function._parameters["noise_level"] > 0: # type: ignore
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@teytaud the probably_noisy tag is now dealt with directly in ArtificialFunction, which is safer

@@ -124,7 +125,8 @@ def __init__(self, name: str, dimension: int, transform: str = "tanh") -> None:
self._func = inst.CommandFunction(["octave-cli", "--no-gui", "--no-history", "--norc", "--quiet", "--no-window-system", path.name],
cwd=path.parent, verbose=False,
env=dict(os.environ, OMP_NUM_THREADS="1", OPENBLAS_NUM_THREADS="1"))
super().__init__(self._compute, *_make_instrumentation(name=name, dimension=dimension, transform=transform).args)
super().__init__(self._compute, _make_instrumentation(name=name, dimension=dimension, transform=transform))
self.register_initialization(name=name, dimension=dimension, transform=transform)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to make it clear once: register_initialization keeps a record of the initialization parameters so that the "copy" method can reinstantiate a new object easily

self._descriptors.update(num_repetitions=self.runner.num_repetitions, instrumentation="")
super().__init__(self.compute, self.agent.instrumentation.copy().with_name(""))
self.register_initialization(agent=agent, env_runner=env_runner, reward_postprocessing=reward_postprocessing)
self._descriptors.update(num_repetitions=self.runner.num_repetitions, archi=self.agent.module.__class__.__name__)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the archi descriptor here. It takes the module class name which is Perceptron or DenseNet. This will replace mono and multi  which were manually added in the experiments (which does not work anymore -> descriptors should preferably be automatic to avoid errors)

self.y = self.instrumentation.random_state.normal(size=self.dimension)
self.register_initialization(dimension=dimension)
self.order = np.arange(0, self.dimension)
self.x = self.parametrization.random_state.normal(size=self.dimension)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to make it clear once, instrumentation becomes parametrization in ExperimentFunction for forward compatibility with future updates (see #391 ), but the underlying instance is still an Instrumentation. At some point in the near future, this will also be the case for optimizers (but after the merge of new parametrization framework).

@@ -27,30 +26,3 @@ def __call__(self, x: np.ndarray) -> np.ndarray:
if self.rotation_matrix is not None:
y = self.rotation_matrix.dot(y)
return y


class PostponedObject(abc.ABC):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both those classes are not necessary anymore since the methods are directly included in ExperimentFunction

@jrapin jrapin mentioned this pull request Jan 3, 2020
7 tasks
@jrapin jrapin changed the title Replace InstrumentedFunction by ParametrizedFunction (different API) Replace InstrumentedFunction by ExperimentFunction (different API) Jan 3, 2020
Copy link
Contributor

@teytaud teytaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the moment I am doing tests by running NGO and checking if it correctly identifies the type of functions and everything is fine. I keep testing.

@jrapin
Copy link
Contributor Author

jrapin commented Jan 6, 2020

Reviewed with @teytaud, ready to go

@jrapin jrapin merged commit 594cf2c into master Jan 6, 2020
@jrapin jrapin deleted the paramfunc branch January 6, 2020 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Do not delete this pull request or issue due to inactivity. Difficulty: Medium Priority: Critical Type: Breaking changes Breaking changes to the API, to be notified in changelogs Type: Bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants