Add artifacts interface #1342

Merged 110 commits on Feb 8, 2024

@coruscating coruscating commented Dec 12, 2023

Summary

This PR adds the artifacts interface following the design in https://github.com/Qiskit/rfcs/blob/master/0007-experiment-dataframe.md.

Details and comments

  • Added the ArtifactData dataclass for representing artifacts.
  • Added ExperimentData.artifacts(), .add_artifacts(), and .delete_artifact() for working with artifacts, which are stored in a thread-safe list. Currently the ScatterTable and CurveFitResult objects are stored as artifacts; experiment serialization data will be added in the future.
  • Artifacts are grouped by type and stored in a compressed format so that composite experiments don't produce a huge number of individual files. As such, this PR depends on Qiskit-Extensions/qiskit-ibm-experiment#93 (Add .zip format for artifact upload), which allows .zip files to be uploaded to the cloud service. Each zip file contains one JSON file per artifact, named after the artifact's unique ID. For example, for composite experiments with flatten_results=True, all ScatterTable artifacts are stored as individual JSON files in curve_data.zip, and likewise for the other artifact types.
  • Added a how-to for artifacts and updated documentation to demonstrate dataframe objects like AnalysisResults and the ScatterTable (dataframe.css is for styling these tables).
  • Deprecated accessing analysis results via numerical indices to anticipate removing the curve fit result from analysis results altogether in the next release.
  • Fixed bug where figure_names were being duplicated in a copied ExperimentData object.
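
To illustrate the storage layout described above, here is a minimal sketch using only the Python standard library. The `pack_artifacts` helper and the dict-based artifact records are hypothetical stand-ins, not the code in this PR, and it assumes that "type" corresponds to the artifact name; the actual serialization in qiskit-experiments may differ.

```python
import io
import json
import zipfile
from collections import defaultdict


def pack_artifacts(artifacts):
    """Group artifacts by name and write each group to an in-memory zip.

    Hypothetical helper: each artifact is a dict with "artifact_id",
    "name", and "data" keys. Returns a dict mapping "<name>.zip" to bytes.
    """
    grouped = defaultdict(list)
    for artifact in artifacts:
        grouped[artifact["name"]].append(artifact)

    archives = {}
    for name, group in grouped.items():
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for artifact in group:
                # One JSON file per artifact, named after its unique ID.
                zf.writestr(
                    f"{artifact['artifact_id']}.json",
                    json.dumps(artifact["data"]),
                )
        archives[f"{name}.zip"] = buffer.getvalue()
    return archives


# Illustrative records only; real artifacts hold ScatterTable and
# CurveFitResult objects rather than plain dicts.
example = [
    {"artifact_id": "a1", "name": "curve_data", "data": {"xval": [0, 1]}},
    {"artifact_id": "a2", "name": "curve_data", "data": {"xval": [2, 3]}},
    {"artifact_id": "b1", "name": "fit_summary", "data": {"success": True}},
]
archives = pack_artifacts(example)
```

With this layout the number of uploaded files scales with the number of artifact types rather than with the number of component experiments.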

Example experiment with artifacts (link): [image]

@coruscating coruscating added this to the Release 0.6 milestone Dec 12, 2023
@coruscating coruscating marked this pull request as ready for review January 9, 2024 14:58

@nkanazawa1989 nkanazawa1989 left a comment

Thanks @coruscating, this looks good. Do you plan to add a tutorial for artifacts? You also need to update the unittest utility for the ExperimentData equality check.

@nkanazawa1989 nkanazawa1989 left a comment

LGTM. @wshanks, do you have a chance to check this PR? The newly introduced public APIs look reasonable to me. If we find any problem in their behavior, I think we can fix it without a breaking API change. In that sense, we can also merge this now for the 0.6 release.

@coruscating coruscating added the "Changelog: New Feature" label (Include in the "Added" section of the changelog) Feb 7, 2024

@wshanks wshanks left a comment

This looks good. I have a couple of concerns to get your thoughts on. I also commented on several non-blocking things that could be turned into new issues.

def artifacts(
    self,
    artifact_key: int | str = None,
) -> ArtifactData | list[ArtifactData]:
Collaborator

Personally, I don't like APIs that change output type based on the value. They are convenient in some situations like a REPL but make the user do extra work to be correct in general. Maybe artifacts could always return a list, and if artifacts()[0] seems too awkward there could be a separate artifact() method that only returns the first result (and maybe warns or errors if there were more than one result for the query)?
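
A minimal sketch of the suggested split, for illustration only. `Artifact` and `Store` are hypothetical stand-ins rather than ArtifactData and ExperimentData, and the choice to raise on no match and warn on several matches is just one of the options mentioned above.

```python
import warnings
from dataclasses import dataclass


@dataclass
class Artifact:
    """Hypothetical stand-in for ArtifactData."""

    name: str
    data: object


class Store:
    """Hypothetical container; not the ExperimentData API."""

    def __init__(self, items):
        self._items = list(items)

    def artifacts(self, name=None):
        """Always return a list (possibly empty), whatever the query."""
        if name is None:
            return list(self._items)
        return [a for a in self._items if a.name == name]

    def artifact(self, name):
        """Return the first match; raise if none, warn if several."""
        matches = self.artifacts(name)
        if not matches:
            raise KeyError(f"No artifact named {name!r}")
        if len(matches) > 1:
            warnings.warn(
                f"{len(matches)} artifacts match {name!r}; returning the first.",
                UserWarning,
            )
        return matches[0]


store = Store(
    [
        Artifact("curve_data", 1),
        Artifact("curve_data", 2),
        Artifact("fit_summary", 3),
    ]
)
```

Callers that loop over results use `artifacts()` and never need an `isinstance` check, while interactive use can reach for `artifact()`.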

Collaborator

To be fair, this pattern appears everywhere in experiments. I agree with your point, though. I would clean up the entire interface in the next release.

Collaborator

Fair. It's a trade-off between avoiding new behavior that would later need deprecating and keeping the API consistent.

docs/howtos/artifacts.rst
scatter_table.dataframe

The artifacts in a large composite experiment with ``flatten_results=True`` can be distinguished from
each other using the :attr:`~.ArtifactData.experiment` and :attr:`~.ArtifactData.device_components`
Collaborator

A good follow up would be to add more query parameters to the artifacts method, so you can do exp_data.artifacts("curve_data", device_components=[Qubit(1)]) instead of filtering the list manually after getting exp_data.artifacts("curve_data"). Maybe there could be a shortcut like qubits=[1] so you don't need to import the Qubit class to do this query.
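
As a rough sketch of what such query parameters could do, the function below filters a list of artifacts by name and device components. Everything here is hypothetical: `Qubit` and `Artifact` are local stand-ins for the library classes, and `filter_artifacts` with its `qubits` shortcut is not an existing API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Qubit:
    """Hypothetical stand-in for the device component class."""

    index: int


@dataclass
class Artifact:
    """Hypothetical stand-in for ArtifactData."""

    name: str
    experiment: str
    device_components: list = field(default_factory=list)


def filter_artifacts(artifacts, name=None, device_components=None, qubits=None):
    """Filter by name and components; ``qubits=[1]`` is shorthand for ``[Qubit(1)]``."""
    wanted = list(device_components or [])
    wanted += [Qubit(q) for q in (qubits or [])]
    result = []
    for artifact in artifacts:
        if name is not None and artifact.name != name:
            continue
        # Keep the artifact only if it involves every requested component.
        if all(component in artifact.device_components for component in wanted):
            result.append(artifact)
    return result


all_artifacts = [
    Artifact("curve_data", "T1", [Qubit(0)]),
    Artifact("curve_data", "T1", [Qubit(1)]),
    Artifact("fit_summary", "T1", [Qubit(1)]),
]
q1_curves = filter_artifacts(all_artifacts, "curve_data", qubits=[1])
```

The same loop is what a user currently has to write by hand after calling exp_data.artifacts("curve_data").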

for i in exp_data._figures.keys():
    self.assertEqual(exp_data._figures[i], copied._figures[i])
for i in exp_data._artifacts.keys():
    self.assertEqual(exp_data._artifacts[i], copied._artifacts[i])
Collaborator

Why not use the public methods .artifacts(), .figure_names(), and .figure(name)?

Collaborator Author

Right, that's better and I found a bug where figure_names were being duplicated in the copied object.

if "artifact_files" in expdata.metadata:
    for filename in expdata.metadata["artifact_files"]:
        if service.experiment_has_file(experiment_id, filename):
            artifact_file = service.file_download(experiment_id, filename)
Collaborator

We don't have a lot of artifacts now so it shouldn't matter much, but we might want bulk upload/download functions?

Collaborator Author

Yeah, that's follow-up work for the cloud service.

@@ -2364,6 +2378,10 @@ def copy(self, copy_results: bool = True) -> "ExperimentData":
            new_instance._figures = ThreadSafeOrderedDict()
            new_instance.add_figures(self._figures.values())

        with self._artifacts.lock:
            new_instance._artifacts = ThreadSafeOrderedDict()
            new_instance.add_artifacts(self._artifacts.values())
Collaborator

We don't do enough threading for it to matter, but ExperimentData should probably have a top-level lock. Ideally, all the data should be copied within a single lock instead of one per subcomponent.
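
A minimal sketch of the single-lock idea, assuming a hypothetical `Container` class with plain `OrderedDict` storage in place of ExperimentData and its ThreadSafeOrderedDict members. It only shows the locking pattern and is not a proposed patch.

```python
import threading
from collections import OrderedDict


class Container:
    """Hypothetical data holder guarded by one top-level reentrant lock."""

    def __init__(self):
        self._lock = threading.RLock()
        self._figures = OrderedDict()
        self._artifacts = OrderedDict()

    def add_figure(self, name, figure):
        with self._lock:
            self._figures[name] = figure

    def add_artifact(self, artifact_id, artifact):
        with self._lock:
            self._artifacts[artifact_id] = artifact

    def copy(self):
        new_instance = Container()
        # Hold the lock across the whole copy so figures and artifacts
        # are snapshotted from the same consistent state.
        with self._lock:
            new_instance._figures = OrderedDict(self._figures)
            new_instance._artifacts = OrderedDict(self._artifacts)
        return new_instance


original = Container()
original.add_figure("fig0", "<figure>")
original.add_artifact("a1", {"name": "curve_data"})
copied = original.copy()
original.add_artifact("a2", {"name": "fit_summary"})
```

With one lock per subcomponent, another thread could add an artifact between the figure copy and the artifact copy; holding a single lock for the whole copy rules that out.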

coruscating and others added 6 commits February 7, 2024 15:05
@@ -1599,6 +1545,21 @@ def analysis_results(
)
self._retrieve_analysis_results(refresh=refresh)

if index == 0:
Collaborator

I think you want to check that index 0 holds fit parameters and otherwise fall back to the other branch here.

Collaborator Author

Good catch. Added a check that the name starts with @, since importing PARAMS_ENTRY_PREFIX would be circular and we already use a leading @ as a filtering criterion when sending data to the plotter.
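
The check can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: `get_result` is a hypothetical function, results are modelled as plain dicts with a "name" key, and the warning messages are invented wording.

```python
import warnings


def get_result(results, index):
    """Return results[index], warning that numerical access is deprecated.

    Hypothetical sketch: ``results`` is a list of dicts with a "name" key.
    """
    # Only treat index 0 specially when it really holds the fit parameters,
    # identified here by a leading "@" in the result name.
    first_is_fit_params = bool(results) and results[0]["name"].startswith("@")
    if index == 0 and first_is_fit_params:
        message = "Curve fit results are moving to artifacts; use artifacts instead."
    else:
        message = "Accessing analysis results by numerical index is deprecated."
    warnings.warn(message, DeprecationWarning, stacklevel=2)
    return results[index]


# Illustrative result names only.
with_fit = [{"name": "@Parameters_T1Analysis"}, {"name": "T1"}]
without_fit = [{"name": "T1"}]
```

Checking the string prefix directly keeps the module free of the circular import mentioned above.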


@wshanks wshanks left a comment

Looks good to me! I will try to make issues from my other comments.

@coruscating coruscating added this pull request to the merge queue Feb 8, 2024
Merged via the queue into qiskit-community:main with commit a7d260a Feb 8, 2024
11 checks passed
@coruscating coruscating deleted the dataframe-pr3 branch February 8, 2024 04:41
github-merge-queue bot pushed a commit that referenced this pull request Apr 22, 2024
### Summary

Thanks to #1342 we can clean up the internals of `CompositeCurveAnalysis`.
This PR introduces no API break and no feature upgrade.

### Details and comments

Previously, the curve data and fit summary data were created internally in
`CurveAnalysis` but immediately discarded. The implementation in
`CurveAnalysis._run_analysis` was manually copied to
`CompositeCurveAnalysis._run_analysis` to access this artifact data and
create composite artifact data from it. This made the code fragile, since
developers needed to update both base classes manually. With this PR, the
implementation of the component analysis is encapsulated.
mergify bot pushed a commit that referenced this pull request Apr 22, 2024
(cherry picked from commit cb37d42)