Adding circuit executor classes and shot-branching #1766

Merged
merged 63 commits into from
Aug 9, 2023

Conversation

doichanj
Collaborator

Summary

This PR restructures the parallel simulation classes that were implemented in the StateChunk class.
Instead of StateChunk, this PR introduces CircuitExecutor classes outside of the State classes.

This PR also implements shot-branching, revised from PR #1606.
The implementation is now simplified in the CircuitExecutor::MultiStateExecutor class.
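
As a rough illustration of the shot-branching idea (not the actual MultiStateExecutor code), the sketch below keeps one branch per distinct state together with the number of shots it represents, and splits a branch only when a measurement outcome is random. `Branch`, `branch_on_measure`, and the single probability `p1` are hypothetical names introduced for this example.

```cpp
#include <cstdint>
#include <random>
#include <utility>

// Hypothetical stand-in for one simulated state shared by several shots.
struct Branch {
  std::uint64_t shots; // number of shots represented by this branch
  int last_outcome;    // last measured bit (-1 if nothing measured yet)
};

// Split a branch at a single-qubit measurement whose probability of
// outcome 1 is p1. Each shot picks its outcome independently, so the
// number of shots that observe 1 follows a binomial distribution.
std::pair<Branch, Branch> branch_on_measure(const Branch &parent, double p1,
                                            std::mt19937_64 &rng) {
  std::binomial_distribution<std::uint64_t> pick(parent.shots, p1);
  const std::uint64_t ones = pick(rng);
  Branch zero{parent.shots - ones, 0};
  Branch one{ones, 1};
  return {zero, one};
}
```

The point of the sketch is that the state is copied once per distinct outcome rather than once per shot; a real executor would also apply the measurement projection to each branch's state.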

Details and comments

image

@doichanj doichanj requested a review from hhorii March 29, 2023 07:50
@hhorii hhorii added the enhancement New feature or request label Apr 4, 2023
@hhorii hhorii added this to the Aer 0.13.0 milestone Apr 4, 2023
Collaborator

@hhorii hhorii left a comment

I would like to leave comments on aer_controller.hpp first.

Comment on lines 18 to 26
#include "simulators/density_matrix/densitymatrix_state.hpp"
#include "simulators/extended_stabilizer/extended_stabilizer_state.hpp"
#include "simulators/matrix_product_state/matrix_product_state.hpp"
#include "simulators/stabilizer/stabilizer_state.hpp"
#include "simulators/statevector/qubitvector.hpp"
#include "simulators/statevector/statevector_state.hpp"
#include "simulators/superoperator/superoperator_state.hpp"
#include "simulators/tensor_network/tensor_net_state.hpp"
#include "simulators/unitary/unitary_state.hpp"
Collaborator

@hhorii hhorii Apr 6, 2023

The includes above are not necessary if we do not refactor Controller::execute.

@@ -917,18 +565,7 @@ Result Controller::execute(std::vector<Circuit> &circuits,

Collaborator

I think the Controller::execute(std::vector<Circuit> &circuits, ..) method can be structured as follows:

  1. Determine methods (calling simulation_methods())
  2. Construct std::vector<std::shared_ptr<Executor>>
  3. Determine experiment parallelization (set_parallelization_experiments()) by using Executor.required_memory_mb()
  4. Call Executor.run_circuit() in parallel or serial

The Executor can be constructed via a static method in simulators.hpp.
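
The four steps proposed above could be sketched as follows. This is a minimal, self-contained illustration rather than the Aer implementation: `Circuit`, `Executor`, `make_executor`, and `execute` here are simplified, hypothetical stand-ins, and the memory estimate is a placeholder that assumes a dense statevector with 16 bytes per amplitude.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Simplified, hypothetical stand-ins for the types discussed above.
struct Circuit {
  std::string method; // simulation method already chosen for this circuit
  std::size_t num_qubits;
};

class Executor {
public:
  virtual ~Executor() = default;
  // Placeholder estimate: 2^n amplitudes * 16 bytes = 2^(n-16) MB
  // (assumes n is small enough that the shift does not overflow).
  virtual std::size_t required_memory_mb(const Circuit &circ) const {
    const std::size_t n = circ.num_qubits;
    return n > 16 ? (std::size_t{1} << (n - 16)) : std::size_t{1};
  }
  // Placeholder "simulation": returns the qubit count.
  virtual int run_circuit(const Circuit &circ) {
    return static_cast<int>(circ.num_qubits);
  }
};

// Step 2: a factory that would pick an executor type from the method name.
std::shared_ptr<Executor> make_executor(const std::string &method) {
  (void)method; // this sketch has only one executor type
  return std::make_shared<Executor>();
}

std::vector<int> execute(const std::vector<Circuit> &circuits,
                         std::size_t max_memory_mb) {
  // Steps 1 and 2: one executor per circuit, chosen by its method.
  std::vector<std::shared_ptr<Executor>> executors;
  for (const auto &circ : circuits)
    executors.push_back(make_executor(circ.method));

  // Step 3: bound parallel experiments by the largest memory requirement.
  std::size_t worst_mb = 1;
  for (std::size_t i = 0; i < circuits.size(); ++i)
    worst_mb = std::max(worst_mb, executors[i]->required_memory_mb(circuits[i]));
  const std::size_t parallel = std::min(
      circuits.size(), std::max<std::size_t>(1, max_memory_mb / worst_mb));

  // Step 4: run in parallel if more than one experiment fits, else serially.
  std::vector<int> results(circuits.size());
  const long n = static_cast<long>(circuits.size());
  if (parallel > 1) {
#ifdef _OPENMP
#pragma omp parallel for num_threads(static_cast<int>(parallel))
#endif
    for (long i = 0; i < n; ++i)
      results[i] = executors[i]->run_circuit(circuits[i]);
  } else {
    for (long i = 0; i < n; ++i)
      results[i] = executors[i]->run_circuit(circuits[i]);
  }
  return results;
}
```

Keeping the memory query on the executor means the controller no longer needs to include every state header just to size an experiment.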

Collaborator Author

I created the Controller::make_circuit_executor function and removed required_memory_mb and validate_state from Controller.

Collaborator

@hhorii hhorii left a comment

Comments for circuit_executor.hpp are here:

uint_t distributed_procs_; // number of processes in communicator group
uint_t distributed_group_; // group id of distribution
int_t distributed_proc_bits_; // distributed_procs_=2^distributed_proc_bits_
// (if nprocs != power of 2, set -1)
Collaborator

  uint_t myrank_;               // process ID
  uint_t nprocs_;               // number of processes
  uint_t distributed_rank_;     // process ID in communicator group
  uint_t distributed_procs_;    // number of processes in communicator group
  uint_t distributed_group_;    // group id of distribution
  int_t distributed_proc_bits_; // distributed_procs_=2^distributed_proc_bits_
                                // (if nprocs != power of 2, set -1)

These values are only for MPI, I think. Can we put them inside #ifdef AER_MPI? Maybe some code in aer_controller.hpp can go inside it as well.

Collaborator Author

A lot of code depends on these parameters, so I would like to keep them available in non-MPI environments. But I added #ifdef AER_MPI to the code that can be separated.
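
A minimal sketch of the compromise described here, assuming the parameters keep single-process defaults in every build and only the code that actually queries MPI is guarded. `DistributionInfo`, `proc_bits`, and `init_distribution` are hypothetical names for this example; `MPI_Comm_rank` and `MPI_Comm_size` are the standard MPI calls.

```cpp
#include <cstdint>
#ifdef AER_MPI
#include <mpi.h>
#endif

// Hypothetical holder for some of the distribution parameters quoted above.
struct DistributionInfo {
  std::uint64_t myrank = 0;               // process ID
  std::uint64_t nprocs = 1;               // number of processes
  std::int64_t distributed_proc_bits = 0; // nprocs = 2^bits, or -1
};

// Return log2(n) if n is a power of two, otherwise -1.
std::int64_t proc_bits(std::uint64_t n) {
  if (n == 0 || (n & (n - 1)) != 0)
    return -1;
  std::int64_t bits = 0;
  while ((n >>= 1) != 0)
    ++bits;
  return bits;
}

DistributionInfo init_distribution() {
  DistributionInfo info; // single-process defaults for non-MPI builds
#ifdef AER_MPI
  // Only this part needs MPI; it assumes MPI_Init has already been called.
  int rank = 0, size = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  info.myrank = static_cast<std::uint64_t>(rank);
  info.nprocs = static_cast<std::uint64_t>(size);
#endif
  info.distributed_proc_bits = proc_bits(info.nprocs);
  return info;
}
```

Code that reads `nprocs` or `distributed_proc_bits` then compiles unchanged with or without MPI, which matches the reason given for keeping the members unguarded.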

#endif

// settings for cuStateVec
bool cuStateVec_enable_ = false;
Collaborator

This can also go inside #ifndef AER_THRUST_CPU, I guess.

Collaborator Author

I added #ifdef AER_CUSTATEVEC to this param.


// Rng engine (this one is used to add noise on circuit)
RngEngine rng;
rng.set_seed(circ.seed);
Collaborator

rng.set_seed(circ.seed); is called again later (in run_circuit_with_sampling or run_circuit_shots). It is better to use different seeds for them.
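
One possible way to follow this suggestion is to derive a distinct seed per use from the single circuit seed. The sketch below uses a SplitMix64-style mixing step with the standard library engine; it does not use Aer's RngEngine, and `derive_seed` and the stream numbering are hypothetical.

```cpp
#include <cstdint>
#include <random>

// Derive a stream-specific seed from the circuit seed. The stream id
// (0 = noise sampling, 1 = shot execution, ...) is a hypothetical convention.
// Every step is invertible, so different stream ids with the same circuit
// seed always give different results.
std::uint64_t derive_seed(std::uint64_t circuit_seed, std::uint64_t stream) {
  std::uint64_t z = circuit_seed + (stream + 1) * 0x9E3779B97F4A7C15ULL;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Build an engine for a given use of the circuit seed.
std::mt19937_64 make_rng(std::uint64_t circuit_seed, std::uint64_t stream) {
  return std::mt19937_64(derive_seed(circuit_seed, stream));
}
```

Results stay reproducible for a fixed circuit seed, while noise sampling and shot execution no longer consume the same random sequence.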

Collaborator Author

Now the initial rng used for noise sampling is reused in the run_circuit_xxx functions.
I think this is the strategy used in the older code.

rng.set_seed(circ.seed);
run_with_sampling(circ, state, result, rng, circ.shots);
} else {
// Vector to store parallel thread output data
Collaborator

I'm confused about why run_circuit_with_sampling needs to consider parallel_shots_. If we use sampling, the state is calculated once and the bitstrings are then sampled from that state.
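
The sampling path described in this comment can be illustrated as follows: the final probabilities are computed once, and every shot is then just a draw from that distribution. This is a sketch with hypothetical helper names (`to_bitstring`, `sample_counts`), not the Aer sampling code.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Convert a basis-state index into a bitstring of num_qubits characters,
// with the highest-numbered qubit first.
std::string to_bitstring(std::size_t index, std::size_t num_qubits) {
  std::string bits(num_qubits, '0');
  for (std::size_t q = 0; q < num_qubits; ++q)
    if ((index >> q) & 1U)
      bits[num_qubits - 1 - q] = '1';
  return bits;
}

// Draw `shots` bitstrings from a probability vector that was computed once.
std::map<std::string, std::size_t>
sample_counts(const std::vector<double> &probs, std::size_t num_qubits,
              std::size_t shots, std::mt19937_64 &rng) {
  std::discrete_distribution<std::size_t> pick(probs.begin(), probs.end());
  std::map<std::string, std::size_t> counts;
  for (std::size_t s = 0; s < shots; ++s)
    ++counts[to_bitstring(pick(rng), num_qubits)];
  return counts;
}
```

Because no state evolution happens inside the loop, there is nothing state-related to parallelize per shot, which is the point the reviewer raises.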

Collaborator Author

Removed unnecessary functions and code.

Utils::apply_omp_parallel_for((par_shots > 1), 0, par_shots,
run_circuit_lambda);

// gather cregs on MPI processes and save to result
Collaborator

The code below can go inside #ifdef AER_MPI (gather_creg_memory as well), I guess.

Collaborator Author

#ifdef AER_MPI inserted

Collaborator

@hhorii hhorii left a comment

Reviewed the Executors. I'm wondering whether MultiStateExecutor is effective if shot-branching improves any methods.

@@ -948,12 +535,14 @@ Result Controller::execute(std::vector<Circuit> &circuits,
result.metadata.add(max_memory_mb_, "max_memory_mb");
result.metadata.add(max_gpu_memory_mb_, "max_gpu_memory_mb");

#ifdef AER_MPI
Collaborator

This is a very minor comment. We can consolidate the multiple _OPENMP and AER_MPI ifdef blocks.

Collaborator Author

merged ifdef blocks

Comment on lines 218 to 219
for (int_t i = 0; i < nshots; i++)
data.add_single(datum, key);
Collaborator

I'm not sure why we add the same datum multiple times here. Even if a caller wants to add the same data multiple times, it is better to add it outside of save_data_pershot as follows:

for (size_t i = 0; i < root.num_shots(); ++i) {
    result.save_data_pershot(Base::states_[root.state_index()].creg(),
                             op.string_params[0], amps, op.type, op.save_type);
}

Collaborator Author

Removed nshots and the loop from save_data_pershot.

Collaborator

I think we can reuse qerror_loc for sampling at runtime. But it is fine to use sample_noise explicitly to specify the points at which noise is injected again in C++ (also, we may be able to cover noise injection without qerror_loc).

Collaborator

In principle, BatchShotsExecutor should not inherit from ParallelStateExecutor because batch execution and chunk execution are independent. However, batch execution is currently supported only by statevector and densitymatrix, and both of them also support chunk execution. I agree with this class structure because we should avoid the complexity of multiple inheritance. We will be able to consider another class structure when batch execution is supported in other methods (that do not support chunk execution).
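
The class structure under discussion can be pictured as a single-inheritance chain. The sketch below is a simplified assumption for illustration only: the real classes are templates over a state type, the placement of ParallelStateExecutor directly above MultiStateExecutor is assumed rather than stated in this thread, and `name()` and `supports_chunks()` are hypothetical members.

```cpp
#include <string>

// Simplified, hypothetical stand-ins for the executor classes.
class Executor {
public:
  virtual ~Executor() = default;
  virtual std::string name() const { return "Executor"; }
};

// Holds several states at once (the level where shot-branching lives).
class MultiStateExecutor : public Executor {
public:
  std::string name() const override { return "MultiStateExecutor"; }
};

// Adds chunk (distributed) execution on top of multiple states.
class ParallelStateExecutor : public MultiStateExecutor {
public:
  std::string name() const override { return "ParallelStateExecutor"; }
  virtual bool supports_chunks() const { return true; }
};

// Adds batched shots; single inheritance keeps the chain linear, at the
// cost of tying batch execution to chunk execution.
class BatchShotsExecutor : public ParallelStateExecutor {
public:
  std::string name() const override { return "BatchShotsExecutor"; }
};
```

A method that supported batching but not chunking would not fit this chain, which is the case the comment defers until such a method exists.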

protected:
void set_config(const Config &config) override;

void set_distribution(uint_t num_states);
Collaborator

Could you add a comment to explain this method? In the original state_chunk.hpp, I found the following comment on the method of the same name, which differs only in its argument name, uint_t nprocs.

// set number of processes to be distributed

Collaborator Author

added comment

return state.required_memory_mb(circ.num_qubits, circ.ops);
}
return std::make_shared<
CircuitExecutor::Executor<MatrixProductState::State>>();
Collaborator

Why is MultiStateExecutor not used for MPS and others?

Collaborator Author

At this time, only simulation methods that support shot-branching inherit from MultiStateExecutor.

@hhorii hhorii added the Changelog: New Feature Include in the Added section of the changelog label Jul 27, 2023
doichanj and others added 15 commits July 28, 2023 10:36
Since 0.13.0, Aer does not support Python 3.7.
This commit removes the corresponding GitHub Actions jobs from CI.

* Removing python 3.7 from test workflow
* Removing python 3.7 from build workflow
* Removing python 3.7 from deploy workflow
* Removing python 3.7 from tox
* revert
* Remove python 3.7 from pyproject.toml
* Remove python 3.7 from pyproject.toml - tool
---------

Co-authored-by: Hiroshi Horii <hhorii@users.noreply.github.com>
Qiskit#1877)

Co-authored-by: Hiroshi Horii <hhorii@users.noreply.github.com>
* Fix OpenMP nested parallel

* add comment in release note

* fix true and false

* fix format

---------

Co-authored-by: Hiroshi Horii <hhorii@users.noreply.github.com>
* Support u3 gate application

* Apply clang-format

* Revert clang-format for aer_runtime_api.h

* Add release note

---------

Co-authored-by: Hiroshi Horii <hhorii@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Fix required_memory_mb

* add release note

---------

Co-authored-by: Hiroshi Horii <hhorii@users.noreply.github.com>
Since 0.12, Qiskit-Aer has issued deprecation warnings for use of PulseSimulator. Because 0.13 will be released more than 3 months after 0.12 was released, Qiskit-Aer will stop supporting pulse simulation.

* first pass at removing pulse simulator
* autoformat with black
* remove ref to aer pulse in docs
* fix lint issues
* remove pulse rst
* remove pulse tests
* add release note
* remove open pulse from CMakeLists.txt
* remove pulse tests
* remove remaining pulse codes

---------

Co-authored-by: AngeloDanducci <angelo.danducci.ii@ibm.com>
Correct C API `aer_state_initialize` to take an argument of `handler`.

* update aer_state_initialize API
* add reno
Collaborator

@hhorii hhorii left a comment

Although I was not able to review the implementation in detail due to the size of the changes, I believe we can merge this PR and will have time to refine the implementation before the release of 0.13.0.

@hhorii hhorii merged commit 9999dfb into Qiskit:main Aug 9, 2023