Evaluate quantum kernel matrix elements for identical samples #432

Merged
merged 19 commits into from
Jul 31, 2022
Changes from 10 commits
66 changes: 53 additions & 13 deletions qiskit_machine_learning/kernels/quantum_kernel.py
@@ -11,6 +11,7 @@
 # that they have been altered from the originals.
 
 """Quantum Kernel Algorithm"""
+from __future__ import annotations
 
 from typing import Optional, Union, Sequence, Mapping, List
 import copy
@@ -60,6 +61,7 @@ def __init__(
         batch_size: int = 900,
         quantum_instance: Optional[Union[QuantumInstance, Backend]] = None,
         training_parameters: Optional[Union[ParameterVector, Sequence[Parameter]]] = None,
+        evaluate_duplicates: str | None = "non_diagonal",
Member

Should this just be of type str? If I pass in None it looks like it will still work, since str(None) gives "None", which lowercasing turns into the same thing as passing "none". I find having both confusing, that is, the None/Optional aspect here alongside the documented "none" string value. (In the unit tests you always pass "none" as a string.)

On a side note, do we want to mix the Union style and the new | style in the same function? I think it would be better to be consistent at least at this level, though I realize we are moving towards the newer style and some modules may already use it.

Collaborator Author

  1. I wanted to show that None is a valid value as well, but I am fine with keeping just str as the type hint. Should I keep only str?
  2. I changed the type hints across the class to the new style.

Collaborator

@ElePT, Jul 12, 2022

[Edit] I think that the issue here is that, as Steve pointed out, on line 101 you are doing str(evaluate_duplicates).

If this line did not exist, the answer would be clear: the type hint should be str, because we have a non-None default and the argument is therefore not really optional. In other words, without line 101, passing evaluate_duplicates=None would override the default and later raise an error.

Because line 101 exists, an explicit evaluate_duplicates=None gets converted into "none", and since this is one of the keywords we accept, the code goes on without raising an error. I don't think this is a behaviour we want.

So, I see 2 possible solutions:

  1. We don't use "none" as a keyword, so this edge case never takes place (however, I can't think of a good alternative)
  2. We don't convert to string on line 101, we enforce this condition to the user, and if we really want to accept an explicit None as an input (not an empty argument, but an actual None), then we handle this case in the code.

I would go for not accepting the explicit None, keeping only str as a type hint, and replacing line 101 with:

eval_duplicates = evaluate_duplicates.lower()

I will make suggestions with this alternative.
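
The edge case under discussion can be reproduced without Qiskit. Below is a minimal sketch, not the PR's code: `normalize_lenient` mirrors the `str(evaluate_duplicates).lower()` pattern shown in the diff, and `normalize_strict` is a hypothetical helper illustrating the `str`-only alternative. Raising `ValueError` for non-string input is an assumption made for the example.

```python
ALLOWED = ("all", "non_diagonal", "none")


def normalize_lenient(evaluate_duplicates):
    """Mirror of the str(...).lower() pattern: None silently becomes "none"."""
    value = str(evaluate_duplicates).lower()
    if value not in ALLOWED:
        raise ValueError(
            f"Unsupported value passed as evaluate_duplicates: {evaluate_duplicates}"
        )
    return value


def normalize_strict(evaluate_duplicates):
    """Hypothetical str-only variant: an explicit None is rejected."""
    if not isinstance(evaluate_duplicates, str):
        raise ValueError(
            f"evaluate_duplicates must be a string, got: {evaluate_duplicates!r}"
        )
    value = evaluate_duplicates.lower()
    if value not in ALLOWED:
        raise ValueError(
            f"Unsupported value passed as evaluate_duplicates: {evaluate_duplicates}"
        )
    return value


print(normalize_lenient(None))    # accepted as a synonym for "none"
print(normalize_strict("NONE"))   # strings are still case-insensitive
```

With the lenient variant, `None` and `"none"` are indistinguishable after normalization, which is the behaviour questioned in this thread.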

Collaborator Author

@adekusar-drl, Jul 13, 2022

OK, let me summarize:

QuantumKernel() -> evaluate_duplicates = off_diagonal
QuantumKernel(evaluate_duplicates=None) -> evaluate_duplicates = none
QuantumKernel(evaluate_duplicates="none") -> evaluate_duplicates = none

This is how it works now and this is exactly what I wanted to achieve. If you think this is a wrong way, let me know why.

Collaborator

Alright, I thought the second case was not intentional. I still find it a bit strange, but I don't think it would come up often anyway, so if you believe it makes sense I am OK with it.

Member

> I wanted to show that None is a valid value as well, but I am fine with keeping just str as the type hint. Should I keep only str?

My preference would be to get rid of None and keep the type as just str, taking "all", "off_diagonal" or "none". To me None feels like it ought to mean something different, but it ends up being just a synonym for "none", so dropping it seems the right thing to do.

Collaborator Author

OK, removed None.

     ) -> None:
         """
         Args:
@@ -72,6 +74,21 @@ def __init__(
             training_parameters: Iterable containing ``Parameter`` objects which correspond to
                 quantum gates on the feature map circuit which may be tuned. If users intend to
                 tune feature map parameters to find optimal values, this field should be set.
+            evaluate_duplicates: Defines a strategy how kernel matrix elements are evaluated if
+                identical samples are found. Possible values are:
+
+                    - ``all`` means that all kernel matrix elements are evaluated, even the diagonal
+                      ones when training. This may introduce additional noise in the matrix.
+                    - ``non_diagonal`` when training the matrix diagonal is set to `1`, the rest
+                      elements are fully evaluated, e.g., for two identical samples in the
+                      dataset. When inferring, all elements are evaluated. This is the default
+                      value.
+                    - ``none`` when training the diagonal is set to `1` and if two identical samples
+                      are found in the dataset the corresponding matrix element is set to `1`.
+                      When inferring, matrix elements for identical samples are set to `1`.
+
+        Raises:
+            ValueError: When unsupported value is passed to `evaluate_duplicates`.
         """
         # Class fields
         self._feature_map = None
@@ -81,6 +98,12 @@ def __init__(
         self._enforce_psd = enforce_psd
         self._batch_size = batch_size
         self._quantum_instance = quantum_instance
+        eval_duplicates = str(evaluate_duplicates).lower()
+        if eval_duplicates not in ("all", "non_diagonal", "none"):
+            raise ValueError(
+                f"Unsupported value passed as evaluate_duplicates: {evaluate_duplicates}"
+            )
+        self._evaluate_duplicates = eval_duplicates
 
         # Setters
         self.feature_map = feature_map if feature_map is not None else ZZFeatureMap(2)
@@ -509,13 +532,14 @@ def evaluate(self, x_vec: np.ndarray, y_vec: np.ndarray = None) -> np.ndarray:
         # initialize kernel matrix
         kernel = np.zeros((x_vec.shape[0], y_vec.shape[0]))
 
-        # set diagonal to 1 if symmetric
-        if is_symmetric:
-            np.fill_diagonal(kernel, 1)
-
         # get indices to calculate
         if is_symmetric:
-            mus, nus = np.triu_indices(x_vec.shape[0], k=1)  # remove diagonal
+            if self._evaluate_duplicates == "all":
+                mus, nus = np.triu_indices(x_vec.shape[0])
+            else:
+                # exclude diagonal and fill it with ones
+                mus, nus = np.triu_indices(x_vec.shape[0], k=1)
+                np.fill_diagonal(kernel, 1)
         else:
             mus, nus = np.indices((x_vec.shape[0], y_vec.shape[0]))
             mus = np.asarray(mus.flat)
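
To make the index selection in this hunk concrete, here is a small sketch of which (row, column) pairs are selected in the symmetric case. `upper_triangle_pairs` is a hypothetical pure-Python stand-in that yields the same pairs as `zip(*np.triu_indices(n, k))` for `k >= 0`; it is written without NumPy only so that it runs anywhere.

```python
def upper_triangle_pairs(n, k=0):
    """Pairs (i, j) with j >= i + k, in row-major order (assumes k >= 0)."""
    return [(i, j) for i in range(n) for j in range(i + k, n)]


def pairs_to_evaluate(n, evaluate_duplicates):
    """Select index pairs for a symmetric n x n kernel matrix."""
    if evaluate_duplicates == "all":
        # diagonal entries are evaluated like any other element
        return upper_triangle_pairs(n)
    # diagonal is skipped here; the diff presets it to 1 instead
    return upper_triangle_pairs(n, k=1)


print(pairs_to_evaluate(3, "all"))
print(pairs_to_evaluate(3, "non_diagonal"))
```

For three samples, `"all"` selects six pairs (including the three diagonal ones), while the other two strategies select only the three strictly upper-triangular pairs.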
@@ -559,15 +583,24 @@ def evaluate(self, x_vec: np.ndarray, y_vec: np.ndarray = None) -> np.ndarray:
                 statevectors.append(results.get_statevector(j))
 
             offset = 0 if is_symmetric else len(x_vec)
-            matrix_elements = [
-                self._compute_overlap(idx, statevectors, is_statevector_sim, measurement_basis)
-                for idx in list(zip(mus, nus + offset))
-            ]
+            for (
+                i,
+                j,
+            ) in zip(mus, nus):
+                x_i = x_vec[i]
+                y_j = y_vec[j]
+
+                # fill in ones for identical samples
+                if np.all(x_i == y_j) and self._evaluate_duplicates == "none":
+                    kernel_value = 1.0
+                else:
+                    kernel_value = self._compute_overlap(
+                        [i, j + offset], statevectors, is_statevector_sim, measurement_basis
+                    )
 
-            for i, j, value in zip(mus, nus, matrix_elements):
-                kernel[i, j] = value
+                kernel[i, j] = kernel_value
                 if is_symmetric:
-                    kernel[j, i] = kernel[i, j]
+                    kernel[j, i] = kernel_value
 
         else:  # not using state vector simulator
             feature_map_params_x = ParameterVector("par_x", self._feature_map.num_parameters)
@@ -590,7 +623,14 @@ def evaluate(self, x_vec: np.ndarray, y_vec: np.ndarray = None) -> np.ndarray:
                     j = nus[sub_idx]
                     x_i = x_vec[i]
                     y_j = y_vec[j]
-                    if not np.all(x_i == y_j):
+
+                    # fill in ones for identical samples
+                    if np.all(x_i == y_j) and self._evaluate_duplicates == "none":
+                        kernel[i, j] = 1
+                        if is_symmetric:
+                            kernel[j, i] = 1
+                    else:
+                        # otherwise evaluate the element
                         to_be_computed_data_pair.append((x_i, y_j))
                         to_be_computed_index.append((i, j))
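
The effect of the three strategies on the resulting matrix can be illustrated with a classical stand-in. This is a sketch under stated assumptions, not the code from the PR: `build_symmetric_kernel` and `toy_estimate` are hypothetical names, and `toy_estimate` simply returns a value below 1 for identical inputs to imitate an imperfect, shot-based estimate.

```python
def build_symmetric_kernel(samples, evaluate_duplicates, estimate):
    """Fill a symmetric kernel matrix following the three strategies in the diff."""
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    evaluate_diagonal = evaluate_duplicates == "all"
    if not evaluate_diagonal:
        for i in range(n):
            kernel[i][i] = 1.0  # diagonal is preset instead of evaluated

    for i in range(n):
        for j in range(i if evaluate_diagonal else i + 1, n):
            if evaluate_duplicates == "none" and samples[i] == samples[j]:
                value = 1.0  # identical samples: skip the estimator
            else:
                value = estimate(samples[i], samples[j])
            kernel[i][j] = kernel[j][i] = value
    return kernel


def toy_estimate(x, y):
    """Assumed stand-in for a noisy estimate: imperfect even when x == y."""
    return 0.97 if x == y else 0.25


samples = [(1, 2), (1, 2), (3, 4)]  # first two samples are duplicates
for strategy in ("all", "non_diagonal", "none"):
    print(strategy, build_symmetric_kernel(samples, strategy, toy_estimate))
```

With this toy estimator, `"all"` leaves 0.97 on the diagonal, `"non_diagonal"` fixes the diagonal at 1 but still estimates the duplicate pair, and `"none"` sets both the diagonal and the duplicate pair to 1.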

@@ -0,0 +1,23 @@
+---
+features:
+  - |
+    Introduced a new parameter `evaluate_duplicates` in
+    :class:`~qiskit_machine_learning.kernels.QuantumKernel`. This parameter defines a strategy how
+    kernel matrix elements are evaluated if identical samples are found.
+    Possible values are:
+
+      - ``all`` means that all kernel matrix elements are evaluated, even the diagonal ones when
+        training. This may introduce additional noise in the matrix.
+      - ``non_diagonal`` when training the matrix diagonal is set to `1`, the rest elements are
+        fully evaluated, e.g., for two identical samples in the dataset. When inferring, all
+        elements are evaluated. This is the default value.
+      - ``none`` when training the diagonal is set to `1` and if two identical samples are found
+        in the dataset the corresponding matrix element is set to `1`. When inferring, matrix
+        elements for identical samples are set to `1`.
+fixes:
+  - |
+    Fixed quantum kernel evaluation when duplicate samples are found in the dataset. Originally,
Member

Re "when duplicate samples": above, apart from the parameter name evaluate_duplicates, the text uses "identical" to describe samples. Here it uses "duplicate", like the parameter does, but this is the only place. I guess I should read duplicate and identical as meaning the same thing, i.e. two or more samples having an identical value and thus being in essence duplicates of one another.

Collaborator Author

That's right. I used both terms with the same meaning, and I agree I was not consistent. Which term is better for the wording in docstrings and parameters?

Collaborator

I didn't find it confusing to have 2 terms, I think it's pretty clear that they refer to the same concept. I believe that maybe balancing them out and using duplicate a bit more should be enough (see suggestions below).

Member

Yes, I would prefer to see duplicate used more in the descriptions when referring to elements managed by evaluate_duplicates.

I can see a duplicate as being another sample with an identical value, as I mentioned above, but that raises another question, I guess: when something is a duplicate, is this an exact value match? We have no tolerance here that can be set so that values which are very, very close are considered the same (a duplicate)?
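
The diff compares samples with exact equality (`np.all(x_i == y_j)`), so the tolerance question is a real one for floating-point features. The sketch below contrasts the two notions of "duplicate" in plain Python. `is_close_duplicate` and its `abs_tol` parameter are hypothetical and are not part of the PR.

```python
import math


def is_exact_duplicate(x, y):
    """Exact element-wise comparison, in the spirit of np.all(x_i == y_j)."""
    return len(x) == len(y) and all(a == b for a, b in zip(x, y))


def is_close_duplicate(x, y, abs_tol=1e-9):
    """Hypothetical tolerance-based variant; abs_tol is an illustrative knob."""
    return len(x) == len(y) and all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol) for a, b in zip(x, y)
    )


a = (0.1 + 0.2, 1.0)  # 0.30000000000000004 in binary floating point
b = (0.3, 1.0)
print(is_exact_duplicate(a, b))  # False
print(is_close_duplicate(a, b))  # True
```

Two samples that differ only by rounding error are therefore not treated as duplicates under the exact comparison, and their matrix element would still be evaluated even with `evaluate_duplicates="none"`.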

Collaborator Author

Replaced with "duplicated", thanks to Elena.

+    kernel matrix elements were not evaluated for identical samples in the dataset and such elements
+    were set wrongly to zero. Now we introduced a new parameter `evaluate_duplicates` that ensures
+    that elements of the kernel matrix are evaluated correctly. See the feature section for more
+    details.
91 changes: 89 additions & 2 deletions test/kernels/test_qkernel.py
@@ -19,10 +19,10 @@
 
 import numpy as np
 import qiskit
-from ddt import data, ddt
+from ddt import data, ddt, idata, unpack
 from qiskit import BasicAer, QuantumCircuit
 from qiskit.circuit import Parameter
-from qiskit.circuit.library import ZZFeatureMap
+from qiskit.circuit.library import ZZFeatureMap, ZFeatureMap
 from qiskit.transpiler import PassManagerConfig
 from qiskit.transpiler.preset_passmanagers import level_1_pass_manager
 from qiskit.utils import QuantumInstance, algorithm_globals, optionals
@@ -708,5 +708,92 @@ def test_qasm_batching(self):
         self.assertEqual(sum(self.circuit_counts), num_circuits)
 
 
+@ddt
+class TestQuantumKernelEvaluateDuplicates(QiskitMachineLearningTestCase):
+    """Test QuantumKernel for duplicate evaluation."""
+
+    def count_circuits(self, func):
+        """Wrapper to record the number of circuits passed to QuantumInstance.execute.
+
+        Args:
+            func (Callable): execute function to be wrapped
+
+        Returns:
+            Callable: function wrapper
+        """
+
+        @functools.wraps(func)
+        def wrapper(*args, **kwds):
+            self.circuit_counts += len(args[0])
+            return func(*args, **kwds)
+
+        return wrapper
+
+    def setUp(self):
+        super().setUp()
+        algorithm_globals.random_seed = 10598
+        self.circuit_counts = 0
+
+        self.qasm_simulator = QuantumInstance(
+            BasicAer.get_backend("qasm_simulator"),
+            seed_simulator=algorithm_globals.random_seed,
+            seed_transpiler=algorithm_globals.random_seed,
+        )
+
+        # monkey patch the qasm simulator
+        self.qasm_simulator.execute = self.count_circuits(self.qasm_simulator.execute)
+
+        self.feature_map = ZFeatureMap(feature_dimension=2, reps=1)
+
+        self.properties = {
+            "no_dups": np.array([[1, 2], [2, 3], [3, 4]]),
+            "dups": np.array([[1, 2], [1, 2], [3, 4]]),
+            "y_vec": np.array([[0, 1], [1, 2]]),
+        }
+
+    @idata(
+        [
+            ("no_dups", "all", 6),
+            ("no_dups", "non_diagonal", 3),
+            ("no_dups", "none", 3),
+            ("dups", "all", 6),
+            ("dups", "non_diagonal", 3),
+            ("dups", "none", 2),
+        ]
+    )
+    @unpack
+    def test_evaluate_duplicates(self, dataset_name, evaluate_duplicates, expected_num_circuits):
+        """Tests symmetric quantum kernel evaluation with duplicate samples."""
+        self.circuit_counts = 0
+        qkernel = QuantumKernel(
+            feature_map=self.feature_map,
+            evaluate_duplicates=evaluate_duplicates,
+            quantum_instance=self.qasm_simulator,
+        )
+        qkernel.evaluate(self.properties.get(dataset_name))
+        self.assertEqual(self.circuit_counts, expected_num_circuits)
+
+    @idata(
+        [
+            ("no_dups", "all", 6),
+            ("no_dups", "non_diagonal", 6),
+            ("no_dups", "none", 5),
+        ]
+    )
+    @unpack
+    def test_evaluate_duplicates_not_symmetric(
+        self, dataset_name, evaluate_duplicates, expected_num_circuits
+    ):
+        """Tests non-symmetric quantum kernel evaluation with duplicate samples."""
+        self.circuit_counts = 0
+        qkernel = QuantumKernel(
+            feature_map=self.feature_map,
+            evaluate_duplicates=evaluate_duplicates,
+            quantum_instance=self.qasm_simulator,
+        )
+        qkernel.evaluate(self.properties.get(dataset_name), self.properties.get("y_vec"))
+        self.assertEqual(self.circuit_counts, expected_num_circuits)
+
+
 if __name__ == "__main__":
     unittest.main()
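
The expected circuit counts in the parametrized tests above can be rederived by counting index pairs. The sketch below assumes that each evaluated matrix element costs exactly one circuit, which is what the test expectations suggest for the QASM path. `count_evaluations` is a hypothetical helper, and it treats the symmetric case as "no `y_vec` given", which simplifies whatever symmetry check the library performs.

```python
def count_evaluations(x_vec, evaluate_duplicates, y_vec=None):
    """Count kernel entries that need an evaluation under each strategy."""
    if y_vec is None:
        # symmetric case: upper triangle, with or without the diagonal
        y_vec = x_vec
        first = 0 if evaluate_duplicates == "all" else 1
        pairs = [(i, j) for i in range(len(x_vec)) for j in range(i + first, len(x_vec))]
    else:
        # non-symmetric case: every (x_i, y_j) combination
        pairs = [(i, j) for i in range(len(x_vec)) for j in range(len(y_vec))]
    skip_identical = evaluate_duplicates == "none"
    return sum(1 for i, j in pairs if not (skip_identical and x_vec[i] == y_vec[j]))


# Same datasets as in the test class, written as tuples instead of arrays.
no_dups = [(1, 2), (2, 3), (3, 4)]
dups = [(1, 2), (1, 2), (3, 4)]
y_vec = [(0, 1), (1, 2)]

for strategy in ("all", "non_diagonal", "none"):
    print(strategy, count_evaluations(dups, strategy), count_evaluations(no_dups, strategy, y_vec))
```

In the non-symmetric `"none"` case the count drops from 6 to 5 because the sample `[1, 2]` appears in both `no_dups` and `y_vec`.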