I tried to parallelize some macro operations using SEAL in a client/server setting, but I ran into what seems to be a data race when rotating multiple ciphertexts in parallel. The failure always occurs on the first execution of the parallel loop. Subsequent executions seem to behave as expected. The failure does not seem to happen in the single-threaded version.
In order to reproduce, I'm attaching a reduced version of the code. This program may have to be run a few times before the failure happens due to its non-deterministic nature. Make sure to set environment variable OMP_NUM_THREADS to the same number as NUM_THREADS in the code.
If more information is required, please let me know.
// main.cpp
#include "seal/seal.h"
#include <algorithm>
#include <cmath>    // std::pow, std::abs
#include <iostream>
#include <memory>
#include <mutex>
#include <numeric>  // std::accumulate
#include <omp.h>
#include <vector>
#define NUM_THREADS 92 // number of threads to use during accumulation
#define NUM_LOOPS 10 // number of accumulations
#define VECTOR_SIZE 90 // size of vector to accumulate
// This section simulates remote server receiving encryption parameters
//----------------------------------------------------------------------
class SecureOps
{
public:
explicit SecureOps(const seal::EncryptionParameters &params,
double scale,
const seal::PublicKey public_key,
const seal::GaloisKeys galois_keys);
/**
* @brief Accumulates the first \p count values in \p cipher_input vector.
* @param cipher_input[in] Encrypted vector of floats.
* @param count[in] Number of elements in \p cipher_input to accumulate.
* @return A Ciphertext where its first element is the accumulated sum of
* the first \p count values in \p cipher_input vector.
*/
seal::Ciphertext accumulate(const seal::Ciphertext &cipher_input, std::size_t count);
private:
std::shared_ptr<seal::SEALContext> pcontext;
std::shared_ptr<seal::Encryptor> pencryptor;
std::shared_ptr<seal::Evaluator> pevaluator;
seal::GaloisKeys galois_keys;
double scale;
};
SecureOps::SecureOps(const seal::EncryptionParameters &params,
double scale,
const seal::PublicKey public_key,
const seal::GaloisKeys galois_keys)
{
this->scale = scale;
this->galois_keys = galois_keys;
pcontext = seal::SEALContext::Create(params); // create context on remote using specified params
pencryptor = std::make_shared<seal::Encryptor>(pcontext, public_key);
pevaluator = std::make_shared<seal::Evaluator>(pcontext);
}
seal::Ciphertext SecureOps::accumulate(const seal::Ciphertext &cipher_input,
std::size_t count)
{
seal::Ciphertext retval;
seal::Ciphertext cipher_zero;
pencryptor->encrypt_zero(cipher_zero);
cipher_zero.scale() = scale;
retval = cipher_zero;
std::mutex mtx;
#pragma omp parallel for num_threads(NUM_THREADS)
for (int steps = 0; steps < static_cast<int>(count); ++steps)
{
seal::Ciphertext rotated;
// rotate_vector() seems to cause a data race that randomly breaks
// the operation result the first time this parallel loop is executed,
// despite using a ThreadLocal memory pool.
//-------------------------------------------------------------------------
pevaluator->rotate_vector(cipher_input, steps, galois_keys, rotated, seal::MemoryPoolHandle::ThreadLocal());
{
std::lock_guard<std::mutex> lock(mtx);
int rotated_level = pcontext->get_context_data(rotated.parms_id())->chain_index();
int result_level = pcontext->get_context_data(retval.parms_id())->chain_index();
if (rotated_level > result_level)
pevaluator->mod_switch_to_inplace(rotated, retval.parms_id());
else if (rotated_level < result_level)
pevaluator->mod_switch_to_inplace(retval, rotated.parms_id());
rotated.scale() = scale;
pevaluator->add_inplace(retval, rotated);
}
} // end for
return retval;
}
// This section simulates local client generating encryption parameters
// and requesting operation from server.
//----------------------------------------------------------------------
void test_accumulate(std::size_t n_loops, std::size_t vector_size)
{
std::vector<double> v;
std::cout << "*****************" << std::endl
<< __func__ << std::endl
<< "*****************" << std::endl << std::endl;
if (vector_size < 1)
vector_size = 1;
v.resize(vector_size);
for (std::size_t i = 0; i < v.size(); ++i)
v[i] = static_cast<double>(i + 1);
double acc = std::accumulate(v.begin(), v.end(), double(0));
// set encryption parameters for this op
seal::EncryptionParameters params(seal::scheme_type::CKKS);
size_t poly_modulus_degree = 8192;
params.set_poly_modulus_degree(poly_modulus_degree);
params.set_coeff_modulus(seal::CoeffModulus::Create(poly_modulus_degree, { 60, 40, 60 }));
double scale = std::pow(2, 40);
// initialize encryption
//-----------------------
// generate context with the parameters
auto context = seal::SEALContext::Create(params);
// generate the encryption keys
seal::KeyGenerator keygen(context);
auto public_key = keygen.public_key();
auto secret_key = keygen.secret_key();
auto galois_keys = keygen.galois_keys_local();
seal::Encryptor encryptor(context, public_key);
seal::Decryptor decryptor(context, secret_key);
// encrypt
//---------
seal::CKKSEncoder encoder(context);
seal::Plaintext plain;
seal::Ciphertext cipher_v;
encoder.encode(v, scale, plain);
encryptor.encrypt(plain, cipher_v);
plain = seal::Plaintext();
// operate
//---------
// send encryption parameters and keys to remote
SecureOps secure_ops(params, scale, public_key, galois_keys);
std::cout << "Vector size: " << v.size() << " elements" << std::endl
<< "Running " << n_loops << " accumulations." << std::endl
<< "Each accumulation uses " << NUM_THREADS << " threads." << std::endl;
std::vector<seal::Ciphertext> cipher_acc(n_loops);
std::size_t cnt;
for (cnt = 0; cnt < cipher_acc.size(); ++cnt)
{
if (cnt % 20 == 0)
std::cout << cnt << " / " << cipher_acc.size() << std::endl;
cipher_acc[cnt] = secure_ops.accumulate(cipher_v, v.size());
}
std::cout << cipher_acc.size() << " / " << cipher_acc.size() << std::endl;
// decrypt
//---------
std::cout << "Decrypting results..." << std::endl;
std::vector<double> result(cipher_acc.size());
#pragma omp parallel for
for (cnt = 0; cnt < cipher_acc.size(); ++cnt)
{
std::vector<double> decrypted_acc;
seal::Plaintext plain_result;
decryptor.decrypt(cipher_acc[cnt], plain_result);
encoder.decode(plain_result, decrypted_acc, seal::MemoryPoolHandle::ThreadLocal());
result[cnt] = decrypted_acc.front();
}
cipher_acc.clear(); // free memory
// check for accuracy
//--------------------
std::cout << "Checking results..." << std::endl;
std::size_t fail_cnt = 0;
for (cnt = 0; cnt < result.size(); ++cnt)
if (std::abs(result[cnt] - acc) > 0.001)
{
++fail_cnt;
std::cout << "FAILED: " << cnt << std::endl;
}
// output
//--------------------
std::cout << std::endl;
std::cout << "v = [ ";
for (std::size_t i = 0; i < v.size(); ++i)
std::cout << v[i] << " ";
std::cout << "]" << std::endl;
std::cout << std::endl
<< "Ground truth" << std::endl
<< "acc = " << acc << std::endl;
std::cout << std::endl
<< "HE" << std::endl
<< "acc = ";
(result.empty() ? std::cout << "NaN" : std::cout << result.front()) << std::endl;
std::cout << std::endl << "Failures: " << fail_cnt << std::endl;
}
int main()
{
test_accumulate(NUM_LOOPS, VECTOR_SIZE);
std::cout << std::endl << "Complete!" << std::endl;
return 0;
}
OS: Ubuntu 16.04 or 18.04
Compiler: Clang-10
Dependencies:
C++17, OpenMP, SEAL 3.5.8
Thank you for catching this. Indeed, the method util::GaloisTools::generate_table_ntt(...), which is invoked by rotate_vector, is not thread-safe. A fix has been implemented and will be released very soon. Until then, here is a temporary workaround so that you are not blocked. Replace auto galois_keys = keygen.galois_keys_local(); with the following:
std::vector<int> steps(vector_size);
for (std::size_t i = 0; i < steps.size(); ++i)
steps[i] = static_cast<int>(i);
auto galois_keys = keygen.galois_keys_local(steps); // generate keys only for the rotation steps used
Note that this workaround generates more Galois keys but results in faster rotate_vector calls.
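To illustrate the kind of problem involved, here is a minimal sketch of thread-safe lazy initialization. It assumes the race comes from a shared table being built on first use, which would be consistent with only the first parallel run failing; that is an assumption, not a description of SEAL's internals. The class LazyTable and the helper concurrent_reads_ok are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical lazily built lookup table; NOT SEAL's implementation.
class LazyTable
{
public:
    explicit LazyTable(std::size_t size) : size_(size) { assert(size > 0); }

    // Safe to call concurrently: std::call_once runs build() exactly once
    // and blocks every other caller until the table is complete.
    const std::vector<std::uint32_t> &get()
    {
        std::call_once(built_, [this] { build(); });
        return table_;
    }

private:
    void build()
    {
        table_.resize(size_);
        for (std::size_t i = 0; i < size_; ++i)
            table_[i] = static_cast<std::uint32_t>(size_ - 1 - i); // placeholder permutation
    }

    std::size_t size_;
    std::once_flag built_;
    std::vector<std::uint32_t> table_;
};

// Requests the table from n_threads threads at once and reports whether
// every thread observed a fully built table.
bool concurrent_reads_ok(std::size_t size, std::size_t n_threads)
{
    LazyTable table(size);
    std::vector<char> ok(n_threads, 0); // one flag per thread, no sharing
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < n_threads; ++t)
        workers.emplace_back([&table, &ok, t, size] {
            const auto &tbl = table.get();
            ok[t] = (tbl.size() == size && tbl.front() == size - 1 && tbl.back() == 0);
        });
    for (auto &w : workers)
        w.join();
    for (char flag : ok)
        if (!flag)
            return false;
    return true;
}
```

Without the std::call_once guard, two threads could both see an unbuilt table and fill it at the same time, or one could read it while another is still writing. Pre-generating the Galois keys for the needed steps, as suggested above, sidesteps the issue from the caller's side.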
I've also noticed some room for improvement in your code:
mod_switch_to_inplace is not required because rotate_vector does not change the level.
The accumulate algorithm in your code has linear complexity in vector_size. The same result can be achieved with logarithmic complexity. Instead of for (int steps = 0; steps < static_cast<int>(count); ++steps), use for (int steps = 1; steps < (1 << seal::util::get_significant_bit_count(static_cast<std::uint64_t>(count))); steps <<= 1), rotating the running sum by steps and adding it back to itself on each iteration. Consequently, you should reduce the number of threads. This better algorithm also bypasses the issue.
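The logarithmic rotate-and-add idea can be checked on plain numbers before moving it to ciphertexts. The sketch below is a plaintext model only: rotate_left stands in for rotate_vector and the element-wise addition stands in for add_inplace. Both helper names are made up for this example. It rounds count up to the nearest power of two, which may differ slightly from the loop bound quoted above, and it assumes every slot between count and that power of two holds zero, as is the case when a short vector is encoded into a larger slot vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plaintext stand-in for a cyclic left rotation of the slot vector.
std::vector<double> rotate_left(const std::vector<double> &slots, std::size_t steps)
{
    assert(!slots.empty());
    std::vector<double> out(slots.size());
    for (std::size_t i = 0; i < slots.size(); ++i)
        out[i] = slots[(i + steps) % slots.size()];
    return out;
}

// Returns a vector whose slot 0 holds the sum of the first `count` slots.
// Assumes slots[count .. span) are zero, where span is the smallest power
// of two that is >= count.
std::vector<double> rotate_and_sum(std::vector<double> slots, std::size_t count)
{
    std::size_t span = 1;
    while (span < count)
        span <<= 1;
    assert(span <= slots.size());

    // After the iteration with a given `steps`, slot i holds the sum of
    // the original slots i .. i + 2*steps - 1.
    for (std::size_t steps = 1; steps < span; steps <<= 1)
    {
        const std::vector<double> rotated = rotate_left(slots, steps);
        for (std::size_t i = 0; i < slots.size(); ++i)
            slots[i] += rotated[i];
    }
    return slots;
}
```

For the 90-element vector in the reproduction this needs 7 rotations (steps 1, 2, 4, ..., 64) instead of 90. Each iteration depends on the previous one, so the loop itself is sequential and gains nothing from a parallel for.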