Possible data race on Evaluator::rotate_vector #216

sidezrw · 2020-09-14T21:06:03Z

I tried to parallelize some macro operations using SEAL in a client/server setting, but I ran into, what seems, a data race problem when rotating multiple ciphertexts in parallel. The failure occurs always on the first iteration of the parallel loop. Consequent iterations seem to behave as expected. The failure does not seem to happen in the single-threaded version.

In order to reproduce, I'm attaching a reduced version of the code. This program may have to be run a few times before the failure happens due to its non-deterministic nature. Make sure to set environment variable OMP_NUM_THREADS to the same number as NUM_THREADS in the code.

If more information is required. Please, let me know.

// main.cpp

#include "seal/seal.h"

#include <algorithm>
#include <iostream>
#include <memory>
#include <mutex>

#include <omp.h>

#define NUM_THREADS 92 // number of threads to use during accumulation
#define NUM_LOOPS 10 // number of accumulations
#define VECTOR_SIZE 90 // size of vector to accumulate

// This section simulates remote server receiving encryption parameters
//----------------------------------------------------------------------

class SecureOps
{
public:
    explicit SecureOps(const seal::EncryptionParameters &params,
                       double scale,
                       const seal::PublicKey public_key,
                       const seal::GaloisKeys galois_keys);

    /**
     * @brief Accumulates the first \p count values in \p cipher_input vector.
     * @param cipher_input[in] Encrypted vector of floats.
     * @param count[in] Number of elements in \p cipher_input to accumulates.
     * @param ep[in] Encryption parameters.
     * @return A Ciphertext where its first element is the accumulated sum of
     * the first \p count values in \p cipher_input vector.
     */
    seal::Ciphertext accumulate(const seal::Ciphertext &cipher_input, std::size_t count);

private:
    std::shared_ptr<seal::SEALContext> pcontext;
    std::shared_ptr<seal::Encryptor> pencryptor;
    std::shared_ptr<seal::Evaluator> pevaluator;
    seal::GaloisKeys galois_keys;
    double scale;
};

SecureOps::SecureOps(const seal::EncryptionParameters &params,
                                       double scale,
                                       const seal::PublicKey public_key,
                                       const seal::GaloisKeys galois_keys)
{
    this->scale = scale;
    this->galois_keys = galois_keys;
    pcontext = seal::SEALContext::Create(params); // create context on remote using specified params
    pencryptor = std::make_shared<seal::Encryptor>(pcontext, public_key);
    pevaluator = std::make_shared<seal::Evaluator>(pcontext);
}

seal::Ciphertext SecureOps::accumulate(const seal::Ciphertext &cipher_input,
                                       std::size_t count)
{
    seal::Ciphertext retval;

    seal::Ciphertext cipher_zero;
    pencryptor->encrypt_zero(cipher_zero);
    cipher_zero.scale() = scale;

    retval = cipher_zero;
    std::mutex mtx;
    #pragma omp parallel for num_threads(NUM_THREADS)
    for (int steps = 0; steps < static_cast<int>(count); ++steps)
    {
        seal::Ciphertext rotated;
        // rotate_vector() seems to cause a data race that randomly breaks
        // the operation result the first time this parallel loop is executed,
        // despite using a ThreadLocal memory pool.
        //-------------------------------------------------------------------------
        pevaluator->rotate_vector(cipher_input, steps, galois_keys, rotated, seal::MemoryPoolHandle::ThreadLocal());
        {
            std::lock_guard<std::mutex> lock(mtx);

            int rotated_level = pcontext->get_context_data(rotated.parms_id())->chain_index();
            int result_level = pcontext->get_context_data(retval.parms_id())->chain_index();
            if (rotated_level > result_level)
                pevaluator->mod_switch_to_inplace(rotated, retval.parms_id());
            else if (rotated_level < result_level)
                pevaluator->mod_switch_to_inplace(retval, rotated.parms_id());

            rotated.scale() = scale;
            pevaluator->add_inplace(retval, rotated);
        }
    } // end for

    return retval;
}


// This section simulates local client generating encryption parameters
// and requesting operation from server.
//----------------------------------------------------------------------

void test_accumulate(std::size_t n_loops, std::size_t vector_size)
{
    std::vector<double> v;

    std::cout << "*****************" << std::endl
              << __func__ << std::endl
              << "*****************" << std::endl << std::endl;

    if (vector_size < 1)
        vector_size = 1;
    v.resize(vector_size);
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = static_cast<double>(i + 1);

    double acc = std::accumulate(v.begin(), v.end(), double(0));

    // set encryption parameters for this op
    seal::EncryptionParameters params(seal::scheme_type::CKKS);
    size_t poly_modulus_degree = 8192;
    params.set_poly_modulus_degree(poly_modulus_degree);
    params.set_coeff_modulus(seal::CoeffModulus::Create(poly_modulus_degree, { 60, 40, 60 }));
    double scale = std::pow(2, 40);

    // initialize encryption
    //-----------------------

    // generate context with the parameters
    auto context = seal::SEALContext::Create(params);

    // generate the encryption keys
    seal::KeyGenerator keygen(context);
    auto public_key = keygen.public_key();
    auto secret_key = keygen.secret_key();
    auto galois_keys = keygen.galois_keys_local();
    seal::Encryptor encryptor(context, public_key);
    seal::Decryptor decryptor(context, secret_key);

    // encrypt
    //---------

    seal::CKKSEncoder encoder(context);

    seal::Plaintext plain;
    seal::Ciphertext cipher_v;
    encoder.encode(v, scale, plain);
    encryptor.encrypt(plain, cipher_v);
    plain = seal::Plaintext();

    // operate
    //---------

    // send encryption parameters and keys to remote
    SecureOps secure_ops(params, scale, public_key, galois_keys);

    std::cout << "Vector size: " << v.size() << " elements" << std::endl
              << "Running " << n_loops << " accumulations." << std::endl
              << "Each accumulation uses " << NUM_THREADS << " threads." << std::endl;

    std::vector<seal::Ciphertext> cipher_acc(n_loops);
    std::size_t cnt;
    for (cnt = 0; cnt < cipher_acc.size(); ++cnt)
    {
        if (cnt % 20 == 0)
            std::cout << cnt << " / " << cipher_acc.size() << std::endl;
        cipher_acc[cnt] = secure_ops.accumulate(cipher_v, v.size());
    }
    std::cout << cipher_acc.size() << " / " << cipher_acc.size() << std::endl;

    // decrypt
    //---------

    std::cout << "Decrypting results..." << std::endl;
    std::vector<double> result(cipher_acc.size());
    #pragma omp parallel for
    for (cnt = 0; cnt < cipher_acc.size(); ++cnt)
    {
        std::vector<double> decrypted_acc;
        seal::Plaintext plain_result;
        decryptor.decrypt(cipher_acc[cnt], plain_result);
        encoder.decode(plain_result, decrypted_acc, seal::MemoryPoolHandle::ThreadLocal());

        result[cnt] = decrypted_acc.front();
    }
    cipher_acc.clear(); // free memory

    // check for accuracy
    //--------------------

    std::cout << "Checking results..." << std::endl;
    std::size_t fail_cnt = 0;
    for (cnt = 0; cnt < result.size(); ++cnt)
        if (std::abs(result[cnt] - acc) > 0.001)
        {
            ++fail_cnt;
            std::cout << "FAILED: " << cnt << std::endl;
        }

    // output
    //--------------------

    std::cout << std::endl;
    std::cout << "v = [ ";
    for (std::size_t i = 0; i < v.size(); ++i)
        std::cout << v[i] << " ";
    std::cout << "]" << std::endl;

    std::cout << std::endl
              << "Ground truth" << std::endl
              << "acc = " << acc << std::endl;
    std::cout << std::endl
              << "HE" << std::endl
              << "acc = ";
    (result.empty() ? std::cout << "NaN" : std::cout << result.front()) << std::endl;
    std::cout << std::endl << "Failures: " << fail_cnt << std::endl;
}

int main()
{
    test_accumulate(NUM_LOOPS, VECTOR_SIZE);

    std::cout << std::endl << "Complete!" << std::endl;

    return 0;
}

OS: Ubuntu 16.04 or 18.04
Compiler: Clang-10
Dependencies:
C++17, OpenMP, SEAL 3.5.8

Suggested compile commands:

# compile
clang++ -c -pipe -std=c++17 -fopenmp=libomp -march=native -g -Wall -W -fPIC -O2  -I$PATH_TO_SEAL/native/src -I$PATH_TO_SEAL/thirdparty/msgsl/src/include -o main.o main.cpp
# link
clang++ -fopenmp=libomp -march=native -o seal_accumulate_test_small main.o $PATH_TO_SEAL/lib/libseal-3.5.a

Expected output on failure:

*****************
test_accumulate
*****************

Vector size: 90 elements
Running 10 accumulations.
Each accumulation uses 92 threads.
0 / 10
10 / 10
Decrypting results...
Checking results...
FAILED: 0

v = [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 ]

Ground truth
acc = 4095

HE
acc = 2.63179e+19

Failures: 1

Complete!

The text was updated successfully, but these errors were encountered:

WeiDaiWD · 2020-09-15T05:37:40Z

Much appreciated for capturing this. Indeed the method util::GaloisTools::generate_table_ntt(...) that is invoked by rotate_vector is not thread-safe. A fixed is implemented and will be released very soon. Before that happens, here is a temporary solution so that you won't be blocked. Replace auto galois_keys = keygen.galois_keys_local(); with the following:

std::vector<int> steps(vector_size);
for (std::size_t i = 0; i < steps.size(); ++i)
  steps[i] = static_cast<int>(i);
auto galois_keys = keygen.galois_keys_local();

Note that this solution will generate a more Galois Keys but results in faster rotate_vector calls.

I've also noticed some room for improvement in your code:

mod_switch_to_inplace is not required because rotate_vector does not change the level.
The accumulate algorithm in your code has linear complexity (on vector_size). It can be achieved with logrithmic complexity. Instead of for (int steps = 0; steps < static_cast<int>(count); ++steps), use for (int steps = 0; steps < 1 << seal::util::get_significant_bit_count(static_cast<std::uint64_t>(count)); steps <= 1). Consequently you should reduce the number of threads. This better algorithm also bypasses the issue.

kimlaine · 2020-09-17T04:05:14Z

This should be fixed now. Thanks again @sidezrw for the detailed bug report! Please let us know if there are further issues.

kimlaine closed this as completed Sep 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible data race on Evaluator::rotate_vector #216

Possible data race on Evaluator::rotate_vector #216

sidezrw commented Sep 14, 2020

WeiDaiWD commented Sep 15, 2020

kimlaine commented Sep 17, 2020

Possible data race on Evaluator::rotate_vector #216

Possible data race on Evaluator::rotate_vector #216

Comments

sidezrw commented Sep 14, 2020

WeiDaiWD commented Sep 15, 2020

kimlaine commented Sep 17, 2020