Add batch insert/upsert/insert_objects/upsert_objects #193

Closed
no1seman opened this issue Aug 10, 2021 · 9 comments · Fixed by #232
Labels: customer, feature

Comments

@no1seman

When loading a large amount of cold data, it would be great to be able to insert/upsert data in batches (lists of tuples/objects). These functions would be called directly from cartridge-java or any other client.

@olegrok
Contributor

olegrok commented Aug 10, 2021

Related to tarantool/vshard#176

kyukhin added the feature and teamE labels on Aug 20, 2021
kyukhin added the teamP label on Aug 27, 2021
@Totktonada
Member

@unera Please clarify: what is the priority of this feature?

@unera

unera commented Oct 1, 2021

both inserts.

unera removed the teamP label on Oct 1, 2021
@no1seman
Author

no1seman commented Oct 2, 2021

I should say that this task is not as easy as it may seem. First of all, @olegrok mentioned some additional functionality we need in vshard; we also need to support this feature in some popular language connectors: Java/Go/Python... The main problem is how to report to the caller which operations from the batch succeeded and which failed. It seems this feature needs additional triage.

@Totktonada
Member

We can implement batching without cluster-wide consistency guarantees for now (those would require 2PC, distributed transactions or something of this kind), but with detailed error reporting. Is there a need for such an intermediate step?

Totktonada added the needs feedback label on Oct 6, 2021
@Totktonada
Member

Totktonada commented Oct 7, 2021

@dsharonov agreed on that and highlighted that we should return an array of errors.

Totktonada removed the needs feedback label on Oct 7, 2021
@denesterov

There is another use case, much more common (and, I think, important) than bulk operations.

If you have a slightly complex data structure, not just Key/Document, you are in trouble with crud.

For example, we have a customer record in one space and his orders in a second space, both sharded identically. Let's say we need to close one order and create another in one move, or just store a customer record with all his orders at once, without interleaving with other changes / read operations.

CRUD cannot do this.

@akudiyar
Contributor

akudiyar commented Dec 5, 2021

@denesterov

> For example, we have a customer record in one space and his orders in a second space, both sharded identically. Let's say we need to close one order and create another in one move, or just store a customer record with all his orders at once, without interleaving with other changes / read operations.

I have a proposal for a function registration API as a basis for implementing such cases: tarantool/cartridge#1799

AnaNek added a commit that referenced this issue Feb 3, 2022
Batch insert is mostly used for operations on
one bucket / one Tarantool node in a transaction.
In this case batch insert is more efficient
than inserting tuple-by-tuple.
Right now CRUD cannot provide batch insert with full consistency.
CRUD offers batch insert with partial consistency. That means
that full consistency can be provided only on a single replicaset
using `box` transactions.

Part of #193
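
To illustrate why consistency stops at the replicaset boundary, here is a minimal sketch in plain Lua. The `storage_for` routing function and the storage names are hypothetical stand-ins (a real cluster routes by `bucket_id` through vshard); the point is only that one batch is split into independent per-storage sub-batches.

```lua
-- Hypothetical routing: maps a tuple to the name of the storage that owns it.
-- A simple modulo on the first field stands in for bucket-based routing.
local function storage_for(tuple, storage_names)
    local key = tuple[1]
    return storage_names[(key % #storage_names) + 1]
end

-- Split one batch into per-storage sub-batches.
local function group_by_storage(tuples, storage_names)
    local groups = {}
    for _, tuple in ipairs(tuples) do
        local name = storage_for(tuple, storage_names)
        groups[name] = groups[name] or {}
        table.insert(groups[name], tuple)
    end
    return groups
end

local groups = group_by_storage(
    {{1, 'a'}, {2, 'b'}, {3, 'c'}, {4, 'd'}},
    {'storage-1', 'storage-2'}
)
-- Each sub-batch could be applied in one local transaction on its storage,
-- but nothing ties the sub-batches of different storages together.
```

Under this model a failure on `storage-2` says nothing about what happened on `storage-1`, which is what "partial consistency" means here.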
AnaNek added a commit that referenced this issue Feb 4, 2022
Batch upsert is mostly used for operations on
one bucket / one Tarantool node in a transaction.
In this case batch upsert is more efficient
than upserting tuple-by-tuple.
Right now CRUD cannot provide batch upsert with full consistency.
CRUD offers batch upsert with partial consistency. That means
that full consistency can be provided only on a single replicaset
using `box` transactions.

Closes #193
AnaNek added a commit that referenced this issue Apr 8, 2022
Before this commit, `simple_operation_cases`
was organized as a map
(a table with non-numeric keys).
In Lua, iteration over a map does not follow
the order in which the elements were specified
in the map.
But simple operation cases must be executed
in the order in which they are specified,
because, for example, if `replace()` is performed
before `insert()`, an error is returned.
Therefore, `simple_operation_cases` has been
refactored into a table indexed with numbers.
Select cases have been refactored for consistency.

Part of #193
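
The ordering problem described in this commit message can be reproduced in plain Lua. The sketch below uses a toy in-memory space (an assumption, not the real test helpers): `insert` fails on an existing key, `replace` overwrites. Keeping the cases in an array and walking it with `ipairs()` fixes the execution order.

```lua
-- Toy in-memory space: insert fails on a duplicate key, replace overwrites.
local function new_space()
    local rows = {}
    return {
        insert = function(key, value)
            if rows[key] ~= nil then
                return nil, 'duplicate key ' .. tostring(key)
            end
            rows[key] = value
            return value
        end,
        replace = function(key, value)
            rows[key] = value
            return value
        end,
    }
end

-- Cases as an array: ipairs() always visits them in index order.
local simple_operation_cases = {
    {name = 'insert', args = {1, 'first'}},
    {name = 'replace', args = {1, 'second'}},
}

local function run_cases(space, cases)
    local log = {}
    for _, case in ipairs(cases) do
        local _, err = space[case.name](case.args[1], case.args[2])
        table.insert(log, case.name .. ':' .. (err and 'error' or 'ok'))
    end
    return log
end

local log = run_cases(new_space(), simple_operation_cases)
```

With a map-style table and `pairs()`, the `replace` case may run first, and the subsequent `insert` then hits the duplicate-key error.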
AnaNek added a commit that referenced this issue Apr 8, 2022
Since we have PR #244, it would be nice to collect
statistics for batch operations too.
To establish the effectiveness of the `crud.batch_insert()`
method compared to `crud.insert()`, perf tests were added.
`crud.insert()` in a loop and `crud.batch_insert()`
are compared for different batch sizes.

Closes #193
AnaNek added a commit that referenced this issue Jun 27, 2022
Batch replace is mostly used for operations on
one bucket / one Tarantool node in a transaction.
In this case batch replace is more efficient
than replacing tuple-by-tuple.
Right now CRUD cannot provide batch replace with full consistency.
CRUD offers batch replace with partial consistency. That means
that full consistency can be provided only on a single replicaset
using `box` transactions.

Part of #193
AnaNek added a commit that referenced this issue Jun 27, 2022
Before this commit, `simple_operation_cases`
was organized as a map
(a table with non-numeric keys).
In Lua, iteration over a map does not follow
the order in which the elements were specified
in the map.
Simple operation tests could fail
if they were executed in a different order
than the one in which they are specified,
because, for example, if `replace()` is performed
before `insert()`, an error is returned.
So the simple operation tests depended on each other.
To solve this problem, a `truncate_space_on_cluster`
call was added after each simple operation test.

Part of #193
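
A rough sketch of the cleanup approach, in plain Lua with a toy table standing in for the cluster space. The names `truncate`, `run_all` and the test functions are illustrative, not the real test suite; the idea is that truncating after every test makes each test independent of execution order, even when tests are iterated with `pairs()`.

```lua
local space = {}  -- stand-in for a space on the cluster

-- Remove every row, as a truncate helper would.
local function truncate(s)
    for key in pairs(s) do
        s[key] = nil
    end
end

local function insert(s, key, value)
    if s[key] ~= nil then
        error('duplicate key ' .. tostring(key))
    end
    s[key] = value
end

-- Map-style test table: iteration order is unspecified.
local tests = {
    test_insert = function() insert(space, 1, 'a') end,
    test_replace = function() space[1] = 'b' end,
}

-- Run every test and call the cleanup hook after each one.
local function run_all(test_table, after_each)
    local failures = 0
    for _, test in pairs(test_table) do
        if not pcall(test) then
            failures = failures + 1
        end
        after_each()
    end
    return failures
end

local failures = run_all(tests, function() truncate(space) end)
```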
Totktonada added a commit that referenced this issue Jun 28, 2022
## Overview

This release adds support for several `*_many()` and `*_object_many()`
operations to insert/replace/upsert many tuples at once.

These operations are faster for many tuples/operations of the same kind.
Say, if you want to add a large amount of data into the cluster, invoke
`insert_many()` with 100 (or 1000, depending on the size) tuples per call.
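
The chunking advice can be sketched as a small helper. This is an illustration only: `insert_in_batches` is a hypothetical name, and the `insert_many` argument is any function that accepts a list of tuples, so the sketch runs in plain Lua with a stub.

```lua
-- Split `tuples` into chunks of at most `batch_size` and pass each chunk
-- to `insert_many`. Returns the number of calls made.
local function insert_in_batches(tuples, batch_size, insert_many)
    local calls = 0
    for first = 1, #tuples, batch_size do
        local batch = {}
        local last = math.min(first + batch_size - 1, #tuples)
        for i = first, last do
            batch[#batch + 1] = tuples[i]
        end
        insert_many(batch)
        calls = calls + 1
    end
    return calls
end

-- Stub that only records batch sizes; on a router one could instead pass
-- function(batch) return crud.insert_many('developers', batch) end
local sizes = {}
local tuples = {}
for i = 1, 250 do
    tuples[i] = {i, 'name-' .. i}
end
local calls = insert_in_batches(tuples, 100, function(batch)
    sizes[#sizes + 1] = #batch
end)
```

A real loader would also inspect the values returned by each call, since a batch may succeed only partially.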

## Breaking changes

There are no breaking changes in the release.

## New features

* Insert many tuples/objects at once (#193).

  ```lua
  crud.insert_many(space_name, tuples, opts)
  crud.insert_object_many(space_name, objects, opts)
  ```
* Replace many tuples/objects at once (#193).

  ```lua
  crud.replace_many(space_name, tuples, opts)
  crud.replace_object_many(space_name, objects, opts)
  ```
* Perform many upsert operations at once (#193).

  ```lua
  crud.upsert_many(space_name, tuples_operation_data, opts)
  crud.upsert_object_many(space_name, objects_operation_data, opts)
  ```

Example:

```lua
crud.replace_many('developers', {
  {1, box.NULL, 'Elizabeth', 'lizaaa'},
  {2, box.NULL, 'Anastasia', 'iamnewdeveloper'},
})
---
- metadata:
  - {'name': 'id', 'type': 'unsigned'}
  - {'name': 'bucket_id', 'type': 'unsigned'}
  - {'name': 'name', 'type': 'string'}
  - {'name': 'login', 'type': 'string'}
  rows:
  - [1, 477, 'Elizabeth', 'lizaaa']
  - [2, 401, 'Anastasia', 'iamnewdeveloper']
...
```

The `*_many()` operations have almost the same options as
insert/replace/upsert, plus two new ones that control how errors are
handled on a storage:

* `stop_on_error` (`boolean`, default is `false`)

  If an error occurs on a storage, stop processing the operations of the
  request on that storage.

  **Only on the storage where the error occurs.**
* `rollback_on_error` (`boolean`, default is `false`)

  Roll back all changes on the storage where an error occurs.

  **Only on the storage where the error occurs.**
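
The interplay of the two options can be modeled in plain Lua. This is a simplified model of a single storage under stated assumptions, not the crud implementation: `rows` is a toy table keyed by the first tuple field, the only possible error is a duplicate key, and the error shape (`err`, `operation_data`) is assumed. The real module may report more, for example the operations that were skipped after a stop.

```lua
-- Model of how ONE storage might process its share of a batch insert.
local function process_on_storage(rows, tuples, opts)
    opts = opts or {}
    local applied, errs = {}, {}
    for _, tuple in ipairs(tuples) do
        local key = tuple[1]
        if rows[key] ~= nil then
            table.insert(errs, {err = 'duplicate key', operation_data = tuple})
            if opts.stop_on_error then
                break  -- skip the remaining operations on this storage
            end
        else
            rows[key] = tuple
            table.insert(applied, tuple)
        end
    end
    if #errs > 0 and opts.rollback_on_error then
        -- Undo everything this request changed on this storage only.
        for _, tuple in ipairs(applied) do
            rows[tuple[1]] = nil
        end
        applied = {}
    end
    return applied, errs
end

local batch = {{1, 'a'}, {2, 'b'}, {3, 'c'}}

-- Key 2 already exists, so the second tuple of the batch fails.
local rows_default = {[2] = {2, 'old'}}
local ok_default, errs_default = process_on_storage(rows_default, batch)

local rows_stop = {[2] = {2, 'old'}}
local ok_stop = process_on_storage(rows_stop, batch, {stop_on_error = true})

local rows_rollback = {[2] = {2, 'old'}}
local ok_rollback = process_on_storage(rows_rollback, batch, {rollback_on_error = true})
```

In this model the defaults keep tuples 1 and 3, `stop_on_error` keeps only tuple 1, and `rollback_on_error` keeps nothing from the batch. Other storages involved in the same request are unaffected in every case.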

The operations may succeed partially, so both data and errors can be
returned. Several errors can occur in a single request: these calls
return an array of errors, where each error contains the problematic
tuple/object. See the README for a detailed description.
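
A caller therefore has to look at both return values. The sketch below shows one way to do that in plain Lua; the `summarize` helper is hypothetical, and the error field names `err` and `operation_data` are assumptions to be checked against the README. The `rows` field follows the result shape shown in the example above.

```lua
-- Summarize the two return values of a `*_many()` call.
-- ASSUMPTION: each error is a table whose `operation_data` field holds the
-- tuple/object that failed and whose `err` field holds the message.
local function summarize(result, errs)
    local summary = {stored = 0, failed = {}}
    if result ~= nil and result.rows ~= nil then
        summary.stored = #result.rows
    end
    for _, e in ipairs(errs or {}) do
        table.insert(summary.failed, {
            data = e.operation_data,
            reason = tostring(e.err),
        })
    end
    return summary
end

-- Stand-in values shaped like a partially successful call.
local fake_result = {rows = {{1, 477, 'Elizabeth', 'lizaaa'}}}
local fake_errs = {
    {err = 'Duplicate key exists',
     operation_data = {2, 401, 'Anastasia', 'iamnewdeveloper'}},
}
local summary = summarize(fake_result, fake_errs)
```

On a router the helper could be fed directly, for example `summarize(crud.insert_many('developers', tuples))`, and the `failed` list used to decide which tuples to retry or report.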

Be prepared for errors that are not recoverable without human
intervention. This implementation does NOT perform cluster-wide transactions or
two-phase commit: all rollbacks are made only on a particular storage.
8 participants