Batch dataset.put and dataset.delete. #421

Closed
silvolu opened this issue Dec 17, 2014 · 16 comments · Fixed by #509
Labels: api: datastore, 🚨 This issue needs some love, triage me

Comments

@silvolu
Contributor

silvolu commented Dec 17, 2014

Like dataset.get, there should be a dataset.delete and a dataset.put that accept a list of keys / entities.

@silvolu added the "api: datastore" and "fix-asap" labels Dec 17, 2014
@dhermes
Contributor

dhermes commented Dec 17, 2014

I need to get on #336 too and hopefully these can come together. We should actually hash out the discussion in #188 as well, since many of our core patterns are very JavaScript-like and may not be the right way forward.

@tseaver
Contributor

tseaver commented Dec 17, 2014

Given a list of entities in hand, what is the advantage of a "convenience" method on the dataset to save or delete them over just iterating the list and calling save or delete on each one? The get_entity and get_entities methods on a dataset are there for when you don't already have the entities.

I guess there might be some utility to a delete method taking only keys, so that the user doesn't need to either fetch them via get_entities or convert them to protobufs to hand to the connection's delete_entities method.

@silvolu changed the title from "Batch dasatet.put and dataset.delete." to "Batch dataset.put and dataset.delete." Dec 17, 2014
@dhermes
Contributor

dhermes commented Dec 18, 2014

I think dataset.delete and dataset.put are valid, somewhat like ndb.put_multi and ndb.delete_multi.

@tseaver
Contributor

tseaver commented Dec 19, 2014

The Zen of Python says:

There should be one-- and preferably only one --obvious way to do it.

What use is Dataset.put()? To make it work you have to have the entity objects in hand already, so why not just:

for entity in entities_to_put:
    entity.save()

If Dataset.delete() takes entities, the same applies (entities already have a delete() method). If it takes keys (only), then it might be useful, if you can reasonably expect that there is a way to have a bunch of keys in hand without also having the corresponding entities.

@silvolu
Contributor Author

silvolu commented Dec 19, 2014

It's mostly for performance reasons. Doing:

for entity in entities_to_put:
    entity.save()

means one RPC for each save. Batching means one RPC for all saves (and the same for deletes).
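
To make the difference concrete, here is a minimal sketch that counts round trips. `FakeConnection`, `save_one_by_one`, and `save_batched` are hypothetical stand-ins invented for illustration and are not part of the library; the only point is that the loop issues one commit per entity while the batched path issues one commit in total.

```python
class FakeConnection:
    """Hypothetical stand-in for a datastore connection; counts commits."""

    def __init__(self):
        self.rpc_count = 0
        self.stored = []

    def commit(self, entities):
        # Each call models one network round trip, however many
        # entities it carries.
        self.rpc_count += 1
        self.stored.extend(entities)


def save_one_by_one(connection, entities):
    """Mimic calling entity.save() in a loop: one commit per entity."""
    for entity in entities:
        connection.commit([entity])


def save_batched(connection, entities):
    """Mimic a batch put: every entity travels in a single commit."""
    connection.commit(list(entities))


entities_to_put = [{"id": i} for i in range(5)]

looped = FakeConnection()
save_one_by_one(looped, entities_to_put)

batched = FakeConnection()
save_batched(batched, entities_to_put)

print(looped.rpc_count, batched.rpc_count)  # prints "5 1"
```

Both paths store the same five entities; only the number of simulated RPCs differs.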

@tseaver
Contributor

tseaver commented Dec 19, 2014

Isn't that what you get from using a transaction? All the adds / saves / deletes get sent as a single RPC.

@silvolu
Contributor Author

silvolu commented Dec 19, 2014

That's meant for a transactional context, where if one fails, they all fail.

@elibixby

I think it makes sense to have this be a mode/flag/subclass of Dataset rather than extra methods, for two reasons: (1) it keeps the interface of the class cleaner, and (2) it adds support for batch mutations that contain a mix of mutation types. Ideally the syntax would be similar to the transaction syntax:

with dataset.batch_mutation() as b:
    for entity in entities_to_put:
        b.put(entity)
    for key in keys_to_delete:
        b.delete(key)

@dhermes
Contributor

dhermes commented Dec 19, 2014

IMO we need to focus on feature parity with the API before we can add nice sugar like that.

As @silvolu says, we need methods for batch puts and deletes.

@tseaver
Contributor

tseaver commented Dec 19, 2014

What is the expectation for error propagation after a non-transactional batch update / delete, where some of the operations failed?

@elibixby

@tseaver There is none. If guarantees need to be made about mutation success, transactions must be used.

@dhermes It's not necessarily sugar. Suppose that, instead of calling connection.commit() directly, dataset writes each mutation to a mutation_pb buffer and immediately flushes that buffer (by calling connection.commit()). A batch_mutator class could then override that behavior and flush the buffer only on context exit. I think there is a lot of user-experience value in having the same method signatures in a different context for batch mutations.

This implementation also works quite nicely for transactions. I have a class hierarchy I made for the Entityless implementation of the library which could easily be adapted, if you would be interested.
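
A rough sketch of that buffer-and-flush idea follows. Every name in it (`FakeConnection`, `Mutator`, `BatchMutator`, `_after_mutation`) is hypothetical and chosen only for illustration; it is not the class hierarchy mentioned above. It also assumes that a batch is dropped if the `with` block raises, which is one possible policy rather than something settled in this thread.

```python
class FakeConnection:
    """Hypothetical connection that records each commit it receives."""

    def __init__(self):
        self.commits = []

    def commit(self, mutations):
        self.commits.append(list(mutations))


class Mutator:
    """Buffers each mutation, then flushes immediately (one commit each)."""

    def __init__(self, connection):
        self.connection = connection
        self._buffer = []

    def put(self, entity):
        self._buffer.append(("put", entity))
        self._after_mutation()

    def delete(self, key):
        self._buffer.append(("delete", key))
        self._after_mutation()

    def _after_mutation(self):
        self.flush()

    def flush(self):
        if self._buffer:
            self.connection.commit(self._buffer)
            self._buffer = []


class BatchMutator(Mutator):
    """Same put/delete signatures, but flushes only when the block exits."""

    def _after_mutation(self):
        pass  # Defer: keep accumulating in the buffer.

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumption: send the batch only if the block raised nothing.
        if exc_type is None:
            self.flush()
        return False  # Never swallow the caller's exception.


immediate_connection = FakeConnection()
immediate = Mutator(immediate_connection)
immediate.put({"name": "a"})
immediate.delete("key-1")

batch_connection = FakeConnection()
with BatchMutator(batch_connection) as b:
    b.put({"name": "a"})
    b.put({"name": "b"})
    b.delete("key-1")

print(len(immediate_connection.commits), len(batch_connection.commits))
```

The immediate mutator commits once per call, while the batch mutator sends its mixed puts and deletes in a single commit, with no change to the calling code beyond the `with` block.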

@tseaver
Contributor

tseaver commented Dec 19, 2014

@elibixby are you saying there is no way to detect the failures in batch mode? Who would use an API like that, and why would we go out of our way to support them?

@elibixby

@tseaver this is fundamentally a property of the datastore, not the API. You could, for instance, make a bunch of batch writes and then do other stuff. Then, when you need to use the data, open a transaction, retrieve all of the writes you made, redo the writes that failed, and operate on your data as normal. This will be faster than simply making all the writes in the transaction, as fewer writes will have to propagate through the datastore (since transactions demand snapshot isolation and strong consistency).

EDIT: to expand on this, transactions are for making mutations that depend on a certain state in the datastore (e.g. updating data based on its current value).

OTOH you may want to make a huge number of mutations that don't depend on the state of the datastore. If you want to add 10,000 entities to the datastore (in a short time, and independent of its current content), it would be EXTREMELY bad to slam the datastore with 10,000 API calls. This is a frequent use case during datastore initialization or cleanup (in testing, for example).
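
One way to avoid 10,000 separate calls is to group the entities into fixed-size chunks and send each chunk as one batch. The sketch below is illustrative only: `chunked`, `put_in_batches`, the `commit` callable, and the batch size of 500 are all assumptions made for the example, not library API and not a limit stated anywhere in this thread.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def put_in_batches(commit, entities, batch_size=500):
    """Call `commit` once per chunk and return the number of calls made."""
    calls = 0
    for chunk in chunked(entities, batch_size):
        commit(chunk)
        calls += 1
    return calls


# `sent.append` stands in for a real batched commit; it just records chunks.
sent = []
entities = [{"id": i} for i in range(10000)]
calls = put_in_batches(sent.append, entities)
print(calls)  # prints 20
```

With 10,000 entities and the assumed batch size of 500, this makes 20 calls instead of 10,000.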

@tseaver
Contributor

tseaver commented Dec 19, 2014

@dhermes we have examples of context-manager-based batching in Storage already:

@property
def batch(self):
    """Return a context manager which defers/batches updates.

    E.g., to batch multiple updates to a bucket::

        >>> with bucket.batch:
        ...     bucket.enable_versioning()
        ...     bucket.disable_website()

    or for a key::

        >>> with key.batch:
        ...     key.content_type = 'image/jpeg'
        ...     key.content_encoding = 'gzip'

    Updates will be aggregated and sent as a single call to
    :meth:`_patch_properties` IFF the ``with`` block exits without
    an exception.

    :rtype: :class:`_PropertyBatch`
    """
    return _PropertyBatch(self)
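
For readers unfamiliar with the pattern, the following is a simplified, self-contained sketch of how such a property batch could behave. It is not the Storage implementation: `Resource`, `PropertyBatch`, `patch_calls`, and `_pending` are hypothetical names, and the real `_PropertyBatch` may work differently. The sketch only illustrates the behavior the docstring describes, namely that updates are aggregated and sent as one call if and only if the `with` block exits without an exception.

```python
class Resource:
    """Hypothetical stand-in for a bucket or key with patchable properties."""

    def __init__(self):
        self.patch_calls = []  # One entry per simulated API call.
        self._pending = None   # Dict while a batch is open, else None.

    def _patch_properties(self, properties):
        if self._pending is not None:
            # A batch is open: record the change instead of sending it.
            self._pending.update(properties)
        else:
            self.patch_calls.append(dict(properties))

    @property
    def batch(self):
        return PropertyBatch(self)


class PropertyBatch:
    """Collects property updates and sends them as one patch on clean exit."""

    def __init__(self, owner):
        self._owner = owner

    def __enter__(self):
        self._owner._pending = {}
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Close the batch first so the final patch is really sent.
        pending, self._owner._pending = self._owner._pending, None
        if exc_type is None and pending:
            self._owner._patch_properties(pending)
        return False  # Never swallow the caller's exception.


resource = Resource()
with resource.batch:
    resource._patch_properties({"content_type": "image/jpeg"})
    resource._patch_properties({"content_encoding": "gzip"})

print(resource.patch_calls)
```

Here the two updates are merged into a single recorded patch call; if the block had raised, nothing would have been sent.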

@tseaver self-assigned this Dec 19, 2014
@silvolu
Contributor Author

silvolu commented Jan 8, 2015

@tseaver with the major refactorings, is this still on the radar?

@dhermes
Contributor

dhermes commented Jan 8, 2015

It's right on the horizon (for one of us at least).

dhermes added a commit to dhermes/google-cloud-python that referenced this issue Jan 8, 2015
This is a precursor to designated put(), delete() and get()
methods in this module (inspired by googleapis#421).
@jgeewax modified the milestone: Datastore Stable Jan 30, 2015
@yoshi-automation added the "triage me" label Apr 6, 2020
@yoshi-automation added the "🚨 This issue needs some love." label Apr 7, 2020