Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feels like we've added a bit more complexity to the datastore API, comments? #647

Closed
jgeewax opened this issue Feb 15, 2015 · 10 comments
Closed
Assignees
Labels
api: datastore Issues related to the Datastore API. type: question Request for information or clarification. Not an issue.

Comments

@jgeewax
Copy link
Contributor

jgeewax commented Feb 15, 2015

Hi guys,

I was reviewing the "state of the world" of the datastore API, and found a few things that seem (to me at least) a bit more complicated than they should be.

A bit of this comes from the authentication stuff (which we have another open bug for) but I wanted to bring up a few things for discussion... If this issue gets out of control, we can move the discussion elsewhere, but I wanted to kick things off:

Create a new Person entity in the Datastore

What I have to write

from gcloud import datastore
datastore.set_defaults()
entity = datastore.Entity(key=datastore.Key('Person'))
entity.update({'name': 'Jimmy'})
datastore.put([entity])

What I'd like to write

from gcloud import datastore
entity = datastore.Entity('Person')
entity.update({'name': 'Jimmy'})
entity.put()

Key differences

  1. no set_defaults() (there's another issue open for this I believe)
  2. Create an entity of kind Person without lots more typing (or maybe we get the entity with datastore.Kind('Person').get_entity() ?)
  3. .put not requiring a list for a single entity

Get an existing Person (123) from the Datastore

What I have to write

from gcloud import datastore
datastore.set_defaults()
entity = datastore.get(datastore.Key('Person', 123))

What I'd like to write

from gcloud import datastore
datastore.set_defaults()
entity = datastore.Kind('Person').get_by_id(123)

Key differences

  1. No set_defaults
  2. Equivalent of "drilling down" into a "table" (kind), and then a "row" (id)

(Not hugely concerned about this use case, but wanted to toss it out)

Create a new Person using a specific set of credentials (/home/creds.json) to talk to a specific dataset ('my-dataset') in the Datastore

(Being looked into in PR #641)

What I have to write

(This is a best guess... I don't know if this code is even right... And even so, this code won't do the same thing if you ran it on App Engine because of the order of evaluation here: http://googlecloudplatform.github.io/gcloud-python/latest/gcloud-api.html#gcloud.credentials.get_credentials )

import sys
sys.env['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/creds.json'

from gcloud import datastore
datastore.set_defaults()
entity = datastore.Entity(key=datastore.Key('Person', dataset_id='my-dataset'))
entity.update{'name': 'Jimmy'})
datastore.put(entity)

What I'd like to write

from gcloud import datastore
datastore.set_credentials('/home/creds.json')
dataset = datastore.Dataset('my-dataset')
entity = dataset.Entity('Person')
entity.update({'name': 'Jimmy'})
entity.put()

Key differences

  1. Simple in-code way to set which credentials file to use
  2. High-level concept of a dataset
  3. Datasets can provide Entities just like datastore would

/cc @tseaver @dhermes @pcostell

@jgeewax jgeewax added type: question Request for information or clarification. Not an issue. api: datastore Issues related to the Datastore API. labels Feb 15, 2015
@jgeewax jgeewax added this to the Core Stable milestone Feb 15, 2015
@dhermes
Copy link
Contributor

dhermes commented Feb 15, 2015

@jgeewax We spent a lot of work to get rid of the high-level concept of dataset. The philosophy was that we should make the objects and their constructors easier to use and that "learning" another class was overkill when the dataset was a) just a string and b) something that was a single-valued constant for most users.

@jgeewax
Copy link
Contributor Author

jgeewax commented Feb 15, 2015

That's going to be a really big problem when Datastore exposes multiple datasets per project...

@elibixby
Copy link

  1. I think that the dataset was a good higher-level concept, but it kinda got bulldozed when the major restructuring happened. You basically already have a dataset module, which is api. I think it's better to construct a dataset from datastore and leave datastore for meta-methods like set_defaults() now.
  2. I really really don't think you should be adding functionality to Entity. It inevitably ends up moving towards ORM which inevitably makes the library more and more heavyweight, and duplicates use-cases from NDB and DB. If anything I think the easiest way forward is to simplify entity. The most lightweight syntax I can think of is:

SETUP

datastore.set_defaults() //does auth and connection stuff
d = datastore.Dataset('dataset-id', **kwargs) //more advanced options like read_options should be put here since it is a one time operation

Put a single entity

key = d.put(Key(*path), Entity(**props))

Put a bunch of entities

d.put({key1: entity1, key2: entity2 ...})

Get an entity

d.get(Key(*path))

A couple comments about "what I'd like to write":

  • I think entity.put() is confusing. Where is the entity going? Is put really functionality of an entity, or functionality of a dataset on an entity and key. Why is
ent = dataset.Entity(key)
ent.update(props)
ent.put()

preferable to

ent = Entity(props)
dataset.put(key, ent)

I am completely boggled by this.

  • I also think that datastore.Kind('kind').getById(123) is really really scary. We shouldn't be encouraging people to think of this as relational database. It isn't. We also shouldn't be encouraging them to make 'kind' queries when they know the id, not only is it confusing, it's inefficient.

Finally, as an aside, I submitted a PR to remove the list wrapping requirement for single arguments #623 , but if you want to support python 27 and 26 then you need to use **kwargs and a parser rather than explicit named arguments. Dunno if it's worth it. Python 3 supports def func(arg, *args, kw=value, **kwargs) syntax.

@elibixby
Copy link

@jgeewax just wanted to add that I agree with the premise of this issue. I'm writing the samples for the landing page as we speak, and it's awkward to say the least.

@jgeewax
Copy link
Contributor Author

jgeewax commented Feb 15, 2015

Re "why .put()"

It's similar to the ext.db API. This is a matter of preference. See https://cloud.google.com/appengine/docs/python/datastore/modelclass#Model_put

profile = datastore.Entity('Profile').get_by_id(123)
profile['likes_fruit'] = False
profile.put()

Re "why .get_by_id()"

It's a helper-method for ultimately constructing a key which reads more clearly (to me). It is also similar to ext.db which provides a get_by_id method. Again, this is a matter of preference.

profile = datastore.Entity('Profile').get_by_id(1234)
# is the same as
profile = datastore.get(Key('Profile', 1234))

Re the comment about querying

We shouldn't be encouraging people to think of this as relational database. It isn't.

Nowhere here are we talking about relational data. get_by_id is a shortcut for get(Key(<kind already known>, id)). See https://cloud.google.com/appengine/docs/python/datastore/modelclass#Model_get_by_id

We also shouldn't be encouraging them to make 'kind' queries when they know the id, not only is it confusing, it's inefficient.

I don't see a query here, nor do I see anything related to efficiency. The code I'm talking about would likely look like...

class Entity(object):
  ...
  def get_by_id(self, id):
    return datastore.get(Key(self.kind, id))

@elibixby
Copy link

I thought the goal was to move away from the DB implementations, not take queues from their implementation. If the goal is to cover different use cases the syntax is going to be different.

RE: kind(foo).get_by_id(bar) I mistakenly assumed from your comment

Equivalent of "drilling down" into a "table" (kind), and then a "row" (id)

that this was how you meant it, and then it would be implemented by pulling the entities of a certain kind and searching for the right id.

EDIT: If you really wanted to add extra sugar on top of key, you could do something like

dataset.key(parent=parent_key, kind='kind', id='id')

@jgeewax
Copy link
Contributor Author

jgeewax commented Feb 15, 2015

I don't recall ever saying or hearing "we're going to avoid everything done in ext.db"...?

I think the .put() style on an Entity is a nice thing. I don't have to know anything about the connection, dataset, or authentication -- and .put() will just "save the entity the way it was retrieved". This might seem obvious now, but when multiple datasets are a thing, and different "connections" are floating around, it might be important. It means an Entity on it's own can be enough to persist data.

I could be wrong here. Either way, I don't think we should take a hard-line stance of "no stuff that was in ext.db"...

@elibixby
Copy link

I think that is going to be more of a trouble than a convenience TBH. To quote Zen of Python explicit is better than implicit.

Before the major restructuring Entity had all the necessary information (dataset, key, etc) to have all this functionality on it's own, and usage was much more cumbersome IMO. It was the point where I would rather just have used NDB, since it felt like I was being forced to do all the ORM stuff anyway.

@dhermes
Copy link
Contributor

dhermes commented Mar 9, 2015

@jgeewax Can we close this out?

The core issues were

  • Fully implicit / lazy auth
  • Dataset class

Both of those have been implemented.

Anything remaining can be broken off into a smaller more digestible bug for discussion?

@jgeewax
Copy link
Contributor Author

jgeewax commented Mar 11, 2015

SGTM

@jgeewax jgeewax closed this as completed Mar 11, 2015
vchudnov-g pushed a commit that referenced this issue Sep 20, 2023
…#647)

* docs: added google.api.field_behavior for some fields in audio_config

PiperOrigin-RevId: 546447788

Source-Link: googleapis/googleapis@370fc4c

Source-Link: googleapis/googleapis-gen@f886e2a
Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiZjg4NmUyYTJiYTFlYjg3YTA4YjY5MTI3ZmFhOWY4ODJmYTA4Mzc2YSJ9

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* docs: added google.api.field_behavior for some fields in audio_config

PiperOrigin-RevId: 546447860

Source-Link: googleapis/googleapis@52535ce

Source-Link: googleapis/googleapis-gen@14c79ff
Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiMTRjNzlmZjdjYzI4MDM3YTk5MjJkN2UyMjEzMWViY2I1ODdlMTE2NiJ9

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the Datastore API. type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

No branches or pull requests

3 participants