64bits integer property are lost #1563

Open
michellemay opened this issue Apr 27, 2021 · 5 comments

michellemay commented Apr 27, 2021

Some code to reproduce:

Create the class with rowid as an integer:


import weaviate

client = weaviate.Client("http://localhost:8080")  # or wherever your Weaviate instance is running

schema = {
  "classes": [
    {
      "class": "Object",
      "vectorizer": "none",
      "properties": [
        {
          "dataType": [ "int" ],
          "name": "rowid"
        }
      ]
    }
  ]
}

print(client.schema.create(schema))

Add some content:

import requests

doc1 = {
    "class": "Object",
    "vector": [1.0, 2.0],
    "properties": { "rowid": 16040291 }
}
doc2 = {
    "class": "Object",
    "vector": [3.0, 4.0],
    "properties": { "rowid": 1604029199751626752  }
}
data = { 'objects': [doc1, doc2] }
result = requests.post('http://localhost:8080/v1/batch/objects', json=data)
print(result.content)

b'[{"class":"Object","creationTimeUnix":1619548348443,"id":"1bf45658-037b-44db-a4d6-b338d214301b","properties":{"rowid":16040291},"vector":[1,2],"deprecations":null,"result":{}},{"class":"Object","creationTimeUnix":1619548348442,"id":"ae9aff5f-59ec-4a9f-86d8-cc48443b59dc","properties":{"rowid":1604029199751626752},"vector":[3,4],"deprecations":null,"result":{}}]\n'

Query the vector:

near_vec = {"vector": [1.0, 3.0] }

res = client \
    .query \
    .get("Object", ["rowid", "_additional {certainty}"]) \
    .with_near_vector(near_vec) \
    .with_limit(5) \
    .do()

results = res["data"]["Get"]["Object"]

for result in results:
    print(result)

will output:

{'_additional': {'certainty': 0.99497473}, 'rowid': 16040291}
{'_additional': {'certainty': 0.97434163}, 'rowid': None}

@etiennedi
Member

Thanks for discovering this issue. We'll investigate and get back to you.

@etiennedi etiennedi added the bug label Apr 28, 2021
Member

etiennedi commented Apr 28, 2021

I can confirm this is a bug in Weaviate itself, not in the client used. The 64-bit int value is present in the REST API response, but not in the GraphQL response. Will investigate.

@etiennedi
Member

etiennedi commented Apr 28, 2021

What we've learned so far:

  • The GraphQL spec defines Int as int32 (a signed 32-bit integer) and explicitly does not support int64.
  • In line with the spec, the graphql-go implementation ignores any int64 value that is outside the bounds of an int32. It simply returns nil in that case, which translates to the None value you're seeing in the Python client.
  • It is possible to add custom scalar types to GraphQL, but we have yet to check what effect that has on standard tooling, which might not understand those custom types.
  • A short-term workaround that I saw mentioned in many GraphQL issues is to use a string instead.
  • There are a few npm packages for standard GraphQL tooling, e.g. Apollo; however, even they support at most an int53 or an experimental, not widely supported int63 type (that's a combination of letters and numbers I never expected to type), as that's the JavaScript limitation. Since a ton of GraphQL tooling is built in JS/NodeJS, it seems there is no standard way to get an actual int64 into GraphQL without breaking most of the tooling. This means the "use a string instead" recommendation, which is also explicitly mentioned in the GraphQL spec, really seems the best option.
  • At the very least we need to highlight this limitation in our docs (cc @laura-ham).
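
To make the "use a string instead" workaround concrete, here is a minimal Python sketch. It assumes the property would be declared with a "string" dataType in the schema rather than "int". The helper names (`fits_graphql_int`, `encode_rowid`, `decode_rowid`) are hypothetical and not part of Weaviate or its Python client.

```python
# Bounds of the GraphQL Int scalar (signed 32-bit), as described in the list above.
GRAPHQL_INT_MIN = -(2**31)
GRAPHQL_INT_MAX = 2**31 - 1


def fits_graphql_int(value: int) -> bool:
    """Return True if the value survives a GraphQL Int field unchanged."""
    return GRAPHQL_INT_MIN <= value <= GRAPHQL_INT_MAX


def encode_rowid(value: int) -> str:
    """Hypothetical helper: serialize a 64-bit id as a decimal string before import."""
    return str(value)


def decode_rowid(text: str) -> int:
    """Hypothetical helper: turn the string read back from a query into an int."""
    return int(text)


# The two rowid values from the original report.
for rowid in (16040291, 1604029199751626752):
    round_tripped = decode_rowid(encode_rowid(rowid))
    print(rowid, fits_graphql_int(rowid), round_tripped == rowid)
```

The first value fits into a GraphQL Int and the second does not, which matches the None seen in the original report. The string round trip is lossless for both, at the cost of losing numeric filtering on that property.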

@samos123
Contributor

I can't seem to get this to work with pure REST API either:

curl -X POST -H 'Content-Type: application/json' -d '{
      "class": "Article",
      "properties": {
          "title": "Large int64",
          "wordCount": 9223372036854775807
      }
  }' http://localhost:8080/v1/objects
{"class":"Article","creationTimeUnix":1669085839830,"id":"dc420c7f-8962-4ddf-a587-8ee5a873b1c2","lastUpdateTimeUnix":1669085839830,"properties":{"title":"Large int64","wordCount":9223372036854775807}}

curl http://localhost:8080/v1/objects/dc420c7f-8962-4ddf-a587-8ee5a873b1c2 | jq
{
  "class": "Article",
  "creationTimeUnix": 1669085839830,
  "id": "dc420c7f-8962-4ddf-a587-8ee5a873b1c2",
  "lastUpdateTimeUnix": 1669085839830,
  "properties": {
    "title": "Large int64",
    "wordCount": 9223372036854776000
  },
  "vectorWeights": null
}

Notice how the stored value 9223372036854775807 became 9223372036854776000 when read back with a GET request to the REST API. I suspect the int isn't really stored as an int64, or that somewhere along the way it becomes a float64?

This is my schema:

{
    "class": "Article",
    "description": "A description of this class, in this case, it is about authors",
    "properties": [
        {
            "dataType": [
                "string"
            ],
            "name": "title"
        },
        {
            "dataType": [
                "int"
            ],
            "name": "wordCount"
        },
        {
            "dataType": [
                "string"
            ],
            "name": "content"
        }
    ]
}

Member

aliszka commented Nov 24, 2022

Thanks, @samos123, for reporting the issue and providing a reproducible example.
I can confirm that Weaviate returns rounded numbers, and your suspicion is correct. While int64 values are stored correctly, it is the reading/unmarshalling process that converts them to float64. Those converted values are then marshalled (ending up rounded) and sent back in the responses.
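
The rounding described above can be reproduced outside Weaviate. The following is a small Python sketch, not Weaviate's actual code: it uses `json.loads` with `parse_int=float` as a stand-in for a decoder that reads every JSON number as a float64. The function name `round_trip_via_float64` is made up for illustration.

```python
import json


def round_trip_via_float64(value: int) -> int:
    """Convert an integer to a float64 and back, as a lossy decoder would."""
    return int(float(value))


raw = '{"wordCount": 9223372036854775807}'

# Python's default JSON decoder keeps integers exact.
exact = json.loads(raw)["wordCount"]

# Forcing integers through float mimics unmarshalling numbers as float64.
as_float = json.loads(raw, parse_int=float)["wordCount"]

print(exact)                          # the value as it was sent
print(as_float)                       # nearest float64 to that value
print(round_trip_via_float64(exact))  # no longer equal to the original
```

A float64 has a 53-bit significand, so integers up to 2**53 in magnitude survive the round trip and larger ones may be rounded to the nearest representable value.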
