Total counts in GraphQL queries #405
Comments
There is a workaround for this and a reason why it is not automatically included as part of amplify - DynamoDB does not have an efficient way to count items, and can't in general perform aggregate operations. If you want to count items, write a custom lambda to read from the dynamo table stream and conditionally increment or decrement a counter on the table. Add a lambda resolver to read this counter value from the table, and add it to your schema. Because dynamo can't perform aggregate operations, and you potentially want to get item counts after a filter expression is applied, having the cli provide this feature is a tall order. |
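The stream-driven counter described above could be sketched roughly as follows. This is a minimal illustration, not Amplify's or anyone's actual implementation: the `StreamRecord` shape is simplified to the event name only, and `applyDelta` is a hypothetical stand-in for whatever atomic write (for example an UpdateItem with an ADD expression) persists the counter. It only covers an unfiltered total; filtered counts would need one counter per filter.

```typescript
// Simplified, hypothetical shape of a DynamoDB stream record: only the
// event name is needed to maintain an unfiltered item counter.
type StreamEventName = "INSERT" | "MODIFY" | "REMOVE";

interface StreamRecord {
  eventName: StreamEventName;
}

// Net change to apply to the counter for one batch of stream records.
function counterDelta(records: StreamRecord[]): number {
  let delta = 0;
  for (const record of records) {
    if (record.eventName === "INSERT") {
      delta += 1;
    } else if (record.eventName === "REMOVE") {
      delta -= 1;
    }
    // MODIFY leaves the item count unchanged.
  }
  return delta;
}

// Hypothetical handler body: `applyDelta` is an assumed callback that
// atomically adds `delta` to the stored counter.
async function handleBatch(
  records: StreamRecord[],
  applyDelta: (delta: number) => Promise<void>
): Promise<number> {
  const delta = counterDelta(records);
  if (delta !== 0) {
    await applyDelta(delta);
  }
  return delta;
}
```

Computing one net delta per batch keeps the number of counter writes low when many items change at once.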
Yeah, I am aware of DynamoDB's limitations, but asking the developer to implement correct counts is also a tall order. As I can have any combination of filters, I would need to count based on that, and that seems like a huge implementation. A count of items is something most apps use, so while I may be able to develop something that works decently, this feature request really is about either DynamoDB fixing this or Amplify creating some kind of sensible workaround that does not put all the workload on the developer. Thanks for your feedback; the things you outline have been in my thoughts in relation to a workaround, but coming up with a sensible workaround that respects any type of filter on any table seems like something I could spend a week or two developing, and it may still be prone to wrong counts in edge cases. Maybe ElasticSearch has counts of total results, even if there's a limit applied, and this could be included in results as a decent workaround? We use ElasticSearch for most queries, as the default filters are too limited anyway. |
Elastic does provide this information, you can access it in the resolver template with: $ctx.result.hits.total. There is a difficult disconnect between graphql and dynamodb development mindset. Dynamo DB requires planning of most access patterns in the design phase, while graphql makes an iterative approach seem straightforward. While amplify backed by dynamodb does offer some flexibility, it does require more ahead of time planning than other platforms, in this case with aggregates. If you know your aggregates now, and know they are stable, development of lambdas is doable. Solving the general case sounds much more difficult as long as your data is backed by dynamodb. I agree that it would be helpful to have templates to speed up creation of lambda aggregate creators (including backfilling data). |
That sounds promising. Any hints on how I could create a |
Elasticsearch by default returns the total hit count, and by default it's accurate up to around 10,000. So if you make a REST API call in your Lambda you will get this back in the response. Just update your schema to include it as a property. You will likely need a new type to hold the items, nextToken, and total properties. |
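As a rough sketch of the "new type" idea above, the following maps a search response onto a result with items, nextToken, and total. The `SearchResult` type and `toSearchResult` helper are hypothetical names for illustration. It assumes the usual response shape, where `hits.total` is a plain number in Elasticsearch 6.x and an object with `value` and `relation` in 7.x and later; a `relation` of `"gte"` means the total is a lower bound rather than an exact count.

```typescript
// hits.total is a plain number in Elasticsearch 6.x and an object
// ({ value, relation }) in 7.x and later.
type HitsTotal = number | { value: number; relation: "eq" | "gte" };

interface SearchResponse<T> {
  hits: {
    total: HitsTotal;
    hits: Array<{ _source: T }>;
  };
}

// Hypothetical result type mirroring "items, nextToken, and total".
interface SearchResult<T> {
  items: T[];
  nextToken: string | null;
  total: number;
  totalIsExact: boolean;
}

function toSearchResult<T>(
  res: SearchResponse<T>,
  nextToken: string | null
): SearchResult<T> {
  const total = res.hits.total;
  const value = typeof total === "number" ? total : total.value;
  const exact = typeof total === "number" ? true : total.relation === "eq";
  return {
    items: res.hits.hits.map((hit) => hit._source),
    nextToken,
    total: value,
    totalIsExact: exact,
  };
}
```

Exposing `totalIsExact` lets a UI show something like "10,000+" instead of a misleading exact number when the count is capped.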
I do think that In terms of DynamoDB support, I'm not too sure. I mean it would be nice to have, but if you have a large dataset just getting the count on every request will be expensive. Because every time you go over 1 MB of data, you have to paginate through to get totals. In reality, you could scan through a table made up of millions of records before you can get a final result, which obviously makes no sense even in terms of performance. |
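To make the cost concern above concrete, here is a minimal sketch of counting by paging. `fetchPage` is a hypothetical callback standing in for one count-only query (for example a DynamoDB Query with `Select: "COUNT"`), which returns a partial count plus a continuation key whenever the 1 MB limit is hit. The `requests` figure shows how many round trips a single total costs.

```typescript
// Hypothetical page shape: `count` items matched on this page, and
// `lastEvaluatedKey` is present only when more pages remain.
interface CountPage<K> {
  count: number;
  lastEvaluatedKey?: K;
}

// Sums partial counts across pages until no continuation key is returned.
async function countAll<K>(
  fetchPage: (startKey?: K) => Promise<CountPage<K>>
): Promise<{ total: number; requests: number }> {
  let total = 0;
  let requests = 0;
  let startKey: K | undefined = undefined;
  do {
    const page: CountPage<K> = await fetchPage(startKey);
    total += page.count;
    requests += 1;
    startKey = page.lastEvaluatedKey;
  } while (startKey !== undefined);
  return { total, requests };
}
```

Every page still reads the underlying items, so the work grows with the size of the data set rather than with the size of the answer.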
If we were to build this into Amplify in the future with DynamoDB, could you give a bit more detail on some things:
Any other specifics on your requirements or use cases would help us look into this in the future. |
I would say it would have to be the total count of items matching a query. This would be primarily needed for pagination purposes, it's very hard to give a good user experience if we are supposedly showing X results at a time out of Y number of records. This would also need to work with custom indexes, as most likely that's the most efficient setup when listing tables unless we are using searchable. I wouldn't mind if there are additional lambda functions added to the GraphQL Query, however, I think it would be wise if we are able to choose whether to request/run this or not. If we are paginating across 1000 results with 10 results a time, I don't think we need to necessarily run the total length query 10 times, it could be that the first request is enough. Also, there could be instances where I wouldn't care much about the count and as a result wouldn't want to run the lambda function for no reason. Potentially another way to go around this could be to have a 'cache'/dynamo table store for counts; with rules of how long count results may be valid for. I hope that makes sense. |
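The "cache of counts with validity rules" idea above might look something like this in its simplest form. It is an in-memory sketch with hypothetical names (`CountCache`, a string key per query signature); the comment suggests a DynamoDB table as the store, which would follow the same get/set logic with an expiry timestamp per entry.

```typescript
// Hypothetical cache of counts keyed by a query signature, with a fixed
// time-to-live. The clock is injectable so expiry can be tested.
class CountCache {
  private entries = new Map<string, { count: number; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached count, or undefined if missing or expired.
  get(key: string): number | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.count;
  }

  set(key: string, count: number): void {
    this.entries.set(key, { count, expiresAt: this.now() + this.ttlMs });
  }
}
```

With something like this, the expensive count runs on the first page request only, and later pages of the same query reuse the cached value until it expires.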
I agree with @jonmifsud: the main interest for us is to have a count of the filtered query, so we can do proper pagination and show the total results of that query. A full count would be nice, but if you do a normal query, shouldn't that result in a count of all items? Also, a way to return just the count, instead of having the count as part of the result dataset, would be nice. I would also like to be able to choose when to use this and when not to, to avoid extra processes running with load-time and cost overhead. |
You should be able to do pagination using |
Associating the word "impossible" with a query count in 2019 makes me cringe a bit. More than that, it makes me wonder whether selecting Amplify and all its (current) dependents was a very wrong choice. The fact that DynamoDB does not do counts for its queries (besides a full-table estimated count every ~6 hours) is simply a limitation the team working on DynamoDB should solve. Every single competitor to DynamoDB handles this without issues, so I'm sure those smart people can also come up with a solution that benefits not just AppSync and Amplify users, but also people using DynamoDB directly. Maybe the counts would be approximate at millions of items and more precise at thousands, like MySQL / InnoDB, and that would be way, way better than having no clue whatsoever. I am aware that I can paginate using the nextToken, but that paginator is less appealing from a UX perspective, as I can't show 1, 2, 3, 4....12 because I don't know how many pages I have. When someone wants to know how many items we have matching a filter, it cannot be that I have to pull them all out (only the id field) in the leanest way and then count the array client-side? I'm sure AWS compares itself in some ways with other GraphQL services like Prisma etc., and they don't seem to have a problem supporting this. This is a DynamoDB limitation. Attacking a solution on top of that for AppSync is the wrong angle; this needs to end up on the table of the AWS DynamoDB developers so they can come up with a sensible solution nearer to the root of the problem. Everything else is a hack. Asking me to keep counts in a model/table myself when things update is even worse and not what you'd expect of a platform with an otherwise impressive feature set. And if it's not possible for DynamoDB to solve this, then the Amplify / AppSync team should start considering built-in support for other major database players such as MongoDB, MySQL, Postgres, etc., so they are not being held down by a half-baked database that is backing the entire thing. But when that is considered, I am sure it looks way more interesting to just figure out a solution to counts and the other minor limitations DynamoDB currently has. |
@undefobj What I would like to see to support this type of feature is to use Kinesis Firehose and Glue to send data into Redshift or S3 Parquet. Then I could connect to Athena as a serverless analytics system, query Redshift for a non-serverless solution, or have another Lambda pick up the objects from S3 and send them into Postgres. Amplify is well placed here, as the api category can pick up on schema changes to re-run Glue, Amplify can make life easier by creating a new Firehose for each table, and it can set up the Lambdas to put DynamoDB stream data into Firehose. I realise it's a big ask, but the question can be generalised into "how can Amplify better support analysis workloads?" I see a smaller and faster win in just providing an aggregation template Lambda to deal with uncomplicated counters. I would not like aggregation counters enabled by default, as I don't want to pay for what I don't use. |
We do support Kinesis Streams in the Analytics category today: https://aws-amplify.github.io/docs/js/analytics#using-amazon-kinesis This seems to be an ask independent of the issue in this thread, though, as it is related to ingest. Total counts in a decoupled system wouldn't be accurate for pagination against DynamoDB, as you'd run into consistency issues. For the analytics requests, I think this would be a good ask in a separate Feature Request issue. |
I wasn't as interested in strong consistency or using it for pagination, but the total count calculation and getting aggregations in general. The ask seems to be about bringing other database system capabilities into amplify, which is where ddb stream -> firehose would come in handy as a building block. |
This got sidetracked by different ideas, but I'd like to know if we can expect total counts for at least Elasticsearch based queries soon? @undefobj |
It seems like aws-amplify/amplify-cli#2600 made aws-amplify/amplify-cli#2602 happen, so here's to hoping that PR gets accepted fast. |
@houmark isn't that PR related to For example, a user has many books in their library, and I want to show on their profile page that how many books are there in their library. I don't want to load all of their books, just to count it. It might be thousands. |
Yeah, I first thought that PR would solve it due to the original code that was halfway baked in at the time. But that PR changed to just show the total of the returned items which is more or less useless because it's very easy to count the amount of results returned client side. I don't think the Amplify team is working on a total results value due to the limitations and complications of passing through |
@houmark I wouldn't call it "useless" since if you're getting paginated results say 10 results at a time, you might want to get the total hits (if you have more than 10 results) to display information like the number of pages on your UI. |
@kaustavghosh06 But how can I do that? I get 10 when I If I try to pull out 1000 then I hit another limitation which is the result set being too large (don't remember the exact error, but it errors, and if I query a lot but not too much to hit the limit, then it takes, of course, many seconds for result which in a UI leads to racing situations, and I cannot cancel queries due to another limitation, which has a PR but that PR has been stuck now for a good long time). Anyways, If I have 3k items, then it will still give me 1000 in Am I missing something here? |
@houmark Do you have @auth on @searchable? |
I'll speak to @SwaySway about this behavior. |
That was what I expected also, but this changed in the PR because it would leak data about the total results if See his comment here: aws-amplify/amplify-cli#2602 (comment) |
If you have records that are in the 15,000 or less range, a potential solution would be to grab the first set of results and do a "pagination" query on a separate GSI for the whole set of records Won't that Query for the whole set of records be expensive? Then do 1 query for all the results. 1 record would only be about 75 bytes and 1 query can return up to 1MB. 1MB / 70 bytes is about 14-15K records. Now you have a whole set of all the keys with indexes and when you want to query a specific page, just grab the PK/SK of the page index from your "pagination" query @naseer036 could this work for you? |
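The arithmetic and the page lookup in the suggestion above can be sketched as follows. The function names are hypothetical, and the sketch assumes the key-only "pagination" query has already been loaded into an ordered array. The start key for a page is the last key of the previous page, which is what a DynamoDB query expects as its exclusive start key.

```typescript
// Rough number of key-only records that fit in one ~1 MB query response.
function recordsPerResponse(
  bytesPerRecord: number,
  responseBytes: number = 1000000
): number {
  return Math.floor(responseBytes / bytesPerRecord);
}

// Number of pages needed to show `totalKeys` items, `pageSize` at a time.
function pageCount(totalKeys: number, pageSize: number): number {
  return Math.ceil(totalKeys / pageSize);
}

// Given all keys in query order, returns the key to pass as the exclusive
// start key for a 1-based page number, or undefined for the first page.
function startKeyForPage<K>(
  keys: K[],
  page: number,
  pageSize: number
): K | undefined {
  if (page <= 1) {
    return undefined;
  }
  const index = (page - 1) * pageSize - 1; // last key of the previous page
  return keys[index];
}
```

With these assumptions, `recordsPerResponse(70)` gives 14285 and `recordsPerResponse(75)` gives 13333, which brackets the "14-15K" estimate quoted above.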
I could be mistaken but I thought Elasticsearch returns
|
Yes @duwerq - it does and that's what I've been using for a couple of years - works brilliantly. But as I said before, you need to implement your security (auth) model in the request resolver to ensure your response resolver is giving accurate totals. |
Okay, so you’re saying that if you rely on the auth to filter records in the VTL request, it will give an inaccurate total. Are you suggesting that any auth params also be passed into the must match of Elasticsearch? |
Hi @duwerq - I was trying to say the same thing - must have miscommunicated. I mean you can't rely on auth in your GraphQL schema - you need to make sure the request resolver (VTL) filters based on the same parameters as your auth. I use a lot of dynamic auth, for example, so I make sure the Elasticsearch query is exactly the same, so that the response resolver totals are the same (and also that search is not returning anything it shouldn't). By the way, I find auth pretty unreliable across the board - particularly dynamic auth - so I implement security in the response resolvers for DynamoDB too, to accurately return what I need. For example, subscriptions don't (or didn't) work with dynamic auth. This brings up another point: security in the DynamoDB resolvers is implemented in the response resolvers, and they use a for loop for that. There is an inherent limitation in AppSync of 1000 iterations in a for loop. So any count operation you would do with AppSync on DynamoDB is going to have to paginate 1,000 records at a time. |
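One way to read the advice above: put the auth condition into the search request itself, so the total that comes back already reflects what the caller may see. The sketch below builds such a request body. The field names `title` and `owner.keyword` are assumptions for illustration, as is the idea that auth is owner-based; a real resolver would mirror whatever rules the schema's auth actually uses.

```typescript
interface SearchArgs {
  text: string;
  from: number;
  size: number;
}

// Builds a search request body where the owner restriction is part of the
// query, so hits.total only counts documents the caller is allowed to see.
// `title` and `owner.keyword` are hypothetical field names.
function buildSearchBody(args: SearchArgs, owner: string) {
  return {
    from: args.from,
    size: args.size,
    // Ask for an exact total instead of the default capped count (ES 7+).
    track_total_hits: true,
    query: {
      bool: {
        must: [{ match: { title: args.text } }],
        filter: [{ term: { "owner.keyword": owner } }],
      },
    },
  };
}
```

Filtering in the request rather than trimming the response afterwards also avoids leaking the existence of documents through the total.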
This thread has now got the length where posters are repeating points covered way way way back in time. Can we either kill this one off or at least give us a glimmer of hope that it might actually get some response from the Amplify Core team? Leaving it just hanging like this for so many years is just painful. |
I'm stopping dev on amplify about 5 days in for my project because of this. It's a basic function of any db, and the lockin with Dynamo isn't worth these issues. Very valuable info thanks 🙏 |
Hi folks - I wanted to follow up on this thread to give you all more insight into where we're at with this issue and many other GraphQL transformer issues. Over the past several months, we've been laying the foundation to drastically accelerate our GraphQL feature & enhancement velocity. The new architecture is detailed in our GraphQL Transformer vNext RFC: aws-amplify/amplify-cli#6217. More updates will come in the next two months on a preview version for broader public testing. Once we've delivered on that architecture revamp, we'll be looking into this issue as the top-most feature request priority. As highlighted in prior comments, this problem is hard, especially considering scale, by-default best-practice security, and cost-efficiency, but we are determined to solve it. I also want to thank the entire community here for your passion and continuous feedback on this issue. Your energy and feedback are what motivated us to revamp our GraphQL transformer in order to deliver features faster and provide more customizability to help make you more successful. |
If someone is having an issue with pagination and filters and is looking for a backend as a service, it would be a good idea to look at Strapi. I was able to get my backend running in 1 day with GraphQL, with almost every filter you can think of and with pagination, including go-to-page. |
Also checkout Nhost.io, we are using that as our backend with some of our new frontend clients in Amplify and love it |
Hey folks! We've included the In the new GraphQL transformer, we add a user authorization filter in the OpenSearch query request. (In the past, filtering on the OpenSearch result set happened after the fact.) Try the PREVIEW (do not use this in your production environment) with the following instructions: aws-amplify/amplify-cli#6217 (comment) |
But what about mentioning these issues with @searchable? I guess they are almost all related to costs and they did not go away, simply by closing them: • aws-amplify/amplify-cli#3860 (closed for some reason) Based on that it would seem that @searchable costs 70 dollars per month for each environment in the cloud, which is not quite friendly for developers who are working in small agencies, or devs who are creating just MVPs. (Even serious solo developers prefer to use at least 2 envs: dev & prod which would cost 140$ per month in total - is that correct, please? Well, I am not sure what my question should look like... but really, we should pay AWS 140$ per month if we just need to get counts from the database in 2 environments in 2021 to run ES/Opensearch? Just for testing MVP in the real world? Or should we fork the Amplify like this: https://github.com/starpebble/amplify-cli to target free Elastic instances, or modify it to ignore @searchable, or even route calls to potentially target different, more dev-and-cost-friendly services, like Algolia? e.g. In Firestore (I guess it's one of the many Amplify competitors), you can use transactions & distributed counters - e.g. whenever you create 'Post', a property called 'totalPosts' will be updated(increased) somewhere else (e.g in the owner entity and/or in the table that hosts data for all Posts). The same update(decrease of the counter) can occur on delete operation. From 2018 - https://aws.amazon.com/blogs/aws/new-amazon-dynamodb-transactions/, we can use transactions also with DynamoDB + from 2019, also in Appsync. But for some obviously secret reason, if you use
command to run the API emulator locally, you are not allowed to use either transactWrite items, or batchWrites - features that were announced 3 years ago by AWS itself. To my surprise, the pull-request that solves that was created 1 year ago - and it's still not merged. • aws-amplify/amplify-cli#5574 Also, @searchable is not compatible (yet) with mocking API locally, on the local machine: • aws-amplify/amplify-cli#5981 (also, closed) Well, there is this RFC: aws-amplify/amplify-cli#7546 (with the last comment on Jul 31) - but the problem is, that it's still RFC. I am convinced that local mocking that works somehow with @searchable should be a part of @searchable from the start. In preview - aws-amplify/amplify-cli#6217 - that you are also mentioning above, there is no mention of transactWrite, nor batchWrite that could be used for such counters. Or am I missing something? |
@majirosstefan - All the things you mentioned are still on our radar. The new GraphQL Transformer is our first step. Many of the @searchable RFC items are actually included as part of the preview. A more efficient VTL-based approach is still on our radar, and we'll be working on it after the launch of the new GraphQL Transformer. While the Amplify team can't directly change the cost structure of the OpenSearch service, we plan on allowing local mocking of OpenSearch to help customers test locally before deciding to deploy. Regarding #165, we've now added warnings telling customers to follow the OpenSearch-provided best-practice guidelines for production configurations. |
@renebrandel it might be useful to make an option such that we don't spin up new opensearch instances automatically when creating new environments...basically just redirect all the queries to "not-implemented in this environment" or something...this would save costs when many environments are being spun up for testing or development purposes...and again, the ideal way to do it would be to allow for putting all environments opensearch on one instance...I would rather have one huge open search instance than many small ones from a cost perspective... |
To those still interested in this issue, I've made a package that provides a |
so its not really clear, is it based on scans? |
Currently it uses scans, which I believe is also how |
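As of transformer v2, things have changed: a new directive has been implemented to add this functionality. Please refer to the official docs: https://docs.amplify.aws/vue/build-a-backend/graphqlapi/search-and-result-aggregations/ |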
What do you mean? Which one? @searchable ? |
Hi there,
I need to unsubscribe from these emails, as Stephen has passed away.
I don't know how login for GitHub, so could you please help me unsubscribe
him?
Any help would be appreciated.
Kind Regards
Megan Hagar (wife of Stephen)
…On Wed, 1 May 2024, 12:49 am biller-aivy, ***@***.***> wrote:
As of transformer v2, things has changed, a new directive has been
implemented to add this functionality. Please refer to the official docs
https://docs.amplify.aws/vue/build-a-backend/graphqlapi/search-and-result-aggregations/
What do you mean? Which one? @searchable <https://github.com/searchable> ?
This is not really a solution. I understand that this is an dynamoDB
problem instead an amplify problem.
|
Hi Megan, I'm truly sorry for your loss. Our hearts go out to you and your family during this difficult time. I'll do the best of my ability to turn off your email notifications from this thread but it'll likely not affect all other threads' notifications. However, GitHub does provide a policy in case a user passes away. This might offer a solution for unsubscribing from GitHub entirely. Please know that I'm here to support you in any way I can. Warm regards, |
Is your feature request related to a problem? Please describe.
I think it's way past due that Amplify supports total counts in GraphQL queries. Other GraphQL-based platforms have this built in: by simply adding totalCount (or similar) to the query, you get back the total count in addition to the (filtered) data, no matter the limit.
Describe the solution you'd like
This should at least work for DynamoDB backed models and of course also for search based queries that pass by ElasticSearch.
Describe alternatives you've considered
Making a Lambda function that is a field in each model using the @function directive. But since we are using both listItems and searchItems with filters added, the implementation is not simple, as we have to reapply those filters in the Lambda function to get the correct count.
Making custom resolvers seems like another not very fun route, and not very scalable or maintainable. Once again, this should be an "out of the box one-liner" available to the developer. With either a Lambda or some other custom resolver I'm looking at hours or days of development.
Additional context
This is a must-have feature and there's not really any workaround for displaying total counts for systems with many items, at least that I know of. I read several bug reports, but none of them seems to have a simple solution. That it has not yet been developed by AWS is beyond my understanding, as pulling out counts is one of the most common things to do when developing web apps.