Improve read performance by using stale reads #1994
Comments
A quick test with the 15s staleness shows very similar numbers ... |
There is an unfortunate implementation detail: a transaction sends a begin-transaction request, followed by your get-document requests. Effectively, that means a transaction sends multiple requests instead of the single request that a regular get sends. We are looking to improve this.
The v1 FirestoreClient allows complete access to the communication protocol, including the ability to set
Thank you for the question. Interest in features like this from the developer community helps inform priorities for SDK development. I will be sure to pass this on. Feel free to tell us why this is important. |
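To make the round-trip point concrete, here is a minimal sketch of a stale read issued through a read-only transaction. The `DbLike` and `TxLike` interfaces and the `staleGet` name are hypothetical stand-ins so the sketch compiles without the SDK installed. The `readOnly` and `readTime` option names are assumed to match the read-only transaction options of `@google-cloud/firestore` and should be checked against the SDK version in use.

```typescript
// Structural stand-ins (hypothetical) so this sketch compiles without the SDK.
interface ReadOnlyTxOptions<TTime> {
  readOnly: true;
  readTime: TTime;
}

interface TxLike<TRef, TSnap> {
  get(ref: TRef): Promise<TSnap>;
}

interface DbLike<TRef, TSnap, TTime> {
  runTransaction<T>(
    updateFunction: (tx: TxLike<TRef, TSnap>) => Promise<T>,
    options: ReadOnlyTxOptions<TTime>
  ): Promise<T>;
}

// Reads a single document as of `readTime` inside a read-only transaction.
// As described above, the transaction begins first and then issues the get,
// so this costs more than one request.
async function staleGet<TRef, TSnap, TTime>(
  db: DbLike<TRef, TSnap, TTime>,
  ref: TRef,
  readTime: TTime
): Promise<TSnap> {
  return db.runTransaction(tx => tx.get(ref), {readOnly: true, readTime});
}

// With the real SDK the equivalent call would presumably look like this
// (assumption, not verified here):
//   firestore.runTransaction(tx => tx.get(docRef), {readOnly: true, readTime});
```

Keeping the database behind a small interface also makes it easy to count requests with a fake in a unit test.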
@tom-andersen Thanks for the provided details 👌 The reason I'm asking is that we are looking into this particular technique for a latency-sensitive service where we want to reduce latency even further. We have already looked into and adopted techniques like caching, optimizing business logic, etc.
--
I could imagine the following designs for such a native read-time feature:
const firestore = getFirestore();
firestore.settings({
  readTime: Timestamp.fromDate(),
});
(For use cases where you'd want all requests to query at a particular point in time. This would be useful for data recovery scripts, so that the read time does not have to be redefined for every request.)
and/or:
getFirestore()
  .doc('foo/bar')
  .get({
    readTime: Timestamp.fromDate(maxDataStaleness),
  })
getFirestore()
  .collection('foo')
  .where('bar', '==', true)
  .get({
    readTime: Timestamp.fromDate(maxDataStaleness),
  }) |
I've quickly implemented a version of this and ran some tests (10k requests) in a Cloud Shell:
Test script
import {Firestore, Timestamp} from '@google-cloud/firestore';
import {createHistogram, performance} from 'perf_hooks';

async function run() {
  const firestore = new Firestore({
    projectId: '<project>',
  });
  const histogram = createHistogram();
  for (let i = 0; i < 10000; i++) {
    const start = performance.now();
    // Read the data as it was 15 seconds ago.
    const maxDataStaleness: Date = new Date(
      new Date().getTime() - 15 * 1000
    );
    // get({readTime}) is the option from the quick implementation mentioned above.
    await firestore
      .doc('always/the/same/document')
      .get({
        readTime: Timestamp.fromDate(maxDataStaleness),
      });
    const end = performance.now();
    // record() only accepts integers >= 1, so clamp sub-millisecond results.
    histogram.record(Math.max(1, Math.round(end - start)));
  }
  console.log('min', histogram.min);
  console.log('max', histogram.max);
  console.log('mean', histogram.mean);
  console.log('stddev', histogram.stddev);
  console.log('exceeds', histogram.exceeds);
  console.log('percentiles', histogram.percentiles);
}

run(); |
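The script above computes the read time inline as "now minus 15 seconds". A small helper makes the staleness an explicit parameter, which is convenient when comparing different values such as 15s and 60s. This is only a sketch: `staleReadDate` is a hypothetical name, not part of any SDK.

```typescript
/**
 * Returns the point in time that lies `maxStalenessMs` milliseconds before `now`.
 * `staleReadDate` is a hypothetical helper, not an SDK function.
 */
function staleReadDate(maxStalenessMs: number, now: Date = new Date()): Date {
  if (!Number.isFinite(maxStalenessMs) || maxStalenessMs < 0) {
    throw new RangeError('maxStalenessMs must be a non-negative finite number');
  }
  return new Date(now.getTime() - maxStalenessMs);
}

// Intended use with the SDK's Timestamp type:
//   const readTime = Timestamp.fromDate(staleReadDate(15_000));
```

Passing `now` explicitly keeps the helper deterministic in tests.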
Okay, I quickly ran another test that picks a random document on each request instead of reading the same document every time (as always reading the same one may result in different behavior).
Test script
import {Firestore, Timestamp} from '@google-cloud/firestore';
import {createHistogram, performance} from 'perf_hooks';

async function run() {
  const firestore = new Firestore({
    projectId: '<project>',
  });
  // listDocuments() returns document references, not just IDs.
  const documentRefs = await firestore.collection('the/test/collection').listDocuments();
  console.log(documentRefs.length);
  const histogram = createHistogram();
  for (let i = 0; i < 10000; i++) {
    const start = performance.now();
    // Read the data as it was 15 seconds ago.
    const maxDataStaleness: Date = new Date(
      new Date().getTime() - 15 * 1000
    );
    // Pick a different random document on every iteration.
    const randomDocument = documentRefs[Math.floor(Math.random() * documentRefs.length)];
    await randomDocument.get({
      readTime: Timestamp.fromDate(maxDataStaleness),
    });
    const end = performance.now();
    // record() only accepts integers >= 1, so clamp sub-millisecond results.
    histogram.record(Math.max(1, Math.round(end - start)));
  }
  console.log('min', histogram.min);
  console.log('max', histogram.max);
  console.log('mean', histogram.mean);
  console.log('stddev', histogram.stddev);
  console.log('exceeds', histogram.exceeds);
  console.log('percentiles', histogram.percentiles);
}

run();
Note: I don't get those numbers consistently 🤔 |
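Since the numbers vary between runs, it can help to keep the raw samples for the strong and the stale variant and summarize both with the same function, so they can be compared side by side. The following is a minimal sketch. `summarize` and `LatencySummary` are hypothetical names, and the percentiles use the simple nearest-rank method rather than whatever `perf_hooks` uses internally.

```typescript
interface LatencySummary {
  count: number;
  min: number;
  max: number;
  mean: number;
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile on an already sorted, non-empty array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Summarizes latency samples (in milliseconds) without mutating the input.
function summarize(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) {
    throw new Error('summarize() needs at least one sample');
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, v) => acc + v, 0);
  return {
    count: sorted.length,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    mean: sum / sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}
```

Collecting the samples into two arrays, one per variant, and printing `summarize(strongSamples)` next to `summarize(staleSamples)` makes run-to-run differences easier to spot.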
Looks like you were able to implement the optimization. This is a good test case, where the only difference is
Understanding why you see these latencies is a little beyond SDK support. I am sure there are other customer-specific factors in play, such as database size, concurrent writes, and warmup. You may want to use Firebase support to get an answer specific to your use case: https://firebase.google.com/support/troubleshooter/firestore/queries
Can I help you with anything else? |
Follow up for @IchordeDionysos. I asked internally, and was given some explanation: Stale reads have two main values:
In your case, (2) is applicable. You should run the workload (a) without transactions and (b) from europe-west4 instead of europe-west1. |
@IchordeDionysos The next release of the SDK will have an optimization for transactions with |
The documentation mentions that stale reads may improve the performance of reading from Firestore, as data can simply be fetched from the nearest replica without having to reconfirm with the leader replica:
https://firebase.google.com/docs/firestore/understand-reads-writes-scale#stale_reads
I'm using the following code to perform a stale read:
As the data does not change very often, it's fine for the content to be one minute (or even longer) stale.
But what we are seeing is that the strong reads are faster than the stale reads:
Query used for analysing the logs
I wanted to share this experience with you and maybe I'm doing something wrong here...
Not sure whether increasing the staleness to 60s (instead of 15s) breaks it?
Interesting data: