Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(NODE-5691): make findOne() close implicit session to avoid memory leak #3889

Merged
merged 8 commits into from
Oct 18, 2023

Conversation

vkarpov15
Copy link
Contributor

@vkarpov15 vkarpov15 commented Oct 11, 2023

Description

I found this potential issue while troubleshooting Automattic/mongoose#13829. Running the following Mongoose script:

import v8 from 'node:v8';
import mongoose from 'mongoose';

await mongoose.connect('mongodb://127.0.0.1:27017/mongoose_test');
  
const testSchema = new mongoose.Schema(
    {
        data: String,
        createdAt: { type: Date, default: Date.now },
    },
    {
        timeseries: {
            timeField: 'createdAt'
        },
        expireAfterSeconds: 60 * 60 * 24 * 7
    }
);

const TestModel = mongoose.model('Test', testSchema);

await TestModel.deleteMany({});
const docs = [];
for (let i = 0; i < 1000; ++i) {
    docs.push({ data: '0'.repeat(1024 * 2 * 10) });
}
await TestModel.create(docs);
console.log('Created docs');

let start = Date.now();
let count = 0;

await TestModel.deleteMany({});
const docs = [];
for (let i = 0; i < 1000; ++i) {
    docs.push({ data: '0'.repeat(1024 * 2 * 10) });
}
await TestModel.create(docs);
console.log('Created docs');

let start = Date.now();
let count = 0;

while (++count) {
    const res = await TestModel.findOne({}).lean().exec();
    if (count % 1000 === 0) {
        console.log(`{ x: ${Date.now() - start}, y: ${v8.getHeapStatistics().used_heap_size} },`);
    }
    await new Promise(resolve => setTimeout(resolve, 1));
};

Using --max-heap-size to constrain memory, without this PR, leads to OOM:

$ node --max-old-space-size=42 gh-13829.mjs 
Created docs
{ x: 3546, y: 40780432 },
{ x: 7021, y: 26150200 },
{ x: 12514, y: 26803160 },
{ x: 15310, y: 27629264 },
{ x: 155712, y: 28427984 },
{ x: 158292, y: 30820408 },
{ x: 168100, y: 33326456 },
{ x: 256559, y: 35073952 },

<--- Last few GCs --->
al[1079899:0x5b19830]   257664 ms: Mark-sweep 33.9 (69.3) -> 33.6 (69.3) MB, 1.5 / 0.0 ms  (+ 5.5 ms in 76 steps since start of marking, biggest step 0.2 ms, walltime since start of marking 9 ms) (average mu = 0.376, current mu = 0.388) finalize incremental[1079899:0x5b19830]   257676 ms: Mark-sweep 34.1 (69.3) -> 33.7 (69.1) MB, 1.8 / 0.0 ms  (+ 5.9 ms in 81 steps since start of marking, biggest step 0.2 ms, walltime since start of marking 10 ms) (average mu = 0.378, current mu = 0.379) finalize incrementa

With this PR:

$ node --max-old-space-size=42 gh-13829.mjs 
Created docs
{ x: 3839, y: 39456136 },
{ x: 7508, y: 19674968 },
{ x: 10814, y: 20988120 },
{ x: 13826, y: 19179712 },
{ x: 16398, y: 19229168 },
{ x: 18885, y: 19254040 },
{ x: 21368, y: 19296360 },
{ x: 23841, y: 19218352 },
{ x: 26327, y: 19381648 },
{ x: 28803, y: 19321888 },
{ x: 31316, y: 19338768 },
{ x: 33770, y: 19400056 },
{ x: 36273, y: 19374016 },
{ x: 38757, y: 19314128 },
{ x: 41267, y: 19250920 },
{ x: 43754, y: 19443352 },

It looks like the issue is that, while toArray() and asyncIterator call close(), findOne() does not. And that means findOne()'s implicitly created session never ends, leading to what looks like a memory leak.

What is changing?

Explicitly close() cursor in findOne() in order to ensure that the driver calls endSession() on the implicitly created session.

Is there new documentation needed for these changes?

No

What is the motivation for this change?

See description

Release Highlight

Fixed leak in findOne() under specific conditions

When running a findOne against a time series collection due to the way the server returns the resulting cursor information the driver left the implicit session for the cursor un-ended. Now the cursor will always be cleaned up regardless of the outcome of the find operation.

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@nbbeeken nbbeeken changed the title fix: make findOne() close implicit session to avoid memory leak fix(NODE-5691): make findOne() close implicit session to avoid memory leak Oct 16, 2023
nbbeeken
nbbeeken previously approved these changes Oct 17, 2023
Copy link
Contributor

@nbbeeken nbbeeken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @vkarpov15, thanks, I threw in a simple test to prevent regression, thanks for the help.

Just sharing some context: I investigated this issue it turns out it is specifically when the collection is timeseries in a 7.0+ server because of how it returns the cursor document. Nonetheless, we do want the driver to clean up the cursor resource and not rely on a server result to hit our cleanup logic.

return this.find(filter, options).limit(-1).batchSize(1).next();
const cursor = this.find(filter, options).limit(-1).batchSize(1);
const res = await cursor.next();
await cursor.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling close without try/finally here I believe is correct, a cursor that receives an error always calls close. Also, calling close unconditionally makes sense because close already includes the logic to no-op when the cursor has already been closed.

@nbbeeken nbbeeken self-assigned this Oct 17, 2023
@nbbeeken nbbeeken added the Team Review Needs review from team label Oct 17, 2023
test/integration/crud/crud_api.test.ts Outdated Show resolved Hide resolved
@nbbeeken nbbeeken requested a review from dariakp October 18, 2023 17:06
dariakp
dariakp previously approved these changes Oct 18, 2023
dariakp
dariakp previously approved these changes Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team Review Needs review from team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants