findOne with nested refpath in large array of subdocuments causes out of memory exception #10983

jessgoldq4 · 2021-11-17T16:25:26Z

Do you want to request a feature or report a bug?

Bug

What is the current behavior?

Calling findOne() with a dynamic populate() on a document whose array of subdocuments uses a nested refPath ('arrayName.typeField') causes an out-of-memory exception. It happens when the subdocument array contains a large number of records (e.g. 2100). I don't fully understand the logic, so I will just list my findings.

------

During the Model.populate('arrayName.idField') -> getModelsMapForPopulate() -> _getModelNames() invocation, the returned modelNames array contains 2100 elements, each a string naming the referenced model. Down the line, this causes addModelNamesToMap() to create a map with a very large memory footprint, in which a single model entry holds a total of 2.1 million records in its allIds property, which just seems wrong.

The arrays in allIds are all copies of each other, which raises the question of why we need them all. In this scenario, we ultimately see an out-of-memory exception during Model.populate('arrayName.idField') -> _done() -> _assign() -> utils.clone(mod.allIds).
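To make the growth concrete, here is a rough stand-alone sketch in plain JavaScript. It is not Mongoose's actual internals: the function name and the shape of the map are assumptions for illustration, modelling only what is described above (one copy of the id list stored per entry of modelNames rather than per distinct model). The exact figure seen in the report depends on details not shown here; the point is that the total grows quadratically.

```javascript
// Hypothetical model of the behaviour described above; NOT Mongoose source code.
// One id array is stored for every entry in modelNames, even when all
// entries name the same model.
function countStoredIds(modelNames, ids) {
  const map = new Map();
  for (const name of modelNames) {
    if (!map.has(name)) {
      map.set(name, { allIds: [] });
    }
    // Each entry keeps its own copy of the full id list.
    map.get(name).allIds.push(ids.slice());
  }
  let total = 0;
  for (const { allIds } of map.values()) {
    for (const copy of allIds) {
      total += copy.length;
    }
  }
  return total;
}

const ids = Array.from({ length: 1000 }, (_, i) => i);
const repeatedNames = ids.map(() => 'Model1');

console.log(countStoredIds(repeatedNames, ids)); // 1000 * 1000 = 1000000
console.log(countStoredIds(['Model1'], ids));    // 1000
```

With one name per subdocument the stored total is n squared, whereas one name per distinct model keeps it linear in n.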

This does not seem right. If _getModelNames() instead returns a modelNames array with just 3 elements, 'Model1', 'Model2', and 'Model3', populate('arrayName.idField') seems to produce the correct results without this huge amount of memory usage. Maybe I don't really know what I'm talking about, but the current behavior seems like overkill.
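The idea of returning only the distinct model names can be sketched as below. This is a hypothetical helper, not part of Mongoose, and whether deduplicating at this point is safe for every populate() case is not established here; the fix that eventually landed may take a different approach.

```javascript
// Hypothetical helper, not part of Mongoose: collapse repeated model names
// while keeping first-seen order.
function uniqueModelNames(modelNames) {
  return [...new Set(modelNames)];
}

const perItemNames = ['Model1', 'Model2', 'Model1', 'Model3', 'Model2'];
console.log(uniqueModelNames(perItemNames)); // [ 'Model1', 'Model2', 'Model3' ]
```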

If the current behavior is a bug, please provide the steps to reproduce.

Pre-requisite: 1000+ elements exist in the document's 'items' array

const SampleSchema = new mongoose.Schema(
  {
    name: String,
    items: [{
      itemId: {
        type: mongoose.Schema.Types.ObjectId,
        required: true,
        refPath: 'items.type'
      },
      type: {
        type: String,
        required: true,
        enum: ['Model1', 'Model2', 'Model3']
      }
    }],
  },
  {
    timestamps: {
      createdAt: 'create_date',
      updatedAt: 'update_date'
    }
  }
)

const SampleModel = mongoose.model('Sample', SampleSchema)

async.waterfall([
  (cb) => {
    SampleModel
      .findOne({
        _id: mongoose.Types.ObjectId(id),
        _organization: mongoose.Types.ObjectId(params.organizationId)
      })
      // populate the refPath-backed field declared in SampleSchema
      .populate('items.itemId')
      .lean()
      .exec(cb)
  },
  (sample, cb) => {
    // process data here
    cb(null, sample)
  }
], completedCallback)
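Until populate() handles this case, one possible workaround is to resolve the references by hand: group the ids by their type field and query each referenced model once. The sketch below is only an assumption about how that could look. The helper names and the getModel parameter are hypothetical, and it expects a plain object such as the result of the .lean() query above.

```javascript
// Hypothetical workaround sketch; helper names are illustrative only.
// Groups subdocument ids by their `type` field so each referenced model
// can be queried once with $in instead of relying on populate().
function groupItemIdsByType(items) {
  const idsByType = new Map();
  for (const { itemId, type } of items) {
    if (!idsByType.has(type)) idsByType.set(type, []);
    idsByType.get(type).push(itemId);
  }
  return idsByType;
}

// `getModel` is assumed to return a Mongoose model for a given name,
// e.g. (name) => mongoose.model(name). `sample` must be a plain object.
async function populateItemsManually(sample, getModel) {
  const idsByType = groupItemIdsByType(sample.items);
  const docsByKey = new Map();
  for (const [type, ids] of idsByType) {
    const docs = await getModel(type).find({ _id: { $in: ids } }).lean();
    for (const doc of docs) docsByKey.set(`${type}:${doc._id}`, doc);
  }
  // Replace each id with the fetched document, or null if it was not found.
  return sample.items.map((item) => ({
    ...item,
    itemId: docsByKey.get(`${item.type}:${item.itemId}`) || null
  }));
}
```

This issues one query per distinct type (three at most for the schema above) and keeps a single lookup map, so memory stays proportional to the number of items.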

What is the expected behavior?

With the pre-requisite number of subdocuments, the above code runs without exhausting the heap.

What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version.

Node.js: 12.22.6
Mongoose: 6.0.13
MongoDB: 5.0.2

vkarpov15 · 2021-12-19T19:29:03Z

I can confirm that the script below is much slower and takes much more memory than expected. We're investigating but haven't figured anything out yet.

'use strict';
  
const mongoose = require('mongoose');

const { Schema } = mongoose;

run().catch(err => console.log(err));

async function run() {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true
  });
  
  await mongoose.connection.dropDatabase();
  
  const SampleSchema = new mongoose.Schema({
    name: String,
    items: [{ 
      itemId: {
        type: mongoose.Schema.Types.ObjectId,
        required: true,
        refPath: 'items.type'
      },
      type: { 
        type: String,
        required: true,
        enum: ['Model1', 'Model2', 'Model3']
      }
    }],
  });
  
  const SampleModel = mongoose.model('Sample', SampleSchema);
  const Model1 = mongoose.model('Model1', Schema({ name: String }));
  
  const doc = { name: 'test', items: [] };
  for (let i = 0; i < 2100; ++i) {
    const { _id } = await Model1.create({ name: 'test' + i });
    doc.items.push({ itemId: _id, type: 'Model1' });
  }

  await SampleModel.create(doc);
  console.log('Created');

  await SampleModel.findOne().populate('items.itemId');
  console.log('Memory Usage:', process.memoryUsage().heapUsed / (1024 ** 2));
}
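The script above prints the absolute heapUsed after the query. To isolate what a single call costs, a before/after measurement can help. The helper below is a hypothetical sketch, not part of Mongoose, and uses a stand-in workload so it runs without a database. Numbers will vary between runs because garbage collection timing is not deterministic.

```javascript
// Hypothetical helper for illustration; not part of Mongoose.
// Runs an async function and reports how much heapUsed changed, in MiB.
async function heapDeltaMiB(fn) {
  if (global.gc) global.gc(); // only defined when node runs with --expose-gc
  const before = process.memoryUsage().heapUsed;
  await fn();
  const after = process.memoryUsage().heapUsed;
  return (after - before) / (1024 ** 2);
}

// Stand-in workload: allocates n arrays of n numbers (quadratic growth).
async function allocateSquare(n) {
  const rows = [];
  for (let i = 0; i < n; ++i) rows.push(new Array(n).fill(i));
  return rows.length;
}

heapDeltaMiB(() => allocateSquare(500)).then((mib) => {
  console.log('Heap delta (MiB):', mib.toFixed(1));
});
```

In the script above, the measured function would be something like () => SampleModel.findOne().populate('items.itemId').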

vkarpov15 added this to the 6.0.16 milestone Nov 18, 2021

vkarpov15 added the performance label Nov 18, 2021

vkarpov15 closed this as completed in c892db0 Dec 20, 2021

vkarpov15 added a commit that referenced this issue Oct 15, 2024

fix(populate): handle array of ids with parent refPath

304c791

Re: #10983

vkarpov15 mentioned this issue Oct 15, 2024

fix(populate): handle array of ids with parent refPath #14965

Merged
