findOne with nested refpath in large array of subdocuments causes out of memory exception #10983

Closed
jessgoldq4 opened this issue Nov 17, 2021 · 1 comment

jessgoldq4 commented Nov 17, 2021

Do you want to request a feature or report a bug?

Bug

What is the current behavior?

A findOne() invocation paired with a dynamic populate() call causes an out-of-memory exception when the document has an array of subdocuments that contain a nested refPath, 'arrayName.typeField'. This happens in particular when the subdocument array contains a large number of records (e.g. 2100). I don't fully understand the logic, so I will just list my findings.

------

During the Model.populate('arrayName.idField') -> getModelsMapForPopulate() -> _getModelNames() method invocation, the modelNames array returned contains 2100 elements, each a string naming the referenced Model. Down the line, this causes the addModelNamesToMap() method to create a map with a very large memory footprint, where an individual model in the map has a total of 2.1 million records in its allIds property, which just seems wrong:

[Image: count undefined]

The arrays in allIds are all copies of one another, which raises the question of why we need them all. In this scenario, we ultimately see an out-of-memory exception during an invocation of Model.populate('arrayName.idField') -> _done() -> _assign -> utils.clone(mod.allIds):

[Image: RUNNING]

This does not seem right. If we make _getModelNames() return a modelNames array with just 3 elements, 'Model1', 'Model2', and 'Model3', populate('arrayName.idField') seems to produce the correct results without this huge amount of memory usage. Maybe I don't really know what I'm talking about, but this seems like overkill.
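The idea of keeping one entry per distinct model name, rather than one per subdocument, can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the helper `groupIdsByModelName` and the stand-in string ids are hypothetical, and none of this is Mongoose's internal code.

```javascript
// Hypothetical illustration only; this is not Mongoose's internal code.
// Groups subdocument ids by their refPath value so that each distinct
// model name appears once, however many subdocuments there are.
function groupIdsByModelName(items) {
  const idsByModel = new Map();
  for (const { itemId, type } of items) {
    if (!idsByModel.has(type)) {
      idsByModel.set(type, []);
    }
    idsByModel.get(type).push(itemId);
  }
  return idsByModel;
}

// Stand-in data: 2100 subdocuments spread evenly across three model names.
const modelNames = ['Model1', 'Model2', 'Model3'];
const items = Array.from({ length: 2100 }, (_, i) => ({
  itemId: `id-${i}`, // placeholder for an ObjectId
  type: modelNames[i % modelNames.length]
}));

const grouped = groupIdsByModelName(items);
const totalIds = [...grouped.values()].reduce((n, ids) => n + ids.length, 0);

console.log('distinct model names:', grouped.size);
console.log('total ids stored:', totalIds);
```

With this grouping, the number of stored ids grows with the number of subdocuments, not with the number of subdocuments multiplied by the number of model-name entries.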


If the current behavior is a bug, please provide the steps to reproduce.

Prerequisite: the document's 'items' array contains 1000+ elements

const SampleSchema = new mongoose.Schema(
  {
    name: String,
    items: [{
      itemId: {
        type: mongoose.Schema.Types.ObjectId,
        required: true,
        refPath: 'items.type'
      },
      type: {
        type: String,
        required: true,
        enum: ['Model1', 'Model2', 'Model3']
      }
    }],
 },
 {
    timestamps: {
      createdAt: 'create_date',
      updatedAt: 'update_date'
    }
 }
)

const SampleModel = mongoose.model('Sample', SampleSchema)

async.waterfall([
      (cb) => {
        SampleModel
          .findOne({
            _id: mongoose.Types.ObjectId(id),
            _organization: mongoose.Types.ObjectId(params.organizationId)
          })
          .populate('items.itemId')
          .lean()
          .exec(cb)
      },
      (sample, cb) => {
        // process data here
      }
], completedCallback)

What is the expected behavior?

When executed with the prerequisite number of subdocuments, the above code does not blow up the heap.

What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version.

Node.js: 12.22.6
Mongoose: 6.0.13
MongoDB: 5.0.2

@vkarpov15 vkarpov15 added this to the 6.0.16 milestone Nov 18, 2021
vkarpov15 (Collaborator) commented

I can confirm that the below script is much slower and takes much more memory than expected. We're investigating but haven't figured anything out yet.

'use strict';
  
const mongoose = require('mongoose');

const { Schema } = mongoose;

run().catch(err => console.log(err));

async function run() {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true
  });
  
  await mongoose.connection.dropDatabase();
  
  const SampleSchema = new mongoose.Schema({
    name: String,
    items: [{ 
      itemId: {
        type: mongoose.Schema.Types.ObjectId,
        required: true,
        refPath: 'items.type'
      },
      type: { 
        type: String,
        required: true,
        enum: ['Model1', 'Model2', 'Model3']
      }
    }],
  });
  
  const SampleModel = mongoose.model('Sample', SampleSchema);
  const Model1 = mongoose.model('Model1', Schema({ name: String }));
  
  const doc = { name: 'test', items: [] };
  for (let i = 0; i < 2100; ++i) {
    const { _id } = await Model1.create({ name: 'test' + i });
    doc.items.push({ itemId: _id, type: 'Model1' });
  }

  await SampleModel.create(doc);
  console.log('Created');

  await SampleModel.findOne().populate('items.itemId');
  console.log('Memory Usage:', process.memoryUsage().heapUsed / (1024 ** 2));
}
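The script above prints only the final heap size. To compare heap usage before and after the populate() call, a small wrapper along these lines might help. It is a rough sketch: `measureHeapDeltaMiB` is a hypothetical helper name, the allocation below is a stand-in for the real query, and garbage collection timing makes the reading approximate.

```javascript
// Hypothetical helper: reports the change in V8 heap usage (in MiB)
// across an async operation. The number is approximate because the
// garbage collector may run at any time; starting Node with
// `--expose-gc` lets the helper collect first for a steadier baseline.
async function measureHeapDeltaMiB(fn) {
  if (typeof global.gc === 'function') {
    global.gc();
  }
  const before = process.memoryUsage().heapUsed;
  const result = await fn();
  const after = process.memoryUsage().heapUsed;
  return { result, deltaMiB: (after - before) / (1024 ** 2) };
}

// Stand-in workload so the sketch runs without a database connection.
measureHeapDeltaMiB(async () => new Array(100000).fill('x'))
  .then(({ deltaMiB }) => console.log('Heap delta (MiB):', deltaMiB));
```

In the reproduction script, the stand-in workload would be replaced by `() => SampleModel.findOne().populate('items.itemId')`.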
