Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion - insertMany #197

Closed
Mattchewone opened this issue Jun 8, 2017 · 9 comments
Closed

Suggestion - insertMany #197

Mattchewone opened this issue Jun 8, 2017 · 9 comments

Comments

@Mattchewone
Copy link
Contributor

I have been having a play with inserting array of items.

What I was thinking was if we was able to return as a result the list of successful inserts, along with a list of the failures. At the moment only the first error is returned, and there is no way to tell how many did or didn't get inserted.

This also then skips the after hooks completely, which means any of the data that was inserted does not go through the hooks.

In my situation I am bulk insert 500 docs at a time and in an after hook I am inserting into elasticsearch, there could be 499 docs inserted and only one error, but the successful inserts will not be sync'd to elasticsearch due to the hook not running.

I would imagine and this isn't something I have test that the error hook is run, not sure if that will contain the results, but it would mean having the sync hook run in the after and error hook.

Are there plans to integrate the insertMany method of mongoose/mongo (I believe it to be more performant also), as I understand mongo returns the successful _ids and an array of any errors, I would think mongoose wraps this.

Thoughts?

Thanks

@marshallswain
Copy link
Member

marshallswain commented Jun 8, 2017

This would be a good enhancement. @Mattchewone do you want to take a shot at a PR? I'll eventually get to this because it'll fix a couple things for me, too.

@Mattchewone
Copy link
Contributor Author

Thanks @marshallswain. I'll see if I can put a PR together or at least a PoC.

@daffl
Copy link
Member

daffl commented Jun 8, 2017

I think this could be done with https://github.com/feathersjs/feathers-batch but it definitely needs some love.

@Mattchewone
Copy link
Contributor Author

Mattchewone commented Jun 8, 2017

@daffl isn't that more for if multiple different service calls are to be called with one batch call?

I have a script which is running through 50k rows and batching them into the service 500 at a time, then in the after hook I am fetching a flattened ($populated item) into elasticsearch. It's only one endpoint I want to call but a few times.

It is just that the bulk insert has to be low at the minute as at 500 if one of them has a validation error that whole batch of 500 will not hit the after hook and subsequently will not be pushed into elasticsearch.

If you think this could be done with feathers-batch I'd be interested to see what your thoughts are on that with my use case?

@Mattchewone
Copy link
Contributor Author

Ok @marshallswain I may have mis-understood how mongoose wrapped the native insertMany.

Mongodb natively will return the inserted doc count, and how many failures. But mongoose, will still throw an error and only return the first validation failure and will not return the inserted docs. Which wasn't what I was hoping for.

@marshallswain
Copy link
Member

After some discussion in Slack, @Mattchewone and I determined that the best way to implement this would be to

  • Leave the single-item creates the same as they are now.
  • Individually validate each record against the Mongoose model and keep an array of records that failed validation along with accompanying errors.
  • Use MongoDB's model.collection.insertMany instead of Mongoose's model.insertMany
  • Return any failed inserts on a separate key in the response.

This doesn't solve the following problem that Matt mentioned in slack
"Just quickly before I head off, looking at the native driver, if there is a duplicate index item in the data it will insert all items up to that but will stop writing any after that. So there needs to be some work in that, but think it could be done."

@Mattchewone
Copy link
Contributor Author

I have made some progress with this which can been seen here https://github.com/Mattchewone/feathers-mongoose/tree/insert-many

It is failing on the feathers-service-test multiple instance create tests currently. So I may have to re-think the return data. I built it around the feathers-batch style of return data, with this format:

{
  data: [
    [...errors || null],
    [...data || null]
  ]
}

This will return any WriteErrors and ValidationErrors in the first array, and any successful data in the second.

There is one other issue at the minute I am trying to work out, if there is a duplicate index, although the error is returned the success insert is not returned with the successful data, assuming the array of data has 2 items with the same _id in.

@nmajethiya
Copy link

Did anyone found solution for this ?

@Mattchewone
Copy link
Contributor Author

@nmajethiya there is an open PR at the moment - #199

There are a few issues being discussed at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants