Add children attribute feature + Groovy test case (RiverMongoChildrenTes... #106

Closed
wants to merge 2 commits into from

Conversation

pablomolnar

Hi Richard, I've implemented a new feature in the river that allows indexing an array inside a MongoDB document. Each element of the array is indexed as a separate document in Elasticsearch.

For example, at my work a purchase order is composed of many items:

Order.json

    {
        ...
        "items":[
           { "name": "Item #1", ... },
           { "name": "Item #2", ... },
           { "name": "Item #3", ... }
        ]
    }

Due to some business requirements, we need an index of all order items. This is now possible by creating a river with the children attribute.
Have a look at RiverMongoChildrenTest for more details and the spec. I wrote the test cases using Groovy and GMongo (much cleaner and less code than Java).

@buildhive

Richard Louapre » elasticsearch-river-mongodb #46 UNSTABLE
Looks like there's a problem with this pull request

@buildhive

Richard Louapre » elasticsearch-river-mongodb #47 SUCCESS
This pull request looks good

@buildhive

Richard Louapre » elasticsearch-river-mongodb #48 FAILURE
Looks like there's a problem with this pull request

@buildhive

Richard Louapre » elasticsearch-river-mongodb #49 UNSTABLE
Looks like there's a problem with this pull request

@richardwilly98
Owner

@pablomolnar I believe this type of data transformation should probably not be built into the river but handled by a script filter [1] (there is actually a Groovy script filter available).

Unfortunately, the data object model provided to the script filter does not yet allow returning multiple documents.

  • The current implementation provides a context object "ctx" to the script filter with 2 properties: document and operation.
  • The new implementation should provide a context object "ctx" with 1 property: documents.
    • documents is an array of objects.
    • Each object contains:
      • 2 required properties: data (the actual mongo document) and operation (i: insert, u: update or d: delete)
      • a few optional properties:
        - _index (ES document index)
        - _type (ES document type)
        - _parent (ES parent document)
        - _routing (ES document routing)
        - ignore (ignore the document)
        - delete (boolean; delete the document if true)

So in your specific scenario:

ctx.documents = [
   {
      operation: "i",
      data: {
         _id: "123456",
         name: "Pablo",
         tweets: [
            { _id: "51c8ddbae4b0548e8d233181", text: "foo" },
            { _id: "51c8ddbae4b0548e8d233182", text: "bar" },
            { _id: "51c8ddbae4b0548e8d233183", text: "zoo" }
         ]
      }
   }
]

After the script runs, the context would be transformed into:

ctx.documents = [
   {
      operation: "i",
      data: {
         _id: "123456",
         name: "Pablo"
      }
   },
   {
      operation: "i",
      _parent: "123456",
      _index: "tweets",
      _type: "tweet",
      data: {
         _id: "51c8ddbae4b0548e8d233181",
         text: "foo"
      }
   },
   {
      operation: "i",
      _parent: "123456",
      _index: "tweets",
      _type: "tweet",
      data: {
         _id: "51c8ddbae4b0548e8d233182",
         text: "bar"
      }
   },
   {
      operation: "i",
      _parent: "123456",
      _index: "tweets",
      _type: "tweet",
      data: {
         _id: "51c8ddbae4b0548e8d233183",
         text: "zoo"
      }
   }
]
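
For illustration, the transformation shown above could be written roughly as follows. This is only a sketch in plain JavaScript, assuming the proposed ctx.documents structure; the function name splitTweets is hypothetical, the "tweets" index and "tweet" type are taken from the example, and how the script would be registered with the river is not covered here.

```javascript
// Sketch only: moves each embedded "tweets" entry into its own child document.
// Assumes the proposed ctx.documents shape; splitTweets is a hypothetical name.
function splitTweets(ctx) {
  var result = [];
  ctx.documents.forEach(function (doc) {
    var data = doc.data || {};
    var tweets = data.tweets || [];

    // Copy the parent document without its embedded array.
    var parentData = {};
    Object.keys(data).forEach(function (key) {
      if (key !== "tweets") {
        parentData[key] = data[key];
      }
    });
    result.push({ operation: doc.operation, data: parentData });

    // Emit one child document per array element, linked to the parent by _parent.
    tweets.forEach(function (tweet) {
      result.push({
        operation: doc.operation,
        _parent: data._id,
        _index: "tweets",
        _type: "tweet",
        data: tweet
      });
    });
  });
  ctx.documents = result;
  return ctx;
}
```

The sketch reuses the parent's operation for every child, which matches the insert case above; updates and deletes of individual array elements would need extra handling.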

These changes would work for your scenario and for other similar scenarios where data transformation is required.

What do you think?

[1] - https://github.com/richardwilly98/elasticsearch-river-mongodb/wiki#script-filters

@richardwilly98
Owner

@pablomolnar I have just created a new branch [1] which contains the new implementation.

It passes your Groovy unit test [2]. I feel it is more generic.

It requires 2 new settings:

  • mongodb.options.advanced_transformation: enables the new structure used by the script.
  • mongodb.options.parent_types: mainly used to optimize deletion of children (when the parent is deleted).
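
As a rough illustration of where these two settings would sit in a river definition, here is a sketch that builds the settings object in JavaScript. Only the two setting names come from this comment; the value types (a boolean and an array of type names) and the db, collection, and index names are assumptions made for the example, and the script itself is left out.

```javascript
// Sketch only: a river definition carrying the two new settings.
// buildRiverDefinition is a hypothetical helper; value types are assumed.
function buildRiverDefinition(parentTypes) {
  return {
    type: "mongodb",
    mongodb: {
      db: "mydb",            // hypothetical database name
      collection: "authors", // hypothetical collection name
      options: {
        advanced_transformation: true, // assumed to be a boolean flag
        parent_types: parentTypes      // assumed to be a list of ES type names
      }
    },
    index: { name: "authors", type: "author" } // hypothetical target index and type
  };
}

console.log(JSON.stringify(buildRiverDefinition(["author"]), null, 2));
```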

Let me know what you think.

[1] - https://github.com/richardwilly98/elasticsearch-river-mongodb/tree/advanced-transformation
[2] - https://github.com/richardwilly98/elasticsearch-river-mongodb/blob/advanced-transformation/src/test/java/test/elasticsearch/plugin/river/mongodb/advanced/RiverMongoAdvancedTransformationChildrenTest.groovy

@pablomolnar
Author

@richardwilly98 Cool, I'm gonna take a look and let you know.

@richardwilly98
Owner

@pablomolnar did you have a chance to look at it?
I am planning to make a new release soon.

@vglafirov

Hi Richard.

Could you please provide an example of how to river parent/child relationships? I spent a whole day on it and have not succeeded.

Thanks in advance.

@richardwilly98
Owner

@vglafirov,

Issues #64 and #85 are related to parent / child. Please have a look.
There are also 2 examples related to these issues in the GitHub repository: [1] and [2]

[1] - https://github.com/richardwilly98/elasticsearch-river-mongodb/tree/master/resources/issues/64
[2] - https://github.com/richardwilly98/elasticsearch-river-mongodb/tree/master/resources/issues/85

Thanks,
Richard.

@vglafirov

@richardwilly98,
Thanks a lot, it worked. :)

@vglafirov

@richardwilly98
One more comment. It looks like the new version 1.7.0 does not close the mongo cursor when you delete _river; with 1.6.11 this worked without restarting Elasticsearch.

@richardwilly98
Owner

@vglafirov thanks for reporting this issue.

That should be fixed in the latest commit [1]

[1] - 2972e6b

<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy-all</artifactId>
<version>2.1.0</version>
<scope>test</scope>
Collaborator

looks like the spacing is screwed up on this line (and below)

@benmccann
Collaborator

just changed the directory layout for the tests (93b91fb), so you may want to update this changeset
