Adding a field to the JSON of a PDF in MongoDB => NullPointerException for the river #91

antoinecarton · 2013-06-20T08:12:07Z

Hi,

First of all, here is the exception from ElasticSearch:

Exception in thread "elasticsearch[Nathaniel Richards][mongodb_river_slurper][T#1]" java.lang.NullPointerException
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.processOplogEntry(MongoDBRiver.java:1074)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:986)
at java.lang.Thread.run(Thread.java:679)

Here is my configuration:
River : 1.6.9
ElasticSearch : 0.90.1
MongoDB : 2.4.4

Configuration used for MongoDB:
http://docs.mongodb.org/manual/tutorial/deploy-replica-set/, section "Deploy a Development or Test Replica Set"

Next, in a console:

mongo --port 27017
use pdf_database5

In a second console, I add a PDF file:

mongofiles --host localhost:27017 --db pdf_database5 --collection fs --type applicaton/pdf put /PATH_TO_A_PDF

After that, I create a MongoDB river for ElasticSearch:

curl -XPUT "${host}/_river/mongodb/_meta" -d '{
"type": "mongodb",
"mongodb": {
"db": "pdf_database5",
"collection": "fs",
"gridfs": true
},
"index": {
"name": "mongoindex",
"type": "files"
}
}'

Up to this point everything works: my PDF file is correctly indexed and full-text search returns it.

However, the problem appears once I add a field to the JSON document of the PDF file. In the MongoDB console, I first look up the ID of the file:

db.fs.files.find({});

(for instance, 51c05f881a13d534df7463c4 is the ID of my PDF).

I add a field "titleDoc" to the object with the ID 51c05f881a13d534df7463c4 using the following command:

db.fs.files.update({"_id": ObjectId("51c05f881a13d534df7463c4")}, {$set: {"titleDoc":"MY TITLE DOC"}})

The exception then appears in the ElasticSearch log. I tried editing the _mapping in ElasticSearch, but the error persists.

Maybe I forgot some configuration that the river needs in order to map new fields on raw files such as PDFs in MongoDB.

Thanks in advance,

Antoine

richardwilly98 · 2013-06-24T00:19:26Z

Hi Antoine,

Additional GridFS metadata should be stored in the metadata attribute (see [1]).

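doc = db.fs.files.findOne({"_id": ObjectId("51c78a054ce10426a81a3e27")})
doc.metadata = {}
doc.metadata.title = "woww"
db.fs.files.save(doc)
db.fs.files.findOne({"_id": doc._id})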
{
        "_id" : ObjectId("51c78a054ce10426a81a3e27"),
        "filename" : "test-document.pdf",
        "chunkSize" : 262144,
        "uploadDate" : ISODate("2013-06-23T23:51:33.229Z"),
        "md5" : "947090a3e9cac07c13adabb25b9a3fa9",
        "length" : 50573,
        "contentType" : "applicaton/pdf",
        "title" : "test",
        "metadata" : {
                "title" : "woww"
        }
}

Does it help?

[1] - http://docs.mongodb.org/manual/reference/gridfs/#gridfs-files-collection

Thanks,
Richard.

antoinecarton · 2013-06-24T08:01:20Z

Hi,

Thank you for your answer.

You are right about the metadata attribute. However, I had already tried it, and I still get the problem with the following steps:

My initial object:

{ "_id" : ObjectId("51c7f5dc71f6549c212cae37"), "filename" : "/home/acarton/Téléchargements/Cairngorm.pdf", "chunkSize" : 262144, "uploadDate" : ISODate("2013-06-24T07:31:41.611Z"), "md5" : "2d7d1f636a4e07b675eebb873330205e", "length" : 661649, "contentType" : "applicaton/pdf" }

The update command:

db.fs.files.update({"_id": ObjectId("51c7f5dc71f6549c212cae37")}, {$set: {"metadata.titleDoc":"Framework CAIRNGORM"}})

And the final object:

{ "_id" : ObjectId("51c7f5dc71f6549c212cae37"), "chunkSize" : 262144, "contentType" : "applicaton/pdf", "filename" : "/home/acarton/Téléchargements/Cairngorm.pdf", "length" : 661649, "md5" : "2d7d1f636a4e07b675eebb873330205e", "metadata" : { "titleDoc" : "Framework CAIRNGORM" }, "uploadDate" : ISODate("2013-06-24T07:31:41.611Z") }
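As a side note on why "metadata" appears as a nested object in the final document: $set treats a dotted key such as "metadata.titleDoc" as a path. The sketch below imitates that expansion on a plain object. The helper name `applySet` is hypothetical, the filename is abbreviated, and the code is only an illustration, not MongoDB's or the river's implementation.

```javascript
// Hypothetical helper: applies the body of a {$set: {...}} update to a copy of a
// plain object, expanding dotted paths such as "metadata.titleDoc".
// Simplification: a non-object intermediate value is overwritten here, which may
// differ from what MongoDB itself does in that case.
function applySet(doc, setSpec) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy, fine for JSON-only data
  for (const [path, value] of Object.entries(setSpec)) {
    const keys = path.split(".");
    let target = result;
    for (const key of keys.slice(0, -1)) {
      // Create intermediate objects that do not exist yet.
      if (typeof target[key] !== "object" || target[key] === null) {
        target[key] = {};
      }
      target = target[key];
    }
    target[keys[keys.length - 1]] = value;
  }
  return result;
}

// Abbreviated version of the document from this comment.
const before = { filename: "Cairngorm.pdf", contentType: "applicaton/pdf" };
const after = applySet(before, { "metadata.titleDoc": "Framework CAIRNGORM" });
console.log(JSON.stringify(after.metadata));
```

The original object is left untouched; only the returned copy gains the nested "metadata" object.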

I still have the NullPointerException with this update command.

However, the steps you gave work fine. What is the difference between the "update" and the "save" commands?

Thank you in advance,

Antoine

richardwilly98 · 2013-06-24T10:34:01Z

Hi,

The oplog entry is different for a $set operation.

The entry for a "save" operation is:

{
        "ts" : {
                "t" : 1372032972,
                "i" : 1
        },
        "h" : NumberLong("2162081457563127592"),
        "v" : 2,
        "op" : "u",
        "ns" : "mydb91.fs.files",
        "o2" : {
                "_id" : ObjectId("51c78a054ce10426a81a3e27")
        },
        "o" : {
                "_id" : ObjectId("51c78a054ce10426a81a3e27"),
                "filename" : "test-document.pdf",
                "chunkSize" : 262144,
                "uploadDate" : ISODate("2013-06-23T23:51:33.229Z"),
                "md5" : "947090a3e9cac07c13adabb25b9a3fa9",
                "length" : 50573,
                "contentType" : "applicaton/pdf",
                "title" : "test",
                "metadata" : {
                        "title" : "woww"
                }
        }
}

For a $set operation:

{
        "ts" : {
                "t" : 1372065805,
                "i" : 1
        },
        "h" : NumberLong("8302104313737943305"),
        "v" : 2,
        "op" : "u",
        "ns" : "mydb91.fs.files",
        "o2" : {
                "_id" : ObjectId("51c78a07ae251a257e0e4d3e")
        },
        "o" : {
                "$set" : {
                        "metadata.titleDoc" : "test91"
                }
        }
}

The object ID was extracted from "o", but with $set it is only available in "o2". I will fix the code soon.
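A minimal sketch of the lookup described above, written in plain JavaScript over oplog entries represented as ordinary objects (ObjectIds are shown as strings). The function name `extractDocumentId` is hypothetical and this is not the river's actual Java code; it only illustrates reading the ID from "o2" for updates and falling back to "o" otherwise.

```javascript
// Hypothetical helper: returns the _id of the document an oplog entry refers to,
// or null when none can be found.
function extractDocumentId(entry) {
  // For updates ("op": "u"), "o2" identifies the updated document, so its _id is
  // present both for full-document saves and for $set-style updates.
  if (entry.op === "u" && entry.o2 && entry.o2._id !== undefined) {
    return entry.o2._id;
  }
  // Otherwise use the _id carried in "o" (for example, an inserted document).
  if (entry.o && entry.o._id !== undefined) {
    return entry.o._id;
  }
  return null; // the caller decides how to handle a missing _id
}

// The two update shapes from this thread, reduced to the relevant fields.
const saveEntry = {
  op: "u",
  ns: "mydb91.fs.files",
  o2: { _id: "51c78a054ce10426a81a3e27" },
  o: { _id: "51c78a054ce10426a81a3e27", metadata: { title: "woww" } },
};
const setEntry = {
  op: "u",
  ns: "mydb91.fs.files",
  o2: { _id: "51c78a07ae251a257e0e4d3e" },
  o: { $set: { "metadata.titleDoc": "test91" } },
};

console.log(extractDocumentId(saveEntry));
console.log(extractDocumentId(setEntry));
```

Reading only "o._id" would return nothing for `setEntry`, which matches the missing ID described above; returning null explicitly lets the caller skip or log such an entry instead of dereferencing it.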

antoinecarton · 2013-06-24T12:13:03Z

Perfect! Thank you!

richardwilly98 · 2013-07-16T13:12:45Z

Fix is available in release 1.6.11.

Thanks,
Richard.

richardwilly98 added a commit that referenced this issue Jun 24, 2013

Fix issue #91

c22fa8f

richardwilly98 closed this as completed Jul 29, 2013

cheald referenced this issue in kdkeck/elasticsearch-river-mongodb Apr 18, 2014

Support UPDATE_ROW operation

b90c687
