[BUG] Ingest pipeline bulk update issue #16663
Comments
Similar issue: #10864. The root cause is that the Update API converts the updateRequest to an indexRequest if the document exists, so the default ingest pipeline is executed, whereas the Bulk API keeps the updateRequest as it originally was. From checking the code, I think the ingest pipeline was designed only for the index operation, not for the update operation; we can also see that the Index API supports […]. For this use case, I've tried to find a workaround; one option is to use a painless script to update the […]
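(For illustration only: a minimal sketch of that scripted-update workaround, not verified against this issue. It reuses the `updated` field and document ID from the reproduction below, and passes the timestamp in as a parameter instead of relying on `_ingest.timestamp`.)
POST /on_boarding_employees-1/_update/9f2pM5MB70XT8uT4kP1K
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.type = params.type; ctx._source.updated = params.now",
    "params": {
      "type": "ONBOARDING_EMPLOYEE_UPDATED",
      "now": "2024-11-16T06:40:00.000Z"
    }
  }
}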
Thanks @gaobinlong for looking into it
Found this long thread on the matter [1]; TL;DR is that the Update API does not support ingest pipelines. We should probably document that (and prevent it if possible).
Thanks @gaobinlong
👍
But this functionality does not work, does it?
For the Update API, we cannot specify a pipeline explicitly, but if the index has a default pipeline or a final pipeline and the specified document exists, the default or final pipeline will be executed. This behavior is unexpected and inconsistent with bulk update, where the default or final pipeline never gets a chance to execute. Preventing the default or final pipeline from being executed in the Update API is possible, but it's a breaking change IMO, so we may first show a warning header to users.
Got it now, thanks, it makes sense to me.
Describe the bug
The ingest pipeline works fine for single create, index, and update calls.
Bulk create and bulk index also apply the pipeline correctly; only bulk update does not.
Related component
Other
To Reproduce
1. Create the ingest pipeline
PUT _ingest/pipeline/update_timestamp
{
"description": "Automatically updates the 'updated' field on insert or update",
"processors": [
{
"set": {
"field": "updated",
"value": "{{_ingest.timestamp}}"
}
}
]
}
Output
{
"acknowledged": true
}
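(Optional verification step, not part of the original report: the Simulate Pipeline API can be used to confirm the pipeline sets the `updated` field before attaching it to an index.)
POST _ingest/pipeline/update_timestamp/_simulate
{
  "docs": [
    {
      "_source": {
        "type": "ONBOARDING_EMPLOYEE",
        "name": "Rahul"
      }
    }
  ]
}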
2. Create the index
PUT /on_boarding_employees-1
{
"settings": {
"index": {
"default_pipeline": "update_timestamp"
}
}
}
Output
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "on_boarding_employees-1"
}
3. Add a document
POST /on_boarding_employees-1/_doc
{
"type": "ONBOARDING_EMPLOYEE",
"name": “Rahul”
}
Output
{
"_index": "on_boarding_employees-1",
"_id": "9f2pM5MB70XT8uT4kP1K",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
Match query Output:
{
"took": 620,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "on_boarding_employees-1",
"_id": "9f2pM5MB70XT8uT4kP1K",
"_score": 1,
"_source": {
"name": “Rahul”,
"type": "ONBOARDING_EMPLOYEE",
"updated": "2024-11-16T06:29:30.826236733Z"
}
}
]
}
}
4. Normal update
POST /on_boarding_employees-1/_update/9f2pM5MB70XT8uT4kP1K
{
"doc": {
"type": "ONBOARDING_EMPLOYEE_UPDATED"
}
}
Output
{
"_index": "on_boarding_employees-1",
"_id": "9f2pM5MB70XT8uT4kP1K",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 1
}
Match query Output:
{
"took": 268,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "on_boarding_employees-1",
"_id": "9f2pM5MB70XT8uT4kP1K",
"_score": 1,
"_source": {
"name": “Rahul”,
"type": "ONBOARDING_EMPLOYEE_UPDATED",
"updated": "2024-11-16T06:33:05.478645288Z"
}
}
]
}
}
5. Bulk update
POST /on_boarding_employees-1/_bulk?pipeline=update_timestamp
{"update":{"_id":"9f2pM5MB70XT8uT4kP1K"}}
{"doc":{"type":"ONBOARDING_EMPLOYEE14","name":"Aman2"}}
{"update":{"_id":"9v2xM5MB70XT8uT4uv0x"}}
{"doc":{"type":"ONBOARDING_EMPLOYEE13","name":"Neha"}}
Match query Output:
{
"took": 777,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "on_boarding_employees-1",
"_id": "9v2xM5MB70XT8uT4uv0x",
"_score": 1,
"_source": {
"name": "Neha",
"type": "ONBOARDING_EMPLOYEE13",
"updated": "2024-11-16T06:38:25.841280080Z"
}
},
{
"_index": "on_boarding_employees-1",
"_id": "9f2pM5MB70XT8uT4kP1K",
"_score": 1,
"_source": {
"name": "Aman2",
"type": "ONBOARDING_EMPLOYEE14",
"updated": "2024-11-16T06:33:05.478645288Z"
}
}
]
}
}
Expected behavior
The expected behaviour is that the 'updated' time field is refreshed, but it remains the same for the bulk update operation:
"updated": "2024-11-16T06:33:05.478645288Z"
Additional Details
No response
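(Possible interim workaround, following the scripted-update suggestion in the comments above and not verified here: since the bulk update action does not apply the pipeline, the `updated` field can be set explicitly with a script in each bulk update action, with the timestamp supplied by the client.)
POST /on_boarding_employees-1/_bulk
{"update":{"_id":"9f2pM5MB70XT8uT4kP1K"}}
{"script":{"lang":"painless","source":"ctx._source.type = params.type; ctx._source.name = params.name; ctx._source.updated = params.now","params":{"type":"ONBOARDING_EMPLOYEE14","name":"Aman2","now":"2024-11-16T06:45:00.000Z"}}}
{"update":{"_id":"9v2xM5MB70XT8uT4uv0x"}}
{"script":{"lang":"painless","source":"ctx._source.type = params.type; ctx._source.name = params.name; ctx._source.updated = params.now","params":{"type":"ONBOARDING_EMPLOYEE13","name":"Neha","now":"2024-11-16T06:45:00.000Z"}}}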