
[BUG] "java.lang.IllegalArgumentException: -84 is not a valid id" #4521

Closed
uranru opened this issue Jul 3, 2024 · 12 comments
Labels
bug Something isn't working untriaged Require the attention of the repository maintainers and may need to be prioritized

Comments

@uranru

uranru commented Jul 3, 2024

Describe the bug

I use Logstash to load data into OpenSearch.
After upgrading several nodes from 2.13 to 2.15, I began receiving this error.

Related component

Other

To Reproduce

Logstash (logstash:8.13.1) sends data to OpenSearch.

Expected behavior

No errors when loading data


@uranru uranru added bug Something isn't working untriaged Require the attention of the repository maintainers and may need to be prioritized labels Jul 3, 2024
@mgodwan
Member

mgodwan commented Jul 3, 2024

Thanks @uranru for reporting this.

  1. Could you share the complete stack trace you're seeing for this in your opensearch cluster?
  2. Do you have security enabled for your OpenSearch cluster?

@uranru
Author

uranru commented Jul 3, 2024

@mgodwan
I see these events in the logs:

[2024-07-03T13:38:30,029][INFO ][logstash.outputs.opensearch][db-pgbouncer][df97c6871c75b7619f5d19d9becaa589840f886a88ab802122079f6b3e55c614] Retrying failed action {:status=>500, :action=>["index", {:_id=>nil, :_index=>"db-pgbouncer-2024.07.02", :routing=>nil}, {"host"=>{"name"=>"db", "domain"=>"prod"}, "log"=>{"name"=>"pgbouncer", "time"=>2024-07-02T21:00:25.758Z, "file"=>{"path"=>"/var/log/postgresql/pgbouncer.log"}}, "tags"=>["pgbouncer", "prod", "_grokparsefailure"], "@version"=>"1", "processing"=>{"host"=>"logstash04", "timestamp"=>2024-07-02T21:00:30.879200Z, "delay"=>5.1}, "message"=>"2024-07-03 00:00:20.511 MSK [1133] WARNING could not parse hba config line 9", "@timestamp"=>2024-07-02T21:00:25.758Z, "app"=>{"name"=>"pgbouncer", "group"=>"db"}, "alert"=>"false"}], :error=>{"type"=>"exception", "reason"=>"java.lang.IllegalArgumentException: -84 is not a valid id", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"-84 is not a valid id"}}}

[2024-07-03T13:38:30,029][INFO ][logstash.outputs.opensearch][db-pgbouncer][df97c6871c75b7619f5d19d9becaa589840f886a88ab802122079f6b3e55c614] Retrying failed action {:status=>500, :action=>["index", {:_id=>nil, :_index=>"db-pgbouncer-2024.07.02", :routing=>nil}, {"host"=>{"name"=>"db", "domain"=>"prod"}, "log"=>{"name"=>"pgbouncer", "time"=>2024-07-02T21:00:25.758Z, "file"=>{"path"=>"/var/log/postgresql/pgbouncer.log"}}, "tags"=>["pgbouncer", "prod", "_grokparsefailure"], "@version"=>"1", "processing"=>{"host"=>"logstash04", "timestamp"=>2024-07-02T21:00:30.879238Z, "delay"=>5.1}, "message"=>"2024-07-03 00:00:20.510 MSK [1133] LOG got SIGHUP, re-reading config", "@timestamp"=>2024-07-02T21:00:25.758Z, "app"=>{"name"=>"pgbouncer", "group"=>"db"}, "alert"=>true}], :error=>{"type"=>"exception", "reason"=>"java.lang.IllegalArgumentException: -84 is not a valid id", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"-84 is not a valid id"}}}

@uranru
Author

uranru commented Jul 3, 2024

securityconfig/config.yml

_meta:
  type: "config"
  config_version: 2
config:
  dynamic:
    authc:
      basic_internal_auth_domain:
        http_enabled: true
        transport_enabled: true
        order: 0
        http_authenticator:
          type: basic
          challenge: false
        authentication_backend:
          type: internal

      openid_auth_domain:
        http_enabled: true
        transport_enabled: true
        order: 1
        http_authenticator:
          type: openid
          challenge: false
          config:
            openid_connect_idp:
              enable_ssl: true
              verify_hostnames: false
              pemtrustedcas_filepath: /usr/share/opensearch/config/certificates/ca/ca.pem
            subject_key: preferred_username
            roles_key: roles
            openid_connect_url: https://idp.xxx.com:/realms/prod/.well-known/openid-configuration
        authentication_backend:
          type: noop

@peternied
Member

[Triage - attendees 1 2 3]
@uranru Thanks for filing; transferring this to the security plugin repo.

@peternied peternied transferred this issue from opensearch-project/OpenSearch Jul 3, 2024
@DarshitChanpura
Member

Related to: #4494

@shikharj05
Contributor

shikharj05 commented Jul 4, 2024

In 2.11, custom serialization was introduced with #2802

In 2.14, custom serialization was disabled and the plugin moved back to JDK serialization with #4264

Hence, I think a cluster upgrading from 2.11/2.12/2.13 to any 2.14+ version can run into this issue because the two versions use different serialization methods. (edit: if a plugin like PA wraps the outer channel #609)

In 2.11/2.12/2.13, the serialization method is decided using a version check.

In 2.14+, the serialization method is decided using a version check in SerializationFormat.
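A simplified sketch of the version check described above, assuming (as the comments describe) that a 2.14+ node uses the 2.11-era custom format only when the peer is on 2.11, 2.12 or 2.13, and JDK serialization otherwise. The class, record and enum names are hypothetical stand-ins, not the security plugin's actual SerializationFormat code.

```java
// Hypothetical sketch: illustrative names, not the security plugin's real classes.
final class SerializationChoice {

    // Minimal stand-in for an OpenSearch version (major.minor only).
    record Version(int major, int minor) implements Comparable<Version> {
        @Override
        public int compareTo(Version other) {
            if (major != other.major) {
                return Integer.compare(major, other.major);
            }
            return Integer.compare(minor, other.minor);
        }

        boolean onOrAfter(Version other) { return compareTo(other) >= 0; }

        boolean before(Version other) { return compareTo(other) < 0; }
    }

    enum Format { JDK, CUSTOM_2_11 }

    static final Version V_2_11 = new Version(2, 11);
    static final Version V_2_14 = new Version(2, 14);

    // Assumed rule on a 2.14+ node: only peers on 2.11, 2.12 or 2.13 get the custom format.
    static Format formatFor(Version peer) {
        if (peer.onOrAfter(V_2_11) && peer.before(V_2_14)) {
            return Format.CUSTOM_2_11;
        }
        return Format.JDK;
    }

    public static void main(String[] args) {
        System.out.println(formatFor(new Version(2, 13))); // CUSTOM_2_11
        System.out.println(formatFor(new Version(2, 15))); // JDK
        System.out.println(formatFor(new Version(2, 10))); // JDK
    }
}
```

The check is only as good as its input: if `peer` is not really the remote node's version (for example, a wrapper reports the local version), the two ends can pick different formats.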

Potential fix: when an existing cluster that uses custom serialization is being upgraded to a version that uses JDK serialization, we need to consider the version reported by outerChannel and choose the serialization method accordingly.
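One way to read the proposed fix, sketched with a hypothetical `Channel` interface (not OpenSearch's TransportChannel): ask the outer channel for the peer's version first, fall back to the inner channel, and use the local version only when neither reports one. Modeling "unknown" as an empty `Optional` is an assumption made for illustration; per this thread, a channel that doesn't implement getVersion silently returns the current version instead.

```java
import java.util.Optional;

// Hypothetical sketch of the "consider outerChannel" idea; Channel is a stand-in,
// not OpenSearch's TransportChannel.
final class RemoteVersionResolver {

    interface Channel {
        // Empty when this channel does not actually know the peer's version (assumption).
        Optional<String> peerVersion();
    }

    // Prefer the outer channel, then the inner one, and only then the local version.
    static String resolve(Channel outer, Channel inner, String localVersion) {
        return outer.peerVersion()
                .or(inner::peerVersion)
                .orElse(localVersion);
    }

    public static void main(String[] args) {
        Channel outer = () -> Optional.of("2.13.0"); // knows the peer's version
        Channel inner = Optional::empty;             // does not
        System.out.println(resolve(outer, inner, "2.15.0")); // 2.13.0
        System.out.println(resolve(inner, inner, "2.15.0")); // 2.15.0
    }
}
```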

@stephen-crawford
Copy link
Contributor

[Triage] Hi @uranru, thanks for filing this issue. I am going to close this issue in favor of #4494, which is a bit older and has some additional context. Thank you for filing this though, as it is a problem.

@cwperks
Member

cwperks commented Jul 8, 2024

Hence, I think for a cluster upgrading from 2.11/2.12/2.13 to any 2.14+ version, the cluster can run into this issue due to different serialization methods.

How does this happen though?

The serialization/deserialization logic requires both the sending and the receiving node to correctly determine the version of the node on the other end of the channel.

i.e.

Transmitting Node --------------------- Transport Channel ------------------------> Receiving Node

The transmitting node needs to get the version of the receiving node to determine the serialization method. Likewise, the receiving node needs to get the version of the transmitting node to determine how to deserialize.
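To illustrate why both ends must agree, here is a hedged sketch with a hypothetical receiver that assumes the 2.11-era custom format and reads the first byte as a type id, while the sender used plain JDK serialization. Java's `ObjectOutputStream` always starts a stream with the magic bytes 0xAC 0xED, and 0xAC read as a signed byte is -84, which matches the id in the reported error. This is only a plausible explanation of the number: it assumes the custom deserializer reads a leading byte as an id, which has not been confirmed in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: the "custom format" reader below is illustrative only.
final class FormatMismatchDemo {

    // Sender side: plain JDK serialization (what a node picks if it believes the peer needs JDK).
    static byte[] jdkSerialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Receiver side: a reader that wrongly assumes the custom format and treats the
    // first byte as a type id. Java bytes are signed, so 0xAC widens to -84.
    static int readTypeIdAssumingCustomFormat(byte[] payload) {
        return payload[0];
    }

    public static void main(String[] args) {
        byte[] payload = jdkSerialize("some header value");
        System.out.printf("first bytes: %02X %02X%n", payload[0], payload[1]); // AC ED
        System.out.println(readTypeIdAssumingCustomFormat(payload));           // -84
    }
}
```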

This issue occurs when either the transmitting or the receiving node fails to get the correct version of the node on the other end.

This issue is a red herring for improper delegation, and we need to determine where the improper delegation is occurring.

When this issue was seen in clusters mixing < 2.11 nodes with 2.11 nodes, it was because Performance Analyzer was not properly delegating the getVersion implementation to the wrapped channel: opensearch-project/performance-analyzer#609
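A minimal sketch of the delegation problem, using a hypothetical `Channel` interface whose default `getVersion()` returns the local node's version (mirroring the fallback described in this thread). A wrapper that does not override `getVersion()` reports the local version instead of the peer's; one that delegates to the wrapped channel reports the correct one. These types are illustrative stand-ins, not the Performance Analyzer or OpenSearch classes.

```java
// Hypothetical sketch: stand-in types, not OpenSearch's TransportChannel or PA's wrapper.
final class ChannelDelegationDemo {

    static final String LOCAL_VERSION = "2.15.0";

    interface Channel {
        // Default mirrors the fallback described in the thread: the local version.
        default String getVersion() { return LOCAL_VERSION; }
    }

    // The underlying network channel knows the peer's real version.
    record NetworkChannel(String peerVersion) implements Channel {
        @Override
        public String getVersion() { return peerVersion; }
    }

    // Wrapper that forgets to delegate: it silently inherits the default.
    record LeakyWrapper(Channel inner) implements Channel { }

    // Wrapper that delegates getVersion to the wrapped channel.
    record DelegatingWrapper(Channel inner) implements Channel {
        @Override
        public String getVersion() { return inner.getVersion(); }
    }

    public static void main(String[] args) {
        Channel peer = new NetworkChannel("2.13.0");
        System.out.println(new LeakyWrapper(peer).getVersion());      // 2.15.0 (wrong)
        System.out.println(new DelegatingWrapper(peer).getVersion()); // 2.13.0
    }
}
```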

@shikharj05
Copy link
Contributor

How does this happen though?

In this case, the version of the outer channel is not being checked. The inner channel wrapped by PA doesn't implement the getVersion method, which means a cluster with PA would fall back to the current version instead of the version of the transmitting node.

@DarshitChanpura
Copy link
Member

TL;DR of the above: would implementing a version check in both channels resolve the issue?

@cwperks
Copy link
Member

cwperks commented Jul 9, 2024

In this case, the version of the outer channel is not being checked here. Inner channel wrapped by PA doesn't implement the getVersion method, which would mean a cluster with PA would fallback to current version instead of version of transmitting node.

What are the concrete classes of the outer and inner channels that produce the error seen by the OP?

@tronboto

tronboto commented Jul 18, 2024

To add another data point - we're seeing this during an upgrade from 2.11.1 to 2.15.0. Would disabling PA help to work around it for now?

edit: I had thought it was enabled by default, but we never had it enabled anyway:

[2024-07-18T12:04:46,318][INFO ][o.o.p.h.c.PerformanceAnalyzerConfigAction] [esnode1:9200] PerformanceAnalyzer Enabled: false

8 participants