Argument Error :erlang.hd([]) on Elixir 1.8.1 #346

Closed
brianlow opened this issue Apr 26, 2019 · 12 comments · Fixed by #402

Comments

@brianlow

After upgrading Elixir 1.7 to 1.8 we see this error from Consumer Groups:

** (stop) exited in: GenServer.call(#PID<0.10310.1>, {:fetch, %KafkaEx.Protocol.Fetch.Request{auto_commit: false, client_id: nil, correlation_id: nil, max_bytes: 1000000, min_bytes: 1, offset: 27, partition: 0, topic: "sabine-84829.mytopic", wait_time: 10}}, 5000)
    ** (EXIT) an exception was raised:
        ** (ArgumentError) argument error
            :erlang.hd([])
            (kafka_ex) lib/kafka_ex/server_0_p_8_p_2.ex:265: KafkaEx.Server0P8P2.fetch/2
            (kafka_ex) lib/kafka_ex/server_0_p_8_p_2.ex:117: KafkaEx.Server0P8P2.kafka_server_fetch/2
            (stdlib) gen_server.erl:661: :gen_server.try_handle_call/4
            (stdlib) gen_server.erl:690: :gen_server.handle_msg/6
            (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
    (elixir) lib/gen_server.ex:989: GenServer.call/3
    (kafka_ex) lib/kafka_ex/gen_consumer.ex:679: KafkaEx.GenConsumer.consume/1
    (kafka_ex) lib/kafka_ex/gen_consumer.ex:630: KafkaEx.GenConsumer.handle_info/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: :timeout

kafka_ex: 0.10.1
Kafka: Heroku Kafka Add-on, common runtime, multi-tenant plan, Kafka version 2.0.1
Elixir: 1.8.1, Erlang 21.3.3

  • occurs relatively consistently for particular partition+topic combinations
  • using a different Kafka cluster we see different partition+topic combinations
  • does not occur using Heroku Kafka Add-on in private space (not multi-tenant), Kafka v 2.1.1
  • have not been able to reproduce using a local docker kafka cluster
  • does not occur on Elixir 1.7.4
  • In fetch/2, the first hd/1 call fails because the response from the broker contains no topics, e.g. in network_request/3 we see <<0, 0, 6, 190, 0, 0, 0, 0>>
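
For illustration, the eight bytes above can be decoded by hand. This is only a sketch: it assumes the version 0 fetch response layout (a 32-bit correlation id followed by a 32-bit topic count), and the variable names are hypothetical rather than taken from kafka_ex.

```elixir
# Raw bytes seen in network_request/3 (copied from the report above).
raw = <<0, 0, 6, 190, 0, 0, 0, 0>>

# Assumed layout: 32-bit correlation id, then a 32-bit count of topics.
<<correlation_id::32-signed, topic_count::32-signed, rest::binary>> = raw

IO.inspect(correlation_id, label: "correlation_id")
IO.inspect(topic_count, label: "topic_count")
IO.inspect(rest, label: "remaining bytes")

# Stand-in for the parsed topic list: one placeholder per announced topic.
topics = List.duplicate(:topic, topic_count)

# hd/1 raises ArgumentError on an empty list, which matches the crash above.
result =
  try do
    hd(topics)
  rescue
    ArgumentError -> :argument_error
  end

IO.inspect(result, label: "hd(topics)")
```

If the layout assumption holds, the broker answered with a valid header and zero topics, so there is nothing for the first hd/1 call to return.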
@joshuawscott
Member

joshuawscott commented Apr 26, 2019

Ouch. hd/1 strikes again.

A few questions:

Does this take down the application?
Are you auto-creating topics? (I noticed the topic name seemed to be numbered)
Do you have high latency (not local network) between the consumer and the broker?

I'll see if I can replicate this and figure out a fix.

@brianlow
Author

Does this take down the application?

The app continues to run; the error is logged periodically (maybe every 10s).

Are you auto-creating topics? (I noticed the topic name seemed to be numbered)

No, Heroku assigns a prefix that we must use on all topic names (e.g. sabine-84829.) because it is multi-tenant.

Do you have high latency (not local network) between the consumer and the broker?

Both are hosted within Heroku. I'll see if I can measure latency.

Interestingly, this error is logged repeatedly for certain partition+topic combinations. If latency were the issue, I'd expect the affected combinations to be random, because latency usually is.

@brianlow
Author

Errors are confined to 2 out of 8 brokers. Every partition on these brokers is affected.

Unable to measure latency; ping is blocked.

Any suggestions for debugging?

@joshuawscott
Member

How many partitions do you have?

@brianlow
Author

1-8 partitions depending on topic

@joshuawscott
Member

I'm wondering if having fewer partitions than brokers is what causes it.

@brianlow
Author

Unfortunately #351 did not solve this issue for us. I appreciate the PR though @ukrbublik!

@brianlow
Author

We continue to see this occasionally:

  • one partition per topic
  • Heroku shared Kafka instances
  • kafka_ex 0.10, Elixir 1.9.0

@adrienmo

We just had the same problem while using kafka_ex with the managed Kafka service from Confluent.

It seems that they enable quotas: when you exceed the quota, the broker returns an empty response with a throttling time. Right now an empty result makes kafka_ex crash.

To fix it we can change the code here:
https://github.com/kafkaex/kafka_ex/blob/0.10.0/lib/kafka_ex/server_0_p_8_p_2.ex#L264

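last_offset =
  case response do
    # Normal case: last offset of the first partition of the first topic
    [%{partitions: [%{last_offset: last_offset} | _]} | _] -> last_offset
    # Empty or unexpected response (e.g. when throttled by a quota)
    _bad_response -> nil
  end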

@joshuawscott
Member

@adrienmo That seems like a perfectly reasonable fix - feel free to open a PR with that and I can get it merged.

@adrienmo

@joshuawscott Would this PR be based on master or on the tag 0.10.0?

@dantswain
Collaborator

@adrienmo Against master, please. If you have a moment could you check lib/kafka_ex/new/client.ex for a similar problem? That code is based on this code. It might be good to log a warning with the bad response when this happens.
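
A rough sketch of what the warning suggestion might look like, combined with the pattern match proposed earlier in the thread. The module and function names are hypothetical and not part of kafka_ex, and plain maps stand in for the real response structs. It uses Logger.warning/1, which needs Elixir 1.11 or later; on the Elixir versions mentioned in this thread the equivalent call would be Logger.warn/1.

```elixir
defmodule LastOffsetSketch do
  # Hypothetical helper for illustration only; not kafka_ex code.
  require Logger

  # Normal case: first topic has at least one partition with a last_offset.
  def last_offset([%{partitions: [%{last_offset: last_offset} | _]} | _]) do
    last_offset
  end

  # Anything else (empty list, topic without partitions, ...): warn, return nil.
  def last_offset(bad_response) do
    Logger.warning("Unexpected fetch response: #{inspect(bad_response)}")
    nil
  end
end

# Sample inputs (assumed shapes, using plain maps instead of structs).
ok = LastOffsetSketch.last_offset([%{partitions: [%{last_offset: 27}]}])
empty = LastOffsetSketch.last_offset([])
no_partitions = LastOffsetSketch.last_offset([%{partitions: []}])

IO.inspect({ok, empty, no_partitions})
```

Returning nil keeps the caller's existing handling of a missing offset, while the log line records what the broker actually sent.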
