This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

IoTHub module: Unrecoverable crash #288

Closed
janfl opened this issue May 29, 2017 · 12 comments

janfl commented May 29, 2017

Hi,
I have an older version (about 2 months old) of the GW running in production together with standard and custom modules.
I have experienced problems with the IoTHub module.
I get this error in the log file:
iothubtransport_amqp_messenger.c
Function: process_state_changes:1530
message: messagereceiver reported unexpected state %d while messenger is starting

After I have received this message, the module never recovers; it keeps failing with all kinds of errors until the GW is restarted. In one day it generated a 4 GB log file of errors.

Contributor

darobs commented May 30, 2017

I'm sorry this happened. I'm looking into the error that led to this state and at what we could do to avoid or mitigate this type of failure in the future.

If it's possible, I'd like to get the specific version this is based on (commit id or tag, etc). The latest master version of iothubtransport_amqp_messenger.c is different and I want to make sure I'm looking at the correct version. Thanks!

ETA: If we could get logs (not all 4G please!) around when the error seemed to start, that would be awesome.

darobs self-assigned this May 30, 2017
darobs added the bug label May 30, 2017
Author

janfl commented Jun 6, 2017

I don't have a commit id or tag. I can see that the version I'm running was downloaded on the 2nd of April, after lunch.

Error.txt

Author

janfl commented Jun 6, 2017

We see another problem: we get this error every 1 hour and 11-12 minutes.

After that, the gateway keeps running.
Hourly_Error.txt

Contributor

darobs commented Jun 8, 2017

Thank you @janfl - I'm going to consult with our device SDK team about this.

ewertons commented Jun 9, 2017

Hi @darobs ,
I took a look at the logs. They start at the point where the transport indicates that there was an error, but the part of the traces that indicates what the error was is not included. It would be in the lines above the ones pasted.

Is the network connection on all the time, or are there glitches?

Author

janfl commented Jun 16, 2017

Do you generate some extra log for this module? What I have done is hook into the internal gateway logger, and that does not give me more errors than what I have sent you. Does the module log at levels other than Trace, Info, or Error? (That could explain the missing log.)

I can't guarantee that the connection is on at all times. Does your module not support loss of connection?

Author

janfl commented Jun 16, 2017

Regarding the fatal crash with the expanding log files: we had it again yesterday.

Contributor

darobs commented Jun 16, 2017

@janfl

We do have connection recovery in AMQP with the Device SDK, but your connection seems to be exposing a bug. I was hoping this was something the SDK team had already experienced and that a fix was in place. This doesn't seem to be the case, so I'm raising this issue with the Device SDK team, both on GitHub and in person, so that it gets the attention it needs.

See: Azure/azure-iot-sdk-c#159

Contributor

darobs commented Jun 19, 2017

Hello again @janfl, @ewertons is asking for information on the other issue that I am not able to provide. It also looks like the Edge repo needs to pull in a later version of the C SDK so that we can get access to some improvements and retry policies for AMQP.

Author

janfl commented Jun 30, 2017

Hi @darobs @ewertons @tameraw . Here is an update to the issue.
I have been running an IoT Gateway based service (the one where the problem was seen before) and an IoT Edge based service on the same computer, connected to different IoT Hubs.
I have seen both get into the fatal error state.

The IoT Edge service is based on:
Microsoft.Azure.IoT.Gateway.Module.1.0.5 and Microsoft.Azure.Devices.Gateway.Native.Windows.x64.1.1.2

Here are parts of the error files:
IotEdgeError.txt
IotGatewayerror.txt

We have a meeting with @DBojsen later today and I'll mention the issue.

Contributor

darobs commented Jul 7, 2017

@janfl

I have updated the submodule dependencies, which pulls in the 2017_06_30 release of the Device SDKs. This version of the Device SDK has improved reconnection logic with a retry policy, and I was promised better and less verbose logging.

If you can, please update the edge build to include these changes, and then we can see how well the edge performs.
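
As a rough illustration of what "reconnection logic with a retry policy" can mean for a connection that drops from time to time, here is a minimal Python sketch of capped exponential backoff. It is not the Device SDK's code (the SDK is written in C); `backoff_delays`, `connect_with_retry`, and every default value below are hypothetical, chosen only for illustration.

```python
import random


def backoff_delays(base_s=1.0, factor=2.0, max_delay_s=60.0,
                   max_attempts=6, jitter=0.0, rng=None):
    """Yield the wait in seconds to apply after each failed attempt.

    All names and defaults are illustrative assumptions, not SDK values.
    """
    rng = rng or random.Random()
    delay = base_s
    for _ in range(max_attempts):
        # Optional jitter (a fraction of the delay) spreads out reconnects
        # when many clients lose the connection at the same moment.
        spread = delay * jitter
        yield min(max_delay_s, delay + rng.uniform(-spread, spread))
        # Grow the delay geometrically, but never beyond the cap.
        delay = min(max_delay_s, delay * factor)


def connect_with_retry(connect, sleep, **policy):
    """Call connect() until it succeeds or the policy runs out of attempts."""
    for delay in backoff_delays(**policy):
        try:
            return connect()
        except ConnectionError:
            # Wait before the next attempt instead of retrying immediately.
            sleep(delay)
    raise ConnectionError("gave up after the retry policy was exhausted")
```

In real use `sleep` would be `time.sleep` and `connect` whatever opens the connection. A policy like this spaces attempts out and eventually gives up with a single error, rather than retrying in a tight loop.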

@damonbarry
Member

Available in the 2017-08-21 release.
