usbhid-ups changes: more resilient w/failures; scales Tripp Lite SMART1500LCDT #122
assigning *udevp handles to devices we skip and then usb_close(). Don't assign *udevp till we're sure we're going to return it.
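The handle fix described above can be illustrated with a small sketch. It is written in Python rather than the driver's C, and `FakeHandle`, `open_first_match`, and the `matches` callback are hypothetical names, not NUT or libusb functions. The point is only the ordering: a candidate handle stays in a local variable, is closed if the device is skipped, and is handed to the caller only once the device is accepted.

```python
class FakeHandle:
    """Stand-in for an opened USB device handle (hypothetical, not libusb)."""

    def __init__(self, name):
        self.name = name
        self.is_open = True

    def close(self):
        self.is_open = False


def open_first_match(device_names, matches):
    """Return an open handle for the first accepted device, or None.

    The candidate stays in a local variable until it is accepted, so the
    caller never receives a handle that has already been closed.
    """
    for name in device_names:
        candidate = FakeHandle(name)   # analogous to opening the device
        if not matches(candidate):
            candidate.close()          # skipped device: close it, publish nothing
            continue
        return candidate               # only now does the caller see the handle
    return None


handle = open_first_match(["hub", "keyboard", "ups"], lambda h: h.name == "ups")
no_match = open_first_match(["hub", "keyboard"], lambda h: h.name == "ups")
```

In the C driver the equivalent would presumably be to keep the opened handle in a local variable and write it through the out-parameter only on the success path.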
This introduces changes to the driver interface that require more thinking and review. The "Scale for SMART1500LCDT" commit should also be extracted and submitted as a separate pull request.
So far, the 3016 protocol units all seem to have this scaling issue.
I applied these patches to nut 2.7.3 (running on Fedora 21). They improved it from 'never works' to 'works sometimes, maybe'. Repeated runs of When trying to run nut-server the driver initially seems to work and then repeatedly fails with
@bcl this seems to be motherboard-dependent: http://article.gmane.org/gmane.comp.monitoring.nut.user/9465 What kind of hardware are you running on? Any luck with moving to non-USB3 ports? Also what kernel and libusb does F21 provide?
Interesting. I'm not sure what it was plugged into; I'll try switching it. The motherboard is an ASRock B85M-ITX. The kernel is 4.1.13-100.fc21.x86_64 and libusb is libusb-0.1.5-5.fc21.x86_64. I also tried some manual poking around using libhid-python-0.2.17-17.fc21.x86_64 and ran into similar problems: Error from libusb: Input/output error
Not entirely surprising - much of the NUT USB HID code came from that libhid source tree. I should look into whether the HIDAPI package does things differently, but I really think this is a compatibility issue at a lower level (as you pointed out with lsusb).
$ lspci|grep -i usb
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
It was on a USB2 port; moving it to a USB3 port didn't change the behavior at all.
I moved the Tripp Lite to another system with only the scaling patch applied and for the moment it appears to be running ok. lspci looks like: 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
Thanks, that's a useful data point. It points towards an incompatibility with the newer USB host controllers.
A lot has changed at once, but I have a positive datapoint for more reliable SMART1500LCDT support:
I have a feeling that the kernel update helped the most, because If you had trouble before, can you recheck? If you're still on an older kernel, feel free to try out the libusb-1.0 branch (see issue #300).
So I just got this UPS yesterday and have it working nicely with NUT. However, every so often (about 15-30 minutes, sometimes much longer) I'll get notified that NUT has lost communications. I am running the NUT provided by apt.
I just changed If it matters, I originally had the UPS plugged into a USB 2 port but after I started having issues I switched it to a USB 3 port. I believe that it may have started crashing less after switching it. Also, I'm on a SuperMicro A1SAI-2750F Intel Atom server board. I have a Z-Wave stick plugged in alongside the UPS and it's never given me problems. Finally, I have been doing a lot of development for NUT since getting my UPS (creating a Go library and implementing NUT support in Home Assistant). Could repeated abuse of the TCP API cause problems somehow? I assume not but thought I'd point it out.
The TCP protocol fetches data from the dstate layer. The polling loop proceeds independently of the dstate queries (unless you are starving out the event loop completely), so I don't think this would affect things.
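A minimal sketch of that separation, assuming a simplified model rather than NUT's actual dstate code: the polling loop is the only thing that touches the device and it writes into a cache, while client queries read only the cache. `StateCache`, `FakeDevice`, and `poll_once` are hypothetical names, and the values are illustrative.

```python
class StateCache:
    """In-memory copy of the latest values (simplified stand-in for a state layer)."""

    def __init__(self):
        self._values = {}

    def update(self, values):
        self._values.update(values)

    def get(self, key):
        return self._values.get(key)


class FakeDevice:
    """Hypothetical device that counts how often the hardware is read."""

    def __init__(self):
        self.reads = 0

    def read(self):
        self.reads += 1
        return {"battery.charge": 100, "ups.status": "OL"}


def poll_once(cache, device):
    """One pass of the polling loop: the only place the device is read."""
    cache.update(device.read())


device = FakeDevice()
cache = StateCache()
poll_once(cache, device)

# Many client queries are served from the cache without any device traffic.
answers = [cache.get("ups.status") for _ in range(1000)]
```

Under this model, heavy client traffic changes how often the cache is read but not how often the device is polled, which matches the reasoning in the comment above.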
Changing the
Finally got around to switching to
The libusb-1.0 branch (#300) is now available in this PPA: https://launchpad.net/~clepple/+archive/ubuntu/nut
@clepple Still getting drops with the
@robbiet480 can you elaborate on this? Failure to detect, or disconnects relatively soon when on USB 3.0? (Also wondering if simply unplugging and reconnecting the USB cable helps...)
@clepple Sorry for the delay, meant to reply to this last night. It was total failure to detect. I did not attempt to unplug/reconnect, just left it on a USB 2.0 port.
I continue having drops. I tried to restart the driver during one of these drops and
I think it's the combination of that model (the 3016 USB ID) plus the USB controllers on newer motherboards. Another data point: I moved my SMART1500LCDT from a Linux box with an ICH10 to an older Core2 Duo Mac Mini. I let Mac OS X poll the UPS, so libusb was out of the picture. It worked fine for several days. I then moved the UPS to an i7-based Mac Mini (also OS X 10.11) and the UPS disconnected within five minutes of being polled by the OS.
@clepple Interesting data point since my last comment: Haven't had a single drop since changing
I applied this scale change as a patch to 2.6.5 on rhel6, and it greatly improves the output.
@sdgathman I am not familiar with the new "user approves these changes" UI in GitHub, but this pull request hasn't been merged because of the major changes in the claim/release logic, which would have to be tested on many other OS and UPS combinations, and also because I don't think the infinite retry logic makes sense for non-home users (or if it does, we need additional upsmon notifications for when it is still retrying). We ended up testing all of the OS/UPS combinations anyway as part of #300... The scaling patch was merged separately as part of 2.7.4.
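One possible shape for the alternative hinted at above, sketched under assumptions: retries are capped and every failed attempt is reported, so a monitoring layer can tell that the driver is still retrying. `reconnect_with_limit`, `notify`, and the delay values are hypothetical and are not part of NUT or upsmon.

```python
import time


def reconnect_with_limit(connect, notify, max_attempts=5, base_delay=1.0,
                         sleep=time.sleep):
    """Try connect() up to max_attempts times with exponential backoff.

    notify(attempt, error) is called after each failure so that a
    supervisor can raise an alert while retries are still in progress.
    Returns the connection, or None once the attempts are exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError as error:
            notify(attempt, error)
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    return None


# Usage with a fake device that fails twice before succeeding.
failures = []
attempts = {"count": 0}


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise OSError("Input/output error")
    return "connected"


result = reconnect_with_limit(
    flaky_connect,
    notify=lambda attempt, error: failures.append(attempt),
    sleep=lambda seconds: None,  # skip real waiting in this example
)
```

A home user could set a very large `max_attempts`, while a site with monitoring would keep it small and rely on the notifications.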
Until late last night I was continuing to get disconnects many times per day. Then I saw the issue description of #277, where the author stated that he resolved disconnections by swapping the provided USB cable for a known-good one. I then searched the Amazon reviews and found many people reporting the exact same issue. Since changing out my USB cable everything has been super stable 😎. First time for everything. Never would have thought it was the cable!
Turns out, I kept getting disconnects and just gave up for a long time. A few days ago, while I was doing annual server maintenance and was waiting for a ZFS scrub to finish I decided to play around with this again. On the latest NUT, compiled from GitHub, with no other config changes, I am no longer getting disconnects, for real, after like 3 days of monitoring! What is sometimes happening is
Attempting to restart Things to note: I was previously on the libusb branch, now back to the mainline. So anyway, guessing that a fix thought to be unrelated was applied between January 13, 2017 and 3-4 days ago and happened to seriously improve Tripp-Lite support in NUT?
@robbiet480 Have you seen my reset assist kludge for SMART1500LCDT?
Almost two years on and man this UPS really sucks. Continually getting drops still.
I upgraded the server to CentOS-8, and the UPS still drops. BUT, kernel-4.18 now automatically tries the power cycle sequence, so my addon package for CentOS-6 is no longer needed. See README on sdgathman/trippfix.
Unfortunately this PR grew out of sync with current master and needs expert attention to realign.
Note for future looks at this PR: according to comments above, "The scaling patch was merged separately as part of 2.7.4" (and I think there were later others between that and 2.7.5, see e.g. #963), while the rest of this change was deemed too disruptive for the time being then (with libusb-1.0 merge ahead). Now that the libusb-1.0 support is in NUT master branch, as well as other changes around the codebase (from formatting to warnings fixes) this PR has to be adapted to be mergeable again, at least. Content-wise, seems it can be split into several efforts:
So at this current point in time, is there a viable workaround for this model?
@caseyjmorton I use this model. I used the duct tape referenced above for CentOS-6, and CentOS-8 and now Rocky8 kernels do the port power cycle themselves. You just need a hub that supports PPPS. https://gathman.org/2016/07/30/Standard_Schmandard/
Had trouble getting nut to work for my new Tripp Lite SMART1500LCDT. So in best open source fashion, I hacked at it till it worked. Couple things in this pull request: