Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apcsmart lost communication with UPS results in intense syslog flood #704

Closed
tomiisp opened this issue Jun 17, 2019 · 31 comments
Closed

apcsmart lost communication with UPS results in intense syslog flood #704

tomiisp opened this issue Jun 17, 2019 · 31 comments
Assignees
Labels
APC bug Connection stability issues Issues about driver<->device and/or networked connections (upsd<->upsmon...) going AWOL over time DDL impacts-release-2.7.4 Issues reported against NUT release 2.7.4 (maybe vanilla or with minor packaging tweaks) impacts-release-2.8.2 Issues reported against NUT release 2.8.2 (maybe vanilla or with minor packaging tweaks) Linux Some issues are specific to Linux as a platform serial port USB
Milestone

Comments

@tomiisp
Copy link

tomiisp commented Jun 17, 2019

Hi,

I get this issue second time, nutups lost communication with UPS (via USB/Serial cable) and nut tools and syslog start eating all 4 cores (cpu quickly reach temperature 78C ), it produce huge log file (my poor SD card...) and it produce about 4500lines in log per second !
Entries in log looks:

Jun 17 22:00:08 iotgwpc2 apcsmart[1285]: Warning: excessive comm failures, limiting error reporting

Jun 17 22:00:08 iotgwpc2 apcsmart[1285]: Communications with UPS lost: serial port write error: 1379(smartmode): Input/output error

Jun 17 22:00:08 iotgwpc2 apcsmart[1285]: message repeated 9 times: [ Communications with UPS lost: serial port write error: 1379(smartmode): Input/output error]

That USB/serial is only temporary solution, later UPS will be connected directly to onboard UART but this is insane amount of error messages and rate.
Is this a bug or there is a option to limit this error messages ?

Orange PI PC2 - Armbian 4.19.38-sunxi64 #5.86 SMP
Network UPS Tools - UPS driver controller 2.7.4

/Tomi

@jimklimov jimklimov added the bug label Jul 14, 2022
@fatbasstard
Copy link

Hitting the same behaviour now, any progress on this?

@jimklimov
Copy link
Member

I am not aware of anyone addressing this specifically, so probably fair to say it is a bug, and probably it is still present. Tested PRs for throttling the message emission (maybe slower backoff to retry connecting?) are welcome.

@olicooper
Copy link

Just had the same happen to me on a Raspberry Pi. Filled my 250GB SSD which subsequently made the Home Assistant database get corrupted. No way of catching it that quickly since it happened while I was sleeping. I'm not happy about this at all.

Any solution or workaround to this? I've just disabled nut for now.

@jimklimov
Copy link
Member

jimklimov commented May 2, 2023 via email

@nnov4k
Copy link

nnov4k commented Jul 21, 2024

Hello!
i have some problem, is any ideas, how to fix syslog flooding by apcsmart?

@jimklimov
Copy link
Member

jimklimov commented Jul 21, 2024

Not that I'm aware of, nobody posted related PRs or investigation notes...

In-driver throttling could be relatively easy, there are a few precedents in other NUT programs.

Restarting the driver or the connection if the situation persists (and if it does mean the running copy of driver becomes useless for data collection) might kickstart from that same timestamp tracking involved in throttling.

Getting to the root of it (faulty HW/FW? any mitigations?) would be harder.

@nnov4k
Copy link

nnov4k commented Jul 28, 2024

baaaad :(
i made a script, which kills stupid driver when UPS deads... it fixes a system stability, but problem still remains...

ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="067b", ATTR{idProduct}=="2303", MODE="0660", GROUP="nut", RUN+="/bin/nut-switch-restart.sh"
ACTION=="remove", SUBSYSTEM=="usb", ATTR{idVendor}=="067b", ATTR{idProduct}=="2303", MODE="0660", GROUP="nut", RUN+="/bin/systemctl stop nut-driver"

@jimklimov
Copy link
Member

Just in case, would you care to contribute your script and a README about it to NUT, for others to use?

Wiki page https://github.com/networkupstools/nut/wiki/Troubleshooting-eventual-disconnections-(Data-stale) refers to a couple of such scripted and documented know-how remedies, one more would be good... At least, better have a bad way forward for the time being, than none at all.

@jimklimov jimklimov added APC USB serial port impacts-release-2.7.4 Issues reported against NUT release 2.7.4 (maybe vanilla or with minor packaging tweaks) Linux Some issues are specific to Linux as a platform Connection stability issues Issues about driver<->device and/or networked connections (upsd<->upsmon...) going AWOL over time labels Jul 28, 2024
@jimklimov
Copy link
Member

jimklimov commented Jul 29, 2024

Finally got to dig around the code and try to guess the issue.

So, my guess is that at some point the UPS controller pushes out some message into the buffer, so FD is sort of ready to be read. But something fails while reading it, maybe the apc_read_*() methods fail early (leaving part of buffer unread) and there would be a logged message just before the flood?

Supposedly, at this point the buffer is not empty and dstate_poll_fds() returns immediately.

Further attempts to upsdrv_updateinfo() start with setting of smartmode and that fails, maybe because the controller is still pushing out that earlier message and the buffer was not emptied by the driver.

After a few quiet retries, the driver bombards the controller with smartmode requests as part of failure recovery, probably DoS'ing the UPS controller even further.

Some ideas to try would be:

  • sleep in data-stale recovery loop, at least calming the flood to a trickle
  • read the incoming buffers, if any, just to flush them (after a recovery from data stale we would do a full read anyway)
  • if this situation persists, consider recycling the connection (or exiting the driver program so systemd would recycle it whole)

UPDATE: One more thing of note: in apc_read() there are timeouts for the select() from the upsfd, different for varying flags bits. Maybe those are insufficient if the controller on the UPS is old and clocked out of spec, or the host (or its USB bus) is busy context-switching and traffic-sharing, and does not deliver the message in time?

jimklimov added a commit to jimklimov/nut that referenced this issue Jul 29, 2024
…owly [networkupstools#704]

Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
jimklimov added a commit to jimklimov/nut that referenced this issue Jul 29, 2024
…lush the incoming data buffer [networkupstools#704]

Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
jimklimov added a commit to jimklimov/nut that referenced this issue Jul 29, 2024
…econnect every 60 retry attempts [networkupstools#704]

Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
jimklimov added a commit to jimklimov/nut that referenced this issue Jul 29, 2024
…e fix it [networkupstools#704]

Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
jimklimov added a commit to jimklimov/nut that referenced this issue Jul 29, 2024
Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
@jimklimov jimklimov added this to the 2.8.3 milestone Jul 29, 2024
@jimklimov
Copy link
Member

Posted a PR based on the findings and guesses above. Testing would be very much welcome.

@jimklimov jimklimov changed the title Lost communication with UPS flood syslog apcsmart lost communication with UPS results in intense syslog flood Jul 29, 2024
@jimklimov
Copy link
Member

Did anyone manage to try testing the proposed code? Instructions are in the PR #2564

@jimklimov
Copy link
Member

jimklimov commented Aug 15, 2024

Bump? I'm eager to merge that change, and head towards a new NUT release, but would love confirmations first that it does not actually break the driver into a worse state than now :)

@nnov4k
Copy link

nnov4k commented Aug 20, 2024

compiled 2.8.2, service starts OK, but does not accept connections to itself at start. After some minutes connects OK.
testing... :)

log:
root@ha:/etc/nut# upsdrvctl status
Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.922-922-g575d42369 (development iteration after 2.8.2)
UPSNAME UPSDRV RUNNING PF_PID S_RESPONSIVE S_PID S_STATUS
apc apcsmart N/A -3 NOT_RESPONSIVE N/A
root@ha:/etc/nut# upsdrvctl start
Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.922-922-g575d42369 (development iteration after 2.8.2)
Network UPS Tools driver 2.8.2.922-922-g575d42369 (development iteration after 2.8.2) - APC Smart protocol driver 3.33
APC command table version 3.1
[d] unrecognized
root@ha:/etc/nut# upsdrvctl status
Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.922-922-g575d42369 (development iteration after 2.8.2)
UPSNAME UPSDRV RUNNING PF_PID S_RESPONSIVE S_PID S_STATUS
apc apcsmart RUNNING 6652 RESPONSIVE 6652 "OL LB"

root@ha:/etc/nut# service nut-driver start
root@ha:/etc/nut# service nut-driver status
● nut-driver.service - Network UPS Tools - power device driver controller
Loaded: loaded (/lib/systemd/system/nut-driver.service; static)
Active: inactive (dead)

Aug 20 21:33:37 ha upsdrvctl[6662]: [d] unrecognized
Aug 20 21:33:37 ha apcsmart[6662]: [d] unrecognized
Aug 20 21:33:43 ha apcsmart[6683]: Startup successful
Aug 20 21:33:43 ha apcsmart[6683]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Aug 20 21:33:43 ha systemd[1]: Started Network UPS Tools - power device driver controller.
Aug 20 21:33:43 ha systemd[1]: Stopping Network UPS Tools - power device driver controller...
Aug 20 21:33:43 ha upsdrvctl[6684]: Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.922-922-g575d42369 (development iteration after 2.8.2)
Aug 20 21:33:43 ha apcsmart[6683]: Signal 15: exiting
Aug 20 21:33:45 ha systemd[1]: nut-driver.service: Succeeded.
Aug 20 21:33:45 ha systemd[1]: Stopped Network UPS Tools - power device driver controller.
root@ha:/etc/nut# service nut-server status
● nut-server.service - Network UPS Tools - power devices information server
Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2024-08-20 21:29:54 MSK; 4min 20s ago
Process: 6414 ExecStartPre=/usr/bin/systemd-tmpfiles --create /nut-common-tmpfiles.conf (code=exited, status=1/FAILURE)
Process: 6416 ExecStartPost=/bin/grep -E Units|Max open files /proc/${MAINPID}/limits (code=exited, status=0/SUCCESS)
Main PID: 6415 (upsd)
Tasks: 1 (limit: 2171)
Memory: 1.5M
CGroup: /system.slice/nut-server.service
└─6415 /usr/sbin/upsd -F

Aug 20 21:29:54 ha nut-server[6415]: Running as foreground process, not saving a PID file
Aug 20 21:29:54 ha nut-server[6415]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Aug 20 21:29:54 ha upsd[6415]: Running as foreground process, not saving a PID file
Aug 20 21:29:54 ha upsd[6415]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Aug 20 21:29:58 ha nut-server[6415]: User net@192.168.2.3 logged into UPS [apc]
Aug 20 21:29:58 ha upsd[6415]: User net@192.168.2.3 logged into UPS [apc]
Aug 20 21:31:07 ha nut-server[6415]: Connected to UPS [apc]: apcsmart-apc
Aug 20 21:31:07 ha upsd[6415]: Connected to UPS [apc]: apcsmart-apc
Aug 20 21:33:15 ha nut-server[6415]: Connected to UPS [apc]: apcsmart-apc
Aug 20 21:33:15 ha upsd[6415]: Connected to UPS [apc]: apcsmart-apc

root@ha:/etc/nut# upsc -L
apc: APC SC420i @ HA
root@ha:/etc/nut# upsc -V
Network UPS Tools upsc 2.8.2.922-922-g575d42369 (development iteration after 2.8.2)
root@ha:/etc/nut# upsc apc
Error: Driver not connected
root@ha:/etc/nut# upsc apc@localhost
Error: Driver not connected

but.... after 5 minutes...

root@ha:/var/log# upsdrvctl start
Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.922-922-g575d42369 (development iteration after 2.8.2)
Network UPS Tools driver 2.8.2.922-922-g575d42369 (development iteration after 2.8.2) - APC Smart protocol driver 3.33
APC command table version 3.1
[d] unrecognized
root@ha:/var/log# upsc apc
battery.alarm.threshold: 0
battery.charge: 100.0
battery.charge.low: 80
battery.charge.restart: 00
battery.date: 11/14/06
battery.runtime: 360
battery.runtime.low: 600
battery.voltage: 13.69
battery.voltage.nominal: 012
device.mfr: APC
device.model: Smart-UPS SC420
device.serial: QS0647321282
device.type: ups
driver.debug: 0
driver.flag.allow_killpower: 0
driver.name: apcsmart
driver.parameter.override.battery.charge.low: 80
driver.parameter.pollinterval: 2
driver.parameter.port: /dev/serial/by-id/usb-Prolific_Technology_Inc._USB_2.0_To_COM_Device-if00-port0
driver.parameter.synchronous: auto
driver.state: updateinfo
driver.version: 2.8.2.922-922-g575d42369
driver.version.internal: 3.33
input.frequency: 50.00
input.quality: FF
input.sensitivity: M
input.transfer.high: 253
input.transfer.low: 200
input.transfer.reason: unacceptable utility voltage rate of change
input.voltage: 213.1
input.voltage.maximum: 213.1
input.voltage.minimum: 211.6
output.voltage: 211.6
output.voltage.nominal: 230
ups.delay.shutdown: 060
ups.delay.start: 000
ups.firmware: 725.1.I
ups.id: UPS_IDEN
ups.load: 031.8
ups.mfr: APC
ups.mfr.date: 11/14/06
ups.model: Smart-UPS SC420
ups.serial: QS0647321282
ups.status: OL LB
ups.test.interval: 1209600
ups.test.result: NO

@jimklimov
Copy link
Member

Just to clarify: there are several programs (wrapped into services) connecting not to "itself" but to each other, e.g. the driver (e.g. apcsmart) talking over a local unix socket to the data server (upsd, nut-server.service), which in turn talks over TCP to clients like upsc or upsmon (nut-monitor).

Logs seem to indicate that the driver did not start, and so upsd can not talk to it.

You were starting it by upsdrvctl as in the olden times; nowadays on Linux the nut-driver-enumerator service (if installed and activated) could have created the nut-driver@yourdevicename.service instance and it would conflict with a driver started manually - more details on Wiki. This is consistent with it eventually working, as systemd restarted a failed service instance (I suppose).

@jimklimov
Copy link
Member

In any case, thanks for the report - also in the area that upsdrvctl might warn about systemd support being present, and its implications. :)

I also thought the two copies of the driver with modern protocol model would find the older running copy and arrange... something.

Not sure what [d] unrecognized is about, but if they both did manage to connect to the serial port and each process reads every other byte or so, that could be confusing too...

@nnov4k
Copy link

nnov4k commented Aug 21, 2024

after a while (~20h) driver kills syslog with flood write :)

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND

13773 root 20 0 440556 86688 5812 R 148.3 4.3 25:49.76 rsyslogd
7534 nut 20 0 2432 1552 1408 R 100.0 0.1 21:51.33 apcsmart

service nut-server stop
service nut-driver stop
does not kills process

logs:
daemon.log:
Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port read timeout: 816(poll_data)
Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error
Aug 21 19:32:21 ha nut-server[6415]: Data for UPS [apc] is stale - check driver
Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error
flood...

kernel.log
detach + attach usb hub + ups usb-com converter

messages:
Aug 21 19:30:05 ha rsyslogd[13773]: imjournal: journal files changed, reloading... [v8.2102.0 try https://www.rsyslog.com/e/0 ]
Aug 21 19:32:21 ha kernel: [79785.565042] pl2303 ttyUSB2: pl2303 converter now disconnected from ttyUSB2
Aug 21 19:32:21 ha kernel: [79785.568009] pl2303 1-1.3.4:1.0: device disconnected
Aug 21 19:32:21 ha kernel: pl2303 ttyUSB2: pl2303 converter now disconnected from ttyUSB2
Aug 21 19:32:22 ha kernel: [79786.860498] usb 1-1.3.4: New USB device found, idVendor=067b, idProduct=2303, bcdDevice= 3.00
Aug 21 19:32:22 ha kernel: [79786.860525] usb 1-1.3.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Aug 21 19:32:22 ha kernel: [79786.860532] usb 1-1.3.4: Product: USB 2.0 To COM Device
Aug 21 19:32:22 ha kernel: [79786.860537] usb 1-1.3.4: Manufacturer: Prolific Technology Inc.
Aug 21 19:32:22 ha kernel: [79786.926207] pl2303 1-1.3.4:1.0: pl2303 converter detected
Aug 21 19:32:22 ha kernel: [79786.933207] usb 1-1.3.4: pl2303 converter now attached to ttyUSB7
Aug 21 19:32:22 ha kernel: usb 1-1.3.4: New USB device found, idVendor=067b, idProduct=2303, bcdDevice= 3.00
Aug 21 19:32:22 ha kernel: usb 1-1.3.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Aug 21 19:32:22 ha kernel: usb 1-1.3.4: Product: USB 2.0 To COM Device
Aug 21 19:32:22 ha kernel: usb 1-1.3.4: Manufacturer: Prolific Technology Inc.
Aug 21 19:32:22 ha kernel: pl2303 1-1.3.4:1.0: pl2303 converter detected
Aug 21 19:32:22 ha kernel: usb 1-1.3.4: pl2303 converter now attached to ttyUSB7
Aug 21 19:32:34 ha rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="13773" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Aug 21 19:32:34 ha rsyslogd[13773]: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="13773" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Aug 21 19:32:40 ha rsyslogd: rsyslogd[internal_messages]: 23744 messages lost due to rate-limiting (500 allowed within 5 seconds)
Aug 21 19:32:40 ha rsyslogd[13773]: rsyslogd[internal_messages]: 23744 messages lost due to rate-limiting (500 allowed within 5 seconds)
Aug 21 19:32:46 ha rsyslogd: rsyslogd[internal_messages]: 23840 messages lost due to rate-limiting (500 allowed within 5 seconds)
Aug 21 19:32:46 ha rsyslogd[13773]: rsyslogd[intern

syslog:
Aug 21 19:32:21 ha kernel: [79785.353339] pl2303 ttyUSB2: usb_serial_generic_read_bulk_callback - urb stopped: -32
Aug 21 19:32:21 ha kernel: [79785.353339] pl2303 ttyUSB2: usb_serial_generic_read_bulk_callback - urb stopped: -32
Aug 21 19:32:21 ha kernel: [79785.519601] pl2303 ttyUSB2: pl2303_read_int_callback - usb_submit_urb failed with result -19
Aug 21 19:32:21 ha kernel: [79785.519601] pl2303 ttyUSB2: pl2303_read_int_callback - usb_submit_urb failed with result -19
Aug 21 19:32:21 ha kernel: [79785.553105] pl2303 ttyUSB2: error sending break = -19
Aug 21 19:32:21 ha kernel: [79785.553105] pl2303 ttyUSB2: error sending break = -19
Aug 21 19:32:21 ha kernel: [79785.565042] pl2303 ttyUSB2: pl2303 converter now disconnected from ttyUSB2
Aug 21 19:32:21 ha kernel: [79785.565042] pl2303 ttyUSB2: pl2303 converter now disconnected from ttyUSB2
Aug 21 19:32:21 ha kernel: [79785.568009] pl2303 1-1.3.4:1.0: device disconnected
Aug 21 19:32:21 ha kernel: [79785.568009] pl2303 1-1.3.4:1.0: device disconnected
Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port read timeout: 816(poll_data)
Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port read timeout: 816(poll_data)
Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error
Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error
Aug 21 19:32:21 ha nut-server[6415]: Data for UPS [apc] is stale - check driver
Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error
Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error
flood...
Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error
Aug 21 19:32:21 ha apcsmart[7534]: WARNING: send_to_all: write 34 bytes to socket 6 failed (ret=-1), disconnecting: Resource temporarily unavailable
Aug 21 19:32:21 ha apcsmart[7534]: send_to_all: synchronous mode was 'auto', will try 'on' for next connections

@jimklimov
Copy link
Member

jimklimov commented Aug 21, 2024

Whoa, wait. It seems you've built from commit 575d423 which is current tip of master branch. The PR is still open - its changes are not in your build.

Can you please retry with its actual source branch? :)

e.g.

:; git clone https://github.com/jimklimov/nut -b issue-704 nut-issue-704
:; cd nut-issue-704
# the config/build rituals

@nnov4k
Copy link

nnov4k commented Aug 21, 2024

of course, wait a moment

@jimklimov
Copy link
Member

Cheers, any news, by chance? :)

@nnov4k
Copy link

nnov4k commented Aug 24, 2024

i'm sorry for delay... )
everything is OK, driver correctly works with hardware restarts
uptime 3 days

root@ha:/var/log.hdd# service nut-server status
● nut-server.service - Network UPS Tools - power devices information server
Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2024-08-21 23:44:20 MSK; 2 days ago
Process: 56418 ExecStartPre=/usr/bin/systemd-tmpfiles --create /nut-common-tmpfiles.conf (code=exited, status=1/FAILURE)
Process: 56421 ExecStartPost=/bin/grep -E Units|Max open files /proc/${MAINPID}/limits (code=exited, status=0/SUCCESS)
Main PID: 56420 (upsd)
Tasks: 1 (limit: 2171)
Memory: 2.2M
CGroup: /system.slice/nut-server.service
└─56420 /usr/sbin/upsd -F

Warning: journal has been rotated since unit was started, output may be incomplete.
root@ha:/var/log.hdd# service nut-driver status
● nut-driver.service - Network UPS Tools - power device driver controller
Loaded: loaded (/lib/systemd/system/nut-driver.service; static)
Active: inactive (dead)
root@ha:/var/log.hdd# upsc apc
battery.alarm.threshold: 0
battery.charge: 100.0
battery.charge.low: 80
battery.charge.restart: 00
battery.date: 11/14/06
battery.runtime: 1020
battery.runtime.low: 600
battery.voltage: 13.77
battery.voltage.nominal: 012
device.mfr: APC
device.model: Smart-UPS SC420
device.serial: QS0647321282
device.type: ups
driver.debug: 0
driver.flag.allow_killpower: 0
driver.name: apcsmart
driver.parameter.override.battery.charge.low: 80
driver.parameter.pollinterval: 2
driver.parameter.port: /dev/serial/by-id/usb-Prolific_Technology_Inc._USB_2.0_To_COM_Device-if00-port0
driver.parameter.synchronous: auto
driver.state: updateinfo
driver.version: 2.7.4.10178.17-10195-gd7f6d61b2
driver.version.internal: 3.34
input.frequency: 50.00
input.quality: FF
input.sensitivity: M
input.transfer.high: 253
input.transfer.low: 200
input.transfer.reason: simulated power failure or UPS test
input.voltage: 217.4
input.voltage.maximum: 217.4
input.voltage.minimum: 214.5
output.voltage: 214.5
output.voltage.nominal: 230
ups.delay.shutdown: 060
ups.delay.start: 000
ups.firmware: 725.1.I
ups.id: UPS_IDEN
ups.load: 018.2
ups.mfr: APC
ups.mfr.date: 11/14/06
ups.model: Smart-UPS SC420
ups.serial: QS0647321282
ups.status: OL
ups.test.interval: 1209600
ups.test.result: NO
root@ha:/var/log.hdd#

@jimklimov
Copy link
Member

jimklimov commented Aug 25, 2024

Super! And no more log-floods?

@jimklimov
Copy link
Member

jimklimov commented Aug 25, 2024

In case of a hiccup that would flood before, it should log "recovered the connection" now eventually.

Off-topic, version: 2.7.4.10178.17-10195-gd7f6d61b2 looks fun. I guess your checkout somehow missed newer git tags, so based the local view of history on the newest it had... 10k commits on master since Arnaud last released NUT, how time flies...

@jimklimov
Copy link
Member

Also, nowadays nut-driver.service as such should be inert. NDE (see wiki) would manage variable-named instances like nut-driver@apc.service.

@jimklimov jimklimov added the impacts-release-2.8.2 Issues reported against NUT release 2.8.2 (maybe vanilla or with minor packaging tweaks) label Aug 25, 2024
@nnov4k
Copy link

nnov4k commented Aug 26, 2024

as a result i think i have bad china USB to RS232 converter, not fully transparent for UPS.
but new version really fixes flood.

===autorun failed
Aug 26 18:57:01 ha systemd[1]: nut-monitor.service: Scheduled restart job, restart counter is at 5.
Aug 26 18:57:01 ha systemd[1]: Stopped Network UPS Tools - power device monitor and shutdown controller.
Aug 26 18:57:01 ha systemd[1]: nut-server.service: Start request repeated too quickly.
Aug 26 18:57:01 ha systemd[1]: nut-server.service: Failed with result 'exit-code'.
Aug 26 18:57:01 ha systemd[1]: Failed to start Network UPS Tools - power devices information server.
Aug 26 18:57:01 ha systemd[1]: nut-monitor.service: Start request repeated too quickly.
Aug 26 18:57:01 ha systemd[1]: nut-monitor.service: Failed with result 'exit-code'.
Aug 26 18:57:01 ha systemd[1]: Failed to start Network UPS Tools - power device monitor and shutdown controller.

===after manual upsdrvctl start
Aug 26 22:11:31 ha systemd[1]: Starting Network UPS Tools - power device driver controller...
Aug 26 22:11:31 ha upsdrvctl[15789]: Network UPS Tools upsdrvctl - UPS driver controller 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4)
Aug 26 22:11:31 ha upsdrvctl[15790]: Network UPS Tools driver 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4) - APC Smart protocol driver 3.34
Aug 26 22:11:31 ha upsdrvctl[15790]: APC command table version 3.1
Aug 26 22:11:31 ha kernel: pl2303 ttyUSB0: pl2303_get_line_request - failed: -32
Aug 26 22:11:31 ha kernel: pl2303 ttyUSB0: pl2303_get_line_request - failed: -32
Aug 26 22:11:33 ha upsdrvctl[15790]: [d] unrecognized
Aug 26 22:11:33 ha apcsmart[15790]: [d] unrecognized
Aug 26 22:11:39 ha apcsmart[15792]: Startup successful
Aug 26 22:11:39 ha apcsmart[15792]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Aug 26 22:11:39 ha systemd[1]: Started Network UPS Tools - power device driver controller.
Aug 26 22:11:39 ha systemd[1]: Stopping Network UPS Tools - power device driver controller...
Aug 26 22:11:39 ha upsdrvctl[15793]: Network UPS Tools upsdrvctl - UPS driver controller 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4)
Aug 26 22:11:39 ha apcsmart[15792]: Signal 15: exiting
Aug 26 22:11:40 ha kernel: pl2303 ttyUSB0: error sending break = -32
Aug 26 22:11:41 ha systemd[1]: nut-driver.service: Succeeded.
Aug 26 22:11:41 ha systemd[1]: Stopped Network UPS Tools - power device driver controller.
Aug 26 22:13:16 ha systemd[1]: Starting Network UPS Tools - power devices information server...
Aug 26 22:13:16 ha nut-server[15899]: Limit Soft Limit Hard Limit Units
Aug 26 22:13:16 ha nut-server[15899]: Max open files 1048576 1048576 files
Aug 26 22:13:16 ha systemd[1]: Started Network UPS Tools - power devices information server.
Aug 26 22:13:16 ha nut-server[15898]: Network UPS Tools upsd 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4)
Aug 26 22:13:16 ha nut-server[15898]: listening on 127.0.0.1 port 3493
Aug 26 22:13:16 ha upsd[15898]: listening on 127.0.0.1 port 3493
Aug 26 22:13:16 ha upsd[15898]: listening on ::1 port 3493
Aug 26 22:13:16 ha nut-server[15898]: listening on ::1 port 3493
Aug 26 22:13:16 ha nut-server[15898]: listening on 192.168.2.10 port 3493
Aug 26 22:13:16 ha upsd[15898]: listening on 192.168.2.10 port 3493
Aug 26 22:13:16 ha upsd[15898]: Can't connect to UPS [apc] (apcsmart-apc): No such file or directory
Aug 26 22:13:16 ha nut-server[15898]: Can't connect to UPS [apc] (apcsmart-apc): No such file or directory
Aug 26 22:13:16 ha nut-server[15898]: Found 1 UPS defined in ups.conf
Aug 26 22:13:16 ha upsd[15898]: Found 1 UPS defined in ups.conf
Aug 26 22:13:16 ha nut-server[15898]: Running as foreground process, not saving a PID file
Aug 26 22:13:16 ha nut-server[15898]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Aug 26 22:13:16 ha upsd[15898]: Running as foreground process, not saving a PID file
Aug 26 22:13:16 ha upsd[15898]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Aug 26 22:13:33 ha systemd[1]: Starting Network UPS Tools - power device driver controller...
Aug 26 22:13:33 ha upsdrvctl[15908]: Network UPS Tools upsdrvctl - UPS driver controller 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4)
Aug 26 22:13:33 ha upsdrvctl[15909]: Network UPS Tools driver 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4) - APC Smart protocol driver 3.34
Aug 26 22:13:33 ha upsdrvctl[15909]: APC command table version 3.1
Aug 26 22:13:33 ha kernel: pl2303 ttyUSB0: pl2303_get_line_request - failed: -32
Aug 26 22:13:35 ha upsdrvctl[15909]: [d] unrecognized
Aug 26 22:13:35 ha apcsmart[15909]: [d] unrecognized
Aug 26 22:13:41 ha apcsmart[15910]: Startup successful
Aug 26 22:13:41 ha apcsmart[15910]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Aug 26 22:13:41 ha systemd[1]: Started Network UPS Tools - power device driver controller.
Aug 26 22:13:41 ha systemd[1]: Stopping Network UPS Tools - power device driver controller...
Aug 26 22:13:41 ha upsdrvctl[15911]: Network UPS Tools upsdrvctl - UPS driver controller 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4)
Aug 26 22:13:41 ha apcsmart[15910]: Signal 15: exiting
Aug 26 22:13:42 ha kernel: pl2303 ttyUSB0: error sending break = -32
Aug 26 22:13:43 ha systemd[1]: nut-driver.service: Succeeded.
Aug 26 22:13:43 ha systemd[1]: Stopped Network UPS Tools - power device driver controller.
Aug 26 22:14:51 ha kernel: pl2303 ttyUSB0: pl2303_get_line_request - failed: -32
Aug 26 22:14:53 ha apcsmart[15983]: [d] unrecognized
Aug 26 22:14:59 ha apcsmart[15985]: Startup successful
Aug 26 22:14:59 ha apcsmart[15985]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Aug 26 22:14:59 ha nut-server[15898]: Connected to UPS [apc]: apcsmart-apc
Aug 26 22:14:59 ha upsd[15898]: Connected to UPS [apc]: apcsmart-apc
Aug 26 22:15:00 ha apcsmart[15985]: sock_connect: enabling asynchronous mode (auto)
Aug 26 22:15:00 ha apcsmart[15985]: sock_connect: enabling asynchronous mode (auto)

@nnov4k
Copy link

nnov4k commented Aug 27, 2024

reconnect log:
Aug 27 19:30:08 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:09 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:10 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:11 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:12 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:13 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:14 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:15 ha apcsmart[15985]: Warning: excessive comm failures, limiting error reporting
Aug 27 19:30:15 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:25 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:35 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:45 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:55 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Aug 27 19:30:57 ha apcsmart[15985]: Trying to reconnect to the UPS
Aug 27 19:31:02 ha kernel: pl2303 ttyUSB7: pl2303_get_line_request - failed: -32
Aug 27 19:31:02 ha kernel: pl2303 ttyUSB7: pl2303_get_line_request - failed: -32
Aug 27 19:31:03 ha apcsmart[15985]: Communications with UPS re-established
Aug 27 19:31:04 ha apcsmart[15985]: [d] unrecognized
Aug 27 19:31:08 ha nut-server[15898]: UPS [apc] data is no longer stale
Aug 27 19:31:08 ha upsd[15898]: UPS [apc] data is no longer stale
Aug 27 19:31:10 ha apcsmart[15985]: upsdrv_updateinfo: recovered the connection

@jimklimov
Copy link
Member

Looks great now, thanks!

@jimklimov
Copy link
Member

PR merged, keeping issue open for HCL/DDL update reminder.

@jimklimov jimklimov added the DDL label Sep 12, 2024
@vadegdadeg
Copy link

vadegdadeg commented Oct 11, 2024

PR merged, keeping issue open for HCL/DDL update reminder.

Hello! Tell me, is your patch already in some release or do you have to apply it yourself?
Forgive my ignorance, but I have a hard time with Git Hub and all its tools...

@jimklimov
Copy link
Member

jimklimov commented Oct 11, 2024

Currently it is only on master branch, should be easy to build though (see wiki footer links). Will be part of release 2.8.3 and later.

UPDATE: link: https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests

@vadegdadeg
Copy link

vadegdadeg commented Oct 12, 2024

Thank you very much for the work done! After all, it was a time bomb. If the USB to serial converter fell off, the disk became tightly clogged! But now it doesn't. It even tries to reconnect, although I didn’t have time to plug the converter back in and the driver closed. I compiled it in Gentoo as version 9999. If you need any tests on this topic, write to me. My UPS Smart UPS SC620 2 pieces.

Oct 12 12:39:12 test_box apcsmart[31543]: [d] unrecognized
Oct 12 12:39:17 test_box apcsmart[31546]: Startup successful
Oct 12 12:39:17 test_box apcsmart[31546]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Oct 12 12:40:18 test_box upsd[31578]: listening on 127.0.0.1 port 3493
Oct 12 12:40:18 test_box upsd[31578]: Connected to UPS [1-SC620]: apcsmart-1-SC620
Oct 12 12:40:18 test_box upsd[31578]: Found 1 UPS defined in ups.conf
Oct 12 12:40:18 test_box upsd[31581]: Startup successful
Oct 12 12:40:18 test_box upsd[31581]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Oct 12 12:40:18 test_box apcsmart[31546]: sock_connect: enabling asynchronous mode (auto)
Oct 12 12:41:17 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 859(update_status): Input/output error
Oct 12 12:41:17 test_box upsd[31581]: Data for UPS [1-SC620] is stale - check driver
Oct 12 12:41:27 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:28 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:29 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:30 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:31 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:32 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:33 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:34 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:35 test_box apcsmart[31546]: Warning: excessive comm failures, limiting error reporting
Oct 12 12:41:35 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:45 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:55 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:42:05 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:42:15 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:42:17 test_box apcsmart[31546]: Trying to reconnect to the UPS
Oct 12 12:42:22 test_box apcsmart[31546]: Fatal error: unusable configuration
Oct 12 12:42:22 test_box upsd[31581]: Can't connect to UPS [1-SC620] (apcsmart-1-SC620): No such file or directory
Oct 12 12:43:49 test_box /etc/init.d/upsdrv[31711]: ERROR: upsdrv failed to stop
Oct 12 12:44:25 test_box /etc/init.d/upsdrv[31730]: WARNING: upsdrv has already been started
Oct 12 12:44:33 test_box apcsmart[31753]: [d] unrecognized
Oct 12 12:44:39 test_box apcsmart[31754]: Startup successful
Oct 12 12:44:39 test_box apcsmart[31754]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Oct 12 12:44:40 test_box upsd[31581]: Connected to UPS [1-SC620]: apcsmart-1-SC620
Oct 12 12:44:40 test_box apcsmart[31754]: sock_connect: enabling asynchronous mode (auto)

@jimklimov
Copy link
Member

@vadegdadeg : I suppose, as a data point - a DDL dump with modern driver builds (to know what they report, what not, how correctly) would be great. Can be directly as a PR to https://github.com/networkupstools/nut-ddl/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
APC bug Connection stability issues Issues about driver<->device and/or networked connections (upsd<->upsmon...) going AWOL over time DDL impacts-release-2.7.4 Issues reported against NUT release 2.7.4 (maybe vanilla or with minor packaging tweaks) impacts-release-2.8.2 Issues reported against NUT release 2.8.2 (maybe vanilla or with minor packaging tweaks) Linux Some issues are specific to Linux as a platform serial port USB
Projects
Status: Todo
Development

No branches or pull requests

7 participants