Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of lockfile on crash/restart #7

Closed
GWustenstein opened this issue Nov 5, 2022 · 28 comments
Closed

Handling of lockfile on crash/restart #7

GWustenstein opened this issue Nov 5, 2022 · 28 comments
Assignees

Comments

@GWustenstein
Copy link

I've spent the period since the last release beating uhppoted-tunnel in some pretty terrible ways; the vendor's app in monitor mode (5s updates) constantly, there's a python script running as a daemon to pull status and events parsing them into its own db (5s updates), there's another python daemon running as a UDP CRUD service on the same port that monitors the vendor's app MS Access database and if it sees any changes in the group and username tables it pushes them out to the remote site db using the same secure tunnel service (just uses a 'magic' device serial number, opcodes, and special data hash.)

Topology:
Win10-udp---udp-RPi-tls--------tls-VPS-tls------tls-RPi-udp---udp-UHPPOTE
db daemon db daemon
udp db client udp db server
udb db client

This entire mess has been working pretty well all this time with the new release, and a few random network hits, the system recovers after about 10minutes or so. And I've been doing development of the local and remote db applications, which means I've been nuking the event tables and re-syncing thousands of events several times a day over the tunnel all while the app is up and running.

Today, however, with several power hits hitting pretty much every node (and sub-node) in every permutation imaginable, the remote tunnel node got lost in such a way that it left the lockfile behind. Looking through syslog, there's a ton of lines, starting off after a reboot:

Nov 4 15:32:14 netmon systemd[1]: session-5.scope: Succeeded.
Nov 4 15:55:11 netmon rc.local[841]: fatal error: concurrent map read and map write
Nov 4 15:55:11 netmon rc.local[841]: goroutine 5329 [running]:
Nov 4 15:55:11 netmon rc.local[841]: runtime.throw({0x3c4576, 0x21})
Nov 4 15:55:11 netmon rc.local[841]: #011runtime/panic.go:992 +0x5c fp=0x2856ee0 sp=0x2856ecc pc=0x47bf8
Nov 4 15:55:11 netmon rc.local[841]: runtime.mapaccess2_fast32(0x378680, 0x2800480, 0x306ee)
Nov 4 15:55:11 netmon rc.local[841]: #011runtime/map_fast32.go:62 +0x194 fp=0x2856ef0 sp=0x2856ee0 pc=0x202ac
Nov 4 15:55:11 netmon rc.local[841]: github.com/uhppoted/uhppoted-tunnel/router.(*Router).get(0x625478, 0x306ee)
Nov 4 15:55:11 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/router/router.go:92 +0x3c fp=0x2856f50 sp=0x2856ef0 pc=0x2b74c8
Nov 4 15:55:11 netmon rc.local[841]: github.com/uhppoted/uhppoted-tunnel/router.(*Switch).Received(0x280e988, 0x306ee, {0x2a73006, 0x1c, 0x7fa}, 0x2a700f0)
Nov 4 15:55:11 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/router/router.go:62 +0x24 fp=0x2856f64 sp=0x2856f50 pc=0x2b7108
Nov 4 15:55:11 netmon rc.local[841]: github.com/uhppoted/uhppoted-tunnel/tunnel/tls.(*tlsClient).received(0x28a00a0, {0x2a73000, 0x22, 0x800}, 0x280e988, {0x4415b4, 0x29a2000})
Nov 4 15:55:11 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/tunnel/tls/tls_client.go:162 +0x128 fp=0x2856fcc sp=0x2856f64 pc=0x339c24
Nov 4 15:55:11 netmon rc.local[841]: github.com/uhppoted/uhppoted-tunnel/tunnel/tls.(*tlsClient).listen.func1()
Nov 4 15:55:11 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/tunnel/tls/tls_client.go:150 +0x54 fp=0x2856fec sp=0x2856fcc pc=0x339a88
Nov 4 15:55:11 netmon rc.local[841]: runtime.goexit()
Nov 4 15:55:11 netmon rc.local[841]: #011runtime/asm_arm.s:824 +0x4 fp=0x2856fec sp=0x2856fec pc=0x7a140
Nov 4 15:55:11 netmon rc.local[841]: created by github.com/uhppoted/uhppoted-tunnel/tunnel/tls.(*tlsClient).listen
Nov 4 15:55:11 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/tunnel/tls/tls_client.go:149 +0xfc
Nov 4 15:55:11 netmon rc.local[841]: goroutine 1 [chan receive, 77 minutes]:
Nov 4 15:55:11 netmon rc.local[841]: github.com/uhppoted/uhppoted-tunnel/commands.(*Run).run(0x625900, 0x28a80d8, {0x440ad8, 0x284e180}, 0x280e6b8, 0x289c140)
Nov 4 15:55:11 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/commands/run.go:259 +0x228
Nov 4 15:55:11 netmon rc.local[841]: github.com/uhppoted/uhppoted-tunnel/commands.(*Run).exec(0x625900, 0x28a80d8, {0x440ad8, 0x284e180}, 0x280e6b8)
Nov 4 15:55:11 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/commands/run_linux.go:66 +0x27c
Nov 4 15:55:11 netmon rc.local[841]: github.com/uhppoted/uhppoted-tunnel/commands.(*Run).Execute.func1(0x28a80d8, {0x440ad8, 0x284e180}, 0x280e6b8)
Nov 4 15:55:11 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/commands/run_linux.go:33 +0x3c
.
then ends up
.
Nov 4 15:55:12 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/tunnel/tunnel.go:60 +0x24c
Nov 4 15:55:12 netmon rc.local[841]: goroutine 33 [select, 77 minutes]:
Nov 4 15:55:12 netmon rc.local[841]: github.com/uhppoted/uhppoted-tunnel/tunnel/tls.(*tlsClient).connect.func1()
Nov 4 15:55:12 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/tunnel/tls/tls_client.go:109 +0x154
Nov 4 15:55:12 netmon rc.local[841]: created by github.com/uhppoted/uhppoted-tunnel/tunnel/tls.(*tlsClient).connect
Nov 4 15:55:12 netmon systemd[1]: rc-local.service: Control process exited, code=exited, status=2/INVALIDARGUMENT

All of this seems to happen before the system has really booted. Maybe the unit file needs to be modified to wait for full multi-user or maybe network? On this RPi4 running Bullseye, I have about a 1min network wait to deal with the dumpster fire that is Debian-11/RPi's handling of the WiFi drivers.

In any event, there seems to be something missing in clean-up; if it crashes, it should take the lockfile with it, or the unit script should look for the pid and 86 both it and the file and start anew.

@uhppoted
Copy link
Owner

uhppoted commented Nov 5, 2022

Those storms must have been quite something!

There's quite a bit to unpack in that log:

  • There might be a mutex missing around one of the router tables (concurrent map read and map write) - will look into that
  • The goroutine 33 [select, 77 minutes] ... goexit is something I've never seen - will have to dig into it to see exactly what is happening there
  • Good point on waiting for multi-user/network in the service file - will implement that (actually that needs to be implemented across the board)

The lockfile needs some thinking about though - basically there's no way to guarantee that a crash won't leave a lockfile behind so the recovery needs to happen on service start. There are a couple of options I can think of right off the bat:

  • Allow the lockfile to be opted out in the conf file (e.g. lockfile=(none) or somesuch). Probably not such a good idea but it's simple
  • Make the lockfile a soft lock a lá the fix for the uhppoted-mqtt client lockfile.
  • Deleting it the service start (also a lá the uhppoted-mqtt fix) . Am a bit reluctant to make that the default but it could be an 'advanced user' option
  • An "I'm alive" hook so something like an external service or cron job could monitor the daemon and clean up if necessary

Let me give it some thought..

@GWustenstein
Copy link
Author

There is a lot more in the log, if you give me a few tidbits of keywords to look for I can try to ferret out more gotchas, but we'd all like the ultimate goal of this beast to be unattended and robust. First and foremost, I need it to forward alarms and fortunately I'm using Pushover for that, which does a nice job of sending an event summary, a snap of the door, etc. to my phone/desktop/whatever so it does not depend on Tunnel. But who's in the building, door control, siren silence, and all that rubbish is dependent on Tunnel and I don't want to have to futz with trying to manage a wayward process from my phone (or write a watchdog script if I can help it).

@uhppoted
Copy link
Owner

uhppoted commented Nov 6, 2022

Thanks - think there's probably enough for me to start looking at it, but may ask you for more info once I get deeper into it. Or ... if there's nothing sensitive in the log file maybe attach it as a zip?

@uhppoted
Copy link
Owner

uhppoted commented Nov 7, 2022

Hi,

Are you running tunnel v0.8.2+ on the machine that crashed? I'm curious because:

Nov 4 15:55:11 netmon rc.local[841]: github.com/uhppoted/uhppoted-tunnel/router.(*Switch).Received(0x280e988, 0x306ee, {0x2a73006, 0x1c, 0x7fa}, 0x2a700f0)
Nov 4 15:55:11 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/router/router.go:62 +0x24 fp=0x2856f64 sp=0x2856f50 pc=0x2b7108

(in the actual code Received is around router.go:67 and it hasn't changed since the v0.8.2 release)

@GWustenstein
Copy link
Author

GWustenstein commented Nov 8, 2022

That would be strange, because v0.8.1 does not run as a daemon. Now you got me wondering if I managed some sort of frankeninstall, double-checking:

uhppoted-tunnel installed in /etc/uhppoted/uhpoted-tunnel and referenced as such in /etc/systemd/system/uhpoted-tunnel.service
MD5 of /etc/uhppoted/uhppoted-tunnel bcca16c461b9f2b721b0f3630ca74659

going over to the releases dir cloned from git as uhppoted_v0.8.2/uhppoted-tunnel, the MD5 is bcca16c461b9f2b721b0f3630ca74659, so I'm pretty sure it is what it is...

Is the go toolchain that messed up on ARM?

@uhppoted
Copy link
Owner

uhppoted commented Nov 8, 2022

Ummmmmmmmmm .. I get very different checksums to yours for the v0.8.2 binaries (downloaded from the releases page):

uhppoted-tunnel_v0.8.2 % cd arm7
arm7 % md5sum uhppoted-tunnel
2a336c4f898c71f7839d206745670916 uhppoted-tunnel
arm7 % sha1sum uhppoted-tunnel
4aec3de0996e0852fb067bc681f60ba7bfc9babf uhppoted-tunnel
arm7 % sha256sum uhppoted-tunnel
d2f7a39921fb7efb3ee42813c7f5be47be0191239d350f97b3073d5a81cd77a3 uhppoted-tunnel

arm7 % cd ../linux
linux % md5sum uhppoted-tunnel
ab339072716ae466ae6ef82abf0de605 uhppoted-tunnel
linux % sha1sum uhppoted-tunnel
5950dc514220713c765b2db4604ecfe4b44649a5 uhppoted-tunnel
linux % sha256sum uhppoted-tunnel
1c8471db2831cb4ec7438002734f40d65d9e0485db3cc6fc2b90eec188ce4f6d uhppoted-tunnel

Did you build it locally? Oh, wait, of course you did .. ARM64:

git checkout v0.8.2
env GOOS=linux GOARCH=arm64 GOWORK=off go build -trimpath -o dist/development/arm64 ./...
uhppoted-tunnel % cd dist/development/arm64
arm64 % md5sum uhppoted-tunnel
c351a32ccc845367b34eb6d1c02ab612 uhppoted-tunnel
arm64 % sha1sum uhppoted-tunnel
b0079f8a9cadfccde52c68228aea3b4b4144f6e5 uhppoted-tunnel
arm64 % sha256sum uhppoted-tunnel
f52f7d21a0eeaf17fc9bfb72086ecbb5f036b5fbc6970d9ab32f4a2996e691a7 uhppoted-tunnel

@GWustenstein
Copy link
Author

GWustenstein commented Nov 8, 2022

No, I'm much too lazy, lol... (NOTE THE RELEASES DIR!)
got from https://github.com/uhppoted/uhppoted/releases/download/v0.8.2/uhppoted_v0.8.2-arm7.tar.gz
just did a wget to a test directory, gunzip, tar xvf, to make sure it IS what I installed, and got the same MD5:
bcca16c461b9f2b721b0f3630ca74659

$ sha1sum uhppoted-tunnel
83439091bbfe268e5cc8cbc60fa4e01ba7e398e3 uhppoted-tunnel

@uhppoted
Copy link
Owner

uhppoted commented Nov 8, 2022

Ok, I get the same checksum now ... which is bizarre. I really hope it doesn't mean all those top level binaries are wrong - but you might want to switch to the uhppoted-tunnel releases while I figure out what is going on.

@uhppoted
Copy link
Owner

uhppoted commented Nov 9, 2022

FYI, turns out the checksums are different because the top-level release builds were not being built with GO WORK=off - not catastrophic but I'll fix the release script and add a checksum verification step.

Now to figure why the logs and the source code do not match..

@uhppoted
Copy link
Owner

uhppoted commented Nov 10, 2022

Ok, at this point in time I am reasonably convinced that when your machine crashed it was running tunnel v0.8.1:

  • the log entries match the source code of v0.8.1 exactly (and v0.8.2 not at all)
  • I've dumped and checked the symbol tables for both the uhppoted-tunnel v0.8.2 and the uhppoted v0.8.2 releases and they're both correct for v0.8.2 (and not v0.8.1) so as far as I can tell neither of the executables would have produced those log entries.

I'm afraid I have absolutely no idea how that could have happened - but here are some things you could look at if you wanted to:

  • The tunnel daemon prints its version on startup - there may be something in the logs for 4th November like:
Nov 10 21:32:25 uhppoted uhppoted-tunnel[14728]: 2022/11/10 21:32:25 uhppoted-tunnel service v0.8.2 - Linux (PID 14728)
  • sudo systemctl status "uhppoted-tunnel" should show the tunnel version when the daemon starts:
sudo systemctl status "uhppoted-tunnel-qwerty"
uhppoted-tunnel-qwerty.service - UHPPOTE UTO311-L0x access card controllers UDP tunnel service/daemon
     Loaded: loaded (/etc/systemd/system/uhppoted-tunnel-qwerty.service; disabled; preset: enabled)
     Active: active (running) since Thu 2022-11-10 21:32:25 UTC; 1min 48s ago
       Docs: https://github.com/uhppoted/uhppoted-tunnel
   Main PID: 14728 (uhppoted-tunnel)
      Tasks: 5 (limit: 1020)
     Memory: 1.3M
        CPU: 7ms
     CGroup: /system.slice/uhppoted-tunnel-qwerty.service
             └─14728 /opt/uhppoted/uhppoted_v0.8.2/uhppoted-tunnel --lockfile /var/uhppoted/uhppoted-tunnel-qwerty.pid --in udp/listen:0.0.0.0:60000 --out tcp/server:0.0.0.0:12345

Nov 10 21:32:25 uhppoted systemd[1]: Started UHPPOTE UTO311-L0x access card controllers UDP tunnel service/daemon.
Nov 10 21:32:25 uhppoted uhppoted-tunnel[14728]: 2022/11/10 21:32:25 uhppoted-tunnel service v0.8.2 - Linux (PID 14728)
  • Is (was ?) the service file (/etc/systemd/system/uhppoted-tunnel.service) pointing to the correct executable?

I'm going to look at the goroutine 33 [select, 77 minutes] thing just to try understand that - don't think there's much else I can usefully offer around the fatal error: concurrent map read and map write log entry.

@GWustenstein
Copy link
Author

Going back in time to find when this run was launched before this epic storm...
Oct 24 13:32:23 netmon systemd[1]: Started UHPPOTE UTO311-L0x access card controllers UDP tunnel service/daemon.
Oct 24 13:32:23 netmon uhppoted-tunnel[5463]: 2022/10/24 13:32:23 uhppoted-tunnel service v0.8.2 - Linux (PID 5463)
Oct 24 13:34:06 netmon systemd[1]: session-606.scope: Succeeded.
(PID 5463 was up for nearly a week until the storm hit, then it was rock-n-roll for several hours, several power and network outages).

Went forward in the logs and pulled the part where it crashed when it found the stale pid file...
Nov 4 21:15:00 netmon systemd[1]: Started UHPPOTE UTO311-L0x access card controllers UDP tunnel service/daemon.
Nov 4 21:15:00 netmon uhppoted-tunnel[1563]: 2022/11/04 21:15:00 uhppoted-tunnel service v0.8.2 - Linux (PID 1563)
Nov 4 21:15:01 netmon uhppoted-tunnel[1563]: ERROR: PID lockfile '/var/uhppoted/uhppoted-tunnel.pid' already in use
Nov 4 21:15:01 netmon systemd[1]: uhppoted-tunnel.service: Main process exited, code=exited, status=1/FAILURE
Nov 4 21:15:01 netmon systemd[1]: uhppoted-tunnel.service: Failed with result 'exit-code'.
Nov 4 21:16:25 netmon systemd[1]: Started UHPPOTE UTO311-L0x access card controllers UDP tunnel service/daemon.
Nov 4 21:16:25 netmon uhppoted-tunnel[1591]: 2022/11/04 21:16:25 uhppoted-tunnel service v0.8.2 - Linux (PID 1591)

This is the RPi-4/Debian-11 (Bullseye) build, pulled as bin off git, as lazy as possible.
The pivot node is x86 running on a VPS, but still Deb-11, tunnel v0.8.2 if that matters at all, it's been rock-solid with no crashes. The x86/win10 side has been solid as well. If I were a bettin' man, I'd say go/arm is a little effed...

@uhppoted
Copy link
Owner

Ok, am going to admit to being completely baffled by the concurrent map access issue:

Nov 4 15:55:11 netmon rc.local[841]: fatal error: concurrent map read and map write
Nov 4 15:55:11 netmon rc.local[841]: goroutine 5329 [running]:
Nov 4 15:55:11 netmon rc.local[841]: runtime.throw({0x3c4576, 0x21})
Nov 4 15:55:11 netmon rc.local[841]: #011runtime/panic.go:992 +0x5c fp=0x2856ee0 sp=0x2856ecc pc=0x47bf8
Nov 4 15:55:11 netmon rc.local[841]: runtime.mapaccess2_fast32(0x378680, 0x2800480, 0x306ee)
Nov 4 15:55:11 netmon rc.local[841]: #011runtime/map_fast32.go:62 +0x194 fp=0x2856ef0 sp=0x2856ee0 pc=0x202ac
Nov 4 15:55:11 netmon rc.local[841]: github.com/uhppoted/uhppoted-tunnel/router.(*Router).get(0x625478, 0x306ee)
Nov 4 15:55:11 netmon rc.local[841]: #011github.com/uhppoted/uhppoted-tunnel/router/router.go:92 +0x3c fp=0x2856f50 sp=0x2856ef0 pc=0x2b74c8

As far as I can see that can only occur in v0.8.1. Am going to put that on the back burner for now (pending another storm and/or inspiration :-)) and focus on the lockfile problem which is at least tangible.

twystd pushed a commit to uhppoted/uhppoted-lib that referenced this issue Nov 15, 2022
@GWustenstein
Copy link
Author

Had another wind storm, and power was glitchy, this time the terminus (end-node with the broadcast) just caught a simple reboot, which left the lockfile behind. Of course it was a major pain because the ISP rolled the IP assignment and I spent an hour divining a way to figure that out so I can SSH into the thing, but the node lacks any form of orderly shutdown on power failure so there's nothing in the log to confirm what happened other than to assume that the .pid file was simply left behind.

BTW, after profiling system usage, I'm about to migrate this function to a Pi Zero-W with a camera, so it will be moving from a 4core 64bit to 1core 32bit. There will be a python app that will be snapping pics and short video segments when events are captured from the access controller, pics will be posted to the pivot node near real-time while vids forward on a rate-limited basis for record keeping. Since all this seems to fit the Zero-W, I can return the Pi-4 to the shop for another project since they're pretty hard to get right now.

@uhppoted
Copy link
Owner

That's quite an area you live in! Have you considered using something like e.g. Tailscale for the SSH?

re. lockfile. Am working on it - turns out there's a lot more to lock files than one would think (the official Go implementation has been stuck in limbo for like 5 years). However - as an interim fix you could set the lock file to live in the /tmp folder which is (supposedly) cleared on reboot e.g.

uhppoted-tunnel --lockfile /tmp/uhppoted-tunnel.lock .....

re. camera. Huh! Interesting idea!! And yeah, a Pi 4 is way overkill for most of this stuff :-).

@GWustenstein
Copy link
Author

GWustenstein commented Nov 21, 2022

Oh, snap... Why didn't I think of that! I'm editing the unit spec rn... Thanks

Thanks for the tip on Tailscale, I'll give that a shot. VNC has been a go-to, but begrudgingly since they eliminated the free tier for anything but 30-day trials (and spawning trial credentials is even more annoying than dealing with finding the new IP)

EDIT: Wow, Tailscale is cool: My very own SDN! Thank you so much...now I can eliminate all these firewall holes!

@uhppoted
Copy link
Owner

uhppoted commented Nov 21, 2022

Tailscale is neat and does seem to have pretty much got it right - unlike all the other hacky and marginal solutions that peope have resorted to over the years.

Although I see they do have a CVE today.

@GWustenstein
Copy link
Author

GWustenstein commented Nov 21, 2022

Just read through that, since my install was today, I have the latest, but #7 is nasty, like log4j nasty. Not running http through it, but ran the curl tests to be sure there are no gotchas.

@uhppoted
Copy link
Owner

Ref. #7. Eeesh, security researchers do not think in straight lines - looks like Tailscale is more or less on top of it though.

You managed to figure out logging to /dev/null ? By default tunnel logs to stdout and stderr which means the service typically logs to syslog (depending on how your system is configured of course). Editing the command-line in the unit file to redirect stdout to /dev/null should do it.

@GWustenstein
Copy link
Author

logging: I saw the log level directive in the description, set it to warn in the Service description.

Hey, here's a weird inconsistency; In my typical laziness I pull the 0.8.2 arm linux binary to load up on the pi-zero-w. It crashes out with Illegal Instruction, so I go, oh, built with 64bit. No big deal, install golang and build it from source. Pi-zero, so go make dinner, it'll be a while... come back, stick the binary where I typically do (as /etc/uhppoted/uhppoted-tunnel) only this time when I go to deamonize it, it complains about not finding the .toml file. That's funny, the 64bit binary downloaded from git does not do that on the pi4. Made a uhppoted-tunnel.toml with some defaults I liked, including cert and key file locations, etc. Well, it ignores those directives, but it stopped complaining.

TL;DR: The .toml madness hints there is something out of sync with the pre-built binaries on git and the source.

@uhppoted
Copy link
Owner

uhppoted commented Nov 22, 2022

Just checked the code again and actually it defaults to /var/log/uhppoted/uhppoted-tunnel.log - something must have been weird with my config yesterday when I was checking it - so the log level thing is the way to go.

re. TOML. The repo had a partially completed implementation for Allow tunnel configuration from a config file which wasn't quite finished so you built against an incomplete development version (I'd merged the branch a bit early for other reasons). Am working on the remaining bits and pieces for Linux - should be done today/tomorrow.

FYI, there's a sample TOML file in the examples. The Makefile has some examples, but basically:

./uhppoted-tunnel --config "/opt/uhppoted/uhppoted-tunnel.toml#client"

or (for the default TOML file):

./uhppoted-tunnel --config "#client"

@GWustenstein
Copy link
Author

OMG, you're going to love this, and I feel like an idiot: I have no idea how this happened, but there were both versions of uhppoted-tunnel running concurrently on this system, and looking at /etc/systemd/system I don't see how it is even possible. It was hidden because I am running different client sessions to different hosts, so I expect to see a few out there when running a ps, but until I stopped all the services and saw one still hanging around, and pointing to /home/pi which is not even an active user directory(!) I got suspicious.

This Pi is definitely haunted... mentioned before this was a Buster->Bullseye upgrade, which is contraindicated due to hits on the networking stack, well, methinks there's other weirdness with apt and make. This one gets wiped and a do-over. The pi-zero is now up and running and is a clean build from 0.8.2.

@uhppoted
Copy link
Owner

Ha! I knew it :-) :-) :-) - but basically 'Whew!' - am really, really happy to have that very puzzling anomaly explained.

Oh, BTW - figured out the hack for routing service logs to /dev/null - if you run the service with the--console argument then the logs go to stdout and you can redirect them to wherever you like. Not recommended though because if anything goes wrong there won't be a scrap of evidence to work with - --log-level is definitely the way to go.

twystd pushed a commit to uhppoted/uhppoted-lib that referenced this issue Nov 29, 2022
twystd pushed a commit to uhppoted/uhppoted-lib that referenced this issue Nov 29, 2022
@uhppoted
Copy link
Owner

uhppoted commented Dec 6, 2022

Quick update on this - tunnel is pretty much good to go:

  • the systemd unit file now waits on network-online.target
  • the default lockfile implementation uses the flock syscall for locking at the process level (i.e. the lock is only held while the process is alive)
  • the default lockfile is created in the system temp folder
  • the internal error handling has been cleaned up

I need to migrate the lockfile and unit file changes to the other modules so a release is still a couple of weeks away - but if you need the updates they're in the development version in the repo.

@GWustenstein
Copy link
Author

Thank you, all of these should be quite helpful! Per your prior recommendations a few weeks ago, updated the unit file, and the combination of net online and lockfile in /tmp appear to have solved the disorderly shutdown/restart issue on the Pi-4 (which I now call FrankenBullseye due to it's many, many issues). Power's been weird as well, and in the interest in making sure the solution is robust I've avoided putting these on the UPS and everything seems to recover as I'd hope...that's a huge win!

I've not noticed any spurious crashes of Tunnel on either Bullseye machine (Franken or otherwise; FrankenBullseye the machine who when through the now 'discouraged' version upgrade from Buster, which left it's networking somewhat unstable).

I'll wait for the next release and push those out over the following weekend when available.

twystd added a commit to uhppoted/uhppoted-rest that referenced this issue Dec 6, 2022
twystd added a commit to uhppoted/uhppoted-app-wild-apricot that referenced this issue Dec 6, 2022
@uhppoted
Copy link
Owner

uhppoted commented Dec 6, 2022

Oh, that sounds great!!! Really glad to hear that, and thanks for the update!

The flock syscall is the last piece of the puzzle I think - it's honestly the correct way to do locking and should in theory add a whole new level of robustness to the lockfile thing .. been an interesting delve in the innards of systems.

twystd added a commit to uhppoted/uhppoted-app-sheets that referenced this issue Dec 7, 2022
@uhppoted
Copy link
Owner

uhppoted commented Dec 16, 2022

v0.8.3 is available for whenever you have time to try it out (and a convenient storm!). I'll leave this issue open for a bit..

@uhppoted
Copy link
Owner

'k, closing this - feel free to reopen if you need to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

No branches or pull requests

3 participants