
1.5.3 crashing repeatedly starting about an hour ago #65

Closed
sbr2004 opened this issue Nov 9, 2021 · 19 comments

Comments

sbr2004 commented Nov 9, 2021

fort 1.5.3 (installed from fort-1.5.3-1.el8.x86_64.rpm)

Started crashing repeatedly about an hour ago.
Unable to get it working.
Wiping /var/fort_cache did not help.

Stack trace:
/usr/local/bin/fort(print_stack_trace+0x1a) [0x415c6a]
/usr/local/bin/fort(pr_crit+0x7f) [0x417b7f]
/usr/local/bin/fort() [0x412c25]
/usr/local/bin/fort(deferstack_pop+0x3b) [0x412e0b]
/usr/local/bin/fort() [0x423b70]
/lib64/libpthread.so.0(+0x7ea5) [0x7f715b7abea5]
/lib64/libc.so.6(clone+0x6d) [0x7f715b4d48dd]
(Stack size was 7.)

ydahhrk (Member) commented Nov 9, 2021

Ok.

Quick question: Do you know if this also happens with 1.5.2?

sbr2004 (Author) commented Nov 9, 2021

Sorry, can't check it.

sbr2004 (Author) commented Nov 9, 2021

Same on another box, with 1.4.2:

Stack trace:
/usr/local/bin/fort(print_stack_trace+0x1f) [0x55632b9e482f]
/usr/local/bin/fort(pr_crit+0x89) [0x55632b9e76d9]
/usr/local/bin/fort(+0x1d2b3) [0x55632b9e12b3]
/usr/local/bin/fort(deferstack_pop+0x2f) [0x55632b9e14cf]
/usr/local/bin/fort(+0x31fba) [0x55632b9f5fba]
/usr/local/bin/fort(+0x329f1) [0x55632b9f69f1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f8dac6a9fa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f8dac5da4cf]
(Stack size was 8.)

fort -V
fort 1.4.2

ydahhrk (Member) commented Nov 9, 2021

Thank you.

sbr2004 (Author) commented Nov 9, 2021

I have 3 boxes with fort.
1.5.3 and 1.4.2 crashed; it seems only v1.5.1-10-gc69a140 is still working.

lukastribus commented:

Not sure if there is a relation, but @job tweeted the following earlier today:

Today I published BGPsec Router Keys for AS 15562

https://twitter.com/JobSnijders/status/1458126836391686149

ties commented Nov 9, 2021

Unfortunately, I cannot get fort to finish the initial sync when firewalling the hostname of the repository containing the P256/BGPsec certificate. This does, however, prevent the crash. The next step would be to serve another rsync repository at that IP and see what happens.
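The isolation test described here could be sketched roughly as follows. This is a hedged reconstruction, not the exact commands used: REPO_HOST is a placeholder (the actual hostname was not named in this thread), and the firewall tooling will vary by system.

```shell
# Sketch of the isolation test: block the host suspected of serving
# the BGPsec certificate, then rerun fort and watch for the crash.
# REPO_HOST is a placeholder, not the actual repository host.
REPO_HOST=rsync.example.net

# Resolve the host to its IPv4 addresses and reject outgoing traffic.
for ip in $(getent ahostsv4 "$REPO_HOST" | awk '{print $1}' | sort -u); do
    iptables -A OUTPUT -d "$ip" -j REJECT
done

# Rerun the validator; the sync should now fail for that repository.
fort --tal /etc/pki/tals --log.level=debug
```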

ydahhrk (Member) commented Nov 9, 2021

Is the stack trace really the only output?
It seems as though it should print some error message before that.

sbr2004 (Author) commented Nov 9, 2021

Yes, it is. No error message.

sbr2004 (Author) commented Nov 9, 2021

Started it with the debug argument; these are the last lines from syslog:

Nov 10 01:00:28 nuff fort: DBG: Client closed the socket.
Nov 10 01:00:29 nuff fort: DBG: Thread Validation.1: Task '/var/tal/lacnic.tal' ended
Nov 10 01:00:29 nuff fort: DBG: Thread Validation.1: Waiting for work...
Nov 10 01:00:29 nuff fort: DBG: - Active workers: 0
Nov 10 01:00:29 nuff fort: DBG: - Task queue: Empty
Nov 10 01:00:29 nuff fort: DBG: Pool 'Validation': All work has been completed.
Nov 10 01:00:29 nuff fort: WRN: Validation from TAL '/var/tal/apnic.tal' yielded error, discarding any other validation results.
Nov 10 01:00:29 nuff fort: INF: Validation finished:
Nov 10 01:00:29 nuff fort: INF: - Valid ROAs: 0
Nov 10 01:00:29 nuff fort: INF: - Valid Router Keys: 0
Nov 10 01:00:29 nuff fort: INF: - No serial number.
Nov 10 01:00:29 nuff fort: INF: - Real execution time: 331 secs.
Nov 10 01:00:29 nuff fort: ERR: First validation wasn't successful.
Nov 10 01:00:29 nuff fort: Stack trace:
Nov 10 01:00:29 nuff fort: /usr/local/bin/fort(print_stack_trace+0x1f) [0x417e2f]
Nov 10 01:00:29 nuff fort: /usr/local/bin/fort(__pr_op_err+0x84) [0x4183f4]
Nov 10 01:00:29 nuff fort: /usr/local/bin/fort(main+0x146) [0x413e26]
Nov 10 01:00:29 nuff fort: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f2e19722555]
Nov 10 01:00:29 nuff fort: /usr/local/bin/fort() [0x413e65]
Nov 10 01:00:29 nuff fort: (End of stack trace)
Nov 10 01:00:29 nuff fort: DBG: Destroying thread pool 'Server'.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.1: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.2: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.3: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.4: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.5: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.6: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.7: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.8: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.10: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.9: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.11: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.12: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.13: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.14: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.15: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.16: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.17: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.18: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.19: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Server.20: Returning.
Nov 10 01:00:29 nuff fort: DBG: Destroyed.
Nov 10 01:00:29 nuff fort: DBG: Destroying thread pool 'Validation'.
Nov 10 01:00:29 nuff fort: DBG: Thread Validation.4: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Validation.2: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Validation.3: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Validation.5: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Validation.1: Returning.
Nov 10 01:00:29 nuff fort: DBG: Destroyed.
Nov 10 01:00:29 nuff fort: DBG: Destroying thread pool 'Internal'.
Nov 10 01:00:29 nuff fort: DBG: Thread Internal.1: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Internal.2: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Internal.4: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Internal.5: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Internal.6: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Internal.3: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Internal.7: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Internal.8: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Internal.9: Returning.
Nov 10 01:00:29 nuff fort: DBG: Thread Internal.10: Returning.
Nov 10 01:00:29 nuff fort: DBG: Destroyed.

ydahhrk (Member) commented Nov 9, 2021

That stack trace looks very different, though. (It's not a fatal one.)

ties commented Nov 9, 2021

1.5.0:

sudo fort --tal /etc/pki/tals --log.level=debug --rsync.retry.count=1 --rrdp.retry.count=1 --rsync.retry.interval=1 --rrdp.retry.interval=1
Nov 10 00:00:44 WRN: 'rrdp.retry.count' will be deprecated, use 'http.retry.count' instead.
Nov 10 00:00:44 WRN: 'rrdp.retry.interval' will be deprecated, use 'http.retry.interval' instead.
Nov 10 00:00:44 INF: Disabling validation logging on syslog.
Nov 10 00:00:44 INF: Disabling validation logging on standard streams.
Nov 10 00:00:44 INF: Console log output configured; disabling operation logging on syslog.
Nov 10 00:00:44 INF: (Operation Logs will be sent to the standard streams only.)
Nov 10 00:00:44 INF: Configuration {
Nov 10 00:00:44 INF: tal: /etc/pki/tals
Nov 10 00:00:44 INF: local-repository: /tmp/fort/repository
Nov 10 00:00:44 INF: sync-strategy: root-except-ta
Nov 10 00:00:44 INF: shuffle-uris: false
Nov 10 00:00:44 INF: maximum-certificate-depth: 32
Nov 10 00:00:44 INF: slurm: (null)
Nov 10 00:00:44 INF: mode: server
Nov 10 00:00:44 INF: work-offline: false
Nov 10 00:00:44 INF: daemon: false
Nov 10 00:00:44 INF: server.address:
Nov 10 00:00:44 INF:   <Nothing>
Nov 10 00:00:44 INF: server.port: 323
Nov 10 00:00:44 INF: server.backlog: 128
Nov 10 00:00:44 INF: server.interval.validation: 3600
Nov 10 00:00:44 INF: server.interval.refresh: 3600
Nov 10 00:00:44 INF: server.interval.retry: 600
Nov 10 00:00:44 INF: server.interval.expire: 7200
Nov 10 00:00:44 INF: rsync.enabled: true
Nov 10 00:00:44 INF: rsync.priority: 50
Nov 10 00:00:44 INF: rsync.strategy: root-except-ta
Nov 10 00:00:44 INF: rsync.retry.count: 1
Nov 10 00:00:44 INF: rsync.retry.interval: 1
Nov 10 00:00:44 INF: rsync.program: rsync
Nov 10 00:00:44 INF: rsync.arguments-recursive:
Nov 10 00:00:44 INF:   --recursive
Nov 10 00:00:44 INF:   --delete
Nov 10 00:00:44 INF:   --times
Nov 10 00:00:44 INF:   --contimeout=20
Nov 10 00:00:44 INF:   --timeout=15
Nov 10 00:00:44 INF:   $REMOTE
Nov 10 00:00:44 INF:   $LOCAL
Nov 10 00:00:44 INF: rsync.arguments-flat:
Nov 10 00:00:44 INF:   --times
Nov 10 00:00:44 INF:   --contimeout=20
Nov 10 00:00:44 INF:   --timeout=15
Nov 10 00:00:44 INF:   --dirs
Nov 10 00:00:44 INF:   $REMOTE
Nov 10 00:00:44 INF:   $LOCAL
Nov 10 00:00:44 INF: rrdp.enabled: true
Nov 10 00:00:44 INF: rrdp.priority: 60
Nov 10 00:00:44 INF: rrdp.retry.count: 1
Nov 10 00:00:44 INF: rrdp.retry.interval: 1
Nov 10 00:00:44 INF: http.enabled: true
Nov 10 00:00:44 INF: http.priority: 60
Nov 10 00:00:44 INF: http.retry.count: 1
Nov 10 00:00:44 INF: http.retry.interval: 1
Nov 10 00:00:44 INF: http.user-agent: fort/1.5.0
Nov 10 00:00:44 INF: http.connect-timeout: 30
Nov 10 00:00:44 INF: http.transfer-timeout: 0
Nov 10 00:00:44 INF: http.idle-timeout: 15
Nov 10 00:00:44 INF: http.ca-path: (null)
Nov 10 00:00:44 INF: log.enabled: true
Nov 10 00:00:44 INF: log.output: console
Nov 10 00:00:44 INF: log.level: debug
Nov 10 00:00:44 INF: log.tag: (null)
Nov 10 00:00:44 INF: log.facility: daemon
Nov 10 00:00:44 INF: log.file-name-format: global-url
Nov 10 00:00:44 INF: log.color-output: false
Nov 10 00:00:44 INF: validation-log.enabled: false
Nov 10 00:00:44 INF: validation-log.output: console
Nov 10 00:00:44 INF: validation-log.level: warning
Nov 10 00:00:44 INF: validation-log.tag: Validation
Nov 10 00:00:44 INF: validation-log.facility: daemon
Nov 10 00:00:44 INF: validation-log.file-name-format: global-url
Nov 10 00:00:44 INF: validation-log.color-output: false
Nov 10 00:00:44 INF: Custom incidences:
Nov 10 00:00:44 INF:   incid-hashalg-has-params (Signed Object's hash algorithm has NULL object as parameters): ignore
Nov 10 00:00:44 INF:   incid-obj-not-der-encoded (Object isn't DER encoded): ignore
Nov 10 00:00:44 INF:   incid-file-at-mft-not-found (File listed at manifest doesn't exist): error
Nov 10 00:00:44 INF:   incid-file-at-mft-hash-not-match (File hash listed at manifest doesn't match the actual file hash): error
Nov 10 00:00:44 INF:   incid-mft-stale (The current time is after the nextUpdate field at the manifest): error
Nov 10 00:00:44 INF:   incid-crl-stale (The current time is after the nextUpdate field at the CRL): error
Nov 10 00:00:44 INF: output.roa: (null)
Nov 10 00:00:44 INF: output.bgpsec: (null)
Nov 10 00:00:44 INF: asn1-decode-max-stack: 4096
Nov 10 00:00:44 INF: stale-repository-period: 43200
Nov 10 00:00:44 INF: init-tals: false
Nov 10 00:00:44 INF: thread-pool.server.max: 20
Nov 10 00:00:44 INF: thread-pool.validation.max: 5
Nov 10 00:00:44 INF: }
Nov 10 00:00:44 DBG: rpkiManifest registered. Its nid is 1205.
Nov 10 00:00:44 DBG: signedObject registered. Its nid is 1206.
Nov 10 00:00:44 DBG: rpkiNotify registered. Its nid is 1207.
Nov 10 00:00:44 DBG: id-cp-ipAddr-asNumber (RFC 6484) registered. Its nid is 1208.
Nov 10 00:00:44 DBG: id-cp-ipAddr-asNumber-v2 (RFC 8360) registered. Its nid is 1209.
Nov 10 00:00:44 DBG: id-pe-ipAddrBlocks-v2 registered. Its nid is 1210.
Nov 10 00:00:44 DBG: id-pe-autonomousSysIds-v2 registered. Its nid is 1211.
Nov 10 00:00:44 DBG: id-kp-bgpsec-router registered. Its nid is 1212.
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #0 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #1 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #2 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #3 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #4 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #5 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #6 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #7 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #8 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #9 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #0 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #1 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #2 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #3 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #4 spawned
Nov 10 00:00:44 INF: Attempting to bind socket to address 'any', port '323'.
Nov 10 00:00:44 INF: Success; bound to address 'any', port '323'.
Nov 10 00:00:44 DBG: Created server socket with FD 3.
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #0 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #1 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #2 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #3 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #4 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #5 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #6 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #7 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #8 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #9 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #10 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #11 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #12 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #13 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #14 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #15 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #16 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #17 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #18 spawned
Nov 10 00:00:44 DBG: Thread waiting for work...
Nov 10 00:00:44 DBG: Pool thread #19 spawned
Nov 10 00:00:44 WRN: First validation cycle has begun, wait until the next notification to connect your router(s)
Nov 10 00:00:44 INF: Starting validation.
Nov 10 00:00:44 DBG: Pushing a task to the pool
Nov 10 00:00:44 DBG: Pushing a task to the pool
Nov 10 00:00:44 DBG: Pulling a task from the pool
Nov 10 00:00:44 DBG: Working on task #1
Nov 10 00:00:44 DBG: Pulling a task from the pool
Nov 10 00:00:44 DBG: Working on task #2
Nov 10 00:00:44 DBG: Pushing a task to the pool
Nov 10 00:00:44 DBG: Pulling a task from the pool
Nov 10 00:00:44 DBG: Working on task #3
Nov 10 00:00:44 DBG: Pushing a task to the pool
Nov 10 00:00:44 DBG: Waiting all tasks from the pool to end
Nov 10 00:00:44 DBG: - Stop: false
Nov 10 00:00:44 DBG: - Working count: 3
Nov 10 00:00:44 DBG: - Thread count: 5
Nov 10 00:00:44 DBG: - Empty queue: false
Nov 10 00:00:44 DBG: Pulling a task from the pool
Nov 10 00:00:44 DBG: Working on task #4
Nov 10 00:00:56 DBG: Task ended
Nov 10 00:00:56 DBG: Thread waiting for work...
Nov 10 00:01:16 DBG: Task ended
Nov 10 00:01:16 DBG: Thread waiting for work...
Nov 10 00:02:25 DBG: Task ended
Nov 10 00:02:25 DBG: Thread waiting for work...
Nov 10 00:03:12 CRT: /etc/pki/tals/ripe.tal: Attempted to pop empty X509 stack
Stack trace:
  fort(print_stack_trace+0x1a) [0x41894a]
  fort(pr_crit+0x89) [0x41b829]
  fort() [0x4152e1]
  fort(deferstack_pop+0x2f) [0x4154ef]
  fort() [0x42a004]
  fort() [0x42a9b1]
  fort() [0x439131]
  /lib64/libc.so.6(+0x8db17) [0x7f226b144b17]
  /lib64/libc.so.6(+0x1126c0) [0x7f226b1c96c0]
(Stack size was 9.)

1.5.3 passes on my machine (now). Of course, I do not know whether the same content was served to this version: a repository operator could serve different content depending on the user agent, etc.

Curiously, Valid Router Keys is 0 while there is one in the repository:

Nov 10 00:05:57 INF: Validation finished:
Nov 10 00:05:57 INF: - Valid ROAs: 246846
Nov 10 00:05:57 INF: - Valid Router Keys: 0
Nov 10 00:05:57 INF: - Serial: 1
Nov 10 00:05:57 INF: - Real execution time: 134 secs.
Nov 10 00:05:57 WRN: First validation cycle successfully ended, now you can connect your router(s)

ydahhrk (Member) commented Nov 9, 2021

Nov 10 00:03:12 CRT: /etc/pki/tals/ripe.tal: Attempted to pop empty X509 stack

Ok. We do have the error message, though. Thanks.

ydahhrk (Member) commented Nov 9, 2021

Curiously, valid router keys is 0 while there is one in the repo:

Normal. It's supposed to skip BGPsec certificates, since issue58-proper hasn't been merged.

ties commented Nov 9, 2021

If I run

$ find . -name \*.cer -exec sh -c 'openssl x509 -in "$1" -noout -text -inform DER > "$1.txt"' x {} \;
$ rg P256

In the cache directory after running 1.5.3, I do not see the P256/BGPsec certificate. With 1.5.0 I do. My Routinator also sees two valid BGPsec certificates. It's late in my timezone, so I will not dive deeper, but I would expect the files to still be in the repo?

Could be fort cleaning the directory. Could be a split view being served depending on the user-agent.
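As an aside, grepping the dumps for the literal string `P256` can miss matches, since OpenSSL typically prints the curve as `prime256v1` or `NIST CURVE: P-256`. A hedged alternative is to also search the `*.cer.txt` dumps for the BGPsec router EKU OID (id-kp-bgpsec-router, 1.3.6.1.5.5.7.3.30, per RFC 8209), which OpenSSL builds without a name for that OID print numerically:

```shell
# Search the OpenSSL text dumps produced by the find/openssl command
# above for BGPsec router certificates, matching either the EKU OID
# (1.3.6.1.5.5.7.3.30, RFC 8209) or the P-256 curve name as printed.
grep -rl -e '1\.3\.6\.1\.5\.5\.7\.3\.30' -e 'P-256' --include='*.cer.txt' .
```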

ydahhrk (Member) commented Nov 19, 2021

With 1.5.0 I do.

Does anybody happen to still have a copy of the cache from when the crash happened?

ties commented Nov 21, 2021

$ sudo fort --tal /etc/pki/tals --log.level=debug --rsync.retry.count=1 --rrdp.retry.count=1 --rsync.retry.interval=1 --rrdp.retry.interval=1
...
Nov 21 17:41:24 DBG: Thread Validation.4: Claimed task '/etc/pki/tals/afrinic.tal'
Nov 21 17:41:24 DBG: Thread Validation.3: Claimed task '/etc/pki/tals/lacnic.tal'
Nov 21 17:41:32 DBG: Thread Validation.4: Task '/etc/pki/tals/afrinic.tal' ended
Nov 21 17:41:32 DBG: Thread Validation.4: Waiting for work...
Nov 21 17:41:52 DBG: Thread Validation.3: Task '/etc/pki/tals/lacnic.tal' ended
Nov 21 17:41:52 DBG: Thread Validation.3: Waiting for work...
Nov 21 17:42:24 ERR: https://ca.rg.net/rrdp/notify.xml: Error requesting URL: The requested URL returned error: 404. (HTTP code: 404)
Stack trace:
  fort(print_stack_trace+0x1f) [0x417e5f]
  fort(__pr_op_err+0x84) [0x418424]
  fort() [0x423537]
  fort() [0x42387d]
  fort(http_download_file+0x35) [0x423a65]
  fort(rrdp_parse_notification+0x13a) [0x42c15a]
  fort() [0x42a565]
  fort() [0x4252e3]
  fort(certificate_traverse+0x8ae) [0x426eae]
  fort() [0x428ca3]
  fort() [0x4296c9]
  fort() [0x437307]
  /lib64/libc.so.6(+0x8db17) [0x7f16341b7b17]
  /lib64/libc.so.6(+0x1126c0) [0x7f163423c6c0]
(End of stack trace)
$ fort --version
fort 1.5.3
$ lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	Fedora
Description:	Fedora release 35 (Thirty Five)
Release:	35
Codename:	ThirtyFive
$ sha256sum `which fort`
7f9919544b4643572815913b01ba8cee0f04a8ef1115aa3dd7df2d13ce217c6b  /usr/bin/fort
/tmp $ tar -Jcf 20211121-fort.tar.xz fort
...a 1 minute wait...

Local repository is in https://drive.google.com/file/d/13SVrMK4xDENcS6kqBI3wNtUo0yOD29z0/view?usp=sharing

ydahhrk (Member) commented Nov 22, 2021

One thing I failed to mention in the release notes is that, since 1.5.3, Fort is more trigger-happy with stack traces. I thought this might help, since bug reports have been on the rise. But it's evidently too trigger-happy.

You can tell simply by running the binary without arguments:

$ fort
Nov 22 09:34:16 ERR: The TAL(s) location (--tal) is mandatory.
Stack trace:
  src/fort(print_stack_trace+0x3b) [0x55e3959afccb]
  src/fort(__pr_op_err+0x9f) [0x55e3959b040f]
  src/fort(handle_flags_config+0x7a2) [0x55e3959ae102]
  src/fort(main+0x7c) [0x55e3959ab4fc]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb3e0eee0b3]
  src/fort(_start+0x2e) [0x55e3959ab64e]
(End of stack trace)
Nov 22 09:34:16 ERR: Try 'src/fort --usage' or 'src/fort --help' for more information.
Stack trace:
  src/fort(print_stack_trace+0x3b) [0x55e3959afccb]
  src/fort(__pr_op_err+0x9f) [0x55e3959b040f]
  src/fort(handle_flags_config+0x396) [0x55e3959adcf6]
  src/fort(main+0x7c) [0x55e3959ab4fc]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb3e0eee0b3]
  src/fort(_start+0x2e) [0x55e3959ab64e]
(End of stack trace)

I'm going to have this toned down by the next release.

Just to be clear: bug #65 refers to this critical error:

Nov 10 00:03:12 CRT: /etc/pki/tals/ripe.tal: Attempted to pop empty X509 stack
Stack trace:
  fort(print_stack_trace+0x1a) [0x41894a]
  fort(pr_crit+0x89) [0x41b829]
  fort() [0x4152e1]
  fort(deferstack_pop+0x2f) [0x4154ef]
  fort() [0x42a004]
  fort() [0x42a9b1]
  fort() [0x439131]
  /lib64/libc.so.6(+0x8db17) [0x7f226b144b17]
  /lib64/libc.so.6(+0x1126c0) [0x7f226b1c96c0]
(Stack size was 9.)

ydahhrk (Member) commented Dec 1, 2023

As stated in the 1.6.0 release notes, I found and patched several instances of undefined behavior during the reviews. There is no way to prove that these caused this particular crash (particularly since I never managed to reproduce it, and the OP has probably moved on), but the code has changed so much that, at this point, I expect the bug to manifest in a completely different way, if at all.

Please upgrade to the latest version. If it crashes again, please open a new bug.

@ydahhrk ydahhrk closed this as completed Dec 1, 2023