make stream shutdown if self-node has been removed #2125

Merged
kradalby merged 3 commits into juanfont:main from kradalby/2118-panic-online-delete
Sep 11, 2024

Conversation

kradalby
Collaborator

@kradalby kradalby commented Sep 11, 2024

Currently we read the node from the database, and since it has
been deleted, its ID might be nil. Keep the node around and
just shut down, so it is cleanly removed from the notifier.

Fixes #2118

Also adds the ability to check integration test headscale for panics
and a test case producing the error, which is then fixed.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

Summary by CodeRabbit

  • New Features

    • Introduced a new test case to validate the deletion of an online node, enhancing system robustness.
    • Enhanced logging capabilities during shutdown processes, providing paths to log files for better diagnostics.
  • Bug Fixes

    • Improved error handling and logging during node removal and shutdown operations to prevent system panics.
  • Documentation

    • Added comments to indicate future enhancements regarding panic log assertions.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Contributor

coderabbitai bot commented Sep 11, 2024

Walkthrough

The changes in this pull request enhance the integration testing and error handling of the Headscale application. A new test case is introduced to ensure that deleting an online node does not cause a panic. Modifications to several methods improve logging capabilities by returning additional information, such as log file paths, alongside error statuses. These updates focus on increasing the robustness and reliability of the application during node management operations.

Changes

| File | Change Summary |
|---|---|
| .github/workflows/test-integration.yaml | Added a new test case Test2118DeletingOnlineNodePanics to improve integration testing coverage. |
| hscontrol/poll.go | Added logic to handle node removal in serveLongPoll, improving long polling robustness. |
| integration/control.go | Modified Shutdown and SaveLog methods to return two strings and an error for enhanced logging. |
| integration/dockertestutil/logs.go | Updated SaveLog to return log file paths along with an error, improving error handling. |
| integration/general_test.go | Introduced Test2118DeletingOnlineNodePanics to validate behavior when deleting an online node. |
| integration/hsic/hsic.go | Changed Shutdown and SaveLog methods to return log file paths and an error for better diagnostics. |
| integration/scenario.go | Added ShutdownAssertNoPanics to check for panics during shutdown, enhancing reliability. |
| integration/tsic/tsic.go | Modified SaveLog to handle errors from log saving and added a comment for future enhancements. |

Assessment against linked issues

| Objective | Addressed | Explanation |
|---|---|---|
| Ensure deletion of an online node does not cause a panic (#2118) | | |
| Improve logging during shutdown and save operations (#2118) | | |
| Validate behavior of the system after node deletion (#2118) | | |

Poem

In the burrow where code does play,
A rabbit hops in a joyful way.
With tests to guard against a crash,
And logs to help when things go splash!
Hooray for changes, so bright and neat,
In Headscale's world, we can't be beat! 🐇✨


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Outside diff range and nitpick comments (2)
integration/scenario.go (1)

192-211: LGTM, with a minor suggestion.

The new ShutdownAssertNoPanics function looks good. It effectively shuts down the control servers and asserts that no panics occurred in the logs when a testing context is provided.

Suggestion:
Consider improving the error handling by checking the error returned from control.Shutdown() before proceeding to read the log files. You could log the error or perform some other action if the shutdown fails.
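
The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the code from integration/scenario.go: `shutdownFunc`, `containsPanic`, and the `report` callback are hypothetical names, and matching on the substring "panic:" is an assumption about how a panic shows up in the saved logs. The shutdown error is checked before any log file is read, and the log check only runs when a reporting callback (for example `t.Errorf`) is supplied.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// shutdownFunc is a hypothetical stand-in for a control server's Shutdown
// method: it returns the saved stdout and stderr log paths and an error.
type shutdownFunc func() (stdoutPath, stderrPath string, err error)

// containsPanic reports whether a log looks like it recorded a Go panic.
// Matching on "panic:" is an assumption of this sketch.
func containsPanic(log []byte) bool {
	return bytes.Contains(log, []byte("panic:"))
}

// shutdownAssertNoPanics shuts down every control server in order. If report
// is non-nil (for example t.Errorf in a test), it also reads the saved logs
// and reports any that contain a panic. It stops at the first shutdown or
// read error, so no log is read after a failed shutdown.
func shutdownAssertNoPanics(report func(format string, args ...any), controls []shutdownFunc) error {
	for _, shutdown := range controls {
		stdoutPath, stderrPath, err := shutdown()
		if err != nil {
			return fmt.Errorf("shutting down control server: %w", err)
		}
		if report == nil {
			// No testing context: shut down only, skip the log check.
			continue
		}
		for _, logPath := range []string{stdoutPath, stderrPath} {
			data, err := os.ReadFile(logPath)
			if err != nil {
				return fmt.Errorf("reading log %q: %w", logPath, err)
			}
			if containsPanic(data) {
				report("control server log %s contains a panic", logPath)
			}
		}
	}
	return nil
}
```

Passing a nil callback mirrors the non-test shutdown path: the servers are stopped, but the logs are not inspected.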

integration/tsic/tsic.go (1)

1001-1003: LGTM, but consider converting the TODO comment to a GitHub issue.

The changes to propagate the error from SaveLog function call are approved.

However, to better track the future enhancement of asserting panic logs from Tailscale, consider converting the TODO comment to a GitHub issue.

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 4b02dc9 and eb71f8a.

Files selected for processing (8)
  • .github/workflows/test-integration.yaml (1 hunks)
  • hscontrol/poll.go (2 hunks)
  • integration/control.go (1 hunks)
  • integration/dockertestutil/logs.go (2 hunks)
  • integration/general_test.go (1 hunks)
  • integration/hsic/hsic.go (2 hunks)
  • integration/scenario.go (4 hunks)
  • integration/tsic/tsic.go (1 hunks)
Additional comments not posted (16)
integration/dockertestutil/logs.go (8)

20-20: LGTM!

The change to the return signature of the SaveLog function is approved. Returning the paths to the saved log files provides more information to the caller, which is beneficial.


23-23: LGTM!

The change to the error return statement is approved. Returning empty strings for the log file paths along with the error maintains a consistent return pattern and allows the caller to handle errors effectively.


44-44: LGTM!

The change to the error return statement is approved. Returning empty strings for the log file paths along with the error maintains a consistent return pattern and allows the caller to handle errors effectively.


49-49: LGTM!

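Using path.Join to construct the file path keeps the path handling in one place. Note that path.Join always joins with forward slashes; path/filepath.Join is the variant that uses the operating system's separator.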


51-56: LGTM!

  • Using os.WriteFile to write the stdout log file is a good practice as it provides a simple and efficient way to write data to a file.
  • The change to the error return statement is approved. Returning empty strings for the log file paths along with the error maintains a consistent return pattern and allows the caller to handle errors effectively.

59-59: LGTM!

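Using path.Join to construct the file path keeps the path handling in one place. Note that path.Join always joins with forward slashes; path/filepath.Join is the variant that uses the operating system's separator.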


61-66: LGTM!

  • Using os.WriteFile to write the stderr log file is a good practice as it provides a simple and efficient way to write data to a file.
  • The change to the error return statement is approved. Returning empty strings for the log file paths along with the error maintains a consistent return pattern and allows the caller to handle errors effectively.

69-69: LGTM!

Returning the paths to the stdout and stderr log files along with a nil error is approved. This change provides more information to the caller, which is beneficial.
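
The return pattern described in these comments (two log paths plus an error, with empty strings on every failure path) might look like the sketch below. It is illustrative only: `saveLogs`, its parameters, and the `.stdout.log` / `.stderr.log` file names are hypothetical, and it uses `filepath.Join` rather than the `path.Join` mentioned in the review so that the operating system's separator is used.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// saveLogs writes captured stdout and stderr into dir and returns the two
// file paths. On any failure it returns empty paths together with the error,
// so callers never see a path to a file that was not written.
// The function name and file naming scheme are assumptions of this sketch.
func saveLogs(dir, name string, stdout, stderr []byte) (string, string, error) {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", "", fmt.Errorf("creating log directory: %w", err)
	}

	stdoutPath := filepath.Join(dir, name+".stdout.log")
	if err := os.WriteFile(stdoutPath, stdout, 0o644); err != nil {
		return "", "", fmt.Errorf("writing stdout log: %w", err)
	}

	stderrPath := filepath.Join(dir, name+".stderr.log")
	if err := os.WriteFile(stderrPath, stderr, 0o644); err != nil {
		return "", "", fmt.Errorf("writing stderr log: %w", err)
	}

	return stdoutPath, stderrPath, nil
}
```

Returning the paths lets a caller such as a shutdown helper open the saved files directly instead of reconstructing their names.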

.github/workflows/test-integration.yaml (1)

55-55: Brilliant addition to the test suite!

The inclusion of the Test2118DeletingOnlineNodePanics test case is a fantastic enhancement to the integration testing workflow. It directly addresses the critical issue mentioned in the PR objectives, where deleting an online node causes a panic in the Headscale application.

By adding this specific test case, you are ensuring that the scenario of deleting an online node is thoroughly validated, and any potential panics or crashes are caught during the testing phase. This proactive approach will greatly improve the robustness and reliability of the application.

Furthermore, the naming convention of the test case follows the established pattern in the test suite, making it consistent and easily understandable.

Excellent work on enhancing the test coverage and addressing a critical issue!

integration/scenario.go (1)

235-241: Looks good!

The modification to the Shutdown function to call ShutdownAssertNoPanics(nil) is a clean way to integrate the new panic assertions into the existing shutdown process for non-test scenarios.

integration/hsic/hsic.go (2)

401-402: LGTM! The changes to the Shutdown method signature and logic are approved.

The updated Shutdown method now returns stdout and stderr paths in addition to the error, which will be helpful for diagnostics by providing direct access to the log file paths.


466-466: LGTM! The changes to the SaveLog method signature are approved.

The updated SaveLog method now returns stdout and stderr paths in addition to the error, which aligns with the changes made to the Shutdown method. This ensures that both methods can effectively communicate the results of their operations.

hscontrol/poll.go (2)

8-8: LGTM!

The import statement for the slices package is correctly added.


277-281: Looks good!

The added code correctly handles the case when the current node has been removed from Headscale. It checks if the node ID is present in the update.Removed slice using slices.Contains and closes the stream by returning from the method if the node is found. The tracing and stream closure logic is properly implemented.
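
A simplified sketch of the check described here, assuming a stream loop that receives updates over a channel. `NodeID`, `StateUpdate`, and `serveUpdates` are hypothetical stand-ins and not Headscale's actual types; only the use of `slices.Contains` on a `Removed` slice is taken from the review comment.

```go
package main

import "slices"

// NodeID and StateUpdate are hypothetical stand-ins for the real types.
type NodeID uint64

type StateUpdate struct {
	Removed []NodeID // IDs of nodes that were deleted
}

// serveUpdates forwards updates to one node's stream via send. It returns
// true if the stream was closed because the node itself appears in an
// update's Removed list, and false if the updates channel was closed first.
func serveUpdates(self NodeID, updates <-chan StateUpdate, send func(StateUpdate)) bool {
	for update := range updates {
		// The node this stream belongs to has been deleted: stop serving
		// instead of looking the node up again.
		if slices.Contains(update.Removed, self) {
			return true
		}
		send(update)
	}
	return false
}
```

Returning early keeps the in-memory node intact for cleanup, which matches the PR description's intent of shutting the stream down rather than re-reading a deleted node.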

integration/general_test.go (2)

958-1054: LGTM!

The test is well-structured, follows clear steps, and uses appropriate assertions to verify the expected behaviour. The test ensures that deleting an online node does not cause a panic and that the node is actually deleted.


1003-1013: Skipped reviewing executeAndUnmarshal function.

The executeAndUnmarshal function is not defined in the provided code snippet. Its usage in the test seems appropriate for executing Headscale CLI commands and parsing the JSON output. If the function definition is added in the future, it should be reviewed separately.

Also applies to: 1039-1048
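
Since the helper's definition is not shown in the reviewed snippet, the following is only a guess at its general shape: run a CLI command, then decode its JSON output into a typed result. The `commandRunner` type and this signature are assumptions for illustration and may differ from the real executeAndUnmarshal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// commandRunner is a hypothetical abstraction over "execute this command
// inside the control server container and return its stdout".
type commandRunner func(command []string) (string, error)

// executeAndUnmarshal runs command via run and decodes its JSON output into
// result. Both execution and decoding errors are wrapped with the command
// so a failing test shows which invocation went wrong.
func executeAndUnmarshal[T any](run commandRunner, command []string, result *T) error {
	out, err := run(command)
	if err != nil {
		return fmt.Errorf("executing %v: %w", command, err)
	}
	if err := json.Unmarshal([]byte(out), result); err != nil {
		return fmt.Errorf("unmarshalling output of %v: %w", command, err)
	}
	return nil
}
```

In a test like the one reviewed above, a helper of this kind would be called once before and once after the delete to compare the decoded node lists.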

integration/control.go: 2 review comments (resolved)
@kradalby kradalby merged commit 64319f7 into juanfont:main Sep 11, 2024
118 of 119 checks passed
@kradalby kradalby deleted the kradalby/2118-panic-online-delete branch September 11, 2024 10:00
kradalby added a commit to kradalby/headscale that referenced this pull request Sep 30, 2024
* add shutdown that asserts if headscale had panics

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

* add test case producing 2118 panic

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

* make stream shutdown if self-node has been removed

Currently we will read the node from database, and since it is
deleted, the id might be set to nil. Keep the node around and
just shutdown, so it is cleanly removed from notifier.

Fixes juanfont#2118

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

---------

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Development

Successfully merging this pull request may close these issues.

[Bug] Crash when deleting an online node
2 participants