Check instance reachable status in machine-controller-manager while checking new machine joining machine deployment #729

Open
Tracked by #724
neo-liang-sap opened this issue Jun 21, 2022 · 3 comments
Labels
area/ops-productivity Operator productivity related (how to improve operations) kind/enhancement Enhancement, improvement, extension lifecycle/rotten Nobody worked on this for 12 months (final aging stage) needs/planning Needs (more) planning with other MCM maintainers priority/2 Priority (lower number equals higher priority)

Comments

@neo-liang-sap

neo-liang-sap commented Jun 21, 2022

How to categorize this issue?
/area control-plane
/kind enhancement
/priority 3

What would you like to be added:

On AWS, an instance is sometimes in the running state but not reachable. The AWS CLI provides a command that reports this reachability status:

aws ec2 describe-instance-status --instance-ids i-01e71990bfe658adc
{
    "InstanceStatuses": [
        {
            "AvailabilityZone": "eu-central-1a",
            "InstanceId": "i-01e71990bfe658adc",
            "InstanceState": {
                "Code": 16,
                "Name": "running"
            },
            "InstanceStatus": {
                "Details": [
                    {
                        "ImpairedSince": "2022-06-21T06:28:00+00:00",
                        "Name": "reachability",
                        "Status": "failed"
                    }
                ],
                "Status": "impaired"
            },
            "SystemStatus": {
                "Details": [
                    {
                        "Name": "reachability",
                        "Status": "passed"
                    }
                ],
                "Status": "ok"
            }
        }
    ]
}

This instance is running but not reachable: the instance-level reachability check has failed, while the system-level check has passed.
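As a minimal sketch of how such a check could be evaluated, the Go program below parses output shaped like the JSON above and lists instances that are running while their instance-level reachability check has failed. The type and function names (`describeOutput`, `runningButUnreachable`, and so on) are illustrative assumptions, not part of MCM or the AWS SDK, and the sample is a trimmed copy of the output shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusDetail mirrors one entry of a "Details" array in the CLI output.
type statusDetail struct {
	Name   string `json:"Name"`
	Status string `json:"Status"`
}

// statusSummary mirrors the "InstanceStatus" and "SystemStatus" objects.
type statusSummary struct {
	Details []statusDetail `json:"Details"`
	Status  string         `json:"Status"`
}

// instanceStatus holds only the fields this sketch needs.
type instanceStatus struct {
	InstanceID    string `json:"InstanceId"`
	InstanceState struct {
		Name string `json:"Name"`
	} `json:"InstanceState"`
	InstanceStatus statusSummary `json:"InstanceStatus"`
	SystemStatus   statusSummary `json:"SystemStatus"`
}

type describeOutput struct {
	InstanceStatuses []instanceStatus `json:"InstanceStatuses"`
}

// reachabilityFailed reports whether the "reachability" detail in s is "failed".
// Other values (for example a check that is still initializing) are not
// treated as failures in this sketch.
func reachabilityFailed(s statusSummary) bool {
	for _, d := range s.Details {
		if d.Name == "reachability" && d.Status == "failed" {
			return true
		}
	}
	return false
}

// runningButUnreachable returns the IDs of instances whose state is "running"
// while their instance-level reachability check has failed.
func runningButUnreachable(raw []byte) ([]string, error) {
	var out describeOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	var ids []string
	for _, s := range out.InstanceStatuses {
		if s.InstanceState.Name == "running" && reachabilityFailed(s.InstanceStatus) {
			ids = append(ids, s.InstanceID)
		}
	}
	return ids, nil
}

// sample is a trimmed copy of the describe-instance-status output above.
const sample = `{
  "InstanceStatuses": [
    {
      "InstanceId": "i-01e71990bfe658adc",
      "InstanceState": {"Code": 16, "Name": "running"},
      "InstanceStatus": {
        "Details": [{"Name": "reachability", "Status": "failed"}],
        "Status": "impaired"
      },
      "SystemStatus": {
        "Details": [{"Name": "reachability", "Status": "passed"}],
        "Status": "ok"
      }
    }
  ]
}`

func main() {
	ids, err := runningButUnreachable([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("running but unreachable:", ids)
}
```

A real check inside a provider driver would more likely call the EC2 API through the SDK than parse CLI output; the JSON parsing here only serves to make the decision logic concrete.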

Would it be possible to add a check in MCM for whether the instance is reachable?

Why is this needed:

To better understand the process of a machine joining the cluster. For example, sometimes a machine is created and then, after 20 minutes, MCM deletes it and creates another one.

CC @dguendisch

@neo-liang-sap neo-liang-sap added the kind/enhancement Enhancement, improvement, extension label Jun 21, 2022
@gardener-robot

@neo-liang-sap Label area/todo does not exist.

@gardener-robot gardener-robot added the priority/3 Priority (lower number equals higher priority) label Jun 21, 2022
@himanshu-kun himanshu-kun changed the title check instance reachable status in machine-controller-manager while checking new machine joining machine deployment Check instance reachable status in machine-controller-manager while checking new machine joining machine deployment Jun 27, 2022
@himanshu-kun
Contributor

Yes, we will work on adding such a feature. Some research is required first to see whether other providers also expose this kind of networking information for an instance directly.

@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Dec 24, 2022
@himanshu-kun
Contributor

Post Grooming discussion

We need to enhance the driver method GetMachineStatus to also perform checks such as the reachability check mentioned above, and enhance GetMachineStatusResponse to contain the result of the check.
Then we should update the error in the machine status to reflect that result, so that it propagates to the status of the higher-level controllers and is shown in the dashboard for the user to see.
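One possible shape for this, purely as a sketch: the response carries a reachability result, and a helper turns a failed result into an error that can be written to the machine status. The `Reachability` type, its constants, the `Reachability` field, and `reachabilityError` are hypothetical names introduced here; the struct is a local stand-in and not the actual MCM driver type.

```go
package main

import "fmt"

// Reachability is a hypothetical result type for a provider-side check.
type Reachability string

const (
	ReachabilityUnknown Reachability = "Unknown" // provider exposes no such check
	ReachabilityPassed  Reachability = "Passed"
	ReachabilityFailed  Reachability = "Failed"
)

// GetMachineStatusResponse is a local stand-in for the driver response.
// The Reachability field is the hypothetical addition discussed above.
type GetMachineStatusResponse struct {
	ProviderID   string
	NodeName     string
	Reachability Reachability
}

// reachabilityError returns a descriptive error when the provider reported a
// failed reachability check, and nil otherwise. Unknown is not an error, so
// providers without such a check keep their current behaviour.
func reachabilityError(resp GetMachineStatusResponse) error {
	if resp.Reachability == ReachabilityFailed {
		return fmt.Errorf("instance %q is running but failed the provider reachability check", resp.ProviderID)
	}
	return nil
}

func main() {
	resp := GetMachineStatusResponse{
		ProviderID:   "i-01e71990bfe658adc",
		Reachability: ReachabilityFailed,
	}
	if err := reachabilityError(resp); err != nil {
		// In a controller, this message would be recorded in the machine status.
		fmt.Println("machine status error:", err)
	}
}
```

Keeping an explicit `Unknown` value is one way to handle providers that do not offer a comparable check, which is the open research question raised earlier in this thread.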

@himanshu-kun himanshu-kun added priority/2 Priority (lower number equals higher priority) area/ops-productivity Operator productivity related (how to improve operations) needs/planning Needs (more) planning with other MCM maintainers and removed priority/3 Priority (lower number equals higher priority) lifecycle/stale Nobody worked on this for 6 months (will further age) labels Feb 23, 2023
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Nov 2, 2023
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Jul 11, 2024