Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transparent caching, --cache-create, --cache-delete and --cache-q… #155

Merged
merged 6 commits into from
Mar 6, 2024

Conversation

klmcwhirter
Copy link
Contributor

@klmcwhirter klmcwhirter commented Feb 24, 2024

…uery options

As mentioned elsewhere please DO NOT MERGE until a commit shows up with fixes 79 in it.

TODO:
* more testing
* update README - https://github.com/klmcwhirter/envycontrol?tab=readme-ov-file#%EF%B8%8F-usage - what else?

@klmcwhirter
Copy link
Contributor Author

@bayasdev I installed akmod-nvidia from rpmfusion on my Fedora 39 install and tested all state transitions. All tests passing.

I believe we are good to merge this now.

Thanks for all the help!

@bayasdev
Copy link
Owner

bayasdev commented Mar 2, 2024

Thanks @klmcwhirter!

I'll check this on the weekend but LGTM and works as expected on my Ubuntu 23.10 test machine.

However, I don't think pushing a new release immediately after merging this, so we can see if anyone living on the edge pulls EnvyControl from main branch and reports an issue related to the cache system 😆

Copy link
Owner

@bayasdev bayasdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 def create_cache_obj(self, nvidia_gpu_pci_bus):
        from datetime import datetime
        return {
            'switch': {
                'nvidia_gpu_pci_bus': nvidia_gpu_pci_bus
            },
            'metadata': {
                'audit_iso_tmstmp': datetime.now().isoformat(),
                'args': vars(self.app_args),
                'amd_igpu_name': get_amd_igpu_name(),
                'current_mode': self.current_mode,
                'display_manager': get_display_manager(),
                'igpu_pci_bus': get_igpu_bus_pci_bus(),
                'igpu_vendor': get_igpu_vendor(),
            }
        }

We can cache only the nvidia_gpu_pci_bus for now, major changes should be delayed for version 4. That would simplify a lot this PR.

@klmcwhirter
Copy link
Contributor Author

klmcwhirter commented Mar 5, 2024

 def create_cache_obj(self, nvidia_gpu_pci_bus):
        from datetime import datetime
        return {
            'switch': {
                'nvidia_gpu_pci_bus': nvidia_gpu_pci_bus
            },
            'metadata': {
                'audit_iso_tmstmp': datetime.now().isoformat(),
                'args': vars(self.app_args),
                'amd_igpu_name': get_amd_igpu_name(),
                'current_mode': self.current_mode,
                'display_manager': get_display_manager(),
                'igpu_pci_bus': get_igpu_bus_pci_bus(),
                'igpu_vendor': get_igpu_vendor(),
            }
        }

We can cache only the nvidia_gpu_pci_bus for now, major changes should be delayed for version 4. That would simplify a lot this PR.

It won't change much, but as promised earlier I will make the change here and in README.md.

You see one of the most challenging aspects of troubleshooting a risky system change like this is having enough visibility into the information used by a process to make decisions. And, you can never predict which information might be needed given a particular scenario. It is best to have everything.

Putting it all in the new cache file is the least risky approach - as opposed to, say, adding a set of rolling log files in /var/log/envycontrol/...

Change made. I will puish it up once I have the README adjustments made and perform asnother round of testing.

Here is what the cache file looks like now:

$ sudo python -m envycontrol --cache-query
{
    "switch": {
        "nvidia_gpu_pci_bus": "PCI:1:0:0"
    }
}

@klmcwhirter klmcwhirter requested a review from bayasdev March 5, 2024 17:43
Copy link
Owner

@bayasdev bayasdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the object so it looks like this:

{
    "nvidia_gpu_pci_bus_id": "PCI:1:0:0"
}

@klmcwhirter
Copy link
Contributor Author

Please update the object so it looks like this:

{
    "nvidia_gpu_pci_bus_id": "PCI:1:0:0"
}

That will take more time and retesting. I will look at it tomorrow when I have more time.

@bayasdev
Copy link
Owner

bayasdev commented Mar 5, 2024

@klmcwhirter

You see one of the most challenging aspects of troubleshooting a risky system change like this is having enough visibility into the information used by a process to make decisions. And, you can never predict which information might be needed given a particular scenario. It is best to have everything.

If the user does not provide enough system logs we can mark their issue as invalid.

Putting it all in the new cache file is the least risky approach - as opposed to, say, adding a set of rolling log files in /var/log/envycontrol/...

Writing to syslog is a good idea for version 4, that would be much cleaner than caching user's system data that will eventually get stale.

Edit: If you feel comfortable you can add a trace command in another PR (or delay it for v4) that gets fresh data from the user's system for troubleshooting purposes. It's always possible for an user (specially Linux ones) to change their DE/WM, etc so caching such data makes no sense.

@klmcwhirter
Copy link
Contributor Author

Please update the object so it looks like this:

{
    "nvidia_gpu_pci_bus_id": "PCI:1:0:0"
}

That will take more time and retesting. I will look at it tomorrow when I have more time. Oops - this was not posted yesterday.

I went through the 6 switch operations and 6 reboots to test and I think it looks good.

But, that is it. Take it or leave it. I won't be making any more changes and think the last 2 are a mistake.

As mentioned I will be using the version from my tests branch. The reason is that I am only getting about 2-1/2 hr battery life in integrated mode. Which means I either need to find the magic bullet thatr is missing in the config mods, or accelerate the process of buying a laptop without nvidia hardware. FYI.

Push is coming in a minute.

This is what my test plan looks like.

  1. envycontrol --reset and reboot
  2. envycontrol -s integrated and reboot
  3. envyvcontrol -s nvidia and reboot # use cache in this step
  4. envycontrol -s hybrid and reboot
  5. envycontrol --cache-delete # force recreating cache in next step
  6. envycontrol -s nvidia and reboot
  7. envycontrol -s integrated and reboot

All state transitions happened without issue.

Enjoy!

@bayasdev bayasdev merged commit 8ad51d6 into bayasdev:main Mar 6, 2024
@bayasdev
Copy link
Owner

bayasdev commented Mar 6, 2024

Thanks @klmcwhirter! this is exactly what I needed the cache to be. Will issue a new release in the following hours.

@klmcwhirter klmcwhirter deleted the main branch March 6, 2024 14:20
@klmcwhirter
Copy link
Contributor Author

Glad I could help. As mentioned, my tests branch will remain and I'll be adding more details to help troubleshoot to the cache file there.

Feel free to borrow code as you would like from that branch.

But to be honest, I probably will time box my effort there.

I have 2 projects on hold while I have been working on this. One of them is my pi-day-2024-with-py project ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants