Monitor mode #630

Open
wants to merge 8 commits into
base: main

ArneTR
Member

@ArneTR ArneTR commented Dec 26, 2023

This PR adds the monitor mode, which was originally proposed by @davidkopp in PR #556.

Monitor mode puts the GMT into an endless loop in which the providers keep collecting data.

The main change is that providers which collect transient data such as container metrics, rather than system metrics, need an auto-discovery mode.

So far in this draft PR, this has been implemented for the CpuUtilizationCgroupContainerProvider.
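
To illustrate what auto discovery at the cgroup level could look like, here is a minimal sketch. It assumes cgroup v2 with the systemd driver, where Docker containers typically show up as directories named `docker-<64 hex characters>.scope`, for example under `/sys/fs/cgroup/system.slice`. That layout and the helper names `parse_container_id` and `discover_containers` are assumptions for illustration and are not taken from the provider's code.

```c
/* Request POSIX declarations (opendir, readdir) in strict ISO C mode. */
#define _POSIX_C_SOURCE 200809L

#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define CONTAINER_ID_LEN 64

/* Hypothetical helper: extract the 64-character hex ID from a directory name
 * of the form "docker-<id>.scope". Returns 1 on success, 0 otherwise. */
int parse_container_id(const char *name, char out[CONTAINER_ID_LEN + 1]) {
    const char *prefix = "docker-";
    const char *suffix = ".scope";
    size_t prefix_len = strlen(prefix);
    size_t suffix_len = strlen(suffix);

    if (strlen(name) != prefix_len + CONTAINER_ID_LEN + suffix_len) return 0;
    if (strncmp(name, prefix, prefix_len) != 0) return 0;
    if (strcmp(name + prefix_len + CONTAINER_ID_LEN, suffix) != 0) return 0;

    for (size_t i = 0; i < CONTAINER_ID_LEN; i++) {
        if (!isxdigit((unsigned char)name[prefix_len + i])) return 0;
    }
    memcpy(out, name + prefix_len, CONTAINER_ID_LEN);
    out[CONTAINER_ID_LEN] = '\0';
    return 1;
}

/* Hypothetical helper: scan one cgroup directory and print every container ID
 * found. Returns the number of containers, or -1 if the directory cannot be
 * opened. */
int discover_containers(const char *cgroup_dir) {
    DIR *dir = opendir(cgroup_dir);
    if (dir == NULL) return -1;

    int count = 0;
    struct dirent *entry;
    char id[CONTAINER_ID_LEN + 1];
    while ((entry = readdir(dir)) != NULL) {
        if (parse_container_id(entry->d_name, id)) {
            printf("found container %s\n", id);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

Calling `discover_containers` once per sampling interval would pick up containers that start after the monitor does. Under other cgroup drivers or cgroup v1 the directory layout differs, so the name pattern would have to be adapted.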

Currently the other phases show as blank, but they will be removed in a future version: https://metrics.green-coding.berlin/stats.html?id=b3527cc8-c17d-4284-95c4-aa4f353682fa

Caveats / TODOs:

  • Monitor mode is currently ended through a SIGINT (CTRL+C). For production use it should ignore SIGHUP and terminate cleanly on SIGTERM.
  • The GMT gets confused when monitor runs are compared. This needs a guard clause.
  • It is untested how the tool behaves when left logging for 24 hours. This was not the intended use case for monitor mode, but people might use it that way. TBD
  • Container discovery is implemented natively on top of Linux cgroups.
    • Advantages: Extremely performant
    • Disadvantages:
      • Since the Docker API is not used, the container names are unknown
      • We cannot scan containers under macOS
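
The signal behavior described in the first caveat could be sketched as follows. This is only an illustration written in C to match the provider code in this thread. The function names `install_monitor_signal_handlers` and `monitor_should_run` are hypothetical, and how the GMT would actually wire this up is not decided here.

```c
/* Request POSIX declarations (sigaction) in strict ISO C mode. */
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stddef.h>

/* Set to 0 by the handler, polled by the monitor loop. */
static volatile sig_atomic_t keep_running = 1;

static void request_stop(int signo) {
    (void)signo;
    keep_running = 0;
}

/* Hypothetical setup: SIGINT and SIGTERM end the loop, SIGHUP is ignored.
 * Returns 0 on success, -1 if any sigaction call fails. */
int install_monitor_signal_handlers(void) {
    struct sigaction stop_action;
    stop_action.sa_handler = request_stop;
    sigemptyset(&stop_action.sa_mask);
    stop_action.sa_flags = 0;

    struct sigaction ignore_action;
    ignore_action.sa_handler = SIG_IGN;
    sigemptyset(&ignore_action.sa_mask);
    ignore_action.sa_flags = 0;

    if (sigaction(SIGINT, &stop_action, NULL) != 0) return -1;
    if (sigaction(SIGTERM, &stop_action, NULL) != 0) return -1;
    if (sigaction(SIGHUP, &ignore_action, NULL) != 0) return -1;
    return 0;
}

/* Returns 1 while the monitor loop should keep sampling, 0 after a stop
 * request. */
int monitor_should_run(void) {
    return keep_running;
}
```

A monitor loop would then look like `while (monitor_should_run()) { /* sample, sleep */ }`, followed by the normal cleanup path, so that SIGTERM and CTRL+C both end the run in the same orderly way.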

For the Docker name problem, one could argue that a hashmap could hold the already known containers so that lookups are kept to a minimum.
However, bringing a hashmap into the C code is quite some work and will create some overhead.
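
As a rough idea of how small such a cache could be without an external library, here is a sketch of a fixed-size table with linear probing that maps a container ID to its name. All names (`name_cache_t`, `name_cache_get`, `name_cache_put`) and the sizes are assumptions for illustration. How the name is obtained on a cache miss, for example through the Docker API, is deliberately left out.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 256 /* power of two, assumed larger than the container count */
#define ID_LEN 64       /* longer IDs would be truncated and never match again */
#define NAME_LEN 128

typedef struct {
    int used;
    char id[ID_LEN + 1];
    char name[NAME_LEN];
} cache_entry_t;

typedef struct {
    cache_entry_t slots[CACHE_SLOTS]; /* must be zero-initialized before use */
} name_cache_t;

/* 32-bit FNV-1a string hash. */
static uint32_t hash_id(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Returns the cached name for id, or NULL if the id has not been seen yet. */
const char *name_cache_get(const name_cache_t *cache, const char *id) {
    uint32_t start = hash_id(id) & (CACHE_SLOTS - 1);
    for (uint32_t i = 0; i < CACHE_SLOTS; i++) {
        const cache_entry_t *e = &cache->slots[(start + i) & (CACHE_SLOTS - 1)];
        if (!e->used) return NULL;
        if (strcmp(e->id, id) == 0) return e->name;
    }
    return NULL;
}

/* Stores or updates id -> name. Returns 0 on success, -1 if the table is full.
 * Entries are never removed in this sketch. */
int name_cache_put(name_cache_t *cache, const char *id, const char *name) {
    uint32_t start = hash_id(id) & (CACHE_SLOTS - 1);
    for (uint32_t i = 0; i < CACHE_SLOTS; i++) {
        cache_entry_t *e = &cache->slots[(start + i) & (CACHE_SLOTS - 1)];
        if (!e->used || strcmp(e->id, id) == 0) {
            e->used = 1;
            snprintf(e->id, sizeof(e->id), "%s", id);
            snprintf(e->name, sizeof(e->name), "%s", name);
            return 0;
        }
    }
    return -1;
}
```

The trade-off compared to a general-purpose hash table is a fixed memory footprint and no dynamic allocation in the sampling path, at the cost of a hard upper bound on the number of cached containers.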

For the macOS problem: This limitation already exists in the GMT today. However, what we could do is enter the VM and inject a script.
Entering the VM via a tty is possible like so:

    stty -echo -icanon && nc -U ~/Library/Containers/com.docker.docker/Data/debug-shell.sock && stty sane

Leaving this as a point of discussion ...

@ArneTR ArneTR requested a review from ribalba December 26, 2023 07:38
@ArneTR
Member Author

ArneTR commented Dec 26, 2023

For discussion, here is my test implementation with a GLib hash table:

    #include <glib.h>
    #include <stdio.h>

    typedef struct container_t {
        char path[BUFSIZ];
        char *id;
    } container_t;

    int main(void) {
        // g_new0 zero-initializes the struct (so id is NULL) and pairs with the
        // g_free value destructor passed to the hash table below
        container_t *docker1 = g_new0(container_t, 1);
        g_strlcpy(docker1->path, "asdasd", sizeof(docker1->path));

        container_t *docker2 = g_new0(container_t, 1);
        g_strlcpy(docker2->path, "xxxx", sizeof(docker2->path));

        // Create a new hash table with string hash and equal functions;
        // keys and values are released with g_free on removal or destroy
        GHashTable *myHashTable = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, g_free);

        // Insert some key-value pairs (the table takes ownership of both)
        g_hash_table_insert(myHashTable, g_strdup("docker1"), docker1);
        g_hash_table_insert(myHashTable, g_strdup("docker2"), docker2);

        // g_hash_table_lookup returns the stored value, or NULL if the key is missing
        container_t *found = g_hash_table_lookup(myHashTable, "docker1");
        if (found != NULL) {
            printf("Found 'docker1': %s\n", found->path);
        }

        found = g_hash_table_lookup(myHashTable, "docker2");
        if (found != NULL) {
            printf("Found 'docker2': %s\n", found->path);
        }

        found = g_hash_table_lookup(myHashTable, "docker3");
        if (found != NULL) {
            printf("Found 'docker3': %s\n", found->path);
        } else {
            printf("Could not find docker3\n");
        }

        printf("Ready to free\n");
        g_hash_table_destroy(myHashTable);
        return 0;
    }


github-actions bot commented Dec 26, 2023

Old Energy Estimation

Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run | 10.4723 | 1737.71 | 2.57438 | 683 |
| Measurement #1 | 10.5478 | 1737.71 | 2.57438 | 677 |

📈 Energy graph: [ASCII plot omitted: power in Watts over time, y-axis ticks from 1.77 to 7.86]

@ribalba
Member

ribalba commented Dec 31, 2023

Hey, I really like how clean this is, although it is not quite how I would have implemented it. I am not quite sure what the result of the monitor mode is. As far as I can see, nothing orchestrates the containers anymore, so it is more like a tool that just runs on the system and logs the resource usage of all containers. I think this is really useful functionality that we could put to good use. Also, is the frontend built to handle data like this? Wouldn't we need to add more details on when containers are started and killed?
I understood the monitor mode to still adhere to some sort of orchestration. Say you want to see how scaling containers impacts energy usage: you would still have some sort of flow, but monitor all containers on the system and not just the predefined ones. I would also keep the option to define services, with a service also being able to start other containers that are then monitored. Not that easy to describe.
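
One way to obtain the start and kill information asked about above would be to compare consecutive discovery snapshots. The sketch below is only an assumption about how that could work and is not part of this PR. The helper name `diff_snapshots`, the event format, and the microsecond timestamp parameter are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if id occurs in the list of n IDs, 0 otherwise. */
static int contains(const char *const *ids, size_t n, const char *id) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ids[i], id) == 0) return 1;
    }
    return 0;
}

/* Hypothetical helper: compare the previous and current list of container IDs,
 * print one event line per container that appeared or disappeared, and write
 * the event counts to the out parameters. */
void diff_snapshots(const char *const *prev, size_t prev_n,
                    const char *const *curr, size_t curr_n,
                    long timestamp_us,
                    size_t *started, size_t *stopped) {
    *started = 0;
    *stopped = 0;
    for (size_t i = 0; i < curr_n; i++) {
        if (!contains(prev, prev_n, curr[i])) {
            printf("%ld started %s\n", timestamp_us, curr[i]);
            (*started)++;
        }
    }
    for (size_t i = 0; i < prev_n; i++) {
        if (!contains(curr, curr_n, prev[i])) {
            printf("%ld stopped %s\n", timestamp_us, prev[i]);
            (*stopped)++;
        }
    }
}
```

The quadratic comparison is acceptable for a handful of containers. Events like these could then be stored alongside the metrics so a frontend can mark when each container was alive.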

@ArneTR
Member Author

ArneTR commented Jan 1, 2024

Keeping the services optional is a good idea.

A "trigger" will also be implemented that can start an action, so that we can then observe how the system behaves.

@ArneTR ArneTR mentioned this pull request Jan 14, 2024
* main: (206 commits)
  Bump psycopg-pool from 3.2.1 to 3.2.2 (#771)
  Event listener most be removed after repaint as it might trigger too often
  Bump redis from 5.0.3 to 5.0.4 (#756)
  Bump orjson from 3.10.2 to 3.10.3 (#764)
  Bump schema from 0.7.5 to 0.7.7 (#765)
  Added more sanity checks for duplicate and wrong container names; Added also better error display if docker container boot failed (#762)
  Bump tqdm from 4.66.2 to 4.66.4 (#760)
  Bump orjson from 3.10.1 to 3.10.2 (#759)
  Bump fastapi from 0.110.2 to 0.110.3 (#758)
  Updated XGBoost
  Bump green-coding-solutions/eco-ci-energy-estimation from 2 to 3 (#754)
  Eco-CI must render even for old data
  CI Badge in Readme update
  Bump orjson from 3.10.0 to 3.10.1 (#747)
  Bump aiohttp from 3.9.4 to 3.9.5 (#749)
  Bump fastapi from 0.110.1 to 0.110.2 (#751)
  Added run_id to cluster error
  Added basic optimizations (#752)
  Redis must restart when docker agent receives sigint
  Added reduced mail to client also in cluster
  ...
@ArneTR ArneTR marked this pull request as ready for review May 15, 2024 16:04
* main:
  Fixed tests
  Schema checker better error message
  Changed default branch to event-bound for measurement control workload
@ArneTR ArneTR removed the request for review from ribalba May 15, 2024 16:48

Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run | 23.7141 | 1291.81 | 3.61852 | 366 |
| Measurement #1 | 23.9958 | 1291.81 | 3.61852 | 358 |

📈 Energy graph: [ASCII plot omitted: power in Watts over time, y-axis ticks from 1.77 to 8.18]

🌳 CO2 Data:
City: Chicago, Lat: 41.8819, Lon: -87.6278
Carbon Intensity for this location: 393 gCO₂eq/kWh
SCI: 0.507681 gCO₂eq / pipeline run emitted

@ribalba
Member

ribalba commented May 29, 2024

Shall I take another look at this?

@ArneTR
Member Author

ArneTR commented May 30, 2024

No, I removed you as a reviewer. I have kept this open because I sent it to an external party for review and found it much clearer to leave it open than to just merge it for now.

2 participants