
Json output #21

Merged · 7 commits merged into cloud-bulldozer:main on May 22, 2024

Conversation

@shashank-boyapally (Contributor) commented Feb 20, 2024

Type of change

  • Refactor
  • New feature
  • Bug fix
  • Optimization
  • Documentation Update

Description

Some of the changes are from the metadata change PR; please ignore them. Previously, the change point detection was output only in the tabular form inherently provided by hunter. Now we also provide it as JSON output on stdout via the --output (or -o) flag, which defaults to text and can be set to json. Below is a concise version of the JSON output. This does not change the CSV file output. This PR aims only to provide the logic for JSON output; it will be separated out in daemon mode.

[{
        "uuid": "5b06c830-be55-4c84-b4da-92720b55c304",
        "timestamp": 1701866097,
        "metrics": {
            "etcdDisk_duration_avg": {
                "value": 0.0134358362,
                "percentage_change": 0
            },
            "P99": {
                "value": 13000,
                "percentage_change": 0
            },
            "etcdCPU_cpu_avg": {
                "value": 8.6963812859,
                "percentage_change": 0
            },
            "ovnCPU_cpu_avg": {
                "value": 2.8599566281,
                "percentage_change": 0
            },
            "apiserverCPU_cpu_avg": {
                "value": 11.0500421666,
                "percentage_change": 0
            }
        },
        "is_changepoint": false
    },
    {
        "uuid": "c5f95336-27d4-4615-94c1-d618e975a36b",
        "timestamp": 1703075837,
        "metrics": {
            "etcdDisk_duration_avg": {
                "value": 0.0137848279,
                "percentage_change": 42.39053258714572
            },
            "P99": {
                "value": 13000,
                "percentage_change": 0
            },
            "etcdCPU_cpu_avg": {
                "value": 8.2014834841,
                "percentage_change": 0
            },
            "ovnCPU_cpu_avg": {
                "value": 2.7795422015,
                "percentage_change": 0
            },
            "apiserverCPU_cpu_avg": {
                "value": 10.9252901816,
                "percentage_change": 0
            }
        },
        "is_changepoint": true
    }
]
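
For context, here is a minimal sketch of how output in this shape could be assembled from the merged dataframe and hunter's change points. This is illustrative only: the function name build_json_output and the ChangePoint attributes (time, forward_change_percent()) are assumptions, not the code in this PR.

def build_json_output(merged_df, change_points_by_metric, metric_columns):
    # Illustrative sketch: one entry per run in the merged dataframe, with a
    # percentage_change recorded for any metric flagged at that timestamp.
    output = []
    for _, row in merged_df.iterrows():
        entry = {
            "uuid": row["uuid"],
            "timestamp": int(row["timestamp"]),
            "metrics": {},
            "is_changepoint": False,
        }
        for metric in metric_columns:
            change = 0
            for cp in change_points_by_metric.get(metric, []):
                # time and forward_change_percent() are assumed attributes of
                # hunter's ChangePoint objects.
                if cp.time == row["timestamp"]:
                    change = cp.forward_change_percent()
                    entry["is_changepoint"] = True
            entry["metrics"][metric] = {
                "value": row[metric],
                "percentage_change": change,
            }
        output.append(entry)
    return output

The PR itself prints the final structure with print(json.dumps(output_json, indent=4)), as shown in the review excerpts below.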

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please describe the System Under Test.
  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

@jtaleric (Member) commented Mar 5, 2024

Can we add some additional data to the JSON like the build, platform, cni, etc?
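
A rough sketch of what that could look like per JSON entry (the field names come from this comment; how they are sourced, e.g. via the existing get_metadata helper, is an assumption):

def attach_metadata(entry, metadata):
    # Illustrative only: copy selected run metadata into each JSON entry.
    for key in ("build", "platform", "cni"):
        if key in metadata:
            entry[key] = metadata[key]
    return entry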

@shashank-boyapally (Contributor, Author):

Thank you for the suggestion, @jtaleric; I will add and push the changes.

orion.py Outdated
@@ -11,8 +12,13 @@
import pandas as pd

from fmatch.matcher import Matcher
from utils.orion_funcs import run_hunter_analyze, get_metadata, \
set_logging, load_config, get_metric_data
from utils.orion_funcs import (

Collaborator: Thoughts on just importing the whole file, so you don't have to specify every function as we continue to add?

Collaborator: Sounds good, if we are using almost all the functions in that file.

output_table = report.produce_report(
    test_name="test", report_type=ReportType.LOG
)
print(output_table)

Collaborator: Can we use a logger here?
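
For illustration, the suggested change might look roughly like this, continuing the snippet quoted above (a sketch only, assuming a module-level logger; not the actual code in this PR):

import logging

logger = logging.getLogger(__name__)

# ...replacing the print() call above:
output_table = report.produce_report(
    test_name="test", report_type=ReportType.LOG
)
logger.info("Hunter report:\n%s", output_table)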

    )
    print(output_table)
elif output == "json":
    change_points_by_metric = series.analyze().change_points

@vishnuchalla (Collaborator) commented Mar 25, 2024:

I think we don't need this custom logic at all. We can just use the same Report class to get change points reported in JSON using its inbuilt functionality.

output = report.produce_report(test_name="test", report_type=ReportType.JSON)
print(output)

{"test": [{"time": 1703075837, "changes": [{"metric": "etcdDisk_duration_avg", "forward_change_percent": "43"}]}, {"time": 1704890296, "changes": [{"metric": "ovnCPU_cpu_avg", "forward_change_percent": "26"}, {"metric": "etcdCPU_cpu_avg", "forward_change_percent": "13"}]}, {"time": 1704994226, "changes": [{"metric": "apiserverCPU_cpu_avg", "forward_change_percent": "20"}]}]}

This will give the exact timestamps of all the change points along with the metric and its percentage-change attributes. If we want the value, we can just get it from the merged dataframe by doing something like below.

index = df[df['timestamp'] == given_timestamp].index.item()
value = df.loc[index, 'value']

Also, instead of an entire data dump, having only the change points in the JSON output would be cleaner. Just my two cents.

@shashank-boyapally (Contributor, Author):

Thank you for the comment, @vishnuchalla. I believe the main reason to have JSON output is reusability of the data and logic in other tools. If we output only the change points, the result is constrained to changepoint detection; if we output the whole data, the same JSON can also be used in CPT dashboards so that we can view the changepoints in graphs too. I think this is the major reason we went ahead with implementing custom logic. Custom logic also allows us to add the buildUrl and anything else we need to the JSON document. Thoughts?

@vishnuchalla (Collaborator) commented Mar 25, 2024:

Understood. So can we have both, regulated with a toggle between just the changepoints and the entire list of docs with changepoints? Getting the entire data might be very large sometimes. Also, can we have a follow-up story to detect regressions only within a given time range, to have something like this?
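
A hypothetical illustration of the suggested toggle and time-range filter (the helper name and parameters are illustrative; this PR does not implement them):

def filter_output(output_json, changepoints_only=False, start_ts=None, end_ts=None):
    # Illustrative only: optionally keep just the changepoint entries and/or
    # restrict entries to a given timestamp range.
    result = output_json
    if changepoints_only:
        result = [entry for entry in result if entry["is_changepoint"]]
    if start_ts is not None:
        result = [entry for entry in result if entry["timestamp"] >= start_ts]
    if end_ts is not None:
        result = [entry for entry in result if entry["timestamp"] <= end_ts]
    return result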

@shashank-boyapally (Contributor, Author):

I'll have this toggle and the time range implemented along with the next PR, as the current PR #26 already has too many tasks going on.

print(json.dumps(output_json, indent=4))


def parse_json_output(merged_df, change_points_by_metric):

Collaborator: This can be removed based on the logic suggested in the above comment.

@vishnuchalla (Collaborator) left a review comment.

Signed-off-by: Shashank Reddy Boyapally <sboyapal@redhat.com>
Signed-off-by: Shashank Reddy Boyapally <sboyapal@redhat.com>
Signed-off-by: Shashank Reddy Boyapally <sboyapal@redhat.com>
Signed-off-by: Shashank Reddy Boyapally <sboyapal@redhat.com>
Signed-off-by: Shashank Reddy Boyapally <sboyapal@redhat.com>
Signed-off-by: Shashank Reddy Boyapally <sboyapal@redhat.com>
Signed-off-by: Shashank Reddy Boyapally <sboyapal@redhat.com>

@jtaleric (Member) left a comment:

Should we suppress the log output when users pass --output json? Usually they would run jq on the output, which might get messed up with the additional output.
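
One possible way to address this, sketched under the assumption that the standard logging module is used (not code from this PR): raise the log level when JSON output is requested so that only the JSON document reaches stdout.

import logging

def configure_logging(output_format: str) -> None:
    # Sketch: silence informational logging for machine-readable output,
    # so piping the JSON to jq sees a clean document.
    level = logging.ERROR if output_format == "json" else logging.INFO
    logging.basicConfig(level=level)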

@jtaleric (Member) left a comment:

Happy to see this merged; we can add the option to suppress the log output afterwards.

@vishnuchalla (Collaborator) left a comment:

lgtm

@jtaleric merged commit daad3af into cloud-bulldozer:main on May 22, 2024.
2 checks passed.