Experiment results analysis generation #44

hellais · 2023-11-17T17:04:11Z

To make it easier to review, I am separating the whole analysis diff into a separate PR that targets the experiment-results branch.

This way it's possible to view the diff more cleanly and make it easier to understand what is going on.

Overview of scope of this PR

Inside of this PR I implement an Experiment Results analysis approach based on the analysis tables.

The basic idea is to take the analysis keys that are generated by comparing an individual observation with ground truth data. Through a very large set of rules we assign individual blocking, down and ok scores based on how confident we are that a particular signal is a sign of censorship.

We then take all the scores pertaining to a particular observation group relevant to a measurement and generate a MeasurementExperimentResult, which should be backward compatible with our existing PR.
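To illustrate the idea, here is a minimal sketch and not the code in this PR. The `Score` class, the analysis key names, the confidence values and the averaging step are all hypothetical; the actual rule set is much larger and the real aggregation may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Score:
    # Hypothetical per-observation outcome; each value is a confidence in [0, 1].
    blocked: float
    down: float
    ok: float


def score_observation(analysis_keys: Dict[str, bool]) -> Score:
    # Hypothetical rules: the key names and weights are illustrative only.
    if analysis_keys.get("dns_answer_matches_ground_truth", False):
        # The observation agrees with ground truth: treat it as ok.
        return Score(blocked=0.0, down=0.0, ok=1.0)
    if analysis_keys.get("ground_truth_also_fails", False):
        # The target fails for the ground truth too: more likely down than blocked.
        return Score(blocked=0.2, down=0.8, ok=0.0)
    # Anything else is treated as a likely sign of blocking.
    return Score(blocked=0.8, down=0.2, ok=0.0)


def aggregate(scores: List[Score]) -> Score:
    # Combine the scores of one observation group by simple averaging
    # (an assumption made for this sketch).
    if not scores:
        raise ValueError("at least one score is required")
    n = len(scores)
    return Score(
        blocked=sum(s.blocked for s in scores) / n,
        down=sum(s.down for s in scores) / n,
        ok=sum(s.ok for s in scores) / n,
    )
```

Calling `aggregate` on the scores of all observations in a group gives a single blocked/down/ok triple, which is the kind of summary a per-measurement result could carry.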

Based on this we add support for generating the experiment results based on the analysis inside of the mkanalysis command and a simple web interface for inspecting them.

In terms of performance, some cursory benchmarks were run on the dataset from 2023-09-01 to 2023-11-01, and it was processing data at a rate of ~7k observations per second, scaling on 34 cores.

Summary of changes

Implement Experiment Result generation based on the analysis tables
Implement minimal UI for MeasurementExperimentResult
Add support for generating MeasurementExperimentResult as part of mkanalysis cli command
Add more tests for all of the above

codecov · 2023-11-17T18:41:36Z

Codecov Report

Attention: 148 lines in your changes are missing coverage. Please review.

Comparison is base (1f24e63) 82.14% compared to head (9b8f594) 81.19%.

❗ Current head 9b8f594 differs from pull request most recent head 0a0aeaa. Consider uploading reports for the commit 0a0aeaa to get more accurate results

Files	Patch %	Lines
oonidata/analysis/website_experiment_results.py	77.14%	109 Missing ⚠️
oonidata/dataviz/web.py	0.00%	31 Missing ⚠️
oonidata/analysis/web_analysis.py	84.61%	2 Missing ⚠️
oonidata/cli/command.py	80.00%	1 Missing ⚠️
oonidata/db/connections.py	50.00%	1 Missing ⚠️
...ata/transforms/nettests/measurement_transformer.py	80.00%	1 Missing ⚠️
oonidata/workers/common.py	98.48%	1 Missing ⚠️
tests/conftest.py	91.66%	1 Missing ⚠️
tests/test_analysis.py	80.00%	1 Missing ⚠️

Additional details and impacted files

@@                  Coverage Diff                   @@
##           experiment-results      #44      +/-   ##
======================================================
- Coverage               82.14%   81.19%   -0.96%     
======================================================
  Files                      69       69              
  Lines                    6005     6067      +62     
======================================================
- Hits                     4933     4926       -7     
- Misses                   1072     1141      +69



* Make utcnow() calls timezone aware
* Implement workaround for clickhouse bug mymarilyn/clickhouse-driver#388
* Implement more tests for range deletions
* Refactoring of get_prev_range functions
* Fix problem in experiment result generation
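For context on the first item, here is a small sketch of the usual way to get a timezone-aware UTC timestamp in Python. The helper name `utc_now` is hypothetical, and this is not necessarily the exact change made in the commit.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    # datetime.utcnow() returns a naive datetime (tzinfo is None);
    # datetime.now(timezone.utc) returns an aware one with an explicit UTC offset.
    return datetime.now(timezone.utc)
```

Aware and naive datetimes cannot be compared or subtracted from each other, so a change like this usually has to be applied consistently wherever timestamps are produced.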

hellais · 2023-11-21T08:39:41Z

Performed some clean up of the git commit history to make it possible to merge this without needing to rely on the github squash and merge function.

The reason to do that is that I want to preserve the history of certain changes in a dedicated commit (the ones about deleting dead code and the one about the clickhouse bug). This is so that in the future if we need to recover the dead code it's easy and so it's possible to easily see all the changes that are needed in order to address the clickhouse bug.

bassosimone · 2023-11-21T13:48:26Z

> Performed some clean up of the git commit history to make it possible to merge this without needing to rely on the github squash and merge function.
>
> The reason to do that is that I want to preserve the history of certain changes in a dedicated commit (the ones about deleting dead code and the one about the clickhouse bug). This is so that in the future if we need to recover the dead code it's easy and so it's possible to easily see all the changes that are needed in order to address the clickhouse bug.

Yeah, this seems like a good choice 💯

hellais force-pushed the er-website branch from 6229caf to 54df5a5 Compare November 18, 2023 14:11

hellais added the funder/drl2022-2024 label Nov 20, 2023

hellais added 3 commits November 21, 2023 09:35

Delete dead code for previous version of experiment results

0f2069f

hellais force-pushed the er-website branch from 9b8f594 to 0a0aeaa Compare November 21, 2023 08:37

hellais mentioned this pull request Nov 21, 2023

New proposal for instant experiment result data model #43

Merged

hellais merged commit c9ffc63 into experiment-results Nov 21, 2023
3 checks passed

hellais deleted the er-website branch November 21, 2023 08:41

hellais mentioned this pull request Nov 21, 2023

Fastapi based API #45

Merged

hellais mentioned this pull request Jan 11, 2024

Generate ExperimentResults and BlockingEvents ooni/ooni.org#1282

Closed

hellais added a commit that referenced this pull request Sep 2, 2024

Fastapi based API (#45)

a06a793

This implements: #35 This branch is based on top of #44. That one should be merged before this one is ready to be landed in master.
