enhance(apps/analytics): add scripts for the computation of aggregated analytics #4385
Conversation
Current Aviator status: This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue.

GitGuardian: ✅ There are no secrets present in this pull request anymore. If these secrets were true positives and are still valid, we highly recommend that you revoke them.
📝 Walkthrough: The pull request introduces several new functions and modules to enhance the analytics capabilities within the application.
Quality Gate passed
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (8)
apps/analytics/src/modules/aggregated_analytics/save_aggregated_analytics.py (1)

61-73: Optimize the calculation of `totalElementsAvailable`

The calculation of `totalElementsAvailable` involves nested loops over `practiceQuizzes` and `microLearnings`, their `stacks`, and `elements`. This can be inefficient for courses with large datasets. Consider optimizing this calculation by using database aggregation functions or modifying the query to compute `totalElementsAvailable` directly in the database. This approach would reduce processing time and resource usage in Python.

apps/analytics/src/modules/__init__.py (1)
2-2: Add `compute_aggregated_analytics` to `__all__` to define the public API

The static analysis tool reports that `compute_aggregated_analytics` is imported but unused. Since you're importing it in `__init__.py` to expose it at the package level, consider adding it to the `__all__` list to explicitly declare it as part of the public API.

Apply this diff:

```diff
+__all__ = [
+    'compute_correctness',
+    'get_participant_responses',
+    'compute_aggregated_analytics',
+]
```

🧰 Tools
🪛 Ruff (0.8.0)

2-2: `.aggregated_analytics.compute_aggregated_analytics` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
apps/analytics/src/modules/aggregated_analytics/__init__.py (1)

1-4: Add imported functions to `__all__` to define the public API

The static analysis tool reports that the imported functions are unused. Since you're importing these functions to make them available at the module level, consider adding them to the `__all__` list to explicitly define the public API.

Apply this diff:

```diff
+__all__ = [
+    'compute_aggregated_analytics',
+    'load_participant_analytics',
+    'aggregate_participant_analytics',
+    'save_aggregated_analytics',
+]
```

🧰 Tools
🪛 Ruff (0.8.0)

1-1: `.compute_aggregated_analytics.compute_aggregated_analytics` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

2-2: `.load_participant_analytics.load_participant_analytics` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

3-3: `.aggregate_participant_analytics.aggregate_participant_analytics` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

4-4: `.save_aggregated_analytics.save_aggregated_analytics` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
apps/analytics/src/modules/aggregated_analytics/aggregate_participant_analytics.py (1)

9-9: Fix typo in comment

There's a typo in the comment: "aggreagte" should be spelled "aggregate".

Apply this diff:

```diff
-    # aggreagte all participant analytics for the specified time range and separate courses
+    # aggregate all participant analytics for the specified time range and separate courses
```

apps/analytics/src/modules/aggregated_analytics/compute_aggregated_analytics.py (3)
6-8: Consider adding type hints and docstring

The function signature would benefit from type hints and a docstring describing the parameters and return value.

```diff
 def compute_aggregated_analytics(
-    db, start_date, end_date, timestamp, analytics_type="DAILY", verbose=False
+    db: Prisma,
+    start_date: str,
+    end_date: str,
+    timestamp: str,
+    analytics_type: str = "DAILY",
+    verbose: bool = False,
 ):
+    """Compute aggregated analytics for the given time range.
+
+    Args:
+        db: Prisma database connection
+        start_date: Start date in ISO format
+        end_date: End date in ISO format
+        timestamp: Timestamp for analytics computation
+        analytics_type: Type of analytics (DAILY, WEEKLY, MONTHLY, COURSE)
+        verbose: Enable verbose output
+
+    Returns:
+        None
+    """
```
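As a standalone, runnable sketch of that suggestion (the Prisma client is loosened to `typing.Any` here so the example carries no database dependency, and the body is a placeholder, not the module's actual logic):

```python
from typing import Any

def compute_aggregated_analytics(
    db: Any,  # placeholder for the Prisma client in this sketch
    start_date: str,
    end_date: str,
    timestamp: str,
    analytics_type: str = "DAILY",
    verbose: bool = False,
) -> None:
    """Compute aggregated analytics for the given time range.

    Args:
        db: database connection (Prisma in the real module)
        start_date: start date in ISO format, e.g. "2024-01-01"
        end_date: end date in ISO format
        timestamp: timestamp attached to the computed analytics
        analytics_type: one of DAILY, WEEKLY, MONTHLY, COURSE
        verbose: enable verbose output
    """
    # Placeholder body; the real implementation loads, aggregates,
    # and saves participant analytics.
    if verbose:
        print(f"Aggregating {analytics_type} analytics "
              f"from {start_date} to {end_date}")

result = compute_aggregated_analytics(None, "2024-01-01", "2024-01-31",
                                      "2024-01-31T00:00:00.000Z")
```

Annotating `-> None` also makes the implicit return explicit for readers and type checkers.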
20-21: Use f-strings for string formatting

String concatenation with `+` is less readable than f-strings.

```diff
-    print("Aggregated analytics for time range:" + start_date + " to " + end_date)
+    print(f"Aggregated analytics for time range: {start_date} to {end_date}")
```
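For comparison, a minimal runnable example of both styles (sample dates chosen for illustration, not taken from the PR):

```python
start_date = "2021-01-01"
end_date = "2021-12-31"

# Concatenation requires every operand to already be a str and makes
# spacing errors easy (note the missing space in the original code):
via_concat = ("Aggregated analytics for time range: "
              + start_date + " to " + end_date)

# The f-string states the full template in one readable literal:
via_fstring = f"Aggregated analytics for time range: {start_date} to {end_date}"

# Both produce the same string here; the f-string is simply clearer.
assert via_concat == via_fstring
```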
23-28: Use f-strings for string formatting

Similar to above, use f-strings for better readability.

```diff
-    print(
-        "No aggregated analytics to compute for time range:"
-        + start_date
-        + " to "
-        + end_date
-    )
+    print(f"No aggregated analytics to compute for time range: {start_date} to {end_date}")
```

apps/analytics/src/notebooks/aggregated_analytics.ipynb (1)
67-71: Consider parameterizing date ranges

The start date is hardcoded to "2021-01-01". Consider making this configurable.

```diff
+# Configuration
+DEFAULT_START_DATE = "2021-01-01"
+start_date = os.getenv("ANALYTICS_START_DATE", DEFAULT_START_DATE)
-start_date = "2021-01-01"
```
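The environment-variable fallback pattern suggested in the diff can be sketched as follows; the variable name `ANALYTICS_START_DATE` comes from the review suggestion and is not defined anywhere in the PR itself:

```python
import os

DEFAULT_START_DATE = "2021-01-01"

def resolve_start_date() -> str:
    # os.getenv returns the second argument when the variable is unset,
    # so the notebook keeps working without any configuration.
    return os.getenv("ANALYTICS_START_DATE", DEFAULT_START_DATE)

# With the variable unset, the hardcoded default applies:
os.environ.pop("ANALYTICS_START_DATE", None)
default_value = resolve_start_date()

# Setting the variable overrides the default without a code change:
os.environ["ANALYTICS_START_DATE"] = "2023-09-01"
override_value = resolve_start_date()
```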
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📒 Files selected for processing (7)

- apps/analytics/src/modules/__init__.py (1 hunks)
- apps/analytics/src/modules/aggregated_analytics/__init__.py (1 hunks)
- apps/analytics/src/modules/aggregated_analytics/aggregate_participant_analytics.py (1 hunks)
- apps/analytics/src/modules/aggregated_analytics/compute_aggregated_analytics.py (1 hunks)
- apps/analytics/src/modules/aggregated_analytics/load_participant_analytics.py (1 hunks)
- apps/analytics/src/modules/aggregated_analytics/save_aggregated_analytics.py (1 hunks)
- apps/analytics/src/notebooks/aggregated_analytics.ipynb (1 hunks)
🧰 Additional context used

🪛 Ruff (0.8.0)

apps/analytics/src/modules/__init__.py

- 2-2: `.aggregated_analytics.compute_aggregated_analytics` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

apps/analytics/src/modules/aggregated_analytics/load_participant_analytics.py

- 22-22: Undefined name `start_date` (F821)
- 22-22: Undefined name `end_date` (F821)

apps/analytics/src/modules/aggregated_analytics/__init__.py

- 1-1: `.compute_aggregated_analytics.compute_aggregated_analytics` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
- 2-2: `.load_participant_analytics.load_participant_analytics` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
- 3-3: `.aggregate_participant_analytics.aggregate_participant_analytics` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
- 4-4: `.save_aggregated_analytics.save_aggregated_analytics` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
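The two F821 findings (undefined `start_date` and `end_date` in `load_participant_analytics.py`) are typically fixed by threading the values through as function parameters rather than relying on names from an enclosing notebook scope. A hypothetical sketch with a fake client, since the module's actual Prisma query is not shown here:

```python
# Sketch of the usual F821 fix: accept start_date / end_date as
# parameters instead of referencing names defined elsewhere.
# The filter shape below is illustrative, not the real Prisma query.
def load_participant_analytics(db, start_date: str, end_date: str):
    where = {
        "timestamp": {"gte": start_date, "lte": end_date},  # assumed shape
    }
    return db.find_many(where)

class FakeDb:
    """Stand-in for the database client, for demonstration only."""
    def find_many(self, where):
        return [{"where": where}]

rows = load_participant_analytics(FakeDb(), "2021-01-01", "2021-12-31")
```

Passing the dates explicitly also makes the function testable in isolation, which implicit notebook-scope names prevent.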
🔇 Additional comments (2)

apps/analytics/src/modules/aggregated_analytics/compute_aggregated_analytics.py (1)

1-4: LGTM: Clean imports structure

The imports are well-organized and follow a clear pattern of importing related functionality from sibling modules.

apps/analytics/src/notebooks/aggregated_analytics.ipynb (1)

133-136: Verify timestamp usage for course-wide analytics

Using a constant timestamp from 1970 seems like a workaround. Consider whether this could cause issues with data querying or historical tracking.

✅ Verification successful

Using the epoch timestamp (1970) is intentional and valid for course-wide analytics.

The constant timestamp "1970-01-01T00:00:00.000Z" is used as a special marker for course-wide analytics, which is different from daily/weekly/monthly analytics. This is a valid design pattern because:

- The code explicitly handles different analytics types ("DAILY", "WEEKLY", "MONTHLY", "COURSE")
- Course-wide analytics represent aggregate data across all time periods, making the epoch timestamp a logical choice
- The timestamp is consistently used across participant and aggregated analytics modules
- The database schema supports this through unique indices on timestamp fields
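The sentinel-timestamp pattern described above can be sketched as follows; only the epoch value itself comes from the codebase, while the constant name and branch structure are illustrative:

```python
from datetime import datetime, timezone

# Epoch sentinel marking course-wide (all-time) analytics rows.
# The constant name is hypothetical; the value matches the codebase.
COURSE_WIDE_TIMESTAMP = "1970-01-01T00:00:00.000Z"

def analytics_timestamp(analytics_type: str, period_end: datetime) -> str:
    """Pick the timestamp stored with an analytics row (illustrative)."""
    if analytics_type == "COURSE":
        # Course-wide rows aggregate over all time, so a fixed sentinel
        # keeps the (course, type, timestamp) key unique and stable.
        return COURSE_WIDE_TIMESTAMP
    # Periodic rows carry the end of their aggregation window.
    return period_end.strftime("%Y-%m-%dT%H:%M:%S.000Z")

ts_course = analytics_timestamp(
    "COURSE", datetime(2024, 3, 31, tzinfo=timezone.utc))
ts_daily = analytics_timestamp(
    "DAILY", datetime(2024, 3, 31, tzinfo=timezone.utc))
```

Because the sentinel never collides with a real aggregation window, queries for periodic analytics can simply exclude it.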
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Check if there are any other timestamp usages that might conflict
rg -A 2 "1970-01-01T00:00:00.000Z"

# Check for any timestamp-related queries in the database access layer
ast-grep --pattern 'timestamp = $value'
```

Length of output: 800

Script:

```shell
#!/bin/bash
# Check the implementation of compute_aggregated_analytics to understand timestamp usage
ast-grep --pattern 'def compute_aggregated_analytics($$$)'

# Look for other timestamp-related database queries
rg -A 5 "timestamp.*db\."

# Check for any date/time related database operations
rg -A 3 "datetime.*db\."
```

Length of output: 1241

Script:

```shell
#!/bin/bash
# Find the analytics computation implementation
rg -A 10 "def compute_aggregated_analytics"

# Check for any timestamp-related filtering or querying in analytics
rg -A 5 "timestamp.*where|where.*timestamp"

# Look for any database queries with date ranges
rg -A 5 "between.*date|date.*between"
```

Length of output: 67644

Script:

```shell
#!/bin/bash
# Check the implementation of load_participant_analytics to understand timestamp usage
rg -A 10 "def load_participant_analytics"

# Look for any timestamp-related database queries in participant analytics
rg -A 5 "timestamp.*participant"

# Check for any timestamp-related constants or defaults
rg -A 3 "timestamp.*const|DEFAULT_TIMESTAMP"
```

Length of output: 69860

Script:

```shell
#!/bin/bash
# Check the participant analytics computation implementation
rg -A 10 "def compute_participant_analytics"

# Look for any timestamp-related database queries in analytics computation
rg -A 5 "timestamp.*analytics"

# Check for any timestamp-related constants or defaults in analytics
rg -A 3 "DEFAULT_TIMESTAMP|EPOCH_TIMESTAMP"
```

Length of output: 7037
Resolved review threads:

- apps/analytics/src/modules/aggregated_analytics/save_aggregated_analytics.py (3 threads)
- apps/analytics/src/modules/aggregated_analytics/load_participant_analytics.py
- apps/analytics/src/modules/aggregated_analytics/compute_aggregated_analytics.py
klicker-uzh Run #3760

Run Properties:

| Property | Value |
|---|---|
| Project | klicker-uzh |
| Branch Review | aggregated-analytics |
| Run status | Passed #3760 |
| Run duration | 12m 04s |
| Commit | cfe3c1606b ℹ️: Merge 7113565689c06665aa5b8888b6951f87bace49c3 into 51bbba935c90b70aecdbb887dc56... |
| Committer | Julius Schlapbach |

View all properties for this run ↗︎

Test results:

| Result | Count |
|---|---|
| Failures | 0 |
| Flaky | 0 |
| Pending | 0 |
| Skipped | 0 |
| Passing | 148 |

View all changes introduced in this branch ↗︎
IMPORTANT: In order to compute the aggregated analytics with this script, the participant analytics need to be computed and stored in the corresponding database table using the script that has already been merged into the `v3-analytics` branch.

Summary by CodeRabbit