
enhance(apps/analytics): add computation logic for participant activity performance #4409

Merged
merged 8 commits into from
Dec 23, 2024

Conversation

sjschlapbach
Member

@sjschlapbach sjschlapbach commented Dec 23, 2024

Summary by CodeRabbit

  • New Features

    • Introduced new scripts for different environments in the analytics package.
    • Added a new Jupyter notebook for participant analytics.
    • Implemented new functions for processing and saving participant activity performance data.
  • Bug Fixes

    • Enhanced module imports to ensure accessibility of new functionalities.
  • Database Changes

    • Created a new table for participant activity performance with relevant fields and constraints.
    • Updated the schema to include relationships with participant and activity models.
  • Refactor

    • Modified existing models to streamline participant performance data management.


aviator-app bot commented Dec 23, 2024

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.

@sjschlapbach sjschlapbach changed the base branch from v3 to v3-analytics December 23, 2024 07:12

gitguardian bot commented Dec 23, 2024

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend revoking them. While these secrets were previously flagged, we no longer have a reference to the specific commits where they were detected. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately. Find more information about the risks here.




coderabbitai bot commented Dec 23, 2024

Warning

Rate limit exceeded

@sjschlapbach has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 15 minutes and 25 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between da0bacf and 5b77e29.

📒 Files selected for processing (3)
  • apps/analytics/package.json (1 hunks)
  • apps/analytics/src/notebooks/participant_activity_performance.ipynb (1 hunks)
  • apps/analytics/src/notebooks/participant_analytics.ipynb (3 hunks)
📝 Walkthrough

Walkthrough

This pull request enhances the analytics module to track participant activity performance. It introduces a new database model, ParticipantActivityPerformance, along with functionality to prepare, aggregate, and save performance data for practice quizzes and micro-learnings. The changes span multiple files across the analytics and Prisma packages, adding new scripts, data processing functions, and database schema modifications to support detailed performance tracking.

Changes

File Change Summary
apps/analytics/package.json Added three new scripts for running Python modules with different Doppler configurations: script, script:prod, and script:qa
apps/analytics/src/modules/__init__.py Added import for participant_activity_performance module
apps/analytics/src/modules/participant_activity_performance/__init__.py Added imports for three new functions: prepare_participant_activity_data, save_participant_activity_performance, and agg_participant_activity_performance
apps/analytics/src/modules/participant_activity_performance/... Added new Python modules for preparing, aggregating, and saving participant activity performance data
apps/analytics/src/notebooks/participant_activity_performance.ipynb New Jupyter notebook for computing activity-specific participant analytics
packages/prisma/src/prisma/schema/... Created new ParticipantActivityPerformance model, updated Participant, PracticeQuiz, and MicroLearning models
packages/prisma/src/prisma/migrations/... Added SQL migration script for creating ParticipantActivityPerformance table

Possibly related PRs

Suggested reviewers

  • rschlaefli


cypress bot commented Dec 23, 2024

klicker-uzh — Run #3815

Run Properties:
  • Run status: Failed
  • Commit: f46a15b91b (Merge 5b77e29a34586b8831026a9888c2332cbd627716 into 00be5b1b20750e056392d6e796f0...)
  • Project: klicker-uzh
  • Branch Review: participant-activity-performance
  • Run duration: 16m 51s
  • Committer: Julius Schlapbach

Test results
  • Failed: 11
  • Flaky: 0
  • Pending (annotated with .skip): 0
  • Skipped (failure in a mocha hook): 0
  • Passing: 137

Tests for review

Failed: cypress/e2e/G-microlearning-workflow.cy.ts • 11 failed tests

  • Different microlearning workflows > Edit the running microlearning's content
  • Different microlearning workflows > Duplicate a microlearning and check the editor's content
  • Different microlearning workflows > Check if the drafted microlearning can be accessed by the lecturer through the activity preview
  • Different microlearning workflows > Publish a microlearning that will immediately be running
  • Different microlearning workflows > Check if the running microlearning can be accessed by the lecturer through the activity preview
  • Different microlearning workflows > Extend the running microlearning
  • Different microlearning workflows > Respond to the first stack of the running microlearning from a laptop
  • Different microlearning workflows > Check that the student's previous response is correctly loaded and respond to the second stack
  • Different microlearning workflows > End the running microlearning
  • Different microlearning workflows > Cleanup: Delete the running microlearning to avoid name collisions

The first 10 failed tests are shown; see all 11 tests in Cypress Cloud.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (6)
apps/analytics/src/modules/__init__.py (1)

8-8: Consider avoiding wildcard imports.
Using wildcard imports (e.g., from .participant_activity_performance import *) can lead to namespace pollution and hamper clarity about which functions are exported. It is often considered best practice to import specific functions or classes.

🧰 Tools
🪛 Ruff (0.8.2)

8-8: from .participant_activity_performance import * used; unable to detect undefined names

(F403)
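The difference can be demonstrated with a minimal, self-contained sketch; the module name `perf` is hypothetical, standing in for `participant_activity_performance`:

```python
import sys
import types

# Build a throwaway module to contrast `import *` with explicit imports.
mod = types.ModuleType("perf")
exec("def agg(): return 1\nhelper = 3\n_private = 4", mod.__dict__)
sys.modules["perf"] = mod

star_ns = {}
exec("from perf import *", star_ns)
# Without __all__, the wildcard pulls in every name not starting with "_":
print(sorted(k for k in star_ns if not k.startswith("_")))  # ['agg', 'helper']

explicit_ns = {}
exec("from perf import agg", explicit_ns)
# The explicit form documents exactly what the package re-exports:
print(sorted(k for k in explicit_ns if not k.startswith("_")))  # ['agg']
```

With explicit imports (or an `__all__` declaration), linters can verify that every exported name is intentional.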

apps/analytics/src/modules/participant_activity_performance/save_participant_activity_performance.py (1)

1-35: Handle unknown activity types and invalid rows gracefully.
Currently, the function supports only "practiceQuizzes" or "microLearnings". If an unknown activity type is passed in, the function silently does nothing; consider raising an error or adding a default branch for unexpected input. Similarly, if a DataFrame row is missing keys like "activityId" or "participantId", an exception will occur at the access site. Defensive checks would surface these problems early with clear messages.
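A minimal sketch of such checks, assuming a function signature and field names like those discussed here (they are illustrative, not the repository's actual API); `rows` would be e.g. `df.to_dict("records")` from the prepared DataFrame:

```python
SUPPORTED_TYPES = ("practiceQuizzes", "microLearnings")

def save_participant_activity_performance(db, rows, activity_type):
    """Validate inputs before writing performance records (sketch only)."""
    # Fail loudly on unexpected activity types instead of silently doing nothing.
    if activity_type not in SUPPORTED_TYPES:
        raise ValueError(f"unknown activity type: {activity_type!r}")
    for row in rows:
        # Surface missing keys with a clear message rather than a bare KeyError
        # deep inside the upsert logic.
        missing = {"participantId", "activityId"} - row.keys()
        if missing:
            raise KeyError(f"row is missing required keys: {sorted(missing)}")
        # ... upsert the performance record for this row via `db` ...
```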

apps/analytics/src/modules/participant_activity_performance/agg_participant_activity_performance.py (1)

20-20: Rename unused loop variable 'idx' to '_idx'.
The variable 'idx' is not used in the loop body. Renaming it to '_idx' or removing it would make the code clearer and address the linter's suggestion.

Here's a sample diff:

- for idx, activity in df_activities.iterrows():
+ for _idx, activity in df_activities.iterrows():
🧰 Tools
🪛 Ruff (0.8.2)

20-20: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

packages/prisma/src/prisma/schema/quiz.prisma (1)

217-221: Promote unified approach for activity performance tracking.

Like in PracticeQuiz, the MicroLearning model also receives participantPerformances, possibly coexisting with performance and progress fields. Clarify if these older fields remain relevant or can be deprecated to avoid partial duplication of data.

apps/analytics/src/notebooks/participant_activity_performance.ipynb (1)

57-258: Thorough logic, but watch out for large data sets.

The loop over each course aggregates participant activity. For large courses, DataFrame concatenations may be memory-intensive. Consider chunking or an alternative approach if performance becomes a concern.
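One way to bound memory is to process courses in fixed-size batches rather than growing a single DataFrame across the whole loop. A stdlib-only sketch (the batch size is an assumption to tune against real data volumes):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# e.g. handle 50 courses per batch, writing results out between batches
print(list(chunked(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```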

apps/analytics/src/modules/participant_activity_performance/__init__.py (1)

1-3: Add explicit exports declaration.

To address the static analysis warnings and make the module's public interface explicit, consider adding an __all__ declaration:

 from .prepare_participant_activity_data import prepare_participant_activity_data
 from .save_participant_activity_performance import save_participant_activity_performance
 from .agg_participant_activity_performance import agg_participant_activity_performance
+
+__all__ = [
+    "prepare_participant_activity_data",
+    "save_participant_activity_performance",
+    "agg_participant_activity_performance",
+]
🧰 Tools
🪛 Ruff (0.8.2)

1-1: .prepare_participant_activity_data.prepare_participant_activity_data imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .save_participant_activity_performance.save_participant_activity_performance imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


3-3: .agg_participant_activity_performance.agg_participant_activity_performance imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00be5b1 and da0bacf.

📒 Files selected for processing (11)
  • apps/analytics/package.json (1 hunks)
  • apps/analytics/src/modules/__init__.py (1 hunks)
  • apps/analytics/src/modules/participant_activity_performance/__init__.py (1 hunks)
  • apps/analytics/src/modules/participant_activity_performance/agg_participant_activity_performance.py (1 hunks)
  • apps/analytics/src/modules/participant_activity_performance/prepare_participant_activity_data.py (1 hunks)
  • apps/analytics/src/modules/participant_activity_performance/save_participant_activity_performance.py (1 hunks)
  • apps/analytics/src/notebooks/participant_activity_performance.ipynb (1 hunks)
  • packages/prisma/src/prisma/migrations/20241223092511_participant_activity_performance/migration.sql (1 hunks)
  • packages/prisma/src/prisma/schema/analytics.prisma (1 hunks)
  • packages/prisma/src/prisma/schema/participant.prisma (1 hunks)
  • packages/prisma/src/prisma/schema/quiz.prisma (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
apps/analytics/src/modules/__init__.py

8-8: from .participant_activity_performance import * used; unable to detect undefined names

(F403)

apps/analytics/src/modules/participant_activity_performance/agg_participant_activity_performance.py

20-20: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

apps/analytics/src/modules/participant_activity_performance/__init__.py

1-1: .prepare_participant_activity_data.prepare_participant_activity_data imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .save_participant_activity_performance.save_participant_activity_performance imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


3-3: .agg_participant_activity_performance.agg_participant_activity_performance imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

🔇 Additional comments (8)
apps/analytics/src/modules/participant_activity_performance/agg_participant_activity_performance.py (2)

59-83: Ensure consistent approach for 'empty' vs. 'non-empty' responses.
Currently, "empty" scenarios set completion to 0, while "non-empty" ones dynamically compute completion as responseCount / instanceCount. Ensure this approach aligns with all business rules for measuring participant "completion" in partial or borderline cases.
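The rule as described can be captured in a small helper; this is a sketch of the described behavior, not the code under review:

```python
def completion_rate(response_count: int, instance_count: int) -> float:
    """Completion per the rule above: activities with no responses get 0;
    otherwise completion is responseCount / instanceCount."""
    # Guard against division by zero as well as the "empty" scenario.
    if instance_count <= 0 or response_count == 0:
        return 0.0
    return response_count / instance_count

print(completion_rate(0, 10))  # 0.0 -- the "empty" branch
print(completion_rate(7, 10))  # 0.7
```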


8-17: Validate grouping logic for large datasets.
When grouping a large number of responses, memory usage can grow significantly. Consider verifying that data volumes remain within acceptable levels or implementing chunked processing.
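If memory does become a concern, the grouping can be done in a single streaming pass over response records instead of materializing intermediate DataFrames. A sketch, with field names assumed from the schema discussed in this PR:

```python
from collections import defaultdict

def count_responses(records):
    """Count responses per (participantId, activityId) pair in one pass."""
    counts = defaultdict(int)
    for rec in records:
        counts[(rec["participantId"], rec["activityId"])] += 1
    return dict(counts)

rows = [
    {"participantId": "p1", "activityId": "a1"},
    {"participantId": "p1", "activityId": "a1"},
    {"participantId": "p2", "activityId": "a1"},
]
print(count_responses(rows))  # {('p1', 'a1'): 2, ('p2', 'a1'): 1}
```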

packages/prisma/src/prisma/schema/participant.prisma (1)

36-54: Ensure consistency for newly consolidated performance fields.

These fields still reference multiple legacy relationships (e.g., questionResponses, feedbacks, etc.) alongside the new participantActivityPerformances. If the intent was to remove or refactor various participant relationships into the consolidated participantActivityPerformances, please verify that the old fields are indeed needed.

packages/prisma/src/prisma/schema/quiz.prisma (1)

32-36: Confirm legacy performance fields vs. participantPerformances.

The addition of participantPerformances might overlap with existing fields such as performance or progress. Ensure these older fields (e.g., performance = ActivityPerformance?) are either retained intentionally or removed if they are superseded.

✅ Verification successful

Let me gather information about the relationship between these fields by examining the schema and related code.


Based on the code analysis, I can now provide a definitive response:

The fields serve different purposes and should be retained.

The codebase shows that:

  • participantPerformances tracks individual participant performance metrics
  • performance (ActivityPerformance) tracks aggregate performance metrics for the entire activity, including error rates and correct rates
  • progress (ActivityProgress) tracks activity completion statistics (started, completed, repeated counts)

Each field serves a distinct purpose in the analytics system, as evidenced by their separate usage in the frontend components (ActivityProgressPlot, PerformanceRates) and GraphQL operations. They are not redundant but rather complementary metrics tracking different aspects of activity engagement and performance.

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# First, let's look at the complete quiz.prisma schema to understand the context
cat packages/prisma/src/prisma/schema/quiz.prisma

# Then, let's search for any references to these fields in the codebase
rg "ActivityPerformance" --type ts -A 3
rg "ParticipantActivityPerformance" --type ts -A 3
rg "ActivityProgress" --type ts -A 3

Length of output: 32460

packages/prisma/src/prisma/schema/analytics.prisma (1)

251-268: Double-check foreign key uniqueness constraints.

The ParticipantActivityPerformance model includes unique constraints on (participantId, practiceQuizId) and (participantId, microLearningId). This design makes sense for a single performance record per participant per quiz. However, ensure no concurrency issues arise if multiple upsert operations target the same record simultaneously.
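The concurrency concern can be addressed at the database level: assuming the app runs on Postgres, a single atomic upsert keyed on the unique constraint avoids the read-then-write race between concurrent writers. The statement below is illustrative only; table and column names follow the migration, but it is not code from the repository:

```python
# Sketch of an atomic Postgres upsert for one performance record.
# With the unique constraint on (participantId, practiceQuizId), concurrent
# executions of this statement cannot create duplicate rows.
UPSERT_SQL = """
INSERT INTO "ParticipantActivityPerformance"
    ("participantId", "practiceQuizId", "completion", "totalScore")
VALUES ($1, $2, $3, $4)
ON CONFLICT ("participantId", "practiceQuizId") DO UPDATE
SET "completion" = EXCLUDED."completion",
    "totalScore" = EXCLUDED."totalScore";
"""
```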

apps/analytics/src/notebooks/participant_activity_performance.ipynb (2)

1-11: Good descriptive introduction.

You provide a clear explanation of how completion and totalScore are calculated. The high-level purpose is well documented and helps future contributors.


260-267: Proper resource cleanup.

Disconnecting from the database is a best practice to avoid lingering connections. This final cell is well-placed.

packages/prisma/src/prisma/migrations/20241223092511_participant_activity_performance/migration.sql (1)

19-26: Consider the impact of CASCADE deletions.

The CASCADE deletion on foreign keys means that when a Participant, PracticeQuiz, or MicroLearning is deleted, all related performance records will be automatically deleted. Verify if this aligns with the data retention requirements.

✅ Verification successful

ON DELETE CASCADE is appropriate for this performance tracking system

The cascade deletion pattern is consistent throughout the application's data model, particularly for analytics and performance tracking. This is appropriate because:

  1. ParticipantActivityPerformance records are purely analytical data that have no value without their parent entities (Participant, PracticeQuiz, or MicroLearning).
  2. The pattern matches existing cascade deletions for similar analytics tables like:
    • ParticipantPerformance
    • InstancePerformance
    • ActivityPerformance
    • ParticipantAnalytics

While there are privacy-related files (mentions of privacy policy in CHANGELOG.md), there are no explicit data retention requirements that would conflict with the cascade deletion pattern. The cascade deletion ensures clean data management by automatically removing orphaned analytics data when the primary entities are deleted.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for data retention policies or requirements
fd . -e md -e txt | xargs rg -l "retention|GDPR|backup"

Length of output: 57


Script:

#!/bin/bash
# Let's check for any configuration files or documentation that might contain data retention policies
fd -e yaml -e yml -e json -e env -e config -e md | xargs rg -i "retention|gdpr|data.*policy|privacy"

# Also check for any schema definitions that might give us context about the data model
fd schema.prisma | xargs cat

# Let's also check for any existing cascade delete patterns in migrations
fd -e sql | xargs rg "ON DELETE CASCADE"

Length of output: 38075

@sjschlapbach sjschlapbach merged commit ebbb581 into v3-analytics Dec 23, 2024
11 of 12 checks passed
@sjschlapbach sjschlapbach deleted the participant-activity-performance branch December 23, 2024 10:28

sonarqubecloud bot commented Dec 23, 2024
