
enhance(apps/analytics): add computation logic for participant course performance #4390

Merged
3 commits merged into v3-analytics on Dec 5, 2024

Conversation

@sjschlapbach (Member) commented Dec 5, 2024

Summary by CodeRabbit

  • New Features

    • Introduced new analytics capabilities for tracking participant performance and activity metrics across various components.
    • Added new Jupyter notebooks for computing aggregated analytics and participant analytics.
    • Implemented URL structure changes for quizzes, enhancing navigation and user experience.
  • Bug Fixes

    • Resolved inconsistencies in routing paths for accessing quiz-related functionalities.
  • Documentation

    • Updated README files and localization entries to reflect new features and changes.
  • Style

    • Enhanced UI elements and navigation components for better clarity and usability.
  • Tests

    • Improved Cypress end-to-end tests for course and quiz workflows, ensuring robust validation of functionalities.


aviator-app bot commented Dec 5, 2024

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.



@sjschlapbach sjschlapbach changed the base branch from v3 to v3-analytics December 5, 2024 16:04

coderabbitai bot commented Dec 5, 2024

📝 Walkthrough

This pull request includes modifications across various files, primarily focusing on the .gitignore file, the addition of new README documentation, and the introduction of several new scripts and functions related to analytics. Changes to the project structure include updates to package configurations, new database migration scripts, and enhancements to participant analytics functionality. Additionally, routing paths for quizzes have been updated throughout the frontend components, reflecting a shift from session-based to quiz-based navigation.

Changes

File Path: Change Summary
.gitignore: Added !out/.gitkeep to track the .gitkeep file in the out/ directory; retained existing entries for Python cache, Node.js modules, etc.
apps/analytics/README.md: New README file added detailing the KlickerUZH Analytics service, including setup instructions and available commands.
apps/analytics/package.json: New package.json file created for the @klicker-uzh/analytics project, specifying project details and scripts for development.
apps/analytics/pyproject.toml: Updated authors list, modified dependency versions, added development dependencies, and changed task command for main execution.
apps/analytics/src/__init__.py: Added imports for all entities from modules and notebooks submodules.
apps/analytics/src/modules/__init__.py: Consolidated imports from various analytics submodules into the module's namespace.
apps/analytics/src/modules/aggregated_analytics/__init__.py: Added imports for new analytics functions.
apps/analytics/src/modules/aggregated_analytics/aggregate_participant_analytics.py: Introduced aggregate_participant_analytics function for aggregating participant analytics.
apps/analytics/src/modules/aggregated_analytics/compute_aggregated_analytics.py: Added compute_aggregated_analytics function to load and aggregate participant analytics data.
apps/analytics/src/modules/aggregated_analytics/load_participant_analytics.py: Introduced functions to convert analytics data into DataFrames and load participant analytics from the database.
apps/analytics/src/modules/aggregated_analytics/save_aggregated_analytics.py: Created save_aggregated_analytics function to save aggregated analytics data to the database.
apps/analytics/src/modules/aggregated_course_analytics/__init__.py: Added import for compute_weekday_activity function.
apps/analytics/src/modules/aggregated_course_analytics/compute_weekday_activity.py: Introduced compute_weekday_activity function for calculating weekly participant activity.
apps/analytics/src/modules/participant_analytics/__init__.py: Added imports for various participant analytics functions.
apps/analytics/src/modules/participant_analytics/aggregate_analytics.py: Created aggregate_analytics function for processing participant response data.
apps/analytics/src/modules/participant_analytics/compute_correctness.py: Introduced functions for evaluating correctness of responses.
apps/analytics/src/modules/participant_analytics/get_participant_responses.py: Added functions to retrieve and process participant response data.
apps/analytics/src/modules/participant_course_analytics/__init__.py: Added imports for course-related analytics functions.
apps/analytics/src/modules/participant_course_analytics/compute_participant_activity.py: Introduced compute_participant_activity function for calculating participant activity metrics.
apps/analytics/src/notebooks/: Multiple new Jupyter notebooks added for various analytics computations.
apps/frontend-control/next.config.mjs: Added new asynchronous redirects function for routing changes.
apps/frontend-control/src/components/: Updated various components to change URL paths from /sessions/... to /quizzes/....
apps/frontend-manage/src/pages/: Modified routing logic in multiple page components to reflect quiz-based navigation.
packages/i18n/messages/: Updated localization files to include new keys for quizzes and analytics.
packages/prisma/src/prisma/migrations/: Several migration scripts added or modified to update database schema for analytics tracking.
packages/prisma/src/prisma/schema/: Multiple schema updates to include new performance tracking fields and models.
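The per-course aggregation described in the walkthrough (rolling participant analytics up into aggregated course analytics) can be sketched with pandas. This is an illustrative sketch only: the column names follow those referenced in the review comments on aggregate_participant_analytics, the data is invented, and the actual implementation in the PR may aggregate differently.

```python
import pandas as pd

# Illustrative participant analytics rows (column names taken from the
# review discussion below; values are made up for demonstration).
df = pd.DataFrame(
    [
        {"courseId": "c1", "id": "p1", "responseCount": 10, "totalScore": 8.0, "totalPoints": 12.0, "totalXp": 100},
        {"courseId": "c1", "id": "p2", "responseCount": 4, "totalScore": 2.0, "totalPoints": 5.0, "totalXp": 40},
        {"courseId": "c2", "id": "p3", "responseCount": 6, "totalScore": 6.0, "totalPoints": 6.0, "totalXp": 60},
    ]
)

# Roll up per course: count distinct participants, sum the metrics.
df_agg = (
    df.groupby("courseId")
    .agg(
        participantCount=("id", "nunique"),
        responseCount=("responseCount", "sum"),
        totalScore=("totalScore", "sum"),
        totalPoints=("totalPoints", "sum"),
        totalXp=("totalXp", "sum"),
    )
    .reset_index()
)
print(df_agg)
```

Named aggregation keeps the output columns aligned with the database field names, which simplifies a subsequent upsert of one row per course.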


gitguardian bot commented Dec 5, 2024

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend revoking them.
Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately.
Find more information about the associated risks here.




sonarqubecloud bot commented Dec 5, 2024

Quality Gate failed

Failed conditions
Security Rating on New Code: E (required ≥ A)

See analysis details on SonarQube Cloud


@sjschlapbach sjschlapbach merged commit cb471c3 into v3-analytics Dec 5, 2024
15 of 17 checks passed
@sjschlapbach sjschlapbach deleted the participant-performance-analytics branch December 5, 2024 16:17

@coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

🛑 Comments failed to post (56)
packages/prisma/src/prisma/migrations/20240826113537_participant_analytics_correctness_counts/migration.sql (2)

4-6: ⚠️ Potential issue

Consider implementing a data migration strategy

The migration drops columns containing historical analytics data without preserving or transforming the existing values. This could lead to loss of important historical performance metrics.

Consider adding a data transformation step to convert the mean values into the new count-based format before dropping the columns. For example:

-- First, create new columns
ALTER TABLE "ParticipantAnalytics"
ADD COLUMN "firstCorrectCount" REAL,
ADD COLUMN "lastCorrectCount" REAL;

-- Then populate them with transformed data (example)
UPDATE "ParticipantAnalytics"
SET "firstCorrectCount" = "meanFirstCorrectCount" * total_attempts,
    "lastCorrectCount" = "meanLastCorrectCount" * total_attempts;

-- Finally drop old columns
ALTER TABLE "ParticipantAnalytics"
DROP COLUMN "meanFirstCorrectCount",
DROP COLUMN "meanLastCorrectCount";

13-16: 🛠️ Refactor suggestion

Consider using INTEGER type for count columns

The columns firstCorrectCount, firstWrongCount, lastCorrectCount, and lastWrongCount are designed to store count data but use the REAL type. Since these appear to be counting discrete occurrences, INTEGER might be more appropriate to prevent floating-point precision issues and ensure data integrity.

Consider modifying the column types:

-ADD COLUMN     "firstCorrectCount" REAL,
-ADD COLUMN     "firstWrongCount" REAL,
-ADD COLUMN     "lastCorrectCount" REAL,
-ADD COLUMN     "lastWrongCount" REAL;
+ADD COLUMN     "firstCorrectCount" INTEGER NOT NULL DEFAULT 0,
+ADD COLUMN     "firstWrongCount" INTEGER NOT NULL DEFAULT 0,
+ADD COLUMN     "lastCorrectCount" INTEGER NOT NULL DEFAULT 0,
+ADD COLUMN     "lastWrongCount" INTEGER NOT NULL DEFAULT 0;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

ADD COLUMN     "firstCorrectCount" INTEGER NOT NULL DEFAULT 0,
ADD COLUMN     "firstWrongCount" INTEGER NOT NULL DEFAULT 0,
ADD COLUMN     "lastCorrectCount" INTEGER NOT NULL DEFAULT 0,
ADD COLUMN     "lastWrongCount" INTEGER NOT NULL DEFAULT 0;
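The floating-point concern behind this suggestion is easy to reproduce. This small Python snippet is independent of the migration itself and only illustrates why discrete counts are safer stored as integers than as REAL:

```python
# Accumulating 0.1 ten times in binary floating point does not yield
# exactly 1.0 — the kind of drift that can corrupt REAL-typed counters.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)   # False: floating-point drift

# Integer accumulation is exact, so counts never drift.
count = 0
for _ in range(10):
    count += 1
print(count == 10)    # True
```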
apps/frontend-control/tsconfig.json (1)

20-20: 💡 Codebase verification

Consider removing the "~/*" path alias as it's currently unused

The codebase scan reveals that:

  • No imports are currently using the "~/*" path alias
  • Existing imports use either relative paths (../components/) or package imports (@klicker-uzh/)
  • The addition of an unused path alias may create confusion about the preferred import pattern

Recommendation: Either remove the unused "~/*" path alias or document its intended usage pattern if it's being added for future use.

🔗 Analysis chain

Verify consistent usage of the new path alias.

The addition of the "~/*" path alias is valid, but let's ensure it's used consistently across the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for mixed usage of direct 'src/' imports vs '~/' alias
# to ensure consistent import patterns

echo "Checking for direct src/ imports..."
rg "from\s+'\.\.*/src/" "apps/frontend-control/src"

echo "Checking for usage of new ~/ alias..."
rg "from\s+'~/" "apps/frontend-control/src"

Length of output: 267


Script:

#!/bin/bash
# Let's try a broader search pattern to catch all import variations
echo "Checking for any imports from src directory..."
rg -t ts -t tsx "from ['\"].*src/" apps/frontend-control/src

echo "Checking for any imports using ~ alias..."
rg -t ts -t tsx "from ['\"]~/" apps/frontend-control/src

# Let's also check import statements at the start of lines
echo "Checking for imports at line start..."
rg -t ts -t tsx "^import .* from ['\"]~/" apps/frontend-control/src

# Check if there are any relative path imports that could be simplified with ~
echo "Checking for deep relative imports..."
rg -t ts -t tsx "from ['\"]\.\./.+/" apps/frontend-control/src

Length of output: 742


Script:

#!/bin/bash
# Let's try without file type filters since tsx is not recognized
echo "Checking for any imports from src directory..."
rg "from ['\"].*src/" apps/frontend-control/src

echo "Checking for any imports using ~ alias..."
rg "from ['\"]~/" apps/frontend-control/src

# Let's also check import statements at line start
echo "Checking for imports at line start..."
rg "^import .* from ['\"]~/" apps/frontend-control/src

# Check if there are any relative path imports that could be simplified with ~
echo "Checking for deep relative imports..."
rg "from ['\"]\.\./.+/" apps/frontend-control/src

# Let's also check if the src directory exists and list its contents
echo "Checking src directory structure..."
ls -R apps/frontend-control/src

Length of output: 3787

packages/prisma/src/prisma/migrations/20240827152536_computed_at_analytics/migration.sql (2)

1-11: ⚠️ Potential issue

Critical: Migration will fail for non-empty tables

The migration attempts to add required updatedAt columns without default values to multiple tables. This will fail if any of these tables contain existing records.

Consider this safer approach:

  1. First migration:
-- Add columns as nullable
ALTER TABLE "AggregatedAnalytics" ADD COLUMN "updatedAt" TIMESTAMP(3);
-- Repeat for other tables

-- Set initial values
UPDATE "AggregatedAnalytics" SET "updatedAt" = CURRENT_TIMESTAMP;
-- Repeat for other tables
  2. Second migration:
-- Make columns NOT NULL
ALTER TABLE "AggregatedAnalytics" ALTER COLUMN "updatedAt" SET NOT NULL;
-- Repeat for other tables

13-15: 🛠️ Refactor suggestion

Consider using consistent datetime types

computedAt uses DATE while createdAt/updatedAt use TIMESTAMP(3). This inconsistency might cause issues:

  • DATE loses time information
  • Different timezone handling between types
  • Potential comparison issues in queries

Consider using TIMESTAMP(3) consistently:

-"computedAt" DATE NOT NULL DEFAULT CURRENT_TIMESTAMP,
+"computedAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

Also applies to: 30-32

packages/prisma/src/prisma/migrations/20241204165711_update_daily_activity_types/migration.sql (1)

1-8: ⚠️ Potential issue

⚠️ Review data migration strategy before proceeding

The warnings indicate potential data loss and constraints that could cause migration failure:

  1. All data in participantCount will be lost
  2. Adding non-nullable courseParticipantCount requires empty table
  3. Unique constraint on courseId may fail if duplicates exist

Consider the following safer migration approach:

  1. First migration:
    • Add nullable courseParticipantCount
    • Copy data from participantCount to courseParticipantCount
  2. Second migration:
    • Make courseParticipantCount non-nullable
    • Drop participantCount
  3. Third migration:
    • Add unique constraint after cleaning duplicates

Would you like me to help generate these separate migration files?

apps/frontend-manage/src/pages/analytics/index.tsx (1)

4-6: ⚠️ Potential issue

Implementation appears incomplete for participant course performance analytics

The current implementation is a placeholder and doesn't align with the PR objective of adding computation logic for participant course performance. The component should include:

  • Data fetching for participant course data
  • Performance computation visualization
  • Interactive analytics dashboard elements

Would you like help implementing the participant performance analytics features? I can help create:

  1. Data fetching hooks for course performance
  2. Performance computation components
  3. Analytics visualization components
apps/analytics/src/modules/participant_analytics/compute_participant_course_analytics.py (1)

19-19: 🛠️ Refactor suggestion

Remove redundant assignment to course_id

The variable course_id is assigned twice. The second assignment at line 19 is unnecessary since course_id was already assigned at line 16. Removing the redundant assignment improves code clarity.

Apply this diff to remove the redundant assignment:

-        course_id = course["id"]

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/modules/participant_analytics/save_participant_analytics.py (1)

35-36: ⚠️ Potential issue

Specify update data in upsert operation

In the upsert operation for analytics types "DAILY", "WEEKLY", and "MONTHLY", the update field is empty. This means that if a record already exists, it won't be updated with new data. To ensure existing records are updated, specify the fields that need to be updated.

Apply this diff to include the update data:

                     "update": {
+                        "computedAt": computedAt,
+                        "trialsCount": row["trialsCount"],
+                        "responseCount": row["responseCount"],
+                        "totalScore": row["totalScore"],
+                        "totalPoints": row["totalPoints"],
+                        "totalXp": row["totalXp"],
+                        "meanCorrectCount": row["meanCorrectCount"],
+                        "meanPartialCorrectCount": row["meanPartialCount"],
+                        "meanWrongCount": row["meanWrongCount"],
                     },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

                    "update": {
                        "computedAt": computedAt,
                        "trialsCount": row["trialsCount"],
                        "responseCount": row["responseCount"],
                        "totalScore": row["totalScore"],
                        "totalPoints": row["totalPoints"],
                        "totalXp": row["totalXp"],
                        "meanCorrectCount": row["meanCorrectCount"],
                        "meanPartialCorrectCount": row["meanPartialCount"],
                        "meanWrongCount": row["meanWrongCount"],
                    },
apps/analytics/src/modules/aggregated_analytics/save_aggregated_analytics.py (2)

28-30: ⚠️ Potential issue

Address TODO for totalElementsAvailable calculation

There's a TODO comment indicating that totalElementsAvailable is set to an invalid value -1 and needs to be correctly computed for rolling updates in production code. This placeholder should be replaced with the actual computation to ensure accurate analytics.

Would you like assistance in implementing the logic to calculate totalElementsAvailable for rolling updates?


40-73: 🛠️ Refactor suggestion

Optimize data fetching for course elements

The current implementation fetches all practice quizzes and microlearnings along with their stacks and elements to compute totalElementsAvailable. This may lead to performance issues due to the large amount of data being retrieved.

Consider fetching only the count of elements rather than the entire elements data. You can modify the query to aggregate counts directly in the database.

Example:

# Modify the query to get counts instead of full data
course = db.course.find_unique_or_raise(
    where={"id": row["courseId"]},
    include={
        "practiceQuizzes": {
            "include": {
                "stacks": {
                    "select": {"elements": {"select": {"_count": True}}}
                }
            }
        },
        "microLearnings": {
            "include": {
                "stacks": {
                    "select": {"elements": {"select": {"_count": True}}}
                }
            }
        },
    },
)

Then, adjust the logic to sum up the counts without loading all element details.

apps/analytics/src/modules/aggregated_analytics/aggregate_participant_analytics.py (1)

3-7: 🛠️ Refactor suggestion

Add input validation for required columns.

The function assumes certain columns exist but doesn't validate their presence.

     if df_participant_analytics.empty:
         if verbose:
             print("No participant analytics to aggregate")
         return None
+
+    required_columns = {"courseId", "id", "responseCount", "totalScore", "totalPoints", "totalXp"}
+    missing_columns = required_columns - set(df_participant_analytics.columns)
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    if df_participant_analytics.empty:
        if verbose:
            print("No participant analytics to aggregate")
        return None

    required_columns = {"courseId", "id", "responseCount", "totalScore", "totalPoints", "totalXp"}
    missing_columns = required_columns - set(df_participant_analytics.columns)
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")
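The set-difference validation proposed above can be exercised standalone; the DataFrame here is illustrative, with the required-column set copied from the suggestion:

```python
import pandas as pd

required_columns = {"courseId", "id", "responseCount", "totalScore", "totalPoints", "totalXp"}

# A deliberately incomplete frame: three of the required columns are absent.
df = pd.DataFrame({"courseId": ["c1"], "id": ["p1"], "responseCount": [3]})

missing_columns = required_columns - set(df.columns)
print(sorted(missing_columns))  # ['totalPoints', 'totalScore', 'totalXp']
```

Because set difference ignores extra columns, the check stays valid even if the analytics query later returns additional fields.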
apps/analytics/src/modules/aggregated_analytics/load_participant_analytics.py (3)

4-10: 🛠️ Refactor suggestion

Add type hints and error handling to the conversion function.

The function should include type hints and handle potential conversion errors.

-def convert_to_df(analytics):
+from typing import List, Dict, Any
+
+def convert_to_df(analytics: List[Dict[str, Any]]) -> pd.DataFrame:
+    """Convert database analytics records to a pandas DataFrame.
+
+    Args:
+        analytics (List[Dict[str, Any]]): List of analytics records
+
+    Returns:
+        pd.DataFrame: Converted DataFrame
+    """
     # convert the database query result into a pandas dataframe
-    rows = []
-    for item in analytics:
-        rows.append(dict(item))
+    try:
+        return pd.DataFrame([dict(item) for item in analytics])
+    except (AttributeError, TypeError) as e:
+        raise ValueError(f"Failed to convert analytics to DataFrame: {e}")
-    return pd.DataFrame(rows)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from typing import List, Dict, Any

def convert_to_df(analytics: List[Dict[str, Any]]) -> pd.DataFrame:
    """Convert database analytics records to a pandas DataFrame.

    Args:
        analytics (List[Dict[str, Any]]): List of analytics records

    Returns:
        pd.DataFrame: Converted DataFrame
    """
    # convert the database query result into a pandas dataframe
    try:
        return pd.DataFrame([dict(item) for item in analytics])
    except (AttributeError, TypeError) as e:
        raise ValueError(f"Failed to convert analytics to DataFrame: {e}")

13-16: 🛠️ Refactor suggestion

Add error handling for database operations.

The function should handle potential database errors gracefully.

+from prisma.models import ParticipantAnalytics
+from prisma import Prisma
+
-def load_participant_analytics(db, timestamp, analytics_type, verbose=False):
+def load_participant_analytics(
+    db: Prisma,
+    timestamp: str,
+    analytics_type: str,
+    verbose: bool = False
+) -> pd.DataFrame:
+    """Load participant analytics from the database.
+
+    Args:
+        db (Prisma): Database connection
+        timestamp (str): Timestamp to filter analytics
+        analytics_type (str): Type of analytics to load
+        verbose (bool, optional): Enable verbose output. Defaults to False.
+
+    Returns:
+        pd.DataFrame: Loaded analytics data
+    """
+    try:
         participant_analytics = db.participantanalytics.find_many(
             where={"timestamp": timestamp, "type": analytics_type},
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to load participant analytics: {e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from prisma.models import ParticipantAnalytics
from prisma import Prisma

def load_participant_analytics(
    db: Prisma,
    timestamp: str,
    analytics_type: str,
    verbose: bool = False
) -> pd.DataFrame:
    """Load participant analytics from the database.

    Args:
        db (Prisma): Database connection
        timestamp (str): Timestamp to filter analytics
        analytics_type (str): Type of analytics to load
        verbose (bool, optional): Enable verbose output. Defaults to False.

    Returns:
        pd.DataFrame: Loaded analytics data
    """
    try:
        participant_analytics = db.participantanalytics.find_many(
            where={"timestamp": timestamp, "type": analytics_type},
        )
    except Exception as e:
        raise RuntimeError(f"Failed to load participant analytics: {e}")

20-24: ⚠️ Potential issue

Fix undefined variables in format string.

The format string references undefined variables start_date and end_date.

         print(
             "Found {} analytics for the timespan from {} to {}".format(
-                len(participant_analytics), start_date, end_date
+                len(participant_analytics), timestamp, analytics_type
             )
         )

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.0)

22-22: Undefined name start_date

(F821)


22-22: Undefined name end_date

(F821)

apps/analytics/src/modules/participant_course_analytics/save_participant_course_analytics.py (2)

1-2: 🛠️ Refactor suggestion

Add type hints and input validation.

The function parameters lack type hints and input validation. Consider adding type hints and validating the DataFrame structure before processing.

-def save_participant_course_analytics(db, df_activity):
+from pandas import DataFrame
+from prisma import Prisma
+
+def save_participant_course_analytics(db: Prisma, df_activity: DataFrame) -> None:
+    required_columns = ['courseId', 'participantId', 'activeWeeks', 'activeDaysPerWeek', 
+                       'meanElementsPerDay', 'activityLevel']
+    if not all(col in df_activity.columns for col in required_columns):
+        raise ValueError(f"DataFrame missing required columns: {required_columns}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from pandas import DataFrame
from prisma import Prisma

def save_participant_course_analytics(db: Prisma, df_activity: DataFrame) -> None:
    required_columns = ['courseId', 'participantId', 'activeWeeks', 'activeDaysPerWeek', 
                       'meanElementsPerDay', 'activityLevel']
    if not all(col in df_activity.columns for col in required_columns):
        raise ValueError(f"DataFrame missing required columns: {required_columns}")
    for _, row in df_activity.iterrows():

2-26: ⚠️ Potential issue

Add error handling and logging mechanism.

The database operations lack error handling and logging. Consider wrapping the upsert operation in a try-catch block and adding logging.

+import logging
+from prisma.errors import PrismaError
+
 def save_participant_course_analytics(db: Prisma, df_activity: DataFrame) -> None:
+    logger = logging.getLogger(__name__)
     for _, row in df_activity.iterrows():
-        db.participantcourseanalytics.upsert(
-            where={
+        try:
+            db.participantcourseanalytics.upsert(
+                where={
+                    "courseId_participantId": {
+                        "courseId": row["courseId"],
+                        "participantId": row["participantId"],
+                    }
+                },
+                data={
+                    "create": {
+                        "activeWeeks": row["activeWeeks"],
+                        "activeDaysPerWeek": row["activeDaysPerWeek"],
+                        "meanElementsPerDay": row["meanElementsPerDay"],
+                        "activityLevel": row["activityLevel"],
+                        "course": {"connect": {"id": row["courseId"]}},
+                        "participant": {"connect": {"id": row["participantId"]}},
+                    },
+                    "update": {
+                        "activeWeeks": row["activeWeeks"],
+                        "activeDaysPerWeek": row["activeDaysPerWeek"],
+                        "meanElementsPerDay": row["meanElementsPerDay"],
+                        "activityLevel": row["activityLevel"],
+                    },
+                },
+            )
+        except PrismaError as e:
+            logger.error(f"Failed to upsert analytics for course {row['courseId']} and participant {row['participantId']}: {str(e)}")
+            raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    import logging
    from prisma.errors import PrismaError

    def save_participant_course_analytics(db: Prisma, df_activity: DataFrame) -> None:
        logger = logging.getLogger(__name__)
        for _, row in df_activity.iterrows():
            try:
                db.participantcourseanalytics.upsert(
                    where={
                        "courseId_participantId": {
                            "courseId": row["courseId"],
                            "participantId": row["participantId"],
                        }
                    },
                    data={
                        "create": {
                            "activeWeeks": row["activeWeeks"],
                            "activeDaysPerWeek": row["activeDaysPerWeek"],
                            "meanElementsPerDay": row["meanElementsPerDay"],
                            "activityLevel": row["activityLevel"],
                            "course": {"connect": {"id": row["courseId"]}},
                            "participant": {"connect": {"id": row["participantId"]}},
                        },
                        "update": {
                            "activeWeeks": row["activeWeeks"],
                            "activeDaysPerWeek": row["activeDaysPerWeek"],
                            "meanElementsPerDay": row["meanElementsPerDay"],
                            "activityLevel": row["activityLevel"],
                        },
                    },
                )
            except PrismaError as e:
                logger.error(f"Failed to upsert analytics for course {row['courseId']} and participant {row['participantId']}: {str(e)}")
                raise
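One follow-up on the suggestion above: logging.getLogger(__name__) emits nothing visible unless a handler is configured somewhere in the application. A minimal, self-contained sketch of the wiring (the logger name and message are illustrative, not taken from the PR):

```python
import io
import logging

# Route the module's error logs to an in-memory stream for demonstration;
# a real deployment would use logging.basicConfig or a file/stdout handler.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("analytics.save_participant_course_analytics")
logger.addHandler(handler)
logger.setLevel(logging.ERROR)

logger.error(
    "Failed to upsert analytics for course %s and participant %s", "c1", "p1"
)
print(stream.getvalue().strip())
```

In the analytics scripts this configuration would typically happen once at the entry point, so every module-level logger inherits it.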
apps/frontend-control/next.config.mjs (2)

9-37: 💡 Codebase verification

Found inconsistencies in session path usage

There are still active references to /session/ (singular) paths in the codebase while the redirects are set up for /sessions/ (plural):

  • LiveQuizLists.tsx: Uses link={/session/${quiz.id}}
  • StartModal.tsx: Uses router.push(/session/${quizId})

These paths won't be handled by the current redirect rules, potentially leading to 404 errors. Either:

  1. Update the redirect rules to include singular form (/session/) paths, or
  2. Update the components to use the new plural form (/quizzes/) directly
🔗 Analysis chain

Verify SEO impact of permanent redirects.

The use of permanent (301) redirects will inform search engines to permanently update their indexes. Ensure this is the intended behavior as it may impact SEO.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to old session paths in the codebase

# Search for any remaining references to /sessions in components, pages, or documentation
echo "Checking for remaining references to /sessions paths..."
rg -g '!{*.test.*,*.spec.*}' -g '!next.config.*' "/sessions[/\"']" ./apps/frontend-control/

# Check if there are any external links pointing to the old paths
echo "Checking for external links in documentation..."
rg -g '*.md' -g '*.mdx' "\[.*\]\(.*/sessions[/\"')]" .

Length of output: 238


Script:

#!/bin/bash
# Let's try a different approach to search for session-related paths

# Search for any remaining references to /sessions in components and pages
echo "Checking for remaining references to session paths..."
rg -g '!{*.test.*,*.spec.*,*.config.*}' '/sessions?' ./apps/frontend-control/

# Check for any router or navigation related code using sessions
echo "Checking for router/navigation code with sessions..."
rg -g '!{*.test.*,*.spec.*,*.config.*}' -i '(router|navigation|link).*session' ./apps/frontend-control/

# Look for any API endpoints or backend routes using sessions
echo "Checking for API endpoints with sessions..."
rg -g '!{*.test.*,*.spec.*,*.config.*}' '(api|route).*session' ./apps/frontend-control/

Length of output: 1174


9-37: 💡 Codebase verification

Found inconsistency between redirects and actual usage of session paths

The redirects are configured for /sessions/* but the codebase actually uses /session/* (singular) in components:

  • LiveQuizLists.tsx: uses /session/${quiz.id}
  • StartModal.tsx: uses /session/${quiz.id}

This mismatch means the redirects won't catch the actual paths being used in the application, potentially leading to 404 errors.

🔗 Analysis chain

Verify route coverage for all session-related paths.

Ensure all session-related paths are covered by the redirects to prevent 404 errors.


Let me gather more information by searching for any session-related references in the codebase using different patterns.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any session-related routes that might be missed in the redirects

# Search for route definitions in pages directory
echo "Checking for route definitions in pages directory..."
fd -e tsx -e ts -e js -e jsx . ./apps/frontend-control/src/pages \
  --exec grep -l "sessions"

# Search for router.push or Link components with session paths
echo "Checking for router navigation with session paths..."
ast-grep --pattern 'router.push("/sessions' ./apps/frontend-control/
ast-grep --pattern '<Link href="/sessions' ./apps/frontend-control/

Length of output: 461


Script:

#!/bin/bash
# Search for any session-related paths or routes in the codebase
echo "Searching for session-related paths in the codebase..."
rg -i "/sessions?/" ./apps/frontend-control/

# Search for any session-related strings in components and utilities
echo "Searching for session references in components..."
rg -i "sessions?" ./apps/frontend-control/src/ -g "!*.test.*" -g "!*.spec.*"

# Search for any router or navigation related code with sessions
echo "Searching for router/navigation patterns with sessions..."
ast-grep --pattern 'useRouter().push($path)' ./apps/frontend-control/
ast-grep --pattern 'navigate($path)' ./apps/frontend-control/

Length of output: 1763

apps/analytics/src/modules/participant_analytics/compute_participant_analytics.py (2)

7-9: 🛠️ Refactor suggestion

Add docstring and type hints.

The function lacks documentation and type hints. Add a comprehensive docstring explaining parameters, return type, and possible exceptions.

+from datetime import datetime
+from prisma import Prisma
+import pandas as pd
+
 def compute_participant_analytics(
-    db, start_date, end_date, timestamp, analytics_type="DAILY", verbose=False
+    db: Prisma,
+    start_date: datetime,
+    end_date: datetime,
+    timestamp: datetime,
+    analytics_type: str = "DAILY",
+    verbose: bool = False
+) -> None:
+    """Compute and save participant analytics for a given date range.
+
+    Args:
+        db: Database connection instance
+        start_date: Start date for analytics computation
+        end_date: End date for analytics computation
+        timestamp: Timestamp for the analytics record
+        analytics_type: Type of analytics (default: "DAILY")
+        verbose: Enable verbose logging (default: False)
+
+    Raises:
+        ValueError: If date range is invalid or analytics_type is unsupported
+    """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from datetime import datetime
from prisma import Prisma
import pandas as pd

def compute_participant_analytics(
    db: Prisma,
    start_date: datetime,
    end_date: datetime,
    timestamp: datetime,
    analytics_type: str = "DAILY",
    verbose: bool = False
) -> None:
    """Compute and save participant analytics for a given date range.

    Args:
        db: Database connection instance
        start_date: Start date for analytics computation
        end_date: End date for analytics computation
        timestamp: Timestamp for the analytics record
        analytics_type: Type of analytics (default: "DAILY")
        verbose: Enable verbose logging (default: False)

    Raises:
        ValueError: If date range is invalid or analytics_type is unsupported
    """

10-20: ⚠️ Potential issue

Improve error handling and input validation.

Add input validation and enhance error handling for the data processing pipeline.

+    # Validate input parameters
+    if start_date > end_date:
+        raise ValueError("start_date must be before or equal to end_date")
+    
+    valid_types = ["DAILY", "WEEKLY", "MONTHLY"]
+    if analytics_type not in valid_types:
+        raise ValueError(f"analytics_type must be one of {valid_types}")
+
     df_details = get_participant_responses(db, start_date, end_date, verbose)
 
     # Compute the correctness of each question response detail
-    df_details, df_element_instances = compute_correctness(db, df_details, verbose)
+    try:
+        df_details, df_element_instances = compute_correctness(db, df_details, verbose)
+    except Exception as e:
+        if verbose:
+            print(f"Error computing correctness: {str(e)}")
+        raise

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/modules/aggregated_analytics/compute_aggregated_analytics.py (1)

31-34: 🛠️ Refactor suggestion

Add error handling for database operations.

The database operation could fail and should be wrapped in try-except to handle errors gracefully.

     # store the computed aggregated analytics in the database
     if df_aggregated_analytics is not None:
-        save_aggregated_analytics(
-            db, df_aggregated_analytics, timestamp, analytics_type
-        )
+        try:
+            save_aggregated_analytics(
+                db, df_aggregated_analytics, timestamp, analytics_type
+            )
+        except Exception as e:
+            print(f"Error saving aggregated analytics: {e}")
+            raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    if df_aggregated_analytics is not None:
        try:
            save_aggregated_analytics(
                db, df_aggregated_analytics, timestamp, analytics_type
            )
        except Exception as e:
            print(f"Error saving aggregated analytics: {e}")
            raise
apps/analytics/src/modules/participant_performance/save_participant_performance.py (1)

1-30: 🛠️ Refactor suggestion

Refactor to improve error handling and reduce code duplication.

Several improvements can be made to enhance the robustness and maintainability of this function:

  1. Add error handling for database operations
  2. Reduce duplication between create and update blocks
  3. Add type hints and docstring
  4. Validate input DataFrame columns
+from typing import Dict, Any
+import pandas as pd
+
-def save_participant_performance(db, df_performance, course_id):
+def save_participant_performance(db, df_performance: pd.DataFrame, course_id: str):
+    """Save participant performance metrics to the database.
+
+    Args:
+        db: Database connection object
+        df_performance: DataFrame containing performance metrics
+        course_id: ID of the course
+
+    Raises:
+        ValueError: If required columns are missing from the DataFrame
+    """
+    required_columns = [
+        "participantId", "firstErrorRate", "firstPerformance",
+        "lastErrorRate", "lastPerformance", "totalErrorRate", "totalPerformance"
+    ]
+    missing_columns = [col for col in required_columns if col not in df_performance.columns]
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")
+
+    # Define common data fields to avoid duplication
+    def get_performance_data(row: pd.Series) -> Dict[str, Any]:
+        return {
+            "firstErrorRate": row["firstErrorRate"],
+            "firstPerformance": row["firstPerformance"],
+            "lastErrorRate": row["lastErrorRate"],
+            "lastPerformance": row["lastPerformance"],
+            "totalErrorRate": row["totalErrorRate"],
+            "totalPerformance": row["totalPerformance"],
+        }
+
     for _, row in df_performance.iterrows():
-        db.participantperformance.upsert(
-            where={
-                "participantId_courseId": {
-                    "participantId": row["participantId"],
-                    "courseId": course_id,
-                }
-            },
-            data={
-                "create": {
-                    "firstErrorRate": row["firstErrorRate"],
-                    "firstPerformance": row["firstPerformance"],
-                    "lastErrorRate": row["lastErrorRate"],
-                    "lastPerformance": row["lastPerformance"],
-                    "totalErrorRate": row["totalErrorRate"],
-                    "totalPerformance": row["totalPerformance"],
-                    "participant": {"connect": {"id": row["participantId"]}},
-                    "course": {"connect": {"id": course_id}},
+        try:
+            performance_data = get_performance_data(row)
+            db.participantperformance.upsert(
+                where={
+                    "participantId_courseId": {
+                        "participantId": row["participantId"],
+                        "courseId": course_id,
+                    }
                 },
-                "update": {
-                    "firstErrorRate": row["firstErrorRate"],
-                    "firstPerformance": row["firstPerformance"],
-                    "lastErrorRate": row["lastErrorRate"],
-                    "lastPerformance": row["lastPerformance"],
-                    "totalErrorRate": row["totalErrorRate"],
-                    "totalPerformance": row["totalPerformance"],
+                data={
+                    "create": {
+                        **performance_data,
+                        "participant": {"connect": {"id": row["participantId"]}},
+                        "course": {"connect": {"id": course_id}},
+                    },
+                    "update": performance_data,
                 },
-            },
-        )
+            )
+        except Exception as e:
+            print(f"Error upserting performance data for participant {row['participantId']}: {e}")
+            raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from typing import Dict, Any
import pandas as pd

def save_participant_performance(db, df_performance: pd.DataFrame, course_id: str):
    """Save participant performance metrics to the database.

    Args:
        db: Database connection object
        df_performance: DataFrame containing performance metrics
        course_id: ID of the course

    Raises:
        ValueError: If required columns are missing from the DataFrame
    """
    required_columns = [
        "participantId", "firstErrorRate", "firstPerformance",
        "lastErrorRate", "lastPerformance", "totalErrorRate", "totalPerformance"
    ]
    missing_columns = [col for col in required_columns if col not in df_performance.columns]
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")

    # Define common data fields to avoid duplication
    def get_performance_data(row: pd.Series) -> Dict[str, Any]:
        return {
            "firstErrorRate": row["firstErrorRate"],
            "firstPerformance": row["firstPerformance"],
            "lastErrorRate": row["lastErrorRate"],
            "lastPerformance": row["lastPerformance"],
            "totalErrorRate": row["totalErrorRate"],
            "totalPerformance": row["totalPerformance"],
        }

    for _, row in df_performance.iterrows():
        try:
            performance_data = get_performance_data(row)
            db.participantperformance.upsert(
                where={
                    "participantId_courseId": {
                        "participantId": row["participantId"],
                        "courseId": course_id,
                    }
                },
                data={
                    "create": {
                        **performance_data,
                        "participant": {"connect": {"id": row["participantId"]}},
                        "course": {"connect": {"id": course_id}},
                    },
                    "update": performance_data,
                },
            )
        except Exception as e:
            print(f"Error upserting performance data for participant {row['participantId']}: {e}")
            raise
apps/analytics/src/modules/participant_performance/compute_performance_levels.py (1)

1-39: 🛠️ Refactor suggestion

Enhance robustness and maintainability of performance level computation.

Several improvements can be made:

  1. Add type hints and docstring
  2. Extract magic numbers as constants
  3. Define performance levels as enums
  4. Add input validation
  5. Handle edge cases
+from enum import Enum
+import pandas as pd
+
+class PerformanceLevel(str, Enum):
+    HIGH = "HIGH"
+    MEDIUM = "MEDIUM"
+    LOW = "LOW"
+
+# Quantile thresholds for performance levels
+Q1_THRESHOLD = 0.25
+Q3_THRESHOLD = 0.75
+
-def compute_performance_levels(df_performance):
+def compute_performance_levels(df_performance: pd.DataFrame) -> pd.DataFrame:
+    """Compute performance levels based on error rate quantiles.
+
+    Args:
+        df_performance: DataFrame containing error rate metrics
+
+    Returns:
+        DataFrame with added performance level columns
+
+    Raises:
+        ValueError: If required columns are missing or DataFrame is empty
+    """
+    required_columns = ["firstErrorRate", "lastErrorRate", "totalErrorRate"]
+    missing_columns = [col for col in required_columns if col not in df_performance.columns]
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")
+
+    if df_performance.empty:
+        raise ValueError("Cannot compute performance levels on empty DataFrame")
+
     # set the performance levels based on the quantiles
-    first_qs = df_performance.firstErrorRate.quantile([0.25, 0.75])
-    last_qs = df_performance.lastErrorRate.quantile([0.25, 0.75])
-    total_qs = df_performance.totalErrorRate.quantile([0.25, 0.75])
+    first_qs = df_performance.firstErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])
+    last_qs = df_performance.lastErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])
+    total_qs = df_performance.totalErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])

-    first_q1 = first_qs[0.25]
-    first_q3 = first_qs[0.75]
-    last_q1 = last_qs[0.25]
-    last_q3 = last_qs[0.75]
-    total_q1 = total_qs[0.25]
-    total_q3 = total_qs[0.75]
+    first_q1 = first_qs[Q1_THRESHOLD]
+    first_q3 = first_qs[Q3_THRESHOLD]
+    last_q1 = last_qs[Q1_THRESHOLD]
+    last_q3 = last_qs[Q3_THRESHOLD]
+    total_q1 = total_qs[Q1_THRESHOLD]
+    total_q3 = total_qs[Q3_THRESHOLD]

     # set the performance levels based on the quantiles (inverse logic compared to activity - higher error rate is worse)
-    df_performance["firstPerformance"] = "MEDIUM"
+    df_performance["firstPerformance"] = PerformanceLevel.MEDIUM
     df_performance.loc[
-        df_performance.firstErrorRate <= first_q1, "firstPerformance"
-    ] = "HIGH"
+        df_performance.firstErrorRate <= first_q1, "firstPerformance"
+    ] = PerformanceLevel.HIGH
     df_performance.loc[
-        df_performance.firstErrorRate >= first_q3, "firstPerformance"
-    ] = "LOW"
+        df_performance.firstErrorRate >= first_q3, "firstPerformance"
+    ] = PerformanceLevel.LOW

-    df_performance["lastPerformance"] = "MEDIUM"
-    df_performance.loc[df_performance.lastErrorRate <= last_q1, "lastPerformance"] = (
-        "HIGH"
-    )
-    df_performance.loc[df_performance.lastErrorRate >= last_q3, "lastPerformance"] = (
-        "LOW"
-    )
+    df_performance["lastPerformance"] = PerformanceLevel.MEDIUM
+    df_performance.loc[
+        df_performance.lastErrorRate <= last_q1, "lastPerformance"
+    ] = PerformanceLevel.HIGH
+    df_performance.loc[
+        df_performance.lastErrorRate >= last_q3, "lastPerformance"
+    ] = PerformanceLevel.LOW

-    df_performance["totalPerformance"] = "MEDIUM"
+    df_performance["totalPerformance"] = PerformanceLevel.MEDIUM
     df_performance.loc[
-        df_performance.totalErrorRate <= total_q1, "totalPerformance"
-    ] = "HIGH"
+        df_performance.totalErrorRate <= total_q1, "totalPerformance"
+    ] = PerformanceLevel.HIGH
     df_performance.loc[
-        df_performance.totalErrorRate >= total_q3, "totalPerformance"
-    ] = "LOW"
+        df_performance.totalErrorRate >= total_q3, "totalPerformance"
+    ] = PerformanceLevel.LOW

     return df_performance
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from enum import Enum
import pandas as pd

class PerformanceLevel(str, Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

# Quantile thresholds for performance levels
Q1_THRESHOLD = 0.25
Q3_THRESHOLD = 0.75

def compute_performance_levels(df_performance: pd.DataFrame) -> pd.DataFrame:
    """Compute performance levels based on error rate quantiles.

    Args:
        df_performance: DataFrame containing error rate metrics

    Returns:
        DataFrame with added performance level columns

    Raises:
        ValueError: If required columns are missing or DataFrame is empty
    """
    required_columns = ["firstErrorRate", "lastErrorRate", "totalErrorRate"]
    missing_columns = [col for col in required_columns if col not in df_performance.columns]
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")

    if df_performance.empty:
        raise ValueError("Cannot compute performance levels on empty DataFrame")

    # set the performance levels based on the quantiles
    first_qs = df_performance.firstErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])
    last_qs = df_performance.lastErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])
    total_qs = df_performance.totalErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])

    first_q1 = first_qs[Q1_THRESHOLD]
    first_q3 = first_qs[Q3_THRESHOLD]
    last_q1 = last_qs[Q1_THRESHOLD]
    last_q3 = last_qs[Q3_THRESHOLD]
    total_q1 = total_qs[Q1_THRESHOLD]
    total_q3 = total_qs[Q3_THRESHOLD]

    # set the performance levels based on the quantiles (inverse logic compared to activity - higher error rate is worse)
    df_performance["firstPerformance"] = PerformanceLevel.MEDIUM
    df_performance.loc[
        df_performance.firstErrorRate <= first_q1, "firstPerformance"
    ] = PerformanceLevel.HIGH
    df_performance.loc[
        df_performance.firstErrorRate >= first_q3, "firstPerformance"
    ] = PerformanceLevel.LOW

    df_performance["lastPerformance"] = PerformanceLevel.MEDIUM
    df_performance.loc[
        df_performance.lastErrorRate <= last_q1, "lastPerformance"
    ] = PerformanceLevel.HIGH
    df_performance.loc[
        df_performance.lastErrorRate >= last_q3, "lastPerformance"
    ] = PerformanceLevel.LOW

    df_performance["totalPerformance"] = PerformanceLevel.MEDIUM
    df_performance.loc[
        df_performance.totalErrorRate <= total_q1, "totalPerformance"
    ] = PerformanceLevel.HIGH
    df_performance.loc[
        df_performance.totalErrorRate >= total_q3, "totalPerformance"
    ] = PerformanceLevel.LOW

    return df_performance
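The quantile bucketing above can be sanity-checked on a toy series (the error-rate values are made up):

```python
import pandas as pd

# Hypothetical error rates for eight participants, already sorted for readability
df = pd.DataFrame({"totalErrorRate": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9]})

# Same logic as the suggestion: quartile thresholds with inverse mapping
q1, q3 = df.totalErrorRate.quantile([0.25, 0.75])

df["totalPerformance"] = "MEDIUM"
df.loc[df.totalErrorRate <= q1, "totalPerformance"] = "HIGH"  # low error rate -> high performance
df.loc[df.totalErrorRate >= q3, "totalPerformance"] = "LOW"

print(df["totalPerformance"].tolist())
# → ['HIGH', 'HIGH', 'MEDIUM', 'MEDIUM', 'MEDIUM', 'MEDIUM', 'LOW', 'LOW']
```

With linear interpolation (the pandas default) the thresholds land at 0.175 and 0.525, so exactly the two lowest and the two highest error rates leave the MEDIUM band.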
apps/analytics/src/modules/participant_course_analytics/get_active_weeks.py (1)

4-30: 🛠️ Refactor suggestion

Consider batch processing for database queries

The current implementation queries the database for each participant individually within a loop, which could be inefficient for courses with many participants.

Consider refactoring to use a single batch query:

-    for participation in participations:
-        participant_id = participation["participantId"]
-        weekly_analytics = db.participantanalytics.find_many(
-            where={
-                "type": "WEEKLY",
-                "courseId": course_id,
-                "participantId": participant_id,
-            },
-        )
+    participant_ids = [p["participantId"] for p in participations]
+    weekly_analytics = db.participantanalytics.find_many(
+        where={
+            "type": "WEEKLY",
+            "courseId": course_id,
+            "participantId": {"in": participant_ids},
+        },
+    )
+    
+    # Group by participantId and count entries
+    analytics_count = pd.DataFrame(weekly_analytics).groupby("participantId").size()

Committable suggestion skipped: line range outside the PR's diff.
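The counting step of the batched variant can be verified without a database; the rows below simulate the result of the single find_many call (field values are hypothetical):

```python
import pandas as pd

# Simulated result of one batched find_many over all participants
weekly_analytics = [
    {"participantId": "p1", "courseId": "c1"},
    {"participantId": "p1", "courseId": "c1"},
    {"participantId": "p2", "courseId": "c1"},
]

# One groupby replaces the per-participant queries and counting
analytics_count = pd.DataFrame(weekly_analytics).groupby("participantId").size()

print(analytics_count.to_dict())  # → {'p1': 2, 'p2': 1}
```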

apps/analytics/src/modules/participant_performance/compute_response_error_rates.py (2)

46-52: ⚠️ Potential issue

Add protection against division by zero

The error rate calculations could fail with division by zero for participants with no responses.

Add safety checks (note that np.where requires adding import numpy as np at the top of the module):

-    df_performance["firstErrorRate"] = (
-        df_performance["wrongFirstResponseCount"] / df_performance["responseCount"]
-    )
-    df_performance["lastErrorRate"] = (
-        df_performance["wrongLastResponseCount"] / df_performance["responseCount"]
-    )
+    # Safely compute error rates with zero division protection
+    df_performance["firstErrorRate"] = np.where(
+        df_performance["responseCount"] > 0,
+        df_performance["wrongFirstResponseCount"] / df_performance["responseCount"],
+        0
+    )
+    df_performance["lastErrorRate"] = np.where(
+        df_performance["responseCount"] > 0,
+        df_performance["wrongLastResponseCount"] / df_performance["responseCount"],
+        0
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    import numpy as np  # required for the np.where guards below

    # Safely compute error rates with zero division protection
    df_performance["firstErrorRate"] = np.where(
        df_performance["responseCount"] > 0,
        df_performance["wrongFirstResponseCount"] / df_performance["responseCount"],
        0
    )
    df_performance["lastErrorRate"] = np.where(
        df_performance["responseCount"] > 0,
        df_performance["wrongLastResponseCount"] / df_performance["responseCount"],
        0
    )

7-34: 🛠️ Refactor suggestion

Optimize DataFrame operations for better performance

Multiple group-by operations on the same DataFrame could be combined for better performance.

Consider this optimization:

-    df_response_count = (
-        df_responses.groupby("participantId").size().reset_index(name="responseCount")
-    )
-    df_first_response_wrong_count = (
-        df_responses[df_responses["firstResponseCorrectness"] == "WRONG"]
-        .groupby("participantId")
-        .size()
-        .reset_index(name="wrongFirstResponseCount")
-    )
-    df_last_response_wrong_count = (
-        df_responses[df_responses["lastResponseCorrectness"] == "WRONG"]
-        .groupby("participantId")
-        .size()
-        .reset_index(name="wrongLastResponseCount")
-    )
+    # Combine aggregations for better performance
+    df_aggregated = df_responses.groupby("participantId").agg(
+        responseCount=("participantId", "size"),
+        wrongFirstResponseCount=("firstResponseCorrectness", lambda x: (x == "WRONG").sum()),
+        wrongLastResponseCount=("lastResponseCorrectness", lambda x: (x == "WRONG").sum()),
+        totalErrorRate=("responseErrorRate", "mean")
+    ).reset_index()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    # compute the total number of responses, number of wrong first and last responses,
    # total number of wrong responses, and the average total error rate
    # Combine aggregations for better performance
    df_aggregated = df_responses.groupby("participantId").agg(
        responseCount=("participantId", "size"),
        wrongFirstResponseCount=("firstResponseCorrectness", lambda x: (x == "WRONG").sum()),
        wrongLastResponseCount=("lastResponseCorrectness", lambda x: (x == "WRONG").sum()),
        totalErrorRate=("responseErrorRate", "mean")
    ).reset_index()
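The combined agg call can be sanity-checked on a handful of rows (values are made up):

```python
import pandas as pd

df_responses = pd.DataFrame({
    "participantId": ["p1", "p1", "p2"],
    "firstResponseCorrectness": ["WRONG", "CORRECT", "WRONG"],
    "lastResponseCorrectness": ["CORRECT", "CORRECT", "WRONG"],
    "responseErrorRate": [0.5, 0.0, 1.0],
})

# Single pass over the grouped data instead of three separate groupby calls
df_aggregated = df_responses.groupby("participantId").agg(
    responseCount=("participantId", "size"),
    wrongFirstResponseCount=("firstResponseCorrectness", lambda x: (x == "WRONG").sum()),
    wrongLastResponseCount=("lastResponseCorrectness", lambda x: (x == "WRONG").sum()),
    totalErrorRate=("responseErrorRate", "mean"),
).reset_index()

print(df_aggregated.to_dict("records"))
```

For p1 this yields a responseCount of 2, one wrong first response, zero wrong last responses, and a mean error rate of 0.25, matching what the three separate groupbys would produce.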
apps/analytics/src/modules/participant_course_analytics/compute_participant_activity.py (1)

1-4: 🛠️ Refactor suggestion

Add input validation and error handling.

The function should validate input parameters to ensure robustness:

  • Verify that course_start is before course_end
  • Validate that df_activity contains required columns
  • Handle potential database connection issues

Example implementation:

 import pandas as pd
+from datetime import datetime
+
+def validate_course_dates(course_start: datetime, course_end: datetime) -> None:
+    if course_start > course_end:
+        raise ValueError("course_start must be before course_end")
 
 def compute_participant_activity(db, df_activity, course_id, course_start, course_end):
+    validate_course_dates(course_start, course_end)
+    required_columns = ["participantId"]
+    missing_columns = [col for col in required_columns if col not in df_activity.columns]
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")

Committable suggestion skipped: line range outside the PR's diff.
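As a rough sketch of what the activity metrics could look like once the inputs are validated (the column names and the week/day definitions here are assumptions for illustration, not the PR's actual implementation):

```python
import pandas as pd

# Hypothetical per-response activity records
df_activity = pd.DataFrame({
    "participantId": ["p1", "p1", "p1", "p2"],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-10", "2024-01-02"]),
})

df = df_activity.assign(
    week=df_activity.timestamp.dt.isocalendar().week,
    day=df_activity.timestamp.dt.date,
)

# Distinct ISO weeks with any activity per participant
active = df.groupby("participantId").agg(activeWeeks=("week", "nunique"))

# Mean number of distinct active days per active week
days_per_week = (
    df.groupby(["participantId", "week"])["day"].nunique()
    .groupby("participantId").mean()
    .rename("activeDaysPerWeek")
)

result = active.join(days_per_week).reset_index()
print(result.to_dict("records"))
```

Here p1 is active in two ISO weeks (two days in week 1, one day in week 2, i.e. 1.5 active days per active week) and p2 in one.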

apps/analytics/src/notebooks/aggregated_course_analytics.ipynb (3)

46-46: ⚠️ Potential issue

Security concern: Hardcoded database credentials.

The database connection string contains hardcoded credentials, which is a security risk.

Use environment variables instead:

-os.environ["DATABASE_URL"] = "postgresql://klicker:klicker@localhost:5432/klicker-prod"
+# Load from .env file or environment
+from dotenv import load_dotenv
+load_dotenv()
+if "DATABASE_URL" not in os.environ:
+    raise ValueError("DATABASE_URL environment variable is required")

Committable suggestion skipped: line range outside the PR's diff.


83-83: 🛠️ Refactor suggestion

Ensure database disconnection in case of errors.

The database disconnection should be guaranteed even if errors occur.

-db.disconnect()
+try:
+    # Your processing code here
+finally:
+    db.disconnect()

Committable suggestion skipped: line range outside the PR's diff.


66-74: 🛠️ Refactor suggestion

Add error handling for database operations.

The course processing loop lacks error handling for database operations and computation failures.

 # find all courses that started in the past
-df_courses = get_running_past_courses(db)
+try:
+    df_courses = get_running_past_courses(db)
+except Exception as e:
+    print(f"Failed to fetch courses: {e}")
+    raise

 # iterate over all courses and compute the participant course analytics
 for idx, course in df_courses.iterrows():
     print(f"Processing course", idx, "of", len(df_courses), "with id", course["id"])
+    try:
         # computation of activity per weekday
         compute_weekday_activity(db, course)
+    except Exception as e:
+        print(f"Failed to process course {course['id']}: {e}")
+        continue
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    # find all courses that started in the past
    try:
        df_courses = get_running_past_courses(db)
    except Exception as e:
        print(f"Failed to fetch courses: {e}")
        raise

    # iterate over all courses and compute the participant course analytics
    for idx, course in df_courses.iterrows():
        print(f"Processing course", idx, "of", len(df_courses), "with id", course["id"])
        try:
            # computation of activity per weekday
            compute_weekday_activity(db, course)
        except Exception as e:
            print(f"Failed to process course {course['id']}: {e}")
            continue
apps/frontend-control/src/components/liveQuizzes/EmbeddingModal.tsx (1)

32-34: 💡 Codebase verification


URL path change from /sessions/ to /quizzes/ appears consistent

The change from /sessions/ to /quizzes/ in the evaluation URL path is consistent with the architectural shift. The only remaining reference to /sessions/ is in the office add-in's URL validation pattern, which should be updated to match the new routing structure.

  • apps/office-addin/src/content/components/URLForm.tsx: Update validation pattern and placeholder URL to use /quizzes/ instead of /sessions/
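A minimal check of the renamed route can be expressed as a regex test; the pattern below is a hypothetical stand-in for the one in URLForm.tsx, written in Python purely for illustration:

```python
import re

# hypothetical validation pattern after the /sessions/ -> /quizzes/ rename
QUIZ_EVALUATION_URL = re.compile(r"/quizzes/[0-9a-fA-F-]+/evaluation")

new_url_ok = bool(QUIZ_EVALUATION_URL.search(
    "/quizzes/123e4567-e89b-12d3-a456-426614174000/evaluation"))
old_url_rejected = not QUIZ_EVALUATION_URL.search(
    "/sessions/123e4567-e89b-12d3-a456-426614174000/evaluation")
```

The actual TypeScript pattern and placeholder in the office add-in should be updated analogously.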
🔗 Analysis chain

Verify consistent routing path updates across the application

The URL path change from /sessions/${quizId} to /quizzes/${quizId} aligns with the architectural shift. Let's verify this change is consistent across the application.


Let me try a different search pattern to verify the routing changes.


Let me try one more search pattern to verify the routing changes.


Let me try one final search with a simpler pattern.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining references to /sessions/ in routing paths
rg -g '*.{ts,tsx,js,jsx}' '/sessions/[^"'\'']*evaluation'

# Search for new quiz routing pattern to confirm consistency
rg -g '*.{ts,tsx,js,jsx}' '/quizzes/[^"'\'']*evaluation'

Length of output: 65654


Script:

#!/bin/bash
# Search for any remaining references to /sessions/ in routing paths, being more specific
rg -g '*.{ts,tsx,js,jsx}' 'sessions/[^"'\'']*evaluation'

# Search for new quiz routing pattern to confirm consistency
rg -g '*.{ts,tsx,js,jsx}' 'quizzes/[^"'\'']*evaluation'

Length of output: 65652


Script:

#!/bin/bash
# Search for URL construction patterns with sessions and quizzes
rg -g '*.{ts,tsx,js,jsx}' -B 2 -A 2 '`/(?:sessions|quizzes)/\${[^}]+}/evaluation'

Length of output: 235


Script:

#!/bin/bash
# Search for evaluation URLs in the codebase
rg -g '*.{ts,tsx,js,jsx}' '/evaluation'

Length of output: 70488

apps/analytics/src/modules/participant_analytics/get_participant_responses.py (3)

34-50: 🛠️ Refactor suggestion

Refactor date handling to reduce duplication and improve maintainability

The current implementation has several issues:

  1. Magic date should be defined as a constant
  2. Duplicated code for date assignment
  3. No type hints for better maintainability
+from typing import Dict, Any
+from datetime import date
+
+FUTURE_DATE = date(9999, 12, 31)
+
-def set_course_dates(detail):
+def set_course_dates(detail: Dict[str, Any]) -> Dict[str, Any]:
+    def _set_dates_from_course(course: Dict[str, Any]) -> None:
+        detail["course_start_date"] = course["startDate"]
+        detail["course_end_date"] = course["endDate"]
+
     if detail["practiceQuiz"] is not None:
-        course = detail["practiceQuiz"]["course"]
-        detail["course_start_date"] = course["startDate"]
-        detail["course_end_date"] = course["endDate"]
+        _set_dates_from_course(detail["practiceQuiz"]["course"])
     elif detail["microLearning"] is not None:
-        course = detail["microLearning"]["course"]
-        detail["course_start_date"] = course["startDate"]
-        detail["course_end_date"] = course["endDate"]
+        _set_dates_from_course(detail["microLearning"]["course"])
     else:
-        detail["course_start_date"] = date(9999, 12, 31)
-        detail["course_end_date"] = date(9999, 12, 31)
+        detail["course_start_date"] = FUTURE_DATE
+        detail["course_end_date"] = FUTURE_DATE
📝 Committable suggestion


from typing import Dict, Any
from datetime import date

FUTURE_DATE = date(9999, 12, 31)

def set_course_dates(detail: Dict[str, Any]) -> Dict[str, Any]:
    def _set_dates_from_course(course: Dict[str, Any]) -> None:
        detail["course_start_date"] = course["startDate"]
        detail["course_end_date"] = course["endDate"]

    if detail["practiceQuiz"] is not None:
        _set_dates_from_course(detail["practiceQuiz"]["course"])
    elif detail["microLearning"] is not None:
        _set_dates_from_course(detail["microLearning"]["course"])
    else:
        detail["course_start_date"] = FUTURE_DATE
        detail["course_end_date"] = FUTURE_DATE

    return detail

5-11: ⚠️ Potential issue

Add error handling for missing courseId

The current implementation assumes either practiceQuiz or microLearning exists and has a courseId. This could raise a KeyError if neither exists.

def map_details(detail, participantId):
-    courseId = (
-        detail["practiceQuiz"]["courseId"]
-        if detail["practiceQuiz"]
-        else detail["microLearning"]["courseId"]
-    )
+    courseId = None
+    if detail.get("practiceQuiz"):
+        courseId = detail["practiceQuiz"].get("courseId")
+    elif detail.get("microLearning"):
+        courseId = detail["microLearning"].get("courseId")
+    if courseId is None:
+        raise ValueError("Neither practiceQuiz nor microLearning contains courseId")
📝 Committable suggestion


def map_details(detail, participantId):
    courseId = None
    if detail.get("practiceQuiz"):
        courseId = detail["practiceQuiz"].get("courseId")
    elif detail.get("microLearning"):
        courseId = detail["microLearning"].get("courseId")
    if courseId is None:
        raise ValueError("Neither practiceQuiz nor microLearning contains courseId")
    return {**detail, "participantId": participantId, "courseId": courseId}

52-92: 🛠️ Refactor suggestion

Enhance error handling and add type hints

Several improvements can be made to the main function:

  1. Add type hints for better maintainability
  2. Add error handling for empty results
  3. Improve verbose logging implementation
+from typing import Any, List
+from pandas import DataFrame
+import logging
+
-def get_participant_responses(db, start_date, end_date, verbose=False):
+def get_participant_responses(
+    db: Any,
+    start_date: date,
+    end_date: date,
+    verbose: bool = False
+) -> DataFrame:
+    logger = logging.getLogger(__name__)
+
     participant_response_details = db.participant.find_many(
         include={
             "detailQuestionResponses": {
                 "where": {"createdAt": {"gte": start_date, "lte": end_date}},
                 "include": {
                     "practiceQuiz": {"include": {"course": True}},
                     "microLearning": {"include": {"course": True}},
                 },
             },
         }
     )
+
+    if not participant_response_details:
+        logger.warning("No participant responses found for the given date range")
+        return DataFrame()

     if verbose:
-        print(
+        logger.info(
             "Found {} participants for the timespan from {} to {}".format(
                 len(participant_response_details), start_date, end_date
             )
         )
-        print(participant_response_details[0])
+        if participant_response_details:
+            logger.debug(f"First response: {participant_response_details[0]}")

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/notebooks/participant_performance.ipynb (2)

48-49: ⚠️ Potential issue

Security Risk: Hardcoded database credentials in source code

The database connection string contains hardcoded credentials which is a security risk. Consider using environment variables or a secure configuration management system.

-os.environ["DATABASE_URL"] = "postgresql://klicker:klicker@localhost:5432/klicker-prod"
+if "DATABASE_URL" not in os.environ:
+    raise ValueError("DATABASE_URL environment variable is required")
📝 Committable suggestion


"# set the environment variable DATABASE_URL to the connection string of your database\n",
if "DATABASE_URL" not in os.environ:
    raise ValueError("DATABASE_URL environment variable is required")

79-80: ⚠️ Potential issue

Add error handling for database operations

Database operations should include proper error handling to manage connection issues or query failures gracefully.

-    question_responses = db.questionresponse.find_many(where={"courseId": course_id})
-    df_responses = pd.DataFrame(list(map(lambda x: x.dict(), question_responses)))
+    try:
+        question_responses = db.questionresponse.find_many(where={"courseId": course_id})
+        df_responses = pd.DataFrame(list(map(lambda x: x.dict(), question_responses)))
+    except Exception as e:
+        print(f"Error fetching responses for course {course_id}: {str(e)}")
+        continue

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/modules/aggregated_course_analytics/compute_weekday_activity.py (3)

97-109: ⚠️ Potential issue

Fix potential bug in single_weekday_activity

The function appends 0 for empty weekdays but then unconditionally appends len(df_weekday) afterward, potentially double-counting.

def single_weekday_activity(weekdays, df_daily):
    collector = []
    for weekday in weekdays:
        df_weekday = df_daily[
            df_daily["timestamp"] == pd.Timestamp(weekday).tz_localize("UTC")
        ]

        if df_weekday.empty:
            collector.append(0)
-
-        collector.append(len(df_weekday))
+        else:
+            collector.append(len(df_weekday))

    return statistics.mean(collector) if len(collector) > 0 else 0
📝 Committable suggestion


def single_weekday_activity(weekdays, df_daily):
    collector = []
    for weekday in weekdays:
        df_weekday = df_daily[
            df_daily["timestamp"] == pd.Timestamp(weekday).tz_localize("UTC")
        ]

        if df_weekday.empty:
            collector.append(0)
        else:
            collector.append(len(df_weekday))

    return statistics.mean(collector) if len(collector) > 0 else 0

24-58: 🛠️ Refactor suggestion

Refactor repetitive date range generation

The code for generating date ranges is duplicated for each weekday. Consider refactoring this into a helper function.

+def generate_weekday_range(start_date, end_date, weekday):
+    return pd.date_range(
+        start=start_date,
+        end=pd.Timestamp(end_date) + pd.tseries.offsets.DateOffset(days=1),
+        freq=f"W-{weekday}",
+    )

-    mondays = pd.date_range(
-        start=course_start,
-        end=pd.Timestamp(course_end) + pd.tseries.offsets.DateOffset(days=1),
-        freq="W-MON",
-    )
+    mondays = generate_weekday_range(course_start, course_end, "MON")

Committable suggestion skipped: line range outside the PR's diff.


69-94: ⚠️ Potential issue

Add error handling for database operations

The upsert operation should include error handling to manage potential database failures.

-    db.aggregatedcourseanalytics.upsert(
+    try:
+        db.aggregatedcourseanalytics.upsert(
             where={"courseId": course_id},
             data={
                 "create": {
                     # ... existing create data ...
                 },
                 "update": {
                     # ... existing update data ...
                 },
             },
         )
+    except Exception as e:
+        logging.error(f"Failed to update analytics for course {course_id}: {str(e)}")
+        raise

Committable suggestion skipped: line range outside the PR's diff.

packages/prisma/src/prisma/migrations/20241205154359_learning_analytics_performance_progress/migration.sql (1)

7-12: 🛠️ Refactor suggestion

Consider adding constraints for error rate values.

The error rate fields should logically be between 0 and 1 (or 0% to 100%). Consider adding CHECK constraints to ensure data validity.

    "id" SERIAL NOT NULL,
-   "firstErrorRate" REAL NOT NULL,
+   "firstErrorRate" REAL NOT NULL CHECK ("firstErrorRate" >= 0 AND "firstErrorRate" <= 1),
-   "lastErrorRate" REAL NOT NULL,
+   "lastErrorRate" REAL NOT NULL CHECK ("lastErrorRate" >= 0 AND "lastErrorRate" <= 1),
-   "totalErrorRate" REAL NOT NULL,
+   "totalErrorRate" REAL NOT NULL CHECK ("totalErrorRate" >= 0 AND "totalErrorRate" <= 1),
📝 Committable suggestion


    "firstErrorRate" REAL NOT NULL CHECK ("firstErrorRate" >= 0 AND "firstErrorRate" <= 1),
    "firstPerformance" "PerformanceLevel" NOT NULL,
    "lastErrorRate" REAL NOT NULL CHECK ("lastErrorRate" >= 0 AND "lastErrorRate" <= 1),
    "lastPerformance" "PerformanceLevel" NOT NULL,
    "totalErrorRate" REAL NOT NULL CHECK ("totalErrorRate" >= 0 AND "totalErrorRate" <= 1),
    "totalPerformance" "PerformanceLevel" NOT NULL,
apps/analytics/src/notebooks/aggregated_analytics.ipynb (2)

43-43: ⚠️ Potential issue

Security: Avoid hardcoding database credentials

The database connection string contains hardcoded credentials. This is a security risk as sensitive information should not be stored in version control.

Consider using environment variables:

-os.environ["DATABASE_URL"] = "postgresql://klicker:klicker@localhost:5432/klicker-prod"
+if "DATABASE_URL" not in os.environ:
+    raise ValueError("DATABASE_URL environment variable is required")
📝 Committable suggestion


    if "DATABASE_URL" not in os.environ:
        raise ValueError("DATABASE_URL environment variable is required")

86-89: 🛠️ Refactor suggestion

Add error handling for compute_aggregated_analytics calls

The function call lacks try-catch blocks for error handling. Failed computations could silently fail or crash the notebook.

Consider adding error handling:

-        compute_aggregated_analytics(
-            db, day_start, day_end, timestamp, "DAILY", verbose
-        )
+        try:
+            compute_aggregated_analytics(
+                db, day_start, day_end, timestamp, "DAILY", verbose
+            )
+        except Exception as e:
+            print(f"Error computing daily analytics for {specific_date}: {str(e)}")
+            continue
📝 Committable suggestion


    "        timestamp = day_start\n",
    "        try:\n",
    "            compute_aggregated_analytics(\n",
    "                db, day_start, day_end, timestamp, \"DAILY\", verbose\n",
    "            )\n",
    "        except Exception as e:\n",
    "            print(f\"Error computing daily analytics for {specific_date}: {str(e)}\")\n",
    "            continue\n",
apps/analytics/src/notebooks/participant_analytics.ipynb (2)

41-55: 🛠️ Refactor suggestion

Extract common configuration to a shared module

The database connection setup and script settings are duplicated between notebooks. This violates the DRY principle and makes maintenance harder.

Consider creating a shared configuration module:

# config/analytics_config.py
import os
from prisma import Prisma

def get_db_connection():
    db = Prisma()
    db.connect()
    return db

def get_script_settings():
    return {
        'verbose': False,
        'compute_daily': True,
        'compute_weekly': True,
        'compute_monthly': True,
        'compute_course': True
    }

142-166: 🛠️ Refactor suggestion

Optimize database queries for large datasets

The current implementation fetches all courses at once and processes them in memory. This could cause performance issues with large datasets.

Consider implementing pagination and batch processing:

-    courses = db.course.find_many(
+    batch_size = 100
+    skip = 0
+    while True:
+        courses_batch = db.course.find_many(
+            take=batch_size,
+            skip=skip,
             where={
                 "startDate": {"lte": curr_date + "T23:59:59.999Z"},
-            }
+            },
+            order={"id": "asc"}
         )
+        if not courses_batch:
+            break
+        process_courses_batch(courses_batch)
+        skip += batch_size

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/notebooks/archive/test.ipynb (4)

35-41: ⚠️ Potential issue

Performance Concern: Large data fetch with arbitrary limit

Both queries use a fixed take=10000 limit without pagination, which could lead to memory issues with large datasets and potentially miss data beyond this limit.

Consider implementing proper pagination:

-    take=10000,
+    cursor=cursor,
+    take=PAGE_SIZE,

Also applies to: 52-58
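The cursor-based pagination hinted at above can be sketched as a small generic helper; `fetch_page`, `PAGE_SIZE`, and the in-memory stand-in below are hypothetical illustrations, not part of the Prisma client API:

```python
from typing import Callable, Iterator, List, Optional

PAGE_SIZE = 500  # hypothetical batch size

def paginate(fetch_page: Callable[[Optional[int], int], List[dict]],
             page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield records page by page, passing the last seen id as a cursor."""
    cursor = None
    while True:
        batch = fetch_page(cursor, page_size)
        if not batch:
            break
        yield from batch
        cursor = batch[-1]["id"]

# usage with an in-memory stand-in for a find_many-style query
records = [{"id": i} for i in range(1, 12)]

def fake_fetch(cursor, take):
    start = 0 if cursor is None else next(
        i for i, r in enumerate(records) if r["id"] == cursor
    ) + 1
    return records[start:start + take]

ids = [r["id"] for r in paginate(fake_fetch, page_size=4)]
```

Because the helper only depends on the `fetch_page` callable, the real database query can be swapped in without changing the pagination logic.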


24-24: ⚠️ Potential issue

Security Issue: Remove hardcoded database credentials

The database connection string contains hardcoded credentials, which is a security risk. Consider using environment variables for sensitive information.

Apply this change:

-os.environ['DATABASE_URL'] = 'postgresql://klicker:klicker@localhost:5432/klicker-prod'
+os.environ['DATABASE_URL'] = os.getenv('DATABASE_URL')
📝 Committable suggestion


    os.environ['DATABASE_URL'] = os.getenv('DATABASE_URL')

69-77: ⚠️ Potential issue

Enhance error handling in map_response_details function

The function assumes the presence of 'elementInstance' and 'elementStack' without validation, which could lead to KeyError exceptions.

Add proper error handling:

 def map_response_details(response_details):
     as_dict = response_details.dict()
+    try:
+        element_stack = as_dict.get('elementInstance', {}).get('elementStack')
+        if not element_stack:
+            raise ValueError("Missing required elementStack data")
 
         extended_dict = {
             **as_dict,
-            "elementStack": as_dict['elementInstance']['elementStack'],
+            "elementStack": element_stack,
         }
 
         return extended_dict
+    except Exception as e:
+        print(f"Error mapping response details: {e}")
+        return None
📝 Committable suggestion


def map_response_details(response_details):
    as_dict = response_details.dict()
    try:
        element_stack = as_dict.get('elementInstance', {}).get('elementStack')
        if not element_stack:
            raise ValueError("Missing required elementStack data")

        extended_dict = {
            **as_dict,
            "elementStack": element_stack,
        }

        return extended_dict
    except Exception as e:
        print(f"Error mapping response details: {e}")
        return None

228-231: ⚠️ Potential issue

Add error handling for file operations

The CSV file operations lack error handling and path validation. This could lead to runtime errors if the 'out' directory doesn't exist.

Consider adding proper error handling:

+import os
+import pathlib
+
+# Ensure output directory exists
+out_dir = pathlib.Path('out')
+out_dir.mkdir(parents=True, exist_ok=True)
+
+try:
     df_response_details_relevant_daily.to_csv('out/df_response_details_relevant_daily.csv')
     df_response_details_relevant_weekly.to_csv('out/df_response_details_relevant_weekly.csv')
     df_response_details_relevant_monthly.to_csv('out/df_response_details_relevant_monthly.csv')
     df_response_details_relevant_mean.to_csv('out/df_response_details_relevant_mean.csv')
+except Exception as e:
+    print(f"Error saving CSV files: {e}")
📝 Committable suggestion


import os
import pathlib

# Ensure output directory exists
out_dir = pathlib.Path('out')
out_dir.mkdir(parents=True, exist_ok=True)

try:
    df_response_details_relevant_daily.to_csv('out/df_response_details_relevant_daily.csv')
    df_response_details_relevant_weekly.to_csv('out/df_response_details_relevant_weekly.csv')
    df_response_details_relevant_monthly.to_csv('out/df_response_details_relevant_monthly.csv')
    df_response_details_relevant_mean.to_csv('out/df_response_details_relevant_mean.csv')
except Exception as e:
    print(f"Error saving CSV files: {e}")
apps/analytics/src/notebooks/archive/daily_participant_analytics.ipynb (3)

137-182: 🛠️ Refactor suggestion

Refactor complex correctness computation logic.

The compute_correctness function is complex and handles multiple question types. Consider breaking it down into smaller, type-specific functions for better maintainability and testability.

Suggested refactor:

def compute_sc_correctness(response, options):
    selected_choice = response['choices'][0]
    correct_choice = next((choice['ix'] for choice in options['choices'] if choice['correct']), None)
    return 'CORRECT' if selected_choice == correct_choice else 'INCORRECT'

def compute_mc_correctness(response, options, available_choices, hamming_distance):
    correctness = max(-2 * hamming_distance / available_choices + 1, 0)
    if correctness == 1:
        return 'CORRECT'
    elif correctness == 0:
        return 'INCORRECT'
    return 'PARTIAL'

def compute_kprim_correctness(hamming_distance):
    return 'CORRECT' if hamming_distance == 0 else 'PARTIAL' if hamming_distance == 1 else 'INCORRECT'

def compute_numerical_correctness(response, options):
    response_value = float(response['value'])
    within_range = list(map(lambda range: float(range['min']) <= response_value <= float(range['max']), options['solutionRanges']))
    return 'CORRECT' if any(within_range) else 'INCORRECT'
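As a quick sanity check, the KPRIM and numerical helpers suggested above can be exercised directly; the input shapes (choice dicts, `solutionRanges`) are assumed from the surrounding notebook code:

```python
def compute_kprim_correctness(hamming_distance):
    # 0 wrong statements -> correct, 1 -> partial, otherwise incorrect
    if hamming_distance == 0:
        return 'CORRECT'
    return 'PARTIAL' if hamming_distance == 1 else 'INCORRECT'

def compute_numerical_correctness(response, options):
    # correct if the response falls into any accepted solution range
    response_value = float(response['value'])
    within_range = [
        float(r['min']) <= response_value <= float(r['max'])
        for r in options['solutionRanges']
    ]
    return 'CORRECT' if any(within_range) else 'INCORRECT'

results = [
    compute_kprim_correctness(0),
    compute_kprim_correctness(1),
    compute_kprim_correctness(3),
    compute_numerical_correctness(
        {'value': '4.5'}, {'solutionRanges': [{'min': 4, 'max': 5}]}),
    compute_numerical_correctness(
        {'value': '9'}, {'solutionRanges': [{'min': 4, 'max': 5}]}),
]
```

Splitting the logic this way makes each question type independently testable.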

278-314: ⚠️ Potential issue

Add error handling for database operations.

The database upsert operation lacks error handling, which could lead to silent failures. Consider adding try-catch blocks and logging.

Suggested implementation:

try:
    for index, row in df_analytics.iterrows():
        try:
            db.participantanalytics.upsert(
                # ... existing upsert code ...
            )
        except Exception as e:
            print(f"Error upserting analytics for participant {row['participantId']}: {str(e)}")
            continue
except Exception as e:
    print(f"Fatal error during analytics update: {str(e)}")
    raise

31-36: ⚠️ Potential issue

Security Issue: Hardcoded database credentials in code.

The database connection string contains hardcoded credentials, which is a security risk. Consider using environment variables or a secure configuration management system.

Apply this diff to use environment variables:

-os.environ['DATABASE_URL'] = 'postgresql://klicker:klicker@localhost:5432/klicker-prod'
+# Ensure DATABASE_URL is set in the environment before running the notebook
+if 'DATABASE_URL' not in os.environ:
+    raise ValueError("DATABASE_URL environment variable is not set")
📝 Committable suggestion


db = Prisma()

# set the environment variable DATABASE_URL to the connection string of your database
# Ensure DATABASE_URL is set in the environment before running the notebook
if 'DATABASE_URL' not in os.environ:
    raise ValueError("DATABASE_URL environment variable is not set")

db.connect()
apps/analytics/src/modules/participant_course_analytics/__init__.py (1)

1-4: 🛠️ Refactor suggestion

Define __all__ to explicitly declare the public API.

The imports create a well-structured analytics pipeline, but they should be explicitly exported. Add the following after the imports:

from .get_running_past_courses import get_running_past_courses
from .get_active_weeks import get_active_weeks
from .compute_participant_activity import compute_participant_activity
from .save_participant_course_analytics import save_participant_course_analytics
+
+__all__ = [
+    "get_running_past_courses",
+    "get_active_weeks",
+    "compute_participant_activity",
+    "save_participant_course_analytics",
+]

This makes the module's public API explicit and resolves the unused import warnings.

📝 Committable suggestion


from .get_running_past_courses import get_running_past_courses
from .get_active_weeks import get_active_weeks
from .compute_participant_activity import compute_participant_activity
from .save_participant_course_analytics import save_participant_course_analytics

__all__ = [
    "get_running_past_courses",
    "get_active_weeks",
    "compute_participant_activity",
    "save_participant_course_analytics",
]
🧰 Tools
🪛 Ruff (0.8.0)

1-1: .get_running_past_courses.get_running_past_courses imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .get_active_weeks.get_active_weeks imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


3-3: .compute_participant_activity.compute_participant_activity imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


4-4: .save_participant_course_analytics.save_participant_course_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

apps/analytics/src/modules/aggregated_analytics/__init__.py (1)

1-4: 🛠️ Refactor suggestion

Add module documentation and define __all__.

The imports establish a clear analytics aggregation pipeline. Add documentation and export list:

+"""
+Aggregated analytics module for processing and storing course-wide analytics data.
+
+This module provides functionality to:
+1. Load individual participant analytics
+2. Aggregate data across participants
+3. Compute course-level metrics
+4. Save aggregated results
+"""
+
from .compute_aggregated_analytics import compute_aggregated_analytics
from .load_participant_analytics import load_participant_analytics
from .aggregate_participant_analytics import aggregate_participant_analytics
from .save_aggregated_analytics import save_aggregated_analytics
+
+__all__ = [
+    "compute_aggregated_analytics",
+    "load_participant_analytics",
+    "aggregate_participant_analytics",
+    "save_aggregated_analytics",
+]
📝 Committable suggestion


"""
Aggregated analytics module for processing and storing course-wide analytics data.

This module provides functionality to:
1. Load individual participant analytics
2. Aggregate data across participants
3. Compute course-level metrics
4. Save aggregated results
"""

from .compute_aggregated_analytics import compute_aggregated_analytics
from .load_participant_analytics import load_participant_analytics
from .aggregate_participant_analytics import aggregate_participant_analytics
from .save_aggregated_analytics import save_aggregated_analytics

__all__ = [
    "compute_aggregated_analytics",
    "load_participant_analytics",
    "aggregate_participant_analytics",
    "save_aggregated_analytics",
]
🧰 Tools
🪛 Ruff (0.8.0)

1-1: .compute_aggregated_analytics.compute_aggregated_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .load_participant_analytics.load_participant_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


3-3: .aggregate_participant_analytics.aggregate_participant_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


4-4: .save_aggregated_analytics.save_aggregated_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

packages/prisma/src/prisma/schema/analytics.prisma (1)

235-240: 🛠️ Refactor suggestion

Enforce mutual exclusivity between practiceQuizId and microLearningId.

In the ActivityPerformance model, both practiceQuizId and microLearningId are optional and unique. To prevent records from having both fields set or both fields unset, consider implementing a model-level constraint or application logic to ensure that exactly one of these fields is non-null.
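One way to enforce this in application code is a small guard run before writes; the record shape and function name below are illustrative, not part of the schema:

```python
def assert_exactly_one_parent(record: dict) -> None:
    """Raise unless exactly one of practiceQuizId / microLearningId is set."""
    set_ids = [
        key for key in ("practiceQuizId", "microLearningId")
        if record.get(key) is not None
    ]
    if len(set_ids) != 1:
        raise ValueError(
            f"ActivityPerformance must reference exactly one parent, got {set_ids or 'none'}"
        )

assert_exactly_one_parent({"practiceQuizId": "pq1", "microLearningId": None})  # passes
try:
    assert_exactly_one_parent({"practiceQuizId": None, "microLearningId": None})
    both_checked = False
except ValueError:
    both_checked = True
```

At the database level, the same rule could alternatively be expressed as a Postgres CHECK constraint comparing the nullity of the two columns.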

apps/analytics/src/modules/participant_course_analytics/get_running_past_courses.py (1)

6-13: 🛠️ Refactor suggestion

Use datetime objects for date comparisons to improve reliability.

Currently, the code concatenates strings to create datetime formats, which can be error-prone. Consider using datetime objects and ISO formatting for accurate date-time handling.

Apply this refactor to enhance precision:

-from datetime import datetime
+from datetime import datetime, timezone

 def get_running_past_courses(db):
-    curr_date = datetime.now().strftime("%Y-%m-%d")
+    curr_datetime = datetime.now(timezone.utc)

     courses = db.course.find_many(
         where={
-            "startDate": {"lte": curr_date + "T23:59:59.999Z"},
+            "startDate": {"lte": curr_datetime.isoformat()},
         },
         include={"participations": True},
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    curr_datetime = datetime.now(timezone.utc)
    courses = db.course.find_many(
        where={
            # Incremental scripts can add this statement to reduce the amount of required computations
            # 'endDate': {
            #     'gt': datetime.now().strftime('%Y-%m-%d') + 'T00:00:00.000Z'
            # }
            "startDate": {"lte": curr_datetime.isoformat()},

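The suggested refactor can be exercised in isolation; this sketch contrasts the timezone-aware object with the string-concatenation pattern it replaces:

```python
from datetime import datetime, timezone

# isoformat() on an aware datetime emits a full, sortable timestamp
# with an explicit UTC offset, so lexicographic "lte" comparisons on
# ISO strings match chronological order.
curr_datetime = datetime.now(timezone.utc)
stamp = curr_datetime.isoformat()  # e.g. '2024-12-05T16:04:12.345678+00:00'

# The old pattern hard-codes an end-of-day suffix and silently
# assumes the local date is the UTC date.
legacy = datetime.now().strftime("%Y-%m-%d") + "T23:59:59.999Z"
```

Note that `isoformat()` renders UTC as `+00:00` rather than `Z`; both are valid ISO 8601 offsets.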

cypress bot commented Dec 5, 2024

klicker-uzh    Run #3778

Run Properties: Passed #3778 • commit a476d57146: Merge 6960a84f681d7fa6c11aff876d3b3b844954234a into 51bbba935c90b70aecdbb887dc56...
Project: klicker-uzh
Branch Review: participant-performance-analytics
Run status: Passed #3778
Run duration: 12m 00s
Commit: a476d57146: Merge 6960a84f681d7fa6c11aff876d3b3b844954234a into 51bbba935c90b70aecdbb887dc56...
Committer: Julius Schlapbach

Test results
Failures: 0
Flaky: 0
Pending (annotated with .skip): 0
Skipped (failure in a mocha hook): 0
Passing: 148
