
enhance(apps/analytics): add computation logic for participant course performance #4390

Merged
3 commits merged into v3-analytics on Dec 5, 2024

Conversation

@sjschlapbach (Member) commented Dec 5, 2024

Summary by CodeRabbit

  • New Features

    • Introduced new analytics capabilities for tracking participant performance and activity metrics across various components.
    • Added new Jupyter notebooks for computing aggregated analytics and participant analytics.
    • Implemented URL structure changes for quizzes, enhancing navigation and user experience.
  • Bug Fixes

    • Resolved inconsistencies in routing paths for accessing quiz-related functionalities.
  • Documentation

    • Updated README files and localization entries to reflect new features and changes.
  • Style

    • Enhanced UI elements and navigation components for better clarity and usability.
  • Tests

    • Improved Cypress end-to-end tests for course and quiz workflows, ensuring robust validation of functionalities.


aviator-app bot commented Dec 5, 2024

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.



@sjschlapbach sjschlapbach changed the base branch from v3 to v3-analytics December 5, 2024 16:04

coderabbitai bot commented Dec 5, 2024

📝 Walkthrough

This pull request includes modifications across various files, primarily focusing on the .gitignore file, the addition of new README documentation, and the introduction of several new scripts and functions related to analytics. Changes to the project structure include updates to package configurations, new database migration scripts, and enhancements to participant analytics functionality. Additionally, routing paths for quizzes have been updated throughout the frontend components, reflecting a shift from session-based to quiz-based navigation.

Changes

File Path: Change Summary
.gitignore: Added !out/.gitkeep to track the .gitkeep file in the out/ directory; retained existing entries for Python cache, Node.js modules, etc.
apps/analytics/README.md: New README file added detailing the KlickerUZH Analytics service, including setup instructions and available commands.
apps/analytics/package.json: New package.json file created for the @klicker-uzh/analytics project, specifying project details and scripts for development.
apps/analytics/pyproject.toml: Updated authors list, modified dependency versions, added development dependencies, and changed task command for main execution.
apps/analytics/src/__init__.py: Added imports for all entities from modules and notebooks submodules.
apps/analytics/src/modules/__init__.py: Consolidated imports from various analytics submodules into the module's namespace.
apps/analytics/src/modules/aggregated_analytics/__init__.py: Added imports for new analytics functions.
apps/analytics/src/modules/aggregated_analytics/aggregate_participant_analytics.py: Introduced aggregate_participant_analytics function for aggregating participant analytics.
apps/analytics/src/modules/aggregated_analytics/compute_aggregated_analytics.py: Added compute_aggregated_analytics function to load and aggregate participant analytics data.
apps/analytics/src/modules/aggregated_analytics/load_participant_analytics.py: Introduced functions to convert analytics data into DataFrames and load participant analytics from the database.
apps/analytics/src/modules/aggregated_analytics/save_aggregated_analytics.py: Created save_aggregated_analytics function to save aggregated analytics data to the database.
apps/analytics/src/modules/aggregated_course_analytics/__init__.py: Added import for compute_weekday_activity function.
apps/analytics/src/modules/aggregated_course_analytics/compute_weekday_activity.py: Introduced compute_weekday_activity function for calculating weekly participant activity.
apps/analytics/src/modules/participant_analytics/__init__.py: Added imports for various participant analytics functions.
apps/analytics/src/modules/participant_analytics/aggregate_analytics.py: Created aggregate_analytics function for processing participant response data.
apps/analytics/src/modules/participant_analytics/compute_correctness.py: Introduced functions for evaluating correctness of responses.
apps/analytics/src/modules/participant_analytics/get_participant_responses.py: Added functions to retrieve and process participant response data.
apps/analytics/src/modules/participant_course_analytics/__init__.py: Added imports for course-related analytics functions.
apps/analytics/src/modules/participant_course_analytics/compute_participant_activity.py: Introduced compute_participant_activity function for calculating participant activity metrics.
apps/analytics/src/notebooks/: Multiple new Jupyter notebooks added for various analytics computations.
apps/frontend-control/next.config.mjs: Added new asynchronous redirects function for routing changes.
apps/frontend-control/src/components/: Updated various components to change URL paths from /sessions/... to /quizzes/....
apps/frontend-manage/src/pages/: Modified routing logic in multiple page components to reflect quiz-based navigation.
packages/i18n/messages/: Updated localization files to include new keys for quizzes and analytics.
packages/prisma/src/prisma/migrations/: Several migration scripts added or modified to update database schema for analytics tracking.
packages/prisma/src/prisma/schema/: Multiple schema updates to include new performance tracking fields and models.
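The per-course aggregation described in the walkthrough (rolling participant analytics up into aggregated course analytics) can be sketched with pandas. This is an illustrative sketch only: the column names follow those referenced in the review comments on aggregate_participant_analytics, the data is invented, and the actual implementation in the PR may aggregate differently.

```python
import pandas as pd

# Illustrative participant analytics rows (column names taken from the
# review discussion below; values are made up for demonstration).
df = pd.DataFrame(
    [
        {"courseId": "c1", "id": "p1", "responseCount": 10, "totalScore": 8.0, "totalPoints": 12.0, "totalXp": 100},
        {"courseId": "c1", "id": "p2", "responseCount": 4, "totalScore": 2.0, "totalPoints": 5.0, "totalXp": 40},
        {"courseId": "c2", "id": "p3", "responseCount": 6, "totalScore": 6.0, "totalPoints": 6.0, "totalXp": 60},
    ]
)

# Roll up per course: count distinct participants, sum the metrics.
df_agg = (
    df.groupby("courseId")
    .agg(
        participantCount=("id", "nunique"),
        responseCount=("responseCount", "sum"),
        totalScore=("totalScore", "sum"),
        totalPoints=("totalPoints", "sum"),
        totalXp=("totalXp", "sum"),
    )
    .reset_index()
)
print(df_agg)
```

Named aggregation keeps the output columns aligned with the database field names, which simplifies a subsequent upsert of one row per course.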


gitguardian bot commented Dec 5, 2024

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend revoking them.
Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately.
Find more information about the associated risks here.




sonarqubecloud bot commented Dec 5, 2024

Quality Gate failed

Failed conditions
Security Rating on New Code: E (required ≥ A)

See analysis details on SonarQube Cloud


@sjschlapbach sjschlapbach merged commit cb471c3 into v3-analytics Dec 5, 2024
15 of 17 checks passed
@sjschlapbach sjschlapbach deleted the participant-performance-analytics branch December 5, 2024 16:17

@coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

🛑 Comments failed to post (56)
packages/prisma/src/prisma/migrations/20240826113537_participant_analytics_correctness_counts/migration.sql (2)

4-6: ⚠️ Potential issue

Consider implementing a data migration strategy

The migration drops columns containing historical analytics data without preserving or transforming the existing values. This could lead to loss of important historical performance metrics.

Consider adding a data transformation step to convert the mean values into the new count-based format before dropping the columns. For example:

-- First, create new columns
ALTER TABLE "ParticipantAnalytics"
ADD COLUMN "firstCorrectCount" REAL,
ADD COLUMN "lastCorrectCount" REAL;

-- Then populate them with transformed data (example)
UPDATE "ParticipantAnalytics"
SET "firstCorrectCount" = "meanFirstCorrectCount" * total_attempts,
    "lastCorrectCount" = "meanLastCorrectCount" * total_attempts;

-- Finally drop old columns
ALTER TABLE "ParticipantAnalytics"
DROP COLUMN "meanFirstCorrectCount",
DROP COLUMN "meanLastCorrectCount";

13-16: 🛠️ Refactor suggestion

Consider using INTEGER type for count columns

The columns firstCorrectCount, firstWrongCount, lastCorrectCount, and lastWrongCount are designed to store count data but use the REAL type. Since these appear to be counting discrete occurrences, INTEGER might be more appropriate to prevent floating-point precision issues and ensure data integrity.

Consider modifying the column types:

-ADD COLUMN     "firstCorrectCount" REAL,
-ADD COLUMN     "firstWrongCount" REAL,
-ADD COLUMN     "lastCorrectCount" REAL,
-ADD COLUMN     "lastWrongCount" REAL;
+ADD COLUMN     "firstCorrectCount" INTEGER NOT NULL DEFAULT 0,
+ADD COLUMN     "firstWrongCount" INTEGER NOT NULL DEFAULT 0,
+ADD COLUMN     "lastCorrectCount" INTEGER NOT NULL DEFAULT 0,
+ADD COLUMN     "lastWrongCount" INTEGER NOT NULL DEFAULT 0;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

ADD COLUMN     "firstCorrectCount" INTEGER NOT NULL DEFAULT 0,
ADD COLUMN     "firstWrongCount" INTEGER NOT NULL DEFAULT 0,
ADD COLUMN     "lastCorrectCount" INTEGER NOT NULL DEFAULT 0,
ADD COLUMN     "lastWrongCount" INTEGER NOT NULL DEFAULT 0;
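The floating-point concern behind this suggestion is easy to reproduce. This small Python snippet is independent of the migration itself and only illustrates why discrete counts are safer stored as integers than as REAL:

```python
# Accumulating 0.1 ten times in binary floating point does not yield
# exactly 1.0 — the kind of drift that can corrupt REAL-typed counters.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)   # False: floating-point drift

# Integer accumulation is exact, so counts never drift.
count = 0
for _ in range(10):
    count += 1
print(count == 10)    # True
```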
apps/frontend-control/tsconfig.json (1)

20-20: 💡 Codebase verification

Consider removing the "~/*" path alias as it's currently unused

The codebase scan reveals that:

  • No imports are currently using the "~/*" path alias
  • Existing imports use either relative paths (../components/) or package imports (@klicker-uzh/)
  • The addition of an unused path alias may create confusion about the preferred import pattern

Recommendation: Either remove the unused "~/*" path alias or document its intended usage pattern if it's being added for future use.

🔗 Analysis chain

Verify consistent usage of the new path alias.

The addition of the "~/*" path alias is valid, but let's ensure it's used consistently across the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for mixed usage of direct 'src/' imports vs '~/' alias
# to ensure consistent import patterns

echo "Checking for direct src/ imports..."
rg "from\s+'\.\.*/src/" "apps/frontend-control/src"

echo "Checking for usage of new ~/ alias..."
rg "from\s+'~/" "apps/frontend-control/src"

Length of output: 267


Script:

#!/bin/bash
# Let's try a broader search pattern to catch all import variations
echo "Checking for any imports from src directory..."
rg -t ts -t tsx "from ['\"].*src/" apps/frontend-control/src

echo "Checking for any imports using ~ alias..."
rg -t ts -t tsx "from ['\"]~/" apps/frontend-control/src

# Let's also check import statements at the start of lines
echo "Checking for imports at line start..."
rg -t ts -t tsx "^import .* from ['\"]~/" apps/frontend-control/src

# Check if there are any relative path imports that could be simplified with ~
echo "Checking for deep relative imports..."
rg -t ts -t tsx "from ['\"]\.\./.+/" apps/frontend-control/src

Length of output: 742


Script:

#!/bin/bash
# Let's try without file type filters since tsx is not recognized
echo "Checking for any imports from src directory..."
rg "from ['\"].*src/" apps/frontend-control/src

echo "Checking for any imports using ~ alias..."
rg "from ['\"]~/" apps/frontend-control/src

# Let's also check import statements at line start
echo "Checking for imports at line start..."
rg "^import .* from ['\"]~/" apps/frontend-control/src

# Check if there are any relative path imports that could be simplified with ~
echo "Checking for deep relative imports..."
rg "from ['\"]\.\./.+/" apps/frontend-control/src

# Let's also check if the src directory exists and list its contents
echo "Checking src directory structure..."
ls -R apps/frontend-control/src

Length of output: 3787

packages/prisma/src/prisma/migrations/20240827152536_computed_at_analytics/migration.sql (2)

1-11: ⚠️ Potential issue

Critical: Migration will fail for non-empty tables

The migration attempts to add required updatedAt columns without default values to multiple tables. This will fail if any of these tables contain existing records.

Consider this safer approach:

  1. First migration:
-- Add columns as nullable
ALTER TABLE "AggregatedAnalytics" ADD COLUMN "updatedAt" TIMESTAMP(3);
-- Repeat for other tables

-- Set initial values
UPDATE "AggregatedAnalytics" SET "updatedAt" = CURRENT_TIMESTAMP;
-- Repeat for other tables
  2. Second migration:
-- Make columns NOT NULL
ALTER TABLE "AggregatedAnalytics" ALTER COLUMN "updatedAt" SET NOT NULL;
-- Repeat for other tables

13-15: 🛠️ Refactor suggestion

Consider using consistent datetime types

computedAt uses DATE while createdAt/updatedAt use TIMESTAMP(3). This inconsistency might cause issues:

  • DATE loses time information
  • Different timezone handling between types
  • Potential comparison issues in queries

Consider using TIMESTAMP(3) consistently:

-"computedAt" DATE NOT NULL DEFAULT CURRENT_TIMESTAMP,
+"computedAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

Also applies to: 30-32

packages/prisma/src/prisma/migrations/20241204165711_update_daily_activity_types/migration.sql (1)

1-8: ⚠️ Potential issue

⚠️ Review data migration strategy before proceeding

The warnings indicate potential data loss and constraints that could cause migration failure:

  1. All data in participantCount will be lost
  2. Adding non-nullable courseParticipantCount requires empty table
  3. Unique constraint on courseId may fail if duplicates exist

Consider the following safer migration approach:

  1. First migration:
    • Add nullable courseParticipantCount
    • Copy data from participantCount to courseParticipantCount
  2. Second migration:
    • Make courseParticipantCount non-nullable
    • Drop participantCount
  3. Third migration:
    • Add unique constraint after cleaning duplicates

Would you like me to help generate these separate migration files?

apps/frontend-manage/src/pages/analytics/index.tsx (1)

4-6: ⚠️ Potential issue

Implementation appears incomplete for participant course performance analytics

The current implementation is a placeholder and doesn't align with the PR objective of adding computation logic for participant course performance. The component should include:

  • Data fetching for participant course data
  • Performance computation visualization
  • Interactive analytics dashboard elements

Would you like help implementing the participant performance analytics features? I can help create:

  1. Data fetching hooks for course performance
  2. Performance computation components
  3. Analytics visualization components
apps/analytics/src/modules/participant_analytics/compute_participant_course_analytics.py (1)

19-19: 🛠️ Refactor suggestion

Remove redundant assignment to course_id

The variable course_id is assigned twice. The second assignment at line 19 is unnecessary since course_id was already assigned at line 16. Removing the redundant assignment improves code clarity.

Apply this diff to remove the redundant assignment:

-        course_id = course["id"]

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/modules/participant_analytics/save_participant_analytics.py (1)

35-36: ⚠️ Potential issue

Specify update data in upsert operation

In the upsert operation for analytics types "DAILY", "WEEKLY", and "MONTHLY", the update field is empty. This means that if a record already exists, it won't be updated with new data. To ensure existing records are updated, specify the fields that need to be updated.

Apply this diff to include the update data:

                     "update": {
+                        "computedAt": computedAt,
+                        "trialsCount": row["trialsCount"],
+                        "responseCount": row["responseCount"],
+                        "totalScore": row["totalScore"],
+                        "totalPoints": row["totalPoints"],
+                        "totalXp": row["totalXp"],
+                        "meanCorrectCount": row["meanCorrectCount"],
+                        "meanPartialCorrectCount": row["meanPartialCount"],
+                        "meanWrongCount": row["meanWrongCount"],
                     },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

                    "update": {
                        "computedAt": computedAt,
                        "trialsCount": row["trialsCount"],
                        "responseCount": row["responseCount"],
                        "totalScore": row["totalScore"],
                        "totalPoints": row["totalPoints"],
                        "totalXp": row["totalXp"],
                        "meanCorrectCount": row["meanCorrectCount"],
                        "meanPartialCorrectCount": row["meanPartialCount"],
                        "meanWrongCount": row["meanWrongCount"],
                    },
apps/analytics/src/modules/aggregated_analytics/save_aggregated_analytics.py (2)

28-30: ⚠️ Potential issue

Address TODO for totalElementsAvailable calculation

There's a TODO comment indicating that totalElementsAvailable is set to an invalid value -1 and needs to be correctly computed for rolling updates in production code. This placeholder should be replaced with the actual computation to ensure accurate analytics.

Would you like assistance in implementing the logic to calculate totalElementsAvailable for rolling updates?


40-73: 🛠️ Refactor suggestion

Optimize data fetching for course elements

The current implementation fetches all practice quizzes and microlearnings along with their stacks and elements to compute totalElementsAvailable. This may lead to performance issues due to the large amount of data being retrieved.

Consider fetching only the count of elements rather than the entire elements data. You can modify the query to aggregate counts directly in the database.

Example:

# Modify the query to get counts instead of full data
course = db.course.find_unique_or_raise(
    where={"id": row["courseId"]},
    include={
        "practiceQuizzes": {
            "include": {
                "stacks": {
                    "select": {"elements": {"select": {"_count": True}}}
                }
            }
        },
        "microLearnings": {
            "include": {
                "stacks": {
                    "select": {"elements": {"select": {"_count": True}}}
                }
            }
        },
    },
)

Then, adjust the logic to sum up the counts without loading all element details.

apps/analytics/src/modules/aggregated_analytics/aggregate_participant_analytics.py (1)

3-7: 🛠️ Refactor suggestion

Add input validation for required columns.

The function assumes certain columns exist but doesn't validate their presence.

     if df_participant_analytics.empty:
         if verbose:
             print("No participant analytics to aggregate")
         return None
+
+    required_columns = {"courseId", "id", "responseCount", "totalScore", "totalPoints", "totalXp"}
+    missing_columns = required_columns - set(df_participant_analytics.columns)
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    if df_participant_analytics.empty:
        if verbose:
            print("No participant analytics to aggregate")
        return None

    required_columns = {"courseId", "id", "responseCount", "totalScore", "totalPoints", "totalXp"}
    missing_columns = required_columns - set(df_participant_analytics.columns)
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")
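The set-difference validation proposed above can be exercised standalone; the DataFrame here is illustrative, with the required-column set copied from the suggestion:

```python
import pandas as pd

required_columns = {"courseId", "id", "responseCount", "totalScore", "totalPoints", "totalXp"}

# A deliberately incomplete frame: three of the required columns are absent.
df = pd.DataFrame({"courseId": ["c1"], "id": ["p1"], "responseCount": [3]})

missing_columns = required_columns - set(df.columns)
print(sorted(missing_columns))  # ['totalPoints', 'totalScore', 'totalXp']
```

Because set difference ignores extra columns, the check stays valid even if the analytics query later returns additional fields.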
apps/analytics/src/modules/aggregated_analytics/load_participant_analytics.py (3)

4-10: 🛠️ Refactor suggestion

Add type hints and error handling to the conversion function.

The function should include type hints and handle potential conversion errors.

-def convert_to_df(analytics):
+from typing import List, Dict, Any
+
+def convert_to_df(analytics: List[Dict[str, Any]]) -> pd.DataFrame:
+    """Convert database analytics records to a pandas DataFrame.
+
+    Args:
+        analytics (List[Dict[str, Any]]): List of analytics records
+
+    Returns:
+        pd.DataFrame: Converted DataFrame
+    """
     # convert the database query result into a pandas dataframe
-    rows = []
-    for item in analytics:
-        rows.append(dict(item))
+    try:
+        return pd.DataFrame([dict(item) for item in analytics])
+    except (AttributeError, TypeError) as e:
+        raise ValueError(f"Failed to convert analytics to DataFrame: {e}")
-    return pd.DataFrame(rows)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from typing import List, Dict, Any

def convert_to_df(analytics: List[Dict[str, Any]]) -> pd.DataFrame:
    """Convert database analytics records to a pandas DataFrame.

    Args:
        analytics (List[Dict[str, Any]]): List of analytics records

    Returns:
        pd.DataFrame: Converted DataFrame
    """
    # convert the database query result into a pandas dataframe
    try:
        return pd.DataFrame([dict(item) for item in analytics])
    except (AttributeError, TypeError) as e:
        raise ValueError(f"Failed to convert analytics to DataFrame: {e}")

13-16: 🛠️ Refactor suggestion

Add error handling for database operations.

The function should handle potential database errors gracefully.

+from prisma.models import ParticipantAnalytics
+from prisma import Prisma
+
-def load_participant_analytics(db, timestamp, analytics_type, verbose=False):
+def load_participant_analytics(
+    db: Prisma,
+    timestamp: str,
+    analytics_type: str,
+    verbose: bool = False
+) -> pd.DataFrame:
+    """Load participant analytics from the database.
+
+    Args:
+        db (Prisma): Database connection
+        timestamp (str): Timestamp to filter analytics
+        analytics_type (str): Type of analytics to load
+        verbose (bool, optional): Enable verbose output. Defaults to False.
+
+    Returns:
+        pd.DataFrame: Loaded analytics data
+    """
+    try:
         participant_analytics = db.participantanalytics.find_many(
             where={"timestamp": timestamp, "type": analytics_type},
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to load participant analytics: {e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from prisma.models import ParticipantAnalytics
from prisma import Prisma

def load_participant_analytics(
    db: Prisma,
    timestamp: str,
    analytics_type: str,
    verbose: bool = False
) -> pd.DataFrame:
    """Load participant analytics from the database.

    Args:
        db (Prisma): Database connection
        timestamp (str): Timestamp to filter analytics
        analytics_type (str): Type of analytics to load
        verbose (bool, optional): Enable verbose output. Defaults to False.

    Returns:
        pd.DataFrame: Loaded analytics data
    """
    try:
        participant_analytics = db.participantanalytics.find_many(
            where={"timestamp": timestamp, "type": analytics_type},
        )
    except Exception as e:
        raise RuntimeError(f"Failed to load participant analytics: {e}")

20-24: ⚠️ Potential issue

Fix undefined variables in format string.

The format string references undefined variables start_date and end_date.

         print(
             "Found {} analytics for the timespan from {} to {}".format(
-                len(participant_analytics), start_date, end_date
+                len(participant_analytics), timestamp, analytics_type
             )
         )

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.0)

22-22: Undefined name start_date

(F821)


22-22: Undefined name end_date

(F821)

apps/analytics/src/modules/participant_course_analytics/save_participant_course_analytics.py (2)

1-2: 🛠️ Refactor suggestion

Add type hints and input validation.

The function parameters lack type hints and input validation. Consider adding type hints and validating the DataFrame structure before processing.

-def save_participant_course_analytics(db, df_activity):
+from pandas import DataFrame
+from prisma import Prisma
+
+def save_participant_course_analytics(db: Prisma, df_activity: DataFrame) -> None:
+    required_columns = ['courseId', 'participantId', 'activeWeeks', 'activeDaysPerWeek', 
+                       'meanElementsPerDay', 'activityLevel']
+    if not all(col in df_activity.columns for col in required_columns):
+        raise ValueError(f"DataFrame missing required columns: {required_columns}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from pandas import DataFrame
from prisma import Prisma

def save_participant_course_analytics(db: Prisma, df_activity: DataFrame) -> None:
    required_columns = ['courseId', 'participantId', 'activeWeeks', 'activeDaysPerWeek', 
                       'meanElementsPerDay', 'activityLevel']
    if not all(col in df_activity.columns for col in required_columns):
        raise ValueError(f"DataFrame missing required columns: {required_columns}")
    for _, row in df_activity.iterrows():

2-26: ⚠️ Potential issue

Add error handling and logging mechanism.

The database operations lack error handling and logging. Consider wrapping the upsert operation in a try-catch block and adding logging.

+import logging
+from prisma.errors import PrismaError
+
 def save_participant_course_analytics(db: Prisma, df_activity: DataFrame) -> None:
+    logger = logging.getLogger(__name__)
     for _, row in df_activity.iterrows():
-        db.participantcourseanalytics.upsert(
-            where={
+        try:
+            db.participantcourseanalytics.upsert(
+                where={
+                    "courseId_participantId": {
+                        "courseId": row["courseId"],
+                        "participantId": row["participantId"],
+                    }
+                },
+                data={
+                    "create": {
+                        "activeWeeks": row["activeWeeks"],
+                        "activeDaysPerWeek": row["activeDaysPerWeek"],
+                        "meanElementsPerDay": row["meanElementsPerDay"],
+                        "activityLevel": row["activityLevel"],
+                        "course": {"connect": {"id": row["courseId"]}},
+                        "participant": {"connect": {"id": row["participantId"]}},
+                    },
+                    "update": {
+                        "activeWeeks": row["activeWeeks"],
+                        "activeDaysPerWeek": row["activeDaysPerWeek"],
+                        "meanElementsPerDay": row["meanElementsPerDay"],
+                        "activityLevel": row["activityLevel"],
+                    },
+                },
+            )
+        except PrismaError as e:
+            logger.error(f"Failed to upsert analytics for course {row['courseId']} and participant {row['participantId']}: {str(e)}")
+            raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    import logging
    from prisma.errors import PrismaError

    def save_participant_course_analytics(db: Prisma, df_activity: DataFrame) -> None:
        logger = logging.getLogger(__name__)
        for _, row in df_activity.iterrows():
            try:
                db.participantcourseanalytics.upsert(
                    where={
                        "courseId_participantId": {
                            "courseId": row["courseId"],
                            "participantId": row["participantId"],
                        }
                    },
                    data={
                        "create": {
                            "activeWeeks": row["activeWeeks"],
                            "activeDaysPerWeek": row["activeDaysPerWeek"],
                            "meanElementsPerDay": row["meanElementsPerDay"],
                            "activityLevel": row["activityLevel"],
                            "course": {"connect": {"id": row["courseId"]}},
                            "participant": {"connect": {"id": row["participantId"]}},
                        },
                        "update": {
                            "activeWeeks": row["activeWeeks"],
                            "activeDaysPerWeek": row["activeDaysPerWeek"],
                            "meanElementsPerDay": row["meanElementsPerDay"],
                            "activityLevel": row["activityLevel"],
                        },
                    },
                )
            except PrismaError as e:
                logger.error(f"Failed to upsert analytics for course {row['courseId']} and participant {row['participantId']}: {str(e)}")
                raise
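One follow-up on the suggestion above: logging.getLogger(__name__) emits nothing visible unless a handler is configured somewhere in the application. A minimal, self-contained sketch of the wiring (the logger name and message are illustrative, not taken from the PR):

```python
import io
import logging

# Route the module's error logs to an in-memory stream for demonstration;
# a real deployment would use logging.basicConfig or a file/stdout handler.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("analytics.save_participant_course_analytics")
logger.addHandler(handler)
logger.setLevel(logging.ERROR)

logger.error(
    "Failed to upsert analytics for course %s and participant %s", "c1", "p1"
)
print(stream.getvalue().strip())
```

In the analytics scripts this configuration would typically happen once at the entry point, so every module-level logger inherits it.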
apps/frontend-control/next.config.mjs (2)

9-37: 💡 Codebase verification

Found inconsistencies in session path usage

There are still active references to /session/ (singular) paths in the codebase while the redirects are set up for /sessions/ (plural):

  • LiveQuizLists.tsx: Uses link={/session/${quiz.id}}
  • StartModal.tsx: Uses router.push(/session/${quizId})

These paths won't be handled by the current redirect rules, potentially leading to 404 errors. Either:

  1. Update the redirect rules to include singular form (/session/) paths, or
  2. Update the components to use the new plural form (/quizzes/) directly
🔗 Analysis chain

Verify SEO impact of permanent redirects.

The use of permanent (301) redirects will inform search engines to permanently update their indexes. Ensure this is the intended behavior as it may impact SEO.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to old session paths in the codebase

# Search for any remaining references to /sessions in components, pages, or documentation
echo "Checking for remaining references to /sessions paths..."
rg -g '!{*.test.*,*.spec.*}' -g '!next.config.*' "/sessions[/\"']" ./apps/frontend-control/

# Check if there are any external links pointing to the old paths
echo "Checking for external links in documentation..."
rg -g '*.md' -g '*.mdx' "\[.*\]\(.*/sessions[/\"')]" .

Length of output: 238


Script:

#!/bin/bash
# Let's try a different approach to search for session-related paths

# Search for any remaining references to /sessions in components and pages
echo "Checking for remaining references to session paths..."
rg -g '!{*.test.*,*.spec.*,*.config.*}' '/sessions?' ./apps/frontend-control/

# Check for any router or navigation related code using sessions
echo "Checking for router/navigation code with sessions..."
rg -g '!{*.test.*,*.spec.*,*.config.*}' -i '(router|navigation|link).*session' ./apps/frontend-control/

# Look for any API endpoints or backend routes using sessions
echo "Checking for API endpoints with sessions..."
rg -g '!{*.test.*,*.spec.*,*.config.*}' '(api|route).*session' ./apps/frontend-control/

Length of output: 1174


9-37: 💡 Codebase verification

Found inconsistency between redirects and actual usage of session paths

The redirects are configured for /sessions/* but the codebase actually uses /session/* (singular) in components:

  • LiveQuizLists.tsx: uses /session/${quiz.id}
  • StartModal.tsx: uses /session/${quiz.id}

This mismatch means the redirects won't catch the actual paths being used in the application, potentially leading to 404 errors.

🔗 Analysis chain

Verify route coverage for all session-related paths.

Ensure all session-related paths are covered by the redirects to prevent 404 errors.


Let me gather more information by searching for any session-related references in the codebase using different patterns.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any session-related routes that might be missed in the redirects

# Search for route definitions in pages directory
echo "Checking for route definitions in pages directory..."
fd -e tsx -e ts -e js -e jsx . ./apps/frontend-control/src/pages \
  --exec grep -l "sessions"

# Search for router.push or Link components with session paths
echo "Checking for router navigation with session paths..."
ast-grep --pattern 'router.push("/sessions' ./apps/frontend-control/
ast-grep --pattern '<Link href="/sessions' ./apps/frontend-control/

Length of output: 461


Script:

#!/bin/bash
# Search for any session-related paths or routes in the codebase
echo "Searching for session-related paths in the codebase..."
rg -i "/sessions?/" ./apps/frontend-control/

# Search for any session-related strings in components and utilities
echo "Searching for session references in components..."
rg -i "sessions?" ./apps/frontend-control/src/ -g "!*.test.*" -g "!*.spec.*"

# Search for any router or navigation related code with sessions
echo "Searching for router/navigation patterns with sessions..."
ast-grep --pattern 'useRouter().push($path)' ./apps/frontend-control/
ast-grep --pattern 'navigate($path)' ./apps/frontend-control/

Length of output: 1763

apps/analytics/src/modules/participant_analytics/compute_participant_analytics.py (2)

7-9: 🛠️ Refactor suggestion

Add docstring and type hints.

The function lacks documentation and type hints. Add a comprehensive docstring explaining parameters, return type, and possible exceptions.

+from datetime import datetime
+from prisma import Prisma
+import pandas as pd
+
 def compute_participant_analytics(
-    db, start_date, end_date, timestamp, analytics_type="DAILY", verbose=False
+    db: Prisma,
+    start_date: datetime,
+    end_date: datetime,
+    timestamp: datetime,
+    analytics_type: str = "DAILY",
+    verbose: bool = False
+) -> None:
+    """Compute and save participant analytics for a given date range.
+
+    Args:
+        db: Database connection instance
+        start_date: Start date for analytics computation
+        end_date: End date for analytics computation
+        timestamp: Timestamp for the analytics record
+        analytics_type: Type of analytics (default: "DAILY")
+        verbose: Enable verbose logging (default: False)
+
+    Raises:
+        ValueError: If date range is invalid or analytics_type is unsupported
+    """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from datetime import datetime
from prisma import Prisma
import pandas as pd

def compute_participant_analytics(
    db: Prisma,
    start_date: datetime,
    end_date: datetime,
    timestamp: datetime,
    analytics_type: str = "DAILY",
    verbose: bool = False
) -> None:
    """Compute and save participant analytics for a given date range.

    Args:
        db: Database connection instance
        start_date: Start date for analytics computation
        end_date: End date for analytics computation
        timestamp: Timestamp for the analytics record
        analytics_type: Type of analytics (default: "DAILY")
        verbose: Enable verbose logging (default: False)

    Raises:
        ValueError: If date range is invalid or analytics_type is unsupported
    """

10-20: ⚠️ Potential issue

Improve error handling and input validation.

Add input validation and enhance error handling for the data processing pipeline.

+    # Validate input parameters
+    if start_date > end_date:
+        raise ValueError("start_date must be before or equal to end_date")
+    
+    valid_types = ["DAILY", "WEEKLY", "MONTHLY"]
+    if analytics_type not in valid_types:
+        raise ValueError(f"analytics_type must be one of {valid_types}")
+
     df_details = get_participant_responses(db, start_date, end_date, verbose)
 
     # Compute the correctness of each question response detail
-    df_details, df_element_instances = compute_correctness(db, df_details, verbose)
+    try:
+        df_details, df_element_instances = compute_correctness(db, df_details, verbose)
+    except Exception as e:
+        if verbose:
+            print(f"Error computing correctness: {str(e)}")
+        raise

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/modules/aggregated_analytics/compute_aggregated_analytics.py (1)

31-34: 🛠️ Refactor suggestion

Add error handling for database operations.

The database operation could fail and should be wrapped in try-except to handle errors gracefully.

     # store the computed aggregated analytics in the database
     if df_aggregated_analytics is not None:
-        save_aggregated_analytics(
-            db, df_aggregated_analytics, timestamp, analytics_type
-        )
+        try:
+            save_aggregated_analytics(
+                db, df_aggregated_analytics, timestamp, analytics_type
+            )
+        except Exception as e:
+            print(f"Error saving aggregated analytics: {e}")
+            raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    if df_aggregated_analytics is not None:
        try:
            save_aggregated_analytics(
                db, df_aggregated_analytics, timestamp, analytics_type
            )
        except Exception as e:
            print(f"Error saving aggregated analytics: {e}")
            raise
apps/analytics/src/modules/participant_performance/save_participant_performance.py (1)

1-30: 🛠️ Refactor suggestion

Refactor to improve error handling and reduce code duplication.

Several improvements can be made to enhance the robustness and maintainability of this function:

  1. Add error handling for database operations
  2. Reduce duplication between create and update blocks
  3. Add type hints and docstring
  4. Validate input DataFrame columns
+from typing import Dict, Any
+import pandas as pd
+
-def save_participant_performance(db, df_performance, course_id):
+def save_participant_performance(db, df_performance: pd.DataFrame, course_id: str):
+    """Save participant performance metrics to the database.
+
+    Args:
+        db: Database connection object
+        df_performance: DataFrame containing performance metrics
+        course_id: ID of the course
+
+    Raises:
+        ValueError: If required columns are missing from the DataFrame
+    """
+    required_columns = [
+        "participantId", "firstErrorRate", "firstPerformance",
+        "lastErrorRate", "lastPerformance", "totalErrorRate", "totalPerformance"
+    ]
+    missing_columns = [col for col in required_columns if col not in df_performance.columns]
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")
+
+    # Define common data fields to avoid duplication
+    def get_performance_data(row: pd.Series) -> Dict[str, Any]:
+        return {
+            "firstErrorRate": row["firstErrorRate"],
+            "firstPerformance": row["firstPerformance"],
+            "lastErrorRate": row["lastErrorRate"],
+            "lastPerformance": row["lastPerformance"],
+            "totalErrorRate": row["totalErrorRate"],
+            "totalPerformance": row["totalPerformance"],
+        }
+
     for _, row in df_performance.iterrows():
-        db.participantperformance.upsert(
-            where={
-                "participantId_courseId": {
-                    "participantId": row["participantId"],
-                    "courseId": course_id,
-                }
-            },
-            data={
-                "create": {
-                    "firstErrorRate": row["firstErrorRate"],
-                    "firstPerformance": row["firstPerformance"],
-                    "lastErrorRate": row["lastErrorRate"],
-                    "lastPerformance": row["lastPerformance"],
-                    "totalErrorRate": row["totalErrorRate"],
-                    "totalPerformance": row["totalPerformance"],
-                    "participant": {"connect": {"id": row["participantId"]}},
-                    "course": {"connect": {"id": course_id}},
+        try:
+            performance_data = get_performance_data(row)
+            db.participantperformance.upsert(
+                where={
+                    "participantId_courseId": {
+                        "participantId": row["participantId"],
+                        "courseId": course_id,
+                    }
                 },
-                "update": {
-                    "firstErrorRate": row["firstErrorRate"],
-                    "firstPerformance": row["firstPerformance"],
-                    "lastErrorRate": row["lastErrorRate"],
-                    "lastPerformance": row["lastPerformance"],
-                    "totalErrorRate": row["totalErrorRate"],
-                    "totalPerformance": row["totalPerformance"],
+                data={
+                    "create": {
+                        **performance_data,
+                        "participant": {"connect": {"id": row["participantId"]}},
+                        "course": {"connect": {"id": course_id}},
+                    },
+                    "update": performance_data,
                 },
-            },
-        )
+            )
+        except Exception as e:
+            print(f"Error upserting performance data for participant {row['participantId']}: {e}")
+            raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from typing import Dict, Any
import pandas as pd

def save_participant_performance(db, df_performance: pd.DataFrame, course_id: str):
    """Save participant performance metrics to the database.

    Args:
        db: Database connection object
        df_performance: DataFrame containing performance metrics
        course_id: ID of the course

    Raises:
        ValueError: If required columns are missing from the DataFrame
    """
    required_columns = [
        "participantId", "firstErrorRate", "firstPerformance",
        "lastErrorRate", "lastPerformance", "totalErrorRate", "totalPerformance"
    ]
    missing_columns = [col for col in required_columns if col not in df_performance.columns]
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")

    # Define common data fields to avoid duplication
    def get_performance_data(row: pd.Series) -> Dict[str, Any]:
        return {
            "firstErrorRate": row["firstErrorRate"],
            "firstPerformance": row["firstPerformance"],
            "lastErrorRate": row["lastErrorRate"],
            "lastPerformance": row["lastPerformance"],
            "totalErrorRate": row["totalErrorRate"],
            "totalPerformance": row["totalPerformance"],
        }

    for _, row in df_performance.iterrows():
        try:
            performance_data = get_performance_data(row)
            db.participantperformance.upsert(
                where={
                    "participantId_courseId": {
                        "participantId": row["participantId"],
                        "courseId": course_id,
                    }
                },
                data={
                    "create": {
                        **performance_data,
                        "participant": {"connect": {"id": row["participantId"]}},
                        "course": {"connect": {"id": course_id}},
                    },
                    "update": performance_data,
                },
            )
        except Exception as e:
            print(f"Error upserting performance data for participant {row['participantId']}: {e}")
            raise
apps/analytics/src/modules/participant_performance/compute_performance_levels.py (1)

1-39: 🛠️ Refactor suggestion

Enhance robustness and maintainability of performance level computation.

Several improvements can be made:

  1. Add type hints and docstring
  2. Extract magic numbers as constants
  3. Define performance levels as enums
  4. Add input validation
  5. Handle edge cases
+from enum import Enum
+import pandas as pd
+
+class PerformanceLevel(str, Enum):
+    HIGH = "HIGH"
+    MEDIUM = "MEDIUM"
+    LOW = "LOW"
+
+# Quantile thresholds for performance levels
+Q1_THRESHOLD = 0.25
+Q3_THRESHOLD = 0.75
+
-def compute_performance_levels(df_performance):
+def compute_performance_levels(df_performance: pd.DataFrame) -> pd.DataFrame:
+    """Compute performance levels based on error rate quantiles.
+
+    Args:
+        df_performance: DataFrame containing error rate metrics
+
+    Returns:
+        DataFrame with added performance level columns
+
+    Raises:
+        ValueError: If required columns are missing or DataFrame is empty
+    """
+    required_columns = ["firstErrorRate", "lastErrorRate", "totalErrorRate"]
+    missing_columns = [col for col in required_columns if col not in df_performance.columns]
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")
+
+    if df_performance.empty:
+        raise ValueError("Cannot compute performance levels on empty DataFrame")
+
     # set the performance levels based on the quantiles
-    first_qs = df_performance.firstErrorRate.quantile([0.25, 0.75])
-    last_qs = df_performance.lastErrorRate.quantile([0.25, 0.75])
-    total_qs = df_performance.totalErrorRate.quantile([0.25, 0.75])
+    first_qs = df_performance.firstErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])
+    last_qs = df_performance.lastErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])
+    total_qs = df_performance.totalErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])

-    first_q1 = first_qs[0.25]
-    first_q3 = first_qs[0.75]
-    last_q1 = last_qs[0.25]
-    last_q3 = last_qs[0.75]
-    total_q1 = total_qs[0.25]
-    total_q3 = total_qs[0.75]
+    first_q1 = first_qs[Q1_THRESHOLD]
+    first_q3 = first_qs[Q3_THRESHOLD]
+    last_q1 = last_qs[Q1_THRESHOLD]
+    last_q3 = last_qs[Q3_THRESHOLD]
+    total_q1 = total_qs[Q1_THRESHOLD]
+    total_q3 = total_qs[Q3_THRESHOLD]

     # set the performance levels based on the quantiles (inverse logic compared to activity - higher error rate is worse)
-    df_performance["firstPerformance"] = "MEDIUM"
+    df_performance["firstPerformance"] = PerformanceLevel.MEDIUM
     df_performance.loc[
-        df_performance.firstErrorRate <= first_q1, "firstPerformance"
-    ] = "HIGH"
+        df_performance.firstErrorRate <= first_q1, "firstPerformance"
+    ] = PerformanceLevel.HIGH
     df_performance.loc[
-        df_performance.firstErrorRate >= first_q3, "firstPerformance"
-    ] = "LOW"
+        df_performance.firstErrorRate >= first_q3, "firstPerformance"
+    ] = PerformanceLevel.LOW

-    df_performance["lastPerformance"] = "MEDIUM"
-    df_performance.loc[df_performance.lastErrorRate <= last_q1, "lastPerformance"] = (
-        "HIGH"
-    )
-    df_performance.loc[df_performance.lastErrorRate >= last_q3, "lastPerformance"] = (
-        "LOW"
-    )
+    df_performance["lastPerformance"] = PerformanceLevel.MEDIUM
+    df_performance.loc[
+        df_performance.lastErrorRate <= last_q1, "lastPerformance"
+    ] = PerformanceLevel.HIGH
+    df_performance.loc[
+        df_performance.lastErrorRate >= last_q3, "lastPerformance"
+    ] = PerformanceLevel.LOW

-    df_performance["totalPerformance"] = "MEDIUM"
+    df_performance["totalPerformance"] = PerformanceLevel.MEDIUM
     df_performance.loc[
-        df_performance.totalErrorRate <= total_q1, "totalPerformance"
-    ] = "HIGH"
+        df_performance.totalErrorRate <= total_q1, "totalPerformance"
+    ] = PerformanceLevel.HIGH
     df_performance.loc[
-        df_performance.totalErrorRate >= total_q3, "totalPerformance"
-    ] = "LOW"
+        df_performance.totalErrorRate >= total_q3, "totalPerformance"
+    ] = PerformanceLevel.LOW

     return df_performance
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from enum import Enum
import pandas as pd

class PerformanceLevel(str, Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

# Quantile thresholds for performance levels
Q1_THRESHOLD = 0.25
Q3_THRESHOLD = 0.75

def compute_performance_levels(df_performance: pd.DataFrame) -> pd.DataFrame:
    """Compute performance levels based on error rate quantiles.

    Args:
        df_performance: DataFrame containing error rate metrics

    Returns:
        DataFrame with added performance level columns

    Raises:
        ValueError: If required columns are missing or DataFrame is empty
    """
    required_columns = ["firstErrorRate", "lastErrorRate", "totalErrorRate"]
    missing_columns = [col for col in required_columns if col not in df_performance.columns]
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")

    if df_performance.empty:
        raise ValueError("Cannot compute performance levels on empty DataFrame")

    # set the performance levels based on the quantiles
    first_qs = df_performance.firstErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])
    last_qs = df_performance.lastErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])
    total_qs = df_performance.totalErrorRate.quantile([Q1_THRESHOLD, Q3_THRESHOLD])

    first_q1 = first_qs[Q1_THRESHOLD]
    first_q3 = first_qs[Q3_THRESHOLD]
    last_q1 = last_qs[Q1_THRESHOLD]
    last_q3 = last_qs[Q3_THRESHOLD]
    total_q1 = total_qs[Q1_THRESHOLD]
    total_q3 = total_qs[Q3_THRESHOLD]

    # set the performance levels based on the quantiles (inverse logic compared to activity - higher error rate is worse)
    df_performance["firstPerformance"] = PerformanceLevel.MEDIUM
    df_performance.loc[
        df_performance.firstErrorRate <= first_q1, "firstPerformance"
    ] = PerformanceLevel.HIGH
    df_performance.loc[
        df_performance.firstErrorRate >= first_q3, "firstPerformance"
    ] = PerformanceLevel.LOW

    df_performance["lastPerformance"] = PerformanceLevel.MEDIUM
    df_performance.loc[
        df_performance.lastErrorRate <= last_q1, "lastPerformance"
    ] = PerformanceLevel.HIGH
    df_performance.loc[
        df_performance.lastErrorRate >= last_q3, "lastPerformance"
    ] = PerformanceLevel.LOW

    df_performance["totalPerformance"] = PerformanceLevel.MEDIUM
    df_performance.loc[
        df_performance.totalErrorRate <= total_q1, "totalPerformance"
    ] = PerformanceLevel.HIGH
    df_performance.loc[
        df_performance.totalErrorRate >= total_q3, "totalPerformance"
    ] = PerformanceLevel.LOW

    return df_performance
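The quantile bucketing above can be sanity-checked on a toy series (the error-rate values are made up):

```python
import pandas as pd

# Hypothetical error rates for eight participants, already sorted for readability
df = pd.DataFrame({"totalErrorRate": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9]})

# Same logic as the suggestion: quartile thresholds with inverse mapping
q1, q3 = df.totalErrorRate.quantile([0.25, 0.75])

df["totalPerformance"] = "MEDIUM"
df.loc[df.totalErrorRate <= q1, "totalPerformance"] = "HIGH"  # low error rate -> high performance
df.loc[df.totalErrorRate >= q3, "totalPerformance"] = "LOW"

print(df["totalPerformance"].tolist())
# → ['HIGH', 'HIGH', 'MEDIUM', 'MEDIUM', 'MEDIUM', 'MEDIUM', 'LOW', 'LOW']
```

With linear interpolation (the pandas default) the thresholds land at 0.175 and 0.525, so exactly the two lowest and the two highest error rates leave the MEDIUM band.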
apps/analytics/src/modules/participant_course_analytics/get_active_weeks.py (1)

4-30: 🛠️ Refactor suggestion

Consider batch processing for database queries

The current implementation queries the database for each participant individually within a loop, which could be inefficient for courses with many participants.

Consider refactoring to use a single batch query:

-    for participation in participations:
-        participant_id = participation["participantId"]
-        weekly_analytics = db.participantanalytics.find_many(
-            where={
-                "type": "WEEKLY",
-                "courseId": course_id,
-                "participantId": participant_id,
-            },
-        )
+    participant_ids = [p["participantId"] for p in participations]
+    weekly_analytics = db.participantanalytics.find_many(
+        where={
+            "type": "WEEKLY",
+            "courseId": course_id,
+            "participantId": {"in": participant_ids},
+        },
+    )
+    
+    # Group by participantId and count entries
+    analytics_count = pd.DataFrame(weekly_analytics).groupby("participantId").size()

Committable suggestion skipped: line range outside the PR's diff.
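The counting step of the batched variant can be verified without a database; the rows below simulate the result of the single find_many call (field values are hypothetical):

```python
import pandas as pd

# Simulated result of one batched find_many over all participants
weekly_analytics = [
    {"participantId": "p1", "courseId": "c1"},
    {"participantId": "p1", "courseId": "c1"},
    {"participantId": "p2", "courseId": "c1"},
]

# One groupby replaces the per-participant queries and counting
analytics_count = pd.DataFrame(weekly_analytics).groupby("participantId").size()

print(analytics_count.to_dict())  # → {'p1': 2, 'p2': 1}
```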

apps/analytics/src/modules/participant_performance/compute_response_error_rates.py (2)

46-52: ⚠️ Potential issue

Add protection against division by zero

The error rate calculations could fail with division by zero for participants with no responses.

Add safety checks (note that np.where requires adding import numpy as np at the top of the module):

-    df_performance["firstErrorRate"] = (
-        df_performance["wrongFirstResponseCount"] / df_performance["responseCount"]
-    )
-    df_performance["lastErrorRate"] = (
-        df_performance["wrongLastResponseCount"] / df_performance["responseCount"]
-    )
+    # Safely compute error rates with zero division protection
+    df_performance["firstErrorRate"] = np.where(
+        df_performance["responseCount"] > 0,
+        df_performance["wrongFirstResponseCount"] / df_performance["responseCount"],
+        0
+    )
+    df_performance["lastErrorRate"] = np.where(
+        df_performance["responseCount"] > 0,
+        df_performance["wrongLastResponseCount"] / df_performance["responseCount"],
+        0
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    import numpy as np  # required for the np.where guards below

    # Safely compute error rates with zero division protection
    df_performance["firstErrorRate"] = np.where(
        df_performance["responseCount"] > 0,
        df_performance["wrongFirstResponseCount"] / df_performance["responseCount"],
        0
    )
    df_performance["lastErrorRate"] = np.where(
        df_performance["responseCount"] > 0,
        df_performance["wrongLastResponseCount"] / df_performance["responseCount"],
        0
    )

7-34: 🛠️ Refactor suggestion

Optimize DataFrame operations for better performance

Multiple group-by operations on the same DataFrame could be combined for better performance.

Consider this optimization:

-    df_response_count = (
-        df_responses.groupby("participantId").size().reset_index(name="responseCount")
-    )
-    df_first_response_wrong_count = (
-        df_responses[df_responses["firstResponseCorrectness"] == "WRONG"]
-        .groupby("participantId")
-        .size()
-        .reset_index(name="wrongFirstResponseCount")
-    )
-    df_last_response_wrong_count = (
-        df_responses[df_responses["lastResponseCorrectness"] == "WRONG"]
-        .groupby("participantId")
-        .size()
-        .reset_index(name="wrongLastResponseCount")
-    )
+    # Combine aggregations for better performance
+    df_aggregated = df_responses.groupby("participantId").agg(
+        responseCount=("participantId", "size"),
+        wrongFirstResponseCount=("firstResponseCorrectness", lambda x: (x == "WRONG").sum()),
+        wrongLastResponseCount=("lastResponseCorrectness", lambda x: (x == "WRONG").sum()),
+        totalErrorRate=("responseErrorRate", "mean")
+    ).reset_index()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    # compute the total number of responses, number of wrong first and last responses,
    # total number of wrong responses, and the average total error rate
    # Combine aggregations for better performance
    df_aggregated = df_responses.groupby("participantId").agg(
        responseCount=("participantId", "size"),
        wrongFirstResponseCount=("firstResponseCorrectness", lambda x: (x == "WRONG").sum()),
        wrongLastResponseCount=("lastResponseCorrectness", lambda x: (x == "WRONG").sum()),
        totalErrorRate=("responseErrorRate", "mean")
    ).reset_index()
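The combined agg call can be sanity-checked on a handful of rows (values are made up):

```python
import pandas as pd

df_responses = pd.DataFrame({
    "participantId": ["p1", "p1", "p2"],
    "firstResponseCorrectness": ["WRONG", "CORRECT", "WRONG"],
    "lastResponseCorrectness": ["CORRECT", "CORRECT", "WRONG"],
    "responseErrorRate": [0.5, 0.0, 1.0],
})

# Single pass over the grouped data instead of three separate groupby calls
df_aggregated = df_responses.groupby("participantId").agg(
    responseCount=("participantId", "size"),
    wrongFirstResponseCount=("firstResponseCorrectness", lambda x: (x == "WRONG").sum()),
    wrongLastResponseCount=("lastResponseCorrectness", lambda x: (x == "WRONG").sum()),
    totalErrorRate=("responseErrorRate", "mean"),
).reset_index()

print(df_aggregated.to_dict("records"))
```

For p1 this yields a responseCount of 2, one wrong first response, zero wrong last responses, and a mean error rate of 0.25, matching what the three separate groupbys would produce.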
apps/analytics/src/modules/participant_course_analytics/compute_participant_activity.py (1)

1-4: 🛠️ Refactor suggestion

Add input validation and error handling.

The function should validate input parameters to ensure robustness:

  • Verify that course_start is before course_end
  • Validate that df_activity contains required columns
  • Handle potential database connection issues

Example implementation:

 import pandas as pd
+from datetime import datetime
+
+def validate_course_dates(course_start: datetime, course_end: datetime) -> None:
+    if course_start > course_end:
+        raise ValueError("course_start must be before course_end")
 
 def compute_participant_activity(db, df_activity, course_id, course_start, course_end):
+    validate_course_dates(course_start, course_end)
+    required_columns = ["participantId"]
+    missing_columns = [col for col in required_columns if col not in df_activity.columns]
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")

Committable suggestion skipped: line range outside the PR's diff.
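As a rough sketch of what the activity metrics could look like once the inputs are validated (the column names and the week/day definitions here are assumptions for illustration, not the PR's actual implementation):

```python
import pandas as pd

# Hypothetical per-response activity records
df_activity = pd.DataFrame({
    "participantId": ["p1", "p1", "p1", "p2"],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-10", "2024-01-02"]),
})

df = df_activity.assign(
    week=df_activity.timestamp.dt.isocalendar().week,
    day=df_activity.timestamp.dt.date,
)

# Distinct ISO weeks with any activity per participant
active = df.groupby("participantId").agg(activeWeeks=("week", "nunique"))

# Mean number of distinct active days per active week
days_per_week = (
    df.groupby(["participantId", "week"])["day"].nunique()
    .groupby("participantId").mean()
    .rename("activeDaysPerWeek")
)

result = active.join(days_per_week).reset_index()
print(result.to_dict("records"))
```

Here p1 is active in two ISO weeks (two days in week 1, one day in week 2, i.e. 1.5 active days per active week) and p2 in one.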

apps/analytics/src/notebooks/aggregated_course_analytics.ipynb (3)

46-46: ⚠️ Potential issue

Security concern: Hardcoded database credentials.

The database connection string contains hardcoded credentials, which is a security risk.

Use environment variables instead:

-os.environ["DATABASE_URL"] = "postgresql://klicker:klicker@localhost:5432/klicker-prod"
+# Load from .env file or environment
+from dotenv import load_dotenv
+load_dotenv()
+if "DATABASE_URL" not in os.environ:
+    raise ValueError("DATABASE_URL environment variable is required")

Committable suggestion skipped: line range outside the PR's diff.


83-83: 🛠️ Refactor suggestion

Ensure database disconnection in case of errors.

The database disconnection should be guaranteed even if errors occur.

-db.disconnect()
+try:
+    # Your processing code here
+finally:
+    db.disconnect()

Committable suggestion skipped: line range outside the PR's diff.


66-74: 🛠️ Refactor suggestion

Add error handling for database operations.

The course processing loop lacks error handling for database operations and computation failures.

 # find all courses that started in the past
-df_courses = get_running_past_courses(db)
+try:
+    df_courses = get_running_past_courses(db)
+except Exception as e:
+    print(f"Failed to fetch courses: {e}")
+    raise

 # iterate over all courses and compute the participant course analytics
 for idx, course in df_courses.iterrows():
     print(f"Processing course", idx, "of", len(df_courses), "with id", course["id"])
+    try:
         # computation of activity per weekday
         compute_weekday_activity(db, course)
+    except Exception as e:
+        print(f"Failed to process course {course['id']}: {e}")
+        continue
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    # find all courses that started in the past
    try:
        df_courses = get_running_past_courses(db)
    except Exception as e:
        print(f"Failed to fetch courses: {e}")
        raise

    # iterate over all courses and compute the participant course analytics
    for idx, course in df_courses.iterrows():
        print(f"Processing course", idx, "of", len(df_courses), "with id", course["id"])
        try:
            # computation of activity per weekday
            compute_weekday_activity(db, course)
        except Exception as e:
            print(f"Failed to process course {course['id']}: {e}")
            continue
apps/frontend-control/src/components/liveQuizzes/EmbeddingModal.tsx (1)

32-34: 💡 Codebase verification


URL path change from /sessions/ to /quizzes/ appears consistent

The change from /sessions/ to /quizzes/ in the evaluation URL path is consistent with the architectural shift. The only remaining reference to /sessions/ is in the office add-in's URL validation pattern, which should be updated to match the new routing structure.

  • apps/office-addin/src/content/components/URLForm.tsx: Update validation pattern and placeholder URL to use /quizzes/ instead of /sessions/
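A minimal check of the renamed route can be expressed as a regex test; the pattern below is a hypothetical stand-in for the one in URLForm.tsx, written in Python purely for illustration:

```python
import re

# hypothetical validation pattern after the /sessions/ -> /quizzes/ rename
QUIZ_EVALUATION_URL = re.compile(r"/quizzes/[0-9a-fA-F-]+/evaluation")

new_url_ok = bool(QUIZ_EVALUATION_URL.search(
    "/quizzes/123e4567-e89b-12d3-a456-426614174000/evaluation"))
old_url_rejected = not QUIZ_EVALUATION_URL.search(
    "/sessions/123e4567-e89b-12d3-a456-426614174000/evaluation")
```

The actual TypeScript pattern and placeholder in the office add-in should be updated analogously.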
🔗 Analysis chain

Verify consistent routing path updates across the application

The URL path change from /sessions/${quizId} to /quizzes/${quizId} aligns with the architectural shift. Let's verify this change is consistent across the application.


Let me try a different search pattern to verify the routing changes.


Let me try one more search pattern to verify the routing changes.


Let me try one final search with a simpler pattern.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining references to /sessions/ in routing paths
rg -g '*.{ts,tsx,js,jsx}' '/sessions/[^"'\'']*evaluation'

# Search for new quiz routing pattern to confirm consistency
rg -g '*.{ts,tsx,js,jsx}' '/quizzes/[^"'\'']*evaluation'

Length of output: 65654


Script:

#!/bin/bash
# Search for any remaining references to /sessions/ in routing paths, being more specific
rg -g '*.{ts,tsx,js,jsx}' 'sessions/[^"'\'']*evaluation'

# Search for new quiz routing pattern to confirm consistency
rg -g '*.{ts,tsx,js,jsx}' 'quizzes/[^"'\'']*evaluation'

Length of output: 65652


Script:

#!/bin/bash
# Search for URL construction patterns with sessions and quizzes
rg -g '*.{ts,tsx,js,jsx}' -B 2 -A 2 '`/(?:sessions|quizzes)/\${[^}]+}/evaluation'

Length of output: 235


Script:

#!/bin/bash
# Search for evaluation URLs in the codebase
rg -g '*.{ts,tsx,js,jsx}' '/evaluation'

Length of output: 70488

apps/analytics/src/modules/participant_analytics/get_participant_responses.py (3)

34-50: 🛠️ Refactor suggestion

Refactor date handling to reduce duplication and improve maintainability

The current implementation has several issues:

  1. Magic date should be defined as a constant
  2. Duplicated code for date assignment
  3. No type hints for better maintainability
+from typing import Dict, Any
+from datetime import date
+
+FUTURE_DATE = date(9999, 12, 31)
+
-def set_course_dates(detail):
+def set_course_dates(detail: Dict[str, Any]) -> Dict[str, Any]:
+    def _set_dates_from_course(course: Dict[str, Any]) -> None:
+        detail["course_start_date"] = course["startDate"]
+        detail["course_end_date"] = course["endDate"]
+
     if detail["practiceQuiz"] is not None:
-        course = detail["practiceQuiz"]["course"]
-        detail["course_start_date"] = course["startDate"]
-        detail["course_end_date"] = course["endDate"]
+        _set_dates_from_course(detail["practiceQuiz"]["course"])
     elif detail["microLearning"] is not None:
-        course = detail["microLearning"]["course"]
-        detail["course_start_date"] = course["startDate"]
-        detail["course_end_date"] = course["endDate"]
+        _set_dates_from_course(detail["microLearning"]["course"])
     else:
-        detail["course_start_date"] = date(9999, 12, 31)
-        detail["course_end_date"] = date(9999, 12, 31)
+        detail["course_start_date"] = FUTURE_DATE
+        detail["course_end_date"] = FUTURE_DATE
📝 Committable suggestion


from typing import Dict, Any
from datetime import date

FUTURE_DATE = date(9999, 12, 31)

def set_course_dates(detail: Dict[str, Any]) -> Dict[str, Any]:
    def _set_dates_from_course(course: Dict[str, Any]) -> None:
        detail["course_start_date"] = course["startDate"]
        detail["course_end_date"] = course["endDate"]

    if detail["practiceQuiz"] is not None:
        _set_dates_from_course(detail["practiceQuiz"]["course"])
    elif detail["microLearning"] is not None:
        _set_dates_from_course(detail["microLearning"]["course"])
    else:
        detail["course_start_date"] = FUTURE_DATE
        detail["course_end_date"] = FUTURE_DATE

    return detail

5-11: ⚠️ Potential issue

Add error handling for missing courseId

The current implementation assumes either practiceQuiz or microLearning exists and has a courseId. This could raise a KeyError if neither exists.

def map_details(detail, participantId):
-    courseId = (
-        detail["practiceQuiz"]["courseId"]
-        if detail["practiceQuiz"]
-        else detail["microLearning"]["courseId"]
-    )
+    courseId = None
+    if detail.get("practiceQuiz"):
+        courseId = detail["practiceQuiz"].get("courseId")
+    elif detail.get("microLearning"):
+        courseId = detail["microLearning"].get("courseId")
+    if courseId is None:
+        raise ValueError("Neither practiceQuiz nor microLearning contains courseId")
📝 Committable suggestion


def map_details(detail, participantId):
    courseId = None
    if detail.get("practiceQuiz"):
        courseId = detail["practiceQuiz"].get("courseId")
    elif detail.get("microLearning"):
        courseId = detail["microLearning"].get("courseId")
    if courseId is None:
        raise ValueError("Neither practiceQuiz nor microLearning contains courseId")
    return {**detail, "participantId": participantId, "courseId": courseId}

52-92: 🛠️ Refactor suggestion

Enhance error handling and add type hints

Several improvements can be made to the main function:

  1. Add type hints for better maintainability
  2. Add error handling for empty results
  3. Improve verbose logging implementation
+from typing import Any, List
+from pandas import DataFrame
+import logging
+
-def get_participant_responses(db, start_date, end_date, verbose=False):
+def get_participant_responses(
+    db: Any,
+    start_date: date,
+    end_date: date,
+    verbose: bool = False
+) -> DataFrame:
+    logger = logging.getLogger(__name__)
+
     participant_response_details = db.participant.find_many(
         include={
             "detailQuestionResponses": {
                 "where": {"createdAt": {"gte": start_date, "lte": end_date}},
                 "include": {
                     "practiceQuiz": {"include": {"course": True}},
                     "microLearning": {"include": {"course": True}},
                 },
             },
         }
     )
+
+    if not participant_response_details:
+        logger.warning("No participant responses found for the given date range")
+        return DataFrame()

     if verbose:
-        print(
+        logger.info(
             "Found {} participants for the timespan from {} to {}".format(
                 len(participant_response_details), start_date, end_date
             )
         )
-        print(participant_response_details[0])
+        if participant_response_details:
+            logger.debug(f"First response: {participant_response_details[0]}")

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/notebooks/participant_performance.ipynb (2)

48-49: ⚠️ Potential issue

Security Risk: Hardcoded database credentials in source code

The database connection string contains hardcoded credentials which is a security risk. Consider using environment variables or a secure configuration management system.

-os.environ["DATABASE_URL"] = "postgresql://klicker:klicker@localhost:5432/klicker-prod"
+if "DATABASE_URL" not in os.environ:
+    raise ValueError("DATABASE_URL environment variable is required")
📝 Committable suggestion


"# set the environment variable DATABASE_URL to the connection string of your database\n",
if "DATABASE_URL" not in os.environ:
    raise ValueError("DATABASE_URL environment variable is required")

79-80: ⚠️ Potential issue

Add error handling for database operations

Database operations should include proper error handling to manage connection issues or query failures gracefully.

-    question_responses = db.questionresponse.find_many(where={"courseId": course_id})
-    df_responses = pd.DataFrame(list(map(lambda x: x.dict(), question_responses)))
+    try:
+        question_responses = db.questionresponse.find_many(where={"courseId": course_id})
+        df_responses = pd.DataFrame(list(map(lambda x: x.dict(), question_responses)))
+    except Exception as e:
+        print(f"Error fetching responses for course {course_id}: {str(e)}")
+        continue

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/modules/aggregated_course_analytics/compute_weekday_activity.py (3)

97-109: ⚠️ Potential issue

Fix potential bug in single_weekday_activity

The function appends 0 for empty weekdays but then unconditionally appends len(df_weekday) afterward, potentially double-counting.

def single_weekday_activity(weekdays, df_daily):
    collector = []
    for weekday in weekdays:
        df_weekday = df_daily[
            df_daily["timestamp"] == pd.Timestamp(weekday).tz_localize("UTC")
        ]

        if df_weekday.empty:
            collector.append(0)
-
-        collector.append(len(df_weekday))
+        else:
+            collector.append(len(df_weekday))

    return statistics.mean(collector) if len(collector) > 0 else 0
📝 Committable suggestion


def single_weekday_activity(weekdays, df_daily):
    collector = []
    for weekday in weekdays:
        df_weekday = df_daily[
            df_daily["timestamp"] == pd.Timestamp(weekday).tz_localize("UTC")
        ]

        if df_weekday.empty:
            collector.append(0)
        else:
            collector.append(len(df_weekday))

    return statistics.mean(collector) if len(collector) > 0 else 0

24-58: 🛠️ Refactor suggestion

Refactor repetitive date range generation

The code for generating date ranges is duplicated for each weekday. Consider refactoring this into a helper function.

+def generate_weekday_range(start_date, end_date, weekday):
+    return pd.date_range(
+        start=start_date,
+        end=pd.Timestamp(end_date) + pd.tseries.offsets.DateOffset(days=1),
+        freq=f"W-{weekday}",
+    )

-    mondays = pd.date_range(
-        start=course_start,
-        end=pd.Timestamp(course_end) + pd.tseries.offsets.DateOffset(days=1),
-        freq="W-MON",
-    )
+    mondays = generate_weekday_range(course_start, course_end, "MON")

Committable suggestion skipped: line range outside the PR's diff.


69-94: ⚠️ Potential issue

Add error handling for database operations

The upsert operation should include error handling to manage potential database failures.

-    db.aggregatedcourseanalytics.upsert(
+    try:
+        db.aggregatedcourseanalytics.upsert(
             where={"courseId": course_id},
             data={
                 "create": {
                     # ... existing create data ...
                 },
                 "update": {
                     # ... existing update data ...
                 },
             },
         )
+    except Exception as e:
+        logging.error(f"Failed to update analytics for course {course_id}: {str(e)}")
+        raise

Committable suggestion skipped: line range outside the PR's diff.

packages/prisma/src/prisma/migrations/20241205154359_learning_analytics_performance_progress/migration.sql (1)

7-12: 🛠️ Refactor suggestion

Consider adding constraints for error rate values.

The error rate fields should logically be between 0 and 1 (or 0% to 100%). Consider adding CHECK constraints to ensure data validity.

    "id" SERIAL NOT NULL,
-   "firstErrorRate" REAL NOT NULL,
+   "firstErrorRate" REAL NOT NULL CHECK ("firstErrorRate" >= 0 AND "firstErrorRate" <= 1),
-   "lastErrorRate" REAL NOT NULL,
+   "lastErrorRate" REAL NOT NULL CHECK ("lastErrorRate" >= 0 AND "lastErrorRate" <= 1),
-   "totalErrorRate" REAL NOT NULL,
+   "totalErrorRate" REAL NOT NULL CHECK ("totalErrorRate" >= 0 AND "totalErrorRate" <= 1),
📝 Committable suggestion


    "firstErrorRate" REAL NOT NULL CHECK ("firstErrorRate" >= 0 AND "firstErrorRate" <= 1),
    "firstPerformance" "PerformanceLevel" NOT NULL,
    "lastErrorRate" REAL NOT NULL CHECK ("lastErrorRate" >= 0 AND "lastErrorRate" <= 1),
    "lastPerformance" "PerformanceLevel" NOT NULL,
    "totalErrorRate" REAL NOT NULL CHECK ("totalErrorRate" >= 0 AND "totalErrorRate" <= 1),
    "totalPerformance" "PerformanceLevel" NOT NULL,
apps/analytics/src/notebooks/aggregated_analytics.ipynb (2)

43-43: ⚠️ Potential issue

Security: Avoid hardcoding database credentials

The database connection string contains hardcoded credentials. This is a security risk as sensitive information should not be stored in version control.

Consider using environment variables:

-os.environ["DATABASE_URL"] = "postgresql://klicker:klicker@localhost:5432/klicker-prod"
+if "DATABASE_URL" not in os.environ:
+    raise ValueError("DATABASE_URL environment variable is required")
📝 Committable suggestion


    if "DATABASE_URL" not in os.environ:
        raise ValueError("DATABASE_URL environment variable is required")

86-89: 🛠️ Refactor suggestion

Add error handling for compute_aggregated_analytics calls

The function call lacks try-catch blocks for error handling. Failed computations could silently fail or crash the notebook.

Consider adding error handling:

-        compute_aggregated_analytics(
-            db, day_start, day_end, timestamp, "DAILY", verbose
-        )
+        try:
+            compute_aggregated_analytics(
+                db, day_start, day_end, timestamp, "DAILY", verbose
+            )
+        except Exception as e:
+            print(f"Error computing daily analytics for {specific_date}: {str(e)}")
+            continue
📝 Committable suggestion


    "        timestamp = day_start\n",
    "        try:\n",
    "            compute_aggregated_analytics(\n",
    "                db, day_start, day_end, timestamp, \"DAILY\", verbose\n",
    "            )\n",
    "        except Exception as e:\n",
    "            print(f\"Error computing daily analytics for {specific_date}: {str(e)}\")\n",
    "            continue\n",
apps/analytics/src/notebooks/participant_analytics.ipynb (2)

41-55: 🛠️ Refactor suggestion

Extract common configuration to a shared module

The database connection setup and script settings are duplicated between notebooks. This violates the DRY principle and makes maintenance harder.

Consider creating a shared configuration module:

# config/analytics_config.py
import os
from prisma import Prisma

def get_db_connection():
    db = Prisma()
    db.connect()
    return db

def get_script_settings():
    return {
        'verbose': False,
        'compute_daily': True,
        'compute_weekly': True,
        'compute_monthly': True,
        'compute_course': True
    }

142-166: 🛠️ Refactor suggestion

Optimize database queries for large datasets

The current implementation fetches all courses at once and processes them in memory. This could cause performance issues with large datasets.

Consider implementing pagination and batch processing:

-    courses = db.course.find_many(
+    batch_size = 100
+    skip = 0
+    while True:
+        courses_batch = db.course.find_many(
+            take=batch_size,
+            skip=skip,
             where={
                 "startDate": {"lte": curr_date + "T23:59:59.999Z"},
-            }
+            },
+            order={"id": "asc"}
         )
+        if not courses_batch:
+            break
+        process_courses_batch(courses_batch)
+        skip += batch_size

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/notebooks/archive/test.ipynb (4)

35-41: ⚠️ Potential issue

Performance Concern: Large data fetch with arbitrary limit

Both queries use a fixed take=10000 limit without pagination, which could lead to memory issues with large datasets and potentially miss data beyond this limit.

Consider implementing proper pagination:

-    take=10000,
+    cursor=cursor,
+    take=PAGE_SIZE,

Also applies to: 52-58
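The cursor-based pagination hinted at above can be sketched as a small generic helper; `fetch_page`, `PAGE_SIZE`, and the in-memory stand-in below are hypothetical illustrations, not part of the Prisma client API:

```python
from typing import Callable, Iterator, List, Optional

PAGE_SIZE = 500  # hypothetical batch size

def paginate(fetch_page: Callable[[Optional[int], int], List[dict]],
             page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield records page by page, passing the last seen id as a cursor."""
    cursor = None
    while True:
        batch = fetch_page(cursor, page_size)
        if not batch:
            break
        yield from batch
        cursor = batch[-1]["id"]

# usage with an in-memory stand-in for a find_many-style query
records = [{"id": i} for i in range(1, 12)]

def fake_fetch(cursor, take):
    start = 0 if cursor is None else next(
        i for i, r in enumerate(records) if r["id"] == cursor
    ) + 1
    return records[start:start + take]

ids = [r["id"] for r in paginate(fake_fetch, page_size=4)]
```

Because the helper only depends on the `fetch_page` callable, the real database query can be swapped in without changing the pagination logic.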


24-24: ⚠️ Potential issue

Security Issue: Remove hardcoded database credentials

The database connection string contains hardcoded credentials, which is a security risk. Consider using environment variables for sensitive information.

Apply this change:

-os.environ['DATABASE_URL'] = 'postgresql://klicker:klicker@localhost:5432/klicker-prod'
+os.environ['DATABASE_URL'] = os.getenv('DATABASE_URL')
📝 Committable suggestion


    os.environ['DATABASE_URL'] = os.getenv('DATABASE_URL')

69-77: ⚠️ Potential issue

Enhance error handling in map_response_details function

The function assumes the presence of 'elementInstance' and 'elementStack' without validation, which could lead to KeyError exceptions.

Add proper error handling:

 def map_response_details(response_details):
     as_dict = response_details.dict()
+    try:
+        element_stack = as_dict.get('elementInstance', {}).get('elementStack')
+        if not element_stack:
+            raise ValueError("Missing required elementStack data")
 
         extended_dict = {
             **as_dict,
-            "elementStack": as_dict['elementInstance']['elementStack'],
+            "elementStack": element_stack,
         }
 
         return extended_dict
+    except Exception as e:
+        print(f"Error mapping response details: {e}")
+        return None
📝 Committable suggestion


def map_response_details(response_details):
    as_dict = response_details.dict()
    try:
        element_stack = as_dict.get('elementInstance', {}).get('elementStack')
        if not element_stack:
            raise ValueError("Missing required elementStack data")

        extended_dict = {
            **as_dict,
            "elementStack": element_stack,
        }

        return extended_dict
    except Exception as e:
        print(f"Error mapping response details: {e}")
        return None

228-231: ⚠️ Potential issue

Add error handling for file operations

The CSV file operations lack error handling and path validation. This could lead to runtime errors if the 'out' directory doesn't exist.

Consider adding proper error handling:

+import os
+import pathlib
+
+# Ensure output directory exists
+out_dir = pathlib.Path('out')
+out_dir.mkdir(parents=True, exist_ok=True)
+
+try:
     df_response_details_relevant_daily.to_csv('out/df_response_details_relevant_daily.csv')
     df_response_details_relevant_weekly.to_csv('out/df_response_details_relevant_weekly.csv')
     df_response_details_relevant_monthly.to_csv('out/df_response_details_relevant_monthly.csv')
     df_response_details_relevant_mean.to_csv('out/df_response_details_relevant_mean.csv')
+except Exception as e:
+    print(f"Error saving CSV files: {e}")
📝 Committable suggestion


import os
import pathlib

# Ensure output directory exists
out_dir = pathlib.Path('out')
out_dir.mkdir(parents=True, exist_ok=True)

try:
    df_response_details_relevant_daily.to_csv('out/df_response_details_relevant_daily.csv')
    df_response_details_relevant_weekly.to_csv('out/df_response_details_relevant_weekly.csv')
    df_response_details_relevant_monthly.to_csv('out/df_response_details_relevant_monthly.csv')
    df_response_details_relevant_mean.to_csv('out/df_response_details_relevant_mean.csv')
except Exception as e:
    print(f"Error saving CSV files: {e}")
apps/analytics/src/notebooks/archive/daily_participant_analytics.ipynb (3)

137-182: 🛠️ Refactor suggestion

Refactor complex correctness computation logic.

The compute_correctness function is complex and handles multiple question types. Consider breaking it down into smaller, type-specific functions for better maintainability and testability.

Suggested refactor:

def compute_sc_correctness(response, options):
    selected_choice = response['choices'][0]
    correct_choice = next((choice['ix'] for choice in options['choices'] if choice['correct']), None)
    return 'CORRECT' if selected_choice == correct_choice else 'INCORRECT'

def compute_mc_correctness(response, options, available_choices, hamming_distance):
    correctness = max(-2 * hamming_distance / available_choices + 1, 0)
    if correctness == 1:
        return 'CORRECT'
    elif correctness == 0:
        return 'INCORRECT'
    return 'PARTIAL'

def compute_kprim_correctness(hamming_distance):
    return 'CORRECT' if hamming_distance == 0 else 'PARTIAL' if hamming_distance == 1 else 'INCORRECT'

def compute_numerical_correctness(response, options):
    response_value = float(response['value'])
    within_range = list(map(lambda range: float(range['min']) <= response_value <= float(range['max']), options['solutionRanges']))
    return 'CORRECT' if any(within_range) else 'INCORRECT'
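As a quick sanity check, the KPRIM and numerical helpers suggested above can be exercised directly; the input shapes (choice dicts, `solutionRanges`) are assumed from the surrounding notebook code:

```python
def compute_kprim_correctness(hamming_distance):
    # 0 wrong statements -> correct, 1 -> partial, otherwise incorrect
    if hamming_distance == 0:
        return 'CORRECT'
    return 'PARTIAL' if hamming_distance == 1 else 'INCORRECT'

def compute_numerical_correctness(response, options):
    # correct if the response falls into any accepted solution range
    response_value = float(response['value'])
    within_range = [
        float(r['min']) <= response_value <= float(r['max'])
        for r in options['solutionRanges']
    ]
    return 'CORRECT' if any(within_range) else 'INCORRECT'

results = [
    compute_kprim_correctness(0),
    compute_kprim_correctness(1),
    compute_kprim_correctness(3),
    compute_numerical_correctness(
        {'value': '4.5'}, {'solutionRanges': [{'min': 4, 'max': 5}]}),
    compute_numerical_correctness(
        {'value': '9'}, {'solutionRanges': [{'min': 4, 'max': 5}]}),
]
```

Splitting the logic this way makes each question type independently testable.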

278-314: ⚠️ Potential issue

Add error handling for database operations.

The database upsert operation lacks error handling, which could lead to silent failures. Consider adding try-catch blocks and logging.

Suggested implementation:

try:
    for index, row in df_analytics.iterrows():
        try:
            db.participantanalytics.upsert(
                # ... existing upsert code ...
            )
        except Exception as e:
            print(f"Error upserting analytics for participant {row['participantId']}: {str(e)}")
            continue
except Exception as e:
    print(f"Fatal error during analytics update: {str(e)}")
    raise

31-36: ⚠️ Potential issue

Security Issue: Hardcoded database credentials in code.

The database connection string contains hardcoded credentials, which is a security risk. Consider using environment variables or a secure configuration management system.

Apply this diff to use environment variables:

-os.environ['DATABASE_URL'] = 'postgresql://klicker:klicker@localhost:5432/klicker-prod'
+# Ensure DATABASE_URL is set in the environment before running the notebook
+if 'DATABASE_URL' not in os.environ:
+    raise ValueError("DATABASE_URL environment variable is not set")
📝 Committable suggestion


db = Prisma()

# set the environment variable DATABASE_URL to the connection string of your database
# Ensure DATABASE_URL is set in the environment before running the notebook
if 'DATABASE_URL' not in os.environ:
    raise ValueError("DATABASE_URL environment variable is not set")

db.connect()
apps/analytics/src/modules/participant_course_analytics/__init__.py (1)

1-4: 🛠️ Refactor suggestion

Define __all__ to explicitly declare the public API.

The imports create a well-structured analytics pipeline, but they should be explicitly exported. Add the following after the imports:

from .get_running_past_courses import get_running_past_courses
from .get_active_weeks import get_active_weeks
from .compute_participant_activity import compute_participant_activity
from .save_participant_course_analytics import save_participant_course_analytics
+
+__all__ = [
+    "get_running_past_courses",
+    "get_active_weeks",
+    "compute_participant_activity",
+    "save_participant_course_analytics",
+]

This makes the module's public API explicit and resolves the unused import warnings.

📝 Committable suggestion


from .get_running_past_courses import get_running_past_courses
from .get_active_weeks import get_active_weeks
from .compute_participant_activity import compute_participant_activity
from .save_participant_course_analytics import save_participant_course_analytics

__all__ = [
    "get_running_past_courses",
    "get_active_weeks",
    "compute_participant_activity",
    "save_participant_course_analytics",
]
🧰 Tools
🪛 Ruff (0.8.0)

1-1: .get_running_past_courses.get_running_past_courses imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .get_active_weeks.get_active_weeks imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


3-3: .compute_participant_activity.compute_participant_activity imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


4-4: .save_participant_course_analytics.save_participant_course_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

apps/analytics/src/modules/aggregated_analytics/__init__.py (1)

1-4: 🛠️ Refactor suggestion

Add module documentation and define __all__.

The imports establish a clear analytics aggregation pipeline. Add documentation and export list:

+"""
+Aggregated analytics module for processing and storing course-wide analytics data.
+
+This module provides functionality to:
+1. Load individual participant analytics
+2. Aggregate data across participants
+3. Compute course-level metrics
+4. Save aggregated results
+"""
+
from .compute_aggregated_analytics import compute_aggregated_analytics
from .load_participant_analytics import load_participant_analytics
from .aggregate_participant_analytics import aggregate_participant_analytics
from .save_aggregated_analytics import save_aggregated_analytics
+
+__all__ = [
+    "compute_aggregated_analytics",
+    "load_participant_analytics",
+    "aggregate_participant_analytics",
+    "save_aggregated_analytics",
+]
📝 Committable suggestion


"""
Aggregated analytics module for processing and storing course-wide analytics data.

This module provides functionality to:
1. Load individual participant analytics
2. Aggregate data across participants
3. Compute course-level metrics
4. Save aggregated results
"""

from .compute_aggregated_analytics import compute_aggregated_analytics
from .load_participant_analytics import load_participant_analytics
from .aggregate_participant_analytics import aggregate_participant_analytics
from .save_aggregated_analytics import save_aggregated_analytics

__all__ = [
    "compute_aggregated_analytics",
    "load_participant_analytics",
    "aggregate_participant_analytics",
    "save_aggregated_analytics",
]
🧰 Tools
🪛 Ruff (0.8.0)

1-1: .compute_aggregated_analytics.compute_aggregated_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .load_participant_analytics.load_participant_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


3-3: .aggregate_participant_analytics.aggregate_participant_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


4-4: .save_aggregated_analytics.save_aggregated_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

packages/prisma/src/prisma/schema/analytics.prisma (1)

235-240: 🛠️ Refactor suggestion

Enforce mutual exclusivity between practiceQuizId and microLearningId.

In the ActivityPerformance model, both practiceQuizId and microLearningId are optional and unique. To prevent records from having both fields set or both fields unset, consider implementing a model-level constraint or application logic to ensure that exactly one of these fields is non-null.
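One way to enforce this in application code is a small guard run before writes; the record shape and function name below are illustrative, not part of the schema:

```python
def assert_exactly_one_parent(record: dict) -> None:
    """Raise unless exactly one of practiceQuizId / microLearningId is set."""
    set_ids = [
        key for key in ("practiceQuizId", "microLearningId")
        if record.get(key) is not None
    ]
    if len(set_ids) != 1:
        raise ValueError(
            f"ActivityPerformance must reference exactly one parent, got {set_ids or 'none'}"
        )

assert_exactly_one_parent({"practiceQuizId": "pq1", "microLearningId": None})  # passes
try:
    assert_exactly_one_parent({"practiceQuizId": None, "microLearningId": None})
    both_checked = False
except ValueError:
    both_checked = True
```

At the database level, the same rule could alternatively be expressed as a Postgres CHECK constraint comparing the nullity of the two columns.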

apps/analytics/src/modules/participant_course_analytics/get_running_past_courses.py (1)

6-13: 🛠️ Refactor suggestion

Use datetime objects for date comparisons to improve reliability.

Currently, the code concatenates strings to create datetime formats, which can be error-prone. Consider using datetime objects and ISO formatting for accurate date-time handling.

Apply this refactor to enhance precision:

-from datetime import datetime
+from datetime import datetime, timezone

 def get_running_past_courses(db):
-    curr_date = datetime.now().strftime("%Y-%m-%d")
+    curr_datetime = datetime.now(timezone.utc)

     courses = db.course.find_many(
         where={
-            "startDate": {"lte": curr_date + "T23:59:59.999Z"},
+            "startDate": {"lte": curr_datetime.isoformat()},
         },
         include={"participations": True},
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    curr_datetime = datetime.now(timezone.utc)
    courses = db.course.find_many(
        where={
            # Incremental scripts can add this statement to reduce the amount of required computations
            # 'endDate': {
            #     'gt': datetime.now().strftime('%Y-%m-%d') + 'T00:00:00.000Z'
            # }
            "startDate": {"lte": curr_datetime.isoformat()},

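The suggested refactor can be exercised in isolation; this sketch contrasts the timezone-aware object with the string-concatenation pattern it replaces:

```python
from datetime import datetime, timezone

# isoformat() on an aware datetime emits a full, sortable timestamp
# with an explicit UTC offset, so lexicographic "lte" comparisons on
# ISO strings match chronological order.
curr_datetime = datetime.now(timezone.utc)
stamp = curr_datetime.isoformat()  # e.g. '2024-12-05T16:04:12.345678+00:00'

# The old pattern hard-codes an end-of-day suffix and silently
# assumes the local date is the UTC date.
legacy = datetime.now().strftime("%Y-%m-%d") + "T23:59:59.999Z"
```

Note that `isoformat()` renders UTC as `+00:00` rather than `Z`; both are valid ISO 8601 offsets.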

cypress bot commented Dec 5, 2024

klicker-uzh    Run #3778

Run Properties: Passed #3778 • commit a476d57146: Merge 6960a84f681d7fa6c11aff876d3b3b844954234a into 51bbba935c90b70aecdbb887dc56...
Project: klicker-uzh
Branch Review: participant-performance-analytics
Run status: Passed #3778
Run duration: 12m 00s
Commit: a476d57146: Merge 6960a84f681d7fa6c11aff876d3b3b844954234a into 51bbba935c90b70aecdbb887dc56...
Committer: Julius Schlapbach

Test results
Failures: 0
Flaky: 0
Pending (annotated with .skip): 0
Skipped (failure in a mocha hook): 0
Passing: 148
