Release/v0.4.0 #163

Merged: 11 commits into dev from release/v0.4.0, Dec 4, 2024

Conversation

@pmayd (Collaborator) commented Dec 3, 2024

Summary by CodeRabbit

  • New Features
    • Introduced a new type alias ParamDict for improved type clarity.
    • Enhanced logincheck function to handle long-running requests and modified request structure.
  • Bug Fixes
    • Updated HTTP methods from GET to POST for various requests, ensuring proper data handling.
  • Documentation
    • Improved readability and clarity in multiple Jupyter notebooks and test files.
  • Chores
    • Updated project version from 0.3.3 to 0.4.0 and modified dependencies in pyproject.toml.
    • Removed outdated configuration files for linting tools.


coderabbitai bot commented Dec 3, 2024

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes in this pull request include the removal of configuration files for Flake8 and Pylint, updates to the pre-commit configuration to replace certain repositories with new ones, and modifications to various Jupyter notebooks for improved readability. Additionally, the project version has been incremented, and dependencies have been updated or replaced in the pyproject.toml file. Type annotations have been enhanced across multiple files, and significant updates have been made to the handling of HTTP requests and responses in test files, reflecting a shift from GET to POST methods with structured request bodies.

Changes

File Change Summary
.flake8 Removed configuration file for Flake8, including various settings for style checks.
.pylintrc Removed configuration file for Pylint, detailing coding standards and analysis settings.
.pre-commit-config.yaml Removed isort and black, added ruff-pre-commit and mirrors-mypy with specific hooks.
pyproject.toml Version updated from 0.3.3 to 0.4.0, dependencies updated or removed, including the addition of ruff.
src/pystatis/__init__.py Updated version from 0.3.3 to 0.4.0.
src/pystatis/cache.py Updated type annotations for params parameter from dict to ParamDict in several functions.
src/pystatis/config.py Restructured LANG_TO_COL_MAPPING, added VERSION_MAPPING and ARS_OR_AGS_MAPPING dictionaries.
src/pystatis/find.py Updated method signatures with enhanced type annotations for run, summary, and _get_find_results.
src/pystatis/helloworld.py Modified logincheck function to handle long-running requests and changed HTTP method to POST.
src/pystatis/http_helper.py Updated parameters from dict to ParamDict in several functions and added internal function get_response.
src/pystatis/profile.py Removed unnecessary import and streamlined test_change_password function.
src/pystatis/results.py Updated method signatures and documentation for get_code, show_metadata, and _get_metadata_results.
src/pystatis/table.py Enhanced type annotations and refactored method signatures for clarity.
src/pystatis/types.py Introduced new type alias ParamDict for improved type clarity.
tests/cassettes/*.yaml Numerous updates to HTTP methods, request bodies, and headers across multiple test files to reflect changes in API interactions.
tests/conftest.py Expanded filter_headers and added new match_on criteria for VCR configuration.
tests/test_cache.py Reformatted import statements for readability.
tests/test_config.py Reformatted assertion in test_config_path for clarity.
tests/test_helloworld.py Added mock patch for requests.post in test_logincheck function.
tests/test_http_helper.py Reformatted function calls and assertions for improved readability.
tests/test_profile.py Removed unnecessary import and streamlined test_change_password function.
tests/test_table.py Updated expected shapes and column names in parameterized tests for accuracy.
tests/test_version.py Reformatted assertion in test_version for improved readability.
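
The ParamDict alias itself is not spelled out on this page, so the following is only a plausible sketch of the new src/pystatis/types.py and of how the updated signatures in cache.py and http_helper.py might use it; the exact value types and the helper name are assumptions.

# Hypothetical sketch; the real alias in src/pystatis/types.py may differ.
ParamDict = dict[str, str | int | bool]

def cache_key(endpoint: str, params: ParamDict) -> str:
    # Illustrative only: cache.py and http_helper.py now annotate params
    # as ParamDict instead of a bare dict.
    query = "&".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"{endpoint}?{query}"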

Poem

In the code where rabbits play,
Changes hop and dance today.
From Flake8 to Pylint's grace,
New tools join the coding race.
With every line, we refine and cheer,
A brighter path for all to steer! 🐇✨



@pmayd (Collaborator, PR author) commented Dec 3, 2024

@coderabbitai review


coderabbitai bot commented Dec 3, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


codecov bot commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 93.50649% with 5 lines in your changes missing coverage. Please review.

Project coverage is 81.59%. Comparing base (65c3f76) to head (1d54b14).

Files with missing lines Patch % Lines
src/pystatis/http_helper.py 86.95% 3 Missing ⚠️
src/pystatis/helloworld.py 71.42% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #163      +/-   ##
==========================================
+ Coverage   80.93%   81.59%   +0.66%     
==========================================
  Files          11       12       +1     
  Lines         556      576      +20     
==========================================
+ Hits          450      470      +20     
  Misses        106      106              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coderabbitai bot left a comment

Actionable comments posted: 21

🧹 Outside diff range and nitpick comments (12)
nb/05_presentation.ipynb (1)

2761-2761: LGTM! Consider adding data validation.

The code follows good practices by using list comprehension and explicit type casting. However, consider adding validation to make it more robust against malformed data.

Here's a suggested improvement:

-    int(semester[3:7]) for semester in ratio_international.index.get_level_values(2)
+    int(semester[3:7]) if len(semester) >= 7 and semester[3:7].isdigit()
+    else None for semester in ratio_international.index.get_level_values(2)

This change will:

  1. Validate the string length before slicing
  2. Ensure the extracted substring contains only digits
  3. Return None for invalid values instead of raising an exception
src/pystatis/table.py (1)

213-213: Fix typo in comment

There's a typo in the comment on line 213: "we pohave to identify" should be "we have to identify".

nb/02_geo_visualization_int_students_germany.ipynb (4)

595-606: Consider adding error handling for semester string parsing.

While the year extraction works for well-formed semester strings, it might fail for malformed data.

Consider adding error handling:

-ratio_international["year"] = [
-    int(semester[3:7]) for semester in ratio_international.index.get_level_values(2)
-]
+def extract_year(semester: str) -> int | None:
+    try:
+        return int(semester[3:7])
+    except (IndexError, ValueError):
+        logging.warning(f"Malformed semester string: {semester}")
+        return None
+
+ratio_international["year"] = [
+    extract_year(semester) for semester in ratio_international.index.get_level_values(2)
+]

Also applies to: 1082-1093


688-688: Remove commented-out code.

Dead code should be removed rather than commented out. If this code is needed for reference, consider adding it to the documentation.


1156-1162: Standardize legend formatting across plots.

Legend formatting is duplicated across plots. Consider extracting the legend configuration into a constant or helper function.

LEGEND_CONFIG = {
    "title": "Region",
    "title_fontsize": "13",
    "fontsize": "12",
    "loc": "upper left",
    "bbox_to_anchor": (1, 1)
}

# Usage
ax.legend(**LEGEND_CONFIG)

Also applies to: 1220-1226


770-773: Extract colorbar creation logic.

The colorbar creation code is duplicated. Consider extracting this into a helper function.

def add_colorbar(fig: plt.Figure, axes: list[plt.Axes],
                 min_value: float, max_value: float,
                 label: str = "International Student Ratio [%]") -> None:
    cm = fig.colorbar(
        plt.cm.ScalarMappable(
            norm=plt.Normalize(vmin=min_value, vmax=max_value),
            cmap="viridis"
        ),
        ax=axes[len(axes) - 1]
    )
    cm.set_label(label, fontsize=18)

Also applies to: 1467-1470

tests/cassettes/41312-01-01-4.yaml (1)

13357-13357: Consider implementing automated cassette maintenance

Manual updates to cassette timestamps can be error-prone and time-consuming. Consider implementing a strategy for automated cassette maintenance, such as:

  1. Using VCR.py's before_record hooks to standardize timestamps
  2. Creating a utility script to update cassettes systematically
  3. Implementing relative time comparisons in tests instead of absolute dates

This would reduce maintenance overhead and make the test suite more robust.

Also applies to: 13359-13359, 13412-13412, 13414-13414
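
As a sketch of the first option above, a before_record_response hook can pin volatile headers whenever cassettes are re-recorded. This assumes VCR.py's dict-shaped response objects and is not code from this PR; with pytest-recording the same function would be passed via the vcr_config fixture.

import vcr

FIXED_DATE = "Wed, 04 Dec 2024 00:00:00 GMT"  # arbitrary fixed value

def normalize_dates(response):
    # Overwrite volatile timestamp headers so re-recorded cassettes do not churn.
    headers = response.get("headers", {})
    for name in ("Date", "Expires"):
        if name in headers:
            headers[name] = [FIXED_DATE]
    return response

my_vcr = vcr.VCR(before_record_response=normalize_dates)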

tests/test_profile.py (1)

5-5: Consider grouping related imports.

The imports could be organized better by grouping related imports (standard library, third-party, local).

from configparser import RawConfigParser
import pytest

-from pystatis import config
-from pystatis.profile import change_password, remove_result
+from pystatis import config
+from pystatis.profile import (
+    change_password,
+    remove_result,
+)
from tests.test_http_helper import _generic_request_status
src/pystatis/helloworld.py (1)

56-56: Consider adding error handling for timeout scenarios.

The timeout parameter is set, but there's no specific handling for timeout exceptions. Consider adding explicit error handling:

-    response = requests.post(url, headers=headers, data=params, timeout=(1, 15))
+    try:
+        response = requests.post(url, headers=headers, data=params, timeout=(1, 15))
+    except requests.exceptions.Timeout:
+        raise TimeoutError("Login request timed out after 15 seconds")
src/pystatis/results.py (1)

95-97: Consider collapsing the join onto a single line for better readability.

The string joining logic already uses a list comprehension; placing it on one line keeps it concise.

-                        "\n".join(
-                            [col["Content"] for col in structure_dict["Columns"]]
-                        ),
+                        "\n".join([col["Content"] for col in structure_dict["Columns"]]),

Also applies to: 131-134

src/pystatis/http_helper.py (1)

346-348: Improve error handling readability

The condition could be simplified for better readability.

-elif (destatis_status_code in [104, 50, 90]) or (
-    destatis_status_type in error_en_de
-):
+elif destatis_status_code in [104, 50, 90] or destatis_status_type in error_en_de:
tests/cassettes/4000W-2041.yaml (1)

288-288: Consider standardizing connection handling

There's an inconsistency in connection header values (keep-alive vs close). While this works, standardizing on keep-alive could improve performance in production.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 65c3f76 and 285e625.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (44)
  • .flake8 (0 hunks)
  • .pre-commit-config.yaml (1 hunks)
  • .pylintrc (0 hunks)
  • nb/02_geo_visualization_int_students_germany.ipynb (15 hunks)
  • nb/03_find.ipynb (5 hunks)
  • nb/04_jobs.ipynb (2 hunks)
  • nb/05_presentation.ipynb (1 hunks)
  • pyproject.toml (2 hunks)
  • src/pystatis/__init__.py (1 hunks)
  • src/pystatis/cache.py (6 hunks)
  • src/pystatis/config.py (1 hunks)
  • src/pystatis/db.py (1 hunks)
  • src/pystatis/find.py (4 hunks)
  • src/pystatis/helloworld.py (2 hunks)
  • src/pystatis/http_helper.py (14 hunks)
  • src/pystatis/profile.py (1 hunks)
  • src/pystatis/results.py (5 hunks)
  • src/pystatis/table.py (10 hunks)
  • src/pystatis/types.py (1 hunks)
  • tests/cassettes/1000A-2022.yaml (4 hunks)
  • tests/cassettes/11111-02-01-4.yaml (4 hunks)
  • tests/cassettes/12211-0001.yaml (3 hunks)
  • tests/cassettes/12211-Z-11.yaml (4 hunks)
  • tests/cassettes/13111-01-03-4.yaml (4 hunks)
  • tests/cassettes/2000S-2003.yaml (1 hunks)
  • tests/cassettes/23111-0001.yaml (4 hunks)
  • tests/cassettes/3000G-1008.yaml (8 hunks)
  • tests/cassettes/32121-01-02-4.yaml (4 hunks)
  • tests/cassettes/32161-0003.yaml (4 hunks)
  • tests/cassettes/4000W-2030.yaml (4 hunks)
  • tests/cassettes/4000W-2041.yaml (8 hunks)
  • tests/cassettes/41312-01-01-4.yaml (4 hunks)
  • tests/cassettes/46181-0001.yaml (4 hunks)
  • tests/cassettes/52111-0001.yaml (1 hunks)
  • tests/cassettes/81000-0001.yaml (1 hunks)
  • tests/cassettes/AI-N-01-2-5.yaml (4 hunks)
  • tests/conftest.py (1 hunks)
  • tests/test_cache.py (1 hunks)
  • tests/test_config.py (1 hunks)
  • tests/test_helloworld.py (1 hunks)
  • tests/test_http_helper.py (2 hunks)
  • tests/test_profile.py (2 hunks)
  • tests/test_table.py (37 hunks)
  • tests/test_version.py (1 hunks)
🔥 Files not summarized due to errors (4)
  • nb/05_presentation.ipynb: Error: Server error: no LLM provider could handle the message
  • tests/cassettes/13111-01-03-4.yaml: Error: Server error: no LLM provider could handle the message
  • tests/cassettes/32121-01-02-4.yaml: Error: Server error: no LLM provider could handle the message
  • tests/cassettes/12211-Z-11.yaml: Error: Server error: no LLM provider could handle the message
💤 Files with no reviewable changes (2)
  • .flake8
  • .pylintrc
✅ Files skipped from review due to trivial changes (8)
  • src/pystatis/types.py
  • src/pystatis/__init__.py
  • tests/test_version.py
  • tests/test_http_helper.py
  • tests/test_config.py
  • tests/test_cache.py
  • src/pystatis/profile.py
  • nb/03_find.ipynb
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/pystatis/http_helper.py

[warning] 77-77: src/pystatis/http_helper.py#L77
Added line #L77 was not covered by tests


[warning] 232-232: src/pystatis/http_helper.py#L232
Added line #L232 was not covered by tests


[warning] 260-260: src/pystatis/http_helper.py#L260
Added line #L260 was not covered by tests

src/pystatis/db.py

[warning] 35-35: src/pystatis/db.py#L35
Added line #L35 was not covered by tests

🔇 Additional comments (55)
tests/cassettes/13111-01-03-4.yaml (1)

8-8: LGTM: Enhanced compression support with zstd

The addition of zstd to the Accept-Encoding header is a good improvement. Zstandard (zstd) is a modern compression algorithm that offers better compression ratios and performance compared to traditional algorithms.

Also applies to: 6504-6504

tests/cassettes/12211-Z-11.yaml (1)

8-8: LGTM: Enhanced compression support with zstd

The addition of zstd to Accept-Encoding is a good improvement for performance. Just ensure that your application's dependencies support zstd compression if you plan to actually use it in production.

tests/cassettes/32121-01-02-4.yaml (2)

8-8: LGTM: Accept-Encoding header update includes modern compression

The addition of zstd (Zstandard) compression support alongside existing algorithms is a good improvement, as it's becoming increasingly common in modern web services.

Also applies to: 10722-10722


10696-10698: ⚠️ Potential issue

Consider using dynamic dates in test fixtures

The response headers contain hardcoded future dates (December 2024) which could cause test failures when these dates pass. Consider using relative dates or implementing a dynamic date generation mechanism in your test setup.

Let's check if this pattern exists in other cassette files:

Would you like me to provide an example implementation using dynamic dates for the test fixtures?

Also applies to: 10753-10755

src/pystatis/table.py (3)

6-6: No issues found with the import statement

The import of Any from the typing module is appropriate and necessary for type annotations used in the code.


29-29: Improved type annotation for self.metadata

Specifying self.metadata as dict[str, Any] enhances type clarity and is a good practice.


46-46: Explicit return type for get_data method

Adding -> None to the get_data method signature improves code readability and type checking.

nb/02_geo_visualization_int_students_germany.ipynb (2)

406-409: LGTM! Improved readability of DataFrame filtering operations.

The code has been nicely reformatted with proper line breaks and indentation, making complex boolean conditions more readable.

Also applies to: 1344-1346


1408-1413: LGTM! Well-structured DataFrame merge.

The merge operation is clear and readable with explicit parameter names.

tests/cassettes/AI-N-01-2-5.yaml (1)

8-8: LGTM: Addition of zstd compression support

The addition of zstd to Accept-Encoding is a good improvement as it's a modern, high-performance compression algorithm. However, let's verify that the codebase properly handles zstd compressed responses.

Also applies to: 14018-14018

✅ Verification successful

The previous search showed zstd in many cassettes, but we need to:

  1. Understand how it's being used in the cassettes
  2. Find the test/client code that handles these requests

This will help determine if the compression is properly handled in the codebase.


The cassettes show that python-requests/2.32.3 is being used as the HTTP client. We need to verify:

  1. How the requests session is configured
  2. If there are any custom transport adapters
  3. Whether zstd compression is handled by dependencies

This will help determine if the compression support is properly implemented.


LGTM: zstd compression support is handled by requests library

The zstd compression in the cassettes is safe because:

  1. The project uses requests 2.32.3 which natively supports zstd compression
  2. The compression handling is automatic - requests library handles content negotiation and decompression transparently
  3. No custom compression adapters are needed as this is built into the requests library

Reasoning: The pyproject.toml shows requests ^2.27.1 as a dependency, and the cassettes show requests 2.32.3 in use, which has built-in zstd support. The Accept-Encoding header changes are automatically handled by the requests library's content negotiation system, requiring no additional code changes.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify zstd compression handling in the codebase

# Check if there are any compression-related utility functions or dependencies
rg -l "(?i)compress|decompress|zstd|zstandard" --type python

# Check for zstd related dependencies in project configuration
rg -l "zstd|zstandard" --type toml --type yaml

Length of output: 1229


Script:

#!/bin/bash
# Let's check the content of a few cassettes to understand the zstd usage pattern
head -n 20 tests/cassettes/AI-N-01-2-5.yaml
head -n 20 tests/cassettes/32421-0012.yaml

# Check for any HTTP client configuration or test setup files
fd -e py -e yaml "client|config|setup|test" --exec grep -l -i "compression|zstd|requests"

Length of output: 1733


Script:

#!/bin/bash
# Let's check for the HTTP client setup and test configuration
fd -e py "test|client" --type f --exec rg -l "requests\.Session|HTTPAdapter"

# Look for requirements or dependency files
fd "requirements|setup.py|pyproject.toml" --type f --exec cat {}

Length of output: 2331

tests/cassettes/41312-01-01-4.yaml (1)

13357-13357: Verify consistent timestamp updates across all cassettes

The Date and Expires headers have been updated to December 2024. We should ensure this update is consistent across all cassette files to maintain temporal coherence in the test suite.

Let's verify the timestamp consistency:

Also applies to: 13359-13359, 13412-13412, 13414-13414

✅ Verification successful

All cassettes have been consistently updated to December 2024

The verification shows that all cassette files have been updated to use December 2024 dates. No June 2024 dates were found, and all cassettes contain properly formatted December 2024 timestamps, maintaining temporal coherence across the test suite.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any inconsistent dates in cassette files

echo "Checking for any remaining June 2024 dates in cassettes..."
rg "Jun 2024" "tests/cassettes/"

echo "Checking for consistent December 2024 dates..."
rg "Dec 2024" "tests/cassettes/"

Length of output: 10649

tests/test_helloworld.py (1)

22-25: Verify if both GET and POST mocks are needed.

The test now mocks both requests.get and requests.post with the same return value. If the underlying logincheck function has been updated to use POST exclusively, we might be able to remove the GET mock.

✅ Verification successful

Both GET and POST mocks are required

The codebase shows that both HTTP methods are actively used:

  • whoami function uses requests.get
  • logincheck function uses requests.post

Therefore, both mocks in the test file are necessary and should be kept.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if GET is still used in the logincheck function
ast-grep --pattern 'def logincheck($$$) {
  $$$
  requests.get($$$)
  $$$
}'

# Check if both GET and POST are used
ast-grep --pattern 'def logincheck($$$) {
  $$$
  requests.post($$$)
  $$$
}'

Length of output: 158


Script:

#!/bin/bash
# Let's try a different approach using ripgrep to search for requests.get and requests.post
rg "requests\.(get|post)" -A 2 -B 2

# Also check the test file for both mocks
rg "requests\.(get|post)" tests/test_helloworld.py

Length of output: 2154
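
For illustration, the mocking pattern described above could look roughly like the sketch below; the import path, function name, and canned response are assumptions rather than the actual tests/test_helloworld.py code, and it presumes credentials are already configured.

from unittest.mock import MagicMock, patch

from pystatis import helloworld  # illustrative import path

def test_logincheck_uses_post() -> None:
    fake = MagicMock(status_code=200, text="{}")
    # whoami() still calls requests.get, while logincheck() now calls
    # requests.post, so both are patched with the same canned response.
    with patch("requests.get", return_value=fake), patch(
        "requests.post", return_value=fake
    ) as mocked_post:
        helloworld.logincheck()
    mocked_post.assert_called_once()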

tests/conftest.py (2)

7-7: Good security practice: Filtering sensitive headers.

Adding "username" and "password" to filtered headers improves security by preventing sensitive credentials from being recorded in test cassettes.


9-20: Well-documented VCR configuration for POST requests.

The configuration and documentation clearly explain the need for body matching due to Genesis v5's POST request requirements. The comprehensive match_on criteria ensure accurate request matching.
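
A plausible shape for that configuration, assuming the pytest-recording style vcr_config fixture; the exact fixture scope and header list in tests/conftest.py may differ.

import pytest

@pytest.fixture(scope="module")
def vcr_config():
    return {
        # Keep credentials out of recorded cassettes.
        "filter_headers": ["authorization", "username", "password"],
        # Genesis v5 sends credentials and query parameters in the POST body,
        # so requests must also be matched on the body, not just the URL.
        "match_on": ["method", "scheme", "host", "port", "path", "query", "body"],
    }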

src/pystatis/helloworld.py (1)

47-54: LGTM! Security improvement in credential handling.

The switch from query parameters to POST with form-encoded body is a security improvement, as it prevents credentials from being logged in server access logs.

pyproject.toml (2)

15-15: LGTM! Version bump follows semantic versioning.

The version increment to 0.4.0 is appropriate for the significant tooling changes introduced in this release.


47-55: LGTM! Dependency updates look good.

The consolidation of linting tools into ruff and the updates to development dependencies are well-considered changes that will improve the development workflow.

Let's verify the compatibility of the updated dependencies:

✅ Verification successful

Dependencies are compatible and configurations are properly set up

The verification shows that:

  • The poetry.lock is in sync with pyproject.toml for the updated dependencies
  • Ruff configuration in ruff.toml is compatible with ruff 0.8.1, using standard rule sets
  • The project has properly migrated linting configurations from older tools to ruff
  • Test configurations are properly set up for the pytest 8.0.0 and pytest-cov 6.0.0
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any known compatibility issues between the updated dependencies

# Check if poetry.lock is in sync with pyproject.toml
if [ -f "poetry.lock" ]; then
    # Extract and compare dependencies
    grep -A 1 "name = \"ruff\"" poetry.lock
    grep -A 1 "name = \"pre-commit\"" poetry.lock
    grep -A 1 "name = \"pytest-cov\"" poetry.lock
fi

Length of output: 264


Script:

#!/bin/bash
# Let's verify if there are any potential compatibility issues by:
# 1. Checking if pytest-cov is compatible with the specified pytest version
# 2. Looking for any existing pytest configuration that might need updates
# 3. Checking for any ruff configuration that might need adjustments

# Check for pytest configuration
fd -e ini -e cfg -e toml -e py . -x grep -l "pytest" {}

# Check for ruff configuration
fd -e toml -e py . -x grep -l "ruff" {}

# Look for any existing linting configurations that might need updates
fd -e ini -e toml . -x grep -l "flake8\|pylint\|black" {}

Length of output: 382


Script:

#!/bin/bash
# Let's check the content of ruff.toml and any pytest configuration in pyproject.toml
# to ensure they're compatible with the updated versions

# Check ruff configuration
cat ruff.toml

# Check pytest configuration in pyproject.toml
rg "\[tool\.pytest\]|\[pytest\]" -A 10 pyproject.toml

# Check if there are any specific linting rules that might need updates
rg "select|ignore|extend-select|extend-ignore" ruff.toml

Length of output: 431

src/pystatis/find.py (3)

4-4: LGTM!

The addition of Any from typing module supports proper type annotations in the codebase.


55-55: LGTM!

The return type annotation correctly specifies that the method doesn't return a value.


102-102: LGTM!

The Any type annotation for **kwargs correctly indicates that the method accepts arbitrary keyword arguments.

tests/cassettes/52111-0001.yaml (3)

53-57: LGTM!

The response headers have been properly updated to reflect the new ZIP format and include a timestamp in the filename.


3-3: LGTM! Verify API compatibility with older clients.

The switch to POST method with form-encoded body is a good practice for complex parameters. The addition of zstd compression support can improve performance.

Also applies to: 8-8, 17-17

✅ Verification successful

API compatibility is maintained across endpoints

Based on the search results, there is a clear pattern in how the endpoints are used:

  1. The www-genesis.destatis.de endpoint consistently uses POST method with form-encoded body (like in 52111-0001.yaml)
  2. The www.regionalstatistik.de endpoint consistently uses GET method with query parameters (like in AI-N-01-2-5.yaml)
  3. The ergebnisse.zensus2022.de endpoint consistently uses POST method (like in 4000W-2030.yaml)

This shows that each specific domain maintains its own consistent API pattern, and the changes in 52111-0001.yaml align with the established pattern for the www-genesis.destatis.de domain. The change from GET to POST is not a compatibility concern as it follows the domain's standard practice.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if any other files still use GET method for this endpoint
rg -l "method: GET.*genesisWS/rest/2020/data/tablefile" tests/cassettes/

Length of output: 74


Script:

#!/bin/bash
# Let's first check what files exist in the cassettes directory
fd . tests/cassettes/

# Then search for any GET or POST methods in those files
rg "method: (GET|POST)" tests/cassettes/ -A 2

# Also search for the endpoint pattern to ensure we don't miss anything
rg "genesisWS/rest/2020/data/tablefile" tests/cassettes/

Length of output: 26056


21-48: Document the response format change in the API documentation.

The response format has changed from CSV to ZIP, which improves data compression but may affect clients expecting CSV format.

src/pystatis/results.py (3)

4-4: LGTM!

The addition of Any from typing module supports the enhanced type annotations in the file.


52-52: LGTM!

The method signature now properly specifies that it accepts a list of integers and returns a list of strings.


160-160: LGTM!

The return type annotation dict[str, Any] accurately describes the metadata dictionary structure.

src/pystatis/cache.py (3)

15-15: LGTM: Import of ParamDict type alias

The addition of the ParamDict import aligns with the type annotation updates throughout the file.


25-25: LGTM: Consistent type annotation updates

The type annotations for the params parameter have been consistently updated from dict to ParamDict across all functions, improving type safety and clarity.

Also applies to: 77-77, 107-107, 154-154


154-154: ⚠️ Potential issue

Fix typo in function name

The function name hit_in_cash contains a typo and should be renamed to hit_in_cache for clarity and consistency.

-def hit_in_cash(
+def hit_in_cache(

Likely invalid or redundant comment.

src/pystatis/config.py (3)

34-36: LGTM: Improved regex pattern readability

The "regio" regex pattern has been reformatted for better readability while maintaining its functionality.


38-42: LGTM: Clear version mapping for databases

The VERSION_MAPPING provides a clear association between databases and their versions, which is used in conjunction with the restructured LANG_TO_COL_MAPPING.


43-56: LGTM: Comprehensive regional key translations

The ARS_OR_AGS_MAPPING provides consistent translations for regional keys (ARS/AGS) across all supported databases in both German and English.

tests/cassettes/81000-0001.yaml (2)

3-17: LGTM: Updated request format

The request format has been properly updated to use POST method with form-encoded data, including appropriate headers for content type and length.

Also applies to: 90-104


21-70: LGTM: Updated response format

The response format has been updated to handle ZIP files instead of CSV, with appropriate content type and disposition headers.

Also applies to: 75-81

nb/04_jobs.ipynb (1)

11-19: LGTM! Documentation and code organization improvements.

The changes improve readability through:

  • Better markdown formatting for headers
  • Logical organization of imports
  • Enhanced documentation clarity about caching behavior

Also applies to: 34-36, 44-44

tests/cassettes/46181-0001.yaml (1)

3-18: LGTM! API interaction improvements with proper request/response handling.

The changes implement a more robust API interaction pattern:

  • Switched from GET to POST with proper form-encoded bodies
  • Updated response handling for ZIP format
  • Consistent headers and content types across all requests

Let's verify the consistency of these changes across other test cassettes:

Also applies to: 21-35, 40-46, 55-70, 76-82, 102-117, 120-134, 139-145, 154-169, 175-181

✅ Verification successful

API interaction patterns are consistent and properly implemented

The verification confirms proper implementation of the API interaction patterns:

  • Requests use correct Content-Type: application/x-www-form-urlencoded headers
  • Form-encoded request bodies are properly formatted with all required parameters
  • Responses alternate between binary (ZIP) data and JSON metadata as expected
  • Headers and content types are consistent across language variants (de/en)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistent API interaction patterns across test cassettes

# Check for consistent HTTP method and content type headers
echo "Checking HTTP methods and content types..."
rg -A 5 "method: (GET|POST)" tests/cassettes/
rg "Content-Type: application/x-www-form-urlencoded" tests/cassettes/

# Check for consistent response format
echo "Checking response formats..."
rg "Content-Type: application/zip" tests/cassettes/

Length of output: 65945


Script:

#!/bin/bash
# Let's verify the API interaction patterns more specifically

# Check for Content-Type headers in request bodies
echo "Checking request Content-Type headers..."
rg -A 2 "Content-Type:" tests/cassettes/46181-0001.yaml

# Check for form-encoded bodies
echo "Checking request bodies..."
rg -A 1 "body: " tests/cassettes/46181-0001.yaml

# Check for response format indicators
echo "Checking response format indicators..."
rg -A 3 "response:" tests/cassettes/46181-0001.yaml

Length of output: 2850

tests/cassettes/4000W-2030.yaml (1)

3-18: LGTM! Consistent API changes with additional security measures.

The changes maintain consistency with the updated API pattern and include proper security headers:

  • Same POST method and form data structure
  • Consistent response format
  • Security headers: X-Frame-Options, X-XSS-Protection, Strict-Transport-Security

Also applies to: 87-104, 125-140, 166-170

src/pystatis/http_helper.py (1)

13-13: LGTM: Type import addition

The addition of ParamDict type import enhances type safety across the module.

tests/cassettes/3000G-1008.yaml (1)

3-18: LGTM: Updated request format

The request format has been correctly updated to:

  • Use POST method
  • Include form-encoded body
  • Set appropriate Content-Type header
tests/cassettes/32161-0003.yaml (2)

3-18: LGTM: Consistent request format

All requests have been consistently updated to use POST method with proper headers and form-encoded body, matching the changes in http_helper.py.

Also applies to: 66-81, 116-131, 178-193


86-87: Verify data range update impact

The time range has been extended from 2010-2022. Ensure that any date-dependent tests or validations are updated accordingly.

✅ Verification successful

Based on the search results, I can see that the test files contain various date ranges, including references to both 2010 and 2022. The key findings are:

  1. The test file test_table.py contains test cases that use dates like 2010 and 2022 in assertions and test data
  2. Multiple cassette files contain date ranges, with some using 2010 as a base year for index calculations
  3. The date ranges appear to be part of the test data and metadata, rather than being hard-coded test logic

Since the changes in the time range are part of the test data/cassettes and don't affect any test assertions or validation logic, I can generate the final response:

No date-dependent test updates needed

The time range extension to 2010-2022 is reflected in the test data/cassettes but does not impact any test assertions or validation logic. The test framework handles the date ranges dynamically through the cassette data.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for date-dependent tests or validations

# Search for date-related test assertions
rg -A 5 "2010|2022" tests/

Length of output: 74226

tests/cassettes/2000S-2003.yaml (3)

1-80: LGTM! Security headers are properly configured.

The request-response pair for German data retrieval is well structured with:

  • Proper Content-Type headers for form-encoded data
  • Security headers including CORS, XSS protection, and HSTS
  • Compressed ZIP response format

146-291: LGTM! English data request-response pair is consistent.

The structure matches the German request-response pair with appropriate language parameter changes.


81-145: Verify sensitive data masking in responses

The metadata response properly masks sensitive credentials with asterisks. The JSON structure and character encoding for German text are handled correctly.

✅ Verification successful

Sensitive data masking is consistently implemented

The verification confirms that username and password fields are properly masked with exactly 20 asterisks ("********************") across all API response cassettes in the test suite, maintaining consistent and secure handling of credentials.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify consistent masking of sensitive data
rg -A 2 '"username":"\*{20}"' tests/cassettes/

Length of output: 48


Script:

#!/bin/bash
# Search for any username/password patterns in cassettes to verify masking
rg -i "username.*password" tests/cassettes/

# Also search for asterisk patterns that may indicate masking
rg "\*{5,}" tests/cassettes/

Length of output: 40754

tests/cassettes/4000W-2041.yaml (1)

Line range hint 1-80: LGTM! Request-response structure follows established pattern.

The housing data retrieval follows the same secure request-response pattern with proper headers and compression.

tests/cassettes/12211-0001.yaml (2)

67-68: LGTM! Response sizes are consistent and reasonable.

The Content-Length headers match the actual response body sizes, and the compression ratios look appropriate for the data type.

Also applies to: 196-196, 250-250


18-18: Verify API host configuration

The requests are being made to a different host (www-genesis.destatis.de) compared to other files (ergebnisse.zensus2022.de). Ensure this is the intended configuration.

Also applies to: 96-96, 146-146, 224-224

✅ Verification successful

Different API hosts are used consistently based on data source

The API hosts are being used consistently across the test cassettes:

  • www-genesis.destatis.de - Used for GENESIS-Online tables (e.g. 12211-0001, 91111-0001)
  • www.regionalstatistik.de - Used for regional statistics tables (e.g. 86000U-Z-01, 73111-01-01-4)
  • ergebnisse.zensus2022.de - Used for census 2022 tables (e.g. 3000G-1008, 4000W-2041)

Each data source has its own dedicated API endpoint and the hosts are used consistently based on the table ID pattern. This appears to be the intended configuration.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for API host configurations
echo "Checking API hosts in cassettes..."
rg -A 1 'uri: https://' tests/cassettes/

Length of output: 17263

tests/cassettes/23111-0001.yaml (2)

3-18: API interaction updated from GET to POST method.

The changes reflect a more RESTful approach by:

  • Using POST method for data retrieval
  • Including parameters in the request body instead of URL
  • Adding proper Content-Type header for form data

Also applies to: 84-99, 134-149


69-69: Response format changed from CSV to ZIP.

The API now returns compressed ZIP files instead of plain CSV, which is more efficient for large datasets. This change is consistently applied across all responses.

Also applies to: 73-73, 198-198, 202-202
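
On the client side, consuming the new format can look roughly like the sketch below; the endpoint, parameters, and separator are placeholders, and pystatis' actual handling may differ.

import io
import zipfile

import pandas as pd
import requests

url = "https://www-genesis.destatis.de/genesisWS/rest/2020/data/tablefile"  # placeholder
params = {"name": "23111-0001", "language": "de"}  # placeholder

response = requests.post(url, data=params, timeout=(1, 15))
with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
    # The archive is expected to contain a single CSV member.
    member = archive.namelist()[0]
    with archive.open(member) as handle:
        df = pd.read_csv(handle, sep=";")  # separator assumed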

tests/test_table.py (3)

14-26: Test data shapes updated to match actual API responses.

The expected shapes for both German and English tables have been updated to reflect the current data structure. For example:

  • "23111-0001": (264, 13)
  • "32161-0003": (70, 17)

This ensures the tests accurately validate the API responses.

Also applies to: 41-53


169-174: Column names updated for better clarity and internationalization.

The column names have been updated to:

  • Use proper spacing and special characters
  • Include correct unit suffixes
  • Support both German and English localization

Also applies to: 517-522


60-62: Function signatures reformatted for better readability.

The test function signatures have been split across multiple lines with proper type hints, improving code readability and maintainability.

Also applies to: 729-733

tests/cassettes/1000A-2022.yaml (1)

Line range hint 1-354: Changes consistent with the updated API interaction pattern.

The modifications follow the same pattern as seen in 23111-0001.yaml:

  • POST method with form data
  • ZIP response format
  • Updated headers
tests/cassettes/11111-02-01-4.yaml (1)

8-8: LGTM: Test cassette updated with zstd compression support

The changes appropriately add zstd compression support to the Accept-Encoding headers and update the timestamp metadata. These modifications maintain the test's functionality while expanding compression options.

Also applies to: 834-834, 836-836, 860-860, 889-889, 891-891

src/pystatis/db.py (1)

35-37: LGTM: Improved error message formatting

The multi-line formatting improves readability while maintaining the same error message content.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 35-35: src/pystatis/db.py#L35
Added line #L35 was not covered by tests

Resolved review threads:
  • tests/cassettes/13111-01-03-4.yaml
  • tests/cassettes/12211-Z-11.yaml
  • src/pystatis/table.py (3 threads)
  • .pre-commit-config.yaml
  • src/pystatis/helloworld.py
  • src/pystatis/http_helper.py (2 threads)
  • src/pystatis/db.py
@pmayd pmayd merged commit 85b09ac into dev Dec 4, 2024
1 check passed
@pmayd pmayd deleted the release/v0.4.0 branch December 4, 2024 19:04