Release/v0.4.0 #163
Conversation
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the CodeRabbit configuration file in this repository. You can disable this status message in the CodeRabbit configuration.

Walkthrough
The changes in this pull request include the removal of configuration files for Flake8 and Pylint, updates to the pre-commit configuration to replace certain repositories with new ones, and modifications to various Jupyter notebooks for improved readability. Additionally, the project version has been incremented, and dependencies have been updated or replaced in pyproject.toml.
@coderabbitai review

✅ Actions performed: Review triggered.
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files@@ Coverage Diff @@
## dev #163 +/- ##
==========================================
+ Coverage 80.93% 81.59% +0.66%
==========================================
Files 11 12 +1
Lines 556 576 +20
==========================================
+ Hits 450 470 +20
  Misses        106      106

☔ View full report in Codecov by Sentry.
Actionable comments posted: 21
🧹 Outside diff range and nitpick comments (12)
nb/05_presentation.ipynb (1)
2761-2761
: LGTM! Consider adding data validation.
The code follows good practices by using a list comprehension and explicit type casting. However, consider adding validation to make it more robust against malformed data.
Here's a suggested improvement:
```diff
-    int(semester[3:7]) for semester in ratio_international.index.get_level_values(2)
+    int(semester[3:7]) if len(semester) >= 7 and semester[3:7].isdigit()
+    else None for semester in ratio_international.index.get_level_values(2)
```
This change will:
- Validate the string length before slicing
- Ensure the extracted substring contains only digits
- Return None for invalid values instead of raising an exception
src/pystatis/table.py (1)
213-213
: Fix typo in comment
There's a typo in the comment on line 213: "we pohave to identify" should be "we have to identify".
nb/02_geo_visualization_int_students_germany.ipynb (4)
595-606
: Consider adding error handling for semester string parsing.
While the year extraction works for well-formed semester strings, it might fail for malformed data.
Consider adding error handling:
```diff
-ratio_international["year"] = [
-    int(semester[3:7]) for semester in ratio_international.index.get_level_values(2)
-]
+def extract_year(semester: str) -> int:
+    try:
+        return int(semester[3:7])
+    except (IndexError, ValueError):
+        logging.warning(f"Malformed semester string: {semester}")
+        return None
+
+ratio_international["year"] = [
+    extract_year(semester) for semester in ratio_international.index.get_level_values(2)
+]
```
Also applies to: 1082-1093
688-688
: Remove commented-out code.
Dead code should be removed rather than commented out. If this code is needed for reference, consider adding it to the documentation.
1156-1162
: Standardize legend formatting across plots.
Legend formatting is duplicated across plots. Consider extracting the legend configuration into a constant or helper function.
```python
LEGEND_CONFIG = {
    "title": "Region",
    "title_fontsize": "13",
    "fontsize": "12",
    "loc": "upper left",
    "bbox_to_anchor": (1, 1),
}

# Usage
ax.legend(**LEGEND_CONFIG)
```
Also applies to: 1220-1226
770-773
: Extract colorbar creation logic.
The colorbar creation code is duplicated. Consider extracting this into a helper function.
```python
def add_colorbar(
    fig: plt.Figure,
    axes: List[plt.Axes],
    min_value: float,
    max_value: float,
    label: str = "International Student Ratio [%]",
) -> None:
    cm = fig.colorbar(
        plt.cm.ScalarMappable(
            norm=plt.Normalize(vmin=min_value, vmax=max_value), cmap="viridis"
        ),
        ax=axes[len(axes) - 1],
    )
    cm.set_label(label, fontsize=18)
```
Also applies to: 1467-1470
tests/cassettes/41312-01-01-4.yaml (1)
13357-13357
: Consider implementing automated cassette maintenance
Manual updates to cassette timestamps can be error-prone and time-consuming. Consider implementing a strategy for automated cassette maintenance, such as:
- Using VCR.py's before_record hooks to standardize timestamps (see the sketch below)
- Creating a utility script to update cassettes systematically
- Implementing relative time comparisons in tests instead of absolute dates
This would reduce maintenance overhead and make the test suite more robust.
Also applies to: 13359-13359, 13412-13412, 13414-13414
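A minimal sketch of the first option, assuming VCR.py is configured directly; the hook name and the placeholder timestamp are illustrative and not taken from this repository:

```python
# Sketch only: normalize volatile response headers before they are written to a
# cassette, so re-recording does not churn Date/Expires values.
import vcr


def _normalize_volatile_headers(response):
    for header in ("Date", "Expires"):
        if header in response.get("headers", {}):
            response["headers"][header] = ["Thu, 01 Jan 1970 00:00:00 GMT"]
    return response


recorder = vcr.VCR(before_record_response=_normalize_volatile_headers)
```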
tests/test_profile.py (1)
5-5
: Consider grouping related imports.
The imports could be organized better by grouping related imports (standard library, third-party, local).
```diff
 from configparser import RawConfigParser
 import pytest
-from pystatis import config
-from pystatis.profile import change_password, remove_result
+from pystatis import config
+from pystatis.profile import (
+    change_password,
+    remove_result,
+)
 from tests.test_http_helper import _generic_request_status
```

src/pystatis/helloworld.py (1)
56-56
: Consider adding error handling for timeout scenarios.
The timeout parameter is set, but there's no specific handling for timeout exceptions. Consider adding explicit error handling:
```diff
-response = requests.post(url, headers=headers, data=params, timeout=(1, 15))
+try:
+    response = requests.post(url, headers=headers, data=params, timeout=(1, 15))
+except requests.exceptions.Timeout:
+    raise TimeoutError("Login request timed out after 15 seconds")
```

src/pystatis/results.py (1)
95-97
: Consider using a list comprehension for better readability.
The string joining logic could be more concise using list comprehensions.
- "\n".join( - [col["Content"] for col in structure_dict["Columns"]] - ), + "\n".join([col["Content"] for col in structure_dict["Columns"]]),Also applies to: 131-134
src/pystatis/http_helper.py (1)
346-348
: Improve error handling readability
The condition could be simplified for better readability.
```diff
-elif (destatis_status_code in [104, 50, 90]) or (
-    destatis_status_type in error_en_de
-):
+elif destatis_status_code in [104, 50, 90] or destatis_status_type in error_en_de:
```

tests/cassettes/4000W-2041.yaml (1)
288-288
: Consider standardizing connection handling
There's an inconsistency in connection header values (keep-alive vs close). While this works, standardizing on keep-alive could improve performance in production.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (44)
- .flake8 (0 hunks)
- .pre-commit-config.yaml (1 hunks)
- .pylintrc (0 hunks)
- nb/02_geo_visualization_int_students_germany.ipynb (15 hunks)
- nb/03_find.ipynb (5 hunks)
- nb/04_jobs.ipynb (2 hunks)
- nb/05_presentation.ipynb (1 hunks)
- pyproject.toml (2 hunks)
- src/pystatis/__init__.py (1 hunks)
- src/pystatis/cache.py (6 hunks)
- src/pystatis/config.py (1 hunks)
- src/pystatis/db.py (1 hunks)
- src/pystatis/find.py (4 hunks)
- src/pystatis/helloworld.py (2 hunks)
- src/pystatis/http_helper.py (14 hunks)
- src/pystatis/profile.py (1 hunks)
- src/pystatis/results.py (5 hunks)
- src/pystatis/table.py (10 hunks)
- src/pystatis/types.py (1 hunks)
- tests/cassettes/1000A-2022.yaml (4 hunks)
- tests/cassettes/11111-02-01-4.yaml (4 hunks)
- tests/cassettes/12211-0001.yaml (3 hunks)
- tests/cassettes/12211-Z-11.yaml (4 hunks)
- tests/cassettes/13111-01-03-4.yaml (4 hunks)
- tests/cassettes/2000S-2003.yaml (1 hunks)
- tests/cassettes/23111-0001.yaml (4 hunks)
- tests/cassettes/3000G-1008.yaml (8 hunks)
- tests/cassettes/32121-01-02-4.yaml (4 hunks)
- tests/cassettes/32161-0003.yaml (4 hunks)
- tests/cassettes/4000W-2030.yaml (4 hunks)
- tests/cassettes/4000W-2041.yaml (8 hunks)
- tests/cassettes/41312-01-01-4.yaml (4 hunks)
- tests/cassettes/46181-0001.yaml (4 hunks)
- tests/cassettes/52111-0001.yaml (1 hunks)
- tests/cassettes/81000-0001.yaml (1 hunks)
- tests/cassettes/AI-N-01-2-5.yaml (4 hunks)
- tests/conftest.py (1 hunks)
- tests/test_cache.py (1 hunks)
- tests/test_config.py (1 hunks)
- tests/test_helloworld.py (1 hunks)
- tests/test_http_helper.py (2 hunks)
- tests/test_profile.py (2 hunks)
- tests/test_table.py (37 hunks)
- tests/test_version.py (1 hunks)
🔥 Files not summarized due to errors (4)
- nb/05_presentation.ipynb: Error: Server error: no LLM provider could handle the message
- tests/cassettes/13111-01-03-4.yaml: Error: Server error: no LLM provider could handle the message
- tests/cassettes/32121-01-02-4.yaml: Error: Server error: no LLM provider could handle the message
- tests/cassettes/12211-Z-11.yaml: Error: Server error: no LLM provider could handle the message
💤 Files with no reviewable changes (2)
- .flake8
- .pylintrc
✅ Files skipped from review due to trivial changes (8)
- src/pystatis/types.py
- src/pystatis/__init__.py
- tests/test_version.py
- tests/test_http_helper.py
- tests/test_config.py
- tests/test_cache.py
- src/pystatis/profile.py
- nb/03_find.ipynb
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/pystatis/http_helper.py
[warning] 77-77: src/pystatis/http_helper.py#L77
Added line #L77 was not covered by tests
[warning] 232-232: src/pystatis/http_helper.py#L232
Added line #L232 was not covered by tests
[warning] 260-260: src/pystatis/http_helper.py#L260
Added line #L260 was not covered by tests
src/pystatis/db.py
[warning] 35-35: src/pystatis/db.py#L35
Added line #L35 was not covered by tests
🔇 Additional comments (55)
tests/cassettes/13111-01-03-4.yaml (1)
8-8
: LGTM: Enhanced compression support with zstd
The addition of zstd to the Accept-Encoding header is a good improvement. Zstandard (zstd) is a modern compression algorithm that offers better compression ratios and performance compared to traditional algorithms.
Also applies to: 6504-6504
tests/cassettes/12211-Z-11.yaml (1)
8-8
: LGTM: Enhanced compression support with zstd
The addition of zstd to Accept-Encoding is a good improvement for performance. Just ensure that your application's dependencies support zstd compression if you plan to actually use it in production.
tests/cassettes/32121-01-02-4.yaml (2)
8-8
: LGTM: Accept-Encoding header update includes modern compression
The addition of zstd (Zstandard) compression support alongside existing algorithms is a good improvement, as it's becoming increasingly common in modern web services.
Also applies to: 10722-10722
10696-10698
: Consider using dynamic dates in test fixtures
The response headers contain hardcoded future dates (December 2024) which could cause test failures when these dates pass. Consider using relative dates or implementing a dynamic date generation mechanism in your test setup.
Let's check if this pattern exists in other cassette files:
Would you like me to provide an example implementation using dynamic dates for the test fixtures?
Also applies to: 10753-10755
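A sketch of what such dynamic dates could look like; the helper name and the 30-day offset are assumptions for illustration, not part of the existing test setup:

```python
# Sketch: build an Expires header relative to "now" instead of hard-coding a
# December 2024 timestamp in the fixture.
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def future_expires(days: int = 30) -> str:
    """Return an RFC 7231 date string `days` in the future."""
    return format_datetime(datetime.now(timezone.utc) + timedelta(days=days), usegmt=True)
```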
src/pystatis/table.py (3)
6-6
: No issues found with the import statement
The import of Any from the typing module is appropriate and necessary for type annotations used in the code.
29-29
: Improved type annotation for self.metadata
Specifying self.metadata as dict[str, Any] enhances type clarity and is a good practice.
46-46
: Explicit return type for the get_data method
Adding -> None to the get_data method signature improves code readability and type checking.
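Taken together, a hedged sketch of what these annotations amount to; the constructor and parameter list shown here are illustrative, and the real signatures in table.py may differ:

```python
from typing import Any


class Table:
    def __init__(self, name: str) -> None:
        self.name = name
        # Metadata is an arbitrary nested JSON-like structure, hence dict[str, Any].
        self.metadata: dict[str, Any] = {}

    def get_data(self, **kwargs: Any) -> None:
        # Populates internal state (e.g. self.metadata) rather than returning a value.
        ...
```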
nb/02_geo_visualization_int_students_germany.ipynb (2)
406-409
: LGTM! Improved readability of DataFrame filtering operations.
The code has been nicely reformatted with proper line breaks and indentation, making complex boolean conditions more readable.
Also applies to: 1344-1346
1408-1413
: LGTM! Well-structured DataFrame merge.
The merge operation is clear and readable with explicit parameter names.
tests/cassettes/AI-N-01-2-5.yaml (1)
8-8
: LGTM: Addition of zstd compression support
The addition of zstd to Accept-Encoding is a good improvement as it's a modern, high-performance compression algorithm. However, let's verify that the codebase properly handles zstd compressed responses.
Also applies to: 14018-14018
✅ Verification successful
The previous search showed zstd in many cassettes, but we need to:
- Understand how it's being used in the cassettes
- Find the test/client code that handles these requests
This will help determine if the compression is properly handled in the codebase.
The cassettes show that python-requests/2.32.3 is being used as the HTTP client. We need to verify:
- How the requests session is configured
- If there are any custom transport adapters
- Whether zstd compression is handled by dependencies
This will help determine if the compression support is properly implemented.
LGTM: zstd compression support is handled by requests library
The zstd compression in the cassettes is safe because:
- The project uses requests 2.32.3 which natively supports zstd compression
- The compression handling is automatic - requests library handles content negotiation and decompression transparently
- No custom compression adapters are needed as this is built into the requests library
Reasoning: The pyproject.toml shows requests ^2.27.1 as a dependency, and the cassettes show requests 2.32.3 in use, which has built-in zstd support. The Accept-Encoding header changes are automatically handled by the requests library's content negotiation system, requiring no additional code changes.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify zstd compression handling in the codebase
# Check if there are any compression-related utility functions or dependencies
rg -l "(?i)compress|decompress|zstd|zstandard" --type python
# Check for zstd related dependencies in project configuration
rg -l "zstd|zstandard" --type toml --type yaml
Length of output: 1229
Script:
#!/bin/bash
# Let's check the content of a few cassettes to understand the zstd usage pattern
head -n 20 tests/cassettes/AI-N-01-2-5.yaml
head -n 20 tests/cassettes/32421-0012.yaml
# Check for any HTTP client configuration or test setup files
fd -e py -e yaml "client|config|setup|test" --exec grep -l -i "compression|zstd|requests"
Length of output: 1733
Script:
#!/bin/bash
# Let's check for the HTTP client setup and test configuration
fd -e py "test|client" --type f --exec rg -l "requests\.Session|HTTPAdapter"
# Look for requirements or dependency files
fd "requirements|setup.py|pyproject.toml" --type f --exec cat {}
Length of output: 2331
tests/cassettes/41312-01-01-4.yaml (1)
13357-13357
: Verify consistent timestamp updates across all cassettes
The Date and Expires headers have been updated to December 2024. We should ensure this update is consistent across all cassette files to maintain temporal coherence in the test suite.
Let's verify the timestamp consistency:
Also applies to: 13359-13359, 13412-13412, 13414-13414
✅ Verification successful
All cassettes have been consistently updated to December 2024
The verification shows that all cassette files have been updated to use December 2024 dates. No June 2024 dates were found, and all cassettes contain properly formatted December 2024 timestamps, maintaining temporal coherence across the test suite.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any inconsistent dates in cassette files
echo "Checking for any remaining June 2024 dates in cassettes..."
rg "Jun 2024" "tests/cassettes/"
echo "Checking for consistent December 2024 dates..."
rg "Dec 2024" "tests/cassettes/"
Length of output: 10649
tests/test_helloworld.py (1)
22-25
: Verify if both GET and POST mocks are needed.
The test now mocks both requests.get and requests.post with the same return value. If the underlying logincheck function has been updated to use POST exclusively, we might be able to remove the GET mock.
✅ Verification successful
Both GET and POST mocks are required
The codebase shows that both HTTP methods are actively used:
- The whoami function uses requests.get
- The logincheck function uses requests.post
Therefore, both mocks in the test file are necessary and should be kept.
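For reference, a sketch of keeping both mocks with pytest-mock; the test name, fixture usage, and canned response are illustrative, not copied from tests/test_helloworld.py:

```python
# Sketch: whoami exercises requests.get, logincheck exercises requests.post,
# so both are patched with the same canned response.
def test_helloworld_mocks_both_methods(mocker):
    fake_response = mocker.Mock(status_code=200, text="OK")
    mocker.patch("requests.get", return_value=fake_response)
    mocker.patch("requests.post", return_value=fake_response)
```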
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if GET is still used in the logincheck function
ast-grep --pattern 'def logincheck($$$) {
$$$
requests.get($$$)
$$$
}'
# Check if both GET and POST are used
ast-grep --pattern 'def logincheck($$$) {
$$$
requests.post($$$)
$$$
}'
Length of output: 158
Script:
#!/bin/bash
# Let's try a different approach using ripgrep to search for requests.get and requests.post
rg "requests\.(get|post)" -A 2 -B 2
# Also check the test file for both mocks
rg "requests\.(get|post)" tests/test_helloworld.py
Length of output: 2154
tests/conftest.py (2)
7-7
: Good security practice: Filtering sensitive headers.
Adding "username" and "password" to filtered headers improves security by preventing sensitive credentials from being recorded in test cassettes.
9-20
: Well-documented VCR configuration for POST requests.
The configuration and documentation clearly explain the need for body matching due to Genesis v5's POST request requirements. The comprehensive match_on criteria ensure accurate request matching.
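A sketch of what such a pytest-recording vcr_config fixture can look like; the exact header and match_on lists here are illustrative and may not match tests/conftest.py verbatim:

```python
import pytest


@pytest.fixture(scope="module")
def vcr_config():
    # Filter credentials from recordings and match POST requests on their body,
    # since Genesis v5 sends the distinguishing parameters in the request body.
    return {
        "filter_headers": ["username", "password"],
        "match_on": ["method", "scheme", "host", "port", "path", "query", "body"],
    }
```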
src/pystatis/helloworld.py (1)
47-54
: LGTM! Security improvement in credential handling.
The switch from query parameters to POST with form-encoded body is a security improvement, as it prevents credentials from being logged in server access logs.
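Illustratively, the difference looks roughly like this; the URL and variable names are placeholders, not the module's actual code:

```python
import requests

url = "https://example.invalid/api/helloworld/logincheck"  # placeholder endpoint
credentials = {"username": "user", "password": "secret"}

# Before: credentials in the query string, visible in server access logs
# requests.get(url, params=credentials, timeout=(1, 15))

# After: credentials in a form-encoded POST body
response = requests.post(
    url,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    data=credentials,
    timeout=(1, 15),
)
```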
pyproject.toml (2)
15-15
: LGTM! Version bump follows semantic versioning.
The version increment to 0.4.0 is appropriate for the significant tooling changes introduced in this release.
47-55
: LGTM! Dependency updates look good.
The consolidation of linting tools into ruff and the updates to development dependencies are well-considered changes that will improve the development workflow.
Let's verify the compatibility of the updated dependencies:
✅ Verification successful
Dependencies are compatible and configurations are properly set up
The verification shows that:
- The poetry.lock is in sync with pyproject.toml for the updated dependencies
- Ruff configuration in ruff.toml is compatible with ruff 0.8.1, using standard rule sets
- The project has properly migrated linting configurations from older tools to ruff
Test configurations are properly set up for pytest 8.0.0 and pytest-cov 6.0.0
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any known compatibility issues between the updated dependencies
# Check if poetry.lock is in sync with pyproject.toml
if [ -f "poetry.lock" ]; then
# Extract and compare dependencies
grep -A 1 "name = \"ruff\"" poetry.lock
grep -A 1 "name = \"pre-commit\"" poetry.lock
grep -A 1 "name = \"pytest-cov\"" poetry.lock
fi
Length of output: 264
Script:
#!/bin/bash
# Let's verify if there are any potential compatibility issues by:
# 1. Checking if pytest-cov is compatible with the specified pytest version
# 2. Looking for any existing pytest configuration that might need updates
# 3. Checking for any ruff configuration that might need adjustments
# Check for pytest configuration
fd -e ini -e cfg -e toml -e py . -x grep -l "pytest" {}
# Check for ruff configuration
fd -e toml -e py . -x grep -l "ruff" {}
# Look for any existing linting configurations that might need updates
fd -e ini -e toml . -x grep -l "flake8\|pylint\|black" {}
Length of output: 382
Script:
#!/bin/bash
# Let's check the content of ruff.toml and any pytest configuration in pyproject.toml
# to ensure they're compatible with the updated versions
# Check ruff configuration
cat ruff.toml
# Check pytest configuration in pyproject.toml
rg "\[tool\.pytest\]|\[pytest\]" -A 10 pyproject.toml
# Check if there are any specific linting rules that might need updates
rg "select|ignore|extend-select|extend-ignore" ruff.toml
Length of output: 431
src/pystatis/find.py (3)
4-4
: LGTM!
The addition of Any from the typing module supports proper type annotations in the codebase.
55-55
: LGTM!
The return type annotation correctly specifies that the method doesn't return a value.
102-102
: LGTM!
The Any type annotation for **kwargs correctly indicates that the method accepts arbitrary keyword arguments.
tests/cassettes/52111-0001.yaml (3)
53-57
: LGTM!
The response headers have been properly updated to reflect the new ZIP format and include a timestamp in the filename.
3-3
: LGTM! Verify API compatibility with older clients.
The switch to POST method with form-encoded body is a good practice for complex parameters. The addition of zstd compression support can improve performance.
Also applies to: 8-8, 17-17
✅ Verification successful
API compatibility is maintained across endpoints
Based on the search results, there is a clear pattern in how the endpoints are used:
- The www-genesis.destatis.de endpoint consistently uses POST method with form-encoded body (like in 52111-0001.yaml)
- The www.regionalstatistik.de endpoint consistently uses GET method with query parameters (like in AI-N-01-2-5.yaml)
- The ergebnisse.zensus2022.de endpoint consistently uses POST method (like in 4000W-2030.yaml)
This shows that each specific domain maintains its own consistent API pattern, and the changes in 52111-0001.yaml align with the established pattern for the www-genesis.destatis.de domain. The change from GET to POST is not a compatibility concern as it follows the domain's standard practice.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if any other files still use GET method for this endpoint
rg -l "method: GET.*genesisWS/rest/2020/data/tablefile" tests/cassettes/
Length of output: 74
Script:
#!/bin/bash
# Let's first check what files exist in the cassettes directory
fd . tests/cassettes/
# Then search for any GET or POST methods in those files
rg "method: (GET|POST)" tests/cassettes/ -A 2
# Also search for the endpoint pattern to ensure we don't miss anything
rg "genesisWS/rest/2020/data/tablefile" tests/cassettes/
Length of output: 26056
21-48
: Document the response format change in the API documentation.
The response format has changed from CSV to ZIP, which improves data compression but may affect clients expecting CSV format.
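If the documentation needs a snippet, something along these lines could illustrate consuming the new format; this is a sketch only, and pystatis' actual parsing code and the CSV separator may differ:

```python
import io
import zipfile

import pandas as pd


def read_table_from_zip(content: bytes) -> pd.DataFrame:
    """Extract the first file from a ZIP response body and parse it as CSV."""
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        first_member = archive.namelist()[0]
        with archive.open(first_member) as handle:
            return pd.read_csv(handle, sep=";")
```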
src/pystatis/results.py (3)
4-4
: LGTM!
The addition of Any from the typing module supports the enhanced type annotations in the file.
52-52
: LGTM!
The method signature now properly specifies that it accepts a list of integers and returns a list of strings.
160-160
: LGTM!
The return type annotation dict[str, Any] accurately describes the metadata dictionary structure.
src/pystatis/cache.py (3)
15-15
: LGTM: Import of ParamDict type alias
The addition of the ParamDict import aligns with the type annotation updates throughout the file.
25-25
: LGTM: Consistent type annotation updates
The type annotations for the params parameter have been consistently updated from dict to ParamDict across all functions, improving type safety and clarity.
Also applies to: 77-77, 107-107, 154-154
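For context, such an alias is typically just a narrowed dict type. A hedged guess at its shape, for illustration only; the real definition lives in src/pystatis/types.py and may differ:

```python
from typing import Union

# Hypothetical shape for illustration only.
ParamDict = dict[str, Union[str, int, bool]]
```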
154-154
: Fix typo in function name
The function name hit_in_cash contains a typo and should be renamed to hit_in_cache for clarity and consistency.
```diff
-def hit_in_cash(
+def hit_in_cache(
```
Likely invalid or redundant comment.
src/pystatis/config.py (3)
34-36
: LGTM: Improved regex pattern readability
The "regio" regex pattern has been reformatted for better readability while maintaining its functionality.
38-42
: LGTM: Clear version mapping for databases
The VERSION_MAPPING provides a clear association between databases and their versions, which is used in conjunction with the restructured LANG_TO_COL_MAPPING.
43-56
: LGTM: Comprehensive regional key translations
The ARS_OR_AGS_MAPPING provides consistent translations for regional keys (ARS/AGS) across all supported databases in both German and English.
tests/cassettes/81000-0001.yaml (2)
3-17
: LGTM: Updated request format
The request format has been properly updated to use POST method with form-encoded data, including appropriate headers for content type and length.
Also applies to: 90-104
21-70
: LGTM: Updated response format
The response format has been updated to handle ZIP files instead of CSV, with appropriate content type and disposition headers.
Also applies to: 75-81
nb/04_jobs.ipynb (1)
11-19
: LGTM! Documentation and code organization improvements.
The changes improve readability through:
- Better markdown formatting for headers
- Logical organization of imports
- Enhanced documentation clarity about caching behavior
Also applies to: 34-36, 44-44
tests/cassettes/46181-0001.yaml (1)
3-18
: LGTM! API interaction improvements with proper request/response handling.
The changes implement a more robust API interaction pattern:
- Switched from GET to POST with proper form-encoded bodies
- Updated response handling for ZIP format
- Consistent headers and content types across all requests
Let's verify the consistency of these changes across other test cassettes:
Also applies to: 21-35, 40-46, 55-70, 76-82, 102-117, 120-134, 139-145, 154-169, 175-181
✅ Verification successful
API interaction patterns are consistent and properly implemented
The verification confirms proper implementation of the API interaction patterns:
- Requests use correct Content-Type: application/x-www-form-urlencoded headers
- Form-encoded request bodies are properly formatted with all required parameters
- Responses alternate between binary (ZIP) data and JSON metadata as expected
- Headers and content types are consistent across language variants (de/en)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistent API interaction patterns across test cassettes
# Check for consistent HTTP method and content type headers
echo "Checking HTTP methods and content types..."
rg -A 5 "method: (GET|POST)" tests/cassettes/
rg "Content-Type: application/x-www-form-urlencoded" tests/cassettes/
# Check for consistent response format
echo "Checking response formats..."
rg "Content-Type: application/zip" tests/cassettes/
Length of output: 65945
Script:
#!/bin/bash
# Let's verify the API interaction patterns more specifically
# Check for Content-Type headers in request bodies
echo "Checking request Content-Type headers..."
rg -A 2 "Content-Type:" tests/cassettes/46181-0001.yaml
# Check for form-encoded bodies
echo "Checking request bodies..."
rg -A 1 "body: " tests/cassettes/46181-0001.yaml
# Check for response format indicators
echo "Checking response format indicators..."
rg -A 3 "response:" tests/cassettes/46181-0001.yaml
Length of output: 2850
tests/cassettes/4000W-2030.yaml (1)
3-18
: LGTM! Consistent API changes with additional security measures.
The changes maintain consistency with the updated API pattern and include proper security headers:
- Same POST method and form data structure
- Consistent response format
- Security headers: X-Frame-Options, X-XSS-Protection, Strict-Transport-Security
Also applies to: 87-104, 125-140, 166-170
src/pystatis/http_helper.py (1)
13-13
: LGTM: Type import addition
The addition of the ParamDict type import enhances type safety across the module.
tests/cassettes/3000G-1008.yaml (1)
3-18
: LGTM: Updated request format
The request format has been correctly updated to:
- Use POST method
- Include form-encoded body
- Set appropriate Content-Type header
tests/cassettes/32161-0003.yaml (2)
3-18
: LGTM: Consistent request format
All requests have been consistently updated to use POST method with proper headers and form-encoded body, matching the changes in http_helper.py.
Also applies to: 66-81, 116-131, 178-193
86-87
: Verify data range update impact
The time range has been extended from 2010-2022. Ensure that any date-dependent tests or validations are updated accordingly.
✅ Verification successful
Based on the search results, I can see that the test files contain various date ranges, including references to both 2010 and 2022. The key findings are:
- The test file test_table.py contains test cases that use dates like 2010 and 2022 in assertions and test data
- Multiple cassette files contain date ranges, with some using 2010 as a base year for index calculations
- The date ranges appear to be part of the test data and metadata, rather than being hard-coded test logic
Since the changes in the time range are part of the test data/cassettes and don't affect any test assertions or validation logic, I can generate the final response:
No date-dependent test updates needed
The time range extension to 2010-2022 is reflected in the test data/cassettes but does not impact any test assertions or validation logic. The test framework handles the date ranges dynamically through the cassette data.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for date-dependent tests or validations
# Search for date-related test assertions
rg -A 5 "2010|2022" tests/
Length of output: 74226
tests/cassettes/2000S-2003.yaml (3)
1-80
: LGTM! Security headers are properly configured.
The request-response pair for German data retrieval is well structured with:
- Proper Content-Type headers for form-encoded data
- Security headers including CORS, XSS protection, and HSTS
- Compressed ZIP response format
146-291
: LGTM! English data request-response pair is consistent.
The structure matches the German request-response pair with appropriate language parameter changes.
81-145
: Verify sensitive data masking in responses
The metadata response properly masks sensitive credentials with asterisks. The JSON structure and character encoding for German text are handled correctly.
✅ Verification successful
Sensitive data masking is consistently implemented
The verification confirms that username and password fields are properly masked with exactly 20 asterisks ("********************") across all API response cassettes in the test suite, maintaining consistent and secure handling of credentials.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify consistent masking of sensitive data
rg -A 2 '"username":"\*{20}"' tests/cassettes/
Length of output: 48
Script:
#!/bin/bash
# Search for any username/password patterns in cassettes to verify masking
rg -i "username.*password" tests/cassettes/
# Also search for asterisk patterns that may indicate masking
rg "\*{5,}" tests/cassettes/
Length of output: 40754
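For completeness, credential masking like this is typically applied by a recording filter. A sketch of one possible approach using VCR.py's before_record_response; the function and the fixed 20-asterisk mask are assumptions, not the project's actual conftest code:

```python
import re

MASK = "*" * 20


def mask_credentials_in_response(response):
    # Sketch: scrub credential values from recorded JSON response bodies before
    # the interaction is written to a cassette.
    body = response["body"]["string"]
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    body = re.sub(r'"(username|password)":"[^"]*"', rf'"\1":"{MASK}"', body)
    response["body"]["string"] = body
    return response
```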
tests/cassettes/4000W-2041.yaml (1)
Line range hint 1-80
: LGTM! Request-response structure follows established pattern.
The housing data retrieval follows the same secure request-response pattern with proper headers and compression.
tests/cassettes/12211-0001.yaml (2)
67-68
: LGTM! Response sizes are consistent and reasonable.
The Content-Length headers match the actual response body sizes, and the compression ratios look appropriate for the data type.
Also applies to: 196-196, 250-250
18-18
: Verify API host configuration
The requests are being made to a different host (www-genesis.destatis.de) compared to other files (ergebnisse.zensus2022.de). Ensure this is the intended configuration.
Also applies to: 96-96, 146-146, 224-224
✅ Verification successful
Different API hosts are used consistently based on data source
The API hosts are being used consistently across the test cassettes:
- www-genesis.destatis.de: used for GENESIS-Online tables (e.g. 12211-0001, 91111-0001)
- www.regionalstatistik.de: used for regional statistics tables (e.g. 86000U-Z-01, 73111-01-01-4)
- ergebnisse.zensus2022.de: used for census 2022 tables (e.g. 3000G-1008, 4000W-2041)
Each data source has its own dedicated API endpoint and the hosts are used consistently based on the table ID pattern. This appears to be the intended configuration.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for API host configurations
echo "Checking API hosts in cassettes..."
rg -A 1 'uri: https://' tests/cassettes/
Length of output: 17263
tests/cassettes/23111-0001.yaml (2)
3-18
: API interaction updated from GET to POST method.
The changes reflect a more RESTful approach by:
- Using POST method for data retrieval
- Including parameters in the request body instead of URL
- Adding proper Content-Type header for form data
Also applies to: 84-99, 134-149
69-69
: Response format changed from CSV to ZIP.
The API now returns compressed ZIP files instead of plain CSV, which is more efficient for large datasets. This change is consistently applied across all responses.
Also applies to: 73-73, 198-198, 202-202
tests/test_table.py (3)
14-26
: Test data shapes updated to match actual API responses.
The expected shapes for both German and English tables have been updated to reflect the current data structure. For example:
- "23111-0001": (264, 13)
- "32161-0003": (70, 17)
This ensures tests accurately validate the API responses.
Also applies to: 41-53
169-174
: Column names updated for better clarity and internationalization.
The column names have been updated to:
- Use proper spacing and special characters
- Include correct unit suffixes
- Support both German and English localization
Also applies to: 517-522
60-62
: Function signatures reformatted for better readability.
The test function signatures have been split across multiple lines with proper type hints, improving code readability and maintainability.
Also applies to: 729-733
tests/cassettes/1000A-2022.yaml (1)
Line range hint 1-354
: Changes consistent with the updated API interaction pattern.
The modifications follow the same pattern as seen in 23111-0001.yaml:
- POST method with form data
- ZIP response format
- Updated headers
tests/cassettes/11111-02-01-4.yaml (1)
8-8
: LGTM: Test cassette updated with zstd compression support
The changes appropriately add zstd compression support to the Accept-Encoding headers and update the timestamp metadata. These modifications maintain the test's functionality while expanding compression options.
Also applies to: 834-834, 836-836, 860-860, 889-889, 891-891
src/pystatis/db.py (1)
35-37
: LGTM: Improved error message formatting
The multi-line formatting improves readability while maintaining the same error message content.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 35-35: src/pystatis/db.py#L35
Added line #L35 was not covered by tests
Summary by CodeRabbit
- ParamDict introduced for improved type clarity.
- logincheck function updated to handle long-running requests and modified request structure.
- Project version bumped from 0.3.3 to 0.4.0 and dependencies modified in pyproject.toml.