Inconsistent Behavior in Pattern Matching String #1367

Closed
AvlWx2014 opened this issue Jul 12, 2024 · 1 comment · Fixed by #1368

AvlWx2014 commented Jul 12, 2024

Issue Description

On Pydantic 2.8.2 with pydantic-core 2.20.1, I've noticed inconsistent behavior when validating strings against a pattern. Passing the pattern as a string lets invalid input pass validation, while passing it as a compiled Pattern object from re.compile behaves as expected and rejects the invalid input.

Example:

import re

from pydantic import BaseModel, Field

# Note: this regular expression is based on the lowercase RFC1123 subdomain name regular expression
# the Kubernetes project uses to validate the names of Secrets and other resources.
COMPILED = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*")
NOT_COMPILED = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"

class NotCompiled(BaseModel):
    name: str = Field(pattern=NOT_COMPILED)


class Compiled(BaseModel):
    name: str = Field(pattern=COMPILED)

model = NotCompiled.model_validate({"name": "ShouldntPass"})
print(repr(model))
# NotCompiled(name='ShouldntPass')
model = Compiled.model_validate({"name": "ShouldntPass"})
print(repr(model))  # unreachable, previous line raises ValidationError as expected

This can be reproduced by adding the following two test cases to tests/validators/test_string.py:

        (
            {'pattern': r'[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'},
            'ShouldntPass',
            'ShouldntPass',
        ),
        (
            {'pattern': re.compile(r'[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')},
            'ShouldntPass',
            Err(
                "String should match pattern '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*' [type=string_pattern_mismatch"
            ),
        ),

Environment Info

Here is some information on the environment where I noticed the issue:

$ pdm info --env
{
  "implementation_name": "cpython",
  "implementation_version": "3.9.18",
  "os_name": "posix",
  "platform_machine": "x86_64",
  "platform_release": "6.9.7-100.fc39.x86_64",
  "platform_system": "Linux",
  "platform_version": "#1 SMP PREEMPT_DYNAMIC Thu Jun 27 18:06:32 UTC 2024",
  "python_full_version": "3.9.18",
  "platform_python_implementation": "CPython",
  "python_version": "3.9",
  "sys_platform": "linux"
}
$ pdm show pydantic
Name:                  pydantic
Latest version:        2.8.2
Latest stable version: 2.8.2
Installed version:     2.8.2
Summary:               Data validation using Python type hints
Requires Python:       >=3.8
... # Truncated for brevity
$ pdm show pydantic-core
Name:                  pydantic_core
Latest version:        2.20.1
Latest stable version: 2.20.1
Installed version:     2.20.1
Summary:               Core functionality for Pydantic validation and serialization
Requires Python:       >=3.8
... # Truncated for brevity

tinez commented Jul 12, 2024

A minimal example that reproduces this:

import re
from pydantic import BaseModel, Field

class A(BaseModel):
    b: str = Field(pattern=r"[a-z]")
    c: str = Field(pattern=re.compile(r"[a-z]"))

x = A.model_validate({"b": "Abc", "c": "Abc"})

Validation succeeds for b but fails for c. This seems to happen because in the first case pydantic_core uses Rust's regex crate as the regex engine, while in the second case it simply calls Python's re.Pattern.match method. Rust's Regex.is_match, however, behaves more like Python's re.search. From its docs (emphasis mine):

Returns true if and only if there is a match for the regex anywhere in the haystack given.
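
The difference in semantics can be seen with the standard library alone. This is only a sketch, not pydantic_core's code: it uses re.search as a stand-in for "match anywhere in the haystack", which is an assumption made for illustration.

```python
import re

pattern = re.compile(r"[a-z]")
value = "Abc"

# search: succeeds if the pattern matches anywhere in the string ('b' at index 1)
search_hit = pattern.search(value) is not None

# match: succeeds only if the pattern matches starting at index 0 ('A' is not [a-z])
match_hit = pattern.match(value) is not None

# fullmatch: succeeds only if the whole string matches the pattern
fullmatch_hit = pattern.fullmatch(value) is not None

print(search_hit, match_hit, fullmatch_hit)
```

Note that re.match only anchors at the start of the string, so it is still looser than re.fullmatch for inputs with trailing characters the pattern does not cover.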

I've filed a PR for this.
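
In the meantime, one possible workaround (a sketch, not necessarily what the PR does) is to anchor the pattern explicitly so that search-style and match-style checks agree on inputs like the one in the report. The helpers search_accepts and match_accepts and the ANCHORED name are hypothetical, and re.search again only stands in for the string-pattern path.

```python
import re

# Pattern from the original report (lowercase RFC 1123 subdomain name).
RFC1123 = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"

# Hypothetical anchored variant: the non-capturing group keeps ^ and $
# applying to the whole expression.
ANCHORED = rf"^(?:{RFC1123})$"


def search_accepts(pattern: str, value: str) -> bool:
    """'Match anywhere' semantics (assumed stand-in for the string-pattern path)."""
    return re.search(pattern, value) is not None


def match_accepts(pattern: str, value: str) -> bool:
    """'Match at start' semantics, as with re.Pattern.match."""
    return re.match(pattern, value) is not None


for name in ("ShouldntPass", "my-secret.example"):
    print(
        name,
        search_accepts(RFC1123, name),
        match_accepts(RFC1123, name),
        search_accepts(ANCHORED, name),
        match_accepts(ANCHORED, name),
    )
```

With the unanchored pattern the two checks disagree on "ShouldntPass"; with the anchored one, both reject it and both accept "my-secret.example".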
