[str] Add `LiteralString` overload for `getitem` #12714

pradeep90 · 2024-10-01T05:45:17Z

In PEP 675, @gbleaney and I had specified a list of LiteralString-preserving overloads for str. However, we didn't specify an overload for __getitem__ and didn't give any rationale either. IIRC this was an edge case we didn't want to take a strong decision on unless users wanted it.

@carljm brought this up yesterday, so I think it's worth discussing.

Pro: my_literal_string[i] or my_literal_string[i:j] should technically be compatible with LiteralString, since it is a substring of a literal-derived string.

Con: The main downside is that an attacker might control the indexes and try to access a specific substring from a literal string in the code. For example, they might narrow down the string to rm foo or SELECT *.

It's true that join and other methods could also construct dangerous strings from LiteralStrings, and we even call that out as an accepted tradeoff in the PEP:

Trivial functions could be constructed to convert a str to a LiteralString:

     def make_literal(s: str) -> LiteralString:
         letters: Dict[str, LiteralString] = {
             "A": "A",
             "B": "B",
             ...
         }
         output: List[LiteralString] = [letters[c] for c in s]
         return "".join(output)

We could mitigate the above using linting, code review, etc., but ultimately a clever, malicious developer attempting to circumvent the protections offered by LiteralString will always succeed. The important thing to remember is that LiteralString is not intended to protect against malicious developers; it is meant to protect against benign developers accidentally using sensitive APIs in a dangerous way (without getting in their way otherwise).

Without LiteralString, the best enforcement tool API authors have is documentation, which is easily ignored and often not seen. With LiteralString, API misuse requires conscious thought and artifacts in the code that reviewers and future developers can notice.

-- PEP 675 - Appendix B: Limitations

__getitem__, however, seems a bit different, because it (and split, zfill, etc.) accept an index or width that could be used to construct a dangerous query or a humongous string.

So, we need to clarify the intent a little. What was the intent of these overloads? We wanted to forbid "arbitrary user-supplied strings" while allowing methods that preserved literal strings. We were not trying to prevent every possible exploit on the string. Since __getitem__ forbids arbitrary user-supplied strings and preserves literal strings, I think we should add an overload for it.

Thanks to @carljm for bringing this up!

cc @gbleaney @JelleZijlstra in case they have different opinions. Would also love to hear from other users.

@gbleaney

In PEP 675, @gbleaney and I had specified a list of `LiteralString`-preserving [overloads](https://peps.python.org/pep-0675/#appendix-c-str-methods-that-preserve-literalstring) for `str`. However, we didn't specify an overload for `__getitem__` and didn't give any rationale either. IIRC this was an edge case we didn't want to take a strong decision on unless users wanted it. @carljm brought this up yesterday, so I think it's worth discussing. Pro: `my_literal_string[i]` or `my_literal_string[i:j]` should technically be compatible with `LiteralString`, since it is a substring of a literal-derived string. Con: The main downside is that an attacker might control the indexes and try to access a specific substring from a literal string in the code. For example, they might narrow down the string to `rm foo` or `SELECT *`. It's true that `join` and other methods could also construct dangerous strings from `LiteralString`s, and we even call that out as an accepted tradeoff in the PEP: > 4. Trivial functions could be constructed to convert a str to a LiteralString: > > def make_literal(s: str) -> LiteralString: > letters: Dict[str, LiteralString] = { > "A": "A", > "B": "B", > ... > } > output: List[LiteralString] = [letters[c] for c in s] > return "".join(output) > > We could mitigate the above using linting, code review, etc., but ultimately a clever, malicious developer attempting to circumvent the protections offered by LiteralString will always succeed. The important thing to remember is that LiteralString is not intended to protect against malicious developers; it is meant to protect against benign developers accidentally using sensitive APIs in a dangerous way (without getting in their way otherwise). > > Without LiteralString, the best enforcement tool API authors have is documentation, which is easily ignored and often not seen. With LiteralString, API misuse requires conscious thought and artifacts in the code that reviewers and future developers can notice. > > -- [PEP 675 - Appendix B: Limitations](https://peps.python.org/pep-0675/#appendix-b-limitations) `__getitem__`, however, seems a bit different, because it (and `split`, `zfill`, etc.) accept an index or width that could be used to construct a dangerous query or a humongous string. So, we need to clarify the intent a little. What was the intent of these overloads? We wanted to forbid "arbitrary user-supplied strings" while allowing methods that preserved literal strings. We were not trying to prevent every possible exploit on the string. Since `__getitem__` forbids arbitrary user-supplied strings and preserves literal strings, I think we should add an overload for it. Thanks to @carljm for bringing this up! cc @gbleaney @JelleZijlstra in case they have different opinions. Would also love to hear from other users.

…load. This should fix the error: ``` /opt/hostedtoolcache/Python/3.10.15/x64/bin/python -m mypy.stubtest --check-typeshed --custom-typeshed-dir . --allowlist stdlib/@tests/stubtest_allowlists/common.txt --allowlist stdlib/@tests/stubtest_allowlists/linux.txt --allowlist stdlib/@tests/stubtest_allowlists/py310.txt --allowlist stdlib/@tests/stubtest_allowlists/linux-py310.txt error: not checking stubs due to mypy build errors: stdlib/builtins.pyi:595: error: Overloaded function signature 2 will never be matched: signature 1's parameter type(s) are the same or broader [misc] ``` I'm not sure why there is an error, because the overload seems valid to me. Other `LiteralString` overloads have the same `type: ignore[misc]`, so I'm just going to copy them.

github-actions · 2024-10-01T06:00:26Z

Diff from mypy_primer, showing the effect of this PR on open source code:

streamlit (https://github.com/streamlit/streamlit)
- lib/tests/streamlit/elements/lib/column_config_utils_test.py:385:30: error: Invalid index type "str" for "Union[ColumnConfig, str, None]"; expected type "Union[SupportsIndex, slice]"  [index]
+ lib/tests/streamlit/elements/lib/column_config_utils_test.py:385:13: error: No overload variant of "__getitem__" of "str" matches argument type "str"  [call-overload]
+ lib/tests/streamlit/elements/lib/column_config_utils_test.py:385:13: note: Possible overload variants:
+ lib/tests/streamlit/elements/lib/column_config_utils_test.py:385:13: note:     def __getitem__(self, Union[SupportsIndex, slice], /) -> str

prefect (https://github.com/PrefectHQ/prefect)
- src/prefect/cli/cloud/__init__.py:325: error: Invalid index type "str" for "dict[UUID, list[Workspace]]"; expected type "UUID"  [index]
- src/prefect/cli/cloud/__init__.py:325: error: Invalid index type "str" for "str"; expected type "SupportsIndex | slice"  [index]
+ src/prefect/cli/cloud/__init__.py:325: error: No overload variant of "__getitem__" of "str" matches argument type "str"  [call-overload]
+ src/prefect/cli/cloud/__init__.py:325: note: Possible overload variants:
+ src/prefect/cli/cloud/__init__.py:325: note:     def __getitem__(self, SupportsIndex | slice, /) -> str

dd-trace-py (https://github.com/DataDog/dd-trace-py)
+ ddtrace/settings/config.py:840: error: Unused "type: ignore" comment  [unused-ignore]
+ ddtrace/settings/config.py:840: error: No overload variant of "__getitem__" of "str" matches argument type "str"  [call-overload]
+ ddtrace/settings/config.py:840: note: Error code "call-overload" not covered by "type: ignore" comment
+ ddtrace/settings/config.py:840: note: Possible overload variants:
+ ddtrace/settings/config.py:840: note:     def __getitem__(self, SupportsIndex | slice, /) -> str

AlexWaygood

To me this makes sense. I can't think of any way in which this would be any less safe from a security perspective than the existing LiteralString overload for str.__mul__, which also allows arbitrary integers and could also create a potentially huge string.

AlexWaygood · 2024-10-01T07:12:20Z

Diff from mypy_primer, showing the effect of this PR on open source code:

I think this can just be ignored, because mypy still doesn't support LiteralString, and patches their copy of builtins.pyi so that all LiteralString overloads are removed from it.

carljm · 2024-10-01T15:57:29Z

This looks good to me, thanks @pradeep90 !

I don't understand the mypy-primer errors, but I'll take @AlexWaygood 's word for it that they aren't a problem.

AlexWaygood · 2024-10-01T16:08:42Z

I don't understand the mypy-primer errors, but I'll take @AlexWaygood 's word for it that they aren't a problem.

It's a thing mypy does so they can halfway-but-not-really support LiteralString -- they patch typing.pyi so that LiteralString is just an alias to str. But that in turn means that they think all the LiteralString overloads in builtins.pyi overlap, causing spurious errors like those you see in mypy_primer here, which is why they remove all use of LiteralString from their patched version of builtins.pyi. TL;DR we don't need to worry about these primer errors: mypy users won't see them.

JelleZijlstra · 2024-10-01T16:10:31Z

The mypy-primer errors seem to be because mypy issues a different error code for errors involving __getitem__ if __getitem__ is overloaded (e.g., https://github.com/DataDog/dd-trace-py/blob/7f6f3dbe7c2e00fe84cdf944e1121304306fb649/ddtrace/settings/config.py#L840).

pradeep90 added 2 commits October 1, 2024 11:04

AlexWaygood approved these changes Oct 1, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[str] Add `LiteralString` overload for `getitem` #12714

[str] Add `LiteralString` overload for `getitem` #12714

pradeep90 commented Oct 1, 2024

github-actions bot commented Oct 1, 2024

AlexWaygood left a comment

AlexWaygood commented Oct 1, 2024

carljm commented Oct 1, 2024

AlexWaygood commented Oct 1, 2024

JelleZijlstra commented Oct 1, 2024

[str] Add LiteralString overload for __getitem__ #12714

Are you sure you want to change the base?

[str] Add LiteralString overload for __getitem__ #12714

Conversation

pradeep90 commented Oct 1, 2024

github-actions bot commented Oct 1, 2024

AlexWaygood left a comment

Choose a reason for hiding this comment

AlexWaygood commented Oct 1, 2024

carljm commented Oct 1, 2024

AlexWaygood commented Oct 1, 2024

JelleZijlstra commented Oct 1, 2024

[str] Add `LiteralString` overload for `getitem` #12714

[str] Add `LiteralString` overload for `getitem` #12714