[red-knot] Add support for string annotations (#14151)
## Summary

This PR adds support for parsing and inferring types within string
annotations.

### Implementation (attempt 1)

This is preserved in
6217f48.

The implementation here would separate the inference of string
annotations into the deferred query. This requires the following:
* Two ways of evaluating the deferred definitions: lazily and eagerly.
* An eager evaluation occurs right outside the definition query, which in
this case would be in `binding_ty` and `declaration_ty`.
* A lazy evaluation occurs on demand, for example when using
`definition_expression_ty` to determine the function return type and
class bases.
* The above point means that when trying to get the binding type for a
variable in an annotated assignment, the definition query won't include
the type, so it has to go through the deferred query to get the
type.

This has the following limitations:
* Nested string annotations, although not necessarily a useful feature,
are difficult to implement unless we convert the implementation into an
infinite loop.
* Partial string annotations require a complex layout because the types
for the stringified and non-stringified parts of the annotation are
inferred in separate queries. This means we need to maintain additional
information.
### Implementation (attempt 2)

This is the final diff in this PR.

The implementation here does the complete inference of a string annotation
in the same definition query by maintaining certain state while
inferring different parts of an expression and making decisions
accordingly. These are:
* Allow names that are part of a string annotation to not exist in the
symbol table. For example, in `x: "Foo"`, if the `Foo` symbol is not
defined, it won't exist in the symbol table even though it's being
used. This breaks an invariant, and the exception is allowed only for
symbols in a string annotation.
* Similarly, name lookup is updated to do the same: if the symbol
doesn't exist, it's treated as unbound.
* Store the final type of a string annotation on the string expression
itself and not on any of the sub-expressions that are created after
parsing. This is because those sub-expressions won't exist in the
semantic index.
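
The lookup rule above can be sketched in Python as follows. This is a toy model under stated assumptions, not the code in this PR: `symbols` is a plain dict standing in for the symbol table, types are represented by their display names, and `infer_string_annotation` and `UNKNOWN` are hypothetical names.

```python
import ast

UNKNOWN = "Unknown"


def infer_string_annotation(text: str, symbols: dict[str, str]) -> tuple[str, list[str]]:
    """Infer a display type for the contents of a string annotation.

    `symbols` maps a name to the display name of its type (toy symbol table).
    Returns the inferred type and any diagnostics that were collected.
    """
    diagnostics: list[str] = []

    def infer(node: ast.expr) -> str:
        if isinstance(node, ast.Name):
            # A name used only inside a string annotation may be missing from
            # the table; treat it as unbound instead of an internal error.
            if node.id not in symbols:
                diagnostics.append(f"unresolved-reference: {node.id}")
                return UNKNOWN
            return symbols[node.id]
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
            return f"{infer(node.left)} | {infer(node.right)}"
        diagnostics.append("unsupported expression in this sketch")
        return UNKNOWN

    # Only the result for the whole string is returned; the parsed
    # sub-expressions are temporary and are not recorded anywhere.
    return infer(ast.parse(text, mode="eval").body), diagnostics


print(infer_string_annotation("Foo", {"Foo": "Foo"}))
print(infer_string_annotation("Foo", {}))
```

Returning a single result for the whole string mirrors the last bullet: the parsed sub-expressions have no entry of their own, so only the string expression carries a type.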

Design document:
https://www.notion.so/astral-sh/String-Annotations-12148797e1ca801197a9f146641e5b71?pvs=4

Closes: #13796 

## Test Plan

* Add various test cases in our markdown framework
* Run `red_knot` on LibCST (contains a lot of string annotations,
specifically
https://github.com/Instagram/LibCST/blob/main/libcst/matchers/_matcher_base.py),
FastAPI (good amount of annotated code including `typing.Literal`) and
compare against the `main` branch output
dhruvmanila authored Nov 15, 2024
1 parent a48d779 commit 9ec690b
Showing 6 changed files with 569 additions and 87 deletions.
1 change: 1 addition & 0 deletions crates/red_knot_python_semantic/Cargo.toml
@@ -14,6 +14,7 @@ license = { workspace = true }
ruff_db = { workspace = true }
ruff_index = { workspace = true }
ruff_python_ast = { workspace = true, features = ["salsa"] }
ruff_python_parser = { workspace = true }
ruff_python_stdlib = { workspace = true }
ruff_source_file = { workspace = true }
ruff_text_size = { workspace = true }
186 changes: 184 additions & 2 deletions crates/red_knot_python_semantic/resources/mdtest/annotations/string.md
@@ -1,9 +1,191 @@
# String annotations

## Simple

```py
def f() -> "int":
    return 1

-# TODO: We do not support string annotations, but we should not panic if we encounter them
-reveal_type(f()) # revealed: @Todo
+reveal_type(f()) # revealed: int
```

## Nested

```py
def f() -> "'int'":
    return 1

reveal_type(f()) # revealed: int
```

## Type expression

```py
def f1() -> "int | str":
    return 1

def f2() -> "tuple[int, str]":
    return 1

reveal_type(f1()) # revealed: int | str
reveal_type(f2()) # revealed: tuple[int, str]
```

## Partial

```py
def f() -> tuple[int, "str"]:
    return 1

reveal_type(f()) # revealed: tuple[int, str]
```

## Deferred

```py
def f() -> "Foo":
    return Foo()

class Foo:
    pass

reveal_type(f()) # revealed: Foo
```

## Deferred (undefined)

```py
# error: [unresolved-reference]
def f() -> "Foo":
    pass

reveal_type(f()) # revealed: Unknown
```

## Partial deferred

```py
def f() -> int | "Foo":
    return 1

class Foo:
    pass

reveal_type(f()) # revealed: int | Foo
```

## `typing.Literal`

```py
from typing import Literal

def f1() -> Literal["Foo", "Bar"]:
    return "Foo"

def f2() -> 'Literal["Foo", "Bar"]':
    return "Foo"

class Foo:
    pass

reveal_type(f1()) # revealed: Literal["Foo", "Bar"]
reveal_type(f2()) # revealed: Literal["Foo", "Bar"]
```

## Various string kinds

```py
# error: [annotation-raw-string] "Type expressions cannot use raw string literal"
def f1() -> r"int":
    return 1

# error: [annotation-f-string] "Type expressions cannot use f-strings"
def f2() -> f"int":
    return 1

# error: [annotation-byte-string] "Type expressions cannot use bytes literal"
def f3() -> b"int":
    return 1

def f4() -> "int":
    return 1

# error: [annotation-implicit-concat] "Type expressions cannot span multiple string literals"
def f5() -> "in" "t":
    return 1

# error: [annotation-escape-character] "Type expressions cannot contain escape characters"
def f6() -> "\N{LATIN SMALL LETTER I}nt":
    return 1

# error: [annotation-escape-character] "Type expressions cannot contain escape characters"
def f7() -> "\x69nt":
    return 1

def f8() -> """int""":
    return 1

# error: [annotation-byte-string] "Type expressions cannot use bytes literal"
def f9() -> "b'int'":
    return 1

reveal_type(f1()) # revealed: Unknown
reveal_type(f2()) # revealed: Unknown
reveal_type(f3()) # revealed: Unknown
reveal_type(f4()) # revealed: int
reveal_type(f5()) # revealed: Unknown
reveal_type(f6()) # revealed: Unknown
reveal_type(f7()) # revealed: Unknown
reveal_type(f8()) # revealed: int
reveal_type(f9()) # revealed: Unknown
```

## Various string kinds in `typing.Literal`

```py
from typing import Literal

def f() -> Literal["a", r"b", b"c", "d" "e", "\N{LATIN SMALL LETTER F}", "\x67", """h"""]:
    return "normal"

reveal_type(f()) # revealed: Literal["a", "b", "de", "f", "g", "h"] | Literal[b"c"]
```

## Class variables

```py
MyType = int

class Aliases:
    MyType = str

    forward: "MyType"
    not_forward: MyType

reveal_type(Aliases.forward) # revealed: str
reveal_type(Aliases.not_forward) # revealed: str
```

## Annotated assignment

```py
a: "int" = 1
b: "'int'" = 1
c: "Foo"
# error: [invalid-assignment] "Object of type `Literal[1]` is not assignable to `Foo`"
d: "Foo" = 1

class Foo:
    pass

c = Foo()

reveal_type(a) # revealed: Literal[1]
reveal_type(b) # revealed: Literal[1]
reveal_type(c) # revealed: Foo
reveal_type(d) # revealed: Foo
```

## Parameter

TODO: Add tests once parameter inference is supported
@@ -110,3 +110,29 @@ c: builtins.tuple[builtins.tuple[builtins.int, builtins.int], builtins.int] = ((
# error: [invalid-assignment] "Object of type `Literal["foo"]` is not assignable to `tuple[tuple[int, int], int]`"
c: builtins.tuple[builtins.tuple[builtins.int, builtins.int], builtins.int] = "foo"
```

## Future annotations are deferred

```py
from __future__ import annotations

x: Foo

class Foo:
    pass

x = Foo()
reveal_type(x) # revealed: Foo
```

## Annotations in stub files are deferred

```pyi path=main.pyi
x: Foo

class Foo:
    pass

x = Foo()
reveal_type(x) # revealed: Foo
```
3 changes: 2 additions & 1 deletion crates/red_knot_python_semantic/src/types.rs
@@ -37,6 +37,7 @@ mod infer;
mod mro;
mod narrow;
mod signatures;
mod string_annotation;
mod unpacker;

#[salsa::tracked(return_ref)]
@@ -58,7 +59,7 @@ pub fn check_types(db: &dyn Db, file: File) -> TypeCheckDiagnostics {

/// Infer the public type of a symbol (its type as seen from outside its scope).
fn symbol_by_id<'db>(db: &'db dyn Db, scope: ScopeId<'db>, symbol: ScopedSymbolId) -> Symbol<'db> {
-    let _span = tracing::trace_span!("symbol_ty_by_id", ?symbol).entered();
+    let _span = tracing::trace_span!("symbol_by_id", ?symbol).entered();

    let use_def = use_def_map(db, scope);
