Ingest Python #716

Merged
merged 13 commits into from
Mar 25, 2024

@robrix robrix commented Mar 20, 2024

No description provided.

@robrix robrix left a comment

Ready for review.

@@ -78,6 +78,7 @@ library
     , filepath
     , fused-effects ^>= 1.1
     , hashable
+    , language-python

This lets us hit the ground running since we have a typed AST structure to work with (and autocomplete from, and evaluate directly, and and and).

Comment on lines +36 to +40
data Term
  = Module (Py.Module Py.SrcSpan)
  | Statement (Py.Statement Py.SrcSpan)
  | Expr (Py.Expr Py.SrcSpan)
  deriving (Eq, Ord, Show)

We wrap language-python AST nodes up in constructors of a single datatype so as to avoid having to define our own AST for Python and a complicated and error-prone function for copying things from the language-python AST in. This type isn't recursive at all; we just wrap the current layer, and then wrap the next layer on the fly in recursive functions like subterms and eval.

Somewhat to my surprise, the language-python AST (and, I suppose, Python's own syntax) is regular enough to avoid making this very much more complex. There are lots of other types representing identifiers, arguments, parameters, and all the other, weirder bits of syntax, but most of them are auxiliary—not things we would evaluate.
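
To make the "wrap the current layer, wrap the next layer on the fly" idea concrete, here is a minimal sketch. The `Expr` and `Stmt` types below are hypothetical toy stand-ins for language-python's AST (they are not the library's types), and `children` is an illustrative helper that does not appear in this PR.

```haskell
import qualified Data.Set as Set

-- Hypothetical stand-ins for a third-party AST; not language-python's real types.
data Expr
  = Var String
  | Call Expr [Expr]
  deriving (Eq, Ord, Show)

data Stmt
  = StmtExpr Expr
  | If Expr [Stmt] [Stmt]
  deriving (Eq, Ord, Show)

-- Non-recursive wrapper: each constructor holds one node of the foreign AST.
data Term
  = Statement Stmt
  | Expression Expr
  deriving (Eq, Ord, Show)

-- Children are wrapped only when a traversal asks for them.
children :: Term -> [Term]
children t = case t of
  Statement (StmtExpr e)   -> [Expression e]
  Statement (If c th el)   -> Expression c : map Statement (th ++ el)
  Expression (Call f args) -> map Expression (f : args)
  Expression (Var _)       -> []

-- Every term reachable from the root, including the root itself.
subterms :: Term -> Set.Set Term
subterms t = Set.insert t (foldMap subterms (children t))
```

Because `Term` never mentions itself, there is no conversion pass to write or keep in sync; the cost is that each traversal has to know how to descend into the foreign types.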

deriving (Eq, Foldable, Functor, Ord, Show, Traversable)

infixl 1 :>>
-- | Non-generic production of the recursive set of subterms.

😭 we can't get this generically for all languages any more using this approach.

Comment on lines +45 to +59
subterms :: Term -> Set.Set Term
subterms t = Set.insert t $ case t of
  Module (Py.Module ss) -> suite ss
  Statement (Py.Conditional cts e _) -> foldMap (\ (c, t) -> subterms (Expr c) <> suite t) cts <> suite e
  Statement (Py.Raise (Py.RaiseV3 e) _) -> maybe Set.empty (subterms . Expr . fst) e
  -- FIXME: Py.RaiseV2
  -- FIXME: whatever the second field is
  Statement (Py.StmtExpr e _) -> subterms (Expr e)
  Statement (Py.Fun _ _ _ ss _) -> suite ss
  -- FIXME: include 'subterms' of any default values
  Expr (Py.Call f as _) -> subterms (Expr f) <> foldMap (\case { Py.ArgExpr e _ -> subterms (Expr e) ; _ -> Set.empty }) as
  -- FIXME: support keyword args &c.
  _ -> Set.empty -- TBD, and terminals
  where
  suite = foldMap (subterms . Statement)

Note that various terminal nodes (literals, etc) are all (correctly) handled by the fallback case—but so are nonterminals we just haven't got around to yet.

I couldn't think of any generic way of implementing this which wouldn't end up being more trouble than hand-writing (and, regrettably, -maintaining) this. Oh, except I just did: an abstract interpretation of a program ends up being a kind of complicated fold. So if we were to define a domain modelling a set of terms, we could use one abstract interpretation to compute the set(s) of terms to use for dead code analysis in a second pass 😝
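
One way to read the "interpretation as a fold" remark, as a rough sketch only: write the traversal once, and let each analysis supply how a node combines its children's results. The `Expr` type, `interpret`, `terms`, and `names` are all hypothetical names chosen for illustration; none of them come from this PR, and a real abstract interpreter would thread effects and an environment rather than use a pure function.

```haskell
import qualified Data.Set as Set

-- Hypothetical toy expression type standing in for a real AST.
data Expr
  = Var String
  | Lit Int
  | Call Expr [Expr]
  deriving (Eq, Ord, Show)

-- One traversal, parameterised by how a node combines its children's results.
interpret :: (Expr -> [a] -> a) -> Expr -> a
interpret alg e = alg e (map (interpret alg) (kids e))
  where
    kids (Call f args) = f : args
    kids _             = []

-- "Domain" 1: the set of all terms encountered.
terms :: Expr -> Set.Set Expr
terms = interpret (\ e rs -> Set.insert e (Set.unions rs))

-- "Domain" 2: the set of variable names referenced.
names :: Expr -> Set.Set String
names = interpret alg
  where
    alg (Var n) _  = Set.singleton n
    alg _       rs = Set.unions rs
```

Here `terms` plays the role of `subterms`: it falls out of the shared traversal instead of being maintained by hand next to it.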

Comment on lines +72 to +97
  Module (Py.Module ss) -> suite ss
  Statement (Py.Import is sp) -> setSpan sp $ do
    for_ is $ \ Py.ImportItem{ Py.import_item_name = ns } -> case nonEmpty ns of
      Nothing -> pure ()
      Just ss -> S.simport (pack . Py.ident_string <$> ss)
    dunit
  Statement (Py.Pass sp) -> setSpan sp dunit
  Statement (Py.Conditional cts e sp) -> setSpan sp $ foldr (\ (c, t) e -> do
    c' <- eval (Expr c)
    dif c' (suite t) e) (suite e) cts
  Statement (Py.Raise (Py.RaiseV3 e) sp) -> setSpan sp $ case e of
    Just (e, _) -> eval (Expr e) >>= ddie -- FIXME: from clause
    Nothing -> dunit >>= ddie
  -- FIXME: RaiseV2
  -- FIXME: whatever the second field is
  Statement (Py.StmtExpr e sp) -> setSpan sp (eval (Expr e))
  Statement (Py.Fun n ps _r ss sp) -> let ps' = mapMaybe (\case { Py.Param n _ _ _ -> Just (ident n) ; _ -> Nothing }) ps in setSpan sp $ letrec (ident n) (dabs ps' (foldr (\ (p, a) m -> let' p a m) (suite ss) . zip ps'))
  Expr (Py.Var n sp) -> setSpan sp $ let n' = ident n in lookupEnv n' >>= maybe (dvar n') fetch
  Expr (Py.Bool b sp) -> setSpan sp $ dbool b
  Expr (Py.Strings ss sp) -> setSpan sp $ dstring (pack (mconcat ss))
  Expr (Py.Call f as sp) -> setSpan sp $ do
    f' <- eval (Expr f)
    as' <- traverse eval (mapMaybe (\case { Py.ArgExpr e _ -> Just (Expr e) ; _ -> Nothing }) as)
    -- FIXME: support keyword args &c.
    dapp f' as'
  _ -> fail "TBD"

We handle one fewer case: Let is gone. However, Let was never produced from actual Python code using the TSG generator, so it's unclear why it was in there to begin with.

We also handle statement expressions, which I think may not have been handled before. The failure case is also explicit here, rather than hidden in the term ingestion machinery as it was.
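
For readers skimming the `Conditional` case: the `foldr` turns a flat if/elif/else chain into nested two-way branches, with the else-suite as the innermost fallback. A minimal sketch of that shape, using a hypothetical `Branch` type in place of the real `dif` calls (`Branch`, `desugar`, and `run` are illustrative names, not part of this PR):

```haskell
-- Hypothetical two-way branch tree; stands in for nested `dif` calls.
data Branch c a
  = Leaf a
  | If c (Branch c a) (Branch c a)
  deriving (Eq, Show)

-- Fold the (condition, body) clauses from the right, so the else-body ends up innermost.
desugar :: [(c, a)] -> a -> Branch c a
desugar clauses orElse =
  foldr (\ (c, body) rest -> If c (Leaf body) rest) (Leaf orElse) clauses

-- Pick the body of the first clause whose condition holds.
run :: Branch Bool a -> a
run (Leaf a)   = a
run (If c t e) = if c then run t else run e
```

With an empty clause list the fold returns the else-body unchanged, which matches how `(suite e)` is used as the seed above.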

Comment on lines +99 to +101
setSpan s = case fromSpan s of
  Just s -> local (\ r -> r{ refSpan = s })
  _ -> id

Locate cases are no longer so bespoke, and are instead sprinkled around throughout the evaluation. I mean, they probably were before anyway, but this is a little more explicit about it.

We could probably call this once, outside the \case, using language-python's typeclass for annotated nodes, but I couldn't be bothered.
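
A dependency-free sketch of what `setSpan` does, assuming the reader effect is modelled as a plain function of its environment. `Span`, `Ref`, `Eval`, `currentSpan`, and this `local` are simplified stand-ins defined here for illustration; the real code uses the project's own span type and a reader effect.

```haskell
-- Hypothetical, simplified span and reader environment.
data Span = Span { startLine :: Int, endLine :: Int }
  deriving (Eq, Show)

newtype Ref = Ref { refSpan :: Span }
  deriving (Eq, Show)

-- Minimal reader: a computation is a function of its environment.
type Eval a = Ref -> a

-- Run a computation under a modified environment.
local :: (Ref -> Ref) -> Eval a -> Eval a
local f m = m . f

-- Nodes without usable position info yield Nothing and leave the span alone.
setSpan :: Maybe Span -> Eval a -> Eval a
setSpan (Just s) = local (\ r -> r{ refSpan = s })
setSpan Nothing  = id

-- Read the span currently in scope.
currentSpan :: Eval Span
currentSpan = refSpan
```

Nested calls behave the way nested syntax should: the innermost `setSpan` wins for the computation it wraps, and the outer span is back in scope once that computation returns.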

-    , template-haskell >= 2.15 && < 2.19
+    , template-haskell >= 2.15 && < 2.22

Don't mind me, just restoring compilation under newer compilers which CI doesn't exercise yet.

@robrix robrix marked this pull request as ready for review March 25, 2024 14:14
@robrix robrix requested a review from a team as a code owner March 25, 2024 14:14
@robrix robrix merged commit 7bd2ac2 into main Mar 25, 2024
2 checks passed
@robrix robrix deleted the this-branch-name-left-intentionally-blank branch March 25, 2024 18:22