Ingest Python #716
Ready for review.
@@ -78,6 +78,7 @@ library
     , filepath
     , fused-effects ^>= 1.1
     , hashable
+    , language-python
This lets us hit the ground running since we have a typed AST structure to work with (and autocomplete from, and evaluate directly, and and and).
data Term
  = Module (Py.Module Py.SrcSpan)
  | Statement (Py.Statement Py.SrcSpan)
  | Expr (Py.Expr Py.SrcSpan)
  deriving (Eq, Ord, Show)
We wrap language-python AST nodes up in constructors of a single datatype so as to avoid having to define our own AST for Python, plus a complicated and error-prone function for copying things in from the language-python AST. This type isn't recursive at all; we just wrap the current layer, and then wrap the next layer on the fly in recursive functions like subterms and eval.
Somewhat to my surprise, the language-python AST (and, I suppose, Python's own syntax) is regular enough to avoid making this very much more complex. There are lots of other types representing identifiers, arguments, parameters, and all the other, weirder bits of syntax, but most of them are auxiliary, not things we would evaluate.
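To make the wrapping idea concrete, here is a minimal sketch using toy stand-in types rather than the real language-python AST. PyExpr, PyStmt and children are hypothetical names invented for illustration; only the shape of Term mirrors the diff.

```haskell
-- Toy stand-ins for a third-party AST. These are NOT language-python's types.
data PyExpr
  = PyVar String
  | PyCall PyExpr [PyExpr]
  deriving (Eq, Ord, Show)

data PyStmt
  = PyPass
  | PyStmtExpr PyExpr
  | PyIf PyExpr [PyStmt] [PyStmt]
  deriving (Eq, Ord, Show)

-- One wrapper type with one constructor per syntactic category.
-- It is not recursive: each constructor just holds a library node.
data Term
  = Statement PyStmt
  | Expr PyExpr
  deriving (Eq, Ord, Show)

-- The next layer is wrapped on demand, inside the traversal itself,
-- so no up-front conversion into a bespoke AST is needed.
children :: Term -> [Term]
children t = case t of
  Statement PyPass         -> []
  Statement (PyStmtExpr e) -> [Expr e]
  Statement (PyIf c th el) -> Expr c : map Statement (th ++ el)
  Expr (PyVar _)           -> []
  Expr (PyCall f as)       -> Expr f : map Expr as

main :: IO ()
main = mapM_ print (children (Statement (PyIf (PyVar "x") [PyPass] [PyStmtExpr (PyCall (PyVar "f") [])])))
```

The trade-off is the one the comment describes: the traversal has to know about each library constructor it cares about, but there is no second AST to keep in sync.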
  deriving (Eq, Foldable, Functor, Ord, Show, Traversable)

infixl 1 :>>

-- | Non-generic production of the recursive set of subterms.
😭 we can't get this generically for all languages any more using this approach.
subterms :: Term -> Set.Set Term
subterms t = Set.insert t $ case t of
  Module (Py.Module ss) -> suite ss
  Statement (Py.Conditional cts e _) -> foldMap (\ (c, t) -> subterms (Expr c) <> suite t) cts <> suite e
  Statement (Py.Raise (Py.RaiseV3 e) _) -> maybe Set.empty (subterms . Expr . fst) e
  -- FIXME: Py.RaiseV2
  -- FIXME: whatever the second field is
  Statement (Py.StmtExpr e _) -> subterms (Expr e)
  Statement (Py.Fun _ _ _ ss _) -> suite ss
  -- FIXME: include 'subterms' of any default values
  Expr (Py.Call f as _) -> subterms (Expr f) <> foldMap (\case { Py.ArgExpr e _ -> subterms (Expr e) ; _ -> Set.empty }) as
  -- FIXME: support keyword args &c.
  _ -> Set.empty -- TBD, and terminals
  where
  suite = foldMap (subterms . Statement)
Note that various terminal nodes (literals, etc.) are all (correctly) handled by the fallback case, but so are nonterminals we just haven't got around to yet.
I couldn't think of any generic way of implementing this which wouldn't end up being more trouble than hand-writing (and, regrettably, -maintaining) this. Oh, except I just did: an abstract interpretation of a program ends up being a kind of complicated fold. So if we were to define a domain modelling a set of terms, we could use one abstract interpretation to compute the set(s) of terms to use for dead code analysis in a second pass 😝
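A rough sketch of that last idea, assuming a much simpler, non-monadic domain interface than the one in this PR. The names dbool, dvar, dif and dapp echo the diff, but this Domain class, the located method and the toy Expr type are all hypothetical.

```haskell
import qualified Data.Set as Set

-- Toy expression language; a stand-in, not the language-python AST.
data Expr
  = Lit Bool
  | Var String
  | If Expr Expr Expr
  | Call Expr [Expr]
  deriving (Eq, Ord, Show)

-- A simplified abstract domain. 'located' is a hypothetical hook that
-- tells the domain which term produced a value.
class Domain d where
  dbool   :: Bool -> d
  dvar    :: String -> d
  dif     :: d -> d -> d -> d
  dapp    :: d -> [d] -> d
  located :: Expr -> d -> d

-- The interpreter is written once, against the interface only.
eval :: Domain d => Expr -> d
eval t = located t $ case t of
  Lit b     -> dbool b
  Var n     -> dvar n
  If c a b  -> dif (eval c) (eval a) (eval b)
  Call f as -> dapp (eval f) (map eval as)

-- A domain whose "values" are the sets of terms visited so far.
newtype Terms = Terms { unTerms :: Set.Set Expr }

instance Domain Terms where
  dbool _ = Terms Set.empty
  dvar _  = Terms Set.empty
  dif (Terms c) (Terms a) (Terms b) = Terms (c <> a <> b)
  dapp (Terms f) as = Terms (f <> foldMap unTerms as)
  located t (Terms s) = Terms (Set.insert t s)

example :: Expr
example = If (Var "x") (Call (Var "f") [Lit True]) (Lit False)

main :: IO ()
main = mapM_ print (Set.toList (unTerms (eval example)))
```

Here the fold over the syntax lives in eval, and the Terms instance only says how to combine sets, so the subterm set falls out of the interpreter rather than a separate hand-written traversal. Whether this scales to the real, effectful domain is an open question.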
  Module (Py.Module ss) -> suite ss
  Statement (Py.Import is sp) -> setSpan sp $ do
    for_ is $ \ Py.ImportItem{ Py.import_item_name = ns } -> case nonEmpty ns of
      Nothing -> pure ()
      Just ss -> S.simport (pack . Py.ident_string <$> ss)
    dunit
  Statement (Py.Pass sp) -> setSpan sp dunit
  Statement (Py.Conditional cts e sp) -> setSpan sp $ foldr (\ (c, t) e -> do
    c' <- eval (Expr c)
    dif c' (suite t) e) (suite e) cts
  Statement (Py.Raise (Py.RaiseV3 e) sp) -> setSpan sp $ case e of
    Just (e, _) -> eval (Expr e) >>= ddie -- FIXME: from clause
    Nothing -> dunit >>= ddie
  -- FIXME: RaiseV2
  -- FIXME: whatever the second field is
  Statement (Py.StmtExpr e sp) -> setSpan sp (eval (Expr e))
  Statement (Py.Fun n ps _r ss sp) -> let ps' = mapMaybe (\case { Py.Param n _ _ _ -> Just (ident n) ; _ -> Nothing }) ps in setSpan sp $ letrec (ident n) (dabs ps' (foldr (\ (p, a) m -> let' p a m) (suite ss) . zip ps'))
  Expr (Py.Var n sp) -> setSpan sp $ let n' = ident n in lookupEnv n' >>= maybe (dvar n') fetch
  Expr (Py.Bool b sp) -> setSpan sp $ dbool b
  Expr (Py.Strings ss sp) -> setSpan sp $ dstring (pack (mconcat ss))
  Expr (Py.Call f as sp) -> setSpan sp $ do
    f' <- eval (Expr f)
    as' <- traverse eval (mapMaybe (\case { Py.ArgExpr e _ -> Just (Expr e) ; _ -> Nothing }) as)
    -- FIXME: support keyword args &c.
    dapp f' as'
  _ -> fail "TBD"
We handle one fewer case: Let is gone. However, Let was never produced from actual Python code using the TSG generator, so it's unclear why it was in there to begin with.
We also handle statement expressions, which I think may not have been handled before. And the failure case is explicit here, instead of being hidden in the term ingestion machinery as it had been.
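For a feel of the explicit fallback and the way an if/elif/else chain folds into nested conditionals, here is a toy sketch. It evaluates concretely in Either String instead of through the abstract domain (dif, ddie and friends), and every type and name in it is a hypothetical stand-in.

```haskell
-- Toy syntax; Lambda and With stand in for "not handled yet".
data Expr = BoolLit Bool | StrLit String | Lambda

data Stmt
  = Pass
  | StmtExpr Expr
  | Conditional [(Expr, [Stmt])] [Stmt]  -- (condition, body) clauses, then else
  | With

data Value = VUnit | VBool Bool | VString String
  deriving (Eq, Show)

evalExpr :: Expr -> Either String Value
evalExpr e = case e of
  BoolLit b -> Right (VBool b)
  StrLit s  -> Right (VString s)
  _         -> Left "TBD"  -- explicit failure for unsupported expressions

-- Run statements in order, yielding the last value (unit if empty).
suite :: [Stmt] -> Either String Value
suite = foldl (\acc s -> acc *> evalStmt s) (Right VUnit)

evalStmt :: Stmt -> Either String Value
evalStmt s = case s of
  Pass       -> Right VUnit
  StmtExpr e -> evalExpr e  -- statement expressions are handled directly
  Conditional clauses els ->
    -- foldr nests the clauses: the else suite is the innermost fallback.
    let clause (c, body) rest = do
          c' <- evalExpr c
          case c' of
            VBool True  -> suite body
            VBool False -> rest
            _           -> Left "non-boolean condition"
    in foldr clause (suite els) clauses
  _ -> Left "TBD"  -- explicit failure for unsupported statements

main :: IO ()
main = do
  print (evalStmt (Conditional [(BoolLit False, [StmtExpr (StrLit "a")]), (BoolLit True, [StmtExpr (StrLit "b")])] [Pass]))
  print (evalStmt With)
```

The point is only that an unsupported node surfaces as a visible failure at the evaluator, rather than disappearing somewhere upstream.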
setSpan s = case fromSpan s of
  Just s -> local (\ r -> r{ refSpan = s })
  _ -> id
Locate cases are no longer so bespoke, and are instead sprinkled around throughout the evaluation. I mean, they probably were before anyway, but this is a little more explicit about it.
We could probably call this once, outside the \case, using language-python's typeclass for annotated nodes, but I couldn't be bothered.
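A sketch of what calling it once might look like, under heavy assumptions: the Annotated class below is a toy defined here (its class and method names are guesses at the shape of the library's class, not its confirmed API), the reader is a plain function instead of the effect used in the PR, and local' is a hypothetical stand-in for local.

```haskell
data Span = Span Int Int deriving (Eq, Show)

-- Toy source span that may carry no position, so fromSpan can fail.
data SrcSpan = SrcSpan Int Int | SpanEmpty deriving (Eq, Show)

data Expr a = Var String a | BoolLit Bool a

-- Toy class for "nodes that carry an annotation" (assumed shape).
class Annotated t where
  annot :: t a -> a

instance Annotated Expr where
  annot e = case e of
    Var _ a     -> a
    BoolLit _ a -> a

newtype Ref = Ref { refSpan :: Span }

-- Minimal reader: a computation is a function of the current Ref.
type Eval a = Ref -> a

local' :: (Ref -> Ref) -> Eval a -> Eval a
local' f m = m . f

fromSpan :: SrcSpan -> Maybe Span
fromSpan (SrcSpan a b) = Just (Span a b)
fromSpan SpanEmpty     = Nothing

setSpan :: SrcSpan -> Eval a -> Eval a
setSpan s = case fromSpan s of
  Just s' -> local' (\r -> r { refSpan = s' })
  Nothing -> id

-- setSpan is applied once, outside the case, via annot.
eval :: Expr SrcSpan -> Eval (String, Span)
eval e = setSpan (annot e) $ case e of
  Var n _     -> \r -> (n, refSpan r)
  BoolLit b _ -> \r -> (show b, refSpan r)

main :: IO ()
main = do
  print (eval (Var "x" (SrcSpan 3 7)) (Ref (Span 0 0)))
  print (eval (BoolLit True SpanEmpty) (Ref (Span 0 0)))
```

With the wrapper Term type in this PR, each constructor (Module, Statement, Expr) would still need its own annot call, so the saving is per-alternative noise rather than a fully generic hook.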
-    , template-haskell >= 2.15 && < 2.19
+    , template-haskell >= 2.15 && < 2.22
Don't mind me, just restoring compilation under newer compilers which CI doesn't exercise yet.
No description provided.