-
-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor to separate concerns and facilitate new behavior #110
Conversation
jaraco
commented
May 24, 2024
- Extract freeze_data and load_pip_packages to separate concerns.
- Extract load_pip_package function.
- Use degenerate form for fallback.
- Extract package_item behavior.
- Extract is_valid filter on pip package entries with no URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jaraco! Looks good to me.
micropip/_commands/freeze.py
Outdated
return map( | ||
package_item, | ||
filter(None, map(load_pip_package, importlib.metadata.distributions())), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious how you feel about the readability of map
and filter
in Python where we don't have any way to do right function application. What I'd really like to see is
importlib.metadata.distributions().map(load_pip_package).filter(None).map(package_item)
so that the logic flows from left to right and top to bottom instead of in a spiral. I tend to use comprehensions instead but they are arguably just as bad if not worse. Combining filter and map does make sense but list comprehensions do it in a strange order. I do like Rust's filter_map
quite a lot.
Anyways, code is clear I'm happy to do it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'm a big fan of the general concept of filter and map (and reduce). Part of my affinity comes from my first formal language training being in Scheme. But my reason for sticking with it is how it incentivizes stateless functional programming. Comprehensions and generator expressions achieve the same thing, but often by adding unnecessary variables. If I'm working with a comprehension and it has the form [do(x) for x in input if check(x)]
, I will almost certainly translate that to map(do, filter(check, input))
and avoid creating the name x
. I will sometimes separate the two lines into filtered = filter(check, input);
map(do, filtered)
if the name filtered
improves readability (which it could in the case above).
I do avoid map
/filter
when their use feels contrived or convoluted for the application.
What I'd really like to see is
importlib.metadata.distributions().map(load_pip_package).filter(None).map(package_item)
I agree - that reads better. And other languages (Node, Ruby, ...) design that paradigm into their language. Python, on the other hand, has historically discouraged these chains of operations. So my pragmatic approach is to accept that the language constructs are somewhat inelegant, but still preferable to the alternative of eager-evaluated for/append
loops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah a lot of my early programming experience was also with lisps. I particularly enjoy programming with only immutable data structures. But it's not always the best for performance.
I'm really excited about the iterator helpers spec in JavaScript which will make doing all of this lazily very ergonomic too:
https://github.com/tc39/proposal-iterator-helpers
def package_item(entry: dict[str, Any]) -> tuple[str, dict[str, Any]]: | ||
return canonicalize_name(entry["name"]), entry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting, often the need to write a function like this instead of a lambda dissuades me from using map. The combination of spiral control flow with in line lambdas tends to be a bit unmanageable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I wouldn't object to inlining this behavior in a generator expression in load_pip_packages
.
@@ -42,23 +42,23 @@ def freeze_data() -> dict[str, Any]: | |||
def load_pip_packages() -> Iterator[tuple[str, dict[str, Any]]]: | |||
return map( | |||
package_item, | |||
filter(None, map(load_pip_package, importlib.metadata.distributions())), | |||
filter(is_valid, map(load_pip_package, importlib.metadata.distributions())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is filter(None
just a no-op? It's there to make the diff look better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter(None, ...)
is the same as filter(bool, ...)
. It was there to suppress the distributions that returned None
. I included that to maintain the prior expectation that if the url
is None
, no package entry is generated (the continue
in the prior implementation).
I've been using that form since Python 1.5, before bool
was introduced in Python 2.3. It occurs to me I should start using filter(bool, ...)
for better readability.
micropip/_commands/freeze.py
Outdated
|
||
|
||
def is_valid(entry: dict[str, Any]) -> bool: | ||
return entry["url"] is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return entry["url"] is not None | |
return entry.get("url", None) is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaraco if we do this change we can merge it. I'll rebase and apply this and then do a rebase merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would still be interested for answers to my questions about your view on the ergonomics and readability of filter
, map
and other functional constructs in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, whoops. This was a mistake. The proper key should be "file_name"
. Specifically,
return entry["file_name"] is not None
Since load_pip_package
sets file_name
unconditionally, calling .get()
is unnecessary; the code can rely on that key being present, but it may be set to None
.
With the patch as suggested, entry.get("url", None)
will always be None
and no entry will ever be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and it appears you recognized that in the rewritten commit. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thankfully we have test coverage for this =)
Thanks @jaraco! |