[SIP-35] Proposal for Improving Superset’s Python Code Organization #9077
Comments
Looks good overall; however, I would caution against grouping things like models and/or DAOs under their domain as you have done above, in favor of simply grouping them by function -- i.e. a package for
@craig-rueda I've centralized the models in the recommendation, but have left the DAOs distributed - I don't think the DAOs will be too heavily reused outside of their domain. If the future proves me wrong, it can be refactored. |
I agree with many issues/concerns raised in this SIP, though I’m not overly familiar with commands and DAOs. Are there examples of other apps (preferably Flask) which use these constructs? Note at Airbnb we have a number of Flask apps which follow best practices, i.e., are modular, have empty
I’m all for refactoring the Python codebase (leaning more on blueprints), though I’m unsure whether we need to restructure the logic at this time to leverage commands and DAOs. A full restructure (as opposed to a refactor) seems like a considerable amount of work and may be somewhat overkill to address the concerns raised in this SIP.
Would Inversion of control or dependency injection help with this? If not, what will help with it?
This does not seem like much of a problem to me. Except that PyCharm (and other tools) run a module-reorder as part of their automatic code cleanup and this might prohibit something from loading.
I do not understand the problem: Are you saying there is code duplication? If the purpose of a module is to do something for the app, then it stands to reason it would have app-specific code, wouldn't it? I'm confused.
It can be challenging, but IMHO the code is well-written. I had to trace through 7 levels of OO inheritance here: #8695 (comment) I would be curious to see how you would improve things and whether the problem is rooted in Flask itself and FAB and Superset have no choice but to bow to their Creator.
this phrase does not express a complete thought. I think there is some sort of grammatical error here. I have no idea what you are suggesting.
One of these is a noun/product just like the things you prefer. The other is a verb.... What is unacceptable about SQL Lab?
Why is this preferred over a find_for_update in the class that |
The empty |
First, thanks for your comments! I appreciate the community taking the issue seriously.
@DiggidyDave I agree that empty
@john-bodley Superset currently has examples of both the Command pattern and a pattern similar to DAOs. Alembic's individual migration classes are an example of a command and FAB's shared filters are very similar to a DAO, in that they encapsulate shared query logic outside of the model. A DAO allows us to extract query logic from model classes to a clear destination.
As we refactor, we need a target to move towards. Blueprints separate functionality well at the API layer, but will not provide a clean abstraction at the model or business logic layer without a few other patterns to lean on.
I agree that fully refactoring the code is a large unit of work and expect this to be a target we move towards gradually where it makes sense. Currently the model layer is very thick and we have a lot of business logic in endpoint code. To me this is a difficult problem to solve without a shared concept of where this code should go moving forward and a pattern to leverage.
Disorganization is a result of many people independently contributing to a project, focusing on shipping functional code quickly. This has worked well for Superset, but has resulted in a fair amount of tech debt. This SIP is about putting some structure in place that can help with future efforts. Blaming committers for these issues is not helpful or appropriate.
The idea is that backend code should be focused on the entities being acted on, rather than the specific part of Superset that is interacting with that entity. For example, SQL Lab and Explore ("products") interact with Datasources, Queries, Users ("objects"). This is proposing that the backend should be focused on providing APIs that are object-centric (
I would argue that you should not have to delve through 7 levels of inheritance to understand what a piece of code is doing. Simpler is better.
You may want to take a look at principles of Clean Architecture for some information on this. |
@suddjian - It certainly is not helpful or appropriate. And if you could explain why you included that comment, I would appreciate it. |
@metaperl But wording that idea as "who let these problems get into the codebase in the first place" reads like pointing fingers at the folks who wrote/reviewed the existing code, which is why I responded to it. |
Chiming in with my two cents as a passerby. I really like that different modules have their independent folders and that we are introducing conventions and increasing consistency. I think an intuitive code structure and consistent paradigms help future contributors a lot in understanding the code. Sharing the approach I took in one of my older projects (a general-purpose CMS): it's slightly different from what is proposed, but very similar in concept. My Python files are organized like this:
I find the separation of core/shared functionalities and the addition of a "modules" folder especially useful because it makes the whole project easier to navigate. Loading modules and booting the app is as simple as this:

    def load_module(app, name):
        """Dynamically load modules in the `modules/` folder."""
        package = 'david.modules.%s' % name
        # A non-empty fromlist makes __import__ return the submodule itself
        # rather than the top-level package.
        mod = __import__(package, fromlist=['admin', 'bp', 'setup', 'view'])
        # register module blueprint and setup
        register_views(app, mod)
        if hasattr(mod, 'view'):
            register_views(app, mod.view)
        return mod


    def register_views(app, *view_modules):
        """Register views, including API endpoints"""
        for v in view_modules:
            # Each module may expose a blueprint and/or a setup hook.
            if hasattr(v, 'bp'):
                app.register_blueprint(v.bp)
            if hasattr(v, 'setup'):
                v.setup(app)

This is similar to what the Hope this helps.
I really think we need to untangle the All these modules, components, etc. need to be app agnostic and should be implemented as blueprints. If there are config or similar aspects these modules need to leverage, one can use the
Adding my two cents here: Regarding blueprints: I'm inclined to delegate the actual registering to each module manager. We can still say that they occur within This way:
For example:
So
Just a thought, could db_engine_specs follow a similar pattern? The blueprints are handled by FAB's |
@john-bodley I didn't address your comment directly regarding blueprints. Let me do so now: I concur 100% with the idea of untangling the
Were we to move entirely to blueprints being defined in Superset, that would have substantial implications for the extent that we can leverage FAB in endpoints without either reworking FAB internals or adding the ability to nest blueprints to Flask. As we've been building out How would you recommend reconciling this problem? I think that @dpgaspar's suggestion for adding a |
@willbarrett maybe there's a disconnect with my understanding of FAB. Does FAB prevent us from defining all the API endpoints as blueprints and housing these under a root |
@john-bodley My understanding here is a little fuzzy too, so I'll let @dpgaspar correct me if I'm wrong. FAB creates blueprints under the hood whenever we use it to define endpoints. @dpgaspar how far away from the truth am I :)? |
@willbarrett that's a good explanation. @john-bodley Yet, I'm curious about the need to create a big blueprint that holds all API endpoints? Note that we aggregate all API classes under a base API class, where we can impose specific behavior/config. |
I welcome this effort, and the amount of circular imports that typing has exposed really shows that something needs to be done. To get started, I would almost recommend grepping for packages with |
[SIP] Proposal for Improving Superset’s Python Code Organization
Motivation
As I was in the weeds fixing a bunch of Pylint failures, Max and I started going back and forth on this PR: #8777, which we ultimately closed. The root cause of that was a lack of shared understanding on the best code structure for Superset going forward. Talking with others at Preset, I realized that the issue was larger than just a new contributor not understanding project practices. Without a shared understanding, we are lacking a cohesive approach towards refactoring Superset - we need a technical North Star for project structure.
Preset’s developers met and identified a number of pain points in the existing code base. Here they are, with a bit of color to make the meaning clear:
Proposed Change
In order to address these concerns, we’d like to propose a few guiding principles for Python code structure going forward:
__init__.py files empty
New patterns to introduce:
Command:
There are multiple patterns named Command, and the one we reference here is most similar to an Alembic migration. Commands perform a single cohesive business operation, have a very small public API, and either entirely succeed or entirely fail. In practice, this requires that database transaction behavior be controlled at the Command level. When possible, commands should be modeled such that they perform a single database transaction. In the case of failure, they raise an exception or return an error code, and it is the responsibility of the caller to translate that into a correct response for a user.
Example command:
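A minimal sketch of a command in the sense described above, not the SIP's own example. Every name here (CreateDashboardCommand, CommandError, handle_request, the two tables) is hypothetical, and the standard-library sqlite3 module stands in for Superset's database session so that the sketch runs on its own. It shows the three properties named in the text: a single public method, one transaction controlled by the command, and failure reported as an exception that the caller translates into a response.

```python
import sqlite3


class CommandError(Exception):
    """Raised when a command cannot complete; the caller maps it to a response."""


class CreateDashboardCommand:
    """Hypothetical command: one business operation, one database transaction."""

    def __init__(self, conn, title, owner):
        self._conn = conn
        self._title = title
        self._owner = owner

    def run(self):
        """The only public method: validate, then write everything or nothing."""
        if not self._title.strip():
            raise CommandError("title must not be empty")
        try:
            # `with conn` commits on success and rolls back on any exception,
            # so both inserts land together or not at all.
            with self._conn:
                cur = self._conn.execute(
                    "INSERT INTO dashboards (title) VALUES (?)", (self._title,)
                )
                dashboard_id = cur.lastrowid
                self._conn.execute(
                    "INSERT INTO dashboard_owners (dashboard_id, owner) VALUES (?, ?)",
                    (dashboard_id, self._owner),
                )
        except sqlite3.Error as exc:
            raise CommandError(f"could not create dashboard: {exc}") from exc
        return dashboard_id


def make_connection():
    """Build an in-memory database with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dashboards (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE dashboard_owners (dashboard_id INTEGER NOT NULL, owner TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def handle_request(conn, title, owner):
    """Caller (for example an API endpoint) turns the outcome into a response."""
    try:
        return 201, {"id": CreateDashboardCommand(conn, title, owner).run()}
    except CommandError as exc:
        return 422, {"message": str(exc)}
```

Because the command owns the transaction, the same class could be invoked from an endpoint, a CLI entry point, or a Celery task; only the thin translation layer around it would differ.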
DAO (Data Access Object):
A DAO in the context of Superset would be an object that would manage the lifecycle of SQLAlchemy models. Custom queries would be implemented in DAOs and then called from Command objects. In comparison to Command objects, DAOs have relatively broad public interfaces. DAOs should not depend on other DAOs to avoid circular dependencies. If results are needed cross-DAO, that should be orchestrated in the Command. Here’s a sample simplified DAO for illustrative purposes:
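A minimal sketch of what such a DAO could look like, not the SIP's own sample. The names ChartDAO and Chart and the charts table are hypothetical. To stay runnable with the standard library only, a dataclass stands in for a SQLAlchemy model and a sqlite3 connection stands in for the session. The DAO exposes a fairly broad set of query methods, does not reference any other DAO, and never commits: transaction boundaries are left to the calling command.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chart:
    """Stand-in for a SQLAlchemy model (hypothetical fields)."""
    id: int
    name: str
    datasource_id: int


class ChartDAO:
    """Hypothetical DAO: owns the queries for one model and nothing else."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def create(self, name: str, datasource_id: int) -> Chart:
        cur = self._conn.execute(
            "INSERT INTO charts (name, datasource_id) VALUES (?, ?)",
            (name, datasource_id),
        )
        return Chart(cur.lastrowid, name, datasource_id)

    def find_by_id(self, chart_id: int) -> Optional[Chart]:
        row = self._conn.execute(
            "SELECT id, name, datasource_id FROM charts WHERE id = ?", (chart_id,)
        ).fetchone()
        return Chart(*row) if row else None

    def find_by_datasource(self, datasource_id: int) -> List[Chart]:
        # A custom query lives here instead of on the model class.
        rows = self._conn.execute(
            "SELECT id, name, datasource_id FROM charts "
            "WHERE datasource_id = ? ORDER BY id",
            (datasource_id,),
        ).fetchall()
        return [Chart(*r) for r in rows]

    def delete(self, chart_id: int) -> bool:
        cur = self._conn.execute("DELETE FROM charts WHERE id = ?", (chart_id,))
        return cur.rowcount > 0


def make_connection() -> sqlite3.Connection:
    """Build an in-memory database with the hypothetical charts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE charts ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, datasource_id INTEGER NOT NULL)"
    )
    conn.commit()
    return conn
```

If a command needed charts together with their datasources, it would call this DAO and a separate datasource DAO itself, rather than having one DAO import the other.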
Proposed example package structure that follows the above principles:
In this design, all systems related to a specific back-end resource have been grouped under a top-level folder.
__init__.py files should be left empty to enable only pulling in the portions of the system necessary for a specific entrypoint (Celery shouldn’t need api.py or views.py, for instance)
New or Changed Public Interfaces
Over time, the internals of Superset will evolve towards the new structure. Public HTTP interfaces are unlikely to change as a result of the above proposal, but code will move and change to conform. This will impact organizations that apply their own customizations to Superset.
New dependencies
None
Migration Plan and Compatibility
Introduce refactors to existing code at a manageable pace to allow organizations relying on Superset internals time to adapt.
Rejected Alternatives
Preset discussed Service Objects as an alternative to both Commands and DAOs. We felt that Commands provided easier entrypoints for the ports of our application (API endpoints, views, command line invocations, Celery tasks) than Service Objects, and that introducing DAOs as well helped further break down concerns.
We also considered structuring top-level folders by function (api, models, etc.) but found this resulted in drastically more Python modules overall without substantially simplifying the question of where code should live.
Individuals consulted in creating this SIP
@mistercrunch @craig-rueda @dpgaspar @robdiciuccio @suddjian @nytai