Slow response times interacting with data lineage chart on large projects #170
Comments
Profiling showed that garbage collection events account for most of the interaction time. I believe – please feel free to correct me if you think I'm wrong – that the JavaScript is loading every node from disk and disposing of each buffer separately. With over 300 nodes in the graph, such behavior would cause a slowdown without sufficient threads to process everything asynchronously. I'm not proficient with JavaScript, but my coworkers and I identified what we think is a relevant Stack Overflow post, where user Abe shared:
The JS in index.html is very large and, to my eyes, obfuscated. I believe the fix will require knowledge from the dbt core team.
big + 1 on this one! The website loads the I think we should:
@jtcohen6 you buy it?
@drewbanin 100%
I'm not sure where this issue currently stands but @vogt4nick, @drewbanin and @jtcohen6 - we (kraftheinz engineering) are just about wrapped up with converting dbt docs over to React. One of the first things we did was compress the manifest.json file during the build, which solved all of the performance issues. For context, we currently have over 115 projects and the site loads in less than 5 seconds after that change. We will have a repo to share with everyone once we are done packaging it up for public consumption.
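For readers who want to try the idea, here is a minimal sketch of a build-time compression step. The comment above does not say which compression method was used, so gzip is an assumption, and `gzip_file` and `make_fake_manifest` are hypothetical names made up for illustration. The fake manifest only mimics the repetitive shape of a real one.

```python
import gzip
import json
import os
import shutil
import tempfile
from typing import Tuple


def gzip_file(src_path: str, dst_path: str) -> Tuple[int, int]:
    """Write a gzip copy of src_path to dst_path.

    Returns (original_size, compressed_size) in bytes.
    """
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return os.path.getsize(src_path), os.path.getsize(dst_path)


def make_fake_manifest(n_nodes: int) -> dict:
    """Hypothetical stand-in for a real manifest.json (illustration only)."""
    return {
        "nodes": {
            "model.demo.m{}".format(i): {
                "resource_type": "model",
                "description": "placeholder description",
            }
            for i in range(n_nodes)
        }
    }


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "manifest.json")
    with open(src, "w") as f:
        json.dump(make_fake_manifest(300), f)
    original, compressed = gzip_file(src, src + ".gz")
    print("original bytes:", original, "compressed bytes:", compressed)
```

A compressed file only helps if the browser can decompress it, for example when the web server sends it with a `Content-Encoding: gzip` header. Whether that matches the setup described above is not stated in this thread.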
@kevnoo Very cool!! I'd love to see it when it's ready to share.
Super cool and very exciting to hear, @kevnoo! Anything we can help out with? Also - wondering if you're thinking about upstreaming these changes. Did you rebuild the site from scratch, or does it look like a branch off of the existing docs site? Would love to hear more if you're able to share!
We are very close to publishing the code but would be open to chatting directly before (or after) that. The site itself looks exactly the same (for the most part). We are using the manifest.json but do not use the catalog.json or run_results.json, because we have also integrated the site directly with Snowflake via a backend API. We also had some issues with the catalog.json due to the way we have implemented dbt across all of our projects. The biggest change was a massive revamp of the lineage graph, which is now rendered entirely with HTML Canvas. As for upstreaming the changes, some work would be required to make it act just like the existing version. We have not gone through the process of making it part of the dbt docs generate process; this is something we are definitely open to but would need some guidance on. And obviously there would need to be a version that does use run_results and catalog. These are all things we agree with and would be open to helping out on, but as of now we can't prioritize them due to a lack of time.
@kevnoo any update on if/when you'll be publishing the React version of dbt docs?
Hey @ajbosco - sorry for the delay! It's ready. @AlexanderKutz is going to work on publishing it over the next couple days.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
+1 from a dbt Cloud user: the slow response time of the data lineage chart for their larger project made this feature relatively unusable. Any time they tried to move or modify the chart, it lagged and took a few seconds to load correctly again.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
I am also facing the same issue as mentioned by @StepienTomasz while loading the lineage graphs for the "sources". The page becomes unresponsive and only loads after 3-4 minutes.
Agreed, it is very slow and causes a really poor user experience. Would love to see this move forward! We're evangelizing dbt at the org and the docs are a large part of that.
FYI on what a full rewrite can look and feel like for dbt docs: https://dagster.io/blog/dbt-docs-on-react
@sungchun12 absolutely loved the blog post!! Any info about lineage view and going to prod?
@nobgb no updates as I believe official dbt docs development will go through dbt Cloud only. I'll let the dbt maintainers speak to it more!
I couldn't find any info about it in the dbt docs. Would you be so kind as to share it with me? I'd like to follow that.
https://dagster.io/blog/dbt-docs-on-react This is awesome except for one thing: the data lineage view is not implemented.
Describe the bug
Users open the data lineage graph by clicking the button at the bottom-right corner of the page. It takes 2-3 seconds for the graph to load. The slowness persists beyond the initial load too: most interactions take 2-3 seconds to complete when several nodes are selected in the lineage graph. Response times are also slow when using selectors; the page becomes briefly unresponsive and keystrokes don't immediately register in the text field.
Steps To Reproduce
Serve the data catalog for a large dbt project with relatively large manifest.json and catalog.json files; in my example, 300+ models and 1800+ tests generate a 6.2 MB manifest.json and a 1.2 MB catalog.json.
Click the "data lineage chart" button on the bottom-right corner of the page.
See profiling output below for benchmark.
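To check whether another project is comparable in size to the one in these steps, a small script along the following lines could report the file size and the node counts. This is only a sketch: it assumes manifest.json has a top-level "nodes" mapping whose entries carry a "resource_type" field, which should be verified against the manifest produced by your dbt version. The function names are hypothetical.

```python
import json
import os
from collections import Counter
from typing import Dict


def count_resource_types(manifest: dict) -> Dict[str, int]:
    """Count entries under manifest["nodes"] by their "resource_type" field.

    Assumption: nodes is a mapping of unique_id -> node dict. Entries
    without a "resource_type" key are counted as "unknown".
    """
    nodes = manifest.get("nodes", {})
    return dict(
        Counter(node.get("resource_type", "unknown") for node in nodes.values())
    )


def file_size_mb(path: str) -> float:
    """Return the size of a file in megabytes (1 MB = 1,000,000 bytes)."""
    return os.path.getsize(path) / 1_000_000


def summarize(path: str) -> str:
    """Build a one-line summary for a manifest.json file on disk."""
    with open(path) as f:
        manifest = json.load(f)
    counts = count_resource_types(manifest)
    return "{:.1f} MB, node counts: {}".format(file_size_mb(path), counts)
```

For example, `print(summarize("target/manifest.json"))` would print the summary for a project whose artifacts are in `target/`; that path is an assumption about where the file lives.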
Expected behavior
The data lineage chart should load in under 500 ms (or some other arbitrary threshold determined by users' tolerance).
Screenshots and log output
The "View Lineage Chart" button:
My profiling output:
System information
Which database are you using dbt with?
The output of dbt --version:
The operating system you're using:
The data catalog is served with the base Docker image library/nginx:1.19.0-alpine.
The documentation is generated with the base Docker image library/python:3.7.7-slim-buster.
The output of python --version: 3.7.7