Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: new Columnar upload form and API #28192

Conversation

dpgaspar
Copy link
Member

@dpgaspar dpgaspar commented Apr 24, 2024

SUMMARY

Continuing the work on deprecating/removing server side rendered pages, this adds a new React/Antd Form and backend API for Columnar uploads.

Leverages the work already done on: #28164, #28105, #27840

Columnar upload will accept ZIP files and parquet files.

A refactor was done on CSV and Excel upload, with this PR the initial parsing of the data file to get metadata info such as columns and sheet names was being done on the frontend, now we send the file to the backend to fetch it's metadata info. This is more maintainable and scalable since we avoid code duplication to parse data files, avoid having to add a new frontend dependencies for each file type, avoid blocking the UI while the file is being parsed.
Although we send the entire file to the backend, by using pandas or pyarrow we can avoid having to parse the entire file, so I expect this operation to be relatively lightweight.

3 new endpoints are added to fetch data files metadata:
api/v1/database/csv_upload_metadata
api/v1/database/excel_upload_metadata
api/v1/database/columnar_upload_metadata

Permission names for these endpoints are the same as their counterparts, so csv_upload on Database, excel_upload on Database and columnar_upload on Database

Possibly a single endpoint to upload data and a single endpoint to fetch metadata would be sufficient, but that would imply a single permission for all file types, and that's a breaking change.
I think that a single permission and only 2 REST API endpoints is better, so having granular permissions for each file type is not actually required. I'll add a v5 breaking item if everyone agrees.

Screen.Recording.2024-04-29.at.11.41.47.mov

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions bot added the api Related to the REST API label Apr 24, 2024
superset/databases/api.py Outdated Show resolved Hide resolved
@github-actions github-actions bot added risk:db-migration PRs that require a DB migration dependencies:npm labels Apr 26, 2024
@dpgaspar dpgaspar marked this pull request as ready for review April 29, 2024 12:32
@dpgaspar dpgaspar requested a review from a team as a code owner April 29, 2024 12:32
@github-actions github-actions bot removed i18n Namespace | Anything related to localization i18n:spanish Translation related to Spanish language i18n:italian Translation related to Italian language i18n:french Translation related to French language i18n:chinese Translation related to Chinese language i18n:japanese Translation related to Japanese language i18n:russian Translation related to Russian language i18n:korean Translation related to Korean language doc Namespace | Anything related to documentation plugins github_actions Pull requests that update GitHub Actions code i18n:dutch i18n:slovak i18n:ukrainian i18n:portuguese i18n:brazilian i18n:traditional-chinese labels May 6, 2024
@dpgaspar dpgaspar merged commit 9a339f0 into apache:master May 6, 2024
29 checks passed
@dpgaspar dpgaspar deleted the danielgaspar/sc-59713/migrate-columnar-data-upload-to-database branch May 6, 2024 14:51
imancrsrk pushed a commit to imancrsrk/superset that referenced this pull request May 10, 2024
jzhao62 pushed a commit to jzhao62/superset that referenced this pull request May 16, 2024
EnxDev pushed a commit to EnxDev/superset that referenced this pull request May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Related to the REST API dependencies:npm risk:db-migration PRs that require a DB migration size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants