Commit 32395da

Merge branch 'master' into feature/CLDN-1312

cccs-Dustin committed Jun 22, 2022
2 parents 210ab09 + a833674 commit 32395da
Showing 304 changed files with 5,270 additions and 2,514 deletions.
25 changes: 25 additions & 0 deletions .github/workflows/welcome-new-users.yml
@@ -0,0 +1,25 @@
name: Welcome New Contributor

on:
  pull_request_target:
    types: [opened]

jobs:
  welcome:
    runs-on: ubuntu-latest
    permissions:
      issues: write

    steps:
      - name: Welcome Message
        uses: actions/first-interaction@v1.0.0
        with:
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          pr-message: |-
            Congrats on making your first PR and thank you for contributing to Superset! :tada: :heart:
            We hope to see you in our [Slack](https://apache-superset.slack.com/) community too!
      - name: First Time Label
        uses: andymckay/labeler@master
        with:
          add-labels: "new:contributor"
          repo-token: ${{ secrets.GITHUB_TOKEN }}
59 changes: 42 additions & 17 deletions CHANGELOG.md

Large diffs are not rendered by default.

34 changes: 21 additions & 13 deletions CONTRIBUTING.md
@@ -160,7 +160,7 @@ Look through the GitHub issues. Issues tagged with

Superset could always use better documentation,
whether as part of the official Superset docs,
in docstrings, `docs/*.rst` or even on the web as blog posts or
in docstrings, or even on the web as blog posts or
articles. See [Documentation](#documentation) for more details.

### Add Translations
@@ -388,23 +388,30 @@ cd superset

The latest documentation and tutorial are available at https://superset.apache.org/.

The site is written using the Gatsby framework and docz for the
documentation subsection. Find out more about it in `docs/README.md`
The documentation site is built using [Docusaurus 2](https://docusaurus.io/), a modern static website generator; its source resides in `./docs`.

#### Images
#### Local Development

If you're adding new images to the documentation, you'll notice that the images
referenced in the rst, e.g.
To set up a local development environment with hot reloading for the documentation site:

.. image:: _static/images/tutorial/tutorial_01_sources_database.png
```shell
cd docs
yarn install # Installs NPM dependencies
yarn start # Starts development server at http://localhost:3000
```

#### Build

To create and serve a production build of the documentation site:

aren't actually stored in that directory. Instead, you should add and commit
images (and any other static assets) to the `superset-frontend/src/assets/images` directory.
When the docs are deployed to https://superset.apache.org/, images
are copied from there to the `_static/images` directory, just like they're referenced
in the docs.
```shell
yarn build
yarn serve
```

#### Deployment

For example, the image referenced above actually lives in `superset-frontend/src/assets/images/tutorial`. Since the image is moved during the documentation build process, the docs reference the image in `_static/images/tutorial` instead.
Commits to `master` trigger a rebuild and redeploy of the documentation site. Submit pull requests that modify the documentation with the `docs:` prefix.
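
As a sketch of that convention (the branch and file names here are hypothetical), a documentation-only change could be submitted like this:

```shell
# The commit message (and resulting PR title) carries the docs: prefix
git checkout -b docs-fix-databricks-uri
git add docs/docs/databases/databricks.mdx
git commit -m "docs: fix Databricks SQLAlchemy URI example"
```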

### Flask server

@@ -1064,6 +1071,7 @@ LANGUAGES = {
```

This script will

1. update the template file `superset/translations/messages.pot` with current application strings.
2. update language files with the new extracted strings.

40 changes: 37 additions & 3 deletions RELEASING/README.md
@@ -422,13 +422,47 @@ with the changes on `CHANGELOG.md` and `UPDATING.md`.

### Publishing a Convenience Release to PyPI

Using the final release tarball, unpack it and run `./pypi_push.sh`.
This script will build the JavaScript bundle and echo the twine command
allowing you to publish to PyPI. You may need to ask a fellow committer to grant
Extract the release to the `/tmp` folder to build the PyPI release. Files in the `/tmp` folder will be automatically deleted by the OS.
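
The commands below assume the release environment variables defined earlier in the release process are set; a hypothetical example (the version and file names are illustrative, not the actual release values):

```bash
export SUPERSET_VERSION=2.0.0
export SUPERSET_RELEASE_RC=apache-superset-${SUPERSET_VERSION}rc1
export SUPERSET_RELEASE_TARBALL=${SUPERSET_RELEASE_RC}-source.tar.gz
```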

```bash
mkdir -p /tmp/superset && cd /tmp/superset
tar xfvz ~/svn/superset/${SUPERSET_VERSION}/${SUPERSET_RELEASE_TARBALL}
```

Create a virtual environment and install the dependencies

```bash
cd ${SUPERSET_RELEASE_RC}
python3 -m venv venv
source venv/bin/activate
pip install -r requirements/base.txt
pip install twine
```

Create the distribution

```bash
cd superset-frontend/
npm ci && npm run build
cd ../
flask fab babel-compile --target superset/translations
python setup.py sdist
```

Publish to PyPI

You may need to ask a fellow committer to grant
you access to the PyPI project if you don't have access already. Make sure to create
an account first if you don't have one, and reference your username
when requesting access to push packages.

```bash
twine upload dist/apache-superset-${SUPERSET_VERSION}.tar.gz

# Set your username to token
# Set your password to the token value, including the pypi- prefix
```

### Announcing

Once it's all done, an [ANNOUNCE] thread announcing the release to the dev@ mailing list is the final step.
4 changes: 2 additions & 2 deletions UPDATING.md
@@ -31,10 +31,10 @@ assists people when migrating to a new version.

### Breaking Changes

- [19770](https://github.com/apache/superset/pull/19770): As per SIPs 11 and 68, the native NoSQL Druid connector is deprecated and has been removed. Druid is still supported through SQLAlchemy via pydruid. The config keys `DRUID_IS_ACTIVE` and `DRUID_METADATA_LINKS_ENABLED` have also been removed.
- [19981](https://github.com/apache/superset/pull/19981): Per [SIP-81](https://github.com/apache/superset/issues/19953) the `/explore/form_data` API now requires a `datasource_type` in addition to a `datasource_id` for POST and PUT requests.
- [19770](https://github.com/apache/superset/pull/19770): Per [SIP-11](https://github.com/apache/superset/issues/6032) and [SIP-68](https://github.com/apache/superset/issues/14909), the native NoSQL Druid connector is deprecated and has been removed. Druid is still supported through SQLAlchemy via pydruid. The config keys `DRUID_IS_ACTIVE` and `DRUID_METADATA_LINKS_ENABLED` have also been removed.
- [19274](https://github.com/apache/superset/pull/19274): The `PUBLIC_ROLE_LIKE_GAMMA` config key has been removed; set `PUBLIC_ROLE_LIKE = "Gamma"` to have the same functionality.
- [19273](https://github.com/apache/superset/pull/19273): The `SUPERSET_CELERY_WORKERS` and `SUPERSET_WORKERS` config keys have been removed. Configure Celery directly using `CELERY_CONFIG` on Superset (see the sketch after this list).
- [19262](https://github.com/apache/superset/pull/19262): Per [SIP-11](https://github.com/apache/superset/issues/6032) and [SIP-68](https://github.com/apache/superset/issues/14909) the native NoSQL Druid connector is deprecated and will no longer be supported. Druid SQL is still [supported](https://superset.apache.org/docs/databases/druid).
- [19231](https://github.com/apache/superset/pull/19231): The `ENABLE_REACT_CRUD_VIEWS` feature flag has been removed (permanently enabled). Any deployments which had set this flag to false will need to verify that the React views support their use case.
- [19230](https://github.com/apache/superset/pull/19230): The `ROW_LEVEL_SECURITY` feature flag has been removed (permanently enabled). Any deployments which had set this flag to false will need to verify that the presence of the Row Level Security feature does not interfere with their use case.
- [19168](https://github.com/apache/superset/pull/19168): The Celery upgrade to 5.X resulted in breaking changes to its command line invocation. Please follow [these](https://docs.celeryq.dev/en/stable/whatsnew-5.2.html#step-1-adjust-your-command-line-invocation) instructions for adjustments. Also consider migrating your Celery config per [here](https://docs.celeryq.dev/en/stable/userguide/configuration.html#conf-old-settings-map).
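As a hedged illustration of the `PUBLIC_ROLE_LIKE` and `CELERY_CONFIG` migrations above (the broker and result-backend URLs and the worker count are hypothetical, not values prescribed by Superset):

```python
# superset_config.py -- sketch migrating the removed config keys
# Removed: PUBLIC_ROLE_LIKE_GAMMA = True
PUBLIC_ROLE_LIKE = "Gamma"

# Removed: SUPERSET_CELERY_WORKERS / SUPERSET_WORKERS
# Celery is now configured directly through CELERY_CONFIG:
class CeleryConfig:
    broker_url = "redis://localhost:6379/0"      # hypothetical broker
    result_backend = "redis://localhost:6379/1"  # hypothetical result backend
    worker_concurrency = 8                       # replaces the old worker-count keys

CELERY_CONFIG = CeleryConfig
```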
1 change: 1 addition & 0 deletions docker/run-server.sh
@@ -27,6 +27,7 @@ gunicorn \
--worker-class ${SERVER_WORKER_CLASS:-gthread} \
--threads ${SERVER_THREADS_AMOUNT:-20} \
--timeout ${GUNICORN_TIMEOUT:-60} \
--keep-alive ${GUNICORN_KEEPALIVE:-2} \
--limit-request-line ${SERVER_LIMIT_REQUEST_LINE:-0} \
--limit-request-field_size ${SERVER_LIMIT_REQUEST_FIELD_SIZE:-0} \
"${FLASK_APP}"
34 changes: 1 addition & 33 deletions docs/README.md
@@ -17,36 +17,4 @@ specific language governing permissions and limitations
under the License.
-->

# Website

This website is built using [Docusaurus 2](https://docusaurus.io/), a modern static website generator.

### Installation

```
$ yarn install
```

### Local Development

```
$ yarn start
```

This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.

### Build

```
$ yarn build
```

This command generates static content into the `build` directory and can be served using any static contents hosting service.

### Deployment

```
$ GIT_USER=<Your GitHub username> USE_SSH=true yarn deploy
```

If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
This is the public documentation site for Superset, built using [Docusaurus 2](https://docusaurus.io/). See [CONTRIBUTING.md](../CONTRIBUTING.md#documentation) for instructions on contributing to the documentation.
49 changes: 37 additions & 12 deletions docs/docs/databases/databricks.mdx
@@ -7,16 +7,12 @@ version: 1

## Databricks

To connect to Databricks, first install [databricks-dbapi](https://pypi.org/project/databricks-dbapi/) with the optional SQLAlchemy dependencies:
Databricks now offers a native DB API 2.0 driver, `databricks-sql-connector`, that can be used with the `sqlalchemy-databricks` dialect. You can install both with:

```bash
pip install databricks-dbapi[sqlalchemy]
pip install "superset[databricks]"
```

There are two ways to connect to Databricks: using a Hive connector or an ODBC connector. Both ways work similarly, but only ODBC can be used to connect to [SQL endpoints](https://docs.databricks.com/sql/admin/sql-endpoints.html).

### Hive

To use the Hive connector you need the following information from your cluster:

- Server hostname
@@ -27,31 +27,60 @@ These can be found under "Configuration" -> "Advanced Options" -> "JDBC/ODBC".

You also need an access token from "Settings" -> "User Settings" -> "Access Tokens".

Once you have all this information, add a database of type "Databricks (Hive)" in Superset, and use the following SQLAlchemy URI:
Once you have all this information, add a database of type "Databricks Native Connector" and use the following SQLAlchemy URI:

```
databricks+pyhive://token:{access token}@{server hostname}:{port}/{database name}
databricks+connector://token:{access_token}@{server_hostname}:{port}/{database_name}
```
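
For illustration, a filled-in URI with a hypothetical token, workspace hostname, and database might look like:

```
databricks+connector://token:dapi1234567890abcdef@dbc-a1b2c3d4-e5f6.cloud.databricks.com:443/default
```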

You also need to add the following configuration to "Other" -> "Engine Parameters", with your HTTP path:

```json
{
"connect_args": {"http_path": "sql/protocolv1/o/****"},
"http_headers": [["User-Agent", "Apache Superset"]]
}
```

The `User-Agent` header is optional, but helps Databricks identify traffic from Superset. If you need to use a different header, please reach out to Databricks and let them know.

## Older driver

Originally Superset used `databricks-dbapi` to connect to Databricks. You might want to try it if you're having problems with the official Databricks connector:

```bash
pip install "databricks-dbapi[sqlalchemy]"
```

There are two ways to connect to Databricks when using `databricks-dbapi`: using a Hive connector or an ODBC connector. Both ways work similarly, but only ODBC can be used to connect to [SQL endpoints](https://docs.databricks.com/sql/admin/sql-endpoints.html).

### Hive

To connect to a Hive cluster add a database of type "Databricks Interactive Cluster" in Superset, and use the following SQLAlchemy URI:

```
databricks+pyhive://token:{access_token}@{server_hostname}:{port}/{database_name}
```

You also need to add the following configuration to "Other" -> "Engine Parameters", with your HTTP path:

```json
{"connect_args": {"http_path": "sql/protocolv1/o/****"}}
```

### ODBC

For ODBC you first need to install the [ODBC drivers for your platform](https://databricks.com/spark/odbc-drivers-download).

For a regular connection use this as the SQLAlchemy URI:
For a regular connection, use this as the SQLAlchemy URI after selecting either "Databricks Interactive Cluster" or "Databricks SQL Endpoint" for the database, depending on your use case:

```
databricks+pyodbc://token:{access token}@{server hostname}:{port}/{database name}
databricks+pyodbc://token:{access_token}@{server_hostname}:{port}/{database_name}
```

And for the connection arguments:

```
```json
{"connect_args": {"http_path": "sql/protocolv1/o/****", "driver_path": "/path/to/odbc/driver"}}
```

@@ -62,6 +87,6 @@ The driver path should be:

For a connection to a SQL endpoint you need to use the HTTP path from the endpoint:

```
```json
{"connect_args": {"http_path": "/sql/1.0/endpoints/****", "driver_path": "/path/to/odbc/driver"}}
```
6 changes: 6 additions & 0 deletions docs/docs/databases/druid.mdx
@@ -18,6 +18,12 @@ The connection string looks like:
```
druid://<User>:<password>@<Host>:<Port-default-9088>/druid/v2/sql
```
Here's a breakdown of the key components of this connection string (a filled-in example follows the list):

- User: the username portion of the credentials needed to connect to your database
- Password: the password portion of the credentials needed to connect to your database
- Host: the IP address (or URL) of the host machine that's running your database
- Port: the specific port that's exposed on your host machine where your database is running
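
For example, with hypothetical credentials and host, a filled-in connection string might look like:

```
druid://admin:secret@10.0.0.5:9088/druid/v2/sql
```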

### Customizing Druid Connection

20 changes: 0 additions & 20 deletions docs/docs/installation/cache.mdx
@@ -42,26 +42,6 @@ defined in `DATA_CACHE_CONFIG`.

## Celery beat

Superset has a Celery task that will periodically warm up the cache based on different strategies.
To use it, add the following to the `CELERYBEAT_SCHEDULE` section in `config.py`:

```python
CELERYBEAT_SCHEDULE = {
'cache-warmup-hourly': {
'task': 'cache-warmup',
'schedule': crontab(minute=0, hour='*'), # hourly
'kwargs': {
'strategy_name': 'top_n_dashboards',
'top_n': 5,
'since': '7 days ago',
},
},
}
```

This will cache all the charts in the top 5 most popular dashboards every hour. For other
strategies, check the `superset/tasks/cache.py` file.

### Caching Thumbnails

This is an optional feature that can be turned on by activating its feature flag in config:
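
A minimal sketch, assuming the standard `FEATURE_FLAGS` mechanism in `superset_config.py`:

```python
# superset_config.py -- enable the thumbnails feature flag
FEATURE_FLAGS = {
    "THUMBNAILS": True,
}
```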
20 changes: 20 additions & 0 deletions docs/docs/installation/sql-templating.mdx
@@ -33,6 +33,26 @@ For example, to add a time range to a virtual dataset, you can write the following
SELECT * from tbl where dttm_col > '{{ from_dttm }}' and dttm_col < '{{ to_dttm }}'
```

You can also use [Jinja's logic](https://jinja.palletsprojects.com/en/2.11.x/templates/#tests)
to make your query robust to clearing the time range filter:

```sql
SELECT *
FROM tbl
WHERE (
{% if from_dttm is not none %}
dttm_col > '{{ from_dttm }}' AND
{% endif %}
{% if to_dttm is not none %}
dttm_col < '{{ to_dttm }}' AND
{% endif %}
true
)
```

Note how the Jinja parameters are referenced within double curly braces in the query, and without
them in the logic blocks.

To add custom functionality to the Jinja context, you need to overload the default Jinja
context in your environment by defining the `JINJA_CONTEXT_ADDONS` in your superset configuration
(`superset_config.py`). Objects referenced in this dictionary are made available for users to use
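
A minimal sketch of such an addon, with a hypothetical helper name `my_username`:

```python
# superset_config.py -- hypothetical Jinja context addon
def current_username() -> str:
    # Illustrative only; any callable or plain value can be exposed
    return "example_user"

JINJA_CONTEXT_ADDONS = {
    "my_username": current_username,
}
```

A query could then call it as `{{ my_username() }}`.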
17 changes: 10 additions & 7 deletions docs/docusaurus.config.js
@@ -37,11 +37,14 @@ const config = {
projectName: 'superset', // Usually your repo name.
themes: ['@saucelabs/theme-github-codeblock'],
plugins: [
["docusaurus-plugin-less", {
lessOptions: {
javascriptEnabled: true,
}
}],
[
'docusaurus-plugin-less',
{
lessOptions: {
javascriptEnabled: true,
},
},
],
[
'@docusaurus/plugin-client-redirects',
{
@@ -229,8 +232,7 @@
},
footer: {
style: 'dark',
links: [
],
links: [],
copyright: `Copyright © ${new Date().getFullYear()},
The <a href="https://www.apache.org/" target="_blank" rel="noreferrer">Apache Software Foundation</a>,
Licensed under the Apache <a href="https://apache.org/licenses/LICENSE-2.0" target="_blank" rel="noreferrer">License</a>. <br/>
@@ -249,6 +251,7 @@
darkTheme: darkCodeTheme,
},
}),
scripts: ['/script/matomo.js'],
};

module.exports = config;