Releases: apache/kyuubi
Kyuubi v0.4.0 released
Hi,
After extensive testing, and after running safely on our real-world online cluster for more than a week, we have decided to release Kyuubi v0.4.0.
Testing and feedback are still warmly welcome. Thanks!
Kyuubi v0.4.0 adds some key features, such as:
KYUUBI-105 - Support High Availability with Failover Mode
KYUUBI-118 - Incremental Result Collection Support
KYUUBI-114 - Support to Run in Local Mode
KYUUBI-122 - obtain delegation tokens from possible kerberized services
KYUUBI-128 - LDAP Authentication Support
and so on...
Also, v0.4.0 fixed a lot of issues, such as
KYUUBI-109 - add meaningful msg for front service port conflicts
KYUUBI-107 - kill yarn application fast when spark context fails to initialize
KYUUBI-133 - token expiration in HadoopRDD getPartitions
KYUUBI-108 - kyuubi server side direct memory oom
and so on...
Kyuubi v0.4.0 is built against Apache Spark 2.1.3 by default, but it should work well with all Spark releases above 2.1.0.
All help is warmly welcome, thanks!
Best Regards
Kent
Preparing Kyuubi v0.4.0 release
Hi,
We are preparing to release Kyuubi v0.4.0. Testing and feedback are welcome. Thanks!
Kyuubi v0.4.0 adds some key features, such as:
KYUUBI-105 - Support High Availability with Failover Mode
KYUUBI-118 - Incremental Result Collection Support
KYUUBI-114 - Support to Run in Local Mode
KYUUBI-122 - obtain delegation tokens from possible kerberized services
KYUUBI-128 - LDAP Authentication Support
and so on...
Also, v0.4.0 fixed a lot of issues, such as
KYUUBI-109 - add meaningful msg for front service port conflicts
KYUUBI-107 - kill yarn application fast when spark context fails to initialize
KYUUBI-133 - token expiration in HadoopRDD getPartitions
KYUUBI-108 - kyuubi server side direct memory oom
and so on...
Kyuubi v0.4.0 is built against Apache Spark 2.1.3 by default, but it should work well with all Spark releases above 2.1.0.
All help is warmly welcome, thanks!
Best Regards
Kent
v0.3.1
Kyuubi 0.3.0 released
Kyuubi is an enhanced edition of Apache Spark's primordial Thrift JDBC/ODBC Server. It is mainly designed for running SQL directly against a cluster in which all components, including HDFS, YARN, the Hive MetaStore, and Kyuubi itself, are secured. Kyuubi is a Spark SQL Thrift service with end-to-end multi-tenancy guarantees. Please go to Kyuubi Architecture to learn more if you are interested.
The Thrift JDBC/ODBC Server is Spark SQL's counterpart to Apache Hive's HiveServer2: an ad-hoc SQL query service that acts as a distributed query engine through its JDBC/ODBC or command-line interface.
In this mode, end users or applications can interact with Spark SQL directly to run SQL queries, without the need to write any code. We can build attractive business reports over massive data using BI tools that support JDBC/ODBC connections, such as Tableau, NetEase YouData and so on. Thanks to Apache Spark's capabilities, we can achieve much better performance than Apache Hive as a SQL-on-Hadoop service.
Unfortunately, due to the limitations of Spark's own architecture, the Thrift JDBC/ODBC Server has a number of problems compared with HiveServer2 when used as an enterprise-class product, such as multi-tenant isolation, authentication/authorization, high concurrency, high availability, and so on. The Apache Spark community's support for this module has also been in a state of prolonged stagnation.
Kyuubi enhances the Thrift JDBC/ODBC Server in several ways to solve these problems, as shown in the following table.
Features | Spark Thrift Server | Kyuubi | Comments |
---|---|---|---|
multiple SparkContext | ✘ | ✔ | User tagged SparkContext |
lazy SparkContext | ✘ | ✔ | Session level SparkContext |
SparkContext cache | ✘ | ✔ | SparkContext Cache Management |
dynamic queue | ✘ | ✔ | Kyuubi identifies spark.yarn.queue in the connection string. |
session level configurations | spark.sql.* | ✔ | Dynamic Resource Requesting |
authentication | ✔ | ✔ | Authentication/Security Guide |
authorization | ✘ | ✔ | Kyuubi ACL Management Guide |
impersonation | ✘ | ✔ | Kyuubi fully supports hive.server2.proxy.user and hive.server2.doAs |
multi tenancy | ✘ | ✔ | Based on the above features, Kyuubi is able to run as a multi-tenant server on an LCE-supported Yarn cluster. |
operation log | ✘ | ✔ | Kyuubi redirects the SQL operation log to a local file, with an interface for the client to fetch it. |
high availability | ✘ | ✔ | ZooKeeper Dynamic Service Discovery |
containerization | ✘ | ✔ | Kyuubi Containerization Guide |
type mapping | ✘ | ✔ | Kyuubi supports converting Spark results/schemas directly to Thrift results/schemas, bypassing Hive-format results |
Getting Started
Packaging
Please refer to Building Kyuubi in the online documentation for an overview of how to build Kyuubi.
Start Kyuubi
We can start Kyuubi with the built-in startup script bin/start-kyuubi.sh.
First, export SPARK_HOME in $KYUUBI_HOME/bin/kyuubi-env.sh:
export SPARK_HOME=/the/path/to/a/runable/spark/binary/dir
Then start Kyuubi with bin/start-kyuubi.sh:
$ bin/start-kyuubi.sh \
--master yarn \
--deploy-mode client \
--driver-memory 10g \
--conf spark.kyuubi.frontend.bind.port=10009
Run Spark SQL on Kyuubi
Now you can use beeline, Tableau or Thrift API based programs to connect to Kyuubi server.
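As a minimal sketch of a beeline connection, assuming the standard `jdbc:hive2://` URL scheme applies, the snippet below builds a URL for the port configured above (10009) and passes hive.server2.proxy.user as a session variable. The host name and user are hypothetical placeholders. The command is only printed so that the values can be reviewed before running it.

```shell
# Hypothetical placeholders: replace with your own host and user.
KYUUBI_HOST="kyuubi.example.com"
KYUUBI_PORT=10009            # matches spark.kyuubi.frontend.bind.port above
PROXY_USER="alice"

# Hive JDBC URL layout: session variables follow the ';' after the path.
JDBC_URL="jdbc:hive2://${KYUUBI_HOST}:${KYUUBI_PORT}/;hive.server2.proxy.user=${PROXY_USER}"

# Print the command for review instead of running it.
echo "beeline -u \"${JDBC_URL}\""
```

On a Kerberos-secured cluster the URL would normally also carry a `principal=...` session variable; it is left out here because its value is deployment-specific.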
Stop Kyuubi
bin/stop-kyuubi.sh
Multi Tenancy Support
Prerequisites
Kyuubi may work well with different deployments such as non-secured Yarn, Standalone, Mesos or even local mode, but it is mainly designed for a secured HDFS/Yarn cluster, on which its multi-tenant and security features work best.
We assume that you already have a secured HDFS cluster for deploying Spark, Hive or other applications.
Configure Yarn
- YARN Secure Containers
- Configure the NodeManager to use the LinuxContainerExecutor
- Queues (optional): please refer to Capacity Scheduler or Fair Scheduler for more details.
Spark on Yarn
- Setup for Spark On Yarn: ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster.
Configure Hive
- Configuration of Hive is done by placing your hive-site.xml, core-site.xml and hdfs-site.xml files in $SPARK_HOME/conf.
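The file placement above can be checked before starting Kyuubi. The following is a small sketch, not part of Kyuubi itself: `missing_conf_files` is a hypothetical helper that lists which of the three files are absent from a given conf directory.

```shell
# Hypothetical helper (not shipped with Kyuubi): print the names of the
# expected client-side config files that are missing from a directory.
missing_conf_files() {
  conf_dir="$1"
  for f in hive-site.xml core-site.xml hdfs-site.xml; do
    [ -f "${conf_dir}/${f}" ] || echo "$f"
  done
}

# Check $SPARK_HOME/conf; falls back to ./conf if SPARK_HOME is unset.
missing_conf_files "${SPARK_HOME:-.}/conf"
```

An empty result means all three files are in place; any printed name still needs to be copied into the directory.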
Configuration
Please refer to the Configuration Guide in the online documentation for an overview of how to configure Kyuubi.
Authentication
Please refer to the Authentication/Security Guide in the online documentation for an overview of how to enable security for Kyuubi.
Additional Documentation
Building Kyuubi
Kyuubi Deployment Guide
Configuration Guide
Authentication/Security Guide
Kyuubi ACL Management Guide
Kyuubi Architecture
Kyuubi v0.2.0
Kyuubi v0.2.0 released
1. How was this version verified
- PASS: ran the 99 TPC-DS queries concurrently as multiple users on Kyuubi against Spark 2.1.2 💯
- PASS: ran the 99 TPC-DS queries concurrently as multiple users on Kyuubi against Spark 2.2.1 💯
- PASS: ran the 99 TPC-DS queries concurrently as multiple users on Kyuubi against Spark 2.3.0 💯
2. What is Kyuubi
Kyuubi is an enhanced edition of Apache Spark's primordial Thrift JDBC/ODBC Server. It is mainly designed for running SQL directly against a cluster in which all components, including HDFS, YARN, the Hive MetaStore, and Kyuubi itself, are secured.
Kyuubi is a Spark SQL Thrift service with end-to-end multi-tenancy guarantees. Please go to Kyuubi Architecture to learn more if you are interested.
3. Kyuubi vs. Thrift JDBC/ODBC Server (Spark)
The Thrift JDBC/ODBC Server is Spark SQL's counterpart to Apache Hive's HiveServer2: an ad-hoc SQL query service that acts as a distributed query engine through its JDBC/ODBC or command-line interface. In this mode, end users or applications can interact with Spark SQL directly to run SQL queries, without the need to write any code. We can build attractive business reports over massive data using BI tools that support JDBC/ODBC connections, such as Tableau, NetEase YouData and so on. Thanks to Apache Spark's capabilities, we can achieve much better performance than Apache Hive as a SQL-on-Hadoop service.
Unfortunately, due to the limitations of Spark's own architecture, the Thrift JDBC/ODBC Server has a number of problems compared with HiveServer2 when used as an enterprise-class product, such as multi-tenant isolation, authentication/authorization, high concurrency, high availability, and so on. The Apache Spark community's support for this module has also been in a state of prolonged stagnation.
Kyuubi enhances the Thrift JDBC/ODBC Server in several ways to solve these problems, as shown in the following table.
Features | Thrift Server | Kyuubi | Comments |
---|---|---|---|
multiple SparkContext | ✘ | ✔ | Spark has several issues with running multiple SparkContext instances in a single JVM. The option spark.driver.allowMultipleContexts=true only allows SparkContext to be instantiated many times; these instances can only share and use the scheduler and execution environment of the last initialized one, which is somewhat like a shallow copy of a Java object. Kyuubi provides a way of isolating these components by user to avoid overlapping. |
"lazy" SparkContext | ✘ | ✔ | In Kyuubi, each SparkContext is initialized lazily, when a particular user's first session is created, while the Thrift JDBC/ODBC Server creates a single one when it starts. |
SparkContext cache | ✘ | ✔ | In the Thrift JDBC/ODBC Server, SparkContext is a resident variable. Kyuubi caches SparkContext instances for a while after their sessions close before the server terminates them. |
dynamic queue | ✘ | ✔ | spark.yarn.queue specifies the queue that Spark on Yarn applications run in. Once the Thrift JDBC/ODBC Server has started, the queue cannot be changed, while HiveServer2 can switch queues with set mapred.job.queue.name=thequeue. Kyuubi adopts a compromise: it identifies and uses spark.yarn.queue in the connection string. |
session level configurations | spark.sql.* | ✔ | Kyuubi supports setting all Spark/Hive/Hadoop configurations, such as spark.executor.cores/memory, in the connection string; they are used to initialize SparkContext. |
authentication | ✔ | ✔ | Please refer to the Authentication/Security Guide |
authorization | ✘ | ✘ | Spark Authorizer will be added to Kyuubi soon. |
impersonation | ✘ | ✔ | Kyuubi fully supports hive.server2.proxy.user and hive.server2.doAs |
multi tenancy | ✘ | ✔ | Based on the above features, Kyuubi is able to run as a multi-tenant server on an LCE-supported Yarn cluster. |
operation log | ✘ | ✔ | Kyuubi redirects the SQL operation log to a local file, with an interface for the client to fetch it. |
high availability | ✘ | ✔ | Based on ZooKeeper dynamic service discovery |
cluster mode | ✘ | ✘ | Yarn cluster mode will be supported soon |
type mapping | ✘ | ✔ | Kyuubi supports converting Spark results/schemas directly to Thrift results/schemas, bypassing Hive-format results |
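
The "dynamic queue" and "session level configurations" rows describe settings carried in the connection string. The sketch below shows one way such a string might look. It assumes the Spark settings are placed in the variable section after `#`, separated by `;`, following the Hive JDBC URL layout. The host, queue name, and resource sizes are hypothetical placeholders.

```shell
# Hypothetical placeholders for host, queue, and resource sizes.
KYUUBI_HOST="kyuubi.example.com"
KYUUBI_PORT=10009
YARN_QUEUE="thequeue"

# Assumption: Spark settings go in the variable section after '#',
# separated by ';', as in the Hive JDBC URL layout.
SPARK_CONFS="spark.yarn.queue=${YARN_QUEUE};spark.executor.cores=2;spark.executor.memory=4g"
JDBC_URL="jdbc:hive2://${KYUUBI_HOST}:${KYUUBI_PORT}/#${SPARK_CONFS}"

# Print the command for review instead of running it.
echo "beeline -u \"${JDBC_URL}\""
```

Because these settings are used to initialize the SparkContext, they would be expected to take effect only when a new SparkContext is created for the user, not on a cached one.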
4. How to use Kyuubi
Suppose you have just downloaded the release package; you can deploy Kyuubi in place of the Spark Thrift Server with the following steps.
We can start Kyuubi with the built-in startup script bin/start-kyuubi.sh.
First, export SPARK_HOME in $KYUUBI_HOME/bin/kyuubi-env.sh:
export SPARK_HOME=/the/path/to/a/runable/spark/binary/dir
Then start Kyuubi with bin/start-kyuubi.sh:
$ bin/start-kyuubi.sh \
--master yarn \
--deploy-mode client \
--driver-memory 10g \
--conf spark.kyuubi.frontend.bind.port=10009
Now you can use beeline, Tableau or Thrift API based programs to connect to Kyuubi server.
5. Prerequisites
We assume that you already have a secured HDFS cluster for deploying Spark, Hive or other applications.
5.1 Spark on Yarn
- Setup for Spark On Yarn: ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster.
5.2 Configure Hive
- Configuration of Hive is done by placing your hive-site.xml, core-site.xml and hdfs-site.xml files in $SPARK_HOME/conf.
6. Additional Documentation
Building Kyuubi
Configuration Guide
Authentication/Security Guide
Kyuubi Architecture
Release Candidate 1st For Kyuubi v0.1.0
Extract the hive-thriftserver subproject from Apache Spark as an individual project named Kyuubi, which supports multi-tenancy on Yarn.