release-2.2 branch merge master #1078

Merged
Changes from all commits (63 commits)
bfe89f0
2.2.0 -> 2.3.0 (#947)
marsishandsome Jul 24, 2019
5de9ab8
Add tests for primary key (#948)
birdstorm Jul 24, 2019
d42330b
add changelog (#955)
marsishandsome Jul 29, 2019
b8354a8
add multi-column tests (#954)
birdstorm Jul 29, 2019
49ac6c5
fix range partition throw UnsupportedSyntaxException error (#960)
marsishandsome Jul 30, 2019
0fb1b6f
fix view parsing problem (#953)
zhexuany Jul 30, 2019
35f8781
make tispark can read from a hash partition table (#966)
zhexuany Jul 31, 2019
290d07c
increase ci worker number (#965)
marsishandsome Jul 31, 2019
f9e4be9
update readme for tispark-2.1.2 release (#968)
marsishandsome Jul 31, 2019
10592d4
update document for pyspark (#975)
marsishandsome Aug 1, 2019
22ee66e
fix one jar bug (#972)
marsishandsome Aug 1, 2019
86348d1
adding common port number used by spark cluster (#973)
zhexuany Aug 1, 2019
12d53b3
fix cost model in table scan (#977)
zhexuany Aug 1, 2019
9b7bb49
create an UninitializedType for TypeDecimal (#979)
zhexuany Aug 2, 2019
b5e6e53
update sparkr doc (#976)
marsishandsome Aug 2, 2019
0586fa6
use spark-2.4.3 to run ut (#978)
marsishandsome Aug 2, 2019
1973d53
a better design for get auto table id (#980)
zhexuany Aug 2, 2019
1a95291
fix bug: ci SpecialTiDBTypeTestSuite failed with tidb-3.0.1 (#984)
marsishandsome Aug 5, 2019
0c3bcfe
improve TiConfiguration getPdAddrsString function (#963)
marsishandsome Aug 5, 2019
3314557
bump grpc to 1.17 (#982)
zhexuany Aug 5, 2019
a59c6f5
Add multiple-column PK tests (#970)
birdstorm Aug 6, 2019
93128b8
add retry for batchGet (#986)
birdstorm Aug 6, 2019
102894a
use tispark self-made m2 cahce file (#990)
marsishandsome Aug 7, 2019
b5d339c
add spark sql document for batch write (#991)
marsishandsome Aug 7, 2019
aa21adb
add auto mode for test.data.load (#994)
marsishandsome Aug 7, 2019
465911f
fix typo (#996)
birdstorm Aug 8, 2019
911e890
fix index scan bug (#995)
zhexuany Aug 8, 2019
db2e53a
refine doc (#1003)
zhexuany Aug 8, 2019
827d96f
add tidb-3.0 compatibility document (#998)
marsishandsome Aug 9, 2019
db2a2d6
add log4j config document (#1008)
marsishandsome Aug 12, 2019
9307951
refactor batch write region pre-split (#999)
marsishandsome Aug 13, 2019
73d0369
add ci simple mode (#1012)
marsishandsome Aug 13, 2019
b2fcfd5
clean up redundant code (#997)
birdstorm Aug 13, 2019
af92ced
prohibit agg or groupby pushdown on double read (#1004)
zhexuany Aug 13, 2019
56c3441
remove split region code (#1015)
zhexuany Aug 13, 2019
ab9aeea
add supported scala version (#1013)
marsishandsome Aug 13, 2019
a42bc88
Fix scala compiler version (#1010)
birdstorm Aug 13, 2019
4f8a6d3
fix reflection bug for hdp release (#1017) (#1018)
marsishandsome Aug 14, 2019
f90f961
check by grammarly (#1022)
marsishandsome Aug 15, 2019
73f765e
add benchmark result for batch write (#1025)
marsishandsome Aug 16, 2019
6f00a5d
release tispark 2.1.3 (#1026) (#1035)
marsishandsome Aug 16, 2019
0f7aec6
support setting random seed in daily regression test (#1032)
marsishandsome Aug 16, 2019
e799863
Remove create in tisession (#1021)
zhexuany Aug 16, 2019
9551f7d
set tikv region size from 96M to 1M (#1031)
marsishandsome Aug 16, 2019
e7d51f5
adding unique indices test for batch write (#1014)
zhexuany Aug 17, 2019
95f9698
use one unique seed (#1043)
marsishandsome Aug 19, 2019
ce9f818
remove unused code (#1030)
birdstorm Aug 19, 2019
1cc6dc6
adding batch write pk insertion test (#1044)
zhexuany Aug 19, 2019
0e430a9
fix table not found bug in TiSession because of synchronization (#1041)
marsishandsome Aug 20, 2019
deac3fe
fix test failure (#1051)
zhexuany Aug 20, 2019
bb7c646
fix reflection bug: pass in different arguments for different version…
marsishandsome Aug 21, 2019
7e3b92d
Adding pk and unique index test for batch write (#1049)
zhexuany Aug 21, 2019
a1e6b79
fix distinct without alias bug: disable pushdown aggregate with alias…
marsishandsome Aug 21, 2019
fdb938e
improve the doc (#1053)
zhexuany Aug 22, 2019
60eec59
Refactor RegionStoreClient logic (#989)
birdstorm Aug 23, 2019
420279a
using stream rather removeIf (#1057)
zhexuany Aug 23, 2019
b72a7b2
Remove redundant pre-write/commit logic in LockResolverTest (#1062)
birdstorm Aug 23, 2019
f578c3b
adding recreate flag when create tisession (#1064)
zhexuany Aug 26, 2019
8bebb76
fix issue 1047 (#1066)
zhexuany Aug 26, 2019
927cacf
cleanup code in TiBatchWrite (#1067)
zhexuany Aug 26, 2019
b402ade
release tispark-2.1.4 (#1068) (#1069)
marsishandsome Aug 27, 2019
8888d30
update document for tispark-2.1.4 release (#1070)
marsishandsome Aug 27, 2019
22dee73
Merge remote-tracking branch 'origin/master' into feature/merge-master
marsishandsome Aug 29, 2019
3 changes: 1 addition & 2 deletions .ci/build.groovy
@@ -7,13 +7,12 @@ def call(ghprbActualCommit, ghprbPullId, ghprbPullTitle, ghprbPullLink, ghprbPul

catchError {
node ('build') {
-def ws = pwd()
deleteDir()
container("java") {
stage('Checkout') {
dir("/home/jenkins/git/tispark") {
sh """
-archive_url=http://172.16.30.25/download/builds/pingcap/tiflash/cache/tiflash-m2-cache_latest.tar.gz
+archive_url=http://fileserver.pingcap.net/download/builds/pingcap/tispark/cache/tispark-m2-cache-latest.tar.gz
if [ ! "\$(ls -A /maven/.m2/repository)" ]; then curl -sL \$archive_url | tar -zx -C /maven || true; fi
"""
if (sh(returnStatus: true, script: '[ -d .git ] && [ -f Makefile ] && git rev-parse --git-dir > /dev/null 2>&1') != 0) {
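The cache bootstrap in this file downloads the Maven repository archive only when the local repository is empty. A minimal standalone sketch of that guard — `warm_cache`, the paths, and the archive URL are illustrative stand-ins, not the real CI values:

```shell
#!/usr/bin/env bash
# Sketch of the "download the m2 cache only if the repo is empty" guard.
# warm_cache, repo_dir, and archive_url are illustrative, not CI values.
warm_cache() {
  local repo_dir="$1" archive_url="$2"
  # `ls -A` prints nothing for an empty directory, so the branch is
  # taken only when no cached artifacts exist yet.
  if [ ! "$(ls -A "$repo_dir")" ]; then
    # Extract into the parent so the archive's top-level dir becomes repo_dir.
    curl -sL "$archive_url" | tar -zx -C "$(dirname "$repo_dir")" || true
  fi
}
```

The trailing `|| true` mirrors the CI script: a failed cache download degrades to a cold build instead of failing the whole job.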
82 changes: 51 additions & 31 deletions .ci/integration_test.groovy
@@ -6,45 +6,48 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb
def TIDB_BRANCH = "master"
def TIKV_BRANCH = "master"
def PD_BRANCH = "master"
-def MVN_PROFILE = ""
-def PARALLEL_NUMBER = 9
+def MVN_PROFILE = "-Pjenkins"
+def TEST_MODE = "simple"
+def PARALLEL_NUMBER = 18

// parse tidb branch
def m1 = ghprbCommentBody =~ /tidb\s*=\s*([^\s\\]+)(\s|\\|$)/
if (m1) {
TIDB_BRANCH = "${m1[0][1]}"
}
m1 = null
println "TIDB_BRANCH=${TIDB_BRANCH}"

// parse pd branch
def m2 = ghprbCommentBody =~ /pd\s*=\s*([^\s\\]+)(\s|\\|$)/
if (m2) {
PD_BRANCH = "${m2[0][1]}"
}
m2 = null
println "PD_BRANCH=${PD_BRANCH}"

// parse tikv branch
def m3 = ghprbCommentBody =~ /tikv\s*=\s*([^\s\\]+)(\s|\\|$)/
if (m3) {
TIKV_BRANCH = "${m3[0][1]}"
}
m3 = null
println "TIKV_BRANCH=${TIKV_BRANCH}"

// parse mvn profile
def m4 = ghprbCommentBody =~ /profile\s*=\s*([^\s\\]+)(\s|\\|$)/
if (m4) {
-MVN_PROFILE = "-P${m4[0][1]}"
+MVN_PROFILE = MVN_PROFILE + " -P${m4[0][1]}"
}

// parse test mode
def m5 = ghprbCommentBody =~ /mode\s*=\s*([^\s\\]+)(\s|\\|$)/
if (m5) {
TEST_MODE = "${m5[0][1]}"
}

def readfile = { filename ->
def file = readFile filename
return file.split("\n") as List
}

def remove_last_str = { str ->
return str.substring(0, str.length() - 1)
}

def get_mvn_str = { total_chunks ->
def mvnStr = " -DwildcardSuites="
for (int i = 0 ; i < total_chunks.size() - 1; i++) {
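The `ghprbCommentBody =~ /tidb\s*=\s*([^\s\\]+).../` matchers above let a PR comment such as `/run-all-tests tidb=release-3.0` override the default component branches. A shell sketch of the same key=value extraction — `parse_branch` is a hypothetical helper, not part of the CI script:

```shell
#!/usr/bin/env bash
# Extract a "key=value" override from a PR comment body, falling back
# to a default -- mirroring the Groovy regex matchers above.
# parse_branch is illustrative only.
parse_branch() {
  local body="$1" key="$2" default="$3" match
  # grep -o keeps just the "key = value" fragment; sed strips the key.
  match=$(printf '%s' "$body" \
    | grep -oE "${key}[[:space:]]*=[[:space:]]*[^[:space:]\\\\]+" \
    | head -n1 \
    | sed -E "s/${key}[[:space:]]*=[[:space:]]*//")
  printf '%s' "${match:-$default}"
}
```

For example, `parse_branch '/run-all-tests tidb=release-3.0' tidb master` yields `release-3.0`, and falls back to `master` when no override is present.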
@@ -65,8 +68,7 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb
println "${NODE_NAME}"
container("golang") {
deleteDir()
-def ws = pwd()


// tidb
def tidb_sha1 = sh(returnStdout: true, script: "curl ${FILE_SERVER_URL}/download/refs/pingcap/tidb/${TIDB_BRANCH}/sha1").trim()
sh "curl ${FILE_SERVER_URL}/download/builds/pingcap/tidb/${tidb_sha1}/centos7/tidb-server.tar.gz | tar xz"
@@ -90,23 +92,38 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb
sh """
cp -R /home/jenkins/git/tispark/. ./
git checkout -f ${ghprbActualCommit}
-find core/src -name '*Suite*' > test
+find core/src -name '*Suite*' | grep -v 'MultiColumnPKDataTypeSuite' > test
shuf test -o test2
mv test2 test
"""

if(TEST_MODE != "simple") {
sh """
find core/src -name '*MultiColumnPKDataTypeSuite*' >> test
"""
}

sh """
sed -i 's/core\\/src\\/test\\/scala\\///g' test
sed -i 's/\\//\\./g' test
sed -i 's/\\.scala//g' test
shuf test -o test2
mv test2 test
-split test -n r/$PARALLEL_NUMBER test_unit_ -a 1 --numeric-suffixes=1
+split test -n r/$PARALLEL_NUMBER test_unit_ -a 2 --numeric-suffixes=1
"""

for (int i = 1; i <= PARALLEL_NUMBER; i++) {
-sh """cat test_unit_$i"""
+if(i < 10) {
+    sh """cat test_unit_0$i"""
+} else {
+    sh """cat test_unit_$i"""
+}
}

sh """
cd tikv-client
./scripts/proto.sh
cd ..
cp .ci/log4j-ci.properties core/src/test/resources/log4j.properties
bash core/scripts/version.sh
bash core/scripts/fetch-test-data.sh
mv core/src/test core-test/src/
bash tikv-client/scripts/proto.sh
"""
}

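The hunk above shuffles the suite list and deals it round-robin into `PARALLEL_NUMBER` chunk files; widening the suffix to two digits (`-a 2`) keeps `test_unit_10`-style names from colliding with `test_unit_1` now that there are 18 chunks. A standalone sketch of that sharding (GNU coreutils assumed; `shard_suites` is an illustrative name):

```shell
#!/usr/bin/env bash
# Round-robin sharding of a test-suite list into N chunk files,
# mirroring the shuf/split pipeline in the CI script (GNU coreutils).
shard_suites() {
  local list="$1" chunks="$2"
  shuf "$list" -o "$list"   # randomize suite order in place
  # -n r/N deals lines out round-robin; -a 2 --numeric-suffixes=1
  # produces test_unit_01 .. test_unit_NN, which sort correctly.
  split -n r/"$chunks" -a 2 --numeric-suffixes=1 "$list" test_unit_
}
```

Round-robin (rather than contiguous) splitting spreads slow suites across workers instead of clustering them in one chunk.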
@@ -120,31 +137,35 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb

def run_tispark_test = { chunk_suffix ->
dir("go/src/github.com/pingcap/tispark") {
-run_chunks = readfile("test_unit_${chunk_suffix}")
+if(chunk_suffix < 10) {
+    run_chunks = readfile("test_unit_0${chunk_suffix}")
+} else {
+    run_chunks = readfile("test_unit_${chunk_suffix}")
+}

print run_chunks
def mvnStr = get_mvn_str(run_chunks)
sh """
-archive_url=http://172.16.30.25/download/builds/pingcap/tiflash/cache/tiflash-m2-cache_latest.tar.gz
+archive_url=http://fileserver.pingcap.net/download/builds/pingcap/tispark/cache/tispark-m2-cache-latest.tar.gz
if [ ! "\$(ls -A /maven/.m2/repository)" ]; then curl -sL \$archive_url | tar -zx -C /maven || true; fi
"""
sh """
cp .ci/log4j-ci.properties core/src/test/resources/log4j.properties
export MAVEN_OPTS="-Xmx6G -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=51M"
-mvn compile ${MVN_PROFILE} -DskipCloneProtoFiles=true
-mvn test ${MVN_PROFILE} -Dtest=moo ${mvnStr} -DskipCloneProtoFiles=true
+mvn compile ${MVN_PROFILE}
+mvn test ${MVN_PROFILE} -Dtest=moo ${mvnStr}
"""
}
}

def run_tikvclient_test = { chunk_suffix ->
dir("go/src/github.com/pingcap/tispark") {
sh """
-archive_url=http://172.16.30.25/download/builds/pingcap/tiflash/cache/tiflash-m2-cache_latest.tar.gz
+archive_url=http://fileserver.pingcap.net/download/builds/pingcap/tispark/cache/tispark-m2-cache-latest.tar.gz
if [ ! "\$(ls -A /maven/.m2/repository)" ]; then curl -sL \$archive_url | tar -zx -C /maven || true; fi
"""
sh """
export MAVEN_OPTS="-Xmx6G -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512M"
-mvn test ${MVN_PROFILE} -am -pl tikv-client -DskipCloneProtoFiles=true
+mvn test ${MVN_PROFILE} -am -pl tikv-client
"""
unstash "CODECOV_TOKEN"
sh 'curl -s https://codecov.io/bash | bash -s - -t @CODECOV_TOKEN'
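For each chunk, `get_mvn_str` above folds the suite names into a single `-DwildcardSuites=` argument so one Maven invocation runs the whole chunk. The same join, sketched in shell — `mvn_suites_arg` is a hypothetical helper, and one fully qualified suite name per line is assumed:

```shell
#!/usr/bin/env bash
# Join suite names (one per line) into the -DwildcardSuites=... flag
# consumed by the ScalaTest Maven plugin, mirroring get_mvn_str above.
# mvn_suites_arg is illustrative only.
mvn_suites_arg() {
  local chunk_file="$1"
  # paste -sd, turns N lines into one comma-separated line.
  printf ' -DwildcardSuites=%s' "$(paste -sd, "$chunk_file")"
}
```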
@@ -155,7 +176,6 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb
node("test_java") {
println "${NODE_NAME}"
container("java") {
-def ws = pwd()
deleteDir()
unstash 'binaries'
unstash 'tispark'
@@ -167,17 +187,17 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb
killall -9 tikv-server || true
killall -9 pd-server || true
sleep 10
-bin/pd-server --name=pd --data-dir=pd &>pd.log &
+bin/pd-server --name=pd --data-dir=pd --config=go/src/github.com/pingcap/tispark/config/pd.toml &>pd.log &
sleep 10
-bin/tikv-server --pd=127.0.0.1:2379 -s tikv --addr=0.0.0.0:20160 --advertise-addr=127.0.0.1:20160 &>tikv.log &
+bin/tikv-server --pd=127.0.0.1:2379 -s tikv --addr=0.0.0.0:20160 --advertise-addr=127.0.0.1:20160 --config=go/src/github.com/pingcap/tispark/config/tikv.toml &>tikv.log &
sleep 10
ps aux | grep '-server' || true
curl -s 127.0.0.1:2379/pd/api/v1/status || true
bin/tidb-server --store=tikv --path="127.0.0.1:2379" --config=go/src/github.com/pingcap/tispark/config/tidb.toml &>tidb.log &
sleep 60
"""

-timeout(60) {
+timeout(120) {
run_test(chunk_suffix)
}
} catch (err) {
2 changes: 2 additions & 0 deletions .ci/log4j-ci.properties
@@ -24,3 +24,5 @@ log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

# tispark
log4j.logger.com.pingcap=ERROR
log4j.logger.com.pingcap.tispark.utils.ReflectionUtil=DEBUG
log4j.logger.org.apache.spark.sql.test.SharedSQLContext=DEBUG
2 changes: 2 additions & 0 deletions .ci/tidb_config-for-daily-test.properties
@@ -0,0 +1,2 @@
# The seed used to generate test data (0 means random).
test.data.generate.seed=0
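The "0 means random" convention lets daily runs explore new data, while a fixed non-zero seed replays a failing dataset exactly. The property being relied on can be illustrated with bash's seedable PRNG (unrelated to TiSpark's actual data generator):

```shell
#!/usr/bin/env bash
# A fixed seed replays the same pseudo-random sequence, which is what
# makes seeded test-data generation reproducible. Illustration only.
gen_sequence() {
  RANDOM="$1"            # seed bash's PRNG
  local i out=""
  for i in 1 2 3; do
    out="$out $RANDOM"
  done
  printf '%s' "$out"
}
```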
147 changes: 147 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,147 @@
# TiSpark Changelog
All notable changes to this project will be documented in this file.

## [TiSpark 2.1.4] 2019-08-27
### Fixes
- Fix distinct without alias bug: disable pushdown aggregate with alias [#1055](https://github.com/pingcap/tispark/pull/1055)
- Fix reflection bug: pass in different arguments for different version of same function [#1037](https://github.com/pingcap/tispark/pull/1037)

## [TiSpark 2.1.3] 2019-08-15
### Fixes
- Fix cost model in table scan [#1023](https://github.com/pingcap/tispark/pull/1023)
- Fix index scan bug [#1024](https://github.com/pingcap/tispark/pull/1024)
- Prohibit aggregate or group by pushdown on double read [#1027](https://github.com/pingcap/tispark/pull/1027)
- Fix reflection bug for HDP release [#1017](https://github.com/pingcap/tispark/pull/1017)
- Fix scala compiler version [#1019](https://github.com/pingcap/tispark/pull/1019)

## [TiSpark 2.2.0]
### New Features
* Natively support writing data to TiKV using Spark Data Source API
* Support select from partition table [#916](https://github.com/pingcap/tispark/pull/916)
* Release one tispark jar (both support Spark-2.3.x and Spark-2.4.x) instead of two [#933](https://github.com/pingcap/tispark/pull/933)
* Add spark version to tispark udf ti_version [#943](https://github.com/pingcap/tispark/pull/943)

## [TiSpark 2.1.2] 2019-07-29
### Fixes
* Fix improper response with region error [#922](https://github.com/pingcap/tispark/pull/922)
* Fix view parsing problem [#953](https://github.com/pingcap/tispark/pull/953)

## [TiSpark 1.2.1]
### Fixes
* Fix count error, if advanceNextResponse is empty, we should read next region (#899)
* Use fixed version of proto (#898)

## [TiSpark 2.1.1]
### Fixes
* Add TiDB/TiKV/PD version and Spark version supported for each latest major release (#804) (#887)
* Fix incorrect timestamp of tidbMapDatabase (#862) (#885)
* Fix column size estimation (#858) (#884)
* Fix count error, if advanceNextResponse is empty, we should read next region (#878) (#882)
* Use fixed version of proto instead of master branch (#843) (#850)

## [TiSpark 2.1]
### Features
* Support range partition pruning (Beta) (#599)
* Support show columns command (#614)

### Fixes
* Fix build key ranges with xor expression (#576)
* Fix cannot initialize pd if using ipv6 address (#587)
* Fix default value bug (#596)
* Fix possible IndexOutOfBoundException in KeyUtils (#597)
* Fix outputOffset is incorrect when building DAGRequest (#615)
* Fix incorrect implementation of Key.next() (#648)
* Fix partition parser can't parse numerical value 0 (#651)
* Fix prefix length may be larger than the value used. (#668)
* Fix retry logic when scan meet lock (#666)
* Fix inconsistent timestamp (#676)
* Fix tempView may be unresolved when applying timestamp to plan (#690)
* Fix concurrent DAGRequest issue (#714)
* Fix downgrade scan logic (#725)
* Fix integer type default value should be parsed to long (#741)
* Fix index scan on partition table (#735)
* Fix KeyNotInRegion may occur when retrieving rows by handle (#755)
* Fix encode value long max (#761)
* Fix MatchErrorException may occur when Unsigned BigInt contains in group by columns (#780)
* Fix IndexOutOfBoundException when trying to get pd member (#788)

## [TiSpark 2.0]
### Features
* Work with Spark 2.3
* Support use `$database` statement
* Support show databases statement
* Support show tables statement
* No need to use `TiContext.mapTiDBDatabase`, use `$database.$table` to identify a table instead
* Support data type SET and ENUM
* Support data type YEAR
* Support data type TIME
* Support isolation level settings
* Support describe table command
* Support cache tables and uncache tables
* Support read from a TiDB partition table
* Support use TiDB as metastore

### Fixes
* Fix JSON parsing (#491)
* Fix count on empty table (#498)
* Fix ScanIterator unable to read from adjacent empty regions (#519)
* Fix possible NullPointerException when setting show_row_id true (#522)

### Improved
* Make ti version usable without selecting database (#545)

## [TiSpark 1.2]
### Fixes
* Fixes compatibility with PDServer #480

## [TiSpark 1.1]
### Fixes multiple bugs:
* Fix daylight saving time (DST) (#347)
* Fix count(1) result is always 0 if subquery contains limit (#346)
* Fix incorrect totalRowCount calculation (#353)
* Fix request fail with Key not in region after retrying NotLeaderError (#354)
* Fix ScanIterator logic where index may be out of bound (#357)
* Fix tispark-sql dbName (#379)
* Fix StoreNotMatch (#396)
* Fix utf8 prefix index (#400)
* Fix decimal decoding (#401)
* Refactor not leader logic (#412)
* Fix global temp view not visible in thriftserver (#437)

### Adds:
* Allow TiSpark retrieve row id (#367)
* Decode json to string (#417)

### Improvements:
* Improve PD connection issue's error log (#388)
* Add DB prefix option for TiDB tables (#416)

## [TiSpark 1.0.1]
* Fix unsigned index
* Compatible with TiDB before and since 48a42f

## [TiSpark 1.0 GA]
### New Features
TiSpark provides distributed computing of TiDB data using Apache Spark.

* Provide a gRPC communication framework to read data from TiKV
* Provide encoding and decoding of TiKV component data and communication protocol
* Provide calculation pushdown, which includes:
- Aggregate pushdown
- Predicate pushdown
- TopN pushdown
- Limit pushdown
* Provide index related support
- Transform predicate into Region key range or secondary index
- Optimize Index Only queries
- Adaptive downgrade index scan to table scan per region
* Provide cost-based optimization
- Support statistics
- Select index
- Estimate broadcast table cost
* Provide support for multiple Spark interfaces
- Support Spark Shell
- Support ThriftServer/JDBC
- Support Spark-SQL interaction
- Support PySpark Shell
- Support SparkR
10 changes: 0 additions & 10 deletions R/.gitignore

This file was deleted.

11 changes: 0 additions & 11 deletions R/DESCRIPTION

This file was deleted.

1 change: 0 additions & 1 deletion R/NAMESPACE

This file was deleted.
