Releases: cognitedata/cdp-spark-datasource
1.4.18
Download the release from Maven Central.
Fixes
- Stop reading from CDF immediately on task completion/cancellation.
This will allow Spark to start processing other tasks more quickly, especially when
there are exceptions thrown by tasks.
1.4.17
Fixes
- Handle additional uncaught exceptions locally, instead of having them kill the executor.
1.4.16
Download the release from Maven Central.
Fixes
- Handle some uncaught exceptions locally, instead of having them kill the executor.
1.4.15
1.4.14
Download the release from Maven Central.
Enhancements
- Set the max retry delay on requests to 30 seconds by default, configurable via the `maxRetryDelay` option.
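As a rough sketch, the option can be set like any other datasource option. The `cognite.spark.v1` format name and the `apiKey` option are taken from the project README; treating the value as seconds is an assumption based on the 30-second default above.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cdf-read")
  .master("local[*]") // for a local sketch; spark-submit would set this
  .getOrCreate()

// Cap the delay between retries at 10 seconds instead of the 30-second default.
// The value being in seconds is an assumption based on the default above.
val events = spark.read
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY")) // credentials placeholder
  .option("type", "events")
  .option("maxRetryDelay", "10")
  .load()
```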
Fixes
- Fix a potential deadlock in handling exceptions when reading and writing data from CDF.
1.4.13
Enhancements
- `relationships` have been added as a new resource type. See relationships for more information, and the sketch after this list.
- The `labels` field is now available for assets on read and insert operations.
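A hedged sketch of reading both additions; `spark` is assumed to be an existing SparkSession with this datasource on the classpath, and the lowercase "relationships" type string is an assumption following the naming pattern of the other resource types.

```scala
// Read the new relationships resource type.
val relationships = spark.read
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "relationships")
  .load()

// The labels field now appears in the asset schema on read.
val assets = spark.read
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "assets")
  .load()

assets.select("externalId", "labels").show()
```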
1.4.12
Enhancements
- Spark 3 is now supported!
- `labels` have been added as a new resource type. See Labels for more information.
1.4.11
Fixes
- Fix a bug where certain operations would throw a `MatchError` instead of the intended exception type.
1.4.10
Enhancements
- Improved error message when attempting to use the asset hierarchy builder to move an asset between different root assets.
1.4.9
Enhancements
- Upgrade to Cognite Scala SDK 1.4.1
- Throw a more helpful error message when attempting to use sequences that contain columns without an externalId.
1.4.8
Fixes
- Attempting an update without specifying either `id` or `externalId` will now result in a `CdfSparkException` instead of an `IllegalArgumentException`.
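A minimal sketch of the behavior. The exception name comes from the note above, but its package path is an assumption, as are the `onconflict` update option (taken from the README) and the tiny placeholder DataFrame.

```scala
import cognite.spark.v1.CdfSparkException // package path is an assumption
import spark.implicits._ // assumes an existing SparkSession `spark`

// This DataFrame has neither an `id` nor an `externalId` column, so the
// update cannot identify which assets to modify.
val updates = Seq("New description").toDF("description")

try {
  updates.write
    .format("cognite.spark.v1")
    .option("apiKey", sys.env("CDF_API_KEY"))
    .option("type", "assets")
    .option("onconflict", "update")
    .save()
} catch {
  case e: CdfSparkException => println(s"Update rejected: ${e.getMessage}")
}
```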
1.4.7
Enhancements
- The `X-CDP-App` and `X-CDP-ClientTag` headers can now be configured using the `applicationName` and `clientTag` options. See the Common Options section for more info.
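A short sketch, assuming an existing SparkSession `spark`; the option names come from the note above, while the example values are placeholders.

```scala
// Tag requests so they can be identified on the CDF side.
val timeseries = spark.read
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "timeseries")
  .option("applicationName", "my-etl-job") // sent as the X-CDP-App header
  .option("clientTag", "nightly-run")      // sent as the X-CDP-ClientTag header
  .load()
```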
Fixes
- Nested rows/structs are now correctly encoded as plain JSON objects when writing to RAW tables. These were previously encoded according to the internal structure of `org.apache.spark.sql.Row`.
1.4.6
Fixes
- Use the configured batch size also when writing with save modes.
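A sketch of a write where the batch size now takes effect; the `batchSize` and `onconflict` option names are taken from the project README, and the one-row DataFrame is a placeholder.

```scala
import spark.implicits._ // assumes an existing SparkSession `spark`

val eventsDf = Seq(("evt-1", "alarm")).toDF("externalId", "type")

// The configured batch size is now honored for this kind of write as well.
eventsDf.write
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "events")
  .option("batchSize", "500")
  .option("onconflict", "upsert")
  .save()
```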
1.4.5
Fixes
- All exceptions are now custom exceptions with a common base type.
1.4.4
Fixes
- Excludes the `netty-transport-native-epoll` dependency, which isn't handled correctly by Spark's `--packages` support.
1.4.3
Fixes
- This release still excludes too many dependencies. Please use 1.4.4 instead.
1.4.2
Download the release from Maven Central.
Enhancements
- Clean up dependencies to avoid evictions.
This resolves issues on Databricks where some evicted dependencies were loaded,
which were incompatible with the versions of the dependencies that should have
been used.
1.4.1
We excluded too many dependencies in this release. Please use 1.4.2 instead.
Enhancements
- Clean up dependencies to avoid evictions.
1.4.0
Breaking changes
- Metadata values are no longer silently truncated to 512 characters.
1.3.1
Enhancements
- Deletes are now supported for `datapoints`. See README.md for examples.
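A hedged sketch of a data point delete, assuming the `onconflict = delete` write pattern from the README; the time-range column names below are illustrative guesses, so check README.md for the actual delete schema.

```scala
import java.sql.Timestamp
import spark.implicits._ // assumes an existing SparkSession `spark`

// Delete all points in one time range for one time series.
// The column names are illustrative; see README.md for the real schema.
val toDelete = Seq(
  ("my-timeseries",
   Timestamp.valueOf("2020-01-01 00:00:00"),
   Timestamp.valueOf("2020-02-01 00:00:00"))
).toDF("externalId", "inclusiveBegin", "exclusiveEnd")

toDelete.write
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "datapoints")
  .option("onconflict", "delete")
  .save()
```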
Fixes
- An incorrect version was used for one of the library dependencies.
1.3.0
Breaking changes
Although not breaking for most users, this release updates some core
dependencies to new major releases. As a result, it is not possible to
load 1.3.x releases at the same time as 0.4.x releases.
Enhancements
- Sequences are now supported. See README.md for examples using `sequences` and `sequencerows`, and the sketch after this list.
- Files now support upsert and delete, and several new fields like `dataSetId` have been added.
- Files now support parallel retrieval.
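A sketch of reading the rows of a single sequence; pointing the `sequencerows` type at one sequence through an `externalId` option is an assumption based on README.md.

```scala
// Read the rows of one sequence as a DataFrame, one column per sequence column.
val rows = spark.read // assumes an existing SparkSession `spark`
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "sequencerows")
  .option("externalId", "my-sequence") // which sequence to read; an assumption
  .load()
```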
1.2.20
Enhancements
- Improved error message when a column has an incorrect type
Fixes
- Filter pushdown can now handle null values in cases like `p in (NULL, 1, 2)`; see the sketch after this list.
- Asset hierarchy now handles duplicated root `parentExternalId`.
- NULL fields in metadata are ignored for all resource types.
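A small illustration of the pushdown fix, assuming `events` is a DataFrame read through this datasource and `dataSetId` stands in for any filterable column.

```scala
// Previously an IN list containing NULL could break filter pushdown;
// this now follows ordinary Spark SQL semantics.
// `events`: a DataFrame read via this datasource (placeholder).
val filtered = events.where("dataSetId in (NULL, 1, 2)")
filtered.show()
```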
1.2.19
Enhancements
- Improve data points read performance by concurrently reading different time ranges and streaming the results to Spark as the data is received.
1.2.18
Download the release from Maven Central.
Enhancements
- GZip compression is enabled for all requests.
Fixes
- "name" is now optional for upserts on assets when external id is specified and the asset already exists.
- More efficient usage of threads.
1.2.17
Download the release from Maven Central.
Fixes
- Reimplement draining the read queue on a separate thread pool.
1.2.14
Download the release from Maven Central.
Enhancements
- `dataSetId` can now be set for asset hierarchies; see the sketch after this list.
- Metrics are now reported for deletes.
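A sketch of an asset hierarchy write carrying a `dataSetId` column; the `assethierarchy` type name comes from the project README, while the minimal column set and the empty-string root parent are assumptions.

```scala
import spark.implicits._ // assumes an existing SparkSession `spark`

// Two-level hierarchy: a root asset and one child, both tagged with a data set.
val hierarchy = Seq(
  ("plant",  "",      "Plant",  123L),
  ("pump-1", "plant", "Pump 1", 123L)
).toDF("externalId", "parentExternalId", "name", "dataSetId")

hierarchy.write
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "assethierarchy")
  .save()
```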
Fixes
- Empty updates of assets, events, or time series no longer cause errors.
1.2.16
Download the release from Maven Central.
Breaking changes
- Include the latest data point when reading aggregates. Please note that this is a breaking change
and that updating to this version may change the result of reading aggregated data points.
Enhancements
- Data points are now written in batches of 100,000 rather than 1,000.
- The error messages thrown when one or more columns don't match will now say which columns have the wrong type.
- Time series delete now supports the `ignoreUnknownIds` option; see the sketch after this list.
- Assets now include `parentExternalId`.
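A sketch of a time series delete with the new option; deleting by writing a DataFrame of ids with `onconflict = delete` mirrors the pattern documented for other resource types and is an assumption here.

```scala
import spark.implicits._ // assumes an existing SparkSession `spark`

val ids = Seq("ts-1", "ts-2", "does-not-exist").toDF("externalId")

// Without ignoreUnknownIds, the unknown externalId would fail the whole delete.
ids.write
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "timeseries")
  .option("onconflict", "delete")
  .option("ignoreUnknownIds", "true")
  .save()
```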
Fixes
- Schema for RAW tables will now correctly be inferred from the first 1,000 rows.
- Release threads from the thread pool when they are no longer going to be used.