- A new trait `ConnectorInterface` that simplifies the use of custom connectors (see the sketch after this list)
- New traits in `io.github.setl.internal`:
  - `CanVacuum`
  - `CanUpdate`
  - `CanPartition`
  - `CanWait`
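A hedged illustration of the `ConnectorInterface` entry above: a custom in-memory connector. The import paths and the abstract members implemented here (`setConf`, `read`, `write`) are assumptions for this sketch, not a verbatim copy of the trait definition.

```scala
import io.github.setl.config.Conf
import io.github.setl.storage.connector.ConnectorInterface
import org.apache.spark.sql.DataFrame

// Hypothetical custom connector keeping data in memory; member signatures are assumed.
class InMemoryConnector extends ConnectorInterface {
  private var buffer: Option[DataFrame] = None

  override def setConf(conf: Conf): Unit = ()   // nothing to configure in this sketch

  override def read(): DataFrame =
    buffer.getOrElse(throw new NoSuchElementException("Nothing has been written yet"))

  override def write(df: DataFrame, suffix: Option[String]): Unit = buffer = Some(df)

  override def write(df: DataFrame): Unit = write(df, None)
}
```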
- New IO methods in SparkRepository:
  - `drop`
  - `delete`
  - `create`
  - `vacuum`
  - `awaitTermination`
  - `stopStreaming`
- Parameters of the method `DeltaConnector.update`
- Parameters of the method `DeltaConnector.partition`
- Parameter readCache in Setl.setSparkRepository was renamed to cacheData to avoid ambiguity
- Deprecated `FileConnector.delete()` to avoid ambiguity (use `FileConnector.drop()` instead)
- Upgraded spark-cassandra-connector to 3.0.0 for the mvn profile `spark_3.0`
- New logo
- Update Delta version to v1.0 (PR #234)
- DeltaConnector reader options (PR #170)
- Deprecated methods and constructors
- Change group id to io.github.setl-framework (PR #192)
- Spark 3.0 support
- Downgraded default hadoop version to 3.2.0
- Save mode in DynamoDB Connector
- Updated spark-cassandra-connector from 2.4.2 to 2.5.0 (PR #117)
- Updated spark-excel-connector from 0.12.4 to 0.13.1 (PR #117)
- Updated spark-dynamodb-connector from 1.0.1 to 1.0.4 (PR #117)
- Updated scalatest (scope test) from 3.1.0 to 3.1.2 (PR #117)
- Updated postgresql (scope test) from 42.2.9 to 42.2.12 (PR #117)
- Added pipeline dependency check before starting the spark job (PR #114)
- Added default Spark job group and description (PR #116)
- Added `StructuredStreamingConnector` (PR #119)
- Added `DeltaConnector` (PR #118)
- Added `ZipArchiver` that can zip files/directories (PR #124)
- Fixed path separator in FileConnectorSuite that caused test failures
- Fixed `Setl.hasExternalInput` that always returned false (PR #121)
- Fixed cross building issue (#111)
- Changed benchmark unit of time to seconds (#88)
- Improved test coverage
- The master URL of SparkSession can now be overwritten in local environment (#74)
- `FileConnector` now lists paths correctly for nested directories (#97)
- Added Mermaid diagram generation to Pipeline (#51)
- Added `showDiagram()` method to Pipeline that prints the Mermaid code and generates the live editor URL 🎩🐰✨ (#52); see the usage sketch below
- Added Codecov report and Scala API doc
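For the `showDiagram()` entry above, a minimal usage sketch (the factory class `MyFactory` is hypothetical and defined elsewhere):

```scala
import io.github.setl.Setl

val setl: Setl = Setl.builder().withDefaultConfigLoader().getOrCreate()
val pipeline = setl.newPipeline()

pipeline.addStage(classOf[MyFactory])   // MyFactory: some Factory[_] defined elsewhere
pipeline.showDiagram()                  // prints the Mermaid code and the live-editor URL
```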
- Added `delete` method in `JDBCConnector` (#82)
- Added `drop` method in `DBConnector` (#83)
- Added support for both of the following two Spark configuration styles in SETL builder (#86)
  ```hocon
  setl.config {
    spark {
      spark.app.name = "my_app"
      spark.sql.shuffle.partitions = "1000"
    }
  }

  setl.config_2 {
    spark.app.name = "my_app"
    spark.sql.shuffle.partitions = "1000"
  }
  ```
- BREAKING CHANGE: Renamed DCContext to Setl
- Changed the default application environment config path into setl.environment
- Changed the default context config path into setl.config
- Optimized DeliverableDispatcher
- Optimized PipelineInspector (#33)
- Fixed issue of DynamoDBConnector that didn't take user configuration
- Fixed issue of CompoundKey annotation. SparkRepository now correctly handles columns having multiple compound keys. (#36)
- Added support for private variable delivery (#24)
- Added empty SparkRepository as placeholder (#30)
- Added annotation Benchmark that could be used on methods of an AbstractFactory (#35)
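A hedged sketch of the Benchmark annotation in use; the factory below is hypothetical, and its read/process/write/get skeleton is assumed rather than stated in this entry:

```scala
import io.github.setl.annotation.{Benchmark, Delivery}
import io.github.setl.transformation.Factory
import org.apache.spark.sql.Dataset

case class User(name: String, age: Int)

// Hypothetical factory: the annotated methods are the ones whose execution time is measured.
class DeduplicationFactory extends Factory[Dataset[User]] {
  @Delivery var users: Dataset[User] = _
  private var output: Dataset[User] = _

  override def read(): this.type = this

  @Benchmark
  override def process(): this.type = { output = users.dropDuplicates(); this }

  @Benchmark
  override def write(): this.type = this

  override def get(): Dataset[User] = output
}
```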
- BREAKING CHANGE: replace the Spark compatible version by the Scala compatible version in the artifact ID. The old artifact id dc-spark-sdk_2.4 was changed to dc-spark-sdk_2.11 (or dc-spark-sdk_2.12)
- Upgraded dependencies
- Added Scala 2.12 support
- Removed SparkSession from Connector and SparkRepository constructor (old constructors are kept but now deprecated)
- Added Column type support in FindBy method of SparkRepository and Condition
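A brief sketch of the Column support described above; the `Condition` overload taking a Column and the repository wiring are assumptions:

```scala
import io.github.setl.storage.Condition
import io.github.setl.storage.repository.SparkRepository
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.col

case class User(name: String, age: Int)

// `userRepository` stands for any registered SparkRepository[User].
def adults(userRepository: SparkRepository[User]): Dataset[User] =
  userRepository.findBy(Condition(col("age") > 18))
```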
- Added methods setConnector and setRepository in Setl that accept objects of type Connector/SparkRepository
- Added read cache into spark repository to avoid consecutive disk IO.
- Added option autoLoad in the Delivery annotation so that DeliverableDispatcher can still handle the dependency injection in the case where the delivery is missing but a corresponding repository is present.
- Added option condition in the Delivery annotation to pre-filter loaded data when autoLoad is set to true.
- Added option id in the Delivery annotation. DeliveryDispatcher will match deliveries by the id in addition to the payload type. By default the id is an empty string ("").
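A sketch combining the three Delivery options above (autoLoad, condition, id); the factory and case classes are hypothetical, and the read/process/write/get skeleton is assumed:

```scala
import io.github.setl.annotation.Delivery
import io.github.setl.transformation.Factory
import org.apache.spark.sql.Dataset

case class User(name: String, age: Int)
case class Order(orderId: String, userName: String)

class StatsFactory extends Factory[Long] {

  // If no Deliverable of type Dataset[User] was set on the pipeline, the dispatcher
  // falls back to a registered SparkRepository[User] (autoLoad) and pre-filters
  // the loaded data with `condition`.
  @Delivery(autoLoad = true, condition = "age > 18")
  var users: Dataset[User] = _

  // Matched by this id in addition to the payload type.
  @Delivery(id = "archived_orders")
  var orders: Dataset[Order] = _

  private var output: Long = _

  override def read(): this.type = this
  override def process(): this.type = { output = users.count() + orders.count(); this }
  override def write(): this.type = this
  override def get(): Long = output
}
```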
- Added setConnector method in DCContext. Each connector should be delivered with an ID. By default the ID will be its config path.
- Added support of wildcard path for SparkRepository and Connector
- Added JDBCConnector
- Added SnappyCompressor.
- Added method persist(persistence: Boolean) into Stage and Factory to activate/deactivate output persistence. By default the output persistence is set to true.
- Added implicit method `filter(cond: Set[Condition])` for Dataset and DataFrame (see the sketch below).
- Added `setUserDefinedSuffixKey` and `getUserDefinedSuffixKey` to SparkRepository.
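For the implicit `filter(cond: Set[Condition])` entry above, a hedged usage sketch; the import that brings the implicit into scope is not named in this changelog, so the path below is an assumption:

```scala
import io.github.setl.storage.Condition
import io.github.setl.util.FilterImplicits._   // assumed location of the implicit enrichment
import org.apache.spark.sql.DataFrame

def frenchAdults(df: DataFrame): DataFrame =
  df.filter(Set(
    Condition("country", "=", "FR"),
    Condition("age", ">", "18")
  ))
```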
- Added @Compress annotation. SparkRepository will compress all columns having this annotation by using a Compressor (the default compressor is XZCompressor)
  ```scala
  case class CompressionDemo(@Compress col1: Seq[Int],
                             @Compress(compressor = classOf[GZIPCompressor]) col2: Seq[String])
  ```
- Added interface Compressor and implemented XZCompressor and GZIPCompressor
- Added SparkRepositoryAdapter[A, B]. It will allow a SparkRepository[A] to write/read a data store of type B by using an implicit DatasetConverter[A, B]
- Added trait Converter[A, B] that handles the conversion between an object of type A and an object of type B
- Added abstract class DatasetConverter[A, B] that extends a Converter[Dataset[A], Dataset[B]]
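A hedged sketch of the converter machinery described above; the member names (`convertFrom`/`convertTo`) and the import path are assumptions:

```scala
import io.github.setl.storage.DatasetConverter
import org.apache.spark.sql.{Dataset, SparkSession}

case class UserV1(name: String, age: Int)
case class UserV2(fullName: String, age: Int)

// With such an implicit converter in scope, a SparkRepository[UserV1] wrapped in a
// SparkRepositoryAdapter can read from and write to a UserV2-shaped data store.
implicit val userConverter: DatasetConverter[UserV1, UserV2] =
  new DatasetConverter[UserV1, UserV2] {
    private val spark = SparkSession.getActiveSession.get
    import spark.implicits._

    override def convertFrom(v2: Dataset[UserV2]): Dataset[UserV1] =
      v2.map(u => UserV1(u.fullName, u.age))

    override def convertTo(v1: Dataset[UserV1]): Dataset[UserV2] =
      v1.map(u => UserV2(u.name, u.age))
  }
```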
- Added auto-correction for the `SparkRepository.findBy(conditions)` method when we filter by case class field name instead of column name
- Added DCContext that simplifies the creation of SparkSession, SparkRepository, Connector and Pipeline
- Added a builder for ConfigLoader to simplify the instantiation of a ConfigLoader object
- Added `readStandardJSON` and `writeStandardJSON` methods into JSONConnector to read/write standard JSON format files
- Added sequential mode in class `Stage`. Users can turn it on by setting `parallel` to false (see the sketch below).
- Added external data flow description in pipeline description
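A hedged usage sketch for the sequential mode described in the Stage entry above; the `parallel(...)` setter form is an assumption, and the factory classes are the hypothetical ones from the earlier sketches:

```scala
import io.github.setl.workflow.Stage

// Factories of a stage run in parallel by default; disabling parallelism
// makes them run one after another (sequential mode).
val stage = new Stage()
  .parallel(false)
  .addFactory(classOf[DeduplicationFactory])
  .addFactory(classOf[StatsFactory])
```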
- Added method `beforeAll` into `ConfigLoader`
- Added new methods `addStage` and `addFactory` that take a class object as input. The instantiation will be handled by the stage.
- Removed implicit argument encoder from all methods of Repository trait
- Added new get method to Pipeline: `get[A](cls: Class[_ <: Factory[_]]): A`.
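For example (Pipeline's import path is assumed; `StatsFactory` is the hypothetical factory from the earlier sketch, with `Long` as its output type):

```scala
import io.github.setl.workflow.Pipeline

// Retrieve the output of a specific factory by its class object.
def statsOf(pipeline: Pipeline): Long =
  pipeline.get[Long](classOf[StatsFactory])
```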
- Added `Delivery` annotation to handle inputs of a Factory

  ```scala
  class Foo {
    @Delivery(producer = classOf[Factory1], optional = true)
    var input1: String = _

    @Delivery(producer = classOf[Factory2])
    var input2: String = _
  }
  ```
- Added an optional argument `suffix` in `FileConnector` and `SparkRepository`
- Added method `partitionBy` in `FileConnector` and `SparkRepository`
- Added possibility to filter by name pattern when a FileConnector is trying to read a directory. To do this, add `filenamePattern` into the configuration file
- Added possibility to create a `Conf` object from a Map: `Conf(Map("a" -> "A"))`
- Improved Hadoop and S3 compatibility of connectors
- Added `DispatchManager` class. It will dispatch its deliverable object to setters (denoted by @Delivery) of a factory
- Added `Deliverable` class, which contains a payload to be delivered
- Added `PipelineInspector` to describe a pipeline
- Added `FileConnector` and `DBConnector`
- Fixed issue of file path containing whitespace character(s) in the URI creation (52eee322aacd85e0b03a96435b07c4565e894934)
- Removed `EnrichedConnector`
- Removed V1 interfaces
- Added a second argument to CompoundKey to handle primary and sort keys
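An illustrative use of the two-argument form; the key names and positions below are examples only:

```scala
import io.github.setl.annotation.CompoundKey

// "partition"/"sort" name the key kind, "1" gives the field's position within that key.
case class Event(@CompoundKey("partition", "1") country: String,
                 @CompoundKey("sort", "1") timestamp: Long,
                 payload: String)
```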
- Added `Conf` into `SparkRepositoryBuilder` and changed all the set methods of `SparkRepositoryBuilder` to use the conf object
- Changed package name `io.github.setl.annotations` to `io.github.setl.annotation`
- Added annotation `ColumnName`, which could be used to replace the current column name with an alias in the data storage (see the sketch below).
- Added annotation `CompoundKey`. It could be used to define a compound key for databases that only allow one partition key
- Added sheet name into arguments of ExcelConnector
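An illustrative use of the `ColumnName` annotation described above (the case class is hypothetical):

```scala
import io.github.setl.annotation.ColumnName

// The field is called `name` in Scala but stored as column "user_name" in the data store.
case class User(@ColumnName("user_name") name: String, age: Int)
```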
- Added DynamoDB V2 repository
- Added auxiliary constructors of case class `Condition`
- Added SchemaConverter
- Added DynamoDB Repository
- Removed scope provided from connectors and TypeSafe config
- Added DynamoDB Connector
- Removed unnecessary Type variable in `Connector`
- Added `ConnectorBuilder` to directly build a connector from a Typesafe `Config` object
- Added auxiliary constructor in `SparkRepositoryBuilder`
- Added enumeration `AppEnv`
- Changed spark version to 2.4.3
- Added `SparkRepositoryBuilder` that allows creation of a `SparkRepository` for a given class without creating a dedicated `Repository` class
- Added Excel support for `SparkRepository` by creating `ExcelConnector`
- Added `Logging` trait
- Fixed `Factory` class covariance issue (0764d10d616c3171d9bfd58acfffafbd8b9dda15)
- Added documentation
- Added changelog
- Changed `.gitlab-ci.yml` to speed up CI
- Added unit tests
- Added `.gitlab-ci.yml`