---
title: Spark Connectors for Pravega
sidebar_label: Overview
---
This documentation describes the connector API and shows how to read and write Pravega streams with Apache Spark.
Build end-to-end stream processing and batch pipelines that use Pravega as the stream storage and message bus, and Apache Spark for computation over the streams.
- Getting Started
- Samples
- Configuration
- Compatibility Matrix
- Building the Connector
- Features & Highlights
- Limitations
- Releases
- Pre-Built Artifacts
- Learn More
- Support
- About
- Exactly-once processing guarantees for both Reader and Writer, supporting end-to-end exactly-once processing pipelines
- A Spark micro-batch reader connector allows Spark streaming applications to read Pravega Streams. Pravega stream cuts (i.e. offsets) are used to reliably recover from failures and provide exactly-once semantics.
- A Spark batch reader connector allows Spark batch applications to read Pravega Streams.
- A Spark writer allows Spark batch and streaming applications to write to Pravega Streams. Writes are optionally contained within Pravega transactions, providing exactly-once semantics.
- Seamless integration with Spark's checkpoints.
- Parallel readers and writers supporting high-throughput, low-latency processing.
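The reader and writer features above can be sketched with Spark's DataFrame API. This is a minimal, illustrative example, not a definitive reference: it assumes the connector is on the classpath and exposes the `pravega` source format, and the option names (`controller`, `scope`, `stream`) and the example controller URI, scope, and stream names are assumptions to verify against the connector version you are using.

```scala
import org.apache.spark.sql.SparkSession

object PravegaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("pravega-sketch")
      .getOrCreate()

    // Batch read: load an entire Pravega stream as a DataFrame.
    // Option names are assumptions; check your connector version.
    val batchDf = spark.read
      .format("pravega")
      .option("controller", "tcp://127.0.0.1:9090")
      .option("scope", "my-scope")
      .option("stream", "my-stream")
      .load()

    // Micro-batch streaming read: stream cuts are recorded in the Spark
    // checkpoint, so a restarted query resumes without reprocessing.
    val events = spark.readStream
      .format("pravega")
      .option("controller", "tcp://127.0.0.1:9090")
      .option("scope", "my-scope")
      .option("stream", "my-stream")
      .load()

    // Streaming write back to another Pravega stream; the checkpoint
    // location ties reads and writes together for exactly-once pipelines.
    val query = events.writeStream
      .format("pravega")
      .option("controller", "tcp://127.0.0.1:9090")
      .option("scope", "my-scope")
      .option("stream", "my-output-stream")
      .option("checkpointLocation", "/tmp/spark-checkpoints/pravega-sketch")
      .start()

    query.awaitTermination()
  }
}
```

Running this requires a reachable Pravega controller and existing scope and streams, so it is a sketch of the API shape rather than a self-contained program to execute as-is.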
The latest releases can be found on the project's GitHub Releases page.
Releases are published to Maven Central. Spark and Gradle will automatically download the required artifacts. However, if you wish, you may download the artifacts manually using the links below.
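As an illustration, a Gradle dependency declaration might look like the following. The artifact coordinates below are an assumption: the exact artifact name and version depend on your Spark and Scala versions, so confirm them against Maven Central before using them.

```groovy
dependencies {
    // Illustrative coordinates; pick the variant matching your
    // Spark and Scala versions from Maven Central.
    implementation "io.pravega:pravega-connectors-spark-3.1_2.12:0.10.1"
}
```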
The pre-built artifacts are available in the following locations:
- Maven Central (releases)
- GitHub Packages (snapshots)
Don't hesitate to ask! Contact the developers and community on Slack (signup) if you need any help. If you find a bug, please open an issue on GitHub Issues.
Spark Connectors for Pravega is 100% open source and community-driven. All components are available under the Apache 2.0 License on GitHub.