Python support is currently limited to DataFrames only. Please refer to the Scala DataFrame documentation for the complete list of features.
Here is an example:
- Run pyspark, providing the spark-redis jar file:
$ ./bin/pyspark --jars <path-to>/spark-redis-<version>-jar-with-dependencies.jar
By default it connects to localhost:6379 without any password. You can change the connection settings in the following manner:
$ ./bin/pyspark --jars <path-to>/spark-redis-<version>-jar-with-dependencies.jar --conf "spark.redis.host=localhost" --conf "spark.redis.port=6379" --conf "spark.redis.auth=passwd"
- Read a DataFrame from JSON, write it to Redis, and read it back:
df = spark.read.json("examples/src/main/resources/people.json")
df.write.format("org.apache.spark.sql.redis").option("table", "people").option("key.column", "name").save()
loadedDf = spark.read.format("org.apache.spark.sql.redis").option("table", "people").load()
loadedDf.show()
- Check the data with redis-cli:
127.0.0.1:6379> hgetall people:Justin
1) "age"
2) "19"
3) "name"
4) "Justin"
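The redis-cli output above suggests that each row ends up as a Redis hash stored under a key of the form `<table>:<key column value>`, with field values held as strings. The following is a minimal pure-Python sketch of that mapping, for illustration only. The helpers `redis_key` and `row_to_hash` are hypothetical names and are not part of spark-redis; the layout is inferred solely from the output shown above.

```python
# Illustrative only: mimics the key/hash layout seen in the redis-cli output.
# redis_key and row_to_hash are hypothetical helpers, not spark-redis APIs.

def redis_key(table, key_value):
    """Build the Redis key for one row, e.g. 'people:Justin'."""
    return f"{table}:{key_value}"


def row_to_hash(row):
    """Render every field of a row as a string, as a Redis hash would hold it."""
    return {field: str(value) for field, value in row.items()}


row = {"name": "Justin", "age": 19}
key = redis_key("people", row["name"])
fields = row_to_hash(row)
print(key, fields)
```

This can help when choosing a `key.column`: its values become part of the Redis key, so they should be unique per row.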
A self-contained application can be configured in the following manner:
from pyspark.sql import SparkSession

# Redis connection settings are passed as regular Spark configuration options.
spark = SparkSession\
    .builder\
    .appName("myApp")\
    .config("spark.redis.host", "localhost")\
    .config("spark.redis.port", "6379")\
    .config("spark.redis.auth", "passwd")\
    .getOrCreate()
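To keep the password out of the source code, the same settings could be collected from environment variables before they are handed to the builder. This is a minimal sketch under stated assumptions: the variable names `REDIS_HOST`, `REDIS_PORT` and `REDIS_AUTH` and the helper `redis_conf_from_env` are hypothetical and not defined by spark-redis; only the three `spark.redis.*` keys and the localhost:6379 defaults come from the text above.

```python
import os


def redis_conf_from_env(env=None):
    """Collect spark.redis.* settings from a mapping (defaults to os.environ).

    REDIS_HOST, REDIS_PORT and REDIS_AUTH are hypothetical variable names.
    """
    env = os.environ if env is None else env
    conf = {
        "spark.redis.host": env.get("REDIS_HOST", "localhost"),
        "spark.redis.port": env.get("REDIS_PORT", "6379"),
    }
    auth = env.get("REDIS_AUTH")
    if auth:
        # Only set the auth option when a password is actually provided.
        conf["spark.redis.auth"] = auth
    return conf


# Example with an explicit mapping instead of the real environment.
conf = redis_conf_from_env({"REDIS_HOST": "redis.internal", "REDIS_AUTH": "passwd"})
print(conf)
```

Each resulting pair can then be applied with `.config(key, value)` on the `SparkSession` builder before calling `getOrCreate()`.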