Avro schemas used in RADAR-base. The schemas are organized as follows:
- The `commons` directory contains all schemas used inside Kafka and data fed into Kafka.
  - In the `active` subdirectory, add schemas for active data collection, like questionnaires or assignments.
  - In the `catalogue` subdirectory, modify schemas for cataloguing data types.
  - In the `kafka` subdirectory, add schemas used throughout Kafka, like record keys.
  - In the `monitor` subdirectory, add schemas for monitoring applications that gather data.
  - In the `passive` subdirectory, add schemas for passive data collection, like wearables.
  - In the `stream` subdirectory, add schemas used in Kafka Streams.
- The `specifications` directory contains specifications of what data types are collected through which devices.
- Java SDKs for each of the components are provided in the `java-sdk` folder; see the installation instructions there. They are automatically generated from the Avro schemas using the Avro specification (version in `Versions.kt`).
This project can be used in RADAR-base by using the `radarbase/kafka-init` Docker image. The schemas and specifications can be extended by locally creating a directory structure that includes a `commons` and a `specifications` directory and mounting it to the image, to the `/schema/conf/commons` and `/schema/conf/specifications` directories, respectively. You can provide a file path in `CONFIG_YAML` that points to a `config.yaml` file that is mounted in the Docker container. The config file has the following format:
# Specify any Kafka properties needed to connect to the Kafka cluster
kafka:
  security.protocol: PLAINTEXT
# Configure additional topics, or specify the properties of a topic listed elsewhere.
topics:
  my_custom_topic:
    # Enable configuration of this topic
    enabled: true
    # Number of partitions to use, if newly created
    partitions: 3
    # Replication factor, if newly created
    replicationFactor: 2
    # Topic properties in Kafka. See
    # <https://docs.confluent.io/platform/current/installation/configuration/topic-configs.html#ak-topic-configurations-for-cp>
    properties:
      cleanup.policy: compact
    # Key schema for the topic
    keySchema: my.key.Schema
    # Value schema for the topic
    valueSchema: my.value.Schema
    # Whether to register the schemas to the Schema Registry
    registerSchema: false
# Schema configuration. This refers to the files in the commons directory.
schemas:
  # Only include given schema directory files. You can use File glob syntax as described in <https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystem.html#getPathMatcher-java.lang.String->
  # If include is specified, exclude will be ignored. The glob pattern should start from the commons directory.
  include: []
  # Exclude all given schema directory files. You can use File glob syntax as described in <https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystem.html#getPathMatcher-java.lang.String->
  # If include is specified, exclude will be ignored. The glob pattern should start from the commons directory.
  exclude:
    - active/**
  # You can specify additional schemas, using the format for each respective specification directory.
  monitor:
    # The object name is the path it would have, the value is a plain string containing a JSON object
    application/application_uptime2.avsc: >
      {
        "namespace": "org.radarcns.monitor.application",
        "type": "record",
        "name": "ApplicationUptime2",
        "doc": "Length of application uptime.",
        "fields": [
          { "name": "time", "type": "double", "doc": "Device timestamp in UTC (s)." },
          { "name": "uptime", "type": "double", "doc": "Time since last app start (s)." }
        ]
      }
# Source configuration. This refers to the files in the specifications directory.
sources:
  # Only include given specification directory files. You can use File glob syntax as described in <https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystem.html#getPathMatcher-java.lang.String->
  # If include is specified, exclude will be ignored. The glob pattern should start from the specifications directory.
  include:
    - passive/*
  # Exclude all given specification directory files. You can use File glob syntax as described in <https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystem.html#getPathMatcher-java.lang.String->
  # If include is specified, exclude will be ignored. The glob pattern should start from the specifications directory.
  exclude: []
  # You can specify additional sources, using the format for each respective specification directory.
  monitor:
    - vendor: test
      model: test
      version: 1.0.0
      data:
        type: UPTIME
        topic: application_uptime2
        value_schema: .monitor.application.ApplicationUptime2
Please see the inline comments for more information on their values.
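The precedence between `include` and `exclude` described in the comments can be sketched in Python. This is an illustration only, not the code the tools run: `glob_to_regex` and `select_files` are hypothetical helpers, the glob translation covers only `*`, `**`, and `?` from the Java `getPathMatcher` syntax (no `{}` or `[]` groups), and the file names are made up for the example.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified Java-style glob into a regular expression.

    Assumption: only `**` (crosses directories), `*` and `?` (stay within
    one path segment) are handled; everything else is matched literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def select_files(paths, include=(), exclude=()):
    """Apply include/exclude globs; a non-empty include makes exclude irrelevant."""

    def matches_any(path, patterns):
        return any(glob_to_regex(p).fullmatch(path) for p in patterns)

    if include:
        return [p for p in paths if matches_any(p, include)]
    return [p for p in paths if not matches_any(p, exclude)]


if __name__ == "__main__":
    # Hypothetical paths, relative to the commons directory.
    schema_files = [
        "active/questionnaire/questionnaire.avsc",
        "passive/empatica/empatica_e4_temperature.avsc",
        "monitor/application/application_uptime.avsc",
    ]
    print(select_files(schema_files, exclude=["active/**"]))
```

Paths are given relative to the `commons` or `specifications` directory, matching the note that glob patterns should start from that directory.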
The Avro schemas should follow the Google JSON style guide.
In addition, schemas in the `commons` directory should follow these guidelines:
- Try to avoid abbreviations in the field names and write out the field name instead.
- There should be no need to add `value` at the end of a field name.
- Enumerator items should be written in uppercase characters separated by underscores.
- Add documentation (the `doc` property) to each schema, each field, and each enum. The documentation should show in text what is being measured, how, and what units or ranges are applicable. Abbreviations and acronyms in the documentation should be written out. Each `doc` property should start with a capital and end with a period.
- Prefer a categorical specification (an Avro enum) over a free string if that categorization is expected to remain very stable. This disambiguates the possible values for analysis. If a field is expected to be extended outside this project or very often within this project, use a free string instead.
- Prefer a flat record over a hierarchical record. This simplifies the organization of the data downstream, for example, when mapping to CSV.
- Prefer written out fields to arrays. This simplifies the organization of the data downstream, for example, when mapping to CSV.
- Give each schema a proper namespace, preferably `org.radarcns.passive.<vendor>` fully in lowercase, without any numbers, uppercase letters or symbols (except `.`). For the Empatica E4, the vendor is Empatica, so the namespace is `org.radarcns.passive.empatica`. For generic types, like a phone, Android Wear device or Android application, the namespace could just be `org.radarcns.passive.phone`, `org.radarcns.passive.wear`, or `org.radarcns.monitor.application`.
- In the schema name, use upper camel case and name the device explicitly (for example, `EmpaticaE4Temperature`).
- For fields that are inherent to a record, and will never be removed or renamed, no default value is needed. For all other fields:
  - if the type is an enum, use an `UNKNOWN` symbol as default value
  - otherwise, set the type to a union of `["null", <intended type>]` and set the default value to `null`.
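The default-value rule in the last guideline can be illustrated with a small Python sketch. `with_default` is a hypothetical helper, not part of the RADAR-base tooling; it takes an Avro field definition as a plain dictionary and returns a copy that follows the rule. The field in the usage example is made up.

```python
import copy
import json


def with_default(field: dict) -> dict:
    """Return a copy of an Avro field definition that carries a default value."""
    result = copy.deepcopy(field)
    field_type = result["type"]
    if isinstance(field_type, dict) and field_type.get("type") == "enum":
        # Enum: make sure UNKNOWN is one of the symbols and use it as default.
        if "UNKNOWN" not in field_type["symbols"]:
            field_type["symbols"].append("UNKNOWN")
        result["default"] = "UNKNOWN"
    else:
        # Any other type: union with "null" listed first and a null default.
        members = field_type if isinstance(field_type, list) else [field_type]
        result["type"] = ["null"] + [m for m in members if m != "null"]
        result["default"] = None
    return result


if __name__ == "__main__":
    # Hypothetical field, used only to show the transformation.
    battery = {
        "name": "batteryLevel",
        "type": "float",
        "doc": "Battery level from 0 (empty) to 1 (full).",
    }
    print(json.dumps(with_default(battery), indent=2))
```

Listing `"null"` first in the union matters because Avro expects the default value of a union field to correspond to the first type in the union.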
Avro schemas are automatically validated against the RADAR-base guidelines while building. For more details, see the catalog validator.
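To give a feel for what such checks look like, the following Python sketch tests a schema against a few of the guidelines above: namespace, schema name, `doc` properties, and enum symbols. It is not the catalog validator and covers only a subset of the rules; `check_schema` and the regular expressions are assumptions made for this example. The schema used is the `ApplicationUptime2` example from the configuration file.

```python
import json
import re

# Lowercase letters only, separated by dots.
NAMESPACE_RE = re.compile(r"[a-z]+(\.[a-z]+)*")
# Upper camel case; digits are allowed, as in EmpaticaE4Temperature.
NAME_RE = re.compile(r"[A-Z][A-Za-z0-9]*")
# Uppercase characters separated by underscores.
SYMBOL_RE = re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*")


def has_valid_doc(node: dict) -> bool:
    """A doc property must exist, start with a capital and end with a period."""
    doc = node.get("doc")
    return isinstance(doc, str) and len(doc) > 0 and doc[0].isupper() and doc.endswith(".")


def check_schema(schema: dict) -> list:
    """Return a list of guideline violations found in a record schema."""
    problems = []
    if not NAMESPACE_RE.fullmatch(schema.get("namespace", "")):
        problems.append("namespace must be lowercase letters separated by dots")
    if not NAME_RE.fullmatch(schema.get("name", "")):
        problems.append("schema name must be upper camel case")
    if not has_valid_doc(schema):
        problems.append("schema doc must start with a capital and end with a period")
    for field in schema.get("fields", []):
        label = f"field {field.get('name')!r}"
        if not has_valid_doc(field):
            problems.append(f"{label}: doc must start with a capital and end with a period")
        field_type = field.get("type")
        if isinstance(field_type, dict) and field_type.get("type") == "enum":
            if not has_valid_doc(field_type):
                problems.append(f"{label}: enum doc must start with a capital and end with a period")
            for symbol in field_type.get("symbols", []):
                if not SYMBOL_RE.fullmatch(symbol):
                    problems.append(f"{label}: enum symbol {symbol!r} must be uppercase with underscores")
    return problems


if __name__ == "__main__":
    schema = json.loads("""
    {
      "namespace": "org.radarcns.monitor.application",
      "type": "record",
      "name": "ApplicationUptime2",
      "doc": "Length of application uptime.",
      "fields": [
        { "name": "time", "type": "double", "doc": "Device timestamp in UTC (s)." },
        { "name": "uptime", "type": "double", "doc": "Time since last app start (s)." }
      ]
    }
    """)
    print(check_schema(schema))  # prints [] because no rule is violated
```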
The RADAR schema tools can be tested locally using Docker. To run the tools, first install Docker. Then run
docker-compose build
docker-compose up -d zookeeper-1 kafka-1 schema-registry-1
Now you can run tools commands with
# usage
docker-compose run --rm tools
# validation
docker-compose run --rm tools radar-schemas-tools validate /schema/merged
# list topic information
docker-compose run --rm tools radar-schemas-tools list /schema/merged
# register schemas with the schema registry
docker-compose run --rm tools radar-schemas-tools register http://schema-registry-1:8081 /schema/merged
# create topics with zookeeper
docker-compose run --rm tools radar-schemas-tools create -s kafka-1:9092 -b 1 -r 1 -p 1 /schema/merged
# run source-catalogue webservice
docker-compose run -p 8080:8080 --rm tools radar-catalog-server -p 8080 /schema/merged
# and in a separate console, run
curl localhost:8080/source-types
# back up the _schemas topic
docker-compose run --rm tools radar-schemas-tools schema-topic --backup -f schema.json -b 1 -s kafka-1:9092 -f /schema/conf/backup.json /schema/merged
# ensure the validity of the _schemas topic
docker-compose run --rm tools radar-schemas-tools schema-topic --ensure -f schema.json -b 1 -s kafka-1:9092 -f /schema/conf/backup.json -r 1 /schema/merged
- Create topics on Confluent Cloud

  1.1. Create a `config.yaml` file. A Confluent Cloud config for a Java application is based on this template:

      kafka:
        bootstrap.servers: {{ BROKER_ENDPOINT }}
        security.protocol: SASL_SSL
        sasl.jaas.config: org.apache.kafka.common.security.plain.PlainLoginModule required username="{{ CLUSTER_API_KEY }}" password="{{ CLUSTER_API_SECRET }}";
        ssl.endpoint.identification.algorithm: https
        sasl.mechanism: PLAIN

  1.2. Run the `create` command:

      docker run --rm -v "$PWD/config.yaml:/etc/radar-schemas-tools/config.yaml" radarbase/radar-schemas-tools radar-schemas-tools create -c /etc/radar-schemas-tools/config.yaml /schema/merged

- Register schemas on Confluent Cloud schema registry

      docker run --rm -v "$PWD/config.yaml:/etc/radar-schemas-tools/config.yaml" radarbase/radar-schemas-tools radar-schemas-tools register -c /etc/radar-schemas-tools/config.yaml -u SR_API_KEY -p SR_API_SECRET SR_ENDPOINT /schema/merged

  Note that `SR_ENDPOINT` and `/schema/merged` are positional arguments and should be placed at the end of the command.