
Open-source metadata collector based on ODD Specification


Appen/odd-collector

 
 


Code style: black

odd-collector

ODD Collector is a lightweight service that gathers metadata from all your data sources.

To learn more about collector types and ODD Platform's architecture, read the documentation.


Implemented adapters

Service Config example
Cassandra config
ClickHouse config
Dbt config
Elasticsearch config
Feast config
Hive config
Kafka config
Kubeflow config
MariaDB config, supported via the MySQL adapter
MongoDB config
MSSQL config
MySQL config
Neo4j config
PostgreSQL config
Presto config
Redash config
Redshift config
Snowflake config
Superset config
Tableau config
Tarantool config
Trino config
Vertica config
ODBC config, README.md
Cube config
ODD Adapter config
Apache Druid config
Oracle config

Class diagram of adapter class hierarchy

This diagram shows which fields each adapter needs in collector_config.yaml. It is also a useful starting point for developers writing a new adapter.

[Figure: Adapter domain class hierarchy]

PlantUML source for the diagram above: domain_classes.plantuml
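As a rough illustration of how such a hierarchy maps onto collector_config.yaml, here is a minimal Python sketch using dataclasses. The class names (BasePlugin, DatabasePlugin, PostgreSQLPlugin, MySQLPlugin) and the required_fields helper are hypothetical and do not come from the odd-collector code base; the fields simply mirror the PostgreSQL and MySQL plugin entries of the example config in this README.

```python
from dataclasses import dataclass, fields
from typing import List


# Hypothetical classes: names and fields are illustrative only, modelled on
# the postgresql/mysql entries of the example collector_config.yaml.
@dataclass
class BasePlugin:
    type: str  # adapter type, e.g. "postgresql"
    name: str  # user-chosen plugin name


@dataclass
class DatabasePlugin(BasePlugin):
    host: str
    port: int
    database: str
    user: str
    password: str


@dataclass
class PostgreSQLPlugin(DatabasePlugin):
    """Config for `type: postgresql` (inherits every field above)."""


@dataclass
class MySQLPlugin(DatabasePlugin):
    """Config for `type: mysql` (MariaDB goes through this adapter too)."""


def required_fields(cls) -> List[str]:
    """Return the config keys a plugin class expects, in declaration order."""
    return [f.name for f in fields(cls)]


if __name__ == "__main__":
    print(required_fields(PostgreSQLPlugin))
```

Because subclasses inherit their parents' fields, walking up the hierarchy is enough to see every key a given adapter entry must provide.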

To regenerate the picture, you have two options:

  1. With PlantUML installed locally, run:
     java -jar plantuml.jar domain_classes.plantuml
  2. Use the PlantUML plugin for PyCharm or another IDE.

Building

docker build .

M1 building issue

The pyodbc, confluent-kafka and grpcio libraries may fail to install or build on Apple Silicon (M1) Macs.

Possible solutions

# NOTE: be aware of versions
# NOTE: easiest way is to add all export statements to your .bashrc/.zshrc file

# pyodbc dependencies
brew install unixodbc freetds openssl

export LDFLAGS="-L/opt/homebrew/lib -L/opt/homebrew/Cellar/unixodbc/2.3.11/lib -L/opt/homebrew/opt/freetds/lib -L/opt/homebrew/opt/openssl@3/lib"
export CFLAGS="-I/opt/homebrew/Cellar/unixodbc/2.3.11/include -I/opt/homebrew/opt/freetds/include"
export CPPFLAGS="-I/opt/homebrew/include -I/opt/homebrew/Cellar/unixodbc/2.3.11/include -I/opt/homebrew/opt/openssl@3/include"

# confluent-kafka
brew install librdkafka

export C_INCLUDE_PATH=/opt/homebrew/Cellar/librdkafka/1.9.0/include
export LIBRARY_PATH=/opt/homebrew/Cellar/librdkafka/1.9.0/lib
export PATH="/opt/homebrew/opt/openssl@3/bin:$PATH"

# grpcio
export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1
export GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1
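Because the paths above hard-code Homebrew package versions (for example unixodbc 2.3.11 and librdkafka 1.9.0), they stop working after an upgrade. The hypothetical helper below (flag_dirs and missing_dirs are illustrative names, not part of the project) lists every -L/-I directory in LDFLAGS, CFLAGS and CPPFLAGS that does not exist on your machine. It assumes the flags use the joined -L/path form shown above and does not inspect C_INCLUDE_PATH or LIBRARY_PATH.

```python
import os
import shlex
from typing import Callable, List


def flag_dirs(flags: str) -> List[str]:
    """Extract the directory from every joined -L<dir> / -I<dir> option."""
    dirs = []
    for token in shlex.split(flags):
        if token.startswith(("-L", "-I")) and len(token) > 2:
            dirs.append(token[2:])
    return dirs


def missing_dirs(flags: str, is_dir: Callable[[str], bool] = os.path.isdir) -> List[str]:
    """Return directories referenced by the flags that do not exist."""
    return [d for d in flag_dirs(flags) if not is_dir(d)]


if __name__ == "__main__":
    # Run after exporting the variables above, e.g. following a `brew upgrade`.
    for var in ("LDFLAGS", "CFLAGS", "CPPFLAGS"):
        for d in missing_dirs(os.environ.get(var, "")):
            print(f"{var}: missing directory {d}")
```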

Docker compose example

Custom .env file for docker-compose.yaml

LOGLEVEL=DEBUG
PLATFORM_HOST_URL=http://odd-platform:8080
POSTGRES_PASSWORD=postgres_password_secret
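Docker Compose reads this file on its own, so no code is needed to use it. To show the format it expects, here is a simplified Python sketch of a KEY=VALUE parser. parse_env_file is a hypothetical name, and unlike Compose's own parser it does not handle quoting, `export` prefixes or variable interpolation.

```python
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so URLs such as http://host:8080 survive.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw_line!r}")
        values[key.strip()] = value.strip()
    return values
```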

There are three ways to pass a config field value:

  1. Set it explicitly in the collector_config.yaml file, e.g. database: odd-platform-db
  2. Use a .env file or environment variables.
  3. When several plugins share the same field names, reference an environment variable explicitly in collector_config.yaml, e.g. password: !ENV ${POSTGRES_PASSWORD}
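The third option relies on a custom !ENV YAML tag. The sketch below illustrates only the substitution step: given the string that follows the tag, it replaces each ${NAME} placeholder with the value of that environment variable. resolve_env_value is a hypothetical helper, not the collector's actual loader, and raising KeyError for an unset variable is a choice made for this example; the collector may handle missing variables differently.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} placeholders inside a value tagged with !ENV.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_value(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace every ${NAME} in value with env[NAME]; raise if NAME is unset."""
    env = os.environ if env is None else env

    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)
```

For example, resolve_env_value("${POSTGRES_PASSWORD}") returns the value exported by the .env file above. Failing loudly avoids silently connecting with an empty password.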

Custom collector_config.yaml

platform_host_url: http://localhost:8080
default_pulling_interval: 10
token: ""
plugins:
  - type: postgresql
    name: test_postgresql_adapter
    host: "localhost"
    port: 5432
    database: "some_database_name"
    user: "some_user_name"
    password: !ENV ${POSTGRES_PASSWORD}
  - type: mysql
    name: test_mysql_adapter
    host: "localhost"
    port: 3306
    database: "some_database_name"
    user: "some_user_name"
    password: "some_password"
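Before starting the container, it can help to sanity-check the config. The following sketch works on the already-parsed YAML (for example a dict produced by a YAML library) and is an illustration under stated assumptions: check_config, DB_TYPES and DB_FIELDS are hypothetical, and the required keys are taken only from the two database plugins in the example above. Other adapter types need different fields.

```python
from typing import Any, Dict, List

# Assumption: keys carried by the postgresql/mysql entries in the example config.
# Other adapters (Kafka, Tableau, ...) need different fields.
DB_TYPES = {"postgresql", "mysql"}
DB_FIELDS = ("host", "port", "database", "user", "password")


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in an already-parsed collector config."""
    problems: List[str] = []
    if not config.get("platform_host_url"):
        problems.append("platform_host_url is missing")
    for index, plugin in enumerate(config.get("plugins", [])):
        label = plugin.get("name", f"plugins[{index}]")
        for key in ("type", "name"):
            if key not in plugin:
                problems.append(f"{label}: '{key}' is missing")
        if plugin.get("type") in DB_TYPES:
            for key in DB_FIELDS:
                if key not in plugin:
                    problems.append(f"{label}: '{key}' is missing")
    return problems
```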

docker-compose.yaml

version: "3.8"
services:
  # --- ODD Platform ---
  database:
    ...
  odd-platform:
    ...
  
  odd-collector:
    image: ghcr.io/opendatadiscovery/odd-collector:latest
    restart: always
    volumes:
      - ./collector_config.yaml:/app/collector_config.yaml
    environment:
      - PLATFORM_HOST_URL=${PLATFORM_HOST_URL}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    depends_on:
      - odd-platform
