Query & Load Parquet Files

Imagine you have a large dataset stored in Parquet files and want your team to be able to query it with SQL. The files are too large to keep on each person's machine and too slow to download from cloud storage every time they are needed. Instead, you can place the files on a server your team can reach and run a MyDuck Server instance there. Your team can then query the dataset with either a Postgres or a MySQL client.

Below, we’ll show you how to query and load the example.parquet file from the docs/data/ directory by mounting it into a MyDuck Server container.

Steps

Run MyDuck Server:

docker run -p 13306:3306 -p 15432:5432 \
     -v /path/to/example.parquet:/home/admin/data/example.parquet \
     apecloud/myduckserver:main

Connect to MyDuck Server using psql:
```
psql -h 127.0.0.1 -p 15432 -U mysql
```

Query the Parquet file directly:

SELECT * FROM '/home/admin/data/example.parquet' LIMIT 10;
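The same preview can also be run without opening an interactive session, which is handy in scripts. The following is a minimal sketch, assuming the container from the first step is running and `psql` is installed on the client. The function names (`preview_sql`, `run_preview`) and the `MYDUCK_*` and `PARQUET_PATH` variables are hypothetical helpers introduced here for illustration, not part of MyDuck Server.

```shell
#!/bin/sh
# Assumed defaults, copied from the steps above; override via the environment.
MYDUCK_HOST="${MYDUCK_HOST:-127.0.0.1}"
MYDUCK_PG_PORT="${MYDUCK_PG_PORT:-15432}"
PARQUET_PATH="${PARQUET_PATH:-/home/admin/data/example.parquet}"

# Build a preview query for a Parquet path as seen inside the container.
# $1 = path to the file, $2 = maximum number of rows to return.
preview_sql() {
    printf "SELECT * FROM '%s' LIMIT %d;" "$1" "$2"
}

# Send the query through the Postgres port with psql's -c option.
# Requires a running MyDuck Server; nothing is executed until this is called.
run_preview() {
    psql -h "$MYDUCK_HOST" -p "$MYDUCK_PG_PORT" -U mysql \
         -c "$(preview_sql "$PARQUET_PATH" 10)"
}
```

After sourcing the file, calling `run_preview` should print the same rows as the interactive query above. Note that the path is the one inside the container, not the path on the host.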

Load the Parquet file into a DuckDB table:

CREATE TABLE test_data AS SELECT * FROM '/home/admin/data/example.parquet';
SELECT * FROM test_data LIMIT 10;

Query the data with a MySQL client and MySQL syntax:

mysql -h 127.0.0.1 -uroot -P13306 main

SELECT * FROM `test_data`;
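To check from a script that the loaded table is visible through the MySQL port, a row count is usually enough. This is a sketch under the same assumptions as the steps above (server running, `mysql` client installed, table named `test_data` in the `main` database). The helper names `count_sql` and `run_count` and the `MYDUCK_*` variables are hypothetical.

```shell
#!/bin/sh
# Assumed defaults, copied from the steps above; override via the environment.
MYDUCK_HOST="${MYDUCK_HOST:-127.0.0.1}"
MYDUCK_MYSQL_PORT="${MYDUCK_MYSQL_PORT:-13306}"

# Build a row-count query using MySQL-style backtick quoting.
# $1 = table name.
count_sql() {
    printf 'SELECT COUNT(*) FROM `%s`;' "$1"
}

# Run the count through the MySQL port.
# -N drops the column header and -B selects batch output, so only the
# number is printed. Requires a running MyDuck Server.
run_count() {
    mysql -h "$MYDUCK_HOST" -P "$MYDUCK_MYSQL_PORT" -uroot \
          -N -B -e "$(count_sql "$1")" main
}
```

Calling `run_count test_data` prints a single number, which can be compared against a count taken through the Postgres port to confirm both protocols see the same table.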
