Add guide for custom SQL database support with HSQLDB (#986)
* Add guide for custom SQL database support with HSQLDB

This commit introduces documentation detailing the process of extending the Kotlin DataFrame library to support custom SQL databases, using HSQLDB as an example. The guide includes prerequisites, implementation of a custom database type, and example code for managing database tables and schemas. Additionally, updates have been made to reflect the possibility of registering custom SQL databases in existing files.

* Add Gradle instructions to custom SQL database guide
zaleslaw authored Dec 5, 2024
1 parent 0478671 commit fb0853b
Showing 3 changed files with 200 additions and 54 deletions.
1 change: 1 addition & 0 deletions docs/StardustDocs/d.tree
@@ -46,6 +46,7 @@
<toc-element topic="io.md">
<toc-element topic="read.md"/>
<toc-element topic="readSqlDatabases.md"/>
<toc-element topic="readSqlFromCustomDatabase.md"/>
<toc-element topic="write.md"/>
</toc-element>
<toc-element topic="info.md">
84 changes: 30 additions & 54 deletions docs/StardustDocs/topics/readSqlDatabases.md
@@ -32,10 +32,11 @@ Also, there are a few **extension functions** available on `Connection`,
**NOTE:** This is an experimental module, and for now,
we only support five databases: MS SQL, MariaDB, MySQL, PostgreSQL, and SQLite.

Moreover, since release 0.15 you can register a custom SQL database; read more in our [guide](readSqlFromCustomDatabase.md).

Additionally, support for JSON and date-time types is limited.
Please take this into consideration when using these functions.


## Getting started with reading from SQL database in Gradle Project

First, you need to add a dependency
@@ -70,15 +71,15 @@ implementation("com.mysql:mysql-connector-j:$version")

The Maven Central version can be found [here](https://mvnrepository.com/artifact/com.mysql/mysql-connector-j).

For SQLite:
For **SQLite**:

```kotlin
implementation("org.xerial:sqlite-jdbc:$version")
```

The Maven Central version can be found [here](https://mvnrepository.com/artifact/org.xerial/sqlite-jdbc).

For MS SQL:
For **MS SQL**:

```kotlin
implementation("com.microsoft.sqlserver:mssql-jdbc:$version")
@@ -158,14 +159,17 @@ otherwise, it will be considered non-nullable for the newly created `DataFrame`
These functions read all data from a specific table in the database.
Variants with a limit parameter restrict how many rows will be read from the table.

**readSqlTable(dbConfig: DbConnectionConfig, tableName: String, limit: Int, inferNullability: Boolean): AnyFrame**
**readSqlTable(dbConfig: DbConnectionConfig, tableName: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Read all data from a specific table in the SQL database and transform it into an `AnyFrame` object.

The `dbConfig: DbConnectionConfig` parameter represents the configuration for a database connection,
created under the hood and managed by the library.
Typically, it requires a URL, username, and password.

The `dbType` parameter is optional and defaults to `null`. It specifies the type of the database
and can be a custom object provided by the user; to learn more, read the [guide](readSqlFromCustomDatabase.md).
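
The effect of an optional `dbType` can be sketched without the library. The snippet below is only an illustration under stated assumptions: `SketchDbType`, its subclasses, `builtInTypes`, and `resolveDbType` are hypothetical names, and the fallback of guessing the database from the JDBC URL when `dbType` is `null` is an assumption, since this page only states that the parameter is optional.

```kotlin
// Hypothetical stand-in for the library's DbType hierarchy (not the real class).
open class SketchDbType(val jdbcName: String)

object SketchPostgreSql : SketchDbType("postgresql")
object SketchSqlite : SketchDbType("sqlite")

// A user-defined type for a database that is not supported out of the box.
object SketchHsqldb : SketchDbType("hsqldb")

// Types this sketch treats as supported by default (an assumption).
val builtInTypes: List<SketchDbType> = listOf(SketchPostgreSql, SketchSqlite)

// An explicitly passed type always wins; otherwise try to match the
// "jdbc:<name>:" prefix of the URL against the built-in types.
fun resolveDbType(url: String, explicit: SketchDbType? = null): SketchDbType =
    explicit
        ?: builtInTypes.firstOrNull { url.startsWith("jdbc:${it.jdbcName}:") }
        ?: throw IllegalArgumentException("Unsupported URL '$url'; pass a dbType explicitly")

fun main() {
    // Built-in database: nothing extra has to be passed.
    println(resolveDbType("jdbc:postgresql://localhost:5432/test").jdbcName)
    // Custom database: the caller supplies the type object.
    println(resolveDbType("jdbc:hsqldb:mem:test", explicit = SketchHsqldb).jdbcName)
}
```

In this sketch, calling `resolveDbType("jdbc:hsqldb:mem:test")` without the explicit object throws, which mirrors why a custom database needs its type passed in.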

```kotlin
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig

@@ -180,7 +184,7 @@ The `limit: Int` parameter allows setting the maximum number of records to be re
val users = DataFrame.readSqlTable(dbConfig, "Users", limit = 100)
```

**readSqlTable(connection: Connection, tableName: String, limit: Int, inferNullability: Boolean): AnyFrame**
**readSqlTable(connection: Connection, tableName: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Another variant, where instead of `dbConfig: DbConnectionConfig` we use a JDBC connection: `Connection` object.

@@ -210,7 +214,7 @@ val users = connection.readDataFrame("Users", 100)
connection.close()
```

**Connection.readDataFrame(sqlQueryOrTableName: String, limit: Int, inferNullability: Boolean): AnyFrame**
**Connection.readDataFrame(sqlQueryOrTableName: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Read all data from a specific table in the SQL database and transform it into an `AnyFrame` object.

@@ -222,7 +226,7 @@ It should not contain `;` symbol.

All other parameters are described above.
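
Because `sqlQueryOrTableName` accepts either a query or a table name, the input has to be classified somehow. The following is a minimal sketch of one possible rule, not the library's implementation: `SqlInput`, `TableName`, `Query`, and `classifySqlInput` are hypothetical names, and treating anything that starts with `SELECT` followed by whitespace as a query is an assumption. The only rule taken from this page is that the input should not contain `;`.

```kotlin
// Hypothetical result types for this sketch.
sealed interface SqlInput
data class TableName(val name: String) : SqlInput
data class Query(val sql: String) : SqlInput

// Assumed heuristic: a query starts with SELECT followed by whitespace.
private val selectPrefix = Regex("^select\\s", RegexOption.IGNORE_CASE)

fun classifySqlInput(sqlQueryOrTableName: String): SqlInput {
    val text = sqlQueryOrTableName.trim()
    require(text.isNotEmpty()) { "Input must not be blank" }
    // Rule stated in the documentation: no ';' symbol is allowed.
    require(';' !in text) { "Input must not contain ';'" }
    return if (selectPrefix.containsMatchIn(text)) Query(text) else TableName(text)
}

fun main() {
    println(classifySqlInput("Users"))
    // prints: TableName(name=Users)
    println(classifySqlInput("SELECT * FROM Users WHERE age > 35"))
    // prints: Query(sql=SELECT * FROM Users WHERE age > 35)
}
```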

**DbConnectionConfig.readDataFrame(sqlQueryOrTableName: String, limit: Int, inferNullability: Boolean): AnyFrame**
**DbConnectionConfig.readDataFrame(sqlQueryOrTableName: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

If you do not have a connection object or need to run a quick,
isolated experiment reading data from an SQL database,
@@ -233,7 +237,7 @@ you can delegate the creation of the connection to `DbConnectionConfig`.
These functions execute an SQL query on the database and convert the result into a `DataFrame` object.
If a limit is provided, only that many rows will be returned from the result.

**readSqlQuery(dbConfig: DbConnectionConfig, sqlQuery: String, limit: Int, inferNullability: Boolean): AnyFrame**
**readSqlQuery(dbConfig: DbConnectionConfig, sqlQuery: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Execute a specific SQL query on the SQL database and retrieve the resulting data as an `AnyFrame`.

@@ -249,7 +253,7 @@ val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWO
val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM Users WHERE age > 35")
```

**readSqlQuery(connection: Connection, sqlQuery: String, limit: Int, inferNullability: Boolean): AnyFrame**
**readSqlQuery(connection: Connection, sqlQuery: String, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Another variant, where instead of `dbConfig: DbConnectionConfig` we use a JDBC connection: `Connection` object.

@@ -301,16 +305,18 @@ The `dbType: DbType` parameter specifies the type of our database (e.g., Postgre
supported by a library.
Currently, the following classes are available: `H2, MsSql, MariaDb, MySql, PostgreSql, Sqlite`.

Users can also pass objects describing their custom databases; see the [guide](readSqlFromCustomDatabase.md) for more information.

```kotlin
import org.jetbrains.kotlinx.dataframe.io.db.PostgreSql
import java.sql.ResultSet

val df = DataFrame.readResultSet(resultSet, PostgreSql)
```

**readResultSet(resultSet: ResultSet, connection: Connection, limit: Int, inferNullability: Boolean): AnyFrame**
**readResultSet(resultSet: ResultSet, connection: Connection, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Another variant, where instead of `dbType: DbType` we use a JDBC connection: `Connection` object.
Another variant, which uses a JDBC connection: `Connection` object.

```kotlin
import java.sql.Connection
@@ -340,7 +346,7 @@ val df = rs.readDataFrame(connection, 10)
connection.close()
```

**ResultSet.readDataFrame(connection: Connection, limit: Int, inferNullability: Boolean): AnyFrame**
**ResultSet.readDataFrame(connection: Connection, limit: Int, inferNullability: Boolean, dbType: DbType?): AnyFrame**

Reads the data from a `ResultSet` and converts it into a `DataFrame`.

@@ -352,7 +358,7 @@ that the `ResultSet` belongs to.
These functions read all data from all tables in the connected database.
Variants with a limit parameter restrict how many rows will be read from each table.

**readAllSqlTables(dbConfig: DbConnectionConfig, limit: Int, inferNullability: Boolean): Map\<String, AnyFrame>**
**readAllSqlTables(dbConfig: DbConnectionConfig, limit: Int, inferNullability: Boolean, dbType: DbType?): Map\<String, AnyFrame>**

Retrieves data from all the non-system tables in the SQL database and returns them as a map of table names to `AnyFrame` objects.

@@ -368,7 +374,7 @@ val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWO
val dataframes = DataFrame.readAllSqlTables(dbConfig)
```

**readAllSqlTables(connection: Connection, limit: Int, inferNullability: Boolean): Map\<String, AnyFrame>**
**readAllSqlTables(connection: Connection, limit: Int, inferNullability: Boolean, dbType: DbType?): Map\<String, AnyFrame>**

Another variant, where instead of `dbConfig: DbConnectionConfig` we use a JDBC connection: `Connection` object.

@@ -389,7 +395,7 @@ The purpose of these functions is to facilitate the retrieval of table schema.
By providing a table name and either a database configuration or connection,
these functions return the [DataFrameSchema](schema.md) of the specified table.

**getSchemaForSqlTable(dbConfig: DbConnectionConfig, tableName: String): DataFrameSchema**
**getSchemaForSqlTable(dbConfig: DbConnectionConfig, tableName: String, dbType: DbType?): DataFrameSchema**

This function captures the schema of a specific table from an SQL database.

@@ -405,7 +411,7 @@ val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWO
val schema = DataFrame.getSchemaForSqlTable(dbConfig, "Users")
```

**getSchemaForSqlTable(connection: Connection, tableName: String): DataFrameSchema**
**getSchemaForSqlTable(connection: Connection, tableName: String, dbType: DbType?): DataFrameSchema**

Another variant, where instead of `dbConfig: DbConnectionConfig` we use a JDBC connection: `Connection` object.

@@ -427,7 +433,7 @@ These functions return the schema of an SQL query result.
Once you provide a database configuration or connection and an SQL query,
they return the [DataFrameSchema](schema.md) of the query result.

**getSchemaForSqlQuery(dbConfig: DbConnectionConfig, sqlQuery: String): DataFrameSchema**
**getSchemaForSqlQuery(dbConfig: DbConnectionConfig, sqlQuery: String, dbType: DbType?): DataFrameSchema**

This function executes an SQL query on the database and then retrieves the resulting schema.

@@ -443,7 +449,7 @@ val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWO
val schema = DataFrame.getSchemaForSqlQuery(dbConfig, "SELECT * FROM Users WHERE age > 35")
```

**getSchemaForSqlQuery(connection: Connection, sqlQuery: String): DataFrameSchema**
**getSchemaForSqlQuery(connection: Connection, sqlQuery: String, dbType: DbType?): DataFrameSchema**

Another variant, where instead of `dbConfig: DbConnectionConfig` we use a JDBC connection: `Connection` object.

@@ -472,11 +478,11 @@ val schema = connection.getDataFrameSchema("SELECT * FROM Users WHERE age > 35")

connection.close()
```
**Connection.getDataFrameSchema(sqlQueryOrTableName: String): DataFrameSchema**
**Connection.getDataFrameSchema(sqlQueryOrTableName: String, dbType: DbType?): DataFrameSchema**

Retrieves the schema of an SQL query result or an SQL table using the provided JDBC connection.

**DbConnectionConfig.getDataFrameSchema(sqlQueryOrTableName: String): DataFrameSchema**
**DbConnectionConfig.getDataFrameSchema(sqlQueryOrTableName: String, dbType: DbType?): DataFrameSchema**

Retrieves the schema of an SQL query result or an SQL table using the provided database configuration.

@@ -507,49 +513,19 @@ The `dbType: DbType` parameter specifies the type of our database (e.g., Postgre
supported by a library.
Currently, the following classes are available: `H2, MsSql, MariaDb, MySql, PostgreSql, Sqlite`.

Users can also pass objects describing their custom databases; see the [guide](readSqlFromCustomDatabase.md) for more information.

```kotlin
import org.jetbrains.kotlinx.dataframe.io.db.PostgreSql
import java.sql.ResultSet

val schema = DataFrame.getSchemaForResultSet(resultSet, PostgreSql)
```

**getSchemaForResultSet(connection: Connection, sqlQuery: String): DataFrameSchema**

Another variant, where instead of `dbType: DbType` we use a JDBC connection: `Connection` object.

```kotlin
import java.sql.Connection
import java.sql.DriverManager

val connection = DriverManager.getConnection("URL_TO_CONNECT_DATABASE")

val schema = DataFrame.getSchemaForResultSet(resultSet, connection)

connection.close()
```

### Extension functions for schema reading from the ResultSet

The same example, rewritten with the extension function:

```kotlin
import java.sql.Connection
import java.sql.DriverManager

val connection = DriverManager.getConnection("URL_TO_CONNECT_DATABASE")

val schema = resultSet.getDataFrameSchema(connection)

connection.close()
```

if you are using this extension function

**ResultSet.getDataFrameSchema(connection: Connection): DataFrameSchema**

or

```kotlin
import org.jetbrains.kotlinx.dataframe.io.db.PostgreSql
import java.sql.ResultSet
@@ -566,7 +542,7 @@ based on
These functions return a map of table names to [`DataFrameSchema`](schema.md) objects for all the non-system tables in the SQL database.
They can be called with either a database configuration or a connection.

**getSchemaForAllSqlTables(dbConfig: DbConnectionConfig): Map\<String, DataFrameSchema>**
**getSchemaForAllSqlTables(dbConfig: DbConnectionConfig, dbType: DbType?): Map\<String, DataFrameSchema>**

This function retrieves the schema of all tables from an SQL database
and returns them as a map of table names to [`DataFrameSchema`](schema.md) objects.
@@ -583,7 +559,7 @@ val dbConfig = DbConnectionConfig("URL_TO_CONNECT_DATABASE", "USERNAME", "PASSWO
val schemas = DataFrame.getSchemaForAllSqlTables(dbConfig)
```
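
Since the result is a `Map` keyed by table name, it can be traversed like any other Kotlin map. The sketch below is deliberately generic and does not call the library: `summarizeTables` is a hypothetical helper, and plain lists of column names stand in for real `DataFrameSchema` values.

```kotlin
// Hypothetical helper: works for any map keyed by table name,
// for example Map<String, DataFrameSchema> or Map<String, AnyFrame>.
fun <T> summarizeTables(tables: Map<String, T>, describe: (T) -> String): List<String> =
    tables.entries
        .sortedBy { it.key } // stable, alphabetical output
        .map { (name, value) -> "$name: ${describe(value)}" }

fun main() {
    // Stand-in data: column names instead of real schema objects.
    val schemas = mapOf(
        "Users" to listOf("id", "name"),
        "Orders" to listOf("id", "userId", "total"),
    )
    summarizeTables(schemas) { columns -> columns.joinToString() }.forEach(::println)
    // prints:
    // Orders: id, userId, total
    // Users: id, name
}
```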

**getSchemaForAllSqlTables(connection: Connection): Map\<String, DataFrameSchema>**
**getSchemaForAllSqlTables(connection: Connection, dbType: DbType?): Map\<String, DataFrameSchema>**

This function retrieves the schema of all tables using a JDBC connection: `Connection` object
and returns them as a map of table names to [`DataFrameSchema`](schema.md) objects.