Custom fileio docs #1238

Open: wants to merge 6 commits into `main`


sikehish

This PR addresses #1233.
Let me know if any changes are needed.

kevinjqliu (Contributor) left a comment

Thanks for working on this! I added a few comments.
Generally, I think we can keep the section light and just mention that the option exists.
I lean toward making it similar to https://py.iceberg.apache.org/configuration/#custom-catalog-implementations
wdyt?

Comment on lines +58 to +59
| `schema.name-mapping.default` | Name mapping strategy | N/A | Default name mapping for schema evolution. |
| `format-version` | `{1, 2}` | 2 | The version of the Iceberg table format to use. |
Contributor

These changes are from the other PR.
Can you rebase this PR on main so that only the changes for this PR show up?

@@ -151,6 +161,62 @@ For the FileIO there are several configuration options available:

<!-- markdown-link-check-enable-->

### Custom FileIO Implementations

<!-- markdown-link-check-disable -->
Contributor

nit: this is not necessary unless there are links here that would not work



The `pyIceberg` library allows you to use custom FileIO implementations, enabling flexible file handling tailored to your specific needs. This feature is particularly useful when working with different storage backends or file formats.
Contributor

nit: PyIceberg

Comment on lines +178 to +210
#### Implementation Details

The following functions are key to inferring and loading custom FileIO implementations:

##### `_infer_file_io_from_scheme`

```python
def _infer_file_io_from_scheme(path: str, properties: Properties) -> Optional[FileIO]:
```

- **Purpose**: Infers the appropriate FileIO implementation based on the scheme of the provided file path.
- **Parameters**:
  - `path` (str): The file path from which to infer the scheme.
  - `properties` (Properties): Configuration properties to assist with loading.
- **Returns**: An instance of `FileIO` if a suitable implementation is found; otherwise, `None`.

##### Usage Example

```python
file_io = _infer_file_io_from_scheme("s3://my-bucket/my-file.txt", properties)
```
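To make the scheme-based lookup concrete, here is a minimal sketch of its first step: mapping a path's URI scheme to an ordered list of candidate FileIO classes to try. It relies only on `urllib.parse.urlparse`; the `SCHEME_TO_CANDIDATES` registry, the `candidates_for` helper, and the `my_pkg.io.*` class paths are hypothetical placeholders, not PyIceberg's actual table.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical registry: which FileIO classes to try, in priority order, per URI scheme.
# The dotted paths are placeholders, not the classes PyIceberg actually registers.
SCHEME_TO_CANDIDATES: Dict[str, List[str]] = {
    "s3": ["my_pkg.io.FastS3FileIO", "my_pkg.io.GenericFileIO"],
    "file": ["my_pkg.io.LocalFileIO"],
}


def candidates_for(path: str) -> List[str]:
    """Return the FileIO class paths to try for `path`, in priority order."""
    # "s3" for "s3://bucket/key"; "" for a bare local path such as "/tmp/x".
    scheme = urlparse(path).scheme
    # An empty or unknown scheme yields no candidates, so the caller applies its own default.
    # Return a copy so callers cannot mutate the registry.
    return list(SCHEME_TO_CANDIDATES.get(scheme, []))
```

Returning an ordered list lets a caller prefer one implementation and fall back to the next one if the first cannot be imported.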

##### `load_file_io`

```python
def load_file_io(properties: Properties = EMPTY_DICT, location: Optional[str] = None) -> FileIO:
```

- **Purpose**: Loads the custom FileIO implementation specified in the `properties`.
- **Parameters**:
  - `properties` (Properties): A dictionary of configuration properties, which may include `PY_IO_IMPL`.
  - `location` (Optional[str]): An optional location to specify the file path.
- **Returns**: An instance of `FileIO`.
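The order in which a loader like `load_file_io` consults its inputs can be sketched as follows. This is a simplified, stdlib-only model based on our reading of PyIceberg, not its actual code. An explicit `py-io-impl` dotted class path wins, and failing to load it raises an error. After that, the loader tries the scheme of `location`, then the scheme of a `warehouse` property, and finally a default. The helper names, the `scheme_table` argument, and the `default` callable are assumptions for illustration. In PyIceberg the final default is `PyArrowFileIO`.

```python
import importlib
from typing import Any, Callable, Dict, List, Mapping, Optional
from urllib.parse import urlparse

Properties = Dict[str, str]
PY_IO_IMPL = "py-io-impl"  # property key behind PY_IO_IMPL
WAREHOUSE = "warehouse"    # assumption: catalog property holding the warehouse location


def import_file_io(dotted_path: str, properties: Properties) -> Optional[Any]:
    """Instantiate `module.ClassName` with `properties`; return None if it cannot be imported."""
    module_name, _, class_name = dotted_path.rpartition(".")
    try:
        cls = getattr(importlib.import_module(module_name), class_name)
    except (ImportError, AttributeError, ValueError):
        return None
    return cls(properties)


def infer_from_scheme(
    path: str, properties: Properties, scheme_table: Mapping[str, List[str]]
) -> Optional[Any]:
    """Try each class registered for the scheme of `path`; the first one that loads wins."""
    for dotted_path in scheme_table.get(urlparse(path).scheme, []):
        file_io = import_file_io(dotted_path, properties)
        if file_io is not None:
            return file_io
    return None


def load_file_io_sketch(
    properties: Properties,
    location: Optional[str],
    scheme_table: Mapping[str, List[str]],
    default: Callable[[Properties], Any],
) -> Any:
    # 1. An explicit py-io-impl wins; if it cannot be loaded, fail instead of falling back.
    io_impl = properties.get(PY_IO_IMPL)
    if io_impl:
        file_io = import_file_io(io_impl, properties)
        if file_io is None:
            raise ValueError(f"Could not initialize FileIO: {io_impl}")
        return file_io
    # 2. Infer from the table location's scheme, then 3. from the warehouse location's scheme.
    for path in (location, properties.get(WAREHOUSE)):
        if path:
            file_io = infer_from_scheme(path, properties, scheme_table)
            if file_io is not None:
                return file_io
    # 4. Nothing matched: use the default implementation (PyArrowFileIO in PyIceberg).
    return default(properties)
```

Raising on a bad `py-io-impl`, instead of silently falling through, makes a mistyped class path fail loudly rather than quietly switching storage backends.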
Contributor

These are not necessary as part of the docs.



#### Bringing Your Own FileIO with `PY_IO_IMPL`
Contributor

nit: merge with above section

kevinjqliu added this to the PyIceberg 0.9.0 release milestone on Oct 30, 2024.
Labels: None yet
Projects: None yet

2 participants