Custom fileio docs #1238
…n to Table behaviour options table
Thanks for working on this! I added a few comments.
Generally I think we can keep the section light and just mention that the option exists.
I lean toward making it similar to https://py.iceberg.apache.org/configuration/#custom-catalog-implementations
wdyt?
| `schema.name-mapping.default` | Name mapping strategy | N/A | Default name mapping for schema evolution. |
| `format-version` | `{1, 2}` | 2 | The version of the Iceberg table format to use. |
These changes are from the other PR.
Can you rebase this PR onto main so that only the changes for this PR show?
@@ -151,6 +161,62 @@ For the FileIO there are several configuration options available:

<!-- markdown-link-check-enable-->

### Custom FileIO Implementations

<!-- markdown-link-check-disable -->
nit: this is not necessary unless there are links here that would not work
<!-- markdown-link-check-disable -->

The `pyIceberg` library allows you to use custom FileIO implementations, enabling flexible file handling tailored to your specific needs. This feature is particularly useful when working with different storage backends or file formats.
nit: PyIceberg
#### Implementation Details

The following functions are key to inferring and loading custom FileIO implementations:

##### `_infer_file_io_from_scheme`

```python
def _infer_file_io_from_scheme(path: str, properties: Properties) -> Optional[FileIO]:
```

- **Purpose**: Infers the appropriate FileIO implementation based on the scheme of the provided file path.
- **Parameters**:
  - `path` (str): The file path from which to infer the scheme.
  - `properties` (Properties): Configuration properties to assist with loading.
- **Returns**: An instance of `FileIO` if a suitable implementation is found; otherwise, `None`.

##### Usage Example

```python
file_io = _infer_file_io_from_scheme("s3://my-bucket/my-file.txt", properties)
```

##### `load_file_io`

```python
def load_file_io(properties: Properties = EMPTY_DICT, location: Optional[str] = None) -> FileIO:
```

- **Purpose**: Loads the custom FileIO implementation specified in the `properties`.
- **Parameters**:
  - `properties` (Properties): A dictionary of configuration properties, which may include `PY_IO_IMPL`.
  - `location` (Optional[str]): An optional location to specify the file path.
- **Returns**: An instance of `FileIO`.
these are not necessary as part of docs
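For readers following the thread, the scheme-based inference described in the hunk above can be illustrated with a minimal, self-contained sketch. This is not PyIceberg's code: the `SCHEME_TO_IMPL` table, the `example.io.*` class paths, and the `infer_impl_from_scheme` helper are all hypothetical, and treating a scheme-less path as a local file is an assumption made only for this sketch.

```python
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Hypothetical lookup table: URI scheme -> candidate implementation paths.
# The class paths are placeholders, not real PyIceberg classes.
SCHEME_TO_IMPL: Dict[str, List[str]] = {
    "s3": ["example.io.S3LikeIO"],
    "file": ["example.io.LocalIO"],
}


def infer_impl_from_scheme(path: str) -> Optional[str]:
    """Return the first candidate implementation for the path's scheme, or None."""
    # Assumption for this sketch: a path without a scheme is a local file.
    scheme = urlparse(path).scheme or "file"
    candidates = SCHEME_TO_IMPL.get(scheme, [])
    return candidates[0] if candidates else None


# Example lookup using the path from the usage example in the diff.
impl = infer_impl_from_scheme("s3://my-bucket/my-file.txt")
```

Returning `None` for an unknown scheme mirrors the `Optional[FileIO]` return type quoted in the diff and leaves the fallback decision to the caller.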
The `pyIceberg` library allows you to use custom FileIO implementations, enabling flexible file handling tailored to your specific needs. This feature is particularly useful when working with different storage backends or file formats.

#### Bringing Your Own FileIO with `PY_IO_IMPL`
nit: merge with above section
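As background for the `PY_IO_IMPL` section under review, the general idea of loading an implementation from a dotted class path held in a property can be sketched as below. This is a simplified illustration rather than the library's implementation: `import_class` and `load_custom_io` are hypothetical helpers, and the `"py-io-impl"` key is assumed to be the value behind the `PY_IO_IMPL` constant quoted in the diff.

```python
import importlib
from typing import Any, Dict, Optional

# Assumed value of the PY_IO_IMPL property key referenced in the diff.
PY_IO_IMPL = "py-io-impl"


def import_class(dotted_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object (hypothetical helper)."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'package.module.ClassName', got: {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def load_custom_io(properties: Dict[str, str]) -> Optional[Any]:
    """Instantiate the class named by PY_IO_IMPL, passing it the properties."""
    impl = properties.get(PY_IO_IMPL)
    if impl is None:
        # No custom implementation configured; the caller picks a default.
        return None
    return import_class(impl)(properties)
```

A user-supplied class would be referenced by its full module path in the properties, for example `{"py-io-impl": "mypackage.mymodule.MyFileIO"}`, where `mypackage.mymodule.MyFileIO` is a placeholder name.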
This PR addresses #1233.
Do let me know if any changes are to be made.