perf: optimize IO path for reading manifest #2396

Merged
merged 8 commits into from
May 29, 2024

Conversation

wjones127 (Contributor) commented May 26, 2024

Fixes #2338
Partially addresses #2318

For a dataset on local SSD with 8,000 versions, we get 6x faster load time and 3x faster append.

  • Added a special code path for the local filesystem for finding the latest manifest. This path skips the metadata call for paths that aren't relevant, both fixing #2338 (Dataset not found during frequent writes) and improving performance on local filesystems overall.
  • Fixed code path where we were reading the manifest file twice
  • Changed CloudObjectReader and LocalFileReader to both cache the file size, so we aren't making multiple calls to get the size of the same object/file. Also allowed passing the size when opening, in case we already have it from a list operation.
  • Deprecated some more methods for loading a dataset, in favor of using DatasetBuilder. Also consolidated the implementations to use DatasetBuilder, so we have fewer code paths to worry about and test.
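
To illustrate the first bullet, here is a minimal sketch of finding the latest manifest on a local filesystem using only file names, with a single metadata call for the winning file. It is not the code from this PR: the `_versions/<version>.manifest` layout, the `latest_manifest_local` name, and the fields of this `ManifestLocation` stand-in are assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the PR's `ManifestLocation`.
#[derive(Debug, PartialEq)]
struct ManifestLocation {
    version: u64,
    path: PathBuf,
    size: Option<u64>,
}

/// Finds the highest-numbered manifest under `<base>/_versions/`.
/// Assumed layout: each manifest is named `<version>.manifest`.
fn latest_manifest_local(base: &Path) -> io::Result<Option<ManifestLocation>> {
    let versions_dir = base.join("_versions");
    let entries = match fs::read_dir(&versions_dir) {
        Ok(entries) => entries,
        // No versions directory means there is no dataset here yet.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };

    // Track only the highest version seen; file names alone are enough,
    // so no metadata is fetched for entries that lose the comparison.
    let mut latest: Option<(u64, PathBuf)> = None;
    for entry in entries {
        let entry = entry?;
        let name = entry.file_name();
        let Some(name) = name.to_str() else { continue };
        let Some(stem) = name.strip_suffix(".manifest") else { continue };
        let Ok(version) = stem.parse::<u64>() else { continue };
        if latest.as_ref().map_or(true, |(v, _)| version > *v) {
            latest = Some((version, entry.path()));
        }
    }

    // One metadata call, for the winning file only.
    match latest {
        Some((version, path)) => {
            let size = fs::metadata(&path)?.len();
            Ok(Some(ManifestLocation { version, path, size: Some(size) }))
        }
        None => Ok(None),
    }
}
```

Entries whose names do not parse as `<version>.manifest` are skipped without being inspected, so unrelated or in-progress files in the directory cost nothing beyond the directory read itself.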

TODO

  • Cleanup
  • Add IO unit test for loading a dataset
  • Check repro from 2318

ACTION NEEDED

Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@wjones127 wjones127 changed the title from "Fix/manifest listing" to "perf: optimize IO path for reading manifest" May 26, 2024
codecov-commenter commented May 26, 2024

Codecov Report

Attention: Patch coverage is 80.45977%, with 51 lines in your changes missing coverage. Please review.

Project coverage is 79.77%. Comparing base (8c1ee00) to head (066fdab).
Report is 6 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset.rs | 79.78% | 15 Missing and 4 partials ⚠️ |
| ...ust/lance-table/src/io/commit/external_manifest.rs | 50.00% | 9 Missing and 4 partials ⚠️ |
| rust/lance-table/src/io/commit.rs | 86.95% | 6 Missing and 3 partials ⚠️ |
| rust/lance-io/src/local.rs | 78.26% | 1 Missing and 4 partials ⚠️ |
| rust/lance-io/src/object_reader.rs | 73.33% | 4 Missing ⚠️ |
| rust/lance/src/utils/test.rs | 66.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2396      +/-   ##
==========================================
- Coverage   79.99%   79.77%   -0.22%     
==========================================
  Files         200      200              
  Lines       54519    55158     +639     
  Branches    54519    55158     +639     
==========================================
+ Hits        43612    44004     +392     
- Misses       8389     8619     +230     
- Partials     2518     2535      +17     

| Flag | Coverage Δ |
|---|---|
| unittests | 79.77% <80.45%> (-0.22%) ⬇️ |


// This is an optimized function that searches for the latest manifest. In
// object_store, list operations lookup metadata for each file listed. This
// method only gets the metadata for the found latest manifest.
fn current_manifest_local(base: &Path) -> std::io::Result<Option<ManifestLocation>> {

Contributor

Just curious: why can't we get the latest manifest from /path/to/dataset/latest.manifest?

Contributor Author

We no longer use latest.manifest. It was a very flawed design, given that mutable files and object storage don't mix well. This is explained in detail in #1362 and #1365.

Contributor

Gotcha, thanks. I see that some documents in our codebase still describe latest.manifest, though, such as docs/format.rst.

Contributor Author

I will get rid of it.

Comment on lines +77 to +91
    async fn resolve_latest_location(
        &self,
        base_path: &Path,
        object_store: &ObjectStore,
    ) -> std::result::Result<ManifestLocation, Error> {
        let path = self.resolve_latest_version(base_path, object_store).await?;
        Ok(ManifestLocation {
            version: self
                .resolve_latest_version_id(base_path, object_store)
                .await?,
            path,
            size: None,
        })
    }

Contributor Author

I've tried not to make a breaking change in ExternalManifestCommitHandler. LMK if you object to anything here.

docs/format.rst Outdated

Contributor Author

We never read the _latest.manifest file, so we should remove mention of it. In a later PR, I will have us stop writing it.

@wjones127 wjones127 marked this pull request as ready for review May 28, 2024 20:30
@@ -111,13 +115,26 @@ impl Reader for LocalObjectReader {
     }

     /// Returns the file size.
-    async fn size(&self) -> Result<usize> {
-        Ok(self.file.metadata()?.len() as usize)
+    async fn size(&self) -> object_store::Result<usize> {

Contributor

Thank you for object_store::Result!

Contributor Author

I still hate the error handling overall. Will refactor it later.
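
For context on the `size` method discussed in this thread, the size-caching change from the PR summary could look roughly like the sketch below. This is a simplified, synchronous illustration rather than the PR's implementation: `CachedSizeReader`, its `open` signature, and the use of `std::sync::OnceLock` are assumptions made for the example, and it returns `std::io::Result` rather than `object_store::Result`.

```rust
use std::fs::File;
use std::io;
use std::path::Path;
use std::sync::OnceLock;

/// Hypothetical reader, not the actual `LocalObjectReader`.
struct CachedSizeReader {
    file: File,
    /// Filled at most once, either from a caller-supplied hint or
    /// from the first metadata lookup.
    size: OnceLock<u64>,
}

impl CachedSizeReader {
    /// Opens `path`. If the caller already knows the size (for example
    /// from a list operation), passing it avoids any metadata call.
    fn open(path: &Path, known_size: Option<u64>) -> io::Result<Self> {
        let file = File::open(path)?;
        let size = OnceLock::new();
        if let Some(known) = known_size {
            // The cell is freshly created, so this cannot fail.
            let _ = size.set(known);
        }
        Ok(Self { file, size })
    }

    /// Returns the file size, querying the filesystem only when no
    /// value has been cached yet.
    fn size(&self) -> io::Result<u64> {
        if let Some(size) = self.size.get() {
            return Ok(*size);
        }
        let len = self.file.metadata()?.len();
        Ok(*self.size.get_or_init(|| len))
    }
}
```

The trade-off of caching is that the reader reports the size observed at first use, which suits files that are never modified after being written.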

Contributor

westonpace left a comment

Solid optimization, thanks :)

Co-authored-by: Weston Pace <weston.pace@gmail.com>
@wjones127 wjones127 merged commit 9a0fa1f into lancedb:main May 29, 2024
20 checks passed
eddyxu pushed a commit that referenced this pull request May 29, 2024
Fixes #2338
Partially addresses #2318

**For a dataset on local SSD with 8,000 versions, we get 6x faster load
time and 3x faster append.**

* Added special code path for local filesystem for finding latest
manifest. This path skips the `metadata` call for paths that aren't
relevant, both fixing #2338 and improving performance on local
filesystems overall.
* Fixed code path where we were reading the manifest file twice
* Changed `CloudObjectReader` and `LocalFileReader` to both cache the
file size, so we aren't making multiple calls to get the size of the
same object/file. Also allowed passing the size when opening, in case we
already have it from a list operation.
* Deprecated some more methods for loading a dataset, in favor of using
`DatasetBuilder`. Also consolidated the implementations to use
`DatasetBuilder`, so we have fewer code paths to worry about and test.

## TODO

* [x] Cleanup
* [x] Add IO unit test for loading a dataset
* [x] Check repro from 2318

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
renovate bot added a commit to spiraldb/vortex that referenced this pull request Jun 12, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [lance](https://togithub.com/lancedb/lance) | dependencies | minor |
`0.10.16` -> `0.12.0` |

---

### Release Notes

<details>
<summary>lancedb/lance (lance)</summary>

### [`v0.12.1`](https://togithub.com/lancedb/lance/releases/tag/v0.12.1)

[Compare
Source](https://togithub.com/lancedb/lance/compare/v0.12.0...v0.12.1)

<!-- Release notes generated using configuration in .github/release.yml
at v0.12.1 -->

#### What's Changed

##### Bug Fixes 🐛

- fix: incorrect chunking was making lance datasets use too much RAM by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2438

**Full Changelog**:
lancedb/lance@v0.12.0...v0.12.1

### [`v0.12.0`](https://togithub.com/lancedb/lance/releases/tag/v0.12.0)

[Compare
Source](https://togithub.com/lancedb/lance/compare/v0.11.1...v0.12.0)

<!-- Release notes generated using configuration in .github/release.yml
at v0.12.0 -->

#### What's Changed

##### Breaking Changes 🛠

- feat: change dataset uri to return full qualified url instead of
object store path by [@&#8203;eddyxu](https://togithub.com/eddyxu) in
[lancedb/lance#2416

##### New Features 🎉

- feat: new shuffler by
[@&#8203;BubbleCal](https://togithub.com/BubbleCal) in
[lancedb/lance#2404
- feat: new index builder by
[@&#8203;BubbleCal](https://togithub.com/BubbleCal) in
[lancedb/lance#2401
- feat: stable row id manifest changes by
[@&#8203;wjones127](https://togithub.com/wjones127) in
[lancedb/lance#2363
- feat: once a table has been created with v1 or v2 format then it
should always use that format by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2435

##### Bug Fixes 🐛

- fix: fix file writer which was not writing page buffers in the correct
order by [@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2413

##### Other Changes

- refactor: refactor logical decoders into "field decoders" by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2407
- refactor: rename use_experimental_writer to use_legacy_format by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2433
- refactor: minor refactor to allow I/O scheduler to be cloned in page
schedulers by [@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2432

**Full Changelog**:
lancedb/lance@v0.11.1...v0.12.0

### [`v0.11.1`](https://togithub.com/lancedb/lance/releases/tag/v0.11.1)

[Compare
Source](https://togithub.com/lancedb/lance/compare/v0.11.0...v0.11.1)

<!-- Release notes generated using configuration in .github/release.yml
at v0.11.1 -->

#### What's Changed

##### New Features 🎉

- feat(java): support jdk8 by
[@&#8203;LuQQiu](https://togithub.com/LuQQiu) in
[lancedb/lance#2362
- feat: support kmode with hamming distance by
[@&#8203;eddyxu](https://togithub.com/eddyxu) in
[lancedb/lance#2366
- feat: row id index structures (experimental) by
[@&#8203;wjones127](https://togithub.com/wjones127) in
[lancedb/lance#2303
- feat: update merge_insert to add statistics for inserted, updated,
deleted rows by [@&#8203;raunaks13](https://togithub.com/raunaks13) in
[lancedb/lance#2357
- feat: define Flat index as a scan over VectorStorage by
[@&#8203;chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in
[lancedb/lance#2380
- feat: add some schema utility methods to the v2 reader/writer by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2389
- feat: general compression for value page buffer by
[@&#8203;niyue](https://togithub.com/niyue) in
[lancedb/lance#2368
- feat: make the index cache size (in bytes) available by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2381
- feat: add special uri scheme to use CloudFileReader for local fs by
[@&#8203;chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in
[lancedb/lance#2402
- feat: add encoder utilities for pushdown by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2388

##### Bug Fixes 🐛

- fix: concat batches before writing to avoid small IO slow down by
[@&#8203;chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in
[lancedb/lance#2384
- fix: low recall if the num partitions is more than num rows by
[@&#8203;BubbleCal](https://togithub.com/BubbleCal) in
[lancedb/lance#2386
- fix: f32 reduce_min for x86 by
[@&#8203;heiher](https://togithub.com/heiher) in
[lancedb/lance#2385
- fix: fix incorrect validation logic in updater by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2408

##### Performance Improvements 🚀

- perf: make VectorStorage and DistCalculator static to generate better
code by [@&#8203;BubbleCal](https://togithub.com/BubbleCal) in
[lancedb/lance#2355
- perf: optimize IO path for reading manifest by
[@&#8203;wjones127](https://togithub.com/wjones127) in
[lancedb/lance#2396

##### Other Changes

- refactor: make proto conversion fallible and not copy by
[@&#8203;wjones127](https://togithub.com/wjones127) in
[lancedb/lance#2371
- refactor: separate take and schema evolution impls to own files by
[@&#8203;wjones127](https://togithub.com/wjones127) in
[lancedb/lance#2372
- Revert "fix: concat batches before writing to avoid small IO slow down
([#&#8203;2384](https://togithub.com/lancedb/lance/issues/2384))" by
[@&#8203;chebbyChefNEQ](https://togithub.com/chebbyChefNEQ) in
[lancedb/lance#2387
- refactor: shuffle around v2 metadata sections to allow read-on-demand
statistics by [@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2400

#### New Contributors

- [@&#8203;niyue](https://togithub.com/niyue) made their first
contribution in
[lancedb/lance#2368
- [@&#8203;heiher](https://togithub.com/heiher) made their first
contribution in
[lancedb/lance#2385

**Full Changelog**:
lancedb/lance@v0.11.0...v0.11.1

### [`v0.11.0`](https://togithub.com/lancedb/lance/releases/tag/v0.11.0)

[Compare
Source](https://togithub.com/lancedb/lance/compare/v0.10.18...v0.11.0)

<!-- Release notes generated using configuration in .github/release.yml
at v0.11.0 -->

#### What's Changed

##### Breaking Changes 🛠

- feat(rust)!: use BoxedError in Error::IO by
[@&#8203;broccoliSpicy](https://togithub.com/broccoliSpicy) in
[lancedb/lance#2329

##### New Features 🎉

- feat: add v2 support to fragment merge / update paths by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2311
- feat: add priority to I/O scheduler by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2315
- feat: add take_rows operation to the v2 file reader's python bindings
by [@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2331
- feat: added example for reading and writing dataset in rust by
[@&#8203;raunaks13](https://togithub.com/raunaks13) in
[lancedb/lance#2349
- feat: new HNSW implementation by
[@&#8203;BubbleCal](https://togithub.com/BubbleCal) in
[lancedb/lance#2353
- feat: add fragment take / fixed-size-binary support to v2 format by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2354

##### Bug Fixes 🐛

- fix: recognize a simple expression like 'is_foo' as a scalar index
query by [@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2356
- fix: rework list encoder to handle list-struct by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2344
- fix: minor bug fixes for v2 by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2361

##### Documentation 📚

- docs: clearify comments in table.proto -> message DataFragment ->
physical_rows by
[@&#8203;broccoliSpicy](https://togithub.com/broccoliSpicy) in
[lancedb/lance#2346

##### Performance Improvements 🚀

- perf: use the file metadata cache in scalar indices by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2330

##### Other Changes

- chore: remove `m_max` and `use_heuristic` params from HNSW builder by
[@&#8203;BubbleCal](https://togithub.com/BubbleCal) in
[lancedb/lance#2336
- fix(java): fix JNI jar loader issue by
[@&#8203;LuQQiu](https://togithub.com/LuQQiu) in
[lancedb/lance#2340
- ci: fix labeler permissions by
[@&#8203;wjones127](https://togithub.com/wjones127) in
[lancedb/lance#2348
- fix: rework decoding to fix bugs in nested struct decoding by
[@&#8203;westonpace](https://togithub.com/westonpace) in
[lancedb/lance#2337

#### New Contributors

- [@&#8203;broccoliSpicy](https://togithub.com/broccoliSpicy) made their
first contribution in
[lancedb/lance#2346
- [@&#8203;raunaks13](https://togithub.com/raunaks13) made their first
contribution in
[lancedb/lance#2349

**Full Changelog**:
lancedb/lance@v0.10.18...v0.11.0

</details>

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
AdamGS pushed a commit to AdamGS/vortex that referenced this pull request Jun 14, 2024
Development

Successfully merging this pull request may close these issues.

Dataset not found during frequent writes
5 participants