Docs: Extend algorithms
ashvardanian committed Feb 5, 2024
1 parent 46e957c commit 266c017
Showing 8 changed files with 236 additions and 365 deletions.
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -18,6 +18,7 @@
 "cmake.sourceDirectory": "${workspaceRoot}",
 "cSpell.words": [
 "allowoverlap",
+"aminoacid",
 "aminoacids",
 "Apostolico",
 "Appleby",
@@ -32,6 +33,7 @@
 "Cawley",
 "cheminformatics",
 "cibuildwheel",
+"CONCAT",
 "copydoc",
 "cptr",
 "endregion",
@@ -103,6 +105,7 @@
 "substr",
 "SWAR",
 "Tanimoto",
+"thyrotropin",
 "TPFLAGS",
 "unigram",
 "usecases",
23 changes: 1 addition & 22 deletions CONTRIBUTING.md
@@ -107,8 +107,6 @@ cmake --build ./build_release --config Release # Which will produce the fol
 ./build_release/stringzilla_bench_container <path> # for STL containers with string keys
 ```

-
-
 You may want to download some datasets for benchmarks, like these:

 ```sh
@@ -259,30 +257,11 @@ Alternatively, on Linux, the official Swift Docker image can be used for builds
 sudo docker run --rm -v "$PWD:/workspace" -w /workspace swift:5.9 /bin/bash -cl "swift build -c release --static-swift-stdlib && swift test -c release --enable-test-discovery"
 ```

-## Roadmap
-
-The project is in its early stages of development.
-So outside of basic bug-fixes, several features are still missing, and can be implemented by you.
-Future development plans include:
-
-- [x] [Replace PyBind11 with CPython](https://github.com/ashvardanian/StringZilla/issues/35), [blog](https://ashvardanian.com/posts/pybind11-cpython-tutorial/.
-- [x] [Bindings for JavaScript](https://github.com/ashvardanian/StringZilla/issues/25).
-- [x] [Reverse-order operations](https://github.com/ashvardanian/StringZilla/issues/12).
-- [ ] [Faster string sorting algorithm](https://github.com/ashvardanian/StringZilla/issues/45).
-- [x] [Splitting with multiple separators at once](https://github.com/ashvardanian/StringZilla/issues/29).
-- [ ] Universal hashing solution.
-- [ ] Add `.pyi` interface for Python.
-- [x] Arm NEON backend.
-- [x] Bindings for Rust.
-- [x] Bindings for Swift.
-- [ ] Arm SVE backend.
-- [ ] Stateful automata-based search.
-
 ## General Performance Observations

 ### Unaligned Loads

-One common surface of attach for performance optimizations is minimizing unaligned loads.
+One common surface of attack for performance optimizations is minimizing unaligned loads.
 Such solutions are beautiful from the algorithmic perspective, but often lead to worse performance.
 It's often cheaper to issue two interleaving wide-register loads, than try minimizing those loads at the cost of juggling registers.

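The "Unaligned Loads" note in CONTRIBUTING.md claims that two overlapping wide loads are often cheaper than code that works hard to avoid unaligned access. A minimal C sketch of that idea follows. It reads "interleaving" as "overlapping", and the helpers `load_u64_unaligned` and `equal_8_to_16` are hypothetical illustrations, not code from this commit. A buffer of 8 to 16 bytes is covered by one 8-byte load at its start and one at its end, so no byte-by-byte tail loop or chain of 4/2/1-byte loads is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes from a possibly unaligned address.
 * memcpy is the portable way to express an unaligned load; optimizing
 * compilers typically lower it to a single load on x86-64 and Arm64. */
static uint64_t load_u64_unaligned(void const *ptr) {
    uint64_t value;
    memcpy(&value, ptr, sizeof(value));
    return value;
}

/* Hypothetical helper: returns 1 if the first n bytes of a and b match,
 * for 8 <= n <= 16, using exactly two overlapping 8-byte loads per buffer.
 * The head load covers bytes [0, 8) and the tail load covers [n - 8, n). */
static int equal_8_to_16(char const *a, char const *b, size_t n) {
    assert(n >= 8 && n <= 16);
    uint64_t a_head = load_u64_unaligned(a);
    uint64_t b_head = load_u64_unaligned(b);
    uint64_t a_tail = load_u64_unaligned(a + n - 8);
    uint64_t b_tail = load_u64_unaligned(b + n - 8);
    /* XOR is zero only where words match; OR merges both halves into one test. */
    return ((a_head ^ b_head) | (a_tail ^ b_tail)) == 0;
}
```

Bytes in the overlapping middle region are compared twice, which is harmless for an equality test. The design trades one redundant load for branch-free code that needs no special case per exact length within the 8 to 16 byte range.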
363 changes: 207 additions & 156 deletions README.md

Large diffs are not rendered by default.

Binary file added assets/cover-strinzilla.jpeg
File renamed without changes
Binary file added assets/meme-stringzilla-v3.jpeg
19 changes: 0 additions & 19 deletions include/stringzilla/stringzilla.hpp
@@ -24,25 +24,6 @@
 #define SZ_AVOID_STL (0) // true or false
 #endif

-/**
-* @brief When set to 1, the strings `+` will return an expression template rather than a temporary string.
-* This will improve performance, but may break some STL-specific code, so it's disabled by default.
-* TODO:
-*/
-#ifndef SZ_LAZY_CONCAT
-#define SZ_LAZY_CONCAT (0) // true or false
-#endif
-
-/**
-* @brief When set to 1, the library will change `substr` and several other member methods of `string`
-* to return a view of its slice, rather than a copy, if the lifetime of the object is guaranteed.
-* This will improve performance, but may break some STL-specific code, so it's disabled by default.
-* TODO:
-*/
-#ifndef SZ_PREFER_VIEWS
-#define SZ_PREFER_VIEWS (0) // true or false
-#endif
-
 /* We need to detect the version of the C++ language we are compiled with.
 * This will affect recent features like `operator<=>` and tests against STL.
 */
193 changes: 25 additions & 168 deletions scripts/bench_similarity.ipynb

Large diffs are not rendered by default.
