Documentation tweaks

benbrandt · Mar 9, 2024 · 0c0d22f
1 parent 24e1d3f

Showing 2 changed files with 18 additions and 19 deletions.
diff --git a/README.md b/README.md
@@ -36,19 +36,14 @@ println!("{}", chunks.count())
 
 Requires the `tokenizers` feature to be activated and adding `tokenizers` to dependencies. The example below, using `from_pretrained()`, also requires tokenizers `http` feature to be enabled.
 
-<details>
-<summary>
-Click to show Cargo.toml.
-</summary>
+**Cargo.toml**
 
 ```toml
 [dependencies]
 text-splitter = { version = "0.7.0", features = ["tokenizers"] }
 tokenizers = { version = "0.15", features = ["http"] }
 ```
 
-</details>
-
 ```rust
 use text_splitter::TextSplitter;
 // Can also use anything else that implements the ChunkSizer
@@ -69,18 +64,13 @@ println!("{}", chunks.count())
 
 Requires the `tiktoken-rs` feature to be activated and adding `tiktoken-rs` to dependencies.
 
-<details>
-<summary>
-Click to show Cargo.toml.
-</summary>
+**Cargo.toml**
 
 ```toml
 text-splitter = { version = "0.7.0", features = ["tiktoken-rs"] }
 tiktoken-rs = "0.5"
 ```
 
-</details>
-
 ```rust
 use text_splitter::TextSplitter;
 // Can also use anything else that implements the ChunkSizer
@@ -122,6 +112,8 @@ println!("{}", chunks.count())
 
 All of the above examples also can also work with Markdown text. If you enable the `markdown` feature, you can use the `MarkdownSplitter` in the same ways as the `TextSplitter`.
 
+**Cargo.toml**
+
 ```toml
 [dependencies]
 text-splitter = { version = "0.7.0", features = ["markdown"] }
@@ -163,7 +155,7 @@ The boundaries used to split the text if using the `chunks` method, in ascending
 
 Splitting doesn't occur below the character level, otherwise you could get partial bytes of a char, which may not be a valid unicode str.
 
-### `Markdown` Semantic Levels
+### `MarkdownSplitter` Semantic Levels
 
 Markdown is parsed according to the CommonMark spec, along with some optional features such as GitHub Flavored Markdown.
 
@@ -189,11 +181,18 @@ There are lots of methods of determining sentence breaks, all to varying degrees
 
 ## Feature Flags
 
-| Feature Flag  | Compatible with     | Description                                                                                                                                                               |
-| ------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `markdown`    | -                   | Enables the `MarkdownSplitter` struct for parsing Markdown documents via the CommonMark spec.                                                                             |
-| `tiktoken-rs` | `tiktoken-rs@0.5.8` | Enables the `TextSplitter::new` to take a `tiktoken_rs::CoreBPE` as an argument. This is useful for splitting text for OpenAI models.                                     |
-| `tokenizers`  | `tokenizers@0.15.2` | Enables the `TextSplitter::new` to take a `tokenizers::Tokenizer` as an argument. This is useful for splitting text models that have a Hugging Face-compatible tokenizer. |
+### Document Format Support
+
+| Feature    | Description                                                                                   |
+| ---------- | --------------------------------------------------------------------------------------------- |
+| `markdown` | Enables the `MarkdownSplitter` struct for parsing Markdown documents via the CommonMark spec. |
+
+### Tokenizer Support
+
+| Dependency Feature | Version Supported | Description                                                                                                                                                               |
+| ------------------ | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `tiktoken-rs`      | `0.5.8`           | Enables the `TextSplitter::new` to take a `tiktoken_rs::CoreBPE` as an argument. This is useful for splitting text for OpenAI models.                                     |
+| `tokenizers`       | `0.15.2`          | Enables the `TextSplitter::new` to take a `tokenizers::Tokenizer` as an argument. This is useful for splitting text models that have a Hugging Face-compatible tokenizer. |
 
 ## Inspiration
 

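Both tokenizer examples note that anything implementing `ChunkSizer` can be used to measure chunks. The sketch below is a rough, stdlib-only illustration of that pluggable-sizer idea under stated assumptions. `Sizer`, `CharCount`, and `chunk_words` are hypothetical names, not the crate's API. The greedy packing of whitespace-separated words is also a simplification: the README describes the real `chunks` method as working through several semantic levels.

```rust
/// Hypothetical, simplified stand-in for a chunk sizer: reports how large a
/// piece of text is. This is NOT the text_splitter crate's `ChunkSizer` trait.
trait Sizer {
    fn size(&self, text: &str) -> usize;
}

/// Hypothetical sizer that counts Unicode scalar values (chars), not bytes.
struct CharCount;

impl Sizer for CharCount {
    fn size(&self, text: &str) -> usize {
        text.chars().count()
    }
}

/// Greedily packs whitespace-separated words into chunks whose size, as
/// reported by `sizer`, stays within `capacity`. Chunks are slices of the
/// original text, so the spacing between words is preserved. A single word
/// larger than `capacity` becomes its own oversized chunk in this sketch.
fn chunk_words<'a>(text: &'a str, sizer: &impl Sizer, capacity: usize) -> Vec<&'a str> {
    let mut chunks = Vec::new();
    // (start, end) byte range of the chunk currently being built, if any.
    let mut current: Option<(usize, usize)> = None;

    for word in text.split_whitespace() {
        // `word` is a subslice of `text`, so its byte offset is a pointer difference.
        let start = word.as_ptr() as usize - text.as_ptr() as usize;
        let end = start + word.len();
        current = match current {
            // Extend the current chunk if it still fits with this word added.
            Some((s, _)) if sizer.size(&text[s..end]) <= capacity => Some((s, end)),
            // Otherwise close the current chunk and start a new one at this word.
            Some((s, e)) => {
                chunks.push(&text[s..e]);
                Some((start, end))
            }
            None => Some((start, end)),
        };
    }
    if let Some((s, e)) = current {
        chunks.push(&text[s..e]);
    }
    chunks
}

fn main() {
    let text = "The quick brown fox jumps over the lazy dog";
    let chunks = chunk_words(text, &CharCount, 10);
    println!("{:?}", chunks);
}
```

With a capacity of 10, this returns `["The quick", "brown fox", "jumps over", "the lazy", "dog"]`. Swapping `CharCount` for a sizer that counts tokens changes what "fits" without touching the packing logic, which is the point of making the sizer pluggable.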
diff --git a/bindings/python/README.md b/bindings/python/README.md
@@ -99,7 +99,7 @@ The boundaries used to split the text if using the `chunks` method, in ascending
 
 Splitting doesn't occur below the character level, otherwise you could get partial bytes of a char, which may not be a valid unicode str.
 
-### `Markdown` Semantic Levels
+### `MarkdownSplitter` Semantic Levels
 
 Markdown is parsed according to the CommonMark spec, along with some optional features such as GitHub Flavored Markdown.
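The note that splitting never goes below the character level can be demonstrated with the Rust standard library alone. The following is a minimal sketch that assumes byte offsets as the cut positions. `prev_char_boundary` is a hypothetical helper written for illustration and is not part of text-splitter. It backs a cut position off to the nearest preceding UTF-8 char boundary, so the resulting slice is always a valid `str`.

```rust
/// Returns the largest byte index `<= max_bytes` that lies on a UTF-8 char
/// boundary of `s`, so `&s[..idx]` is always a valid `&str`.
/// Hypothetical helper for illustration, not part of text-splitter.
fn prev_char_boundary(s: &str, max_bytes: usize) -> usize {
    if max_bytes >= s.len() {
        return s.len();
    }
    let mut idx = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    idx
}

fn main() {
    let s = "héllo"; // 'é' takes 2 bytes in UTF-8 (byte indices 1 and 2)
    // Byte 2 falls inside 'é', so it is not a valid place to cut.
    assert!(!s.is_char_boundary(2));
    // `get` returns None for such a range; `&s[0..2]` would panic instead.
    assert_eq!(s.get(0..2), None);
    // Backing off to the previous boundary keeps the slice a valid str.
    let end = prev_char_boundary(s, 2);
    println!("{:?}", &s[..end]);
}
```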