chore: clarify naming #8

Merged: 2 commits, Mar 7, 2024

18 changes: 5 additions & 13 deletions README.md
@@ -1,7 +1,7 @@
-# Chunker: Flexible Text Chunking for Elixir
+# TextChunker: Flexible Text Chunking for Elixir

## About
-Chunker is an Elixir library for segmenting large text documents, optimizing them for efficient embedding and storage within vector databases for use in resource augmented generation (RAG) applications.
+TextChunker is an Elixir library for segmenting large text documents, optimizing them for efficient embedding and storage within vector databases for use in resource augmented generation (RAG) applications.

It prioritizes context preservation and adaptability, and is therefore ideal for analytical, NLP, and other applications where understanding the relationship between text segments is crucial.

@@ -23,12 +23,12 @@ Fill the gap in the Elixir ecosystem for a good semantic text chunker, and give

## Installation

-Add Chunker to your mix.exs:
+Add TextChunker to your mix.exs:

```elixir
def deps do
[
-{:chunker, "~> 0.1.1"}
+{:text_chunker, "~> 0.1.1"}
]
end
```
@@ -41,12 +41,6 @@ mix deps.get

## Usage

-Begin by aliasing Chunker:
-
-```elixir
-alias Chunker.TextChunker
-```
-
Chunk your text using the `split` function:

```elixir
@@ -96,8 +90,6 @@ You can use Recursive Chunk to split text up into any chunk size you wish, with
## Examples

```elixir
-alias Chunker.TextChunker
-
text = "This is a sample text. It will be split into properly-sized chunks using the Chunker library."
opts = [chunk_size: 50, chunk_overlap: 5, format: :plaintext, strategy: &Chunker.Strategies.RecursiveChunk.split/2]

@@ -134,4 +126,4 @@ Special thanks to the creators of langchain for their initial approach to recurs

## License

-Chunker is released under the MIT License. See the [LICENSE](LICENSE) file for details.
+TextChunker is released under the MIT License. See the [LICENSE](LICENSE) file for details.
8 changes: 4 additions & 4 deletions lib/chunker/text_chunker.ex → lib/text_chunker.ex
@@ -1,4 +1,4 @@
-defmodule Chunker.TextChunker do
+defmodule TextChunker do
@moduledoc """
Provides a high-level interface for text chunking, employing a configurable splitting strategy (defaults to recursive splitting). Manages options and coordinates the process, tracking chunk metadata.

@@ -8,7 +8,7 @@ defmodule Chunker.TextChunker do
* **Size and Overlap Control:** Provides options for `:chunk_size` and `:chunk_overlap`.
* **Metadata Tracking:** Generates `Chunk` structs containing byte range information.
"""
-alias Chunker.Strategies.RecursiveChunk
+alias TextChunker.Strategies.RecursiveChunk

@default_opts [
chunk_size: 2000,
@@ -31,11 +31,11 @@ defmodule Chunker.TextChunker do

```elixir
iex> long_text = "This is a very long text that needs to be split into smaller pieces for easier handling."
-iex> Chunker.TextChunker.split(long_text)
+iex> TextChunker.split(long_text)
# => [%Chunk{}, %Chunk{}, ...]
```

-iex> Chunker.TextChunker.split(long_text, chunk_size: 10, chunk_overlap: 3)
+iex> TextChunker.split(long_text, chunk_size: 10, chunk_overlap: 3)
# => Generates many smaller chunks with significant overlap

"""
2 changes: 1 addition & 1 deletion lib/chunker/chunk.ex → lib/text_chunker/chunk.ex
@@ -1,4 +1,4 @@
-defmodule Chunker.Chunk do
+defmodule TextChunker.Chunk do
@moduledoc """
Defines the `Chunk` struct, representing a contiguous block of text extracted during the splitting process. It stores the text content along with its corresponding byte range within the original input text.
"""
@@ -1,8 +1,8 @@
-defmodule Chunker.ChunkerBehaviour do
+defmodule TextChunker.ChunkerBehaviour do
@moduledoc """
Defines the contract that must be implemented for all text splitting strategies.
"""
-alias Chunker.Chunk
+alias TextChunker.Chunk

@callback split(text :: binary(), opts :: [keyword()]) :: [Chunk.t()]
end
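The `ChunkerBehaviour` contract in this diff asks a strategy for a single `split/2` callback that returns a list of `Chunk` structs. The following is a minimal sketch of a custom strategy written against that contract; it is not code from this PR. To stay self-contained it defines hypothetical stand-ins (`Sketch.Chunk`, `Sketch.ChunkerBehaviour`) that mirror the struct fields and callback shown here, plus a hypothetical fixed-size byte-window splitter, `Sketch.FixedSizeChunk`. The `chunk_overlap` default of 0 is an assumption; only the `chunk_size` default of 2000 appears in the diff. The stand-in spec types `opts` as `keyword()`, whereas the diff's `[keyword()]` describes a list of keyword lists.

```elixir
# Hypothetical stand-ins mirroring the shapes shown in this diff (not the library's modules).
defmodule Sketch.Chunk do
  defstruct [:start_byte, :end_byte, :text]
end

defmodule Sketch.ChunkerBehaviour do
  @callback split(text :: binary(), opts :: keyword()) :: [struct()]
end

# Hypothetical strategy: fixed-size byte windows with overlap.
defmodule Sketch.FixedSizeChunk do
  @behaviour Sketch.ChunkerBehaviour

  alias Sketch.Chunk

  @impl true
  def split(text, opts) do
    size = Keyword.get(opts, :chunk_size, 2000)
    # Assumed default; the diff does not show the library's overlap default.
    overlap = Keyword.get(opts, :chunk_overlap, 0)

    if overlap >= size, do: raise(ArgumentError, "chunk_overlap must be smaller than chunk_size")

    if text == "", do: [], else: do_split(text, 0, size, size - overlap, [])
  end

  # Emits one chunk per window; end_byte is exclusive. Stops once a window reaches the end.
  defp do_split(text, start, size, step, acc) do
    stop = min(start + size, byte_size(text))
    chunk = %Chunk{start_byte: start, end_byte: stop, text: binary_part(text, start, stop - start)}

    if stop == byte_size(text) do
      Enum.reverse([chunk | acc])
    else
      do_split(text, start + step, size, step, [chunk | acc])
    end
  end
end
```

Because the windows are cut on byte offsets, a multi-byte UTF-8 character can be split across two chunks; `RecursiveChunk`, by contrast, is described as separator-based. Whether `TextChunker.split/2` would accept such a module through its `strategy:` option, and which struct type it expects back, is not shown in this diff.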
@@ -1,4 +1,4 @@
-defmodule Chunker.Strategies.RecursiveChunk do
+defmodule TextChunker.Strategies.RecursiveChunk do
@moduledoc """
Handles recursive text splitting, aiming to adhere to configured size and overlap requirements.
Employs a flexible separator-based approach to break down text into manageable chunks, while generating metadata for each produced chunk.
@@ -24,10 +24,10 @@ defmodule Chunker.Strategies.RecursiveChunk do
4. **Metadata Generation:** Tracks byte ranges for each chunk for potential reassembly of the original text.
"""

-@behaviour Chunker.ChunkerBehaviour
+@behaviour TextChunker.ChunkerBehaviour

-alias Chunker.Chunk
-alias Chunker.Strategies.RecursiveChunk.Separators
+alias TextChunker.Chunk
+alias TextChunker.Strategies.RecursiveChunk.Separators

require Logger

@@ -49,14 +49,14 @@ defmodule Chunker.Strategies.RecursiveChunk do
```elixir
iex> long_text = "This is a very long text that needs to be split into smaller pieces for easier handling."

-iex> Chunker.Strategies.RecursiveChunk.split(long_text, chunk_size: 15, chunk_overlap: 5)
+iex> TextChunker.Strategies.RecursiveChunk.split(long_text, chunk_size: 15, chunk_overlap: 5)
[
-%Chunker.Chunk{
+%TextChunker.Chunk{
start_byte: 0,
end_byte: 47,
text: "This is a very long text that needs to be split"
},
-%Chunker.Chunk{
+%TextChunker.Chunk{
start_byte: 38,
end_byte: 88,
text: " be split into smaller pieces for easier handling."
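The `RecursiveChunk` moduledoc says byte ranges are tracked "for potential reassembly of the original text." A minimal sketch of such reassembly, fed the two chunks from the `split/2` example above as plain maps, follows. It assumes chunks are sorted by `start_byte`, that `end_byte` is exclusive (consistent with the example, where the 47-byte first chunk has `end_byte: 47`), and that consecutive chunks overlap or abut without gaps. `Sketch.Reassemble` is a hypothetical helper, not a TextChunker function.

```elixir
defmodule Sketch.Reassemble do
  # Hypothetical helper (not part of TextChunker).
  # Assumes: chunks sorted by start_byte, end_byte exclusive, no gaps between chunks.
  def reassemble(chunks) do
    Enum.reduce(chunks, "", fn %{start_byte: s, end_byte: e, text: t}, acc ->
      covered = byte_size(acc)

      cond do
        # A gap means part of the original text was never captured.
        s > covered -> raise ArgumentError, "gap before byte #{s}"
        # Chunk lies entirely inside text already rebuilt.
        e <= covered -> acc
        # Append only the bytes beyond the overlap with what is already rebuilt.
        true -> acc <> binary_part(t, covered - s, e - covered)
      end
    end)
  end
end

chunks = [
  %{start_byte: 0, end_byte: 47, text: "This is a very long text that needs to be split"},
  %{start_byte: 38, end_byte: 88, text: " be split into smaller pieces for easier handling."}
]

Sketch.Reassemble.reassemble(chunks)
# => "This is a very long text that needs to be split into smaller pieces for easier handling."
```

The second chunk starts at byte 38 while 47 bytes are already rebuilt, so only its last 41 bytes are appended; the 9-byte overlap (" be split") is dropped.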
@@ -1,4 +1,4 @@
-defmodule Chunker.Strategies.RecursiveChunk.Separators do
+defmodule TextChunker.Strategies.RecursiveChunk.Separators do
@moduledoc """
Handles separator configuration for the RecursiveChunk text chunking strategy.

3 changes: 1 addition & 2 deletions test/recursive_chunk_test.exs
@@ -1,8 +1,7 @@
defmodule TextChunkerTest do
use ExUnit.Case

-alias Chunker.TestHelpers
-alias Chunker.TextChunker
+alias TextChunker.TestHelpers

@moduletag timeout: :infinity

4 changes: 2 additions & 2 deletions test/support/test_helpers.ex
@@ -1,9 +1,9 @@
-defmodule Chunker.TestHelpers do
+defmodule TextChunker.TestHelpers do
@moduledoc false
@doc """
Extracts the text content from a single `Chunk` struct.
"""
-def chunk_text(%Chunker.Chunk{} = chunk), do: chunk.text
+def chunk_text(%TextChunker.Chunk{} = chunk), do: chunk.text

@doc """
Extracts the text content from a list of `Chunk` structs.