A Common Lisp text splitting library
text-splitter is available via ocicl. Install it like so:
$ ocicl install text-splitter
Load and split documents like so:
(split (make-document-from-file "report.pdf"))
This will produce a list of strings split from report.pdf
using the default size and overlap values (5000 and 200 characters respectively).
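
To make the size and overlap parameters concrete, here is a minimal sketch of fixed-window chunking in plain Common Lisp. It is not text-splitter's implementation: naive-split is a hypothetical name, it works on raw strings only, and it ignores any document structure. It assumes that overlap means each chunk begins that many characters before the previous chunk ended.

```lisp
;; Illustrative only: NAIVE-SPLIT is a hypothetical function, not part of
;; text-splitter. It slices a string into fixed-size, overlapping windows.
(defun naive-split (text &key (size 5000) (overlap 200))
  "Return a list of substrings of TEXT, each at most SIZE characters long.
Each chunk after the first starts OVERLAP characters before the end of the
previous chunk."
  ;; The window must advance, so OVERLAP has to be smaller than SIZE.
  (assert (< -1 overlap size) (size overlap)
          "OVERLAP must be non-negative and smaller than SIZE.")
  (loop with len = (length text)
        with stride = (- size overlap)
        for start from 0 by stride
        while (< start len)
        collect (subseq text start (min len (+ start size)))
        ;; Stop once a chunk has reached the end of the text.
        until (>= (+ start size) len)))

(naive-split "abcdefghij" :size 4 :overlap 1)
;; => ("abcd" "defg" "ghij")
```

With :overlap 0 the chunks simply tile the text; a larger overlap repeats more context at each boundary at the cost of producing more chunks.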
You can also create document instances manually like so:
(split (make-instance 'html-document :text MY-HTML-STRING) :size 10000 :overlap 0)
The split function will take advantage of document structure as it
computes the splits, which is why it is helpful to know what kind of
document we're splitting.
The split function returns nil if it doesn't recognize the document type.
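
Since an unrecognized document type yields nil rather than an error, callers may want to check the result explicitly. The sketch below shows one possible guard. Both split-or-fail and demo-split are hypothetical names that are not part of the library, and the splitting function is passed as an argument so the snippet runs on its own. It treats every nil result as a failure, which assumes that a recognized document never splits into an empty list.

```lisp
;; Hypothetical helper, not part of text-splitter.
(defun split-or-fail (split-fn document &rest args)
  "Apply SPLIT-FN to DOCUMENT and ARGS; signal an error on a NIL result."
  (or (apply split-fn document args)
      (error "Could not split ~S: unrecognized document type." document)))

;; Stand-in splitter used only for this demonstration: it "recognizes"
;; strings and returns NIL for everything else.
(defun demo-split (document &key size overlap)
  (declare (ignore size overlap))
  (when (stringp document)
    (list document)))

(split-or-fail #'demo-split "hello")
;; => ("hello")

;; An unrecognized input now signals an error instead of returning NIL.
(handler-case (split-or-fail #'demo-split 42)
  (error (condition)
    (format nil "~A" condition)))
```

With the library loaded and its symbols accessible, the same guard could presumably wrap the real function, for example (split-or-fail #'split (make-document-from-file "report.pdf") :size 10000).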
Related projects include:
- cl-embeddings: an LLM embeddings library
- cl-chroma: a Lisp interface to the Chroma vector database
- cl-completions: an LLM completions library
- cl-chat: a wrapper around completions to maintain chat history
cl-text-splitter was written by Anthony Green and is distributed under
the terms of the MIT license.