Cria, use Python to run LLMs with as little friction as possible.
Cria is a library for programmatically running Large Language Models through Python. Cria is built so you need as little configuration as possible — even with more advanced features.
- Easy: No configuration is required out of the box. Getting started takes just five lines of code.
- Concise: Write less code to save time and avoid duplication.
- Local: Free and not subject to rate limits; running LLMs requires no internet connection.
- Efficient: Use advanced features with your own ollama instance, or a subprocess.
Running Cria is easy. After installation, you need just five lines of code: no configuration, no manual downloads, no API keys, and no servers to worry about.
import cria
ai = cria.Cria()
prompt = "Who is the CEO of OpenAI?"
for chunk in ai.chat(prompt):
    print(chunk, end="") # The CEO of OpenAI is Sam Altman!
Or, you can run this more configurable example.
import cria
with cria.Model() as ai:
    prompt = "Who is the CEO of OpenAI?"
    response = ai.chat(prompt, stream=False)
    print(response) # The CEO of OpenAI is Sam Altman!
Warning
If no model is configured, Cria automatically installs and runs the default model: llama3.1:8b (4.7 GB).
- Cria uses ollama. To install it, run the following:
  curl -fsSL https://ollama.com/install.sh | sh
- Install Cria with pip:
  pip install cria
To run other LLMs, pass the model name when you create your ai instance.
import cria
ai = cria.Cria("llama2")
prompt = "Who is the CEO of OpenAI?"
for chunk in ai.chat(prompt):
    print(chunk, end="") # The CEO of OpenAI is Sam Altman. He co-founded OpenAI in 2015 with...
You can find available models in the Ollama library: https://ollama.com/library.
Responses are streamed by default in Cria, but you can turn streaming off by passing stream=False.
prompt = "Who is the CEO of OpenAI?"
response = ai.chat(prompt, stream=False)
print(response) # The CEO of OpenAI is Sam Altman!
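Conceptually, a non-streamed response is just the streamed chunks joined together. The sketch below illustrates this with a hypothetical fake_chat generator standing in for ai.chat(prompt); neither fake_chat nor collect is part of Cria, and no model is needed to run it.

```python
from typing import Iterator


def fake_chat(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for ai.chat(prompt): yields a reply in small chunks."""
    reply = "The CEO of OpenAI is Sam Altman!"
    # Yield four characters at a time, the way a streamed reply arrives piece by piece.
    for start in range(0, len(reply), 4):
        yield reply[start:start + 4]


def collect(stream: Iterator[str]) -> str:
    """Join streamed chunks into one string, similar in spirit to stream=False."""
    return "".join(stream)


chunks = list(fake_chat("Who is the CEO of OpenAI?"))
print(collect(iter(chunks)))  # The CEO of OpenAI is Sam Altman!
```

Streaming lets you show text as soon as it arrives; collecting is convenient when you need the whole reply before doing anything with it.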
By default, models are closed when you exit the Python program, but closing them manually is a best practice.
ai.close()
You can also use with statements to close models automatically (recommended).
Message history is automatically saved in Cria, so asking follow-up questions is easy.
prompt = "Who is the CEO of OpenAI?"
response = ai.chat(prompt, stream=False)
print(response) # The CEO of OpenAI is Sam Altman.
prompt = "Tell me more about him."
response = ai.chat(prompt, stream=False)
print(response) # Sam Altman is an American entrepreneur and technologist who serves as the CEO of OpenAI...
You can reset message history by running the clear method.
prompt = "Who is the CEO of OpenAI?"
response = ai.chat(prompt, stream=False)
print(response) # The CEO of OpenAI is Sam Altman.
ai.clear()
prompt = "Tell me more about him."
response = ai.chat(prompt, stream=False)
print(response) # I apologize, but I don't have any information about "him" because the conversation just started...
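To picture what message history means, here is a minimal sketch that stores it as a list of role/content dictionaries, the same shape used for custom messages in this README. The ChatHistory class and its record method are hypothetical and only model the documented behaviour (each exchange is kept, and clear() forgets everything); this is not Cria's implementation.

```python
class ChatHistory:
    """Hypothetical model of chat history; not Cria's actual class."""

    def __init__(self) -> None:
        self.messages: list[dict[str, str]] = []

    def record(self, prompt: str, reply: str) -> None:
        # Keep both sides of the exchange so a follow-up such as
        # "Tell me more about him." can be answered in context.
        self.messages.append({"role": "user", "content": prompt})
        self.messages.append({"role": "assistant", "content": reply})

    def clear(self) -> None:
        # Drop all earlier turns; a follow-up now has nothing to refer back to.
        self.messages.clear()


history = ChatHistory()
history.record("Who is the CEO of OpenAI?", "The CEO of OpenAI is Sam Altman.")
print(len(history.messages))  # 2
history.clear()
print(len(history.messages))  # 0
```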
You can also create a custom message history and pass in your own context.
context = "Our AI system employed a hybrid approach combining reinforcement learning and generative adversarial networks (GANs) to optimize the decision-making..."
messages = [
{"role": "system", "content": "You are a technical documentation writer"},
{"role": "user", "content": context},
]
prompt = "Write some documentation using the text I gave you."
for chunk in ai.chat(messages=messages, prompt=prompt):
    print(chunk, end="") # AI System Optimization: Hybrid Approach Combining Reinforcement Learning and...
In the example, instructions are given to the LLM with the system role. Then, extra context is given with the user role. Finally, the prompt is entered (also under the user role). You can mix roles in any combination to steer the LLM to your liking.
The available roles for messages are:
- user: Pass prompts as the user.
- system: Give instructions as the system.
- assistant: Act as the AI assistant yourself, and give the LLM lines.
The prompt parameter is always appended to messages under the user role. To avoid this, leave out the prompt argument and pass only messages.
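The rule above can be sketched as a small helper. build_messages is hypothetical and not part of Cria: it checks that every message uses one of the three roles listed above, then appends the prompt as a user message only when a prompt is given.

```python
from typing import Optional

VALID_ROLES = {"user", "system", "assistant"}


def build_messages(
    messages: list[dict[str, str]], prompt: Optional[str] = None
) -> list[dict[str, str]]:
    """Hypothetical helper: validate roles, then append the prompt under the user role."""
    for message in messages:
        if message["role"] not in VALID_ROLES:
            raise ValueError(f"unknown role: {message['role']!r}")
    result = list(messages)  # copy, so the caller's list is left unchanged
    if prompt is not None:
        result.append({"role": "user", "content": prompt})
    return result


messages = [
    {"role": "system", "content": "You are a technical documentation writer"},
    {"role": "user", "content": "Our AI system employed a hybrid approach..."},
]
full = build_messages(messages, prompt="Write some documentation using the text I gave you.")
print(full[-1]["role"])  # user
print(len(build_messages(messages)))  # 2
```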
If you are streaming messages with Cria, you can interrupt generation midway.
response = ""
max_token_length = 5
prompt = "Who is the CEO of OpenAI?"
for i, chunk in enumerate(ai.chat(prompt)):
    if i >= max_token_length:
        ai.stop()
        break
    response += chunk
print(response) # The CEO of OpenAI is

response = ""
max_token_length = 5
prompt = "Who is the CEO of OpenAI?"
for i, chunk in enumerate(ai.generate(prompt)):
    if i >= max_token_length:
        ai.stop()
        break
    response += chunk
print(response) # The CEO of OpenAI is
In the examples, after the AI generates five tokens (units of text that are usually a few characters long), text generation is stopped with the stop method. After stop is called, you can safely break out of the for loop.
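The same pattern can be wrapped in a reusable helper. take_chunks is a hypothetical sketch, not part of Cria: it collects at most n chunks from any stream and, if the stream still has more to give, calls a stop callback (with Cria you would pass ai.stop) and breaks out of the loop. A fake list of chunks stands in for a real model so the example runs on its own.

```python
from typing import Callable, Iterable


def take_chunks(stream: Iterable[str], n: int, stop: Callable[[], None]) -> str:
    """Collect at most n chunks; if more arrive, call stop() and leave the loop."""
    response = ""
    for i, chunk in enumerate(stream):
        if i >= n:
            stop()  # with Cria, this would be ai.stop
            break
        response += chunk
    return response


stopped = []
fake_stream = iter(["The", " CEO", " of", " OpenAI", " is", " Sam", " Altman."])
print(take_chunks(fake_stream, 5, lambda: stopped.append(True)))  # The CEO of OpenAI is
print(stopped)  # [True]
```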
By default, Cria saves responses to message history, even if the stream is interrupted. To prevent this, pass allow_interruption=False when you create the model.
ai = cria.Cria(allow_interruption=False)
response = ""
max_token_length = 5
prompt = "Who is the CEO of OpenAI?"
for i, chunk in enumerate(ai.chat(prompt)):
    if i >= max_token_length:
        ai.stop()
        break
    print(chunk, end="") # The CEO of OpenAI is
prompt = "Tell me more about him."
for chunk in ai.chat(prompt):
    print(chunk, end="") # I apologize, but I don't have any information about "him" because the conversation just started...
If you are running multiple models or parallel conversations, the Model class is also available. It is recommended for most use cases.
import cria
ai = cria.Model()
prompt = "Who is the CEO of OpenAI?"
response = ai.chat(prompt, stream=False)
print(response) # The CEO of OpenAI is Sam Altman.
All methods that apply to the Cria class also apply to Model.
Multiple models can be run through with statements, which close them automatically after use.
import cria
prompt = "Who is the CEO of OpenAI?"
with cria.Model("llama3") as ai:
    response = ai.chat(prompt, stream=False)
    print(response) # OpenAI's CEO is Sam Altman, who also...
with cria.Model("llama2") as ai:
    response = ai.chat(prompt, stream=False)
    print(response) # The CEO of OpenAI is Sam Altman.
Or, models can be created and closed manually.
import cria
prompt = "Who is the CEO of OpenAI?"
llama3 = cria.Model("llama3")
response = llama3.chat(prompt, stream=False)
print(response) # OpenAI's CEO is Sam Altman, who also...
llama2 = cria.Model("llama2")
response = llama2.chat(prompt, stream=False)
print(response) # The CEO of OpenAI is Sam Altman.
# Not required, but best practice.
llama3.close()
llama2.close()
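When the number of models is only known at runtime, contextlib.ExitStack from the standard library can close all of them, just like nested with statements. This sketch uses a hypothetical StubModel class in place of cria.Model so it runs without ollama; with Cria you would call stack.enter_context(cria.Model(name)), which relies on Model supporting with statements as shown above.

```python
from contextlib import ExitStack


class StubModel:
    """Hypothetical stand-in for cria.Model; records whether it was closed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def chat(self, prompt: str, stream: bool = False) -> str:
        return f"[{self.name}] reply to: {prompt}"

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "StubModel":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()  # returns None, so exceptions are not suppressed


names = ["llama3", "llama2"]
with ExitStack() as stack:
    # enter_context calls __enter__ now and schedules __exit__ for when the block ends.
    models = {name: stack.enter_context(StubModel(name)) for name in names}
    for model in models.values():
        print(model.chat("Who is the CEO of OpenAI?"))

# Leaving the with block closed every model, in reverse order of entry.
print(all(model.closed for model in models.values()))  # True
```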
Cria also has a generate method. As the follow-up question below shows, it does not use message history the way chat does.
prompt = "Who is the CEO of OpenAI?"
for chunk in ai.generate(prompt):
    print(chunk, end="") # The CEO of OpenAI (Open-source Artificial Intelligence) is Sam Altman.
prompt = "Tell me more about him."
response = ai.generate(prompt, stream=False)
print(response) # I apologize, but I think there may have been some confusion earlier. As this...
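The difference can be pictured in terms of ollama's REST API, which has a /api/chat endpoint that takes a list of messages and a /api/generate endpoint that takes a single prompt. The sketch below only builds request bodies in those shapes to show why a generate call carries no earlier turns; chat_body and generate_body are hypothetical helpers, and this is not a claim about how Cria builds its requests.

```python
import json


def chat_body(model: str, history: list[dict[str, str]], prompt: str) -> dict:
    """Hypothetical: a chat request carries the whole conversation plus the new prompt."""
    return {"model": model, "messages": history + [{"role": "user", "content": prompt}]}


def generate_body(model: str, prompt: str) -> dict:
    """Hypothetical: a generate request carries only the new prompt."""
    return {"model": model, "prompt": prompt}


history = [
    {"role": "user", "content": "Who is the CEO of OpenAI?"},
    {"role": "assistant", "content": "The CEO of OpenAI is Sam Altman."},
]
print(json.dumps(chat_body("llama3.1:8b", history, "Tell me more about him."), indent=2))
print(json.dumps(generate_body("llama3.1:8b", "Tell me more about him.")))
```

Only the chat body contains the earlier question, so only chat can resolve "him" to Sam Altman.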
When you run cria.Cria(), an ollama instance will start up if one is not already running. When the program exits, this instance will terminate.
However, if you want to save resources by keeping ollama running between runs of your program, either run your own ollama instance in another terminal or run a managed subprocess.
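To run your own instance, start ollama in another terminal:
ollama serve
Then use Cria as usual; it uses the instance that is already running:
import cria
prompt = "Who is the CEO of OpenAI?"
with cria.Model() as ai:
    response = ai.generate(prompt, stream=False)
    print(response)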
To run a managed subprocess instead:
import cria
# If it is the first time you start the program, ollama will start automatically.
# If it is the second time (or subsequent times) you run the program, ollama will already be running.
ai = cria.Cria(standalone=True, close_on_exit=False)
prompt = "Who is the CEO of OpenAI?"
with cria.Model("llama2") as llama2:
    response = llama2.generate(prompt, stream=False)
    print(response)
with cria.Model("llama3") as llama3:
    response = llama3.generate(prompt, stream=False)
    print(response)
quit()
# Despite exiting, ollama will keep running and will be used the next time this program starts.
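If you want to know whether an ollama instance is already up before starting one, a quick TCP check works. This is a hypothetical helper, not part of Cria, and it assumes ollama's default address of 127.0.0.1:11434 (a different address applies if you have configured OLLAMA_HOST). It only proves that some server is listening there, not that it is ollama.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 11434, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if is_listening():
    print("A server is already listening on ollama's default port.")
else:
    print("Nothing on port 11434; start ollama serve or let Cria start it.")
```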
To control the format of the LLM's output, pass in the format keyword.
ai = cria.Cria()
prompt = "Return a JSON array of AI companies."
response = ai.chat(prompt, stream=False, format="json")
print(response) # ["OpenAI", "Anthropic", "Meta", "Google", "Cohere", ...].
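Assuming the response comes back as a string (as the print output above suggests), you will usually want to parse it. A minimal sketch with the standard json module follows; parse_json_response is a hypothetical helper, and the hard-coded sample string stands in for a real model response. Catching JSONDecodeError is a precaution, since a model is not guaranteed to return exactly the structure you asked for.

```python
import json
from typing import Any


def parse_json_response(text: str) -> Any:
    """Hypothetical helper: parse a model's JSON reply, or raise a clear error."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"model did not return valid JSON: {err}") from err


sample = '["OpenAI", "Anthropic", "Meta", "Google", "Cohere"]'  # stand-in for response
companies = parse_json_response(sample)
print(companies[0])  # OpenAI
print(len(companies))  # 5
```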
The currently supported formats are:
- JSON
If you have a feature request, feel free to make an issue!
Contributions are highly appreciated.