Name		Name	Last commit message	Last commit date
parent directory ..
Dockerfile		Dockerfile
README.md		README.md
pull.sh		pull.sh
run-gemma2b.gif		run-gemma2b.gif

README.md

Run Gemma with Ollama on Cloud Run

This sample shows how to deploy the Ollama API with gemma:2b on Cloud Run, to run inference using CPU only.

Gemma is Google's open model built from the same research and technology used to create the Gemini models. The 2B version is the smallest version. I didn't try running 7B yet.

Ollama is a framework that makes it easy for developers to prototype apps with open models, including gemma. It comes with a REST API and this sample deploys that API with a Cloud Run service.

Usage

To build the container with gemma:2b included and deploy the Ollama API to a publicly accessible URL on Cloud Run, use the following command from this directory:

gcloud run deploy \
  --cpu=8 \
  --memory=8Gi \
  --execution-environment=gen2 \
  --allow-unauthenticated

Respond to any prompts the command gives you. You might need to enable a few APIs and choose a region to deploy to.

Building the container takes roughly 3 minutes.

Once the command completes, the deploy command shows the public URL of the service. For convenience, store the URL in an environment variable:

URL=$(gcloud run services list --format "value(URL)" --filter metadata.name=ollama-gemma)

Explore the API

To display the list of available models, send a request to api/tags. This should list gemma:2b.

curl $URL/api/tags

Ask Gemma a question:

curl $URL/api/generate -d \
 '{ 
    "model": "gemma:2b", 
    "prompt": "Why is the sky blue?" 
  }'

The first request to a new instance will take some extra setup time because Gemma is loaded into memory. Ollama keeps the model in memory for 5 minutes. I search for a config parameter to change this setting but didn't find it yet.

For the full Ollama API, refer to the API docs.

Is it any good?

Turns out that the token rate isn't half bad. Here's gemma:2b on an 8 CPU instance:

Clean up

To clean up after following this short tutorial, you can do the following:

In Artifact Registry, find the cloud-run-source-deploy repository and remove the container image used by the Cloud Run service you created.
In Cloud Run, delete the service you created.

Links

Use for research, exploration and prototyping.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ollama-gemma

ollama-gemma

README.md

Run Gemma with Ollama on Cloud Run

Usage

Explore the API

Is it any good?

Clean up

Links

Files

ollama-gemma

Directory actions

More options

Directory actions

More options

Latest commit

History

ollama-gemma

Folders and files

parent directory

README.md

Run Gemma with Ollama on Cloud Run

Usage

Explore the API

Is it any good?

Clean up

Links