
Missing Caching Support #395

Open
legraphista opened this issue Aug 6, 2024 · 4 comments
Labels
api: aiplatform (Issues related to the googleapis/nodejs-vertexai API)
priority: p3 (Desirable enhancement or fix. May not be included in next release.)
type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)

Comments

@legraphista

Describe the solution you'd like
Missing caching support equivalent to the Python SDK or Gemini TS SDK

Describe alternatives you've considered
I can create and manage caches with raw requests to the API endpoint, but I cannot use them, because cached_content cannot be passed through the library into the request.
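
For reference, the raw-request workaround looks roughly like this (a sketch: the endpoint and cachedContent field follow the public Vertex AI REST docs, and PROJECT_ID, REGION, MODEL_ID, and the access token are placeholders):

// Sketch of the raw REST workaround; PROJECT_ID, REGION, MODEL_ID, and accessToken
// are placeholders. The cachedContent field name follows the Vertex AI REST docs.
async function generateWithCacheRaw(
  accessToken: string, cacheName: string, prompt: string,
): Promise<unknown> {
  const url = 'https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID'
    + '/locations/REGION/publishers/google/models/MODEL_ID:generateContent';
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${accessToken}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      cachedContent: cacheName, // e.g. projects/.../locations/.../cachedContents/...
      contents: [{ role: 'user', parts: [{ text: prompt }] }],
    }),
  });
  return response.json();
}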

legraphista added the priority: p3 and type: feature request labels on Aug 6, 2024.
The product-auto-label bot added the api: aiplatform label on Aug 6, 2024.
@NimJay

NimJay commented Sep 27, 2024

Hi @legraphista, :)
It looks like CachedContent support was added in v1.8.0. But I recommend using the latest version (v1.8.1 as of Sep 26).

@NimJay

NimJay commented Sep 30, 2024

How to use Gemini's Context Caching (on Google Cloud's Vertex AI, with TypeScript)

Here's how you'd use Gemini's Context Caching via TypeScript, Google Cloud (Vertex AI), and @google-cloud/vertexai.

1. Evaluate pricing

First, make sure the pricing of Gemini's Context Caching and its other benefits (e.g., lower latency) make sense for your use case. For instance, if you're only making about 10 requests a day, it might not be worth the effort or price. There's also a minimum size for your cache (32,769 tokens as of Sep 30, 2024), and 1 token ≈ 3.6 characters.
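
As a rough pre-check, you can estimate whether your content meets that minimum (a sketch using only the approximations quoted above, not exact API values):

// Back-of-the-envelope check: does a prompt likely meet the cache minimum?
// Both constants are the approximations quoted above, not exact API values.
const MIN_CACHE_TOKENS = 32_769;
const CHARS_PER_TOKEN = 3.6;

function likelyMeetsCacheMinimum(text: string): boolean {
  const estimatedTokens = text.length / CHARS_PER_TOKEN;
  return estimatedTokens >= MIN_CACHE_TOKENS;
}

// 32,769 tokens * 3.6 chars/token ≈ 118,000 characters of input, at minimum.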

2. Create CachedContent

Create a CachedContent. It has a default lifespan of 1 hour, so adjust ttl (time-to-live) to your needs. When you create it, Google Cloud will give your CachedContent a unique name. Later, when you generate content using your cached context, you'll use that name to reference your CachedContent.

import { CachedContent, VertexAI } from '@google-cloud/vertexai';

const LLM_NAME = `gemini-1.5-flash-002`; // Make sure your model choice is up-to-date and fits your use case

// Example googleCloudRegion value: "us-central1". More info: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations
export async function createCachedContent(
  googleCloudProjectId: string, googleCloudRegion: string, initialPrompt: string,
): Promise<CachedContent | undefined> {
  const vertexAI = new VertexAI({ project: googleCloudProjectId, location: googleCloudRegion });
  // cachedContent.name is generated server-side and cannot be specified by users
  const cachedContent: CachedContent = {
    displayName: 'My cached content',
    model: `projects/${googleCloudProjectId}/locations/${googleCloudRegion}/publishers/google/models/${LLM_NAME}`,
    systemInstruction: '', // Optional; leave empty if you don't need one
    contents: [{ role: 'user', parts: [{ text: initialPrompt }] }],
    ttl: `${3600 * 3}s`, // 3 hours (ttl is in seconds; 1 hour = 3600s)
  };
  const createdCachedContent = await vertexAI.preview.cachedContents.create(cachedContent);
  // createdCachedContent.name will look like projects/123456781249/locations/us-central1/cachedContents/12345678471874431234
  if (!createdCachedContent || !createdCachedContent.name) {
    console.error('Failed to create CachedContent.');
    return;
  }
  console.log({ createdCachedContent });
  return createdCachedContent;
}
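
For example, you might call it like this (project ID and prompt are placeholders; run inside an async function):

// Hypothetical usage of createCachedContent() above.
const cached = await createCachedContent(
  'my-project', // placeholder project ID
  'us-central1',
  'A very long document or transcript to cache...', // must meet the token minimum
);
if (cached?.name) {
  console.log(`Created cache: ${cached.name}`);
}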

3. Get CachedContent name

The name is the unique identifier of your CachedContent. The previous createCachedContent() function returns a CachedContent object containing the name. But if you weren't able to grab the name, you can list your CachedContent objects in your Google Cloud project:

import { CachedContent, VertexAI } from '@google-cloud/vertexai';

export async function listCachedContents(
  googleCloudProjectId: string, googleCloudRegion: string,
): Promise<CachedContent[] | undefined> {
  const vertexAI = new VertexAI({ project: googleCloudProjectId, location: googleCloudRegion });
  const cachedContentsResponse = await vertexAI.preview.cachedContents.list();
  return cachedContentsResponse.cachedContents; // If 0 CachedContents, this will be undefined
}
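
For instance, to recover the name of the cache created in step 2 by its displayName (a minimal sketch; 'My cached content' matches the displayName used above, and the project ID is a placeholder):

// Sketch: look up a cache's name by the displayName set at creation time.
const caches = await listCachedContents('my-project', 'us-central1') ?? [];
const myCache = caches.find((c) => c.displayName === 'My cached content');
console.log(myCache?.name); // e.g. projects/.../locations/.../cachedContents/...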

4. Reference the CachedContent when generating content

Generating content works the same as it would without context caching, except that to use your CachedContent you call the getGenerativeModelFromCachedContent() method (instead of getGenerativeModel()) and pass in your CachedContent (at minimum, its name and model fields).

import {
  GenerateContentRequest, HarmBlockThreshold, HarmCategory, SafetySetting, VertexAI,
} from '@google-cloud/vertexai';

const LLM_NAME = `gemini-1.5-flash-002`; // Make sure your model choice is up-to-date and fits your use case

const generationConfig = {
  temperature: 0.1, // Lower values = less creative, more predictable, more factually accurate
  topP: 0.1, // Lower values = less creative, more predictable, more factually accurate
  maxOutputTokens: 1000,
};

const safetySettings: SafetySetting[] = [
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
];

export async function generateContentUsingCachedContent(
  googleCloudProjectId: string, googleCloudRegion: string, cachedContentName: string, prompt: string,
): Promise<void> {
  const vertexAI = new VertexAI({ project: googleCloudProjectId, location: googleCloudRegion });
  const request: GenerateContentRequest = {
    contents: [{ role: 'user', parts: [{ text: prompt }] }],
    generationConfig,
    safetySettings,
  };

  // Create a CachedContent object (at minimum, you need the name and model fields)
  const cachedContent = {
    name: cachedContentName,
    model: `projects/${googleCloudProjectId}/locations/${googleCloudRegion}/publishers/google/models/${LLM_NAME}`
  };
  const generativeModel = vertexAI.preview.getGenerativeModelFromCachedContent(cachedContent, { model: LLM_NAME });

  // Make the request to Google Cloud / Vertex AI
  console.log(`\n🤖 Sending message to Gemini:\n${prompt}`);
  const result = await generativeModel.generateContent(request);

  // Parse the response from Google Cloud
  const resultCandidate = result?.response?.candidates?.[0];
  const text = resultCandidate?.content?.parts?.[0]?.text;
  if (text) {
    console.log(`\n🤖 Gemini's response:\n${text}`);
  } else {
    console.error('Unexpected response format:', result);
  }
}
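
Putting it together, a call might look like this (all values are placeholders; run inside an async function):

// Hypothetical usage of generateContentUsingCachedContent() above.
await generateContentUsingCachedContent(
  'my-project', // placeholder project ID
  'us-central1',
  'projects/my-project/locations/us-central1/cachedContents/1234567890', // placeholder cache name
  'Summarize the cached document in three bullet points.',
);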

Resources

Always refer to the official docs at cloud.google.com for Google Cloud related guidance:

  1. Context caching overview
  2. Create a context cache
  3. Use a context cache

etc.

You might find more up-to-date, official Google Cloud samples in:

  1. https://github.com/GoogleCloudPlatform/nodejs-docs-samples
  2. https://github.com/GoogleCloudPlatform/generative-ai

@timconnorz

Hey @NimJay, can you provide some guidance on how one could later append to the cached content? My use case is Q&A over a growing set of audio recordings. Ideally, I could incrementally append new audio snippets to the content cache as the recording grows.

@NimJay

NimJay commented Nov 27, 2024

Hi @timconnorz,
Sorry, that's not a supported feature at the moment (source):

You can set a new ttl or expire_time for a cache. Changing anything else about the cache isn't supported.

(I'll reach out to the relevant teams internally.)
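
For updating the ttl itself, a sketch along these lines should work, assuming the preview namespace also exposes cachedContents.update alongside the create/list methods used above (verify the method and its shape against your SDK version):

import { VertexAI } from '@google-cloud/vertexai';

// Sketch: extend a cache's lifetime by setting a new ttl.
// Assumes cachedContents.update mirrors the create/list calls shown earlier.
export async function extendCacheTtl(
  googleCloudProjectId: string, googleCloudRegion: string, cachedContentName: string,
): Promise<void> {
  const vertexAI = new VertexAI({ project: googleCloudProjectId, location: googleCloudRegion });
  await vertexAI.preview.cachedContents.update({
    name: cachedContentName,
    ttl: `${3600 * 2}s`, // 2 hours from now
  });
}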
