Add support for Google's PaLM 2 #20
I have access now. I managed to get an API key I can use with the text-bison-001 model via Google Vertex AI: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/text-bison

The API call looks like this:

url = "https://generativelanguage.googleapis.com/v1beta2/models/text-bison-001:generateText?key={}".format(
    api_key
)
response = requests.post(
    url,
    json={"prompt": {"text": prompt}},
    headers={"Content-Type": "application/json"},
)
output = response.json()["candidates"][0]["output"]
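The request body can also carry sampling options. Here is a sketch with those added - the field names (temperature, candidateCount, maxOutputTokens, topP, topK) are my reading of the REST reference plus the defaults the models list below reports, so treat them as unverified:

import requests

api_key = "..."  # Generative Language API key
prompt = "Three good names for a pet pelican"

url = (
    "https://generativelanguage.googleapis.com/v1beta2/models/"
    "text-bison-001:generateText?key={}".format(api_key)
)
response = requests.post(
    url,
    json={
        "prompt": {"text": prompt},
        # Field names below are assumptions, not yet checked against the docs:
        "temperature": 0.7,
        "candidateCount": 1,
        "maxOutputTokens": 256,
        "topP": 0.95,
        "topK": 40,
    },
    headers={"Content-Type": "application/json"},
)
# Each candidate is a dict with an "output" key
for candidate in response.json()["candidates"]:
    print(candidate["output"])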
The thing I'm having trouble with here is what to call this. I want a command similar to Naming options:
There are a ton of other models available in the Vertex "model garden": https://console.cloud.google.com/vertex-ai/model-garden - T5-FLAN, Stable Diffusion, BLIP, all sorts of things. It's so very confusing in there! Many of them don't seem to have HTTP API endpoints - some appear to be available only via a notebook interface.
I'm tempted to go with Maybe the vendors themselves are a distraction - the thing that matters is the model. I've kind of broken this already though by having GPT-4 as a
For PaLM 2 itself I think the models available to me are
I'm going to land a
This looks useful: https://developers.generativeai.google/api/rest/generativelanguage/models/list
I currently get this:

{
"models": [
{
"name": "models/chat-bison-001",
"version": "001",
"displayName": "Chat Bison",
"description": "Chat-optimized generative language model.",
"inputTokenLimit": 4096,
"outputTokenLimit": 1024,
"supportedGenerationMethods": [
"generateMessage"
],
"temperature": 0.25,
"topP": 0.95,
"topK": 40
},
{
"name": "models/text-bison-001",
"version": "001",
"displayName": "Text Bison",
"description": "Model targeted for text generation.",
"inputTokenLimit": 8196,
"outputTokenLimit": 1024,
"supportedGenerationMethods": [
"generateText"
],
"temperature": 0.7,
"topP": 0.95,
"topK": 40
},
{
"name": "models/embedding-gecko-001",
"version": "001",
"displayName": "Embedding Gecko",
"description": "Obtain a distributed representation of a text.",
"inputTokenLimit": 1024,
"outputTokenLimit": 1,
"supportedGenerationMethods": [
"embedText"
]
}
]
}

So no
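For reference, a minimal sketch of hitting that ListModels endpoint with requests - I'm assuming the same ?key= style authentication works here as for generateText:

import requests

api_key = "..."  # Generative Language API key

response = requests.get(
    "https://generativelanguage.googleapis.com/v1beta2/models",
    params={"key": api_key},
)
# Print each model's name and the generation methods it supports
for model in response.json()["models"]:
    print(model["name"], model["supportedGenerationMethods"])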
https://developers.generativeai.google/api/rest/generativelanguage/models/countMessageTokens can count tokens:
{
"tokenCount": 23
}
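A sketch of what that call might look like, assuming the request body takes the same prompt/messages shape as generateMessage (I haven't confirmed the exact field names):

import requests

api_key = "..."  # Generative Language API key

url = (
    "https://generativelanguage.googleapis.com/v1beta2/models/"
    "chat-bison-001:countMessageTokens?key={}".format(api_key)
)
response = requests.post(
    url,
    json={"prompt": {"messages": [{"content": "How many tokens is this?"}]}},
    headers={"Content-Type": "application/json"},
)
print(response.json()["tokenCount"])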
I'm just going to ship
Where should it get the API token from? The way I do it for OpenAI tokens right now is bad and needs fixing: Lines 162 to 173 in 293f306.
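One option, sketched purely as an illustration (the get_vertex_api_key helper below is hypothetical, not existing code in this repo): check an environment variable first, then fall back to a keys file.

import json
import os
import pathlib

import click


def get_vertex_api_key():
    # Hypothetical helper: an environment variable takes precedence
    key = os.environ.get("PALM_API_KEY")
    if key:
        return key
    # Fall back to a JSON keys file (assumed location, for illustration only)
    keys_path = pathlib.Path.home() / ".llm" / "keys.json"
    if keys_path.exists():
        keys = json.loads(keys_path.read_text())
        if "palm" in keys:
            return keys["palm"]
    raise click.ClickException(
        "No PaLM API key found - set PALM_API_KEY or add 'palm' to keys.json"
    )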
My code so far:

@cli.command()
@click.argument("prompt", required=False)
@click.option("-m", "--model", help="Model to use", default="text-bison-001")
@click.option("-n", "--no-log", is_flag=True, help="Don't log to database")
def palm2(prompt, model, no_log):
    "Execute a prompt against a PaLM 2 model"
    if prompt is None:
        # Read from stdin instead
        prompt = sys.stdin.read()
    api_key = get_vertex_api_key()
    # Use the selected model rather than hard-coding text-bison-001
    url = "https://generativelanguage.googleapis.com/v1beta2/models/{}:generateText?key={}".format(
        model, api_key
    )
    response = requests.post(
        url,
        json={"prompt": {"text": prompt}},
        headers={"Content-Type": "application/json"},
    )
    output = response.json()["candidates"][0]["output"]
    log(no_log, "vertex", None, prompt, output, model)
    click.echo(output)
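There's no error handling yet. A minimal sketch of what could replace the response.json() line above - the error envelope is an assumption based on the standard Google API error format, and the safety-filter behavior is a guess:

data = response.json()

# Standard Google API error envelope (assumed): {"error": {"code": ..., "message": ..., "status": ...}}
if "error" in data:
    raise click.ClickException(data["error"]["message"])

# Guess: a prompt blocked by safety filters comes back with no candidates
candidates = data.get("candidates") or []
if not candidates:
    raise click.ClickException("No output returned - the prompt may have been filtered")

output = candidates[0]["output"]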
if prompt is None:
    # Read from stdin instead
    prompt = sys.stdin.read()

This poses a problem with the UX. This is why I suggested a fix for it in PR #19.
That's a deliberate design decision at the moment - it means you can run the command with no prompt argument and then type, paste or pipe in the prompt. There are other common unix commands that work like this. Since it's possible to detect this situation, perhaps a message to stderr reminding the user to type or paste in content and hit Ctrl-D would help.
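A minimal sketch of how that detection could work, using sys.stdin.isatty() - not what the CLI does today, just an illustration:

import sys


def read_prompt():
    # When stdin is an interactive terminal, remind the user what to do;
    # when it is a pipe or redirected file, read it silently.
    if sys.stdin.isatty():
        print("Type or paste your prompt, then press Ctrl-D:", file=sys.stderr)
    return sys.stdin.read()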
I'm not sure I understand. When I run it... oh, my bad! Okay, I hit Ctrl-D and it worked. As of now, I would find it better to print the helper message by default, as I suggested in my pull request. However, an instruction shown to the user via stderr would also work.
Now that I've renamed
I'll support
In that new prototype branch:
Figuring out chat mode for Vertex/PaLM 2 is proving hard. https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/1?project=cloud-vision-ocr-382418 talks about "PaLM 2 for Chat". https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/chat.py seems to be the most relevant example code:

from vertexai.preview.language_models import ChatModel, InputOutputTextPair


def science_tutoring(temperature: float = 0.2) -> None:
    chat_model = ChatModel.from_pretrained("chat-bison@001")
    parameters = {
        "temperature": temperature,  # Temperature controls the degree of randomness in token selection.
        "max_output_tokens": 256,  # Token limit determines the maximum amount of text output.
        "top_p": 0.95,  # Tokens are selected from most probable to least until the sum of their probabilities equals the top_p value.
        "top_k": 40,  # A top_k of 1 means the selected token is the most probable among all tokens.
    }
    chat = chat_model.start_chat(
        context="My name is Miles. You are an astronomer, knowledgeable about the solar system.",
        examples=[
            InputOutputTextPair(
                input_text="How many moons does Mars have?",
                output_text="The planet Mars has two moons, Phobos and Deimos.",
            ),
        ],
    )
    response = chat.send_message(
        "How many planets are there in the solar system?", **parameters
    )
    print(f"Response from Model: {response.text}")
    return response

I think this is where
Buried deep in a class hierarchy, this looks like the code that actually constructs the JSON to call the API: https://github.com/googleapis/python-aiplatform/blob/c60773a7db8ce7a59d2cb5787dc90937776c0b8f/vertexai/language_models/_language_models.py#L697-L824

The API call then goes through this code:

prediction_response = self._model._endpoint.predict(
    instances=[prediction_instance],
    parameters=prediction_parameters,
)

I have not yet tracked down that
Some useful hints in https://github.com/google/generative-ai-docs/blob/main/site/en/tutorials/chat_quickstart.ipynb - including that PaLM 2 has a "context" concept which appears to be the same thing as an OpenAI system prompt:

reply = palm.chat(context="Speak like Shakespeare.", messages="Hello")
print(reply.last)

reply = palm.chat(
    context="Answer everything with a haiku, following the 5/7/5 rhyme pattern.",
    messages="How's it going?",
)
print(reply.last)
Based on that example notebook, I'm going to ditch the terminology "Vertex" and "PaLM 2" and just call it "PaLM". (They never released an API for PaLM 1). I'm also going to move my code out of the experimental plugin and into a |
Got this out of the debugger, after this:

import google.generativeai as palm

kwargs = {"messages": self.prompt.prompt}
if self.prompt.system:
    kwargs["context"] = self.prompt.system
response = palm.chat(**kwargs)
last = response.last
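As I read the quickstart notebook, the response object can also continue a conversation, which is probably how multi-turn chat will map onto this. A sketch, assuming palm.configure() for auth and a reply() method on the chat response - both worth re-checking against the library:

import google.generativeai as palm

palm.configure(api_key="...")  # Generative Language / PaLM API key

response = palm.chat(
    context="You are a helpful assistant.",
    messages="Three good names for a pet pelican",
)
print(response.last)

# Assumption based on the quickstart: reply() sends a follow-up turn in the same conversation
response = response.reply("Now do walruses")
print(response.last)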
That library doesn't have streaming support yet, issues here: |
The Here's GPT-4:
PaLM messes that one up:
This example from the PaLM example notebook does work though:
This was surprising:
I got back a
Extracted this out to a separate plugin: https://github.com/simonw/llm-palm

Closing this issue - future work will happen there instead.
It lives here now: https://github.com/simonw/llm-palm Refs #20