-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Add Support for Data URI #385
Conversation
Nice, so you want to implement.
And then accept the following?
I like it - please make sure to also add a unit test for this, and add a documentation. If there is an OpenAPI feature that generates Hint: you can |
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #385 +/- ##
==========================================
- Coverage 78.99% 78.66% -0.34%
==========================================
Files 37 39 +2
Lines 2804 2962 +158
==========================================
+ Hits 2215 2330 +115
- Misses 589 632 +43 ☔ View full report in Codecov by Sentry. |
def is_base64_check(s: str): | ||
"""Regex check to quickly check if string is base64 or not.""" | ||
pattern = ( | ||
r"^[A-Za-z0-9+/]{4}([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a performance efficient check for this? Is it needed (doesn't the Mozilla spec say it needs to say base64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michaelfeil do we need this? or want this?
@davleop @stikkireddy Added a lot of changes, to make the data uri validation more native to the pydantic validation. |
@michaelfeil awesome! this is what i was hoping would be changed. you got rid of regex match for base64 format. The mdn guide specifies the format structure and if the base64 key is provided then its assumed that the rest of the string is base64 encoded. If decoding fails we can send that error back to the user. https://developer.mozilla.org/en-US/docs/Web/URI/Schemes/data |
Last question is there any way to support this from OpenAI embeddings spec? extra_body is typically used to provide additional non spec features to allow the server to handle things. It would be nice to be able to reuse the openai embeddings client for things like: client = OpenAI(....)
client.embeddings.create(
model="default",
inputs=["https://...", "data:..."],
extra_body={
"infiniy_image_embeddings": true
} I know its not exactly what openai supports but it would be great to be able to use existing clients to solve this problem if possible. they have built in retries, etc. vllm does this for vllm specific features like guided decoding, etc. |
CHARSET_REGEX = r"[\w\-\+\.]+" | ||
_CHARSET_RE = re.compile("^{}$".format(CHARSET_REGEX)) | ||
|
||
DATA_URI_REGEX = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stikkireddy Moved the regex directly into the validation of the pydantic type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libs/infinity_emb/infinity_emb/transformer/vision/utils.py
i assume you will remove that unnecessary code then?
Thanks for picking up my slack! I haven't had much time to take a look at this again. I kinda just implemented this as a solution I needed at the time and thought it might be good to push it up to main :) |
Feature: add support for data URI
Following after OpenAI's vision docs: https://platform.openai.com/docs/guides/vision