Currently, we expect image_url, audio_url, etc. to be inside the messages that are passed to the chat template. We would like to expand this to support image, audio, etc. inputs, just like in HuggingFace Transformers:
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Can you describe this image?"}
        ]
    },
]
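For contrast, here is a minimal sketch of the current approach, where the image has to be encoded into a base64 data URL and placed in an image_url entry. The helper names `to_data_url` and `build_image_url_message` are hypothetical (they are not part of vLLM), and the placeholder bytes stand in for a real encoded image.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_url_message(image_bytes: bytes, prompt: str) -> dict:
    """Build an OpenAI-style user message with an image_url content part."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}},
            {"type": "text", "text": prompt},
        ],
    }


# Placeholder bytes only (the PNG signature), not a decodable image.
fake_png = b"\x89PNG\r\n\x1a\n"
messages = [build_image_url_message(fake_png, "Can you describe this image?")]
```

The encoding step is the overhead this request aims to remove for offline use.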
To avoid having to pass multi-modal inputs separately, we propose the following extension:
@DarkLight1337 We should think about how to separate the code paths between the OpenAI-compatible format and the HF-compatible format.
In particular, LLM.chat currently relies on the same payload format as the online OpenAI-compatible inference, so IMO we should create a new protocol for this particular purpose (maybe HFChatCompletionMessageParam, but keep in mind that this new format, which supports PIL images, isn't really directly compatible with HF) so that it does not accidentally affect other code paths.
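As a rough illustration of keeping the two formats apart, content parts could be routed by their type field before any further processing. This is only a sketch: `OPENAI_PART_TYPES`, `HF_PART_TYPES`, and `split_by_format` are hypothetical names, not vLLM API, and the type strings are limited to those mentioned in this issue.

```python
# Hypothetical sketch, not vLLM API: route content parts by format.
OPENAI_PART_TYPES = {"image_url", "audio_url"}  # URL-based parts (current format)
HF_PART_TYPES = {"image", "audio"}              # HF-style parts (proposed)


def split_by_format(content: list) -> tuple:
    """Split content parts into (openai_parts, hf_parts, other_parts)."""
    openai_parts, hf_parts, other_parts = [], [], []
    for part in content:
        part_type = part.get("type")
        if part_type in OPENAI_PART_TYPES:
            openai_parts.append(part)
        elif part_type in HF_PART_TYPES:
            hf_parts.append(part)
        else:
            # Text and any unrecognized parts are left for the caller to handle.
            other_parts.append(part)
    return openai_parts, hf_parts, other_parts
```

With a split like this, the online OpenAI-compatible path would only ever see the first group, so handling the HF-style parts could not change its behavior.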
🚀 The feature, motivation and pitch
The proposed extension lets us pass multi-modal data such as PIL images to LLM.chat directly, without having to encode them into base64 URLs.
Alternatives
No response
Additional context
cc @ywang96 @Isotr0py @hmellor