
[Feature]: Support HF-style chat template for multi-modal data in offline chat #17551


Open · 1 task done · Tracked by #4194
DarkLight1337 opened this issue May 1, 2025 · 2 comments · May be fixed by #17919
Labels: feature request (New feature or request), good first issue (Good for newcomers)

Comments

DarkLight1337 (Member) commented May 1, 2025

🚀 The feature, motivation and pitch

Currently, we expect image_url, audio_url, etc. to be inside the messages that are passed to the chat template. We would like to extend this to support image, audio, etc. inputs, just like in HuggingFace Transformers:

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Can you describe this image?"}
        ]
    },
]
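(For reference, a minimal sketch of the URL-based format that offline chat accepts today, following the OpenAI Chat Completions convention; the file name example.jpg and the manual base64 step are illustrative:)

import base64

# Encode a local image as a base64 data URL, as currently required.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
            },
            {"type": "text", "text": "Can you describe this image?"}
        ]
    },
]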

To avoid having to pass multi-modal inputs separately, we propose the following extension:

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Can you describe this image?"}
        ]
    },
]

This lets us pass multi-modal data such as PIL images to LLM.chat directly without having to encode them into base64 URLs.
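As a rough sketch of what the proposed end-to-end usage could look like (the model name is illustrative, and the {"type": "image", "image": ...} content part is the extension proposed here, not an existing API):

from PIL import Image

from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # illustrative multi-modal model
image = Image.open("example.jpg")  # any PIL image

messages = [
    {
        "role": "user",
        "content": [
            # Proposed: pass the PIL image directly, no base64 encoding.
            {"type": "image", "image": image},
            {"type": "text", "text": "Can you describe this image?"}
        ]
    },
]

outputs = llm.chat(messages)
print(outputs[0].outputs[0].text)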

Alternatives

No response

Additional context

cc @ywang96 @Isotr0py @hmellor

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
sfeng33 (Contributor) commented May 1, 2025

I can help work on it this week

ywang96 (Member) commented May 1, 2025

@sfeng33 Thank you for the interest!

@DarkLight1337 We should think about how to separate the code paths between the OpenAI-compatible format and the HF-compatible format.

In particular, LLM.chat currently uses the same payload format as online OpenAI-compatible inference, so IMO we should create a new protocol for this particular purpose (maybe HFChatCompletionMessageParam, though keep in mind that this new format supporting PIL images isn't directly compatible with HF either) so that it does not accidentally affect other code paths.

ChatCompletionMessageParam = Union[OpenAIChatCompletionMessageParam,
                                   CustomChatCompletionMessageParam]

Doing so also allows us to support any other message patterns that Hugging Face may introduce later.
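A hypothetical sketch of what such a protocol could look like (HFChatCompletionMessageParam and the content-part TypedDicts are names suggested in this thread, not existing vLLM types; the import assumes the existing params stay in vllm.entrypoints.chat_utils):

from typing import List, Literal, TypedDict, Union

from PIL import Image

from vllm.entrypoints.chat_utils import (CustomChatCompletionMessageParam,
                                         OpenAIChatCompletionMessageParam)

class HFImageContentPart(TypedDict):
    # HF-style image part carrying the data inline instead of via a URL.
    type: Literal["image"]
    image: Image.Image

class HFTextContentPart(TypedDict):
    type: Literal["text"]
    text: str

class HFChatCompletionMessageParam(TypedDict):
    role: str
    content: List[Union[HFImageContentPart, HFTextContentPart]]

# The existing union would then gain a third member:
ChatCompletionMessageParam = Union[OpenAIChatCompletionMessageParam,
                                   CustomChatCompletionMessageParam,
                                   HFChatCompletionMessageParam]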

@DarkLight1337 DarkLight1337 moved this from Todo to In Progress in Multi-modality Core May 2, 2025
@ywang96 ywang96 linked a pull request May 9, 2025 that will close this issue