Hello team,

I'm currently exploring solutions to serve a custom model and would appreciate your insights on whether my use case is feasible with text-generation-inference (TGI).
My model requires custom embedding logic that depends not only on `input_ids` but also on their position within the full sequence. More specifically, I'm working on a mesh generation task, and in addition to the tokens I use two types of position IDs, as in the example below:
| Tokens | Vertex position IDs | Global face position IDs |
|---|---|---|
| `<start_mesh>` | 0 | 0 |
| `<vertex_1>` | 1 | 1 |
| `<vertex_2>` | 2 | 1 |
| `<vertex_3>` | 3 | 1 |
| `<vertex_4>` | 1 | 2 |
| `<vertex_5>` | 2 | 2 |
| `<vertex_6>` | 3 | 2 |
| `<end_mesh>` | 0 | 0 |
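For concreteness, here is a minimal sketch of how these IDs can be derived from the token sequence. The helper name and the three-vertices-per-face grouping are illustrative assumptions matching the example above, not my actual code:

```python
def build_position_ids(tokens: list[str]) -> tuple[list[int], list[int]]:
    """Assign a vertex position ID (1-3, cycling within each face) and a
    global face position ID (1-based face index) to every vertex token;
    special tokens such as <start_mesh>/<end_mesh> get 0 for both."""
    vertex_ids, face_ids = [], []
    n_vertices = 0  # running count of vertex tokens seen so far
    for tok in tokens:
        if tok.startswith("<vertex"):
            vertex_ids.append(n_vertices % 3 + 1)  # 1, 2, 3, 1, 2, 3, ...
            face_ids.append(n_vertices // 3 + 1)   # 1, 1, 1, 2, 2, 2, ...
            n_vertices += 1
        else:
            vertex_ids.append(0)
            face_ids.append(0)
    return vertex_ids, face_ids

tokens = ["<start_mesh>", "<vertex_1>", "<vertex_2>", "<vertex_3>",
          "<vertex_4>", "<vertex_5>", "<vertex_6>", "<end_mesh>"]
print(build_position_ids(tokens))
# -> ([0, 1, 2, 3, 1, 2, 3, 0], [0, 1, 1, 1, 2, 2, 2, 0])
```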
When creating the embedding for an `input_id`, I need to know its position within the full `input_ids` sequence, including both the prompt and the generated tokens.
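The embedding itself then roughly looks like the following sketch, where the token embedding is summed with two learned position embeddings (all names and sizes here are placeholders):

```python
import torch.nn as nn

class MeshEmbedding(nn.Module):
    """Illustrative embedding layer: token embedding plus two
    mesh-specific position embeddings."""
    def __init__(self, vocab_size=32000, hidden=1024,
                 num_vertex_ids=4, num_face_ids=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.vertex_pos = nn.Embedding(num_vertex_ids, hidden)
        self.face_pos = nn.Embedding(num_face_ids, hidden)

    def forward(self, input_ids, vertex_position_ids, face_position_ids):
        # Both ID tensors must stay aligned with input_ids over the
        # full sequence (prompt + generated tokens).
        return (self.tok(input_ids)
                + self.vertex_pos(vertex_position_ids)
                + self.face_pos(face_position_ids))
```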
I implemented this in a Hugging Face model by customizing the `prepare_inputs_for_generation` method, ensuring that during generation I always had access to the full list of `input_ids`, which let me compute the embeddings correctly.
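Roughly, the override looks like this (a simplified skeleton; `MeshLM` and `batched_position_ids` are illustrative placeholders):

```python
from transformers import PreTrainedModel

class MeshLM(PreTrainedModel):  # skeleton; config and forward omitted
    def prepare_inputs_for_generation(self, input_ids,
                                      past_key_values=None, **kwargs):
        # During generate(), input_ids is the full sequence (prompt plus
        # tokens generated so far), so the mesh position IDs can be
        # recomputed from it at every step. batched_position_ids would be
        # a tensor variant of build_position_ids above.
        vertex_ids, face_ids = self.batched_position_ids(input_ids)
        if past_key_values is not None:
            # With a KV cache only the newest token is fed forward, but
            # its position IDs still reflect its place in the full sequence.
            input_ids = input_ids[:, -1:]
            vertex_ids = vertex_ids[:, -1:]
            face_ids = face_ids[:, -1:]
        return {
            "input_ids": input_ids,
            "past_key_values": past_key_values,
            "vertex_position_ids": vertex_ids,
            "face_position_ids": face_ids,
        }
```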
Now, my question is: How feasible is it to replicate this logic using TGI?
I initially tried vLLM, but found it challenging to access the full sequence of `input_ids` during the forward pass, particularly with continuous batching, where inputs from different requests are interleaved.
Any guidance on whether TGI supports this type of positional logic—or if there’s a recommended way to achieve it—would be greatly appreciated!
Thanks in advance!