Ensemble and tensorrt_llm_bls have different results when using accumulate_tokens #520
Closed
Labels: bug
System Info
CPU x86_64
GPU NVIDIA L20
TensorRT branch: v0.8.0
CUDA: NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.3
Who can help?
@kaiyux @byshiue @schetlur-nv
Reproduction
When I use accumulate_tokens, the same request produces different results depending on which model I call.
Expected behavior
The results should be the same.
Actual behavior
When the prompt and parameters are the same, the ensemble and tensorrt_llm_bls APIs return different results.

curl -X POST localhost:8820/v2/models/tensorrt_llm_bls/generate_stream
The result is:
The part of text_output is:
curl -X POST localhost:8820/v2/models/ensemble/generate_stream
The result is:
The part of text_output is:
In fact, the result from ensemble is the expected one. I also printed the output_ids, and they differ as well.
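
For anyone trying to reproduce the comparison, here is a minimal sketch that sends one payload to both generate_stream endpoints and reports where the texts first diverge. It assumes the endpoint streams Server-Sent Events with one `data: {...}` JSON object per line and a text_output field, and that with accumulate_tokens enabled each tensorrt_llm_bls chunk carries the full text so far while ensemble chunks carry only the new piece. The helper names and the payload you pass in are hypothetical; the original request body is not shown in this issue.

```python
import json
import urllib.request

# Host and port taken from the curl commands in this issue.
BASE_URL = "http://localhost:8820"


def parse_sse_chunks(lines):
    """Collect the text_output field from each 'data: {...}' SSE line."""
    chunks = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        event = json.loads(line[len("data:"):])
        chunks.append(event.get("text_output", ""))
    return chunks


def final_text(chunks, accumulated):
    """Full text of a stream.

    accumulated=True assumes every chunk already holds all text so far
    (assumed behavior with accumulate_tokens), so the last chunk is the result.
    accumulated=False assumes chunks are increments and concatenates them.
    """
    if accumulated:
        return chunks[-1] if chunks else ""
    return "".join(chunks)


def first_divergence(a, b):
    """Index of the first differing character, or None if the strings match."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


def stream_chunks(model, payload):
    """POST the payload to one model's generate_stream endpoint."""
    url = f"{BASE_URL}/v2/models/{model}/generate_stream"
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return parse_sse_chunks(resp)  # the response iterates line by line


def compare(payload):
    """Return (ensemble_text, bls_text, index_of_first_difference)."""
    ens = final_text(stream_chunks("ensemble", payload), accumulated=False)
    bls = final_text(stream_chunks("tensorrt_llm_bls", payload), accumulated=True)
    return ens, bls, first_divergence(ens, bls)
```

Calling compare() with the exact request body used in the curl commands returns both texts and the first index at which they differ, or None when they match. Setting a fixed seed and greedy sampling in the payload, if the deployed models expose those inputs, would help rule out sampling randomness as the source of the difference.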

Additional notes
I'm confused about why this happens; I think the results should be the same.
Is there a way to solve this problem?
Thanks.