[Frontend] [Core] Add Tensorizer support for LoRA adapter serialization and deserialization #17926
Serializing and deserializing LoRA adapters using `tensorizer`
This PR allows LoRA adapters to be serialized and deserialized using `tensorizer`. A test is added to confirm this, and LoRA files can be serialized with the same script that saves vLLM models, `examples/tensorize_vllm_model.py`.

Summary of changes
- Updates `.buildkite/test-pipeline.yaml`'s invocation of the `tensorize_vllm_model.py` example script to additionally save and load a LoRA adapter, then runs a model generation after deserialization to confirm the LoRA adapter was loaded properly.
- Adds `--lora-path` to `tensorize_vllm_model.py` as a base argparser argument that lets a user specify the HuggingFace reference ID of a LoRA adapter, which is serialized or deserialized depending on whether the `serialize` or `deserialize` subparser is selected.
- Adds `test_serialize_and_deserialize_lora` to `test_tensorizer.py`, which tests saving and loading LoRA adapters with `tensorizer`.
- Allows a `TensorizerConfig` to be passed in as a kwarg to `LoRARequest`. When this is done, the LoRA tensors are assumed to be in `tensorizer`'s `.tensors` format and are deserialized according to the parameters given in the `TensorizerConfig` provided.
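
For reviewers unfamiliar with the "base argument plus subparser" layout described for `--lora-path`, here is a minimal standalone sketch of that pattern using only the standard library. It is not the code from `tensorize_vllm_model.py`: the function names `build_parser` and `describe`, the help strings, and the adapter ID `some-org/some-lora-adapter` are hypothetical, and the real script defines many more arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the layout described in this PR."""
    parser = argparse.ArgumentParser(
        description="Sketch: tensorize a model and, optionally, a LoRA adapter."
    )
    # Base argument: defined on the top-level parser, so it applies to
    # whichever subcommand is chosen.
    parser.add_argument(
        "--lora-path",
        type=str,
        default=None,
        help="HuggingFace reference ID of a LoRA adapter (optional).",
    )
    # The subcommand decides whether the adapter is saved or loaded.
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("serialize", help="Serialize the model (and adapter).")
    subparsers.add_parser("deserialize", help="Deserialize the model (and adapter).")
    return parser


def describe(args: argparse.Namespace) -> str:
    """Summarize what a run with these arguments would act on."""
    if args.lora_path is None:
        return f"{args.command}: model only"
    return f"{args.command}: model and LoRA adapter {args.lora_path}"


if __name__ == "__main__":
    # Hypothetical adapter ID, used for illustration only.
    demo = build_parser().parse_args(
        ["--lora-path", "some-org/some-lora-adapter", "serialize"]
    )
    print(describe(demo))
```

Because `--lora-path` lives on the top-level parser in this sketch, it has to appear before the `serialize` or `deserialize` subcommand on the command line; argparse hands everything after the subcommand name to that subparser.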