Feat: support DTensor when saving #3042

Open · S1ro1 wants to merge 2 commits into main from transformers-save-dtensor
Conversation

@S1ro1 (Member) commented on May 1, 2025

This enables transformers to use save_pretrained when the model is sharded with DTensor. It shouldn't break anything, as this path simply failed before.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@S1ro1 S1ro1 changed the title from "tmp: return tensor.nbytes for get_torch_storage_size" to "Feat: support DTensor when saving" May 2, 2025
@S1ro1 S1ro1 marked this pull request as ready for review May 2, 2025 12:05
@hanouticelina (Contributor) left a comment:

Thanks @S1ro1 for the PR! I left a comment about the storage size computation of a DTensor.

Comment on lines +765 to +772
try:
from torch.distributed.tensor import DTensor

if isinstance(tensor, DTensor):
# this returns the size of the FULL tensor in bytes
return tensor.nbytes
except ImportError:
pass
@hanouticelina (Contributor):

I'm not familiar with DTensor, but if the tensor is indeed a DTensor and the import fails at line 766, would it be okay to fall back to tensor.untyped_storage().nbytes()?
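For context, here is a minimal sketch of how the DTensor branch from this PR sits next to the fallback discussed above; this is illustrative only, not the actual huggingface_hub implementation, which handles more cases:

import torch

def get_torch_storage_size(tensor: torch.Tensor) -> int:
    try:
        from torch.distributed.tensor import DTensor

        if isinstance(tensor, DTensor):
            # Per this PR: size of the FULL (global) tensor in bytes, not just the local shard.
            return tensor.nbytes
    except ImportError:
        # torch is too old to provide DTensor, so `tensor` cannot be one.
        pass

    # Regular tensors: size of the underlying storage in bytes.
    return tensor.untyped_storage().nbytes()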

@S1ro1 (Member Author):

I'm not 100% sure and will have to test locally, but I'm pretty sure that would fail with "has no method untyped_storage". In any case, this import shouldn't ever fail if the tensor is a DTensor. It's wrapped in try/except to avoid version checking, as DTensor requires torch >= 2.1 (ish).
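For comparison, the explicit version check that the try/except avoids could look roughly like this (hypothetical sketch, assuming the packaging library; not part of the PR):

import torch
from packaging import version

# Only attempt the import on torch versions expected to ship DTensor
# (>= 2.1-ish, per the discussion above).
if version.parse(torch.__version__).release >= (2, 1):
    from torch.distributed.tensor import DTensor
else:
    DTensor = None

Catching ImportError sidesteps the version parsing entirely, which is why the PR keeps the try/except.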

@hanouticelina (Contributor):

> DTensor is torch >= 2.1 (ish)

okay then all good!

@hanouticelina (Contributor) left a comment:

Looks good! @S1ro1 could you add a simple test for get_torch_storage_size with DTensor in https://github.com/huggingface/huggingface_hub/blob/main/tests/test_serialization.py if possible?

this one should be enough:

@requires("torch")
def test_get_torch_storage_size_dtensor():
    import torch
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import DTensor, Replicate

    if dist.is_available() and not dist.is_initialized():
        dist.init_process_group(
            backend="gloo",
            store=dist.HashStore(),
            rank=0,
            world_size=1,
        )

    mesh = init_device_mesh("cpu", (1,))
    local = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float16)
    dt = DTensor.from_local(local, mesh, [Replicate()])

    assert get_torch_storage_size(dt) == 5 * 2

(written with the help of pytorch documentation and Claude)

@S1ro1 (Member Author) commented on May 6, 2025

Yes, I'll add something similar, sure. I suppose testing multi-process (Shard()) would also be nice; I'll see if I can throw something together with subprocess and torchrun.
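A rough sketch of what such a multi-process test could look like (hypothetical, not part of the PR; it assumes get_torch_storage_size is importable from huggingface_hub and reports the full-tensor size as described above). The worker source has to be written out and launched through torch.distributed.run, which is exactly the extra machinery being weighed here:

import subprocess
import sys
import tempfile
import textwrap

# Hypothetical worker script: each rank holds a 4-element fp16 shard of the global tensor.
WORKER_SRC = textwrap.dedent(
    """
    import torch
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import DTensor, Shard

    from huggingface_hub import get_torch_storage_size

    dist.init_process_group(backend="gloo")
    mesh = init_device_mesh("cpu", (dist.get_world_size(),))
    local = torch.ones(4, dtype=torch.float16)
    dt = DTensor.from_local(local, mesh, [Shard(0)])
    # Full tensor: world_size * 4 fp16 elements, 2 bytes each.
    assert get_torch_storage_size(dt) == dist.get_world_size() * 4 * 2
    dist.destroy_process_group()
    """
)

def run_sharded_storage_size_check(nproc: int = 2) -> None:
    # Write the worker to a temp file and launch it with torchrun (torch.distributed.run).
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(WORKER_SRC)
        script = f.name
    subprocess.run(
        [sys.executable, "-m", "torch.distributed.run", f"--nproc_per_node={nproc}", script],
        check=True,
    )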

@S1ro1 S1ro1 force-pushed the transformers-save-dtensor branch from 2768611 to 94abfb4 Compare May 9, 2025 14:24
@S1ro1 (Member Author) commented on May 9, 2025

@hanouticelina I've added the test as suggested. It's probably not worth adding more complex test cases, as those would require torch.distributed.run, which would mean writing the test source as a string and running it with subprocess; IMO that's not worth it.
LMK your thoughts; apart from this, it should be good to merge now.

Test failures seem to be unrelated.

@S1ro1 S1ro1 force-pushed the transformers-save-dtensor branch from a9589e1 to f2f660c Compare May 9, 2025 14:34