IndexFile.diff(None) returns empty after init -> add -> write -> read sequence on a new repository #2025

Open
ElJaviLuki opened this issue May 11, 2025 · 1 comment

ElJaviLuki commented May 11, 2025

Environment:

  • GitPython version: 3.1.44
  • Git version: git version 2.42.0.windows.2
  • Python version: 3.12.0
  • Operating System: Windows 11 Pro 24H2 26100.3775

Description:
When initializing a new repository, adding a file to the index, writing the index to disk, and then explicitly reading the index back, a subsequent call to repo.index.diff(None) incorrectly returns an empty DiffIndex (an empty list). This occurs even though an external git status --porcelain command correctly shows the file as added to the index (status 'A').

This suggests that the in-memory state of the IndexFile object is not correctly reflecting the on-disk state for the diff(None) operation under these specific circumstances, even after an explicit repo.index.read().

Steps to Reproduce:

import os
import tempfile
import shutil
from git import Repo, IndexFile

# Setup a temporary directory for the new repository
repo_dir = tempfile.mkdtemp(prefix="test_gitpython_index_issue_")
try:
    # 1. Initialize a new repository
    repo = Repo.init(repo_dir)
    print(f"Repository initialized at: {repo_dir}")
    print(f"Is bare: {repo.bare}") # Should be False

    # 2. Create and add a new file (.gitkeep in this example)
    gitkeep_path = os.path.join(repo.working_tree_dir, ".gitkeep")
    with open(gitkeep_path, 'w') as f:
        f.write("# Initial file\n")
    print(f".gitkeep created at: {gitkeep_path}")

    index = repo.index
    index.add([".gitkeep"]) # Relative path to repo root
    print("Added '.gitkeep' to index object in memory.")

    # 3. Write the index to disk
    index.write()
    print(f"Index written to disk at: {index.path}")
    assert os.path.exists(index.path), "Index file should exist on disk"

    # 4. (Optional but good for verification) Check with external git status
    status_output = repo.git.status(porcelain=True)
    print(f"git status --porcelain output: '{status_output}'")
    # Must be 'A ' after add+write; accepting '??' here would hide a failed add.
    assert "A  .gitkeep" in status_output, f"Expected '.gitkeep' staged as 'A ', got: {status_output!r}"

    # 5. Explicitly re-read the index (or create a new IndexFile instance)
    #    This step is crucial to the bug demonstration.
    index.read() # Force re-read of the IndexFile instance
    # Alternatively: index = IndexFile(repo) # Create new instance, should also read from disk
    print(f"Index explicitly re-read. Number of entries: {len(index.entries)}")
    assert len(index.entries) > 0, "Index should have entries after add/write/read"
    
    # 6. Perform a diff of the index against an empty tree (None)
    # This simulates what happens before an initial commit to see staged changes.
    diff_against_empty_tree = index.diff(None) 
    print(f"index.diff(None) result: {diff_against_empty_tree}")
    print(f"Type of result: {type(diff_against_empty_tree)}")
    for item_diff in diff_against_empty_tree:
        print(f"  Diff item: a_path={item_diff.a_path}, b_path={item_diff.b_path}, change_type={item_diff.change_type}, new_file={item_diff.new_file}")


    # Expected behavior:
    # index.diff(None) should return a DiffIndex containing one Diff object
    # representing the newly added '.gitkeep' file (change_type 'A').
    assert len(diff_against_empty_tree) == 1, \
        f"Expected 1 diff item, got {len(diff_against_empty_tree)}. Entries: {index.entries}"
    diff_item = diff_against_empty_tree[0]
    assert diff_item.change_type == 'A', \
        f"Expected change_type 'A', got '{diff_item.change_type}'"
    assert diff_item.b_path == ".gitkeep", \
        f"Expected b_path '.gitkeep', got '{diff_item.b_path}'"

except Exception as e:
    print(f"An error occurred: {e}")
    raise
finally:
    # Clean up the temporary directory
    # shutil.rmtree(repo_dir)
    # print(f"Cleaned up temp directory: {repo_dir}")
    pass

# To run this reproducer:
# 1. Save as a .py file.
# 2. Ensure GitPython is installed.
# 3. Run `python your_file_name.py`

Actual Behavior:
repo.index.diff(None) returns an empty DiffIndex (i.e., []).

Expected Behavior:
repo.index.diff(None) should return a DiffIndex containing one Diff object for .gitkeep with change_type='A', new_file=True, a_path=None, and b_path='.gitkeep'.

Additional Context:

  • This issue prevents correctly determining staged changes for an initial commit using index.diff(None).
  • The index.entries dictionary does seem to reflect the added file correctly after index.read().
  • The repo.git.status(porcelain=True) command correctly shows the file as staged for addition (A .gitkeep).
  • The problem seems specific to how IndexFile.diff(None) interprets the IndexFile's state after this sequence of operations in a new repository before the first commit. Diffing against HEAD (once a commit exists) or other trees might behave differently.

Byron commented May 12, 2025

Thanks a lot for reporting, as well as the exhaustive description with a reproducer.

It appears that index.diff(None) doesn't make a call to Git, or else I'd expect it to pick up the change. However, I also don't recall ever having implemented diffing itself in GitPython, and even if that were the case, the test proves that the index is up-to-date in memory.

A possible fix could add a (possibly modified) version of the reproducer above in a first commit, followed by the actual fix in the next one.
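
To cross-check the in-memory result against Git itself, here is a minimal sketch that asks Git for the staged changes through repo.git, the same passthrough the reproducer uses for git status. It assumes that git diff --cached on a repository without commits reports every staged path as added, as the git-diff documentation describes for an unborn HEAD. The helper names parse_name_status and staged_changes are hypothetical and not part of GitPython. Note also that GitPython's API documentation describes other=None in diff() as a comparison against the working tree rather than an empty tree; if that applies here, an empty result would be expected whenever the working tree matches the index.

```python
from typing import List, Tuple


def parse_name_status(output: str) -> List[Tuple[str, str]]:
    """Parse `git diff --name-status` output into (status, path) pairs."""
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line is "<status>\t<path>". Renames and copies carry two
        # tab-separated paths, which this sketch keeps together in `path`.
        status, _, path = line.partition("\t")
        changes.append((status, path))
    return changes


def staged_changes(repo_dir: str) -> List[Tuple[str, str]]:
    """List staged changes by asking Git directly (hypothetical helper)."""
    # Imported lazily so the parser above can be used without GitPython.
    from git import Repo

    repo = Repo(repo_dir)
    # Assumption: with no commits yet, `git diff --cached` compares the
    # index against an empty tree and lists each staged path as added.
    return parse_name_status(repo.git.diff("--cached", "--name-status"))
```

For the reproducer above, staged_changes(repo_dir) would be expected to report .gitkeep with status A, in line with the git status --porcelain output, which would make it usable as a stopgap for detecting staged changes before the first commit.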
