This repository was archived by the owner on Mar 27, 2024. It is now read-only.

Use top_level.txt when analyzing pip modules #291

Merged
merged 1 commit into from
Jan 24, 2019

@nkubala (Contributor) commented Jan 23, 2019

Many egg modules ship a top_level.txt file in their metadata directory, which lists the top-level packages and modules that the distribution installs. Often the name of the egg module doesn't match the name of the directory containing the actual contents (e.g. a module named PyYAML, with its contents in a directory called yaml), so using this file is much more reliable than simply attempting to string match the directory. Additionally, this file makes the computed size of a package much more accurate, especially when the package implicitly includes other dependencies.
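To make the idea concrete, here is a minimal sketch of reading top_level.txt and mapping each entry to a candidate path under site-packages. It is not the code from this PR: `parseTopLevel` and `topLevelPaths` are hypothetical names, and the sketch assumes the file holds one top-level name per line.

```go
package main

import (
	"bufio"
	"os"
	"path/filepath"
	"strings"
)

// parseTopLevel returns the non-empty, trimmed lines of a top_level.txt
// file. Assumption: one top-level package or module name per line.
func parseTopLevel(content string) []string {
	var names []string
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		if name := strings.TrimSpace(scanner.Text()); name != "" {
			names = append(names, name)
		}
	}
	return names
}

// topLevelPaths (hypothetical helper) reads <metadataDir>/top_level.txt and
// joins each listed name onto sitePackages. A name may correspond to a
// directory (a package) or to a single file such as name.py, so callers
// still need to check what actually exists on disk.
func topLevelPaths(sitePackages, metadataDir string) ([]string, error) {
	data, err := os.ReadFile(filepath.Join(metadataDir, "top_level.txt"))
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, name := range parseTopLevel(string(data)) {
		paths = append(paths, filepath.Join(sitePackages, name))
	}
	return paths, nil
}
```

If top_level.txt is missing, `topLevelPaths` returns the read error, which leaves room for a caller to fall back to directory-name matching.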

Partially addresses #281

// Retrieves size for actual package/script corresponding to each dist-info metadata directory
// by taking the file entry alphabetically before it (for a package) or after it (for a script)
// First, try and use the "top_level.txt",
// Many egg packages contain a "top_level.txt" file describing the directories containing the

@weakcamel weakcamel Jan 24, 2019

https://setuptools.readthedocs.io/en/latest/formats.html

The minimum project metadata that all eggs must have is a standard Python PKG-INFO file, named PKG-INFO and placed within the metadata directory appropriate to the format.
...
In addition to the PKG-INFO file, an egg’s metadata directory may also include files and directories representing various forms of optional standard metadata ...

And
https://www.python.org/dev/peps/pep-0427/#the-dist-info-directory

  1. Wheel .dist-info directories include at a minimum METADATA, WHEEL, and RECORD.
  2. METADATA is the package metadata, the same format as PKG-INFO as found at the root of sdists.

So from the sound of it, it's either one or the other.
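As a rough illustration of the "one or the other" point, the sketch below picks the metadata file by directory suffix and pulls the `Name` header out of its contents. The function names are hypothetical, and it assumes the header block ends at the first blank line and that `Name:` appears on a single line.

```go
package main

import (
	"bufio"
	"strings"
)

// metadataFileFor (hypothetical helper) returns the metadata file expected
// in a directory, based on its suffix: METADATA for wheels (.dist-info),
// PKG-INFO for eggs (.egg-info). It returns "" for anything else.
func metadataFileFor(dir string) string {
	switch {
	case strings.HasSuffix(dir, ".dist-info"):
		return "METADATA"
	case strings.HasSuffix(dir, ".egg-info"):
		return "PKG-INFO"
	}
	return ""
}

// packageName returns the value of the "Name" header from the contents of
// a PKG-INFO or METADATA file, or "" if the header block has no such field.
func packageName(metadata string) string {
	scanner := bufio.NewScanner(strings.NewReader(metadata))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.TrimSpace(line) == "" {
			break // headers end at the first blank line
		}
		if strings.HasPrefix(line, "Name:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "Name:"))
		}
	}
	return ""
}
```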

Contributor Author

nice, these could definitely be useful for getting the package name. however, I don't see a list of dependencies anywhere in the METADATA files. that said, I do see what looks like a complete list of files in RECORD. this could be useful for wheels, but for eggs PKG-INFO still doesn't contain a list of dependencies. I think trying the top_level.txt is still the right way to go here.
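For the wheel case, a sketch of how RECORD could be used to size a package: each row is CSV in the form path,hash,size, so the sizes can simply be summed. `recordSize` is a hypothetical name, not something in this PR, and the sketch assumes rows with an empty size (such as the entry for RECORD itself) should be skipped.

```go
package main

import (
	"encoding/csv"
	"strconv"
	"strings"
)

// recordSize sums the size column of a wheel RECORD file, given its
// contents as a string. Rows with fewer than three fields or an empty
// size are skipped rather than treated as errors.
func recordSize(record string) (int64, error) {
	reader := csv.NewReader(strings.NewReader(record))
	reader.FieldsPerRecord = -1 // tolerate rows with a varying field count
	rows, err := reader.ReadAll()
	if err != nil {
		return 0, err
	}
	var total int64
	for _, row := range rows {
		if len(row) < 3 || row[2] == "" {
			continue
		}
		size, err := strconv.ParseInt(row[2], 10, 64)
		if err != nil {
			return 0, err
		}
		total += size
	}
	return total, nil
}
```

This only helps for .dist-info directories; eggs have no RECORD, so top_level.txt remains the fallback there.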

Yeah, eggs are pretty oldschool/inconsistent/a bit horrid.
No worries, maybe not as bulletproof, but reading top_level.txt is still an improvement and as you say, if it gives you extra information needed - it sounds like a good pragmatic choice.

@nkubala nkubala merged commit f6af597 into GoogleContainerTools:master Jan 24, 2019