build,cache: support remote cache sources (via `--cache-to`, `--cache-from`) and introduce cacheKey/layerKey #4118
Conversation
Force-push: a839f96 → e956848 (compare)
@nalind PTAL
Looks generally good.
We need to be very precise in use of terminology in documentation. People already get confused about the difference between containers and images when they're starting out with containers, and we don't want to muddy things any further.
Question: when we have a local cache hit, why don't we try to push to a `--cache-to` location? The image which counted as a cache hit may not have come from there.
5d52fbf
to
7408074
Compare
@nalind My initial idea was to save resources by only pushing for steps which were just built, not for steps already present in local storage. But on reflection I think it is right to push every time, except for the case when the intermediate image was itself just pulled from the remote cache. I have amended the source and test case to match.
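The push policy described above can be sketched as a single predicate. This is an illustrative sketch under assumed names, not buildah's actual code; `shouldPushCache` and its parameters are hypothetical:

```go
package main

import "fmt"

// shouldPushCache sketches the policy: when --cache-to is set, push the
// step's intermediate image in every case (freshly built, or matched in
// local storage) except when that image was itself just pulled from the
// remote cache, where pushing it back would be redundant.
func shouldPushCache(cacheToSet, justPulledFromRemote bool) bool {
	if !cacheToSet {
		return false // no --cache-to destination configured
	}
	return !justPulledFromRemote
}

func main() {
	fmt.Println(shouldPushCache(true, false))  // built locally or local hit: push
	fmt.Println(shouldPushCache(true, true))   // just pulled from remote: skip
	fmt.Println(shouldPushCache(false, false)) // --cache-to not set: skip
}
```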
What do you anticipate the API, and UI, to look like? Entirely new options, or somehow overloading the semantics of the existing input strings (and if so, how was it designed to avoid conflicts)?
I’m afraid this is a rather sloppy review, I didn’t quite take the time to fully understand the extensive logic dealing with various alternative possibilities.
Highlights:
- Should the cache locations be c/image generic names at all, or strictly registry repositories?
- Don't use `reportWriter` output to make decisions.
@mtrmac I am not sure about the actual implementation of this yet, but I wrote it because it seemed like a nice idea to allow buildah to distribute cache to other storage backends like redis, s3, etc., where it is easier to control eviction of least-used cache, scaling, and so on. At first instance I think new transports could be implemented at the c/image level.
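The pluggable-backend idea floated here can be illustrated with a minimal store abstraction. Everything below (the `CacheStore` interface, the in-memory stand-in) is hypothetical, sketching how any backend that maps a cacheKey to a layer blob could serve as a cache source; it is not c/image or buildah API:

```go
package main

import (
	"errors"
	"fmt"
)

// CacheStore abstracts a remote cache backend (a registry repository,
// redis, s3, ...) as a cacheKey -> layer blob mapping.
type CacheStore interface {
	Put(cacheKey string, layer []byte) error
	Get(cacheKey string) ([]byte, error)
}

// memStore is a toy in-memory implementation standing in for a real backend.
type memStore struct{ m map[string][]byte }

func newMemStore() *memStore { return &memStore{m: map[string][]byte{}} }

func (s *memStore) Put(key string, layer []byte) error {
	s.m[key] = layer
	return nil
}

func (s *memStore) Get(key string) ([]byte, error) {
	layer, ok := s.m[key]
	if !ok {
		return nil, errors.New("cache miss")
	}
	return layer, nil
}

func main() {
	store := CacheStore(newMemStore())
	_ = store.Put("sha256:abc", []byte("layer-bytes"))
	if layer, err := store.Get("sha256:abc"); err == nil {
		fmt.Printf("hit: %d bytes\n", len(layer))
	}
	if _, err := store.Get("sha256:def"); err != nil {
		fmt.Println("miss")
	}
}
```

A backend behind such an interface could own its own eviction and scaling policy, which is the appeal of non-registry stores mentioned above.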
Force-push: 7408074 → 04ee66c (compare)
The new push ensures we are not using this anymore. Some comments are still pending, but I hope to get them resolved in another push.
```go
// is ignored and will be automatically logged for --log-level debug
if id, err := s.pullCache(ctx, src); id != "" && err == nil {
	logCachePulled(src)
	cacheID, err = s.intermediateImageExists(ctx, node, addedContentSummary, s.stepRequiresLayer(step))
```
BTW it might be worthwhile to extract the "does $candidateImage match the current image and next step" logic from `intermediateImageExists` into a separate function, and here call it only with the just-pulled image. `intermediateImageExists` seems to do a few c/storage operations (at least the `.Layer` call is not cached), each with its own locking syscalls and related overhead, so doing an O(1) check instead of an O(existing images) check would be nice.
(Refactorings like this would be best done in separate commits, or maybe deferred to a separate PR.)
Will tackle this in a different PR.
There is almost no reason for us to run through cache iteration twice just for `COPY` and `ADD` instructions, where we can be sure that the first pass will always miss, so optimize cache processing by making sure that we populate `contentSummary` in the right way so we always end up hitting the cache in the first pass. This should ensure:

* We always hit the cache in the first pass, so no second attempt should be made.

See discussion here:

* containers#4118 (comment)
* containers#4118 (comment)

Signed-off-by: Aditya R <[email protected]>
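The optimization in that commit can be sketched as follows, assuming a hypothetical key derivation (buildah's real cache key computation differs); the point is that folding the content digest into the key before the first lookup lets COPY/ADD steps hit on the first pass:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// cacheKey folds the digest of the copied content (the "contentSummary")
// into the cache key up front. If the content digest were only known
// after a first, guaranteed-miss pass, every COPY/ADD would force a
// second iteration over the cache; computing it before the first lookup
// lets the first pass hit.
func cacheKey(parentID, instruction, contentSummary string) string {
	sum := sha256.Sum256([]byte(parentID + "|" + instruction + "|" + contentSummary))
	return fmt.Sprintf("%x", sum)
}

func main() {
	// The same step with the same content always yields the same key...
	k1 := cacheKey("base", "COPY src/ /app/", "sha256:1111")
	k2 := cacheKey("base", "COPY src/ /app/", "sha256:1111")
	fmt.Println(k1 == k2)
	// ...while changed file contents change the key and miss the cache.
	k3 := cacheKey("base", "COPY src/ /app/", "sha256:2222")
	fmt.Println(k1 == k3)
}
```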
Force-push: 8220053 → bb62b23 (compare)
Pushed updated tests.
Following commit

* Initiates `cacheKey` or `layerKey` for intermediate images generated for layers.
* Allows end users to upload cached layers with `cacheKey` to remote sources using `--cache-to`. `--cache-to` is an optional flag to be used with `buildah build` which publishes cached layers to remote sources.
* Allows end users to use cached layers from `remote` sources with `--cache-from`. `--cache-from` is an optional flag to be used with `buildah build`, and it pulls cached layers from remote sources in a step-by-step manner only if there is a valid cache hit.

Example

* Populate the cache source, or use cached layers if already present:

```bash
buildah build -t test --layers --cache-to registry/myrepo/cache --cache-from registry/myrepo/cache .
```

Future:

* The `cacheKey` or `layerKey` model is only being used when working with remote sources; however, local cache lookup can also be optimized if it is altered to use the `cacheKey` model instead of iterating through all the images in local storage. As discussed here.

References:

* Feature is quite similar to `kaniko`'s `--cache-repo`: https://github.com/GoogleContainerTools/kaniko#--cache-repo

Closes: issues#620

Signed-off-by: Aditya R <[email protected]>
Force-push: bb62b23 → 6f660bd (compare)
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc, mtrmac, rhatdan

The full list of commands accepted by this bot can be found here. The pull request process is described here.
/lgtm Thanks!
/hold
Happy green test buttons @rhatdan. I'm not going to cancel the hold as I'm not sure if you were holding on tests, or holding until release with this one.
/hold cancel |
@flouthoc: How can I install the latest version of buildah on my CentOS machine so that I can use the `--cache-to` and `--cache-from` parameters with the buildah bud command? I checked install.md (https://github.com/containers/buildah/blob/main/install.md), but this installs an older version of buildah, i.e. 1.11.x. Please help me with this so that I can run the command below to use the cache. Example: buildah bud -t cache-test --layers --cache-to mukulk/cache-test --cache-from mukulk/cache-test .
@mukulk2020 Thanks for reaching out. It seems the package on CentOS has not been updated for a while; I think @lsm5 and @rhatdan can confirm who maintains it and who can update it.
@mukulk2020 your screenshot tells me you're on CentOS 7. We're not doing updates for CentOS 7 anymore. Please consider switching to CentOS Stream 8 / 9.
Following commit

* Initiates `cacheKey` or `layerKey` for intermediate images generated for layers. (only used when working with remote sources)
* Allows end users to upload cached layers with `cacheKey` to remote sources using `--cache-to`. `--cache-to` is an optional flag to be used with `buildah build` which publishes cached layers to remote sources.
* Allows end users to use cached layers from `remote` sources with `--cache-from`. `--cache-from` is an optional flag to be used with `buildah build`, and it pulls cached layers from remote sources in a step-by-step manner only if there is a valid cache hit.

Example

```bash
buildah build -t test --layers --cache-to registry/myrepo/cache --cache-from registry/myrepo/cache .
```

Future Improvements:

* The `cacheKey` or `layerKey` model is only being used when working with remote sources; however, local cache lookup can also be optimized if it is altered to use the `cacheKey` model instead of iterating through all the images in local storage. As discussed here: LayerKey model for the local cache #3406

References:

* Feature is quite similar to `kaniko`'s `--cache-repo`: https://github.com/GoogleContainerTools/kaniko#--cache-repo

Closes: #620
Does this PR introduce a user-facing change?