Skip to content

build,cache: support remote cache sources ( via --cache-to, --cache-from ) and introduce cacheKey/layerKey #4118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

flouthoc
Copy link
Collaborator

@flouthoc flouthoc commented Jul 18, 2022

Following commit

  • Initiates cacheKey or layerKey for intermediate images generated
    for layers. ( only used when working with remote sources )
  • Allows end users to upload cached layers with cacheKey to remote
    sources using --cache-to. --cache-to is a optional flag to be used
    with buildah build which publishes cached layers to remote sources.
  • Allows end users to use cached layers from remote sources with
    --cache-from. --cache-from is a optional flag to be used with
    buildah build and it pulls cached layers from remote sources in a step
    by step manner only if is a valid cache hit.

Example

  • Populate cache source or use cached layers if already present
buildah build -t test --layers --cache-to registry/myrepo/cache --cache-from registry/myrepo/cache .

Future Improvements:

  • Also support pushing cache to other storage units rather than conventional container registry like object storage ( AWS S3, Redis, Minio etc )
  • cacheKey or layerKey model is only being used when working with
    remote sources however local cache lookup can be also optimized if its
    is altered to use cacheKey model instead of iterating through all the
    images in local storage. As discussed here: LayerKey model for the local cache #3406

References:

Closes: #620

Does this PR introduce a user-facing change?

build: adds supports for --cache-to i.e alllow pushing cache layers to remote source
build: adds support for --cache-from i.e allow using cache layers from remote sources

@flouthoc flouthoc force-pushed the support-remote-cache branch 4 times, most recently from a839f96 to e956848 Compare July 18, 2022 06:43
@flouthoc
Copy link
Collaborator Author

@nalind PTAL

@flouthoc flouthoc requested review from nalind and rhatdan July 18, 2022 07:46
Copy link
Member

@nalind nalind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks generally good.
We need to be very precise in use of terminology in documentation. People already get confused about the difference between containers and images when they're starting out with containers, and we don't want to muddy things any further.
Question: when we have a local cache hit, why don't we try to push to a --cache-to location? The image which counted as a cache hit may not have come from there.

@flouthoc flouthoc force-pushed the support-remote-cache branch 2 times, most recently from 5d52fbf to 7408074 Compare July 19, 2022 06:13
@flouthoc
Copy link
Collaborator Author

flouthoc commented Jul 19, 2022

Question: when we have a local cache hit, why don't we try to push to a --cache-to location?

@nalind My initial idea was to save resources by only pushing for steps which are just built and not which are already present in local storage. But upon thinking again I think it will be right to push everytime except for the case when intermediate image was just pulled for the step.

So amended source and test case to match:

  • If --cache-to is specified always push intermediate image expect for the case when intermediate is just pulled for the STEP which is currently being evaluated.

@flouthoc flouthoc requested a review from nalind July 19, 2022 06:33
@mtrmac
Copy link
Contributor

mtrmac commented Jul 19, 2022

Also support pushing cache to other storage units rather than conventional container registry like object storage ( AWS S3, Redis, Minio etc )

What do you anticipate the API, and UI, to look like? Entirely new options, or somehow overloading the semantics of the existing input strings (and if so, how was it designed to avoid conflicts?)

Copy link
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m afraid this is a rather sloppy review, I didn’t quite take the time to fully understand the extensive logic dealing with various alternative possibilities.

Highlights:

  • Should the cache locations be c/image generic names at all, or strictly registry repositories?
  • Don’t use reportWriter output to make decisions.

@flouthoc
Copy link
Collaborator Author

flouthoc commented Jul 19, 2022

Also support pushing cache to other storage units rather than conventional container registry like object storage ( AWS S3, Redis, Minio etc )

What do you anticipate the API, and UI, to look like? Entirely new options, or somehow overloading the semantics of the existing input strings (and if so, how was it designed to avoid conflicts?)

@mtrmac I am not sure about actual implementation of this yet but I just wrote it cause it looked like a nice idea to be able to allow buildah to distribute cache to other storage backends like redis, s3 etc where is it more easier to control eviction of least used cache, scaling etc. At first instance i think implementation of new transports at c/image which can encapsulate image and put/get from redis, s3 etc but this is just an idea of now.

@flouthoc flouthoc force-pushed the support-remote-cache branch from 7408074 to 04ee66c Compare July 20, 2022 08:11
@flouthoc
Copy link
Collaborator Author

Don’t use reportWriter output to make decisions.

New push is ensures we are not using this anymore. Still some comments are pending but I hope in another push we can get them resolved.

// is ignored and will be automatically logged for --log-level debug
if id, err := s.pullCache(ctx, src); id != "" && err == nil {
logCachePulled(src)
cacheID, err = s.intermediateImageExists(ctx, node, addedContentSummary, s.stepRequiresLayer(step))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW it might be worthwhile to extract the “does $candidateImage match the current image and next step” logic from intermediateImageExists into a separate function, and here call it only with the just-pulled image.

intermediateImageExists seems to do several a few c/storage operations (at least the .Layer call is not cached), each with its own locking syscalls and related overhead, so doing an O(1) check instead of an O(existing images) check would be nice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Refactorings like this would be best done in separate commits, or maybe deferred to a separate PR.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will tackle this in a different PR.

flouthoc added a commit to flouthoc/buildah that referenced this pull request Jul 21, 2022
There is almost no reason for us to run through cache iteration twice
just for `COPY` and `ADD` instruction where we can sure that it will
always miss the first pass so optimize cache processing by making sure
that we populate `contentSummary` in the right way so we always end up
hitting cache in first pass.

This should ensure
* We always hit cache in the first pass and so second attempt should be
  made

See discussion here:
* containers#4118 (comment)
* containers#4118 (comment)

Signed-off-by: Aditya R <[email protected]>
@flouthoc flouthoc force-pushed the support-remote-cache branch from 8220053 to bb62b23 Compare July 25, 2022 20:14
@flouthoc flouthoc requested a review from mtrmac July 25, 2022 20:15
@flouthoc
Copy link
Collaborator Author

Pushed updated tests for ADD and COPY case.

Following commit

* Initiates `cacheKey` or `layerKey` for intermediate images generated
  for layers.
* Allows end users to upload cached layers with `cacheKey` to remote
  sources using `--cache-to`. `--cache-to` is a optional flag to be used
with `buildah build` which publishes cached layers to remote sources.
* Allows end users to use cached layers from `remote` sources with
  `--cache-from`. `--cache-from` is a optional flag to be used with
`buildah build` and it pulls cached layers from remote sources in a step
by step manner only if is a valid cache hit.

Example
* Populate cache source or use cached layers if already present
```bash
buildah build -t test --layers --cache-to registry/myrepo/cache --cache-from registry/myrepo/cache .
```

Future:
* `cacheKey` or `layerKey` model is only being used when working with
  remote sources however local cache lookup can be also optimized if its
is altered to use `cacheKey` model instead of iterating through all the
images in local storage. As discussed here

References:
* Feature is quite similar to `kaniko`'s `--cache-repo`: https://github.com/GoogleContainerTools/kaniko#--cache-repo

Closes: issues#620

Signed-off-by: Aditya R <[email protected]>
@flouthoc flouthoc force-pushed the support-remote-cache branch from bb62b23 to 6f660bd Compare July 25, 2022 20:19
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 25, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc, mtrmac, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mtrmac
Copy link
Contributor

mtrmac commented Jul 25, 2022

/lgtm

Thanks!

@rhatdan
Copy link
Member

rhatdan commented Jul 25, 2022

/hold

@TomSweeneyRedHat
Copy link
Member

Happy green test buttons @rhatdan. I'm not going to cancel the hold as I'm not sure if you were holding on tests, or holding until release with this one.

@rhatdan
Copy link
Member

rhatdan commented Jul 26, 2022

/hold cancel

@mukulk2020
Copy link

mukulk2020 commented Jan 2, 2023

@flouthoc : How can i install latest version of buildah in my centos machine so that i can use these parameters "--cache-to, --cache-from" along with buildah bud command . I checked the install.md (https://github.com/containers/buildah/blob/main/install.md) but this will install older version of buildah i.e. 1.11.xxx.
image

Please help me on this so that i can run below command to achieve cache .

Exam : buildah bud -t cache-test --layers --cache-to mukulk/cache-test --cache-from mukulk/cache-test .

@flouthoc
Copy link
Collaborator Author

flouthoc commented Jan 2, 2023

@mukulk2020 Thanks for reaching out it seems package on centos has not been updated for a while, I think @lsm5 and @rhatdan can confirm about who maintains it and who can update it.

@mukulk2020
Copy link

@lsm5 @rhatdan Please check and update the docs and code accordingly..

@flouthoc
Copy link
Collaborator Author

flouthoc commented Jan 2, 2023

I don't think @lsm5 or @rhatdan actually maintains packages on centos but they can guide or tag the right person who can do it. :)

@lsm5
Copy link
Member

lsm5 commented Jan 3, 2023

@mukulk2020 your screenshot tells me you're on CentOS 7. We're not doing updates for CentOS 7 anymore. Please consider switching to centos stream 8 / 9.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

buildah bud --cache-from functionality needed
8 participants