-
Notifications
You must be signed in to change notification settings - Fork 1.2k
⚠️ [Warm Replicas] Implement warm replica support for controllers. #3192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: godwinpang The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @godwinpang. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
5d1be84
to
5db61c7
Compare
/retest |
@@ -439,6 +439,11 @@ func (cm *controllerManager) Start(ctx context.Context) (err error) { | |||
return fmt.Errorf("failed to start other runnables: %w", err) | |||
} | |||
|
|||
// Start and wait for sources to start. | |||
if err := cm.runnables.Warmup.Start(cm.internalCtx); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should come right after the caches and the comment shouldn't make assumptions about what the Warmup internally does.
The other issue: This needs to block until the Warmup
has terminated, otherwise we may end up starting the controller before the sources are started, as the check we added only checks if we started to start the sources, not if we finished doing so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change to coming right after the caches, but don't completely understand the significance of having it after vs. before the non-leader election runnables; mind explaining a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to block until the Warmup has terminated, otherwise we may end up starting the controller before the sources are started, as the check we added only checks if we started to start the sources, not if we finished doing so
Is this a big problem? Replicas will only start the controller after they win leader election so I don't see an issue in the leader election failover case; are you saying that in non-leader election cases the behavior of warmup should be that it completely blocks controller startup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change to coming right after the caches, but don't completely understand the significance of having it after vs. before the non-leader election runnables; mind explaining a bit?
Just so it starts asap
Is this a big problem? Replicas will only start the controller after they win leader election so I don't see an issue in the leader election failover case; are you saying that in non-leader election cases the behavior of warmup should be that it completely blocks controller startup?
Yes. The expectation in the controller is that the sources are fully started before the rest of the controller starts. If we don't ensure this actually finished before we start the controller, we violate that expectation.
Additionally, there may be other cases where ppl implement this interface and it seems surprising to me that we wouldn't let the warmup finish before we start.
If we wanted to be really fancy, we could do this per-runnable (i.E. not block all runnables on Warmup
having finished, just the ones whose Warmup
didn't finish) but that is probably more involved.
5db61c7
to
be1b1c2
Compare
/retest |
/retest |
/retest |
/retest |
e05677a
to
1d07efc
Compare
/retest |
1 similar comment
/retest |
1c32d37
to
7cc29dc
Compare
/retest |
7cc29dc
to
43118a3
Compare
/retest |
1 similar comment
/retest |
@@ -93,6 +93,13 @@ type TypedOptions[request comparable] struct { | |||
// | |||
// Note: This flag is disabled by default until a future version. It's currently in beta. | |||
UsePriorityQueue *bool | |||
|
|||
// NeedWarmup specifies whether the controller should start its sources when the manager is not | |||
// the leader. When set to true, the controller will not restart its sources when/if it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain what reasons there might be to use this and the restart
bit is incorrect, it would be start
. Please also use the same explanation in pkg/config/controller
// Warmup implements the manager.WarmupRunnable interface. | ||
func (c *Controller[request]) Warmup(ctx context.Context) error { | ||
if c.NeedWarmup == nil || !*c.NeedWarmup { | ||
c.didEventSourcesFinishSync.Store(ptr.To(true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't finish sync, why are we storing true
in here?
@@ -1014,6 +1014,82 @@ var _ = Describe("controller", func() { | |||
}) | |||
}) | |||
}) | |||
|
|||
Describe("Warmup", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test where Warmup
is called but NeedsWarmup
is disabled that validates that nothing happens in Warmup
and that the sources do get started when Start
is called
|
||
// WaitForWarmupComplete is a blocking function that waits for the warmup to be completed. It | ||
// returns false if it could not successfully finish warmup. | ||
WaitForWarmupComplete(context.Context) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand, why can't we instead wait for Warmup
to return for this check?
This change implements the proposal for warm replicas as proposed in #3121.
It adds a
NeedWarmUp
option for controllers to optionally start as warmed replicas.Note for reviewers: This draft PR is feature complete with tests but the main purpose is to make sure that things are on the right track. Some of the naming / comments are a bit inconsistent, I will address them in a followup cleanup tomorrow.
Builds upon #3190.