-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reconciliation on Endpoints
's subset changes
#1674
Conversation
Endpoints
updatesEndpoints
's subset changes
d0bc6f7
to
f3c225e
Compare
The idea was to still trigger reconciliation cycles if resources changes by checking if there was a `generation` update. https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#metadata > generation: a sequence number representing a specific generation of the desired state. Set by the system and monotonically increasing, per-resource. May be compared, such as for RAW and WAW consistency. However, the change caused a degradation in `Endpoints` resources. These resources don't have generation because they don't have a status. As a result, a reconciliation was only triggered on creation and deletion, but not on subset update. This degradated also the behavior of the k8gb controller. The controller became too slow monitoring the health of applications since `Endpoints` updates no longer triggered a reconcile of the GSLB resource. This lead to slow failovers and failing e2e tests. The solution proposed in this PR is to always trigger the `endpointMapHandler` function when an `Endpoint` resource changes. This is the same behavior as before k8gb-io#1652. Signed-off-by: Andre Baptista Aguas <[email protected]>
@@ -169,7 +169,6 @@ func (r *GslbReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl. | |||
return result.RequeueError(fmt.Errorf("getting GSLB servers (%s)", err)) | |||
} | |||
gslb.Status.Servers = servers | |||
fmt.Printf("got servers: %v\n", servers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug statement that was forgotten from a previous commit
@@ -61,6 +61,7 @@ func (a *logrSinkAdapter) Info(_ int, msg string, keysAndValues ...interface{}) | |||
a.WithValues(keysAndValues) | |||
if a.name != "" { | |||
a.z.Info().Msgf("%s: %s %s", a.name, msg, a.valuesAsJSON()) | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lack of this return was leading to duplicated log lines
@@ -27,8 +27,6 @@ import ( | |||
) | |||
|
|||
func TestWeightsExistsInLocalDNSEndpoint(t *testing.T) { | |||
t.Parallel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were some random failures in this test, I had to remove parallelism to have it stable. One cluster was taking too long discovering the DNS entries of the other cluster, but I am not sure why.
Before:
--- FAIL: TestWeightsExistsInLocalDNSEndpoint (337.85s)
--- FAIL: TestWeightsExistsInLocalDNSEndpoint/with_gslb_only (245.38s)
--- PASS: TestWeightsExistsInLocalDNSEndpoint/with_ref (92.48s)
After:
--- PASS: TestWeightsExistsInLocalDNSEndpoint (195.55s)
--- PASS: TestWeightsExistsInLocalDNSEndpoint/with_gslb_only (128.23s)
--- PASS: TestWeightsExistsInLocalDNSEndpoint/with_ref (67.32s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks, I'm glad you found the issue. I'm a bit worried that in the future maybe the tests will crash again at a different point and parallelism will be the cause. Anyway, thanks and happy to see the correction.
Hopefully when we try to revamp the tests we can migrate them one by one to the new framework and can get to the bottom of these parallelism issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch
The idea was to still trigger reconciliation cycles if resources changes by checking if there was a
generation
update. https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#metadataHowever, the change caused a degradation watching
Endpoints
resources. These resources don't have ageneration
because they don't have astatus
. As a result, a reconciliation was only triggered on creation and deletion, but not on subset update. This caused the controller to become too slow reacting toEndpoints
changes, thus leading to slow failovers and failing e2e tests if the GSLB reconcile timer didn't fall during the expected window.The solution proposed in this PR is to always trigger the
endpointMapHandler
function when anEndpoint
resource changes. This is the same behavior as before #1652.In addition, the bind image
internetsystemsconsortium/bind9:9.16
pulled from theedgedns
cluster was no longer available, so it is updated to version9.18
: