sinkv2 metrics cause a large increase in Kafka controller latency #8957

Closed
@zhaoli2333

Description

What did you do?

We use TiCDC 6.5.2 to sync data into our Kafka cluster. There are approximately 4000 topics in the Kafka cluster, and we created 50+ TiCDC jobs.

What did you expect to see?

All components run normally.

What did you see instead?

The latency (both produce and consume latency) of the Kafka controller increased immediately after we started the jobs.
We checked the authorizer log on the Kafka controller node and found a huge number of Topic Describe requests.
After more experiments, we found that every TiCDC job tried to describe all the topics in the Kafka cluster every 5 seconds, which overloaded the controller.

After checking the source code, we found an unnecessary operation that runs when sinkv2 generates metrics:

m.updateBrokers()

This call is meant to fetch broker info, but it also triggers unnecessary describe requests for all topics.

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

Upstream TiKV version (execute tikv-server --version):

(paste TiKV version here)

TiCDC version (execute cdc version):

v6.5.2

Metadata

    Labels

    affects-6.1: This bug affects the 6.1.x(LTS) versions.
    affects-6.5: This bug affects the 6.5.x(LTS) versions.
    affects-7.1: This bug affects the 7.1.x(LTS) versions.
    area/ticdc: Issues or PRs related to TiCDC.
    severity/major
    type/bug: The issue is confirmed as a bug.
