Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[doc]: High frequency counter HLD #1795

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Conversation

Pterosaur
Copy link
Contributor

@Pterosaur Pterosaur commented Sep 1, 2024

High frequency counter HLD initial version

@Pterosaur Pterosaur force-pushed the hfc branch 3 times, most recently from bc0e247 to c14e643 Compare September 3, 2024 06:38
### Architecture Design

This section covers the changes that are required in the SONiC architecture. In general, it is expected that the current architecture is not changed.
This section should explain how the new feature/enhancement (module/sub-module) fits in the existing architecture.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. these "guidance" words are better to be removed. very confusing to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, Yes, there are many template "guidance" words I haven't removed them.


#### Modules ####

##### Netlink Module #####
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know the numbered prefix are not there in the template, but the doc will be much more readable with it. we should consider add them here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zhangyanzhao , Do you have any suggestions or concerns about adding a number prefix to each title in the template?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zhangyanzhao , Do you have any suggestions or concerns about adding a number prefix to each title in the template?

Good suggestion. Updated the template. Github markdown does not support auto number yet, need community member to update it manually when creating HLD doc with the template.


Pin CPU?

![netlink_dma_channel](netlink_dma_channel.drawio.svg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we still uses the chunk size concept in our API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or leverage the existing solution, time-based reporting?

Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Ze Gan <[email protected]>
@Pterosaur Pterosaur force-pushed the hfc branch 8 times, most recently from 290f7f9 to 0604b27 Compare September 9, 2024 12:33
@Pterosaur Pterosaur requested a review from r12f September 9, 2024 12:33
@Pterosaur Pterosaur marked this pull request as ready for review September 9, 2024 12:33
@Pterosaur Pterosaur requested a review from lguohan September 9, 2024 12:34
Signed-off-by: Ze Gan <[email protected]>

- For high-frequency counters, the native IPFIX timestamp unit of seconds is insufficient. Therefore, we introduce an additional element, `observationTimeNanoseconds`, for each record to meet our requirements.
- The enterprise bit is always set to 1 for stats records.
- The element ID of IPFIX is derived from the object index. For example, for `Ethernet5`, the element ID will be `0x5 | 0x8000 = 0x8005`.
Copy link
Contributor

@marian-pritsak marian-pritsak Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a variable element ID per SAI object will blow up the number of templates in the report (separate template ID per port, queue etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, I assume all stats(per port, queue etc) in one profile will be encoded into ONE template. I will generate a unique template ID for a profile.


### Data format

We will use IPFIX as the report format, with all numbers in the IPFIX message in network-order (Big-endian).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot make the assumption that data will be Big- or Little-endian. If it is collected by the ASIC and DMAd to a ring buffer, then kernel driver would need to correct the endianness of each data set. We need to have a bit in the element ID or Enterprise number that would indicate endianness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you worried that endian conversion will consume too many CPU clocks?

```
STREAM_TELEMETRY_GROUP|{{profile_name}}|{{group_name}}
"object_names": {{list of object name}}
"object_counters": {{list of stats of object}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you plan to handle the update of the counter list or object list from the report POV? is it going to be a new template ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I assume we will use a new template ID.
We treat any configuration update as a new profile. The previous session will be interrupted. The SAI report object will get a new SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID

Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Ze Gan <[email protected]>
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

linux-foundation-easycla bot commented Jan 15, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

Pterosaur and others added 2 commits January 15, 2025 21:29
Signed-off-by: Ze Gan <[email protected]>

Move SAI section to OCP reference

Signed-off-by: Ze Gan <[email protected]>
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@Pterosaur
Copy link
Contributor Author

Hi @r12f , Please help to review or approve it if everything is good to you

@r12f
Copy link
Contributor

r12f commented Jan 21, 2025

Looks good! But somehow I could not approve this due to a GitHub bug...

Copy link
Contributor

@r12f r12f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the approval works now. There is an old comment somehow getting stuck and blocking the sign off. It works now.

@@ -0,0 +1,647 @@
# Stream telemetry high level design <!-- omit in toc -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Stream telemetry" is a term already used by sonic gnmi based telemetry implementation. Suggest you replace it with another term like "high frequency counters telemetry" (just an example).

@Pterosaur Pterosaur changed the title [doc]: Stream Telemetry HLD [doc]: High frequency counter telemetry HLD Mar 12, 2025
@Pterosaur Pterosaur changed the title [doc]: High frequency counter telemetry HLD [doc]: High frequency counter HLD Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 📋 In Plan Features
Development

Successfully merging this pull request may close these issues.

6 participants