Description:
An error occurs when setIntfIp (in intfmgrd) attempts to assign an IP address to a PortChannel before the PortChannel's initialization is complete. This indicates a race condition between swss (where intfmgrd assigns the IP) and teamd (where teammgrd creates the PortChannel).
Observed Behavior:
In the logs, we see that setIntfIp fails when trying to assign an IP to PortChannel108:
2025 Feb 25 04:45:11.722046 arc-switch1004 ERR swss#intfmgrd: :- setIntfIp: Command '/sbin/ip address "add" "10.0.0.70/31" dev "PortChannel108"' failed with rc 2
Subsequent log entries confirm that PortChannel108 had not been fully initialized when setIntfIp was executed:
2025 Feb 25 04:45:11.889335 arc-switch1004 INFO teamd#supervisord: teammgrd Using team device "PortChannel108".
2025 Feb 25 04:45:12.031833 arc-switch1004 WARNING teamd#tlm_teamd: :- try_add_lag: Can't connect to teamd LAG='PortChannel108', error='No such file or directory'. attempt=1
These entries, logged roughly 170 ms and 310 ms after the failure, indicate that PortChannel108 was still being created when setIntfIp was executed.
Later entries show that PortChannel108 was fully initialized only after setIntfIp had already failed:
2025 Feb 25 04:45:12.166107 arc-switch1004 NOTICE teamd#teammgrd: :- addLag: Start port channel PortChannel108 with teamd
2025 Feb 25 04:45:12.166314 arc-switch1004 NOTICE teamd#teammgrd: :- setLagAdminStatus: Set port channel PortChannel108 admin status to up
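To make the ordering explicit, the timestamps from the log lines above can be turned into offsets relative to the setIntfIp failure. The following is only an illustrative sketch: the event labels and the helper name `offsets_from_first` are made up for this example, and the timestamps are copied verbatim from the logs.

```python
from datetime import datetime

# Syslog timestamp layout used in the log excerpts (no timezone field).
TS_FORMAT = "%Y %b %d %H:%M:%S.%f"

# Timestamps copied from the log lines in this report.
# The labels are informal descriptions, not identifiers from SONiC.
EVENTS = {
    "setIntfIp failed (intfmgrd)": "2025 Feb 25 04:45:11.722046",
    "Using team device (supervisord)": "2025 Feb 25 04:45:11.889335",
    "try_add_lag attempt=1 (tlm_teamd)": "2025 Feb 25 04:45:12.031833",
    "addLag (teammgrd)": "2025 Feb 25 04:45:12.166107",
    "setLagAdminStatus up (teammgrd)": "2025 Feb 25 04:45:12.166314",
}


def offsets_from_first(events):
    """Return each event's offset in seconds from the earliest event."""
    parsed = {name: datetime.strptime(ts, TS_FORMAT) for name, ts in events.items()}
    start = min(parsed.values())
    return {name: (when - start).total_seconds() for name, when in parsed.items()}


offsets = offsets_from_first(EVENTS)
for name, seconds in sorted(offsets.items(), key=lambda item: item[1]):
    print(f"+{seconds:.6f} s  {name}")
```

With these inputs, the addLag entry lands about 0.444 s after the failed `ip address add`, which matches the ordering described above.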
Root Cause:
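There is a timing issue where setIntfIp executes before teamd has fully initialized the PortChannel. In the logs above, the failed `ip address add` was logged at 04:45:11.722046, about 444 ms before teammgrd logged addLag for PortChannel108 at 04:45:12.166107.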
Reproduction Frequency:
This issue reproduces very rarely. It was observed only once in our setup (SN2700), on a system with a low-performance CPU, which suggests that timing variations caused by system performance may contribute to the issue.
Expected Behavior:
setIntfIp should only attempt to assign an IP address after the PortChannel has been fully initialized by teamd.
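One way to picture the expected ordering is a guard that runs the address assignment only after a readiness check passes. The sketch below is not the swss implementation and does not propose a specific fix: `assign_ip_when_ready`, `netdev_exists`, and `run_ip_addr_add` are hypothetical names, and the default readiness check (the device appearing under `/sys/class/net`) is only a stand-in, since the existence of the netdev does not by itself prove that teamd has finished initializing the PortChannel. A real fix would plug in whatever readiness signal swss considers authoritative.

```python
import os
import subprocess
import time


def netdev_exists(ifname):
    """Assumed readiness proxy: Linux lists each netdev under /sys/class/net."""
    return os.path.isdir(os.path.join("/sys/class/net", ifname))


def run_ip_addr_add(ifname, cidr):
    """Run the same command that appears in the failing log line."""
    cmd = ["/sbin/ip", "address", "add", cidr, "dev", ifname]
    return subprocess.run(cmd).returncode


def assign_ip_when_ready(ifname, cidr, is_ready=netdev_exists,
                         run_cmd=run_ip_addr_add, attempts=10,
                         delay_s=0.1, sleep=time.sleep):
    """Assign the IP only once is_ready(ifname) is true.

    Returns True if the command ran and exited with 0, and False if the
    interface never became ready or the command failed.
    """
    for attempt in range(attempts):
        if is_ready(ifname):
            return run_cmd(ifname, cidr) == 0
        if attempt < attempts - 1:
            sleep(delay_s)  # back off before checking readiness again
    return False
```

Under these assumptions, a call such as `assign_ip_when_ready("PortChannel108", "10.0.0.70/31")` would poll for up to about one second and never invoke `ip address add` against an interface that has not passed the readiness check. The retry count and delay are arbitrary illustration values.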