
[swss] Orchagent terminated by SIGHUP because logrotate sent SIGHUP on boot after 202405->202411 warm upgrade #21962

Open
@volodymyrsamotiy

Description

After upgrading from the 202405 image to 202411, orchagent was terminated by SIGHUP while booting into the new image.
It is unclear whether this is tied to warm reboot or to the upgrade flow. It is probably a generic issue, but we hit it during a warm upgrade.
Note that the issue has occurred only once so far.

Syslog shows a shell error in the logrotate script:

2025 Mar  2 03:40:02.291520 sonic INFO logrotate[8512]: logrotate_script: 10: [: -gt: unexpected operator

After that logrotate sent SIGHUP to orchagent:

2025 Mar  2 03:40:02.294484 sonic INFO logrotate: Sending SIGHUP to OA log_file_name: /var/log/swss/swss.rec

As a result orchagent exited:

2025 Mar  2 03:40:02.393109 sonic INFO swss#supervisord 2025-03-02 03:40:02,381 WARN exited: orchagent (terminated by SIGHUP; not expected)

The error appears to come from the postrotate script defined in the /etc/logrotate.d/rsyslog configuration file:
https://github.com/sonic-net/sonic-buildimage/blob/master/files/image_config/logrotate/rsyslog.j2#L118

    postrotate
        if [ $(echo $1 | grep -c "/var/log/swss/") -gt 0 ]; then
            # for multi asic platforms, there are multiple orchagents
            # send the SIGHUP only to the orchagent the which needs log file rotation
            PLATFORM=`sonic-cfggen -H -v DEVICE_METADATA.localhost.platform`
            ASIC_CONF=/usr/share/sonic/device/$PLATFORM/asic.conf
            if [ -f "$ASIC_CONF" ]; then
                . $ASIC_CONF
            fi
            if [ $NUM_ASIC -gt 1 ]; then
                log_file=$1
                log_file_name=${log_file#/var/log/swss/}
                logger -p syslog.info -t "logrotate" "Sending SIGHUP to OA log_file_name: $log_file_name"
                pgrep -xa orchagent | grep $log_file_name | awk '{ print $1; }' | xargs /bin/kill -HUP 2>/dev/null || true
            else
                logger -p syslog.info -t "logrotate" "Sending SIGHUP to OA log_file_name: $1"
                pgrep -x orchagent | xargs /bin/kill -HUP 2>/dev/null || true
            fi
        else
            if [ -f /var/run/rsyslogd.pid ]; then
                /bin/kill -HUP $(cat /var/run/rsyslogd.pid)
            fi
        fi
    endscript
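
For reference, `[: -gt: unexpected operator` is the kind of error `sh` reports when the left operand of `-gt` expands to nothing, which would be the case if `$NUM_ASIC` was empty at that moment (for example, if `asic.conf` was not found or not sourced). This is an assumption, not a confirmed root cause. A minimal sketch of that failure mode and one possible guard follows; `is_multi_asic` is a hypothetical helper, not part of the upstream script:

```shell
#!/bin/sh
# Hypothetical helper: succeed only when NUM_ASIC is a number greater than 1.
# Assumption: an unset, empty, or non-numeric NUM_ASIC is treated as 1.
is_multi_asic() {
    case "${NUM_ASIC:-}" in
        ''|*[!0-9]*) num_asic=1 ;;        # missing or malformed value
        *)           num_asic=$NUM_ASIC ;;
    esac
    [ "$num_asic" -gt 1 ]
}

unset NUM_ASIC

# Unguarded form from the script: expands to `[ -gt 1 ]`, which the shell
# rejects with an error (silenced here), so the else branch runs.
if [ $NUM_ASIC -gt 1 ] 2>/dev/null; then
    echo "unguarded: multi-ASIC branch"
else
    echo "unguarded: else branch"
fi

# Guarded form: no shell error, same branch chosen.
if is_multi_asic; then
    echo "guarded: multi-ASIC branch"
else
    echo "guarded: else branch"
fi
```

Note that in the quoted script the failed test falls through to the `else` branch, so SIGHUP is still sent to every orchagent process. A guard like this only removes the shell error and would not by itself explain or prevent the termination.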

Steps to reproduce the issue:
There are no specific steps to reproduce; the issue has occurred only once so far.
It looks like a generic, intermittent issue related to logrotate.
We hit it during a warm upgrade from 202405 to 202411.

Describe the results you received:
Orchagent terminated by SIGHUP because logrotate sent SIGHUP:

2025 Mar  2 03:40:02.291520 sonic INFO logrotate[8512]: logrotate_script: 10: [: -gt: unexpected operator
2025 Mar  2 03:40:02.294484 sonic INFO logrotate: Sending SIGHUP to OA log_file_name: /var/log/swss/swss.rec
2025 Mar  2 03:40:02.389481 sonic INFO systemd[1]: logrotate.service: Deactivated successfully.
2025 Mar  2 03:40:02.389594 sonic INFO systemd[1]: Finished logrotate.service - Rotate log files.
2025 Mar  2 03:40:02.393109 sonic INFO swss#supervisord 2025-03-02 03:40:02,381 WARN exited: orchagent (terminated by SIGHUP; not expected)
2025 Mar  2 03:40:02.398651 sonic INFO swss#supervisor-proc-exit-listener: Process 'orchagent' exited unexpectedly. Terminating supervisor 'swss'

Describe the results you expected:
Logrotate should not send SIGHUP to orchagent
