Skip to content

Commit

Permalink
[logrotate]: Separated sort and head commands to prevent broken pipe …
Browse files Browse the repository at this point in the history
…issues (#19699)

Why I did it
Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error:

Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe
Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error
This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe.

Work item tracking
Microsoft ADO (number only): 28511764
How I did it
I refactored the code in a way that removes the call to head after sort.

How to verify it
I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE.
The script:

#!/usr/bin/bash

var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null')

echo ""
echo "Removed head and modified awk:"
echo ""

var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"')

The output

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0
strace: Process 137372 attached
strace: Process 137373 attached
strace: Process 137374 attached
[pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...>
[pid 137373] <... execve resumed>)      = 0
[pid 137372] <... execve resumed>)      = 0
[pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0
[pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137372] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137372] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137372] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137372] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137372] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137372] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137372] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137372] +++ exited with 0 +++
[pid 137373] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137373] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137374] write(1, "\n\ndrwx------  2 root root 4096 J"..., 942 <unfinished ...>
[pid 137373] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...>
[pid 137374] <... write resumed>)       = 942
[pid 137373] <... write resumed>)       = -1 EPIPE (Broken pipe)
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} ---
[pid 137374] +++ exited with 0 +++
[pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} ---
[pid 137373] +++ killed by SIGPIPE +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} ---
+++ exited with 0 +++

Removed head and modified awk:

execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0
strace: Process 137379 attached
strace: Process 137380 attached
strace: Process 137381 attached
[pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...>
[pid 137379] <... execve resumed>)      = 0
[pid 137380] <... execve resumed>)      = 0
[pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0
[pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096
[pid 137379] write(1, "     30960 Jan 20  2022 colrm\n-r"..., 4096) = 4096
[pid 137379] write(1, "-xr-x 1 root root        1866 De"..., 4096) = 4096
[pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137379] write(1, " 1 root root      315136 Oct  2 "..., 4096) = 4096
[pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root    "..., 4096) = 4096
[pid 137379] write(1, "-rwxr-xr-x 1 root root        43"..., 4096) = 4096
[pid 137379] write(1, "eb 27  2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096
[pid 137379] write(1, "672 Jun 18  2023 systemd-ask-pas"..., 4096) = 4096
[pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root  "..., 4096) = 4096
[pid 137379] write(1, "r-xr-x 1 root root        2081 A"..., 4096) = 4096
[pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533
[pid 137379] +++ exited with 0 +++
[pid 137380] write(1, "\n\ndrwx------  2 root root 4096 J"..., 4096) = 4096
[pid 137380] write(1, "xrwx 1 root root          12 Jul"..., 4096) = 4096
[pid 137380] write(1, "Feb  2  2021 ping6 -> ping\nlrwxr"..., 4096) = 4096
[pid 137380] write(1, "t       10803 Sep 24  2021 pod2t"..., 4096) = 4096
[pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096
[pid 137380] write(1, "  2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096
[pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096
[pid 137380] write(1, "oot       35048 Jan 20  2022 tas"..., 4096) = 4096
[pid 137380] write(1, "     4392 Sep 24  2021 ptargrep\n"..., 4096) = 4096
[pid 137380] write(1, "-xr-x 1 root root       56192 Se"..., 4096) = 4096
[pid 137380] write(1, "    7285 Nov 24  2016 boot-eos\n-"..., 4096) = 4096
[pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533
[pid 137381] write(1, "\n", 1)          = 1
[pid 137380] +++ exited with 0 +++
[pid 137381] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} ---
+++ exited with 0 +++

I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first:
The setup:

admin@sonic:~/testpipe/var/log$ pwd
/home/admin/testpipe/var/log

admin@sonic:~/testpipe/var/log$ ls -alrth
total 8.0K
drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 ..
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.1.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.2.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.3.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.5.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.4.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.6.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.7.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.9.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.8.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.10.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.11.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.12.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.13.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.14.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.15.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.16.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.17.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.18.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.19.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.20.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.21.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.22.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.23.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.24.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.25.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.26.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.27.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.28.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.29.gz
-rw-r--r-- 1 admin admin    0 Jul 25 17:17 syslog.30.gz
drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 .

The script:

admin@sonic:~/testpipe$ cat sort_test.sh
#!/usr/bin/bash

SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort )
OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }')
echo $OLDEST_ARCHIVE_FILE



The output:

admin@sonic:~/testpipe$ ./sort_test.sh
./var/log/syslog.1.gz
  • Loading branch information
assrinivasan authored and gechiang committed Jul 27, 2024
1 parent 041d03a commit 34dffde
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion files/image_config/logrotate/rsyslog.j2
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@
if [ $USED_KB -lt $THRESHOLD_KB ]; then
break
else
OLDEST_ARCHIVE_FILE=$(find /var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort | head -n 1 | awk '{ print $2; }')
OLDEST_ARCHIVE_FILE=$(find /var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort | awk 'NR == 1 {print $2}')

if [ -z "$OLDEST_ARCHIVE_FILE" ]; then
logger -p syslog.err -t "logrotate" "No archive file to delete -- potential for filling up /var/log partition!"
Expand Down

0 comments on commit 34dffde

Please sign in to comment.