Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[logrotate]: Separated sort and head commands to prevent broken pipe …
…issues (#19699) Why I did it Currently the logrotate script is susceptible to SIGPIPE issue because of a race condition between the sort and head commands that leads to the following error: Jun 22 03:20:00.331117 sonic-switch INFO logrotate[1494655]: sort: write failed: 'standard output': Broken pipe Jun 22 03:20:00.331207 sonic-switch INFO logrotate[1494655]: sort: write error This is because if the sort command has not yet finished writing to stdout and head has finished executing and closes the pipe. Subsequent writes by the previously issued sort command would fail because of the closed pipe. Work item tracking Microsoft ADO (number only): 28511764 How I did it I refactored the code in a way that removes the call to head after sort. How to verify it I created a bash script that induces a write failed: Broken pipe error on account of a sort | head combination and verified the SIGPIPE error code. I then removed the head command and replaced it with a call to awk and re-ran the script. This time there was no SIGPIPE. The script: #!/usr/bin/bash var0=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | head -n 20 > /dev/null') echo "" echo "Removed head and modified awk:" echo "" var1=$(strace -f -e execve,write bash -c 'ls -l /usr/bin/ /var/lib /usr/lib | sort | awk "NR == 1 {print $2}"') The output execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7ffc34b52f98 /* 17 vars */) = 0 strace: Process 137372 attached strace: Process 137373 attached strace: Process 137374 attached [pid 137373] execve("/usr/bin/sort", ["sort"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137372] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55be58b00350 /* 17 vars */ <unfinished ...> [pid 137373] <... execve resumed>) = 0 [pid 137372] <... execve resumed>) = 0 [pid 137374] execve("/usr/bin/head", ["head", "-n", "20"], 0x55be58b00350 /* 17 vars */) = 0 [pid 137372] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137372] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137372] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137372] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137372] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137372] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137372] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137372] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137372] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137372] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137372] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137372] +++ exited with 0 +++ [pid 137373] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137373] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137374] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 942 <unfinished ...> [pid 137373] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096 <unfinished ...> [pid 137374] <... write resumed>) = 942 [pid 137373] <... write resumed>) = -1 EPIPE (Broken pipe) [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=137373, si_uid=1000} --- [pid 137374] +++ exited with 0 +++ [pid 137373] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_TKILL, si_pid=137373, si_uid=1000} --- [pid 137373] +++ killed by SIGPIPE +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137372, si_uid=1000, si_status=0, si_utime=0, si_stime=4} --- +++ exited with 0 +++ Removed head and modified awk: execve("/usr/bin/bash", ["bash", "-c", "ls -l /usr/bin/ /var/lib /usr/li"...], 0x7fff94a24d18 /* 17 vars */) = 0 strace: Process 137379 attached strace: Process 137380 attached strace: Process 137381 attached [pid 137379] execve("/usr/bin/ls", ["ls", "-l", "/usr/bin/", "/var/lib", "/usr/lib"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137380] execve("/usr/bin/sort", ["sort"], 0x55d437568350 /* 17 vars */ <unfinished ...> [pid 137379] <... execve resumed>) = 0 [pid 137380] <... execve resumed>) = 0 [pid 137381] execve("/usr/bin/awk", ["awk", "NR == 1 {print }"], 0x55d437568350 /* 17 vars */) = 0 [pid 137379] write(1, "/usr/bin/:\ntotal 341826\n-rwxr-xr"..., 4096) = 4096 [pid 137379] write(1, " 30960 Jan 20 2022 colrm\n-r"..., 4096) = 4096 [pid 137379] write(1, "-xr-x 1 root root 1866 De"..., 4096) = 4096 [pid 137379] write(1, "2022 gpgv\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137379] write(1, " 1 root root 315136 Oct 2 "..., 4096) = 4096 [pid 137379] write(1, "ewgrp\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "-rwxr-xr-x 1 root root 43"..., 4096) = 4096 [pid 137379] write(1, "eb 27 2021 screen\n-rwxr-xr-x 1 "..., 4096) = 4096 [pid 137379] write(1, "672 Jun 18 2023 systemd-ask-pas"..., 4096) = 4096 [pid 137379] write(1, "zselect\n-rwxr-xr-x 1 root root "..., 4096) = 4096 [pid 137379] write(1, "r-xr-x 1 root root 2081 A"..., 4096) = 4096 [pid 137379] write(1, " 115 Jun 19 08:44 pam\ndrwxr-xr-x"..., 533) = 533 [pid 137379] +++ exited with 0 +++ [pid 137380] write(1, "\n\ndrwx------ 2 root root 4096 J"..., 4096) = 4096 [pid 137380] write(1, "xrwx 1 root root 12 Jul"..., 4096) = 4096 [pid 137380] write(1, "Feb 2 2021 ping6 -> ping\nlrwxr"..., 4096) = 4096 [pid 137380] write(1, "t 10803 Sep 24 2021 pod2t"..., 4096) = 4096 [pid 137380] write(1, "22 colcrt\n-rwxr-xr-x 1 root root"..., 4096) = 4096 [pid 137380] write(1, " 2022 zmore\n-rwxr-xr-x 1 root r"..., 4096) = 4096 [pid 137380] write(1, "6 Jun 19 08:47 TS\n-rwxr-xr-x 1 r"..., 4096) = 4096 [pid 137380] write(1, "oot 35048 Jan 20 2022 tas"..., 4096) = 4096 [pid 137380] write(1, " 4392 Sep 24 2021 ptargrep\n"..., 4096) = 4096 [pid 137380] write(1, "-xr-x 1 root root 56192 Se"..., 4096) = 4096 [pid 137380] write(1, " 7285 Nov 24 2016 boot-eos\n-"..., 4096) = 4096 [pid 137380] write(1, "ostname-config.sh\n-rwxr-xr-x 2 r"..., 533) = 533 [pid 137381] write(1, "\n", 1) = 1 [pid 137380] +++ exited with 0 +++ [pid 137381] +++ exited with 0 +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=137379, si_uid=1000, si_status=0, si_utime=1, si_stime=5} --- +++ exited with 0 +++ I created 30 dummy archive files in a dev environment and then ran the modified script to verify that the OLDEST_ARCHIVE_FILE is the archive file that was created first: The setup: admin@sonic:~/testpipe/var/log$ pwd /home/admin/testpipe/var/log admin@sonic:~/testpipe/var/log$ ls -alrth total 8.0K drwxr-xr-x 3 admin admin 4.0K Jul 25 17:12 .. -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.1.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.2.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.3.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.5.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.4.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.6.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.7.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.9.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.8.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.10.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.11.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.12.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.13.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.14.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.15.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.16.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.17.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.18.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.19.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.20.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.21.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.22.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.23.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.24.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.25.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.26.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.27.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.28.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.29.gz -rw-r--r-- 1 admin admin 0 Jul 25 17:17 syslog.30.gz drwxr-xr-x 2 admin admin 4.0K Jul 25 17:17 . The script: admin@sonic:~/testpipe$ cat sort_test.sh #!/usr/bin/bash SORTED_ARCHIVE_FILES=$(find ./var/log -type f -printf '%T+ %p\n' | grep -E '.+\.[0-9]+(\.gz)?$' | sort ) OLDEST_ARCHIVE_FILE=$(echo "${SORTED_ARCHIVE_FILES}" | awk 'NR == 1 { print $2; }') echo $OLDEST_ARCHIVE_FILE The output: admin@sonic:~/testpipe$ ./sort_test.sh ./var/log/syslog.1.gz
- Loading branch information