[lldb][Linux] Parse, but don't store "comm" from /proc/stat file #100387

DavidSpickett · 2024-07-24T14:26:35Z

As reported in #89710, the %s code used for comm could and probably does, overflow the buffer. Likely we haven't seen it cause problems because the following data is overwritten right afterwards.

Also scanf isn't a great choice here as this comm can include many characters that might trip up %s.

We don't actually use comm, so parse but don't store it so we're not overflowing anything.

As reported in llvm#89710, the %s code used for `comm` could and probably does, overflow the buffer. Likely we haven't seen it cause problems because the following data is overwritten right afterwards. Also scanf isn't a great choice here as this `comm` can include many characters that might trip up %s. We don't actually use `comm`, so parse but don't store it so we're not overflowing anything.

llvmbot · 2024-07-24T14:27:11Z

@llvm/pr-subscribers-lldb

Author: David Spickett (DavidSpickett)

Changes

As reported in #89710, the %s code used for comm could and probably does, overflow the buffer. Likely we haven't seen it cause problems because the following data is overwritten right afterwards.

Also scanf isn't a great choice here as this comm can include many characters that might trip up %s.

We don't actually use comm, so parse but don't store it so we're not overflowing anything.

Full diff: https://github.com/llvm/llvm-project/pull/100387.diff

1 Files Affected:

(modified) lldb/source/Host/linux/Host.cpp (+3-5)

diff --git a/lldb/source/Host/linux/Host.cpp b/lldb/source/Host/linux/Host.cpp
index 5545f9ef4d70e..0bc736d90ea76 100644
--- a/lldb/source/Host/linux/Host.cpp
+++ b/lldb/source/Host/linux/Host.cpp
@@ -51,11 +51,9 @@ enum class ProcessState {
   Zombie,
 };
 
-constexpr int task_comm_len = 16;
-
 struct StatFields {
   ::pid_t pid = LLDB_INVALID_PROCESS_ID;
-  char comm[task_comm_len];
+  // comm
   char state;
   ::pid_t ppid = LLDB_INVALID_PROCESS_ID;
   ::pid_t pgrp = LLDB_INVALID_PROCESS_ID;
@@ -100,8 +98,8 @@ static bool GetStatusInfo(::pid_t Pid, ProcessInstanceInfo &ProcessInfo,
   StatFields stat_fields;
   if (sscanf(
           Rest.data(),
-          "%d %s %c %d %d %d %d %d %u %lu %lu %lu %lu %lu %lu %ld %ld %ld %ld",
-          &stat_fields.pid, stat_fields.comm, &stat_fields.state,
+          "%d %*s %c %d %d %d %d %d %u %lu %lu %lu %lu %lu %lu %ld %ld %ld %ld",
+          &stat_fields.pid, /* comm, */ &stat_fields.state,
           &stat_fields.ppid, &stat_fields.pgrp, &stat_fields.session,
           &stat_fields.tty_nr, &stat_fields.tpgid, &stat_fields.flags,
           &stat_fields.minflt, &stat_fields.cminflt, &stat_fields.majflt,

github-actions · 2024-07-24T14:31:15Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 929b474991ce643cff01eeeec830b29735460db4 0e65e0e28cc4a20fbe91d3a7a76b7a0ed671e0cc --extensions cpp -- lldb/source/Host/linux/Host.cpp

View the diff from clang-format here.

diff --git a/lldb/source/Host/linux/Host.cpp b/lldb/source/Host/linux/Host.cpp
index 0bc736d90e..5a12fa4d01 100644
--- a/lldb/source/Host/linux/Host.cpp
+++ b/lldb/source/Host/linux/Host.cpp
@@ -99,13 +99,13 @@ static bool GetStatusInfo(::pid_t Pid, ProcessInstanceInfo &ProcessInfo,
   if (sscanf(
           Rest.data(),
           "%d %*s %c %d %d %d %d %d %u %lu %lu %lu %lu %lu %lu %ld %ld %ld %ld",
-          &stat_fields.pid, /* comm, */ &stat_fields.state,
-          &stat_fields.ppid, &stat_fields.pgrp, &stat_fields.session,
-          &stat_fields.tty_nr, &stat_fields.tpgid, &stat_fields.flags,
-          &stat_fields.minflt, &stat_fields.cminflt, &stat_fields.majflt,
-          &stat_fields.cmajflt, &stat_fields.utime, &stat_fields.stime,
-          &stat_fields.cutime, &stat_fields.cstime,
-          &stat_fields.realtime_priority, &stat_fields.priority) < 0) {
+          &stat_fields.pid, /* comm, */ &stat_fields.state, &stat_fields.ppid,
+          &stat_fields.pgrp, &stat_fields.session, &stat_fields.tty_nr,
+          &stat_fields.tpgid, &stat_fields.flags, &stat_fields.minflt,
+          &stat_fields.cminflt, &stat_fields.majflt, &stat_fields.cmajflt,
+          &stat_fields.utime, &stat_fields.stime, &stat_fields.cutime,
+          &stat_fields.cstime, &stat_fields.realtime_priority,
+          &stat_fields.priority) < 0) {
     return false;
   }

DavidSpickett · 2024-07-24T14:35:47Z

Ignoring the formatter to keep the change intent clear.

The fields we actually use from this struct are:

  char state;
  ::pid_t ppid = LLDB_INVALID_PROCESS_ID;
  ::pid_t pgrp = LLDB_INVALID_PROCESS_ID;
  ::pid_t session = LLDB_INVALID_PROCESS_ID;
  long unsigned utime;
  long unsigned stime;
  long cutime;
  long cstime;
  long priority;

And I think the some of them are set in process info, but nothing is done with them. So I'll see if we can rely purely on proc/status, for a follow up PR.

DavidSpickett · 2024-07-24T14:42:58Z

We cannot rely purely on status because we need the times, for work being done by @feg208. I could mark a bunch of them with * so we don't store them but it ends up harder to read in the end.

I think we can make the parsing up to comm better though.

DavidSpickett · 2024-07-24T14:45:21Z

Actually, @feg208 can you to look at that? An example that I think will break the %s is:

$ cat /proc/$(pidof "test with spaces.o")/stat
1682700 (test with space) R 811184 1682700 811184 34817 1682700 4194304 58 0 0 0 71618 2 0 0 20 0 1 0 5328289154 1830912 103 18446744073709551615 187650816561152 187650816563420 281473911569280 0 0 0 0 0 0 0 0 0 17 89 0 0 0 0 0 187650816630152 187650816630800 187651682521088 281473911572039 281473911572063 281473911572063 281473911574496 0

You could parse the pid and comm manually, then scanf the rest. comm can include anything a file name can I think, spaces and brackets certianly. So (test (with) brackets) is another one to consider.

labath

Yikes, I didn't realize this was new code. @feg208, we need to fix this soon.

I'm approving this because it strictly improves upon the current situation, but I don't think it's not a complete fix. I think we can skip over the comm field with a rsplit(')'). The rest could be parsed with scanf, as it's kinda convenient -- even if old-fashioned. (A more modern approach would be either with std::istringstream or some combination of StringRef::consumeInteger, StringRef::ltrim, llvm::split and friends

…0387) Summary: As reported in #89710, the %s code used for `comm` could and probably does, overflow the buffer. Likely we haven't seen it cause problems because the following data is overwritten right afterwards. Also scanf isn't a great choice here as this `comm` can include many characters that might trip up %s. We don't actually use `comm`, so parse but don't store it so we're not overflowing anything. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250556

DavidSpickett requested a review from JDevlieghere as a code owner July 24, 2024 14:26

llvmbot added the lldb label Jul 24, 2024

DavidSpickett requested a review from labath July 24, 2024 14:35

DavidSpickett mentioned this pull request Jul 24, 2024

[lldb] Correct format specifier for sscanf to prevent buffer overflow (NFC) #94783

Closed

DavidSpickett requested a review from feg208 July 24, 2024 14:41

labath approved these changes Jul 25, 2024

View reviewed changes

DavidSpickett merged commit f48c166 into llvm:main Jul 25, 2024
7 of 8 checks passed

DavidSpickett deleted the lldb-scanf branch July 25, 2024 08:15

DavidSpickett mentioned this pull request Jul 25, 2024

lldb/source/Host/linux/Host.cpp:94: Possible missing field width in scanf %s ? #89710

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[lldb][Linux] Parse, but don't store "comm" from /proc/stat file #100387

[lldb][Linux] Parse, but don't store "comm" from /proc/stat file #100387

DavidSpickett commented Jul 24, 2024

llvmbot commented Jul 24, 2024

github-actions bot commented Jul 24, 2024

DavidSpickett commented Jul 24, 2024

DavidSpickett commented Jul 24, 2024 •

edited

Loading

DavidSpickett commented Jul 24, 2024 •

edited

Loading

labath left a comment

[lldb][Linux] Parse, but don't store "comm" from /proc/stat file #100387

[lldb][Linux] Parse, but don't store "comm" from /proc/stat file #100387

Conversation

DavidSpickett commented Jul 24, 2024

llvmbot commented Jul 24, 2024

github-actions bot commented Jul 24, 2024

DavidSpickett commented Jul 24, 2024

DavidSpickett commented Jul 24, 2024 • edited Loading

DavidSpickett commented Jul 24, 2024 • edited Loading

labath left a comment

Choose a reason for hiding this comment

DavidSpickett commented Jul 24, 2024 •

edited

Loading

DavidSpickett commented Jul 24, 2024 •

edited

Loading