Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldb][Linux] Parse, but don't store "comm" from /proc/stat file #100387

Merged
merged 1 commit into from
Jul 25, 2024

Conversation

DavidSpickett
Copy link
Collaborator

As reported in #89710, the %s code used for comm could and probably does, overflow the buffer. Likely we haven't seen it cause problems because the following data is overwritten right afterwards.

Also scanf isn't a great choice here as this comm can include many characters that might trip up %s.

We don't actually use comm, so parse but don't store it so we're not overflowing anything.

As reported in llvm#89710, the %s
code used for `comm` could and probably does, overflow the buffer.
Likely we haven't seen it cause problems because the following data
is overwritten right afterwards.

Also scanf isn't a great choice here as this `comm` can include many characters
that might trip up %s.

We don't actually use `comm`, so parse but don't store it so we're
not overflowing anything.
@llvmbot
Copy link
Member

llvmbot commented Jul 24, 2024

@llvm/pr-subscribers-lldb

Author: David Spickett (DavidSpickett)

Changes

As reported in #89710, the %s code used for comm could and probably does, overflow the buffer. Likely we haven't seen it cause problems because the following data is overwritten right afterwards.

Also scanf isn't a great choice here as this comm can include many characters that might trip up %s.

We don't actually use comm, so parse but don't store it so we're not overflowing anything.


Full diff: https://github.com/llvm/llvm-project/pull/100387.diff

1 Files Affected:

  • (modified) lldb/source/Host/linux/Host.cpp (+3-5)
diff --git a/lldb/source/Host/linux/Host.cpp b/lldb/source/Host/linux/Host.cpp
index 5545f9ef4d70e..0bc736d90ea76 100644
--- a/lldb/source/Host/linux/Host.cpp
+++ b/lldb/source/Host/linux/Host.cpp
@@ -51,11 +51,9 @@ enum class ProcessState {
   Zombie,
 };
 
-constexpr int task_comm_len = 16;
-
 struct StatFields {
   ::pid_t pid = LLDB_INVALID_PROCESS_ID;
-  char comm[task_comm_len];
+  // comm
   char state;
   ::pid_t ppid = LLDB_INVALID_PROCESS_ID;
   ::pid_t pgrp = LLDB_INVALID_PROCESS_ID;
@@ -100,8 +98,8 @@ static bool GetStatusInfo(::pid_t Pid, ProcessInstanceInfo &ProcessInfo,
   StatFields stat_fields;
   if (sscanf(
           Rest.data(),
-          "%d %s %c %d %d %d %d %d %u %lu %lu %lu %lu %lu %lu %ld %ld %ld %ld",
-          &stat_fields.pid, stat_fields.comm, &stat_fields.state,
+          "%d %*s %c %d %d %d %d %d %u %lu %lu %lu %lu %lu %lu %ld %ld %ld %ld",
+          &stat_fields.pid, /* comm, */ &stat_fields.state,
           &stat_fields.ppid, &stat_fields.pgrp, &stat_fields.session,
           &stat_fields.tty_nr, &stat_fields.tpgid, &stat_fields.flags,
           &stat_fields.minflt, &stat_fields.cminflt, &stat_fields.majflt,

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 929b474991ce643cff01eeeec830b29735460db4 0e65e0e28cc4a20fbe91d3a7a76b7a0ed671e0cc --extensions cpp -- lldb/source/Host/linux/Host.cpp
View the diff from clang-format here.
diff --git a/lldb/source/Host/linux/Host.cpp b/lldb/source/Host/linux/Host.cpp
index 0bc736d90e..5a12fa4d01 100644
--- a/lldb/source/Host/linux/Host.cpp
+++ b/lldb/source/Host/linux/Host.cpp
@@ -99,13 +99,13 @@ static bool GetStatusInfo(::pid_t Pid, ProcessInstanceInfo &ProcessInfo,
   if (sscanf(
           Rest.data(),
           "%d %*s %c %d %d %d %d %d %u %lu %lu %lu %lu %lu %lu %ld %ld %ld %ld",
-          &stat_fields.pid, /* comm, */ &stat_fields.state,
-          &stat_fields.ppid, &stat_fields.pgrp, &stat_fields.session,
-          &stat_fields.tty_nr, &stat_fields.tpgid, &stat_fields.flags,
-          &stat_fields.minflt, &stat_fields.cminflt, &stat_fields.majflt,
-          &stat_fields.cmajflt, &stat_fields.utime, &stat_fields.stime,
-          &stat_fields.cutime, &stat_fields.cstime,
-          &stat_fields.realtime_priority, &stat_fields.priority) < 0) {
+          &stat_fields.pid, /* comm, */ &stat_fields.state, &stat_fields.ppid,
+          &stat_fields.pgrp, &stat_fields.session, &stat_fields.tty_nr,
+          &stat_fields.tpgid, &stat_fields.flags, &stat_fields.minflt,
+          &stat_fields.cminflt, &stat_fields.majflt, &stat_fields.cmajflt,
+          &stat_fields.utime, &stat_fields.stime, &stat_fields.cutime,
+          &stat_fields.cstime, &stat_fields.realtime_priority,
+          &stat_fields.priority) < 0) {
     return false;
   }
 

@DavidSpickett
Copy link
Collaborator Author

Ignoring the formatter to keep the change intent clear.

The fields we actually use from this struct are:

  char state;
  ::pid_t ppid = LLDB_INVALID_PROCESS_ID;
  ::pid_t pgrp = LLDB_INVALID_PROCESS_ID;
  ::pid_t session = LLDB_INVALID_PROCESS_ID;
  long unsigned utime;
  long unsigned stime;
  long cutime;
  long cstime;
  long priority;

And I think the some of them are set in process info, but nothing is done with them. So I'll see if we can rely purely on proc/status, for a follow up PR.

@DavidSpickett
Copy link
Collaborator Author

DavidSpickett commented Jul 24, 2024

We cannot rely purely on status because we need the times, for work being done by @feg208. I could mark a bunch of them with * so we don't store them but it ends up harder to read in the end.

I think we can make the parsing up to comm better though.

@DavidSpickett
Copy link
Collaborator Author

DavidSpickett commented Jul 24, 2024

Actually, @feg208 can you to look at that? An example that I think will break the %s is:

$ cat /proc/$(pidof "test with spaces.o")/stat
1682700 (test with space) R 811184 1682700 811184 34817 1682700 4194304 58 0 0 0 71618 2 0 0 20 0 1 0 5328289154 1830912 103 18446744073709551615 187650816561152 187650816563420 281473911569280 0 0 0 0 0 0 0 0 0 17 89 0 0 0 0 0 187650816630152 187650816630800 187651682521088 281473911572039 281473911572063 281473911572063 281473911574496 0

You could parse the pid and comm manually, then scanf the rest. comm can include anything a file name can I think, spaces and brackets certianly. So (test (with) brackets) is another one to consider.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, I didn't realize this was new code. @feg208, we need to fix this soon.

I'm approving this because it strictly improves upon the current situation, but I don't think it's not a complete fix. I think we can skip over the comm field with a rsplit(')'). The rest could be parsed with scanf, as it's kinda convenient -- even if old-fashioned. (A more modern approach would be either with std::istringstream or some combination of StringRef::consumeInteger, StringRef::ltrim, llvm::split and friends

@DavidSpickett DavidSpickett merged commit f48c166 into llvm:main Jul 25, 2024
7 of 8 checks passed
@DavidSpickett DavidSpickett deleted the lldb-scanf branch July 25, 2024 08:15
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…0387)

Summary:
As reported in #89710, the %s
code used for `comm` could and probably does, overflow the buffer.
Likely we haven't seen it cause problems because the following data is
overwritten right afterwards.

Also scanf isn't a great choice here as this `comm` can include many
characters that might trip up %s.

We don't actually use `comm`, so parse but don't store it so we're not
overflowing anything.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60250556
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants