[lldb] Multithreading lldb-server works on Windows now; fixed gdb port mapping #100670

Closed
wants to merge 2 commits into from

Conversation

slydiman
Contributor

@slydiman slydiman commented Jul 25, 2024

The main purpose of this patch is to fix all issues related to gdb port mapping in lldb-server. We need a common gdb port map that is available to all platform connection handlers. Currently, lldb-server platform forks a child process to handle each incoming connection, and this child process cannot request a gdb port from the common map. The optimal solution is to make lldb-server multithreaded in platform mode. lldb-server in gdbserver mode remains the same.

This solution also gave a new advantage - lldb-server platform --server works on Windows now.
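
The idea of a port map shared by all connection-handler threads can be sketched as below. This is a simplified illustration, not the code from the patch: `SharedPortMap`, `Acquire` and `ReleaseForProcess` are hypothetical names, a pid of 0 stands in for `LLDB_INVALID_PROCESS_ID`, and reserving a port and associating it with a pid are merged into one step.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>

// Hypothetical, simplified stand-in for the PortMap touched by the patch.
// Assumption: pid 0 marks a free port, so callers must pass a non-zero pid.
class SharedPortMap {
public:
  // Allows the ports in [min_port, max_port); all of them start out free.
  SharedPortMap(uint16_t min_port, uint16_t max_port) {
    assert(min_port != 0 && min_port < max_port);
    for (uint16_t p = min_port; p < max_port; ++p)
      m_ports[p] = kFree;
  }

  // Reserves the lowest free port for pid, or returns nullopt if none is left.
  std::optional<uint16_t> Acquire(uint64_t pid) {
    std::lock_guard<std::mutex> guard(m_mutex);
    for (auto &entry : m_ports) {
      if (entry.second == kFree) {
        entry.second = pid;
        return entry.first;
      }
    }
    return std::nullopt;
  }

  // Frees the port held by pid; returns false if pid holds no port.
  bool ReleaseForProcess(uint64_t pid) {
    std::lock_guard<std::mutex> guard(m_mutex);
    for (auto &entry : m_ports) {
      if (entry.second == pid) {
        entry.second = kFree;
        return true;
      }
    }
    return false;
  }

private:
  static constexpr uint64_t kFree = 0;
  std::mutex m_mutex; // Serializes every access to m_ports.
  std::map<uint16_t, uint64_t> m_ports;
};
```

Because every handler thread sees the same object, a port handed to one connection cannot be handed to another until it is released. A forked child only modifies its own copy of the map, which is the limitation described above.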

Note that multithreaded connection handlers require a separate virtual working directory per thread. Most lldb-server code uses lldb::FileSystem and already resolves paths. llvm::vfs::createPhysicalFileSystem() can be used as the file system behind lldb::FileSystem with minimal updates.

Fixes #97537, fixes #90923, fixes #56346, fixes #101475.

lldb-server has been tested on Windows with 50 connections and 100 processes launched simultaneously. A cross build with a Linux x86_64 host and a Linux AArch64 target was also tested (8 concurrent threads/connections).

@llvmbot
Member

llvmbot commented Jul 25, 2024

@llvm/pr-subscribers-lldb

Author: Dmitry Vasilyev (slydiman)

Changes

Removed fork(). Used threads and the common thread-safe port map for all platform connections.

Updated lldb::FileSystem to use llvm::vfs::createPhysicalFileSystem() with a separate virtual working directory per thread.

This patch depends on #100659, #100666.

This patch fixes #97537, #90923, #56346.

lldb-server has been tested on Windows with 50 connections and 100 processes launched simultaneously. A cross build with a Linux x86_64 host and a Linux AArch64 target was also tested.


Patch is 33.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100670.diff

8 Files Affected:

  • (modified) lldb/include/lldb/Host/FileSystem.h (+7)
  • (modified) lldb/source/Host/common/FileSystem.cpp (+8)
  • (modified) lldb/source/Host/posix/PipePosix.cpp (+12)
  • (modified) lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerCommon.cpp (+13-2)
  • (modified) lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp (+217-92)
  • (modified) lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.h (+19-12)
  • (modified) lldb/tools/lldb-server/LLDBServerUtilities.cpp (+2)
  • (modified) lldb/tools/lldb-server/lldb-platform.cpp (+22-77)
diff --git a/lldb/include/lldb/Host/FileSystem.h b/lldb/include/lldb/Host/FileSystem.h
index 640f3846e448c..5e25414a894d3 100644
--- a/lldb/include/lldb/Host/FileSystem.h
+++ b/lldb/include/lldb/Host/FileSystem.h
@@ -47,6 +47,12 @@ class FileSystem {
 
   static FileSystem &Instance();
 
+  static void InitializePerThread() {
+    lldbassert(!InstancePerThread() && "Already initialized.");
+    InstancePerThread().emplace(llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>(
+        llvm::vfs::createPhysicalFileSystem().release()));
+  }
+
   template <class... T> static void Initialize(T &&...t) {
     lldbassert(!InstanceImpl() && "Already initialized.");
     InstanceImpl().emplace(std::forward<T>(t)...);
@@ -206,6 +212,7 @@ class FileSystem {
 
 private:
   static std::optional<FileSystem> &InstanceImpl();
+  static std::optional<FileSystem> &InstancePerThread();
   llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> m_fs;
   std::unique_ptr<TildeExpressionResolver> m_tilde_resolver;
   std::string m_home_directory;
diff --git a/lldb/source/Host/common/FileSystem.cpp b/lldb/source/Host/common/FileSystem.cpp
index 5153a0a9ec513..cb76086616d6b 100644
--- a/lldb/source/Host/common/FileSystem.cpp
+++ b/lldb/source/Host/common/FileSystem.cpp
@@ -49,7 +49,15 @@ void FileSystem::Terminate() {
   InstanceImpl().reset();
 }
 
+std::optional<FileSystem> &FileSystem::InstancePerThread() {
+  static thread_local std::optional<FileSystem> t_fs;
+  return t_fs;
+}
+
 std::optional<FileSystem> &FileSystem::InstanceImpl() {
+  std::optional<FileSystem> &fs = InstancePerThread();
+  if (fs)
+    return fs;
   static std::optional<FileSystem> g_fs;
   return g_fs;
 }
diff --git a/lldb/source/Host/posix/PipePosix.cpp b/lldb/source/Host/posix/PipePosix.cpp
index f35c348990df6..1aa02efe86610 100644
--- a/lldb/source/Host/posix/PipePosix.cpp
+++ b/lldb/source/Host/posix/PipePosix.cpp
@@ -324,6 +324,18 @@ Status PipePosix::ReadWithTimeout(void *buf, size_t size,
         bytes_read += result;
         if (bytes_read == size || result == 0)
           break;
+
+        // This is the workaround for the following bug in Linux multithreading
+        // select() https://bugzilla.kernel.org/show_bug.cgi?id=546
+        // ReadWithTimeout() with a non-zero timeout is used only to
+        // read the port number from the gdbserver pipe
+        // in GDBRemoteCommunication::StartDebugserverProcess().
+        // The port number may be "1024\0".."65535\0".
+        if (timeout.count() > 0 && size == 6 && bytes_read == 5 &&
+            static_cast<char *>(buf)[4] == '\0') {
+          break;
+        }
+
       } else if (errno == EINTR) {
         continue;
       } else {
diff --git a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerCommon.cpp b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerCommon.cpp
index f9d37490e16ae..cef836e001adf 100644
--- a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerCommon.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerCommon.cpp
@@ -646,7 +646,9 @@ GDBRemoteCommunicationServerCommon::Handle_vFile_Size(
   packet.GetHexByteString(path);
   if (!path.empty()) {
     uint64_t Size;
-    if (llvm::sys::fs::file_size(path, Size))
+    FileSpec file_spec(path);
+    FileSystem::Instance().Resolve(file_spec);
+    if (llvm::sys::fs::file_size(file_spec.GetPath(), Size))
       return SendErrorResponse(5);
     StreamString response;
     response.PutChar('F');
@@ -725,7 +727,9 @@ GDBRemoteCommunicationServerCommon::Handle_vFile_unlink(
   packet.SetFilePos(::strlen("vFile:unlink:"));
   std::string path;
   packet.GetHexByteString(path);
-  Status error(llvm::sys::fs::remove(path));
+  FileSpec file_spec(path);
+  FileSystem::Instance().Resolve(file_spec);
+  Status error(llvm::sys::fs::remove(file_spec.GetPath()));
   StreamString response;
   response.Printf("F%x,%x", error.GetError(), error.GetError());
   return SendPacketNoLock(response.GetString());
@@ -744,6 +748,13 @@ GDBRemoteCommunicationServerCommon::Handle_qPlatform_shell(
       // uint32_t timeout = packet.GetHexMaxU32(false, 32);
       if (packet.GetChar() == ',')
         packet.GetHexByteString(working_dir);
+      else {
+        auto cwd = FileSystem::Instance()
+                       .GetVirtualFileSystem()
+                       ->getCurrentWorkingDirectory();
+        if (cwd)
+          working_dir = *cwd;
+      }
       int status, signo;
       std::string output;
       FileSpec working_spec(working_dir);
diff --git a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp
index 65f1cc12ba307..6e3b7b4a351e0 100644
--- a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp
@@ -18,13 +18,13 @@
 #include <sstream>
 #include <thread>
 
-#include "llvm/Support/FileSystem.h"
 #include "llvm/Support/JSON.h"
 #include "llvm/Support/Threading.h"
 
 #include "lldb/Host/Config.h"
 #include "lldb/Host/ConnectionFileDescriptor.h"
 #include "lldb/Host/FileAction.h"
+#include "lldb/Host/FileSystem.h"
 #include "lldb/Host/Host.h"
 #include "lldb/Host/HostInfo.h"
 #include "lldb/Interpreter/CommandCompletions.h"
@@ -44,8 +44,17 @@ using namespace lldb;
 using namespace lldb_private::process_gdb_remote;
 using namespace lldb_private;
 
+// Copy assignment operator to avoid copying m_mutex
+GDBRemoteCommunicationServerPlatform::PortMap &
+GDBRemoteCommunicationServerPlatform::PortMap::operator=(
+    const GDBRemoteCommunicationServerPlatform::PortMap &o) {
+  m_port_map = std::move(o.m_port_map);
+  return *this;
+}
+
 GDBRemoteCommunicationServerPlatform::PortMap::PortMap(uint16_t min_port,
-                                                       uint16_t max_port) {
+                                                       uint16_t max_port)
+    : m_mutex() {
   assert(min_port);
   for (; min_port < max_port; ++min_port)
     m_port_map[min_port] = LLDB_INVALID_PROCESS_ID;
@@ -54,11 +63,13 @@ GDBRemoteCommunicationServerPlatform::PortMap::PortMap(uint16_t min_port,
 void GDBRemoteCommunicationServerPlatform::PortMap::AllowPort(uint16_t port) {
   assert(port);
   // Do not modify existing mappings
+  std::lock_guard<std::mutex> guard(m_mutex);
   m_port_map.insert({port, LLDB_INVALID_PROCESS_ID});
 }
 
 llvm::Expected<uint16_t>
 GDBRemoteCommunicationServerPlatform::PortMap::GetNextAvailablePort() {
+  std::lock_guard<std::mutex> guard(m_mutex);
   if (m_port_map.empty())
     return 0; // Bind to port zero and get a port, we didn't have any
               // limitations
@@ -75,6 +86,7 @@ GDBRemoteCommunicationServerPlatform::PortMap::GetNextAvailablePort() {
 
 bool GDBRemoteCommunicationServerPlatform::PortMap::AssociatePortWithProcess(
     uint16_t port, lldb::pid_t pid) {
+  std::lock_guard<std::mutex> guard(m_mutex);
   auto pos = m_port_map.find(port);
   if (pos != m_port_map.end()) {
     pos->second = pid;
@@ -84,6 +96,7 @@ bool GDBRemoteCommunicationServerPlatform::PortMap::AssociatePortWithProcess(
 }
 
 bool GDBRemoteCommunicationServerPlatform::PortMap::FreePort(uint16_t port) {
+  std::lock_guard<std::mutex> guard(m_mutex);
   std::map<uint16_t, lldb::pid_t>::iterator pos = m_port_map.find(port);
   if (pos != m_port_map.end()) {
     pos->second = LLDB_INVALID_PROCESS_ID;
@@ -94,6 +107,7 @@ bool GDBRemoteCommunicationServerPlatform::PortMap::FreePort(uint16_t port) {
 
 bool GDBRemoteCommunicationServerPlatform::PortMap::FreePortForProcess(
     lldb::pid_t pid) {
+  std::lock_guard<std::mutex> guard(m_mutex);
   if (!m_port_map.empty()) {
     for (auto &pair : m_port_map) {
       if (pair.second == pid) {
@@ -106,15 +120,22 @@ bool GDBRemoteCommunicationServerPlatform::PortMap::FreePortForProcess(
 }
 
 bool GDBRemoteCommunicationServerPlatform::PortMap::empty() const {
+  std::lock_guard<std::mutex> guard(m_mutex);
   return m_port_map.empty();
 }
 
+GDBRemoteCommunicationServerPlatform::PortMap
+    GDBRemoteCommunicationServerPlatform::g_port_map;
+std::set<lldb::pid_t> GDBRemoteCommunicationServerPlatform::g_spawned_pids;
+std::mutex GDBRemoteCommunicationServerPlatform::g_spawned_pids_mutex;
+
 // GDBRemoteCommunicationServerPlatform constructor
 GDBRemoteCommunicationServerPlatform::GDBRemoteCommunicationServerPlatform(
-    const Socket::SocketProtocol socket_protocol, const char *socket_scheme)
-    : GDBRemoteCommunicationServerCommon(),
-      m_socket_protocol(socket_protocol), m_socket_scheme(socket_scheme),
-      m_spawned_pids_mutex(), m_port_map(), m_port_offset(0) {
+    const Socket::SocketProtocol socket_protocol, const char *socket_scheme,
+    const lldb_private::Args &args, uint16_t port_offset)
+    : GDBRemoteCommunicationServerCommon(), m_socket_protocol(socket_protocol),
+      m_socket_scheme(socket_scheme), m_inferior_arguments(args),
+      m_port_offset(port_offset) {
   m_pending_gdb_server.pid = LLDB_INVALID_PROCESS_ID;
   m_pending_gdb_server.port = 0;
 
@@ -159,11 +180,72 @@ GDBRemoteCommunicationServerPlatform::GDBRemoteCommunicationServerPlatform(
 GDBRemoteCommunicationServerPlatform::~GDBRemoteCommunicationServerPlatform() =
     default;
 
+lldb::thread_result_t GDBRemoteCommunicationServerPlatform::ThreadProc() {
+  // We need a virtual working directory per thread.
+  FileSystem::InitializePerThread();
+
+  Log *log = GetLog(LLDBLog::Platform);
+
+  if (IsConnected()) {
+    LLDB_LOGF(log,
+              "GDBRemoteCommunicationServerPlatform::%s() "
+              "Thread started...",
+              __FUNCTION__);
+
+    if (m_inferior_arguments.GetArgumentCount() > 0) {
+      lldb::pid_t pid = LLDB_INVALID_PROCESS_ID;
+      std::optional<uint16_t> port;
+      std::string socket_name;
+      Status error = LaunchGDBServer(m_inferior_arguments,
+                                     "", // hostname
+                                     pid, port, socket_name);
+      if (error.Success())
+        SetPendingGdbServer(pid, *port, socket_name);
+    }
+
+    bool interrupt = false;
+    bool done = false;
+    Status error;
+    while (!interrupt && !done) {
+      if (GetPacketAndSendResponse(std::nullopt, error, interrupt, done) !=
+          GDBRemoteCommunication::PacketResult::Success)
+        break;
+    }
+
+    if (error.Fail()) {
+      LLDB_LOGF(log,
+                "GDBRemoteCommunicationServerPlatform::%s() "
+                "GetPacketAndSendResponse: %s",
+                __FUNCTION__, error.AsCString());
+    }
+  }
+
+  LLDB_LOGF(log,
+            "GDBRemoteCommunicationServerPlatform::%s() "
+            "Disconnected. Killing child processes...",
+            __FUNCTION__);
+  for (lldb::pid_t pid : m_spawned_pids)
+    KillSpawnedProcess(pid);
+
+  // Do do not wait for child processes. See comments in
+  // DebugserverProcessReaped() for details.
+
+  FileSystem::Terminate();
+
+  LLDB_LOGF(log,
+            "GDBRemoteCommunicationServerPlatform::%s() "
+            "Thread exited.",
+            __FUNCTION__);
+
+  delete this;
+  return {};
+}
+
 Status GDBRemoteCommunicationServerPlatform::LaunchGDBServer(
     const lldb_private::Args &args, std::string hostname, lldb::pid_t &pid,
     std::optional<uint16_t> &port, std::string &socket_name) {
   if (!port) {
-    llvm::Expected<uint16_t> available_port = m_port_map.GetNextAvailablePort();
+    llvm::Expected<uint16_t> available_port = g_port_map.GetNextAvailablePort();
     if (available_port)
       port = *available_port;
     else
@@ -181,23 +263,25 @@ Status GDBRemoteCommunicationServerPlatform::LaunchGDBServer(
   if (hostname.empty())
     hostname = "127.0.0.1";
 
-  Log *log = GetLog(LLDBLog::Platform);
-  LLDB_LOGF(log, "Launching debugserver with: %s:%u...", hostname.c_str(),
-            *port);
+  auto cwd = FileSystem::Instance()
+                 .GetVirtualFileSystem()
+                 ->getCurrentWorkingDirectory();
+  if (cwd)
+    debugserver_launch_info.SetWorkingDirectory(FileSpec(*cwd));
 
   // Do not run in a new session so that it can not linger after the platform
   // closes.
   debugserver_launch_info.SetLaunchInSeparateProcessGroup(false);
   debugserver_launch_info.SetMonitorProcessCallback(
-      std::bind(&GDBRemoteCommunicationServerPlatform::DebugserverProcessReaped,
-                this, std::placeholders::_1));
+      &GDBRemoteCommunicationServerPlatform::DebugserverProcessReaped);
 
   std::ostringstream url;
 // debugserver does not accept the URL scheme prefix.
 #if !defined(__APPLE__)
   url << m_socket_scheme << "://";
 #endif
-  uint16_t *port_ptr = &*port;
+  uint16_t child_port = *port;
+  uint16_t *port_ptr = &child_port;
   if (m_socket_protocol == Socket::ProtocolTcp) {
     std::string platform_uri = GetConnection()->GetURI();
     std::optional<URI> parsed_uri = URI::Parse(platform_uri);
@@ -208,19 +292,44 @@ Status GDBRemoteCommunicationServerPlatform::LaunchGDBServer(
     port_ptr = nullptr;
   }
 
+  Log *log = GetLog(LLDBLog::Platform);
+  LLDB_LOGF(log,
+            "GDBRemoteCommunicationServerPlatform::%s() "
+            "Host %s launching debugserver with: %s...",
+            __FUNCTION__, hostname.c_str(), url.str().c_str());
+
   Status error = StartDebugserverProcess(
       url.str().c_str(), nullptr, debugserver_launch_info, port_ptr, &args, -1);
 
   pid = debugserver_launch_info.GetProcessID();
+
+  if (error.Success()) {
+    LLDB_LOGF(log,
+              "GDBRemoteCommunicationServerPlatform::%s() "
+              "debugserver launched successfully as pid %" PRIu64,
+              __FUNCTION__, pid);
+  } else {
+    LLDB_LOGF(log,
+              "GDBRemoteCommunicationServerPlatform::%s() "
+              "debugserver launch failed: %s",
+              __FUNCTION__, error.AsCString());
+  }
+
+  // TODO: Be sure gdbserver uses the requested port.
+  // assert(!port_ptr || *port == 0 || *port == child_port)
+  // Use only the original *port returned by GetNextAvailablePort()
+  // for AssociatePortWithProcess() or FreePort() below.
+
   if (pid != LLDB_INVALID_PROCESS_ID) {
-    std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
-    m_spawned_pids.insert(pid);
+    AddSpawnedProcess(pid);
     if (*port > 0)
-      m_port_map.AssociatePortWithProcess(*port, pid);
+      g_port_map.AssociatePortWithProcess(*port, pid);
   } else {
     if (*port > 0)
-      m_port_map.FreePort(*port);
+      g_port_map.FreePort(*port);
   }
+  if (port_ptr)
+    *port = child_port;
   return error;
 }
 
@@ -230,10 +339,6 @@ GDBRemoteCommunicationServerPlatform::Handle_qLaunchGDBServer(
   // Spawn a local debugserver as a platform so we can then attach or launch a
   // process...
 
-  Log *log = GetLog(LLDBLog::Platform);
-  LLDB_LOGF(log, "GDBRemoteCommunicationServerPlatform::%s() called",
-            __FUNCTION__);
-
   ConnectionFileDescriptor file_conn;
   std::string hostname;
   packet.SetFilePos(::strlen("qLaunchGDBServer;"));
@@ -255,18 +360,9 @@ GDBRemoteCommunicationServerPlatform::Handle_qLaunchGDBServer(
   Status error =
       LaunchGDBServer(Args(), hostname, debugserver_pid, port, socket_name);
   if (error.Fail()) {
-    LLDB_LOGF(log,
-              "GDBRemoteCommunicationServerPlatform::%s() debugserver "
-              "launch failed: %s",
-              __FUNCTION__, error.AsCString());
     return SendErrorResponse(9);
   }
 
-  LLDB_LOGF(log,
-            "GDBRemoteCommunicationServerPlatform::%s() debugserver "
-            "launched successfully as pid %" PRIu64,
-            __FUNCTION__, debugserver_pid);
-
   StreamGDBRemote response;
   assert(port);
   response.Printf("pid:%" PRIu64 ";port:%u;", debugserver_pid,
@@ -317,28 +413,45 @@ GDBRemoteCommunicationServerPlatform::Handle_qKillSpawnedProcess(
 
   lldb::pid_t pid = packet.GetU64(LLDB_INVALID_PROCESS_ID);
 
+  if (SpawnedProcessFinished(pid))
+    m_spawned_pids.erase(pid);
+
   // verify that we know anything about this pid. Scope for locker
-  {
-    std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
-    if (m_spawned_pids.find(pid) == m_spawned_pids.end()) {
-      // not a pid we know about
-      return SendErrorResponse(10);
-    }
+  if ((m_spawned_pids.find(pid) == m_spawned_pids.end())) {
+    // not a pid we know about
+    return SendErrorResponse(10); // ECHILD
   }
 
   // go ahead and attempt to kill the spawned process
-  if (KillSpawnedProcess(pid))
+  if (KillSpawnedProcess(pid)) {
+    m_spawned_pids.erase(pid);
     return SendOKResponse();
-  else
-    return SendErrorResponse(11);
+  } else
+    return SendErrorResponse(11); // EDEADLK
+}
+
+void GDBRemoteCommunicationServerPlatform::AddSpawnedProcess(lldb::pid_t pid) {
+  std::lock_guard<std::mutex> guard(g_spawned_pids_mutex);
+
+  // If MonitorChildProcessThreadFunction() failed hope the system will not
+  // reuse pid of zombie processes.
+  // assert(g_spawned_pids.find(pid) == g_spawned_pids.end());
+
+  g_spawned_pids.insert(pid);
+  m_spawned_pids.insert(pid);
+}
+
+bool GDBRemoteCommunicationServerPlatform::SpawnedProcessFinished(
+    lldb::pid_t pid) {
+  std::lock_guard<std::mutex> guard(g_spawned_pids_mutex);
+  return (g_spawned_pids.find(pid) == g_spawned_pids.end());
 }
 
 bool GDBRemoteCommunicationServerPlatform::KillSpawnedProcess(lldb::pid_t pid) {
   // make sure we know about this process
-  {
-    std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
-    if (m_spawned_pids.find(pid) == m_spawned_pids.end())
-      return false;
+  if (SpawnedProcessFinished(pid)) {
+    // it seems the process has been finished recently
+    return true;
   }
 
   // first try a SIGTERM (standard kill)
@@ -346,46 +459,30 @@ bool GDBRemoteCommunicationServerPlatform::KillSpawnedProcess(lldb::pid_t pid) {
 
   // check if that worked
   for (size_t i = 0; i < 10; ++i) {
-    {
-      std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
-      if (m_spawned_pids.find(pid) == m_spawned_pids.end()) {
-        // it is now killed
-        return true;
-      }
+    if (SpawnedProcessFinished(pid)) {
+      // it is now killed
+      return true;
     }
     std::this_thread::sleep_for(std::chrono::milliseconds(10));
   }
 
-  {
-    std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
-    if (m_spawned_pids.find(pid) == m_spawned_pids.end())
-      return true;
-  }
+  if (SpawnedProcessFinished(pid))
+    return true;
 
   // the launched process still lives.  Now try killing it again, this time
   // with an unblockable signal.
   Host::Kill(pid, SIGKILL);
 
   for (size_t i = 0; i < 10; ++i) {
-    {
-      std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
-      if (m_spawned_pids.find(pid) == m_spawned_pids.end()) {
-        // it is now killed
-        return true;
-      }
+    if (SpawnedProcessFinished(pid)) {
+      // it is now killed
+      return true;
     }
     std::this_thread::sleep_for(std::chrono::milliseconds(10));
   }
 
   // check one more time after the final sleep
-  {
-    std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
-    if (m_spawned_pids.find(pid) == m_spawned_pids.end())
-      return true;
-  }
-
-  // no luck - the process still lives
-  return false;
+  return SpawnedProcessFinished(pid);
 }
 
 GDBRemoteCommunication::PacketResult
@@ -442,12 +539,14 @@ GDBRemoteCommunication::PacketResult
 GDBRemoteCommunicationServerPlatform::Handle_qGetWorkingDir(
     StringExtractorGDBRemote &packet) {
 
-  llvm::SmallString<64> cwd;
-  if (std::error_code ec = llvm::sys::fs::current_path(cwd))
-    return SendErrorResponse(ec.value());
+  auto cwd = FileSystem::Instance()
+                 .GetVirtualFileSystem()
+                 ->getCurrentWorkingDirectory();
+  if (!cwd)
+    return SendErrorResponse(cwd.getError());
 
   StreamString response;
-  response.PutBytesAsRawHex8(cwd.data(), cwd.size());
+  response.PutBytesAsRawHex8(cwd->data(), cwd->size());
   return SendPacketNoLock(response.GetString());
 }
 
@@ -458,7 +557,9 @@ GDBRemoteCommunicationServerPlatform::Handle_QSetWorkingDir(
   std::string path;
   packet.GetHexByteString(path);
 
-  if (std::error_code ec = llvm::sys::fs::set_current_path(path))
+  if (std::error_code ec = FileSystem::Instance()
+                               .GetVirtualFileSystem()
+                            ...
[truncated]

@DavidSpickett
Collaborator

Just edited the description to make the fixes links work automatically (https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword). Will review shortly.

Collaborator

@labath labath left a comment

I wanted to change this code to use threads for a long time, but the idea of using a per-thread virtual file system absolutely terrifies me. That only works if all of the accesses go through the file system (no direct syscalls) and if we're very strict about which code executes on which thread. Right now, I don't think we're either, and I don't see how we could ever achieve it. (That's not exactly true for the code in the lldb-server, but we also don't really have enforceable rules for which code can be run in the lldb-server, and the code this is changing affects all of lldb).

(Also, I guess this is the reason for the ETXTBSY workaround)

One obvious and safe (though less performant and less satisfying) alternative is to replace the fork with spawning a new process (fork+execve) instead. Have you considered that?

Comment on lines +328 to +337
// This is the workaround for the following bug in Linux multithreading
// select() https://bugzilla.kernel.org/show_bug.cgi?id=546
// ReadWithTimeout() with a non-zero timeout is used only to
// read the port number from the gdbserver pipe
// in GDBRemoteCommunication::StartDebugserverProcess().
// The port number may be "1024\0".."65535\0".
if (timeout.count() > 0 && size == 6 && bytes_read == 5 &&
static_cast<char *>(buf)[4] == '\0') {
break;
}
Collaborator

It sounds like the consensus on that bug is that this is the application's (i.e. our) fault. We should fix the issue instead.

Contributor Author

@slydiman slydiman Jul 26, 2024

I don't think so. Note that select() works fine in a single thread when the pipe is closed on the other side, but it hangs when called simultaneously from multiple threads. I tried multiple simultaneous connections to lldb-server platform to launch lldb-server gdbserver. It worked roughly half of the time with 2 connections and failed 100% of the time with 3 or more connections. Instead of select(), I tried:

  • using poll()
  • using read(size = 0)
  • using a non-blocking pipe and calling read() without select() or poll()
  • changing the pipe buffer size

Nothing helped. It is a bug in the kernel. read() hangs too if the pipe is closed on the other side, and a non-blocking read() returns EAGAIN instead of 0. The system just does not recognize the closed pipe when multiple threads are involved.

So I don't see a way to fix this on Linux. The only option is a workaround.

Collaborator

I'm sorry, but I find this extremely hard to believe. Pipes and the ability to read them until EOF are as old as UNIX. We're not doing anything special here. It's going to take a lot more than a reference to a 20 year old REJECTED INVALID bug to convince me this is a kernel issue.
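
For reference, the documented POSIX behaviour of a pipe reader can be checked with a small sketch. A non-blocking read() on an empty pipe returns 0 (EOF) only once every descriptor for the write end is closed; while any copy of the write end remains open anywhere, it fails with EAGAIN. The sketch below only illustrates that rule. `ProbePipe`, `EofDemo` and `RunEofDemo` are hypothetical names, the `dup()` call merely stands in for "a copy of the write end held somewhere else", and nothing here establishes what actually happens inside lldb-server.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Outcome of one non-blocking read attempt.
enum class ReadState { Data, Eof, WouldBlock, Error };

// Switches fd to non-blocking mode and tries to read a single byte.
ReadState ProbePipe(int fd) {
  int flags = fcntl(fd, F_GETFL);
  if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
    return ReadState::Error;
  char byte;
  ssize_t n;
  do {
    n = read(fd, &byte, 1);
  } while (n == -1 && errno == EINTR);
  if (n > 0)
    return ReadState::Data;
  if (n == 0)
    return ReadState::Eof; // No writer is left.
  return (errno == EAGAIN || errno == EWOULDBLOCK) ? ReadState::WouldBlock
                                                   : ReadState::Error;
}

struct EofDemo {
  ReadState with_extra_writer; // Probe while a duplicate write end is open.
  ReadState after_all_closed;  // Probe after the duplicate is closed too.
};

EofDemo RunEofDemo() {
  int fds[2];
  int rc = pipe(fds);
  assert(rc == 0);
  (void)rc;
  int extra = dup(fds[1]); // Hypothetical stray copy of the write end.
  close(fds[1]);           // The "real" writer goes away.
  EofDemo result;
  result.with_extra_writer = ProbePipe(fds[0]);
  close(extra);
  result.after_all_closed = ProbePipe(fds[0]);
  close(fds[0]);
  return result;
}
```

In a process that creates pipes and launches children from several threads at once, descriptors that are not marked close-on-exec can end up in more than one child, which is one general way such a stray write end comes to exist.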

Also note that the situation mentioned in that bug is different from what (I think) you're describing here. In their case, the pipe is NOT closed on the other side. The pipe is closed on the side that's doing the selecting:

Consider a multithreaded program (pthreads). One thread reads from
a file descriptor (typically a TCP or UDP socket). At some time,
another thread decides to close(2) the file descriptor while the
first thread is blocked, waiting for input data.

In other words, this is your basic race in the application code, and it's the application's (our) responsibility to fix it. While I'm not a kernel maintainer, I think I have a pretty good idea why they said the application is buggy, and why they didn't want to fix it -- it's because fixing it probably will not make the application correct.

The problem there is that the application has calls (select (or friends) and close) on two threads with no synchronization between them. Now if select happens to run first, then it's not completely unreasonable to expect that close will terminate that select, and the kernel could in theory make sure it does that (apparently, some operating systems do just that). The problem is what happens if select does not run first. What if close does? In this case, select will return an error (as the bug reporter expects), but only if the FD hasn't been reused in the meantime. And since Linux always assigns the lowest FD available (I think POSIX mandates that), it's very likely that the very next operation (perhaps on a third thread) which creates an fd will get the same FD as we've just closed. If that happens, then the select will NOT return an error (the kernel has no way to know that it's referring to the old fd) and will happily start listening on the new fd. Since you usually aren't able to control all operations that could possibly create a new FD, this kind of pattern would be buggy except in extremely limited circumstances.
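
The descriptor-reuse part of this argument is easy to reproduce in a single thread. POSIX requires calls that allocate a descriptor (open(), dup(), pipe() and so on) to return the lowest number that is not currently open, so a number that was just closed is the first candidate for the next allocation. A minimal sketch, with `ClosedFdNumberIsReused` as a hypothetical helper name:

```cpp
#include <cassert>
#include <unistd.h>

// Closes the read end of a fresh pipe, then immediately allocates another
// descriptor. Returns true if the new descriptor got the number that was
// just closed.
bool ClosedFdNumberIsReused() {
  int fds[2];
  if (pipe(fds) != 0)
    return false;
  int stale = fds[0];
  close(fds[0]);           // A select() on `stale` would now be a bug.
  int fresh = dup(fds[1]); // Any descriptor-creating call would do here.
  assert(fresh != -1);
  bool reused = (fresh == stale);
  if (fresh != -1)
    close(fresh);
  close(fds[1]);
  return reused;
}
```

In a multithreaded program the allocating call can come from any thread, so a thread still holding the stale number would silently start waiting on an unrelated descriptor.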

@labath
Collaborator

labath commented Jul 26, 2024

One way I can imagine this happening is if the FileSystem instance was local to a GDBRemoteCommunicationServerPlatform instance -- rather than the thread it happens to be (mostly) running on. This will require more changes to, basically, plumb the filesystem instance to every place that needs to be used from the platform object, but I think that's actually a good thing. It will give us a record of which code is prepared to deal with virtualized filesystems and which one isn't. I just don't think we can assume that all our code (present and future) will handle the per-thread filesystem situation correctly.
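
A rough sketch of what "local to the platform instance" could look like is below. It uses std::filesystem (C++17) and POSIX-style paths instead of the LLDB/LLVM classes, and `ConnectionContext` and `HandleResolveRequest` are hypothetical names, so treat it as an illustration of the plumbing rather than a proposed API.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical per-connection state: it owns a virtual working directory and
// never changes the working directory of the process.
class ConnectionContext {
public:
  explicit ConnectionContext(fs::path cwd) : m_cwd(std::move(cwd)) {
    assert(m_cwd.is_absolute());
  }

  // Relative arguments are interpreted against the current virtual cwd.
  void SetWorkingDir(const fs::path &dir) { m_cwd = Resolve(dir); }
  const fs::path &GetWorkingDir() const { return m_cwd; }

  // Purely lexical resolution; the real file system is not consulted.
  fs::path Resolve(const fs::path &p) const {
    return (p.is_absolute() ? p : m_cwd / p).lexically_normal();
  }

private:
  fs::path m_cwd;
};

// A request handler receives the context explicitly, so its signature shows
// that it is prepared to work with a virtual working directory.
std::string HandleResolveRequest(const ConnectionContext &ctx,
                                 const std::string &path) {
  return ctx.Resolve(path).generic_string();
}
```

With this shape, two connections handled in the same process resolve the same relative path differently regardless of which thread runs the handler, and code that has not been given a context cannot pick one up by accident.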

(That said, it still may be better to just spawn a new process instead. I don't think this is a particularly hot code path (I think the test suite is basically the only user of multithreaded platforms), and it will make sure we don't hit the ETXTBSY issue).

@slydiman
Contributor Author

slydiman commented Jul 26, 2024

@labath

I wanted to change this code to use threads for a long time, but the idea of using a per-thread virtual file system absolutely terrifies me. That only works if all of the accesses go through the file system (no direct syscalls) and if we're very strict about which code executes on which thread. Right now, I don't think we're either, and I don't see how we could ever achieve it. (That's not exactly true for the code in the lldb-server, but we also don't really have enforceable rules for which code can be run in the lldb-server, and the code this is changing affects all of lldb).

We need the virtual working directory only for a few requests in the platform protocol, and only to resolve paths. Note that I have added only 2 updates for that. All the rest of the code already uses lldb::FileSystem correctly.
Once a process (lldb-server gdbserver or a test app) is started, it uses its own working directory.

(Also, I guess this is the reason for the ETXTBSY workaround)

I don't see the connection. See more details in #100659 discussion.

to replace the fork with spawning a new process (fork+execve) instead. Have you considered that?

To fix all gdb port mapping issues we need a common port map available to all platform connection handlers. That is possible only with multithreading. And I don't see a way to delegate an accepted incoming connection to a spawned process on Windows.

One way I can imagine this happening is if the FileSystem instance was local to a GDBRemoteCommunicationServerPlatform instance -- rather than the thread it happens to be (mostly) running on. This will require more changes to, basically, plumb the filesystem instance to every place that needs to be used from the platform object, but I think that's actually a good thing. It will give us a record of which code is prepared to deal with virtualized filesystems and which one isn't. I just don't think we can assume that all our code (present and future) will handle the per-thread filesystem situation correctly.

GDBRemoteCommunicationServerPlatform extends GDBRemoteCommunicationServerCommon, GDBRemoteCommunication, etc. The working directory may be used for its own tasks (for example, loading/saving settings) and to handle requests with a relative path. Most such requests are handled in GDBRemoteCommunicationServerCommon. It seems everything works correctly, since all tests passed.

(That said, it still may be better to just spawn a new process instead. I don't think this is a particularly hot code path (I think the test suite is basically the only user of multithreaded platforms), and it will make sure we don't hit the ETXTBSY issue).

How will the spawned process help? I think the only way to fix the ETXTBSY issue is to copy the executable to the target and launch it from the same process. It seems macOS uses the system's gdbserver instead of lldb-server gdbserver. Please correct me if I'm wrong. So the ETXTBSY issue is a design issue. It is probably necessary to move vFile:open, vFile:pwrite, vFile:close to gdbserver somehow to fix the ETXTBSY issue.

@labath
Collaborator

labath commented Jul 29, 2024

@labath

I wanted to change this code to use threads for a long time, but the idea of using a per-thread virtual file system absolutely terrifies me. That only works if all of the accesses go through the file system (no direct syscalls) and if we're very strict about which code executes on which thread. Right now, I don't think we're either, and I don't see how we could ever achieve it. (That's not exactly true for the code in the lldb-server, but we also don't really have enforceable rules for which code can be run in the lldb-server, and the code this is changing affects all of lldb).

We need the virtual working directory only for a few requests via the platform protocol. It is needed just to resolve the path. Note that I have added only 2 updates for that. All the rest of the code already uses lldb::FileSystem correctly.

I believe that the code that's necessary to run lldb-server platform uses the filesystem correctly. I trust you've made sure of that by running the test suite. However, I don't think you can claim that for every usage of the FileSystem class everywhere in lldb, because for most of those you can't even tell whether they'd want to use a "thread-local" filesystem or a global one.

I certainly can't do it, and that's kind of my point: I think this is a bad abstraction. Outside of a very limited use case, it's impossible to reason about this and/or prove that introducing a thread-local cwd is safe and correct. We're often optimizing code by moving it from one thread to another, or farming it out to a thread pool, and we also have code that can run on several threads in different contexts. All of these things could cause "correct" code to suddenly stop working because some completely unrelated code has been changed to run on a different thread.
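
The hazard described above can be reproduced in a few lines. The following Python sketch is only an analogy, not lldb code: `_tls`, `GLOBAL_CWD`, `set_thread_cwd` and `resolve` are hypothetical stand-ins for a per-thread working directory with a global fallback.

```python
import posixpath
import threading
from concurrent.futures import ThreadPoolExecutor

_tls = threading.local()
GLOBAL_CWD = "/global"   # stand-in for the process-wide working directory


def set_thread_cwd(path):
    # Only the calling thread sees this value.
    _tls.cwd = path


def resolve(path):
    # Uses the calling thread's cwd if one was set, else the global one.
    cwd = getattr(_tls, "cwd", GLOBAL_CWD)
    return posixpath.normpath(posixpath.join(cwd, path))


def handler():
    set_thread_cwd("/conn1")
    direct = resolve("a.out")   # runs on the handler's own thread
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The same call, moved to a pool thread, silently loses the cwd.
        pooled = pool.submit(resolve, "a.out").result()
    return direct, pooled
```

Here `handler()` returns `("/conn1/a.out", "/global/a.out")`: the same call resolves differently once it is farmed out to another thread, even though `resolve` itself did not change.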

If this is really used "only for a few requests via the platform protocol", then why not make the CWD a property of the platform object? (Either through a virtual filesystem, or just by having it as a string, and resolving things explicitly)

Once a process (lldb-server gdbserver or test app) is started, it uses its own working directory.

Ack.

(Also, I guess this is the reason for the ETXTBSY workaround)

I don't see the connection. See more details in #100659 discussion.

to replace the fork with spawning a new process (fork+execve) instead. Have you considered that?

To fix all gdb port mapping issues, we need a common port mapping available in all platform connection handlers. That is possible only with multithreading.

I think that using threads for the management of a common port resource is a very elegant solution. I did not realize you're trying to solve that, for which I apologise. That said, I think it's a stretch to say this is the only way to solve this issue. I can think of at least four other potential solutions (with different tradeoffs) right now. This is my favourite:

Don't use port mappings (at least in the multi-process --server mode). AFAICT, the only reason it exists is because lldb uses an FTP-like connection that's incompatible with firewalls/NATs/etc. This means that every user has to first discover this problem, then learn about the port map flags to lldb, and then configure their firewall to let these ports through. We get questions about this regularly. Everything would be a lot simpler if everything went through a single port.

The way this would work is by letting the platform instance delegate/upgrade/convert the platform connection into a gdbserver one. It would go something like this:

  1. lldb-server platform would advertise (say in qSupported) its support for this new mode.
  2. Before asking the platform to launch a new gdb server, lldb would query this feature. If present, instead of the usual action (qLaunchGDBServer), it would create another platform connection, using the same port as the original one. As we're using the same port, we'd go through all the nats just like the original connection.
  3. On this new connection it would send a new special packet (let's call it qUpgradeToGdbConnection)
  4. lldb-server platform would launch a gdbserver instance and everything else would proceed as before.
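
The four steps above can be sketched with plain sockets. This is a simplified Python illustration under loud assumptions: it skips gdb-remote packet framing, the feature string `gdb-upgrade+` is invented, `qUpgradeToGdbConnection` is just the placeholder name from the text, and no real gdbserver is launched.

```python
import socket
import threading

UPGRADE_PACKET = b"qUpgradeToGdbConnection"   # placeholder name from the proposal
FEATURE = b"gdb-upgrade+"                     # invented feature string


def serve_one(conn):
    """Handle one accepted connection on the single platform port."""
    with conn:
        first = conn.recv(1024)
        if first == b"qSupported":
            conn.sendall(FEATURE)             # step 1: advertise the mode
        elif first == UPGRADE_PACKET:
            # Step 4: a real server would hand this socket to gdbserver.
            conn.sendall(b"OK:gdbserver")
        else:
            conn.sendall(b"unsupported")


def start_server():
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))                # one port for everything
    srv.listen()

    def loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return
            threading.Thread(target=serve_one, args=(conn,), daemon=True).start()

    threading.Thread(target=loop, daemon=True).start()
    return srv


def request(port, packet):
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(packet)
        return c.recv(1024)


def launch_debug_session(port):
    # Step 2: query the feature. Step 3: open a second connection to the
    # same port and ask for the upgrade.
    if FEATURE in request(port, b"qSupported"):
        return request(port, UPGRADE_PACKET)
    return None   # a real client would fall back to qLaunchGDBServer
```

The point of the sketch is that both connections target the same listening port, so whatever forwarding lets the platform connection through also lets the debug connection through.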

On non-darwin platforms (darwin uses debugserver) we could optimize to avoid spawning a new process, and just call the relevant gdb-server code directly. This might be nice because then we could use execve (unsupported on windows) for calling the debugserver on darwin, and all other platforms would work the same way as windows. But this is still just an optimization, and passing the socket through fork+exec/CreateProcess could still work everywhere.

And I don't see a way to delegate the accepted incoming connection to a spawned process on Windows.

Though I'm not a windows expert, I'm pretty sure that's possible. Windows has the concept of inheritable HANDLEs and https://learn.microsoft.com/en-us/windows/win32/sysinfo/handle-inheritance explicitly lists sockets as one of the inheritable handle types.

One way I can imagine this happening is if the FileSystem instance was local to a GDBRemoteCommunicationServerPlatform instance -- rather than the thread it happens to be (mostly) running on. This will require more changes to, basically, plumb the filesystem instance to every place that needs to be used from the platform object, but I think that's actually a good thing. It will give us a record of which code is prepared to deal with virtualized filesystems and which one isn't. I just don't think we can assume that all our code (present and future) will handle the per-thread filesystem situation correctly.

GDBRemoteCommunicationServerPlatform extends GDBRemoteCommunicationServerCommon, GDBRemoteCommunication, etc. The working directory may be used for its own tasks (for example, loading/saving settings) and to handle requests with a relative path.

That may be the case, and if that's true, then I think the base class should have a filesystem parameter as well. Either way, we're imposing some sort of a requirement on the code around us. In one case it's "a path string needs to be resolved relative to a certain filesystem", in the other "a path string needs to be resolved on a specific thread". I think the first one is better because it's more explicit -- you know which code is prepared to handle "alternate" filesystems by seeing if it accepts a filesystem argument -- and flexible -- you can pass paths across threads by making sure the filesystem travels along with them.
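
As a rough illustration of the explicit variant, the Python sketch below passes a per-connection filesystem object alongside the path. `ConnectionFileSystem`, `open_request` and `handle` are hypothetical names chosen for the example, not existing lldb classes.

```python
import posixpath
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionFileSystem:
    """Hypothetical per-connection filesystem view: just a working directory."""
    cwd: str

    def resolve(self, path):
        # Absolute paths pass through; relative ones are joined to this cwd.
        return posixpath.normpath(posixpath.join(self.cwd, path))


def open_request(fs, path):
    # Code that accepts `fs` is visibly prepared for an alternate filesystem.
    return fs.resolve(path)


def handle(fs, paths):
    # The work moves to pool threads, and the filesystem travels with it.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(lambda p: open_request(fs, p), paths))
```

With this shape, `handle(ConnectionFileSystem("/conn1"), ["a.out"])` gives the same answer no matter which thread runs the resolution, because nothing depends on thread identity.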

(That said, it still may be better to just spawn a new process instead. I don't think this is a particularly hot code path (I think the test suite is basically the only user of multithreaded platforms), and it will make sure we don't hit the ETXTBSY issue).

How will the spawned process help? I think the only way to fix the ETXTBSY issue is to copy the executable to the target and launch it from the same process. It seems macOS uses the system's gdbserver instead of lldb-server gdbserver. Please correct me if I'm wrong. So the ETXTBSY issue is a design issue. It is probably necessary to move vFile:open, vFile:pwrite, vFile:close to gdbserver somehow to fix the ETXTBSY issue.

That would certainly help, but I don't think it's necessary. I've explained what I think the problem is on the other PR.

@slydiman
Contributor Author

If this is really used "only for a few requests via the platform protocol", then why not make the CWD a property of the platform object? (Either through a virtual filesystem, or just by having it as a string, and resolving things explicitly)

It is possible to store its own FileSystem object in the platform handler, but that requires updating 80% of GDBRemoteCommunicationServerCommon.cpp and implementing some behavior switch in inherited classes.

I tried to minimize changes. I have added the new FileSystem::InitializePerThread(), which is used only in GDBRemoteCommunicationServerPlatform and its base classes in the multithreading case. All other code uses the same FileSystem; nothing changed. FileSystem::InitializePerThread() uses the CWD of the app, so the behavior for the thread is the same as for a forked child process.

I don't see any other threads where FileSystem is used. lldb-server platform creates only one additional thread to monitor a child process. But it does not use any file system operations.

Anyway if FileSystem::InitializePerThread() was not called, any new thread uses the common app's FileSystem. It is safe.

@labath
Collaborator

labath commented Jul 29, 2024

If this is really used "only for a few requests via the platform protocol", then why not make the CWD a property of the platform object? (Either through a virtual filesystem, or just by having it as a string, and resolving things explicitly)

It is possible to store its own FileSystem object in the platform handler, but that requires updating 80% of GDBRemoteCommunicationServerCommon.cpp and implementing some behavior switch in inherited classes.

That does not worry me. In fact, I would say that if all we need to update is GDBRemoteCommunicationServerCommon and its subclasses, then we're pretty good.

I tried to minimize changes. I have added the new FileSystem::InitializePerThread(), which is used only in GDBRemoteCommunicationServerPlatform and its base classes in the multithreading case. All other code uses the same FileSystem; nothing changed. FileSystem::InitializePerThread() uses the CWD of the app, so the behavior for the thread is the same as for a forked child process.

I don't see any other threads where FileSystem is used. lldb-server platform creates only one additional thread to monitor a child process. But it does not use any file system operations.

Anyway if FileSystem::InitializePerThread() was not called, any new thread uses the common app's FileSystem. It is safe.

I realize all that, but I still don't think it's a good tradeoff -- complicating one low-level library (which is pretty complicated on its own), for the sake of one very specific user. I can see how someone might view it differently and you're welcome to find those people and get them on board with your approach. I just don't think I'm going to be one of them.

@slydiman
Contributor Author

The way this would work is by letting the platform instance delegate/upgrade/convert the platform connection into a gdbserver one. It would go something like this:

  1. lldb-server platform would advertise (say in qSupported) its support for this new mode.
  2. Before asking the platform to launch a new gdb server, lldb would query this feature. If present, instead of the usual action (qLaunchGDBServer), it would create another platform connection, using the same port as the original one. As we're using the same port, we'd go through all the nats just like the original connection.
  3. On this new connection it would send a new special packet (let's call it qUpgradeToGdbConnection)
  4. lldb-server platform would launch a gdbserver instance and everything else would proceed as before.

On non-darwin platforms (darwin uses debugserver) we could optimize to avoid spawning a new process, and just call the relevant gdb-server code directly.

But debugserver on darwin will not support this feature. It will still require the port map or disabling the firewall.

@labath
Collaborator

labath commented Jul 29, 2024

The way this would work is by letting the platform instance delegate/upgrade/convert the platform connection into a gdbserver one. It would go something like this:

  1. lldb-server platform would advertise (say in qSupported) its support for this new mode.
  2. Before asking the platform to launch a new gdb server, lldb would query this feature. If present, instead of the usual action (qLaunchGDBServer), it would create another platform connection, using the same port as the original one. As we're using the same port, we'd go through all the nats just like the original connection.
  3. On this new connection it would send a new special packet (let's call it qUpgradeToGdbConnection)
  4. lldb-server platform would launch a gdbserver instance and everything else would proceed as before.

On non-darwin platforms (darwin uses debugserver) we could optimize to avoid spawning a new process, and just call the relevant gdb-server code directly.

But debugserver on darwin will not support this feature. It will still require the port map or disabling the firewall.

I think it should already support that. If it doesn't, and it comes down to this, I volunteer to help you with making sure that works.

@slydiman
Contributor Author

slydiman commented Jul 29, 2024

@labath It seems we do not need qSupported and qUpgradeToGdbConnection. We can run
lldb-server platform --server --listen 1234 --gdbserver-port 1235

Option 1:
On receiving qLaunchGDBServer we can

  • fork the child process to know the new pid
  • send the response with the pid and port 1235
  • wait for the connection on the port 1235 for 10 seconds
    • kill the child process if no connection accepted in 10 seconds
    • execve lldb-server gdbserver --fd x where x is fd of the accepted connection

We need a pipe for communication between the main lldb-server platform process and the forked child process within 10 seconds. But unfortunately this scenario is impossible on Windows because fork() is missing.

Option 2:
It seems PlatformRemoteGDBServer uses debugserver_pid only to kill the gdbserver process if the connection failed.
We can just do nothing and respond pid:0;port:1235; to qLaunchGDBServer. Then listen on port 1235 and launch lldb-server gdbserver --fd x when a connection is accepted. lldb-server gdbserver must exit if the connection x is closed.
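
A minimal POSIX-only Python sketch of "accept, then launch a child on the accepted descriptor" follows. The child script is a stand-in for `lldb-server gdbserver --fd x`, and `launch_with_fd` and `serve_gdb_port` are hypothetical helper names. Windows would need a different mechanism for handing over the socket.

```python
import socket
import subprocess
import sys

# Stand-in for `lldb-server gdbserver --fd x`: adopts the inherited
# descriptor and exits once the peer closes the connection.
CHILD = """
import socket, sys
conn = socket.socket(fileno=int(sys.argv[1]))
conn.sendall(b"hello from child")
while conn.recv(1024):
    pass
"""


def launch_with_fd(conn):
    """Spawn the child with the accepted connection as an inherited fd."""
    fd = conn.fileno()
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(fd)],
        pass_fds=(fd,),   # keep only this descriptor open in the child
    )
    conn.close()          # the parent no longer needs its copy
    return child


def serve_gdb_port(listener):
    """Accept one connection on the shared gdb port and delegate it."""
    conn, _ = listener.accept()
    return launch_with_fd(conn)
```

Since the parent closes its copy after the spawn, the child is the only holder of the connection, so it sees end-of-file and exits as soon as the client disconnects.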

And we need to update lldb-server platform to support socket sharing on Windows first. For example we can spawn the process like
lldb-server platform --accept <some_id> --gdbserver-port 1235.

Though I'm not a windows expert, I'm pretty sure that's possible. Windows has the concept of inheritable HANDLEs and https://learn.microsoft.com/en-us/windows/win32/sysinfo/handle-inheritance explicitly lists sockets as one of the inheritable handle types.

I don't see any sockets here https://learn.microsoft.com/en-us/windows/win32/procthread/inheritance so I'm not sure sockets can be inherited on Windows. But here is an example of how to share a socket with a child process using WSADuplicateSocket and memory-mapped files
https://github.com/pauldotknopf/WindowsSDK7-Samples/blob/master/netds/winsock/sockdup/server/Sockdups.c
I believe we can use a pipe to write the WSAPROTOCOL_INFO structure to the child process.

@labath
Collaborator

labath commented Jul 30, 2024

I like how you're thinking.

@labath It seems we do not need qSupported and qUpgradeToGdbConnection. We can run lldb-server platform --server --listen 1234 --gdbserver-port 1235

Option 1: On receiving qLaunchGDBServer we can

* fork the child process to know the new pid

* send the response with the pid and port 1235

* wait for the connection on the port 1235 for 10 seconds
  
  * kill the child process if no connection accepted in 10 seconds
  * execve `lldb-server gdbserver --fd x` where x is fd of the accepted connection

We need a pipe for communication between the main lldb-server platform process and the forked child process within 10 seconds. But unfortunately this scenario is impossible on Windows because fork() is missing.

An interesting idea. I think we could find a way to implement this on windows, but the thing I don't like here is that it limits us to a single in-flight connection (otherwise, we wouldn't be able to match up the incoming connections). If anything goes wrong with one connection, we have to wait 10 seconds (maybe more -- I'm not sure if 10s is enough of a timeout) to serve the next one.

Option 2: It seems PlatformRemoteGDBServer uses debugserver_pid only to kill the gdbserver process if the connection failed. We can just do nothing and respond pid:0;port:1235; to qLaunchGDBServer. Then listen on port 1235 and launch lldb-server gdbserver --fd x when a connection is accepted. lldb-server gdbserver must exit if the connection x is closed.

I like this. It still means that the user has to forward two ports (and the gdb port needs to be forwarded to the same value), but at least it's not N ports, and we can get out of this port map business.

I mean, I still like my idea more, but if you're going to be implementing this, I think you can choose how to go about it.

One thing I realized now (and this is a common problem for all of the implementations proposed here) is that if we can't match gdb-server and platform connections, we can't rely on the gdb server inheriting things from the platform process. I think the only piece of information that matters here (maybe) is the CWD, but I think we can make sure that works either by resetting the gdb server cwd explicitly, or by using absolute paths.

And we need to update lldb-server platform to support socket sharing on Windows first. For example we can spawn the process like lldb-server platform --accept <some_id> --gdbserver-port 1235.

Though I'm not a windows expert, I'm pretty sure that's possible. Windows has the concept of inheritable HANDLEs and https://learn.microsoft.com/en-us/windows/win32/sysinfo/handle-inheritance explicitly lists sockets as one of the inheritable handle types.

I don't see any sockets here https://learn.microsoft.com/en-us/windows/win32/procthread/inheritance so I'm not sure sockets can be inherited on Windows. But here is an example of how to share a socket with a child process using WSADuplicateSocket and memory-mapped files https://github.com/pauldotknopf/WindowsSDK7-Samples/blob/master/netds/winsock/sockdup/server/Sockdups.c I believe we can use a pipe to write the WSAPROTOCOL_INFO structure to the child process.

Interesting. If that's what it takes, then ok -- I guess. However, I'd still try to check if it's possible to pass it directly, as that page doesn't say sockets can't be inherited either, and this page mentions the WSA_FLAG_NO_HANDLE_INHERIT flag, which wouldn't make sense, I think, if the handle was not inheritable. I may try to fire up my windows machine later to give this a shot.


BTW, I think I've figured out what would be the best solution to the leaked FD problem (I don't claim to have invented this, I've seen this implementation in several places already): Create a "forkserver" process. Basically, we fork a new process at startup, and then use this process to perform any additional fork operations (which would be requested via IPC). Since we fork the process before opening any other FDs, it will not accidentally inherit any FDs except the ones we pass it explicitly.
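
For illustration, a forkserver can be sketched in Python roughly as below (POSIX only). The helper is forked first and afterwards only receives spawn requests over a socket pair. `start_forkserver`, `spawn_via_forkserver` and the NUL-separated request format are assumptions made for the example. Python's `subprocess` already closes extra descriptors by default, so the sketch shows the process layout rather than a fix Python itself needs.

```python
import os
import socket
import subprocess


def _serve(channel):
    """Helper loop: run each requested command and send back its stdout."""
    while True:
        request = channel.recv(4096)
        if not request:   # the main process closed its end
            return
        argv = request.decode().split("\0")
        result = subprocess.run(argv, stdout=subprocess.PIPE)
        channel.sendall(result.stdout or b"\n")


def start_forkserver():
    """Fork the helper early, before the main process opens other descriptors."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Helper process: it holds only child_end, so anything it spawns
        # cannot inherit descriptors the main process opens later.
        try:
            parent_end.close()
            _serve(child_end)
        finally:
            os._exit(0)
    child_end.close()
    return pid, parent_end


def spawn_via_forkserver(channel, argv):
    """Ask the helper to run argv and return the child's stdout."""
    channel.sendall("\0".join(argv).encode())
    return channel.recv(4096)
```

A caller would invoke `start_forkserver()` once at startup and route every later launch through `spawn_via_forkserver`, closing the channel at shutdown so the helper exits.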

(This idea is mostly relevant for the multithreading implementation, but in principle, it could be interesting for the lldb binary as well, as that is also multithreaded (but doesn't do much execing, and cares less about FD leakage).)

@slydiman
Contributor Author

slydiman commented Jul 30, 2024

@labath

I like this. It still means that the user has to forward two ports (and the gdb port needs to be forwarded to the same value), but at least it's not N ports, and we can get out of this port map business.
I mean, I still like my idea more, but if you're going to be implementing this, I think you can choose how to go about it.

Ok. First, I will prepare a new patch to support lldb-server platform --server on Windows. I will implement socket sharing on Windows and replace fork() with CreateProcess() in the Windows version. Let's see how it works.

If socket sharing works well enough on Windows, I will prepare a second patch to listen on a single gdb port and pass the accepted connection to lldb-server gdbserver --fd x.

However, I'd still try to check if it's possible to pass it directly, as that page doesn't say sockets can't be inherited either, and this page mentions the WSA_FLAG_NO_HANDLE_INHERIT flag, which wouldn't make sense, I think, if the handle was not inheritable. I may try to fire up my windows machine later to give this a shot.

See this article. BTW, here is a good question: what does a socket handle mean in Windows?

I will try to use lldb::Pipe to provide the socket to the child process on Windows.

slydiman added a commit to slydiman/llvm-project that referenced this pull request Jul 31, 2024
…t on Windows

`lldb-server --server` works on Windows now w/o multithreading. The rest functionality remains unchanged.

Added also PipeWindows::WriteWithTimeout(), fixed PipeWindows::ReadWithTimeout() and missing initialization of  m_read_overlapped.hEvent in the constructor PipeWindows(lldb::pipe_t read, lldb::pipe_t write).

Fixes llvm#90923, fixes llvm#56346.

This is the part 1 of the replacement of llvm#100670.

In the part 2 I plan to switch `lldb-server gdbserver` to use --fd and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
slydiman added a commit to slydiman/llvm-project that referenced this pull request Jul 31, 2024
…t on Windows

`lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged.

Added also PipeWindows::WriteWithTimeout(), fixed PipeWindows::ReadWithTimeout() and missing initialization of  m_read_overlapped.hEvent in the constructor PipeWindows(lldb::pipe_t read, lldb::pipe_t write).

Fixes llvm#90923, fixes llvm#56346.

This is the part 1 of the replacement of llvm#100670.

In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
slydiman added a commit to slydiman/llvm-project that referenced this pull request Jul 31, 2024
…t on Windows

`lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged.

Added also PipeWindows::WriteWithTimeout(), fixed PipeWindows::ReadWithTimeout() and missing initialization of  m_read_overlapped.hEvent in the constructor PipeWindows(lldb::pipe_t read, lldb::pipe_t write).

Fixes llvm#90923, fixes llvm#56346.

Depends on llvm#101326.

This is the part 1 of the replacement of llvm#100670.

In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
slydiman added a commit to slydiman/llvm-project that referenced this pull request Jul 31, 2024
…t on Windows

`lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged.

Depends on llvm#101383.

Fixes llvm#90923, fixes llvm#56346.

This is the part 1 of the replacement of llvm#100670.

In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
…t mapping

Removed fork(). Used threads and the common thread-safe port map for all platform connections.

Updated lldb::FileSystem to use llvm::vfs::createPhysicalFileSystem() with an own virtual working directory per thread.

This patch depends on llvm#100659, llvm#100666.

This patch fixes llvm#97537, llvm#90923, llvm#56346.

lldb-server has been tested on Windows with 50 connections and 100 processes launched simultaneously. Tested also the cross build with Linux x86_64 host and Linux Aarch64 target.
@slydiman slydiman force-pushed the lldb-server-multithreading branch from e69a04e to 17916b6 Compare August 1, 2024 11:48
slydiman added a commit to slydiman/llvm-project that referenced this pull request Aug 6, 2024
`lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged.

Fixes llvm#90923, fixes llvm#56346.

This is the part 1 of the replacement of llvm#100670.

In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
slydiman added a commit that referenced this pull request Aug 12, 2024
#101283)

`lldb-server platform --server` works on Windows now w/o multithreading.
The rest functionality remains unchanged.

Fixes #90923, fixes #56346.

This is the part 1 of the replacement of #100670.

In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and
listen a common gdb port for all gdbserver connections. Then we can
remove gdb port mapping to fix #97537.
@slydiman
Contributor Author

See the replacement #101283.

@slydiman slydiman closed this Aug 13, 2024
bwendling pushed a commit to bwendling/llvm-project that referenced this pull request Aug 15, 2024
llvm#101283)

`lldb-server platform --server` works on Windows now w/o multithreading.
The rest functionality remains unchanged.

Fixes llvm#90923, fixes llvm#56346.

This is the part 1 of the replacement of llvm#100670.

In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and
listen a common gdb port for all gdbserver connections. Then we can
remove gdb port mapping to fix llvm#97537.
adrian-prantl pushed a commit to adrian-prantl/llvm-project that referenced this pull request Oct 5, 2024
llvm#101283)

`lldb-server platform --server` works on Windows now w/o multithreading.
The rest functionality remains unchanged.

Fixes llvm#90923, fixes llvm#56346.

This is the part 1 of the replacement of llvm#100670.

In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and
listen a common gdb port for all gdbserver connections. Then we can
remove gdb port mapping to fix llvm#97537.

(cherry picked from commit 82ee31f)
adrian-prantl pushed a commit to adrian-prantl/llvm-project that referenced this pull request Oct 11, 2024
llvm#101283)

`lldb-server platform --server` works on Windows now w/o multithreading.
The rest functionality remains unchanged.

Fixes llvm#90923, fixes llvm#56346.

This is the part 1 of the replacement of llvm#100670.

In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and
listen a common gdb port for all gdbserver connections. Then we can
remove gdb port mapping to fix llvm#97537.

(cherry picked from commit 82ee31f)