[Remote-SSH] Extension Host with Copilot Fails on VS Code 1.95.3 but Works on 1.93.0 #234355

Open
technic960183 opened this issue Nov 21, 2024 · 25 comments
Labels: mitigated (Issue has workaround in place), nodejs (NodeJS support issues), remote (Remote system operations issues), upstream (Issue identified as 'upstream' component related (exists outside of VS Code))

Comments

@technic960183

technic960183 commented Nov 21, 2024

Type: Bug

Description

We are encountering an issue with the GitHub Copilot extension when using Visual Studio Code with Remote-SSH on an HPC login node. The extension works flawlessly with VS Code 1.93.0 (4849ca9) but fails on VS Code 1.95.3 (f1a4fb1). Interestingly, VS Code 1.95.3 works fine on the computing node of the same cluster, accessed via SSH forwarding from the login node.

To eliminate variables, we have tested the setup thoroughly:

  • Two personal computers were used:
    • One running VS Code 1.93.0. (We didn't record the version of the extensions when we were testing, and VS Code 1.93.0 auto-updated to 1.95.3 when we tried to check the versions. Please tell us if this information is really needed. Update: Might be 1.245.0 / 0.20.3 from the log file ~/.vscode-server/data/logs/20241121T175642/remoteagent.log)
    • The other running VS Code 1.95.3, Copilot 1.245.0 and Copilot Chat 0.22.4.
  • Both connect to the same HPC login node and same account using Remote-SSH.
  • Before each login, the .vscode-server directory on the remote machine was cleaned up to ensure a fresh environment.
  • On the remote machine, GitHub Copilot and GitHub Copilot Chat were the only extensions installed.

Expected Behavior

The GitHub Copilot extension should initialize and function correctly on both VS Code versions.

Actual Behavior

  • On VS Code 1.95.3, the Copilot extension causes the Extension Host process to crash only on the login node.
  • The issue does not occur on the computing node or when using VS Code 1.93.0 on the login node.
  • Disabling the GitHub Copilot extensions eliminates the issue.

Test by VS Code Bisect

Done on our login node with VS Code 1.95.3. Result:
Extension Bisect is done and has identified github.copilot as the extension causing the problem.

Initially, we did not know that it was possible to disable GitHub Copilot alone without also disabling GitHub Copilot Chat.
Later, we found that when VS Code bisect disables GitHub Copilot 1.245.0 alone, GitHub Copilot Chat 0.22.4 works normally on our login node with VS Code 1.95.3.

Logs

Below are logs captured from the Extension Host (remote) process on both the login and computing nodes running VS Code 1.95.3. These logs highlight differences observed during initialization:

Login Node (Fails)

2024-11-21 16:34:43.523 [trace] ExtHostCommands#registerCommand github.copilot.openLogs
2024-11-21 16:34:43.523 [trace] ExtHostCommands#registerCommand github.copilot.signIn
2024-11-21 16:34:43.530 [trace] ExtensionService#_callActivateOptional GitHub.copilot-chat
2024-11-21 16:34:43.555 [trace] extHostWorkspace#findFiles2: fileSearch, extension: GitHub.copilot-chat, entryPoint: findFiles2
2024-11-21 16:34:43.556 [trace] ProxyResolver#tls.connect [{"highWaterMark":16384,"servername":"default.exp-tas.com","session":"null","localAddress":"null","ALPNProtocols":"http/1.1","port":443,"host":"default.exp-tas.com"}]
2024-11-21 16:34:43.562 [debug] ProxyResolver#resolveProxy unconfigured http://169.254.169.254/metadata/instance/compute DIRECT 
2024-11-21 16:34:43.693 [trace] ProxyResolver#tls.connect [443, "default.exp-tas.com", {"servername":"default.exp-tas.com","ALPNProtocols":"h2,http/1.1,http/1.0","signal":"[object AbortSignal]","rejectUnauthorized":true,"ca":"[281 certs]"}]
2024-11-21 16:34:43.723 [trace] ProxyResolver#tls.connect [443, "api.github.com", {"servername":"api.github.com","ALPNProtocols":"h2,http/1.1,http/1.0","signal":"[object AbortSignal]","rejectUnauthorized":true,"ca":"[281 certs]"}]
2024-11-21 16:34:43.760 [debug] ExtHostSearch /work1/user141421/.vscode-server/cli/servers/Stable-f1a4fb101478ce6ec82fe9627c43efbf9e98c813/server/node_modules/@vscode/ripgrep/bin/rg --files --hidden --case-sensitive --no-require-git -g '!**/.git' -g '!**/.svn' -g '!**/.hg' -g '!**/CVS' -g '!**/.DS_Store' -g '!**/Thumbs.db' -g '!**/node_modules' -g '!**/bower_components' -g '!**/*.code-search' --no-ignore-parent --follow --no-config --no-ignore-global
 - cwd: /home/user141421
 - Sibling clauses: {}
2024-11-21 16:34:43.964 [trace] ProxyResolver#tls.connect [443, "api.github.com", {"servername":"api.github.com","ALPNProtocols":"h2,http/1.1,http/1.0","rejectUnauthorized":true,"ca":"[281 certs]"}]
2024-11-21 16:34:43.998 [trace] ExtHostCommands#registerCommand github.copilotChat.signIn
[skip some lines]
2024-11-21 16:34:44.005 [trace] ExtHostCommands#registerCommand github.copilot.buildLocalWorkspaceIndex
2024-11-21 16:34:44.020 [debug] ExtHostSearch Search finished. Stats: {"cmdTime":266,"fileWalkTime":266,"directoriesWalked":0,"filesWalked":0,"cmdResultCount":28852}
2024-11-21 16:34:44.020 [debug] Ext host file search time: 266ms
2024-11-21 16:34:44.174 [trace] ExtHostCommands#registerCommand codereferencing.showOutputPane
2024-11-21 16:34:44.176 [trace] ProxyResolver#tls.connect [443, "copilot-telemet[39 chars]", {"servername":"copilot-telemet[39 chars]","ALPNProtocols":"h2,http/1.1,http/1.0","rejectUnauthorized":true,"ca":"[281 certs]"}]
2024-11-21 16:34:44.215 [trace] ExtHostCommands#executeCommand setContext
2024-11-21 16:34:44.215 [trace] ExtHostCommands#executeCommand _setContext
2024-11-21 16:34:44.215 [trace] ExtHostCommands#registerCommand github.copilot.generate
2024-11-21 16:34:44.215 [trace] ExtHostCommands#registerCommand github.copilot.acceptCursorPanelSolution
2024-11-21 16:34:44.215 [trace] ExtHostCommands#registerCommand github.copilot.previousPanelSolution
2024-11-21 16:34:44.215 [trace] ExtHostCommands#registerCommand github.copilot.nextPanelSolution
2024-11-21 16:34:44.216 [trace] ExtHostCommands#registerCommand _github.copilot.ghostTextPostInsert
[Process Crash]

Computing Node (Works)

2024-11-21 16:30:49.433 [trace] ExtHostCommands#registerCommand github.copilot.openLogs
2024-11-21 16:30:49.433 [trace] ExtHostCommands#registerCommand github.copilot.signIn
[Compared to the login node log above, there are no extra messages here.]
2024-11-21 16:30:49.454 [trace] ExtHostCommands#registerCommand github.copilotChat.signIn
[skip the same lines as above]
2024-11-21 16:30:49.459 [trace] ExtHostCommands#registerCommand github.copilot.buildLocalWorkspaceIndex
2024-11-21 16:30:49.664 [trace] ProxyResolver#tls.connect [443, "default.exp-tas.com", {"servername":"default.exp-tas.com","ALPNProtocols":"h2,http/1.1,http/1.0","signal":"[object AbortSignal]","rejectUnauthorized":true,"ca":"[281 certs]"}]
2024-11-21 16:30:49.694 [trace] ProxyResolver#tls.connect [443, "api.github.com", {"servername":"api.github.com","ALPNProtocols":"h2,http/1.1,http/1.0","signal":"[object AbortSignal]","rejectUnauthorized":true,"ca":"[281 certs]"}]
2024-11-21 16:30:49.981 [trace] ProxyResolver#tls.connect [443, "api.github.com", {"servername":"api.github.com","ALPNProtocols":"h2,http/1.1,http/1.0","rejectUnauthorized":true,"ca":"[281 certs]"}]
2024-11-21 16:30:50.116 [trace] ExtHostCommands#registerCommand codereferencing.showOutputPane
2024-11-21 16:30:50.118 [trace] ProxyResolver#tls.connect [443, "copilot-telemet[39 chars]", {"servername":"copilot-telemet[39 chars]","ALPNProtocols":"h2,http/1.1,http/1.0","rejectUnauthorized":true,"ca":"[281 certs]"}]
2024-11-21 16:30:50.156 [trace] ExtHostCommands#executeCommand setContext
2024-11-21 16:30:50.156 [trace] ExtHostCommands#executeCommand _setContext
2024-11-21 16:30:50.156 [trace] ExtHostCommands#registerCommand github.copilot.generate
2024-11-21 16:30:50.156 [trace] ExtHostCommands#registerCommand github.copilot.acceptCursorPanelSolution
2024-11-21 16:30:50.156 [trace] ExtHostCommands#registerCommand github.copilot.previousPanelSolution
2024-11-21 16:30:50.156 [trace] ExtHostCommands#registerCommand github.copilot.nextPanelSolution
2024-11-21 16:30:50.156 [trace] ExtHostCommands#registerCommand _github.copilot.ghostTextPostInsert
2024-11-21 16:30:50.176 [debug] ProxyResolver#loadSystemCertificates count 137
2024-11-21 16:30:50.191 [debug] ProxyResolver#loadSystemCertificates count filtered 134
2024-11-21 16:30:50.191 [debug] ProxyResolver#resolveProxy unconfigured https://mobile.events.data.microsoft.com/OneCollector/1.0?cors=true&content-type=application/x-json-stream DIRECT 
2024-11-21 16:30:50.192 [trace] ProxyResolver#tls.connect [{"protocol":"https:","hostname":"mobile.events.d[32 chars]","port":443,"path":"null","method":"POST","headers":"[object Object]","agent":"[object Object]","_defaultAgent":"[object Object]","host":"mobile.events.d[32 chars]","lookupProxyAuthorization":"[Function: bound dz]","noDelay":true,"servername":"mobile.events.d[32 chars]","secureEndpoint":true,"_vscodeAdditionalCaCerts":"[134 certs]","keepAlive":true,"scheduling":"lifo","timeout":5000,"_agentKey":"mobile.events.d[57 chars]","encoding":"null","keepAliveInitialDelay":1000}]
[Process keeps running without crashing]
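
The two excerpts above can also be compared mechanically once the timestamps are stripped. A minimal sketch, assuming the log format shown above; `messages` and `only_in_first` are hypothetical helper names, and the sample lines are copied from the two logs in this report.

```python
import re

# Matches the "YYYY-MM-DD HH:MM:SS.mmm " prefix used in the logs above.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} ")

def messages(log_text):
    """Return log lines with the timestamp removed, skipping blank lines."""
    return [TIMESTAMP.sub("", line) for line in log_text.splitlines() if line.strip()]

def only_in_first(first_log, second_log):
    """Return messages that appear in first_log but nowhere in second_log."""
    seen = set(messages(second_log))
    return [m for m in messages(first_log) if m not in seen]

# A few lines taken from the login node and computing node logs above.
login = """\
2024-11-21 16:34:43.523 [trace] ExtHostCommands#registerCommand github.copilot.signIn
2024-11-21 16:34:43.530 [trace] ExtensionService#_callActivateOptional GitHub.copilot-chat
2024-11-21 16:34:43.998 [trace] ExtHostCommands#registerCommand github.copilotChat.signIn
"""
compute = """\
2024-11-21 16:30:49.433 [trace] ExtHostCommands#registerCommand github.copilot.signIn
2024-11-21 16:30:49.454 [trace] ExtHostCommands#registerCommand github.copilotChat.signIn
"""

extra = only_in_first(login, compute)
print(extra)
```

This only reports messages missing from the second log; it does not account for ordering differences, which may also matter here.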

Additional Notes

  • Network configuration remains the same across tests on both the login node and the computing node. Firewalls are disabled on all nodes.
  • The system administrator has confirmed that no OS-level rules (e.g., process termination) are in effect on the login node.
  • The hardware is exactly the same for the login node and the computing nodes in our cluster.

Key questions

  • Why does it work on VS Code 1.93.0 but not on VS Code 1.95.3?
  • Why does VS Code 1.95.3 work on our computing node but not on the login node?

VS Code version: Code 1.95.3 (f1a4fb1, 2024-11-13T14:50:04.152Z)
OS version: Windows_NT x64 10.0.19045
Modes:
Remote OS version: Linux x64 5.15.0-78-generic

System Info
Item Value
CPUs Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (8 x 2808)
GPU Status 2d_canvas: enabled
canvas_oop_rasterization: enabled_on
direct_rendering_display_compositor: disabled_off_ok
gpu_compositing: enabled
multiple_raster_threads: enabled_on
opengl: enabled_on
rasterization: enabled
raw_draw: disabled_off_ok
skia_graphite: disabled_off
video_decode: enabled
video_encode: enabled
vulkan: disabled_off
webgl: enabled
webgl2: enabled
webgpu: enabled
webnn: disabled_off
Load (avg) undefined
Memory (System) 7.89GB (1.49GB free)
Process Argv --log info --crash-reporter-id edf07c68-6f96-49c4-a777-c6a188cd4542
Screen Reader no
VM 0%
Item Value
Remote SSH: Spock
OS Linux x64 5.15.0-78-generic
CPUs AMD Ryzen Threadripper PRO 5975WX 32-Cores (64 x 1793)
Memory (System) 251.70GB (244.89GB free)
VM 0%
Extensions (7)
Extension Author (truncated) Version
copilot Git 1.245.0
copilot-chat Git 0.22.4
jupyter-keymap ms- 1.1.2
remote-ssh ms- 0.115.1
remote-ssh-edit ms- 0.87.0
remote-explorer ms- 0.4.3
vscode-speech ms- 0.12.1

(1 theme extensions excluded)

A/B Experiments
vsliv368cf:30146710
vspor879:30202332
vspor708:30202333
vspor363:30204092
vscod805cf:30301675
binariesv615:30325510
vsaa593:30376534
py29gd2263:31024239
c4g48928:30535728
azure-dev_surveyone:30548225
962ge761:30959799
pythonnoceb:30805159
asynctok:30898717
pythonmypyd1:30879173
2e7ec940:31000449
pythontbext0:30879054
cppperfnew:31000557
dsvsc020:30976470
pythonait:31006305
dsvsc021:30996838
g316j359:31013175
dvdeprecation:31068756
dwnewjupytercf:31046870
nativerepl2:31139839
pythonrstrctxt:31112756
cf971741:31144450
iacca1:31171482
notype1cf:31157160
5fd0e150:31155592
dwcopilot:31170013
stablechunks:31184530

@lramos15
Member

lramos15 commented Nov 21, 2024

Does Copilot Chat work?

Please try using VS Code bisect to help us narrow down the version which broke this

@technic960183
Author

technic960183 commented Nov 22, 2024

Thank you for your response.

The result of VS Code bisect confirms that the issue is caused by the github.copilot extension (Version 1.245.0).

When VS Code bisect disables GitHub Copilot alone, GitHub Copilot Chat works normally.

Additionally, we have verified that both GitHub Copilot (1.245.0) and GitHub Copilot Chat (0.22.4) extensions are using the same version on VS Code 1.93.0 and VS Code 1.95.3 during our previous tests. (Wrong information, we are sorry.) I've added this information to the issue description.

Please let us know if further information is needed.

@lramos15
Member

Are you saying that you are using Copilot 1.245.0 on both 1.93 and 1.95.3? It's not possible for Copilot Chat 0.22.4 to be used on 1.93 due to breaking API changes.

@lramos15 lramos15 assigned deepak1556 and unassigned lramos15 Nov 22, 2024
@technic960183
Author

Sorry for providing incorrect information in the previous update.
We forgot to record the versions of the extensions when testing with VS Code 1.93.0, and it auto-updated to 1.95.3 when we checked the versions.
Due to our limited resources, we don't have extra PCs to install VS Code 1.93.0 and attempt to reproduce the issue, as all of our PCs are in active use.
Please let us know if this information is essential, and we will try to reproduce the issue and get the versions of the extensions.

@deepak1556
Collaborator

We did have a runtime version bump between 1.93 and 1.95, so it would be good to know the nature of the crash. The node executable has debug symbols, so you should be able to analyze the dump file with gdb -se <path-to-vscode-server>/node -c <path-to-core-file>. Can you run the following commands on the core dump and attach the output:

> set pagination off
> info sharedlibrary
> info registers
> bt full
> disassemble

@technic960183
Author

technic960183 commented Nov 25, 2024

We found these in our .vscode-server directory:

drwxr-xr-x 1 user group       80 Nov 21 20:34 bin/
drwxr-xr-x 1 user group       14 Nov 21 17:56 cli/
-rw------- 1 user group      588 Nov 21 18:08 .cli.4849ca9bdf9666755eb463db297b69e5385090e3.log
-rw------- 1 user group      588 Nov 21 20:34 .cli.f1a4fb101478ce6ec82fe9627c43efbf9e98c813.log
-rwxr-xr-x 1 user group 21701432 Sep  4 21:08 code-4849ca9bdf9666755eb463db297b69e5385090e3*
-rwxr-xr-x 1 user group 22070072 Nov 13 22:56 code-f1a4fb101478ce6ec82fe9627c43efbf9e98c813*
drwx------ 1 user group      124 Nov 21 18:00 data/
drwx------ 1 user group      126 Nov 21 20:29 extensions/
-rw-r--r-- 1 user group    24301 Nov 22 12:22 .f1a4fb101478ce6ec82fe9627c43efbf9e98c813.log
-rw-r--r-- 1 user group        8 Nov 22 11:21 .f1a4fb101478ce6ec82fe9627c43efbf9e98c813.pid
-rwx------ 1 user group       37 Nov 22 11:21 .f1a4fb101478ce6ec82fe9627c43efbf9e98c813.token*

Only f1a4fb101478ce6ec82fe9627c43efbf9e98c813/ is in bin/, and we found node in it. We did not delete 4849ca9 from bin/, but we can't find it now.

To analyze the dump file, we used

gdb -se <path-to-vscode-server>/node -c <path-to-core-file>
<path-to-vscode-server> = .vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node
<path-to-core-file> = .vscode-server/code-f1a4fb101478ce6ec82fe9627c43efbf9e98c813

taking 1.95.3 as an example.
We got

Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from .vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node...
"/home/user/.vscode-server/code-f1a4fb101478ce6ec82fe9627c43efbf9e98c813" is not a core dump: file format not recognized
(gdb) exit()
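
Once an actual core file is available, the commands requested earlier in this thread can be run in a single non-interactive gdb invocation. A minimal sketch, assuming gdb is on PATH; `build_gdb_command` is a hypothetical helper, and both paths are placeholders that still need to be filled in.

```python
import shlex

# The gdb commands requested earlier in this thread, in order.
GDB_COMMANDS = [
    "set pagination off",
    "info sharedlibrary",
    "info registers",
    "bt full",
    "disassemble",
]

def build_gdb_command(node_binary, core_file):
    """Return an argv list that runs GDB_COMMANDS in gdb batch mode."""
    argv = ["gdb", "-batch", "-se", node_binary, "-c", core_file]
    for command in GDB_COMMANDS:
        argv += ["-ex", command]  # each -ex runs one command after the files load
    return argv

# Placeholder paths: substitute the real server directory and core file.
argv = build_gdb_command("<path-to-vscode-server>/node", "<path-to-core-file>")
print(shlex.join(argv))
```

The printed command line can be pasted into a shell with its output redirected to a file for attaching to the issue.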

Otherwise, how should we find <path-to-vscode-server> and <path-to-core-file>?

Will a core dump file be generated when the 1.95.3 extension host process crashes?

@technic960183
Author

We found this log file at ~/.vscode-server/data/logs/20241121T175642/remoteagent.log

2024-11-21 17:57:16.590 [info] [<unknown>][e5fdcae7][ExtensionHostConnection] <3836474> Launched Extension Host Process.
2024-11-21 18:00:12.974 [info] Getting Manifest... github.copilot
2024-11-21 18:00:12.982 [error] Error: Unexpected SIGPIPE
    at process.<anonymous> (/work1/user/.vscode-server/cli/servers/Stable-4849ca9bdf9666755eb463db297b69e5385090e3/server/out/vs/server/node/server.main.js:198:6390)
    at process.emit (node:events:531:35)
2024-11-21 18:00:13.024 [info] Installing extension: github.copilot {"installPreReleaseVersion":false,"context":{"clientTargetPlatform":"win32-x64"},"installOnlyNewlyAddedFromExtensionPack":true,"isApplicationScoped":false,"profileLocation":{"$mid":1,"fsPath":"/home/user/.vscode-server/extensions/extensions.json","external":"file:///home/user/.vscode-server/extensions/extensions.json","path":"/home/user/.vscode-server/extensions/extensions.json","scheme":"file"},"productVersion":{"version":"1.93.0","date":"2024-09-04T13:02:38.431Z"}}
2024-11-21 18:00:13.771 [info] Getting Manifest... github.copilot-chat
2024-11-21 18:00:13.798 [info] Installing extension: github.copilot-chat {"installPreReleaseVersion":false,"context":{"clientTargetPlatform":"win32-x64","dependecyOrPackExtensionInstall":true},"installOnlyNewlyAddedFromExtensionPack":true,"isApplicationScoped":false,"profileLocation":{"$mid":1,"fsPath":"/home/user/.vscode-server/extensions/extensions.json","external":"file:///home/user/.vscode-server/extensions/extensions.json","path":"/home/user/.vscode-server/extensions/extensions.json","scheme":"file"},"productVersion":{"version":"1.93.0","date":"2024-09-04T13:02:38.431Z"}}
2024-11-21 18:00:14.519 [info] Extension signature verification result for github.copilot-chat: UnknownError. Executed: true. Duration: 83ms.
2024-11-21 18:00:14.520 [info] Extension signature verification result for github.copilot: UnknownError. Executed: true. Duration: 99ms.
2024-11-21 18:00:15.344 [info] Extracted extension to file:///home/user/.vscode-server/extensions/github.copilot-chat-0.20.3: github.copilot-chat
2024-11-21 18:00:15.351 [info] Renamed to /home/user/.vscode-server/extensions/github.copilot-chat-0.20.3
2024-11-21 18:00:15.453 [info] Extracted extension to file:///home/user/.vscode-server/extensions/github.copilot-1.245.0: github.copilot
2024-11-21 18:00:15.456 [info] Renamed to /home/user/.vscode-server/extensions/github.copilot-1.245.0
2024-11-21 18:00:15.467 [info] Extension installed successfully: github.copilot-chat file:///home/user/.vscode-server/extensions/extensions.json
2024-11-21 18:00:15.467 [info] Extension installed successfully: github.copilot file:///home/user/.vscode-server/extensions/extensions.json
2024-11-21 18:01:47.796 [error] [/home/user/.vscode-server/extensions/github.copilot-chat-0.22.4]: Extension is not compatible with Code 1.93.0. Extension requires: ^1.95.0-20241022.
2024-11-21 18:01:47.797 [error] [/home/user/.vscode-server/extensions/github.copilot-chat-0.22.4]: Extension is using an API proposal 'defaultChatParticipant' that is not compatible with the current version of VS Code.
2024-11-21 18:01:47.799 [info] Extensions added from another source github.copilot-chat file:///home/user/.vscode-server/extensions/extensions.json
2024-11-21 18:01:47.880 [error] [/home/user/.vscode-server/extensions/github.copilot-chat-0.22.4]: Extension is not compatible with Code 1.93.0. Extension requires: ^1.95.0-20241022.
2024-11-21 18:01:47.880 [error] [/home/user/.vscode-server/extensions/github.copilot-chat-0.22.4]: Extension is using an API proposal 'defaultChatParticipant' that is not compatible with the current version of VS Code.

This may suggest that the version of copilot-chat was 0.20.3 when we tested successfully earlier (we forgot to record the version at that time).
It's unclear whether this also suggests that the version of copilot was 1.245.0.
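
The installed versions can be read back from the "Extracted extension to" lines of remoteagent.log. A minimal sketch, assuming the log format shown above; `installed_versions` is a hypothetical helper, and the sample lines are copied from the log in this comment.

```python
import re

# Matches e.g. ".../extensions/github.copilot-chat-0.20.3: github.copilot-chat"
EXTRACTED = re.compile(
    r"Extracted extension to \S*/extensions/(?P<id>[^\s/]+)-(?P<version>\d+\.\d+\.\d+):"
)

def installed_versions(log_text):
    """Map extension id to the version named in each 'Extracted extension' line."""
    versions = {}
    for line in log_text.splitlines():
        match = EXTRACTED.search(line)
        if match:
            versions[match.group("id")] = match.group("version")
    return versions

# Two lines copied from the remoteagent.log excerpt above.
sample = """\
2024-11-21 18:00:15.344 [info] Extracted extension to file:///home/user/.vscode-server/extensions/github.copilot-chat-0.20.3: github.copilot-chat
2024-11-21 18:00:15.453 [info] Extracted extension to file:///home/user/.vscode-server/extensions/github.copilot-1.245.0: github.copilot
"""

versions = installed_versions(sample)
print(versions)
```

This only shows what was extracted during that session; it does not prove which versions were active in the earlier successful test.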

If further testing is needed, please let us know.

@deepak1556
Collaborator

@technic960183 the path to the node binary is correct. As for the core file location, it will depend on your OS configuration. Can you share your OS information (cat /etc/os-release)?

@technic960183
Author

technic960183 commented Nov 25, 2024

@deepak1556 Here is the OS-info

user@login:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Also

user@login:~$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E

Our system admin said that apport was not configured when the system was installed, so it should still have its default settings.
However, we found the directory /var/crash empty.
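
On Linux, a core_pattern value that starts with | means the kernel pipes the dump to the named program rather than writing a core file itself (see core(5)). A minimal sketch of that check; `describe_core_pattern` is a hypothetical helper, and the sample value is the one reported above.

```python
def describe_core_pattern(pattern):
    """Classify a core_pattern value as piped to a program or written to a file."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        parts = pattern[1:].split()
        # The first token after "|" is the handler program; the rest are its arguments.
        return ("pipe", parts[0] if parts else "")
    return ("file", pattern)

# Value reported above for the login node.
apport = "|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E"

kind, target = describe_core_pattern(apport)
print(kind, target)
```

When the result is "pipe", where the dump ends up (if anywhere) is decided by the handler program, so an empty /var/crash does not by itself show that no dump was attempted.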

@deepak1556
Collaborator

Thanks. To confirm, have you set ulimit -c unlimited before triggering the crash? (If not, can you set it and retrigger the crash?) What does coredumpctl list say?

@technic960183
Author

technic960183 commented Nov 25, 2024

Thanks for your reply.
We wrote a C program that does nothing but crash, in order to locate the dump file.

#include <stdlib.h>

/* Deliberately abort (SIGABRT) so that the system produces a core dump. */
int main(void) { abort(); }

After running ulimit -c unlimited and this program, we searched the disk with find /tmp /var /home/user -type f -name "core*" 2>/dev/null and found this location:

user@login:/var/lib/apport/coredump$ ls -l
total 316
-r-------- 1 user root 315392 Nov 26 00:09 core._home_user_will_crash.1071.9cae1a3f-9965-4653-9fc4-2b5aeeb43dc9.415911.238249478

However, when we tried to log in from VS Code again, there was still no core dump file in this directory.

The error message from VS Code was

Remote Extension host terminated unexpectedly 3 times within the last 5 minutes.

Will a core dump be generated automatically, or do we need to set something up for that to happen?

ulimit -c shows 0. How should we set it to unlimited and then trigger the crash? If we set it in VS Code's terminal after we SSH to the cluster (from VS Code) and then enable Copilot, will this correctly enable the core dump? Or do we need to have it set by root (by the admin) to enable it for a specific user?

For coredumpctl list,

Command 'coredumpctl' not found, but can be installed with:
apt install systemd-coredump
Please ask your administrator.

We do not have it installed.

@deepak1556
Collaborator

Yes, the dump file should be generated if the VS Code server crashes and the core file size is correctly configured via ulimit -c unlimited; no additional setup is needed.

ulimit -c shows 0. How should we set it to unlimited and then trigger the crash? If we set it in VS Code's terminal after we SSH to the cluster (from VS Code) and then enable Copilot, will this correctly enable the core dump? Or do we need to have it set by root (by the admin) to enable it for a specific user?

If you can configure it for a specific user and then connect vscode for that user it should work.
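
Resource limits are per process and are inherited by child processes, so a limit raised in one shell applies only to that shell and to whatever it starts afterwards. A minimal sketch of inspecting and raising the soft core-file limit from inside a process, using Python's standard resource module on Linux; the function names are hypothetical, and this is similar in spirit to ulimit -c rather than a replacement for configuring the limit for the user.

```python
import resource

def core_limits():
    """Return the (soft, hard) core-file size limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_CORE)

def raise_core_soft_limit():
    """Raise the soft core-file limit up to the hard limit (needs no privileges)."""
    _, hard = core_limits()
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return core_limits()

def show(limit):
    """Render a limit value the way ulimit does."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

soft, hard = raise_core_soft_limit()
print("core soft:", show(soft), "hard:", show(hard))
```

An unprivileged process can raise its soft limit only as far as the hard limit, which is why a hard limit of unlimited still leaves room for the soft limit to be 0.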

@deepak1556
Collaborator

It is also unclear whether you are seeing a non-native exception that triggers the termination of the extension host. Can you start the VS Code server with the following environment variable when connecting to the login node?

NODE_OPTIONS='--report-on-fatalerror --report-uncaught-exception --report-exclude-env --report-directory=/tmp/vscode-1_95'

This will generate a report file under /tmp/vscode-1_95 (you can change the directory to a path of your choice) when the extension host terminates. You can send the file to Deepak.Mohan@microsoft.com

@technic960183
Author

technic960183 commented Nov 26, 2024

@deepak1556 Now, we are working with our system admin to resolve the problem.

For the NODE_OPTIONS, how and where should we set it, and how should we start a VS Code server with that environment variable?
We log in by opening VS Code locally and choosing "Connect to Host". We do not have a code server running on our login node for every user.

Update: For now, we are trying to add --login to the VS Code SSH settings (our sh is /bin/bash). It seems that VS Code doesn't start a login shell (and run .bashrc) by default. Still, we are not sure whether the Extension Host uses a login shell or not.
We are referencing this issue to learn about the behavior of VS Code's shell.

It would be good to know how we should configure the "shell" used by these VS Code server processes:

user     567260  0.0  0.0   7368  3524 ?        S    14:49   0:00 sh /home/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/bin/code-server --start-server --server-data-dir /home/user/.vscode-server --host=127.0.0.1 --accept-server-license-terms --enable-remote-auto-shutdown --port=0 --telemetry-level all --connection-token-file /home/user/.vscode-server/.f1a4fb101478ce6ec82fe9627c43efbf9e98c813.token
user     567270  0.3  0.0 11810616 99772 ?      Sl   14:49   0:02 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/server-main.js --start-server --server-data-dir /home/user/.vscode-server --host=127.0.0.1 --accept-server-license-terms --enable-remote-auto-shutdown --port=0 --telemetry-level all --connection-token-file /home/user/.vscode-server/.f1a4fb101478ce6ec82fe9627c43efbf9e98c813.token
user     567335  0.0  0.0 11594752 51972 ?      Sl   14:49   0:00 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/bootstrap-fork --type=fileWatcher
user     567491  0.2  0.0 22309628 112988 ?     Sl   14:49   0:01 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node --dns-result-order=ipv4first /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/bootstrap-fork --type=extensionHost --transformURIs --useHostProxy=false
user     567503  0.0  0.0 11616768 58868 ?      Sl   14:49   0:00 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/bootstrap-fork --type=ptyHost --logsPath /home/user/.vscode-server/data/logs/20241126T144944
user     567569  0.0  0.0   9284  5800 pts/23   Ss+  14:49   0:00 /bin/bash --init-file /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/vs/workbench/contrib/terminal/common/scripts/shellIntegration-bash.sh

We tried to set it in user/.bashrc and user/.profile, but it seems that it did not work. (We checked that the host processes on the login node had ended before the next test, to ensure that the processes on the server were "fresh".)

Also, when we did this in the bash after export NODE_OPTIONS='--report-on-fatalerror --report-uncaught-exception --report-exclude-env --report-directory=/work1/ymhsu/tmp/vscode-1_95':

user@login:~/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813$ ./node
./node: --report-exclude-env is not allowed in NODE_OPTIONS

So we removed --report-exclude-env from the NODE_OPTIONS.

We are not sure whether the NODE_OPTIONS value is incorrect or the way we set it is incorrect.

@deepak1556
Collaborator

The extension host is spawned with the login shell environment, so it should pick up the NODE_OPTIONS set in .profile, which should be sufficient. The issue you are referencing is mostly about the environment used for the integrated terminals, which we can ignore for this issue. To validate that the reporting system is set up correctly, try the following:

  1. Add export NODE_OPTIONS='--report-on-fatalerror --report-uncaught-exception --report-on-signal --report-directory=/work1/ymhsu/tmp/vscode-1_95' to your .profile or .bashrc. Note the new --report-on-signal flag, which will help with the test.
  2. Use Remote-SSH to connect.
  3. Send the signal: kill -s SIGUSR2 <PID of the process which has --type=extensionHost in the process tree>
  4. Check whether a report is generated under the directory.
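
Step 3 above can be sketched as follows: pick the extension host out of a `ps aux`-style listing and send it SIGUSR2, the default signal for Node.js report-on-signal. This is a minimal sketch; `extension_host_pids` and `request_report` are hypothetical helpers, and the sample lines are abbreviated from the process list posted earlier in this thread.

```python
import os
import signal

def extension_host_pids(ps_output):
    """Return PIDs of `ps aux`-style lines that carry --type=extensionHost."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) > 1 and "--type=extensionHost" in fields:
            pids.append(int(fields[1]))  # the second column of `ps aux` is the PID
    return pids

def request_report(pid):
    """Send SIGUSR2; with --report-on-signal set, Node.js writes a diagnostic report."""
    os.kill(pid, signal.SIGUSR2)

# Two lines abbreviated from the process list earlier in this thread.
sample = """\
user     567491  0.2  0.0 22309628 112988 ?     Sl   14:49   0:01 .../node --dns-result-order=ipv4first .../out/bootstrap-fork --type=extensionHost --transformURIs --useHostProxy=false
user     567503  0.0  0.0 11616768 58868 ?      Sl   14:49   0:00 .../node .../out/bootstrap-fork --type=ptyHost --logsPath ...
"""

pids = extension_host_pids(sample)
print(pids)
```

`request_report` is deliberately not called here; on the login node it would be called with a PID returned for the live process list.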

So we removed --report-exclude-env from the NODE_OPTIONS.

Sorry, the option was only added in Node.js v23 and is not supported in the version used by our current remote server. Removing --report-exclude-env is the right thing to do.

@technic960183
Author

technic960183 commented Nov 26, 2024

Thanks.

We did the following steps:

  1. Exported NODE_OPTIONS in both .profile and .bashrc as you provided.
  2. Opened VS Code and connected via Remote-SSH to the login node with Copilot and Copilot Chat disabled.
  3. Ran kill -s SIGUSR2 <PID of the extensionHost>
  4. We got the file report.20241126.204438.625238.0.001.json in the report directory.
{
  "header": {
    "reportVersion": 3,
    "event": "SIGUSR2",
    "trigger": "Signal",
    "filename": "report.20241126.204438.625238.0.001.json",
    "dumpEventTime": "2024-11-26T20:44:38Z",
    "dumpEventTimeStamp": "1732625078373",
    "processId": 625238,
    "threadId": 0,
    "cwd": "/home/user",
    "commandLine": [
      "/work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node",
      "--dns-result-order=ipv4first",
      "/work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/bootstrap-fork",
      "--type=extensionHost",
      "--transformURIs",
      "--useHostProxy=false"
    ],
    "nodejsVersion": "v20.18.0",
    "glibcVersionRuntime": "2.35",
    "glibcVersionCompiler": "2.17",
    "wordSize": 64,
...
"userLimits": {
    "core_file_size_blocks": {
      "soft": 0,
      "hard": "unlimited"
    },
    "open_files": {
      "soft": 1048576,
      "hard": 1048576
    },
    "stack_size_bytes": {
      "soft": 8388608,
      "hard": "unlimited"
    },
    "cpu_time_seconds": {
      "soft": 60,
      "hard": 60
    },
    "max_user_processes": {
      "soft": 1030502,
      "hard": 1030502
    },
    "virtual_memory_kbytes": {
      "soft": 34359738368,
      "hard": 34359738368
    }
  },
...
}
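
The userLimits section of such a report can be scanned for limits whose soft value is below the hard value. A minimal sketch, assuming the report layout shown above; `soft_below_hard` is a hypothetical helper, and the JSON is the userLimits excerpt from this report only, since the full report is elided here.

```python
import json

# The userLimits excerpt from the report above.
EXCERPT = """
{
  "userLimits": {
    "core_file_size_blocks": {"soft": 0, "hard": "unlimited"},
    "open_files": {"soft": 1048576, "hard": 1048576},
    "stack_size_bytes": {"soft": 8388608, "hard": "unlimited"},
    "cpu_time_seconds": {"soft": 60, "hard": 60},
    "max_user_processes": {"soft": 1030502, "hard": 1030502},
    "virtual_memory_kbytes": {"soft": 34359738368, "hard": 34359738368}
  }
}
"""

def as_number(value):
    """Convert a limit to a float, treating "unlimited" as infinity."""
    return float("inf") if value == "unlimited" else float(value)

def soft_below_hard(user_limits):
    """Return the names of limits whose soft value is lower than the hard value."""
    return [
        name
        for name, limit in user_limits.items()
        if as_number(limit["soft"]) < as_number(limit["hard"])
    ]

report = json.loads(EXCERPT)
raisable = soft_below_hard(report["userLimits"])
print(raisable)
```

For this excerpt the core file size shows up in the result, which matches the ulimit -c value of 0 reported earlier.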

Do we need to email the whole report file to you? Or only send it with the real crash by Copilot?
Furthermore, we enabled Copilot after step 4, and then the extension host died, showing Remote Extension host terminated unexpectedly 3 times within the last 5 minutes. However, no new report showed up in the report directory. (We tried a few more times, closing everything and connecting to the remote again, but still got nothing.)

We also set ulimit -c unlimited in .profile and .bashrc, but we haven't seen any core dump from VS Code yet.
We also wrote a script, ulimit_c.sh, to print the value of ulimit -c, and ran it with sh --login ./ulimit_c.sh. There was a permission error in .profile. This might suggest that ulimit -c unlimited does not always take effect.
Our system admin will help us adjust the soft limit of core_file_size_blocks to unlimited tomorrow.

@deepak1556
Collaborator

Thanks for confirming, this rules out any uncaught exception or fatal errors from the runtime.

Do we need to email the whole report file to you? Or only send it with the real crash by Copilot?

Not required, as it is a test report.

Can you attach the full remote agent logs from the scenario where the Copilot extension causes the extension host to terminate?

@technic960183
Author

technic960183 commented Nov 26, 2024

The attached file remote-server.log is the log file .f1a4fb101478ce6ec82fe9627c43efbf9e98c813.log from ~/.vscode-server. The log level is set to trace.
Maybe we caught it!

[22:03:39] [127.0.0.1][76a18ca0][ExtensionHostConnection] <646121> Launched Extension Host Process.
[22:03:40] [File Watcher (node.js)] Request to start watching: /home/user/gamer/.git (excludes: /home/user/.vscode-server/extensions/**, includes: ["**/.git/objects/**/**","**/.git/subtree-cache/**/**","**/.hg/store/**/**"], filter: <none>, correlationId: <none>)
[22:03:40] [File Watcher (node.js)] Started watching: '/home/user/gamer/.git'
[22:03:40] [File Watcher (node.js)] Request to start watching: /home/user/gamer/.git/refs/remotes/origin/main (excludes: /home/user/.vscode-server/extensions/**, includes: ["**/.git/objects/**/**","**/.git/subtree-cache/**/**","**/.hg/store/**/**"], filter: <none>, correlationId: <none>)
[22:03:40] [File Watcher (node.js)] ignoring a path for watching who's stat info failed to resolve: /home/user/gamer/.git/refs/remotes/origin/main (error: Error: ENOENT: no such file or directory, stat '/home/user/gamer/.git/refs/remotes/origin/main')
[22:03:40] [File Watcher (node.js)] starting fs.watchFile() on /home/user/gamer/.git/refs/remotes/origin/main (correlationId: undefined)
[22:03:41] [File Watcher (node.js)] Request to stop watching: /home/user/gamer/.git/refs/remotes/origin/main (excludes: /home/user/.vscode-server/extensions/**, includes: ["**/.git/objects/**/**","**/.git/subtree-cache/**/**","**/.hg/store/**/**"], filter: <none>, correlationId: <none>)
[22:03:41] [File Watcher (node.js)] stopping file watcher (/home/user/gamer/.git/refs/remotes/origin/main (excludes: /home/user/.vscode-server/extensions/**, includes: ["**/.git/objects/**/**","**/.git/subtree-cache/**/**","**/.hg/store/**/**"], filter: <none>, correlationId: <none>))
[22:03:41] [127.0.0.1][76a18ca0][ExtensionHostConnection] <646121><stderr> 
#
# Fatal process OOM in Failed to reserve virtual memory for CodeRange
#


[22:03:41] [127.0.0.1][76a18ca0][ExtensionHostConnection] <646121> Extension Host Process exited with code: null, signal: SIGTRAP.
Cancelling previous shutdown timeout
[22:03:41] Cancelling previous shutdown timeout
Last EH closed, waiting before shutting down
[22:03:41] Last EH closed, waiting before shutting down
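
For anyone scanning a long trace-level server log for the same failure, the two telling lines above (the V8 fatal OOM banner and the extension host exit record) can be pulled out mechanically. The following is a small sketch, not an official tool; the regular expressions are assumptions based only on the log lines quoted in this comment, and the function names are hypothetical.

```python
import re

# Matches e.g. "<646121> Extension Host Process exited with code: null, signal: SIGTRAP."
EXIT_RE = re.compile(
    r"<(?P<pid>\d+)> Extension Host Process exited with "
    r"code: (?P<code>[-\w]+), signal: (?P<signal>\w+)\."
)

# Matches e.g. "# Fatal process OOM in Failed to reserve virtual memory for CodeRange"
OOM_RE = re.compile(r"^# Fatal process OOM in (.+)$", re.MULTILINE)


def find_extension_host_exits(log_text):
    """Return one dict (pid, code, signal) per extension host exit record."""
    return [m.groupdict() for m in EXIT_RE.finditer(log_text)]


def find_fatal_oom(log_text):
    """Return the location text of every V8 fatal OOM banner in the log."""
    return [m.group(1).strip() for m in OOM_RE.finditer(log_text)]
```

Calling both functions on the contents of the server log gives a quick list of crashes and the OOM site that preceded each one.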

Did any change between VS Code 1.93.0 and 1.95.3 cause the "Failed to reserve virtual memory" error?
According to the test report, our virtual memory limit seems to be 34359738368 KB, which should be enough.

    "virtual_memory_kbytes": {
      "soft": 34359738368,
      "hard": 34359738368
    }

@technic960183
Author

technic960183 commented Nov 27, 2024

Important Update

After our system admin found and disabled the address space limit policy on the login node, Copilot with VS Code 1.95.3 runs correctly!

# Setting up login node user policy
@calab soft cpu 1
@calab hard cpu 1.2
#@calab soft as 33554432
#@calab hard as 33554432

This also explains why Copilot can run on the computing nodes, as there are no memory-limiting rules there.
Still, the original limit was 33554432 KB = 32 GB, which should also be quite enough. Did something change between VS Code 1.93.0 and 1.95.3 that makes it hit the limit?
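
As a quick sanity check on the units, the arithmetic can be written out. This is only an illustrative sketch: the constant names are made up, and the two numbers are copied from the limits rule above and from the "virtual_memory_kbytes" field of the test report.

```python
KIB = 1024
GIB = 1024 ** 3

LIMITS_CONF_AS_KIB = 33554432   # value of the commented-out "as" rule, in KB
REPORT_VALUE = 34359738368      # "virtual_memory_kbytes" from the test report


def kib_to_gib(kib):
    """Convert a size in KiB to GiB."""
    return kib * KIB / GIB


if __name__ == "__main__":
    print(kib_to_gib(LIMITS_CONF_AS_KIB))            # 32.0
    print(LIMITS_CONF_AS_KIB * KIB == REPORT_VALUE)  # True
```

The second line shows that the report value is exactly the 32 GB limit expressed in bytes. That suggests the report field may hold the raw byte value despite its name, but this reading is an inference from the numbers, not something confirmed in this thread.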

@deepak1556
Collaborator

deepak1556 commented Nov 27, 2024

That is great news, thanks for the update.

From the logs, we are hitting the OOM in Heap::Setup. The maximum code range size that can be requested is 512MB for 64-bit systems without V8 pointer compression, so the OOM was definitely a side effect of the process already reaching the limit before this call was made. Based on your scenario, you only saw the extension host crash when the Copilot extension was activated, so the main thread heap was set up without issues. That hints at a worker thread from the extension failing to set up its heap (a core dump could validate this).

Also, the OOM didn't create a report because V8 couldn't extract an error callback from the isolate, which is what Node.js relies on to generate a report. I am puzzled by how this could happen, but that's a different issue.

Did something change between VS Code 1.93.0 and 1.95.3 that makes it hit the limit?

We bumped Node.js from 20.15.1 to 20.18.0, but there were no version bumps to V8, so I doubt a change in the runtime triggered this. However, 32GB sounds like enough unless we are creating hundreds of worker threads, which shouldn't be the case. Can you help validate the memory usage in 1.95?

  1. Before enabling the Copilot extension, capture the report from the extension host using the same method as in #234355 (comment), and additionally capture the output of /proc/<extension host pid>/status
  2. Enable the Copilot extension and trigger a ghost text completion, then capture a new report and /proc/../status from the extension host

This should give a better idea about the worker_thread counts and virtual memory usage.
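
The /proc capture in the steps above can also be scripted so that the before and after snapshots are easy to compare. This is a minimal sketch under stated assumptions: it targets Linux, the function names are hypothetical, and it keeps only the Vm*, Rss* and Threads fields, assuming their values start with an integer as in the usual status layout.

```python
def parse_proc_status(text):
    """Extract the Vm*, Rss* and Threads fields from /proc/<pid>/status text.

    Values are returned as integers (kB for memory fields, a count for Threads).
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key = key.strip()
        if key.startswith(("Vm", "Rss")) or key == "Threads":
            fields[key] = int(value.split()[0])
    return fields


def read_proc_status(pid):
    """Read and parse /proc/<pid>/status for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_proc_status(handle.read())
```

Running `read_proc_status(<extension host pid>)` once before and once after enabling the extension gives two dictionaries whose VmSize and Threads entries can be subtracted directly.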

@technic960183
Author

technic960183 commented Nov 27, 2024

We found that 4 of the node processes use more than 10GB of virtual memory. This happened with Copilot both enabled and disabled.
The address space limit on the login node is disabled for now.

Copilot & Copilot Chat enabled

/proc/<pid>/status:

VmPeak: 34515868 kB
VmSize: 34400896 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    652084 kB
VmRSS:    538424 kB
RssAnon:          481452 kB
RssFile:           56972 kB
RssShmem:              0 kB
VmData:   606060 kB
VmStk:      1004 kB
VmExe:     81740 kB
VmLib:      8764 kB
VmPTE:      9692 kB
VmSwap:        0 kB
Threads:        14

From top:

top - 19:57:27 up 29 days,  9:35, 10 users,  load average: 0.00, 0.03, 0.00
Tasks: 776 total,   1 running, 774 sleeping,   1 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257739.4 total,   4531.8 free,   5399.0 used, 247808.6 buff/cache
MiB Swap:  31664.0 total,  31653.0 free,     11.0 used. 250513.8 avail Mem
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 856263 user     20   0    7368   3492   3244 S   0.0   0.0   0:00.00 sh
 856273 user     20   0   11.3g  88408  47120 S   0.0   0.0   0:01.53 node
 856481 user     20   0   32.8g 494600  56128 S   0.0   0.2   0:07.03 node
 856500 user     20   0   11.1g  52772  43096 S   0.0   0.0   0:00.20 node
 856545 user     20   0   11.1g  58804  43716 S   0.0   0.0   0:00.28 node
 856627 user     20   0    9296   5708   3724 S   0.0   0.0   0:00.01 bash
 856938 user     20   0  993652  51988  40772 S   0.0   0.0   0:00.14 node

From ps aux:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
user     856263  0.0  0.0   7368  3492 ?        S    19:54   0:00 sh /home/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/bin/code-server --start-server --server-data-dir /home/user/.vscode-server --host=127.0.0.1 --accept-server-license-terms --enable-remote-auto-shutdown --port=0 --telemetry-level all --connection-token-file /home/user/.vscode-server/.f1a4fb101478ce6ec82fe9627c43efbf9e98c813.token
user     856273  0.2  0.0 11801656 89348 ?      Sl   19:54   0:01 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/server-main.js --start-server --server-data-dir /home/user/.vscode-server --host=127.0.0.1 --accept-server-license-terms --enable-remote-auto-shutdown --port=0 --telemetry-level all --connection-token-file /home/user/.vscode-server/.f1a4fb101478ce6ec82fe9627c43efbf9e98c813.token
user     856481  1.0  0.1 34358896 481284 ?     Sl   19:54   0:07 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node --dns-result-order=ipv4first /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/bootstrap-fork --type=extensionHost --transformURIs --useHostProxy=false
user     856500  0.0  0.0 11594752 52772 ?      Sl   19:54   0:00 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/bootstrap-fork --type=fileWatcher
user     856545  0.0  0.0 11617280 59424 ?      Sl   19:54   0:00 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/bootstrap-fork --type=ptyHost --logsPath /home/user/.vscode-server/data/logs/20241127T195419
user     856627  0.0  0.0   9296  5708 pts/18   Ss+  19:54   0:00 /bin/bash --init-file /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/vs/workbench/contrib/terminal/common/scripts/shellIntegration-bash.sh
user     856938  0.0  0.0 993652 52248 ?        Sl   19:57   0:00 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/extensions/markdown-language-features/dist/serverWorkerMain --node-ipc --clientProcessId=856481

Copilot & Copilot Chat disabled

/proc/<pid>/status:

VmPeak: 22359908 kB
VmSize: 22326696 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    184992 kB
VmRSS:    139512 kB
RssAnon:           85656 kB
RssFile:           53856 kB
RssShmem:              0 kB
VmData:   188232 kB
VmStk:      1004 kB
VmExe:     81740 kB
VmLib:      5556 kB
VmPTE:      2264 kB
VmSwap:        0 kB
Threads:        12

From top:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 861115 user     20   0    7368   3564   3316 S   0.0   0.0   0:00.00 sh
 861125 user     20   0   11.3g  87528  47052 S   0.0   0.0   0:01.48 node
 861201 user     20   0   11.1g  58956  43768 S   0.0   0.0   0:00.26 node
 861252 user     20   0   11.1g  52192  43084 S   0.0   0.0   0:00.15 node
 861357 user     20   0   21.3g 138768  53856 S   0.0   0.1   0:01.52 node
 861392 user     20   0  993868  51532  40736 S   0.0   0.0   0:00.12 node
 861438 user     20   0    9296   5772   3764 S   0.0   0.0   0:00.02 bash

From ps aux:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
user     861115  0.0  0.0   7368  3564 ?        S    20:25   0:00 sh /home/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/bin/code-server --start-server --server-data-dir /home/user/.vscode-server --host=127.0.0.1 --accept-server-license-terms --enable-remote-auto-shutdown --port=0 --telemetry-level all --connection-token-file /home/user/.vscode-server/.f1a4fb101478ce6ec82fe9627c43efbf9e98c813.token
user     861125  0.8  0.0 11800484 87792 ?      Sl   20:25   0:01 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/server-main.js --start-server --server-data-dir /home/user/.vscode-server --host=127.0.0.1 --accept-server-license-terms --enable-remote-auto-shutdown --port=0 --telemetry-level all --connection-token-file /home/user/.vscode-server/.f1a4fb101478ce6ec82fe9627c43efbf9e98c813.token
user     861201  0.1  0.0 11683512 58956 ?      Sl   20:25   0:00 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/bootstrap-fork --type=ptyHost --logsPath /home/user/.vscode-server/data/logs/20241127T202515
user     861252  0.0  0.0 11594752 52192 ?      Sl   20:25   0:00 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/bootstrap-fork --type=fileWatcher
user     861357  0.9  0.0 22326696 138220 ?     Sl   20:25   0:01 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node --dns-result-order=ipv4first /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/bootstrap-fork --type=extensionHost --transformURIs --useHostProxy=false
user     861392  0.0  0.0 993868 51532 ?        Sl   20:25   0:00 /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/node /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/extensions/markdown-language-features/dist/serverWorkerMain --node-ipc --clientProcessId=861357
user     861438  0.0  0.0   9296  5772 pts/21   Ss+  20:25   0:00 /bin/bash --init-file /work1/user/.vscode-server/bin/f1a4fb101478ce6ec82fe9627c43efbf9e98c813/out/vs/workbench/contrib/terminal/common/scripts/shellIntegration-bash.sh

We have performed the test that you suggested, and will email you the test report and the /proc/<pid>/status output later (sent at 21:46 UTC+8 from my Gmail account).

Given the information we have now (especially Extension Host Process exited with code: null, signal: SIGTRAP), do we still need to try to get the core dump? After we set ulimit -c unlimited for the user by default, should we expect to see a core dump file after the extension host crashes?

@deepak1556
Collaborator

Thanks for the process info. It is clear that the Copilot extension was not the primary cause of the crash: the extension host already reaches roughly 22 GB of virtual memory from all the other allocations by Node.js and V8. The Copilot extension does spawn worker threads, each of which needs its own heap setup, and that ended up reaching the limit set for the process. I can dig a bit more on my side to understand whether virtual memory allocation changed between 1.93 and 1.95.
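
The numbers behind this reasoning can be laid out explicitly. The sketch below is only a worked calculation: the constant names are made up, and the values are the VmSize figures and the former address space limit reported earlier in this thread. The measurements were taken with the limit disabled, so they show what the process asks for when unconstrained.

```python
KIB_PER_GIB = 1024 * 1024

OLD_LIMIT_KIB = 33554432        # former address space limit (32 GB)
VMSIZE_DISABLED_KIB = 22326696  # extension host VmSize, Copilot disabled
VMSIZE_ENABLED_KIB = 34400896   # extension host VmSize, Copilot enabled


def gib(kib):
    """Convert KiB to GiB, rounded to one decimal place."""
    return round(kib / KIB_PER_GIB, 1)


# Address space left under the old limit before Copilot starts.
headroom_kib = OLD_LIMIT_KIB - VMSIZE_DISABLED_KIB
# Additional address space in use once Copilot is enabled.
extra_kib = VMSIZE_ENABLED_KIB - VMSIZE_DISABLED_KIB

if __name__ == "__main__":
    print(gib(VMSIZE_DISABLED_KIB), gib(VMSIZE_ENABLED_KIB))  # 21.3 32.8
    print(gib(headroom_kib), gib(extra_kib))                  # 10.7 11.5
    print(extra_kib > headroom_kib)                           # True
```

With about 10.7 GiB of headroom and about 11.5 GiB of extra reservations, the enabled configuration lands just above the old 32 GB cap, which is consistent with the crash appearing only after the extension activates.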

do we still need to try to get the core dump? After we set ulimit -c unlimited for the user by default, should we expect to see a core dump file after the extension host crashes?

The dump file is not needed at this time since we have pinpointed the crash.

Follow-up question: is there a requirement to cap the virtual memory on the login node? I would like to understand whether the workaround can be applied permanently.

@technic960183
Author

technic960183 commented Nov 27, 2024

Thanks for all the help and guidance.
Based on my understanding, the 32GB address space limit exists because our system admin doesn't want a single process to use more than 1/8 of the total memory on the login node (256GB of RAM). Additionally, we use only ulimit for process monitoring due to performance considerations, as we have a single login node that needs to support more than 15 users simultaneously.

We are exploring alternative ways to achieve this without impacting performance. Or maybe it is possible to ask our system admin to raise the limit.
We will add a new comment tomorrow after our meeting with the system admin.

Update:

We: Do the following records mean that these Node.js processes claim a lot of address space but use only very little of it?
System admin: I think yes.
We: So virtual memory is only virtual until the pages are mapped to physical memory?
System admin: Yes, but we need the settings

* soft memlock unlimited
* hard memlock unlimited

to make infiniband driver work properly. So I went to limit virtual memory instead.

This is the reason that virtual memory is limited. We might raise the limit as a temporary solution, but if Node.js claims even more address space in the future, it might still be a problem.

@technic960183
Author

We raised the virtual memory limit from 32GB to 40GB, and this temporarily solves the problem.

We are not quite sure, but it seems that this function from V8 is the source of the problem.

uintptr_t SysInfo::AddressSpaceEnd() {
#if V8_OS_WIN
  SYSTEM_INFO info;
  GetSystemInfo(&info);
  uintptr_t max_address =
      reinterpret_cast<uintptr_t>(info.lpMaximumApplicationAddress);
  return max_address + 1;
#else
  // We don't query POSIX rlimits here (e.g. RLIMIT_AS) as they limit the size
  // of memory mappings, but not the address space (e.g. even with a small
  // RLIMIT_AS, a process can still map pages at high addresses).
  return std::numeric_limits<uintptr_t>::max();
#endif
}

It returns the largest possible address instead of respecting the RLIMIT_AS in effect.
The comment says that RLIMIT_AS limits the size of memory mappings but not the address space. Still, we observed that the value in the diagnostic report changes with our system limit (the report is generated by this code, which shows that the value is indeed RLIMIT_AS).

Do you consider this a bug, or did we misunderstand how this code works?
Would you recommend that we file an issue in the Chromium Issue Tracker?

@deepak1556
Collaborator

We are not quite sure, but it seems that this function from V8 is the source of the problem.

The function is only used to decide the address space limit for the V8 sandbox, which is disabled in our remote server. I don't think the issue comes from that.

@deepak1556 deepak1556 added upstream Issue identified as 'upstream' component related (exists outside of VS Code) remote Remote system operations issues nodejs NodeJS support issues mitigated Issue has workaround in place labels Dec 11, 2024