Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation Fault #3942

Closed
bonnema opened this issue Apr 6, 2020 · 22 comments
Closed

Segmentation Fault #3942

bonnema opened this issue Apr 6, 2020 · 22 comments
Assignees
Labels
severity - Normal Bug Triage Issue needs to be assessed and labeled, further information on reported might be needed

Comments

@bonnema
Copy link
Contributor

bonnema commented Apr 6, 2020

Issue overview

Segmentation fault on Windows encountered when trying to load an IDF with the OpenStudio ruby bindings on "official" Ruby 2.5.

Current Behavior

When trying to load an IDF file, the segmentation fault in segfault.txt is encountered.

Expected Behavior

The IDF should be loaded 20 times
C:\Users\ebonnema>C:\OpenStudio-3.0.0-rc2\bin\openstudio.exe rw.rb
0
1
2
...
18
19

C:\Users\ebonnema>

Steps to Reproduce

  1. Install OpenStudio-3.0.0-rc2 (@tijcolem can provide link)
  2. Install Ruby 2.5.8-1-x64 (https://github.com/oneclick/rubyinstaller2/releases/download/RubyInstaller-2.5.8-1/rubyinstaller-2.5.8-1-x64.exe)
  3. Add a file named openstudio.rb to C:/Ruby25-x64/lib/ruby/site_ruby that has the line: require "C:/OpenStudio-3.0.0-rc2/Ruby/openstudio.so"
  4. Download and unzip rw.zip, then from inside the unzipped directory run ruby rw.rb

Possible Solution

  • Run using CLI (e.g. C:\Users\ebonnema>C:\OpenStudio-3.0.0-rc2\bin\openstudio.exe rw.rb)
  • WARNING: system calls in ruby must also be to the CLI, not the system ruby
  • Tried using newer versions of Ruby (2.6 and 2.7) but they're not compatible with OpenStudio-3.0.0

Details

Ruby Bindings Installations

eric@EBONNEMA-31650S:~$ cat /mnt/c/Ruby22-x64/lib/ruby/site_ruby/openstudio.rb
require 'C:/OpenStudio-2.9.1/Ruby/openstudio.rb'
eric@EBONNEMA-31650S:~$ cat /mnt/c/Ruby25-x64/lib/ruby/site_ruby/openstudio.rb
require "C:/OpenStudio-3.0.0-rc2/Ruby/openstudio.so"
eric@EBONNEMA-31650S:~$ cat /mnt/c/Ruby26-x64/lib/ruby/site_ruby/openstudio.rb
require "C:/OpenStudio-3.0.0-rc2/Ruby/openstudio.so"
eric@EBONNEMA-31650S:~$ cat /mnt/c/Ruby27-x64/lib/ruby/site_ruby/openstudio.rb
require "C:/OpenStudio-3.0.0-rc2/Ruby/openstudio.so"

Testing Ruby Bindings Installations

Microsoft Windows [Version 10.0.18362.720]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\ebonnema>C:\Ruby22-x64\bin\ruby.exe --version
ruby 2.2.4p230 (2015-12-16 revision 53155) [x64-mingw32]

C:\Users\ebonnema>C:\Ruby22-x64\bin\irb
irb(main):001:0> require 'openstudio'
=> true
irb(main):002:0> exit

C:\Users\ebonnema>C:\Ruby25-x64\bin\ruby.exe --version
ruby 2.5.8p224 (2020-03-31 revision 67882) [x64-mingw32]

C:\Users\ebonnema>C:\Ruby25-x64\bin\irb
irb(main):001:0> require 'openstudio'
=> true
irb(main):002:0> exit

C:\Users\ebonnema>C:\Ruby26-x64\bin\ruby.exe --version
ruby 2.6.6p146 (2020-03-31 revision 67876) [x64-mingw32]

C:\Users\ebonnema>C:\Ruby26-x64\bin\irb
irb(main):001:0> require 'openstudio'
Traceback (most recent call last):
9: from C:/Ruby26-x64/bin/irb.cmd:31:in <main>' 8: from C:/Ruby26-x64/bin/irb.cmd:31:in load'
7: from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in <top (required)>' 6: from (irb):1 5: from C:/Ruby26-x64/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in require'
4: from C:/Ruby26-x64/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in require' 3: from C:/Ruby26-x64/lib/ruby/site_ruby/openstudio.rb:1:in <top (required)>'
2: from C:/Ruby26-x64/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in require' 1: from C:/Ruby26-x64/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in require'
LoadError (126: The specified module could not be found. - C:/OpenStudio-3.0.0-rc2/Ruby/openstudio.so)
irb(main):002:0> exit

C:\Users\ebonnema>C:\Ruby27-x64\bin\ruby.exe --version
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x64-mingw32]

C:\Users\ebonnema>C:\Ruby27-x64\bin\irb
irb(main):001:0> require 'openstudio'
Traceback (most recent call last):
9: from C:/Ruby27-x64/bin/irb.cmd:31:in <main>' 8: from C:/Ruby27-x64/bin/irb.cmd:31:in load'
7: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/irb-1.2.3/exe/irb:11:in <top (required)>' 6: from (irb):1 5: from C:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in require'
4: from C:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in require' 3: from C:/Ruby27-x64/lib/ruby/site_ruby/openstudio.rb:1:in <top (required)>'
2: from C:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in require' 1: from C:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in require'
LoadError (126: The specified module could not be found. - C:/OpenStudio-3.0.0-rc2/Ruby/openstudio.so)
irb(main):002:0> exit

Environment

  • OpenStudio-3.0.0-rc2
  • ruby 2.5.8p224 (2020-03-31 revision 67882) [x64-mingw32]

Windows Specifications

  • Edition: Windows 10 Enterprise
  • Version: 1903
  • Installed on: 7/18/2019
  • OS build: 18362.720

Context

  • No issues using Ruby-2.2 and OpenStudio-2.9.1
  • This same issue presented itself on Ruby version 2.5.5.
  • Not a problem on Mac or Ubuntu Linux
@bonnema bonnema added the Triage Issue needs to be assessed and labeled, further information on reported might be needed label Apr 6, 2020
@lefticus
Copy link
Contributor

lefticus commented Apr 6, 2020

So it's refusing to load the module at all with 2.5.8 in your description, is that correct?

@bonnema
Copy link
Contributor Author

bonnema commented Apr 6, 2020

it loads it with 2.5.8, but not 2.6.6 or 2.7.1 - but I get the segfault with 2.5.5 and 2.5.8

@lefticus
Copy link
Contributor

lefticus commented Apr 6, 2020

OK, I see. I'll start digging into it tomorrow.

@lefticus
Copy link
Contributor

lefticus commented Apr 7, 2020

@bonnema @kbenne as I recall there was a simple test case of just opening an IDF in a loop with the system Ruby that exhibited this behavior? Based on my understanding of this problem that's probably a good (and fast) place for me to start.

@bonnema
Copy link
Contributor Author

bonnema commented Apr 7, 2020

@lefticus download https://github.com/NREL/OpenStudio/files/4440764/rw.zip, it has a script just like you're talking about.

@lefticus
Copy link
Contributor

lefticus commented Apr 7, 2020

@lefticus download https://github.com/NREL/OpenStudio/files/4440764/rw.zip, it has a script just like you're talking about.

I apologize for only skimming your bug report.

@bonnema
Copy link
Contributor Author

bonnema commented Apr 7, 2020

No worries - it was pretty long...

@lefticus
Copy link
Contributor

lefticus commented Apr 7, 2020

Interesting data point - I "randomly crash" on the 6th iteration each time. If that number means anything to anyone, please share.

@bonnema
Copy link
Contributor Author

bonnema commented Apr 7, 2020

@lefticus when @kbenne and I were messing around, it would not always crash at the 6th iteration (e.g. sometimes it was the 4th iteration); but, once an iteration was "determined" it consistently crashed on the same one.

@lefticus
Copy link
Contributor

lefticus commented Apr 7, 2020

OK, thanks for the data point.

@tijcolem
Copy link
Collaborator

tijcolem commented Apr 7, 2020

Seems related #3944

@lefticus
Copy link
Contributor

lefticus commented Apr 7, 2020

The Problem

Here's the issue as I understand it, I'll do my best to make it readable.

It's a long read, but please do, it makes for an exciting story!!

Fundamental problem: Ruby redefines common C api functions and exports them with its DLL on Windows (fclose fopen notably at the moment).

BTW, this is an issue I initially identified ~ 8 years ago when we were first embedding Ruby into OSApp. Which is why there has to be a header shim that #undefs a bunch of things.

The Story

This is what happens when we use ruby.exe.

   +-ruby.exe-----------------+
   |                          |
   | Loads ruby dll           |
   | Loads OpenStudio.dll     |
   +-----------+--------------+
               |
               v
   +-x64-msvcrt-ruby-250.dll+-+
   |                          |
+--> (provides)fclose         |
|  | (provides)fopen          |
|  | (provides)...            |
|  +------------+-------------+
|               |
|               v
|  +-msvcrt.dll-+-------------+
|  |                          |
|  | (provides)fclose         |
|  | (provides)fopen          |
|  | (provides)...            |
|  +-----------------+--------+
|                    |
|                    v
|  +-OpenStudio.dll+-+--------+
|  |                          |
+--+ (uses)fclose             |
   | (uses)fopen              |
   | (uses)...                |
   +--------------------------+

Compare this to what happens with our own OpenStudio.exe

   +-OpenStudio.exe+----------+
   |                          |
   | Loads msvcrt.dll         |
   | Loads OpenStudio.dll     |
   | Loads our Ruby.dll       |
   +--------------------------+
                     |
                     v
   +-msvcrt.dll---------------+
   |                          |
+--+ (pro^ides)fclose         |
^  | (provides)fopen          |
|  | (provides)...            |
|  +--------------------------+
|                    |
|                    v
|  +-OpenStudio.dll+----------+
|  |                          |
+--+ (uses)fclose             |
   | (uses)fopen              |
   | (uses)...                |
   +--------------------------+

But how does this happen, you ask? Why does ruby define these common system functions?!

See: https://github.com/ruby/ruby/blob/master/include/ruby/win32.h#L750

But then you say "but that's not called fclose, it's called rb_w32_fclose.

In fact, you are correct. The next step is where the magic happens: https://github.com/ruby/ruby/blob/master/win32/mkexports.rb#L38-L64

mkexports.rb searches the header file for everything named rb_w32_.*, strips the rb_w32 and explicitly exports it into the DLL!

You can actually see the issue here in the report that @bonnema posted:

C:\WINDOWS\System32\msvcrt.dll(free+0x1c) [0x00007fff45ad9cfc]
C:\WINDOWS\System32\msvcrt.dll(ungetwc+0xac68) [0x00007fff45b1a0f8]
C:\WINDOWS\System32\msvcrt.dll(clearerr_s+0x109) [0x00007fff45b0cb19]
The "real" fclose --> C:\WINDOWS\System32\msvcrt.dll(fclose+0x60) [0x00007fff45b0cbc0]
The library shim --> C:\Ruby25-x64\bin\x64-msvcrt-ruby250.dll(fclose+0xfc) [0x00000000680f79dc]
Just trying to close a file --> C:\OpenStudio-3.0.0-rc2\Ruby\openstudio.so(openstudio::BCLMeasure::clone+0x42f) [0x00007ffec242e4ff]
C:\OpenStudio-3.0.0-rc2\Ruby\openstudio.so(openstudio::YearDescription::YearDescription+0xe3) [0x00007ffec242b583]
C:\OpenStudio-3.0.0-rc2\Ruby\openstudio.so(openstudio::IdfFile::load+0x95a) [0x00007ffec28f166a]
C:\OpenStudio-3.0.0-rc2\Ruby\openstudio.so(openstudio::IdfFile::load+0x32e) [0x00007ffec28f1ede]

Read from the bottom up. See how openstudio.so calls x64-msvcrt-ruby250.dll:fclose? Why would it do that?

It doesn't, it simply closes a file that is an ifstream object which calls fclose. Ruby has "shimmed" itself into place ( https://en.wikipedia.org/wiki/Shim_(computing) ), so this function is called instead:

https://github.com/ruby/ruby/blob/master/win32/win32.c#L6392-L6412

int
rb_w32_fclose(FILE *fp)
{
    int fd = fileno(fp);
    SOCKET sock = TO_SOCKET(fd);
    int save_errno = errno;

    if (fflush(fp)) return -1;
    if (!is_socket(sock)) {
	UnlockFile((HANDLE)sock, 0, 0, LK_LEN, LK_LEN);
	return fclose(fp); // call the real fclose
    }
    _set_osfhnd(fd, (SOCKET)INVALID_HANDLE_VALUE);
    fclose(fp);
    errno = save_errno;
    if (closesocket(sock) == SOCKET_ERROR) {
	errno = map_errno(WSAGetLastError());
	return -1;
    }
    return 0;
}

Based on the implementation here it appears that the ruby implementers assumined that anything closed with fclose is a socket, not just a plain file?!

What Can We Do?

Provide Our Own Shim

Theoretically we could provide our own shim, but that would have to be loaded before the ruby dll, which AFAICT would require modifying Ruby, so that's a show stopper.

Remove Ruby's Shim

Requires modifying Ruby, so that's a show stopper. Might also break other things.

Statically Link OpenStudio With MSVCRT

Plausible, but requires building all dependencies with static MSVCRT (boost, etc). Also would mean two separate builds.

Link Our Own Shim Just Into OpenStudio (<-- still more research here)

I tried this, but got multiple symbol errors. I'm not 100% sure how to do it, if it's possible. I also

Drop Support For System Ruby on Windows

Easiest option, tell the users to use openstudio.exe as their ruby instead.

Why Didn't This Bite Us Before? (<-- possibly more research here)

Great question. Simple answer is that it's Undefined Behavior, and once you get into that realm there's no guarantees as to what's going to happen. Any little change could have shifted the order of how things are loaded and affected this.

It's theoretically possible that some change to our build of ruby changed things, but I reviewed that code and don't see that as a possibility.

Would Moving to MinGW For OpenStudio Save Us?

Maybe? But I don't think so. It would still be calling an unexpected version of common system API's

Are You Completely Off Your Rocker?

Maybe? But I've verified that the stack trace is consistent. It always happens when closing that file, and is always calling the shimmed ruby fclose provided function. When we really don't want it to be.

I've also gone through debugging steps to print the address of the expected library functions and have verified that the location of fclose changes depending on if the test is executed via openstudio.exe or ruby.exe.

Can You Detect This?

Yes. I think so. I could actually refuse to load OpenStudio.dll if fclose is not the expected version.

@bonnema
Copy link
Contributor Author

bonnema commented Apr 8, 2020

Wow. Nice work @lefticus! What do you think the best option is? Drop system support for Ruby?

@tijcolem
Copy link
Collaborator

tijcolem commented Apr 8, 2020

@lefticus This is some great detective work! So the way I am reading this it seems to be a fundamental flaw in the design of the ruby rb_w32_fclose Would I be able to reproduce this using straight-up ruby I/O functions? Or is this a unique runtime combination we have that now allows this condition to surface?

@lefticus
Copy link
Contributor

lefticus commented Apr 8, 2020

I've been pinging the people I know on Twitter involved at Microsoft: https://twitter.com/MalwareMinigun/status/1247923198743609345 (Billy is one of the standard lib developers at Microsoft). If you read the whole thread you'll see that even the MS developer doesn't see any way around it.

image

Once we're in the land of Undefined Behavior (UB) there's no way to know what combination of things will cause a crash or not.

I think the best we can hope for at the moment is that openstudio.exe works how we want and doesn't beak any of the other loaded ruby modules.

@kbenne
Copy link
Contributor

kbenne commented Apr 8, 2020

so

a) system ruby in this conversation is an mingw based executable yes?
b) we are loading openstudio.dll which is an msvc executable.

I believe a) is true, but jason's load tree which goes ruby.exe -> -x64-msvcrt-ruby-250.dll makes me wonder if I"m misunderstanding.

If a) is true then another datapoint would be to try the test with openstudio's msvc based build of ruby.exe

lefticus added a commit that referenced this issue Apr 10, 2020
 * Issue: ruby's rb_w32_fclose is chosen during ~fstream() shutdown
 * Cause: ruby's dll provides an fclose which is loaded before msvcrt
 * What Changed: We moved to full static linking of openstudio into the
   ruby modules
 * Fix: Make ruby module dynamically link to openstudiolib.dll

NOTE: This is not a perfect fix!

The fundamental problem (Ruby's terrible dll and ruby/win32.h's tendency
to `#define` common functions) still exists and will always exist.

This works because it tweaks the linking order of things.

Issues:

 * less parallelization of building of libraries
 * we have to put openstudiolib.dll in some path that is findable by
   ruby
 * It's possible ruby tests don't currently run if paths aren't set up
   correctly for test execution
 * Still not known why NREL built ruby cannot load our library

Fixes #3942
@lefticus
Copy link
Contributor

I have system ruby working, please see a27b2fc for description and issues.

@lefticus
Copy link
Contributor

Oops, a formatting fix in the last commit actually introduced a compile error. Working on it.

@lefticus
Copy link
Contributor

I believe this branch is ready to merge now, but not sure which branch against which to create the PR https://github.com/NREL/OpenStudio/tree/workaround_ruby_linking_fclose_issues

@bonnema
Copy link
Contributor Author

bonnema commented Apr 11, 2020

@tijcolem can you help here? I am not sure either. Thanks!

@tijcolem
Copy link
Collaborator

@lefticus This is great news! Could you please create a PR to v3.0.0-rc2 which is our latest release branch?

@tijcolem
Copy link
Collaborator

Closed via #3946

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
severity - Normal Bug Triage Issue needs to be assessed and labeled, further information on reported might be needed
Projects
None yet
Development

No branches or pull requests

4 participants