Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Header-only testing #149

Closed
ankane opened this issue Mar 3, 2021 · 74 comments
Closed

Header-only testing #149

ankane opened this issue Mar 3, 2021 · 74 comments

Comments

@ankane
Copy link
Contributor

ankane commented Mar 3, 2021

This tracks the testing status of #146 with existing projects.

Header-only docs

Each project uses the rice-header-only branch

Project Status Diff CI Notes
Midas Success 🎉 Diff CI
LIBFFM Success 🎉 Diff CI
IsoTree Success 🎉 Diff CI
YouTokenToMe Success 🎉 Diff CI
OutlierTree Success 🎉 Diff CI
Faiss Success 🎉 Diff CI
FastText Success 🎉 Diff CI
DataSketches Success 🎉 Diff CI
OR-Tools Success 🎉 Diff CI
Torch.rb Success 🎉 Diff CI
TorchAudio Success 🎉 Diff CI
Tomoto In-Progress Diff CI Mac error due to C++17, runtime segfault on Windows

Migration notes

  • Use rice/rice.hpp instead of individual includes
  • Use define_function/define_singleton_function instead of define_method/define_singleton_method when no self argument
  • Use lambda instead of pointer to lambda
  • Update conversion templates to use Rice::detail and use Rice::detail::From_Ruby<T>::convert(object.value()) instead of from_ruby<T>(object) (similar with to_ruby)
  • Remove outer parentheses from Arg list
  • Automake is no longer needed
cfis added a commit to cfis/rice that referenced this issue Mar 4, 2021
@cfis
Copy link
Collaborator

cfis commented Mar 4, 2021

Ok, I think all the Rice issues for Faiss are now resolved. There are a couple failures in the tests, but they don't look Rice related to me.

cfis added a commit to cfis/rice that referenced this issue Mar 4, 2021
@cfis
Copy link
Collaborator

cfis commented Mar 4, 2021

Datasketches error was due to not having From_Ruby<unsigned char> which is now added (and I added signed char while at it).

@ankane
Copy link
Contributor Author

ankane commented Mar 4, 2021

Great, Faiss looking much better. Seeing bad any_cast runtime error on Ubuntu 18.04, but not on Ubuntu 20.04 or Mac.

Just ran a new build for DataSketches, but still seeing compile errors on all platforms.

Edit: DataSketches error is error: cannot convert ‘Rice::Array::Proxy’ to ‘VALUE’ {aka ‘long unsigned int’}

@ankane
Copy link
Contributor Author

ankane commented Mar 7, 2021

Hey @cfis, thanks for ankane/tomoto-ruby#2. I converted Object to VALUE for convert in the other projects, and two more are now passing 🎉

fastText and DataSketches are failing on similar code, but with different error messages.

fastText - error: conversion from ‘tuple<fasttext::FastText&>’ to non-scalar type ‘tuple<fasttext::FastText>’ requested

    .define_method("dimension", &FastText::getDimension)
    .define_method("quantized?", &FastText::isQuant)
    .define_method("word_id", &FastText::getWordId)
    .define_method("subword_id", &FastText::getSubwordId)

DataSketches - invalid abstract return type

    .define_method("empty?", &theta_sketch::is_empty)

I imagine they could be converted to lambdas if needed, but wanted to see if this way of defining methods was still supported.

For tomoto, I applied the patch, which improved things significantly, but now running into another compile time error on Mac and runtime errors on Linux and Windows. Will spend some more time on it to see if I can figure it out.

@cfis
Copy link
Collaborator

cfis commented Mar 7, 2021

@ankane - Thanks for all the feedback.
On DataSketches, that was a bug. Fixed in b3c37d7. Test run cleanly for me.

@cfis
Copy link
Collaborator

cfis commented Mar 7, 2021

And fastText compiles for me now too. The tests run, but fail, all with the error "ArgumentError: test/support/unsupervised.txt cannot be opened for training!". I am assuming that is not Rice related, but if it is let me know.

@ankane
Copy link
Contributor Author

ankane commented Mar 8, 2021

Thanks @cfis, all tests now green for both of those. Working through the remaining projects now.

Is there a replacement for:

Rice::Data_Object<MPObjective>(x, rb_cMPVariable, nullptr, nullptr);

Rice 3 context

@cfis
Copy link
Collaborator

cfis commented Mar 8, 2021

Just remove the two nullptr's. Those were for mark and free functions, but I updated the library to use TypedData_Struct - as opposed to deprecated Data_Struct which supposedly is going to be finally removed in Ruby 3.1 according to compiler warning messages. Thus mark and free are now setup in Data_Type, you don't have to pass them in to Data_Object.

@ankane
Copy link
Contributor Author

ankane commented Mar 9, 2021

Hmm, will that disable the mark and free functions? Seeing double free or corruption (out) when switching to that.

Also, unrelated, it looks like add_handler now returns Rice::Module instead of Rice::Class for code like this:

  Rice::define_class_under<torch::Device>(m, "Device")
    .add_handler<torch::Error>(handle_error)
    .define_constructor(Rice::Constructor<torch::Device, const std::string&>())

Error: ‘class Rice::Module’ has no member named ‘define_constructor’

@cfis
Copy link
Collaborator

cfis commented Mar 9, 2021

No - mark and free are handled differently now due to changes in Ruby itself (look in Data_Type if interested).

Double deletes - is the library passing you a reference or pointer? If so, you have to tell Rice to not free it. See this:

https://github.com/jasonroelofs/rice/blob/master/README.md#ownership

Previously Rice used to always assume ownership, and I left it that way. But over the last few days I've changed my mind and I think it should be the opposite - Rice does not take ownership except for object that itself creates via calling a C++ constructor. See #150.

@cfis
Copy link
Collaborator

cfis commented Mar 9, 2021

One other thing - if you don't implement To_Ruby correctly (meaning using perfect forwarding for references), you can take ownership when you don't mean too. I ran into that yesterday myself...thinking about what to do about it.

@cfis
Copy link
Collaborator

cfis commented Mar 9, 2021

On the add_handler, what are you actually doing in that code? I'm curious to see a use case for that functionality.

@ankane
Copy link
Contributor Author

ankane commented Mar 9, 2021

Re add_handler: using it to customize the exception message (what_without_backtrace instead of what), but there's probably a better way to do that.

inline void handle_error(torch::Error const & ex)
{
  throw Rice::Exception(rb_eRuntimeError, ex.what_without_backtrace());
}

Re ownership: OR-Tools tests green with Return().takeOwnership(false)

@ankane
Copy link
Contributor Author

ankane commented Mar 10, 2021

Hey @cfis, think I narrowed down the issue with Torch.rb. Any idea why the first works but the second segfaults?

inline VALUE wrap(torch::Tensor x) {
  return Rice::detail::To_Ruby<torch::Tensor>::convert(x);
}

static VALUE torch_ones(int argc, VALUE* argv, VALUE self_)
{
  auto dispatch_ones = []() -> torch::Tensor {
    return torch::ones({2, 3});
  };

  // works
  return Rice::detail::To_Ruby<torch::Tensor>::convert(dispatch_ones());

  // segfault
  return wrap(dispatch_ones());
}

For Tomoto, everything works great on Linux, but half of the classes segfault on Windows (haven't been able to find a pattern between the ones that work and ones that don't). Will continue to dig, but let me know if you have any ideas about what would cause this type of behavior.

@cfis
Copy link
Collaborator

cfis commented Mar 10, 2021

Hmm...of the top of my head not sure on either question. I assume the first one has something to do with lvalues/rvalues and something not getting copied correctly but of the top of me head I don't see it. Would have to walk through it with a debugger to check.

@cfis
Copy link
Collaborator

cfis commented Mar 11, 2021

@ankane - See what happens with the updates I just pushed to master. Note they will break your code.

To fix remove Return().takeOwnership(false) since that is now the default and will no longer compile. Then deal with the opposite issue - methods where Ruby does need to take ownership. In that case do this (note I updated the documentation if you want to see a more fleshed out example):

define_function("method_that_returns_object_ruby_manages.", &some_method, Return().takeOwnership());

If you still run into issues let me know and I can take a look.

@ankane
Copy link
Contributor Author

ankane commented Mar 11, 2021

Great, have Torch.rb working locally, and will try the changes with OR-Tools. Can you regenerate rice.hpp when you have a chance?

Also, needed to wrap some convert calls with Object. Otherwise, ended up with integers. Example

Array a;
a.push(Object(Rice::detail::To_Ruby<torch::Tensor>::convert(t, true)));

Is it possible to push VALUEs directly? Noticed the same for functions that return VALUE.

@cfis
Copy link
Collaborator

cfis commented Mar 11, 2021

Yeah, VALUEs are pita. Since they are simply a typdef to long long, there is no way to specialize templates for them and they get incorrectly converted. Using Object solves that, but at the cost of a an extra allocation for each conversion. My thinking was most of the time you won't be passing VALUE to C++ functions so although annoying easy enough to deal with since it won't occur much (note that SELF is specifically handled since that is the most common occurrence). Are you passing VALUEs around a lot?

However, it would be good to have a solution. My best idea so far is a way tell Rice to not convert these values. Maybe something like this:

VALUE workingWithValues(VALUE input)
{
  return input;
}

define_global_function("working_with_values", @workingWithValues, Arg("input").noConversion(), Return().noConversion());

-- or ---
define_global_function("working_with_values", @workingWithValues, Arg("input").isValue(), Return().isValue());

@ankane
Copy link
Contributor Author

ankane commented Mar 12, 2021

Ah, makes sense. Not using VALUEs a lot (mainly just in the example above), so not a big deal to wrap. The solution above looks clean (think I slightly prefer the 2nd one fwiw).

@ankane
Copy link
Contributor Author

ankane commented Mar 12, 2021

Also, seems like things are in a good spot with testing. Can run a final round of tests for all projects right before the header-only release.

@cfis
Copy link
Collaborator

cfis commented Mar 12, 2021

Great. On tomoto - the segfaults on windows are with mingw64?

@ankane
Copy link
Contributor Author

ankane commented Mar 12, 2021

I believe so (setup-ruby uses RubyInstaller2 and the target is reported as x64-mingw32 - logs from latest run). It happens with about half the classes in the project that use inheritance.

@cfis
Copy link
Collaborator

cfis commented Mar 12, 2021

Which branch are you using?

  • rice-header-only-windows
  • rice-header-only-windows-slda
  • rice-header-only-windows-take2

@ankane
Copy link
Contributor Author

ankane commented Mar 12, 2021

rice-header-only is most recent.

@jasonroelofs
Copy link
Collaborator

@ankane If you've got some time, could you do one more round of testing? There's been some change to how to implement and use To_Ruby and From_Ruby as per #160.

@ankane
Copy link
Contributor Author

ankane commented Apr 25, 2021

Yeah, definitely. It looks like there are currently compilation errors with Ruby < 3 (error: use of undeclared identifier 'rb_uint2num_inline'). Also, Ruby 2.5 is now EOL, so you may consider dropping support for it in 4.0.

@jasonroelofs
Copy link
Collaborator

Good point re: Ruby 2.5, we can shut that one down.

@cfis
Copy link
Collaborator

cfis commented Apr 26, 2021

Hmm, I've been using Ruby 2.7 without issue. Will track down when rb_uint2num_inline came to be. I'm using that instead of the macro so that we call it using detail::protect in case an exception is raised.

@cfis
Copy link
Collaborator

cfis commented Apr 26, 2021

Its not Ruby 2.7/3.0 thing. Its a OS thing:

#if SIZEOF_INT < SIZEOF_LONG

Fix pushed.

@cfis
Copy link
Collaborator

cfis commented Apr 26, 2021

@ankane - I removed is_builtin. So if you want to define your own To_Ruby/From_Ruby functions you need to specialize detail::Type. You can see examples in std::string, std::complex, Hash, etc.

I will update the docs shortly.

@ankane
Copy link
Contributor Author

ankane commented Apr 26, 2021

It looks like there are CI failures on Ubuntu and Mac, so will hold off on testing until those are resolved.

@cfis
Copy link
Collaborator

cfis commented Apr 27, 2021

CI failures have been cleaned up - most of them were caused by MSVC++ counting index_sequences backwards versus GCC /Clang which process them forwards. I just changed the tests to avoid that compiler difference. The other failure is a bit random and caused by the Ruby garbage collector sometimes calling mark of an object it is going to free (usually it doesn't do that) - but its not a bug in Rice.

Anyway, definitely ready for testing and thanks for the help!

Also docs have been updated, although I'm not sure how @jasonroelofs wants to publish them (after they get built).

@ankane
Copy link
Contributor Author

ankane commented Apr 27, 2021

Great, things are mostly working. A few findings:

  1. Seeing type is not defined with Rice: Rice::Symbol with OR-Tools. Fixed after adding:
namespace Rice::detail
{
  template<>
  struct Type<Rice::Symbol>
  {
    static bool verify()
    {
      return true;
    }
  };
}
  1. Seeing Type Rice::Hash is not registered with Torch.rb for a method that returns a hash.

  2. Seeing runtime errors for constructors with default arguments with Faiss and DataSketches (it looks like nil is being passed when arguments are omitted instead of the default).

cfis added a commit that referenced this issue Apr 27, 2021
@cfis
Copy link
Collaborator

cfis commented Apr 27, 2021

Good catches. Pushed fixes for both issues.

I realized that I changed From_Ruby from a struct to a class (since you have to instantiate it and call a method on it), but I didn't change To_Ruby. I guess I can go either way on those (struct or class), but they should be the same.

@cfis
Copy link
Collaborator

cfis commented Apr 29, 2021

@ankane - For the default values, take a look at this documentation (particularly step 3 and then the default arguments section at the bottom):

https://github.com/jasonroelofs/rice/blob/master/doc/advanced/type_conversions.rst

If you add your own type conversions, its up to you to track the default value now. Here is how to update your code:

  template<>
  class From_Ruby<faiss::MetricType>
  {
  public:
    From_Ruby() = default;

    From_Ruby(Arg* arg) : arg_(arg)
    {
    }

    faiss::MetricType convert(VALUE x)
    {
      if (x == Qnil && this->arg_ && this->arg_->hasDefaultValue()) {
        return this->arg_->defaultValue();
      }

      auto s = Object(x).to_s().str();
      if (s == "inner_product") {
        return faiss::METRIC_INNER_PRODUCT;
      } else if (s == "l2") {
        return faiss::METRIC_L2;
      } else {
        // TODO throw argument error
        throw std::runtime_error("Invalid metric: " + s);
      }
    }

  private:
    Arg* arg_;
  };

The logic behind the change is described in #160. Notice From_Ruby is now a class, its instantiated, and you don't want to use static methods.

To_Ruby is also now a class and instantiated (for more info see the doc link above).

@ankane
Copy link
Contributor Author

ankane commented Apr 29, 2021

Hmm, the code gets rid of the error, but the default value isn't set correctly. Also, if I use the std::optional pattern from the docs, it doesn't detect the default value.

For DataSketches, I'm seeing the default argument issue with built-in types (uint8_t and uint64_t).

For Torch.rb, it's having trouble finding rice.hpp with Ruby 2.6. Relevant trace:

/home/runner/work/torch.rb/torch.rb/vendor/bundle/ruby/2.6.0/bundler/gems/rice-4ad34b5d691a/lib/mkmf-rice.rb:16: warning: already initialized constant MakeMakefile::CONFTEST_C
/opt/hostedtoolcache/Ruby/2.6.7/x64/lib/ruby/2.6.0/mkmf.rb:259: warning: previous definition of CONFTEST_C was here
checking for rice/rice.hpp in /home/runner/work/torch.rb/torch.rb/vendor/bundle/ruby/2.6.0/bundler/gems/rice-4ad34b5d691a/include... no
[more lines]
../../../../ext/torch/cuda.cpp:3:25: fatal error: rice/rice.hpp: No such file or directory

@cfis
Copy link
Collaborator

cfis commented Apr 29, 2021

I realized after writing the explanation above that Rice couldn't support the use case of using define_enum and setting a default enumerated value. And in fact, that was true for define_class and define_vector etc.

So sorry to change this again, but I realized a better way to do this that would work for non-copyable C++ types which is passing an Arg instance to From_Ruby and To_Ruby. I did consider that before, and didn't go down that route, but I realized that was a bad decision.

So pushed that change, updated the docs and the code sample above.

As for uint8_t and int8_t (unsigned char and signed char), those did not support default types. So added that in. Not sure about uint64_t, that should be fine (unless its a pointer to a uint64_t - Rice doesn't support default values for pointers although actually it could do so now with this change).

As for not finding rice.hpp...not sure haven't changed that bit in quite a while.

@ankane
Copy link
Contributor Author

ankane commented Apr 29, 2021

Great, DataSketches is now working.

For Faiss, I'm seeing:

/home/runner/work/faiss/faiss/ext/faiss/index.cpp:38:41: error: no matching function for call to ‘Rice::Arg::defaultValue()’
   38 |         return this->arg_->defaultValue();
      |                                         ^
In file included from /home/runner/work/faiss/faiss/ext/faiss/utils.h:3,
                 from /home/runner/work/faiss/faiss/ext/faiss/index.cpp:13:
/home/runner/work/faiss/faiss/vendor/bundle/ruby/3.0.0/bundler/gems/rice-737dd6873246/include/rice/rice.hpp:1194:20: note: candidate: ‘template<class Arg_Type> Arg_Type& Rice::Arg::defaultValue()’
 1194 |   inline Arg_Type& Arg::defaultValue()
      |                    ^~~
/home/runner/work/faiss/faiss/vendor/bundle/ruby/3.0.0/bundler/gems/rice-737dd6873246/include/rice/rice.hpp:1194:20: note:   template argument deduction/substitution failed:
/home/runner/work/faiss/faiss/ext/faiss/index.cpp:38:41: note:   couldn’t deduce template parameter ‘Arg_Type’
   38 |         return this->arg_->defaultValue();
      |                                         ^

It seems a bit odd that each type needs to handle default arguments, since they'll all use the same pattern.

Edit: Using this->arg_->defaultValue<faiss::MetricType>(); fixed it, but it's no longer detecting the default value.

@cfis
Copy link
Collaborator

cfis commented Apr 30, 2021

Yeah it a lot of annoying repeated boiler plate code. The reason is because From_Ruby needs to be specialized for each type, and well, that is just how C++ class templates work. You could probably get rid of the extra constructor that takes an Arg using inheritance, but that isn't much of a win. Pybind has the same issue and uses a macro to avoid a lot of the boilerplate code, but I find macros hard to debug.

My thinking is users generally should not have to define custom versions of From_Ruby and To_Ruby. If you read the pybind11 docs they say "In very rare cases" should you be making custom conversions. I know you use it to avoid defining enums, but I'm hoping most people just use define_enum instead (fyi it occurred to me that Rice could auto define enum types like it does for std::vector which would solve your issue - you wouldn't get individual values in Ruby but if you don't care about that doesn't matter).

One obvious place you needed to have customer converters is for copying a Ruby Array to a std::vector, but that is now supported out of the box. Another case is dealing with Hashes - that should get resolved by supporting std::map.

As for calling defaultValue, yes you need to include the type defaultValue<faiss::MetricType>. Which is a good point, I updated the docs to show that.

The reason you aren't getting default values is because of this:

(Rice::Arg("quantizer"), Rice::Arg("d"), Rice::Arg("nlist"), Rice::Arg("metric") = faiss::METRIC_L2));

The Rice::Args are enclosed in a parentheses and separated by the comma operator, which means that only the last Arg is passed to Rice so it puts it in position 0 instead of 3. To fix remove the parentheses :

Rice::Arg("quantizer"), Rice::Arg("d"), Rice::Arg("nlist"), Rice::Arg("metric") = faiss::METRIC_L2);

This is the old style way of supporting 0 or more arguments - its been replaced by template parameter packs. I decided to remove the old support to simplify the code base (might as well get everything done at once), but I need to document that in the migration guide.

FYI - MSVC doesn't compile the Faiss bindings because it does not like the deferencing of lambdas, ie *[] doesn't work but [] works fine. See set_index_parameter and nprobe=

@ankane
Copy link
Contributor Author

ankane commented Apr 30, 2021

Great, removing the parentheses fixed it (have a feeling that may bite some users - not sure if there's a way to have it error or warn in that scenario).

Last outstanding issue is Torch.rb - still digging into it.

@ankane
Copy link
Contributor Author

ankane commented Apr 30, 2021

It looks like an issue with Ruby 2.6 and Ubuntu 16.04 (seeing the same issue finding rice.hpp with DataSketches and that combination).

Edit: On a related note, it'd probably be good to raise an error if have_header or have_library fails here: https://github.com/jasonroelofs/rice/blob/c91fce42ad1282a724cc6070de352ff522a847bc/lib/mkmf-rice.rb#L119-L125

@cfis
Copy link
Collaborator

cfis commented Apr 30, 2021

I can see about raising an error with the comma operator. And yeah, an error on the missing header is a good idea.

Thanks for digging into Ubuntu 16.

@ankane
Copy link
Contributor Author

ankane commented Apr 30, 2021

Looks like that's the behavior when the compiler doesn't support C++17. Can probably just mention that in the error message if find_header('rice/rice.hpp') returns false.

@ankane
Copy link
Contributor Author

ankane commented Apr 30, 2021

Also, a few of the examples in the conversion docs need public:.

@cfis
Copy link
Collaborator

cfis commented May 1, 2021

Ok, made those changes.

Anything else? Is Tomoto still not working?

@ankane
Copy link
Contributor Author

ankane commented May 1, 2021

Great, thanks. Yeah, still having issues with Tomoto on Windows, but not sure it should hold up the release.

@cfis
Copy link
Collaborator

cfis commented May 2, 2021

Tomoto crashes on windows have nothing to do with Rice. They are caused by Ruby because in, for reason I don't understand, replaces c library functions with its own. In this case, flcose get mapped to rb_w32_fclose.

https://twitter.com/lefticus/status/1247885279639166979

If you want tomoto to work on Windows, you can't link in the tomotopy source code like this. Instead you need to build it into its own shared library (dll) and then use that dll at runtime. Maybe building a shared lib (.lib) file would work, like faiss does, but I guess it would not.

Feel free to complain to the ruby developers, this way of doing things really doesn't make sense (at least to me).

@ankane
Copy link
Contributor Author

ankane commented May 2, 2021

Is the reason it works with Rice 3 that Rice builds a separate library?

@jasonroelofs
Copy link
Collaborator

That is a weird issue, and I was able to track down more details of lefticus' statement: NREL/OpenStudio#3942

I would hazard a guess that Rice 3 works and hasn't shown this bug is because Rice 3 for Windows was mainly processed via mingw, which probably doesn't trigger the same linking fix-up as MSVC. But that's a pure guess at this point.

@cfis
Copy link
Collaborator

cfis commented May 3, 2021

Yes, that's a good explanation by lefticus (NREL/OpenStudio#3942 (comment))

It is a Windows thing, not an msvc vs mingw thing. Note the tomoto CI runs are segfaulting on mingw, and I see the same thing with MSVC. I haven't done an exhaustive search, but here is another link:

https://bugs.ruby-lang.org/issues/8569

Not sure why Rice 3 doesn't trigger it - I'd agree its likely due to building a separate library. Would have to debug the code to verify.

Anyway, I consider this a Ruby bug not a Rice bug.

@jasonroelofs
Copy link
Collaborator

I'm prepping the release of Rice 4, so it's probably time to close this issue out. @ankane thank you so much for all of your help testing these changes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants