Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SWIG methods that return char** #2850

Merged
merged 7 commits into from
Mar 20, 2020

Conversation

AlbertoEAF
Copy link
Contributor

@AlbertoEAF AlbertoEAF commented Mar 1, 2020

The main reason to add this pull request is outlined in #2814.

After discussing with @imatiach-msft and the devs at SWIG (swig/swig#1740),
the devised solution aiming at simplicity and correctness was to add a StringArray to wrap any char** parameters so memory can be managed.
This relies on the already existing various.i extensions to return a String[] through a single call.

@imatiach-msft also requested that LGBM_BoosterGetEvalNamesSWIG be refactored to use the same type of API. This meant changing that method signature, and I did so by following the LGBM_* signature of returning the error code as an int 0/-1.

I'm very interested in getting feedback from the team, specially @imatiach-msft with whom I've been discussing this topic. If possible I'd like to know your thoughts on these questions first: #2814.

I also believe this should be improved upon before merging. I left some "@Reviewer" notes throughout the code. If you can read those and discuss with me I'd appreciate it, as I believe those points should be addressed to get the code style more in line with the rest of the code in the repo, and to streamline the code as well.
Also, can we find a way to add the docstrings in the interface files to the documentation?

You can now get hold of String[] by just calling swigStringArrayHandle.data().Working example:

    public String[] getBoosterFeatureNames() {

        final int MAX_FEATURE_NAME_SIZE = 512; // Will segfault if a string has a name larger than this.

        long swigStringArrayHandleOutPtr = lightgbmlibJNI.voidpp_handle();
        long swigStringArrayHandle;

        try {
            int retcodeLGBM = lightgbmlibJNI.LGBM_BoosterGetFeatureNamesSWIG(
                    swig.swigBoosterHandle,
                    MAX_FEATURE_NAME_SIZE,
                    swigStringArrayHandleOutPtr // Returns a newly allocated FeatureNames handle
            );
            if (retcodeLGBM == -1) {
                logger.error("Couldn't read feature names");
                throw new LightGBMException();
            }
            swigStringArrayHandle = lightgbmlibJNI.voidpp_value(swigStringArrayHandleOutPtr); // Get the handle
        }
        finally {
            lightgbmlibJNI.delete_voidpp(swigStringArrayHandleOutPtr);
        }

        String[] featureNames = lightgbmlibJNI.StringArrayHandle_get_strings(swigStringArrayHandle);

        for(int i = 0; i < featureNames.length; ++i) {
            logger.debug("LightGBM model feature " + i + ": " + featureNames[i]);
        }
        logger.trace("Deallocating feature names.");
        lightgbmlibJNI.StringArrayHandle_free(swigStringArrayHandle);

        return featureNames;
    }

Thank you!

By the way, here is the changelog:

  • Add StringArray class to manage and manipulate arrays of fixed-length strings.

    This class is now used to wrap any char** parameters, manage memory and
    manipulate the strings.

  • Fix SWIG LGBM_BoosterGetFeatureNames call that would result in a segfault.

    Added wrapper LGBM_BoosterGetFeatureNamesSWIG (the method it wraps didn't work)

  • For consistency, LGBM_BoosterGetEvalNames was wrapped as well.

    [breaking api change]
    Re-factored wrapper LGBM_BoosterGetEvalNames to follow LGBM_*
    convention of 0/-1 in case of success/failure.

@msftclas
Copy link

msftclas commented Mar 1, 2020

CLA assistant check
All CLA requirements met.

@guolinke guolinke mentioned this pull request Mar 2, 2020
@microsoft microsoft deleted a comment from todo bot Mar 2, 2020
@microsoft microsoft deleted a comment from todo bot Mar 2, 2020
@microsoft microsoft deleted a comment from todo bot Mar 2, 2020
@microsoft microsoft deleted a comment from todo bot Mar 2, 2020
@microsoft microsoft deleted a comment from todo bot Mar 2, 2020
@microsoft microsoft deleted a comment from todo bot Mar 2, 2020
@guolinke guolinke added the fix label Mar 2, 2020
@StrikerRUS
Copy link
Collaborator

@AlbertoEAF Sorry for the inconvenience, but can you please rebase to the latest master to fix CI checks and sign the CLA?

@AlbertoEAF
Copy link
Contributor Author

Hello @StrikerRUS,

I'd very much like to, but I'm waiting for approval from my company to sign it and
I don't know yet when they'll decide, but when they do I'll let you know.

@AlbertoEAF
Copy link
Contributor Author

AlbertoEAF commented Mar 4, 2020

@StrikerRUS done :)

Regarding the code, StringArray.hpp is duplicated inside the interface file char_array_helpers.i (and it should be placed outside only). I'd like to get it in a separate file, maybe in src+include or as a separate .i file which I couldn't manage to do directly as I would have to modify the CMake procedure to add more .cpp files I guess?

Also please don't merge directly, and give me feedback first on the comments that say "@Reviewer".

Also, char_array_helpers.i should also likely be renamed to StringArray_api_extensions.i or something.

@AlbertoEAF AlbertoEAF force-pushed the feat/swig-string-arrays branch 2 times, most recently from 8954897 to cd510d2 Compare March 4, 2020 23:10
@AlbertoEAF
Copy link
Contributor Author

AlbertoEAF commented Mar 8, 2020

Hello, I changed the API to be simpler to use and removed the choice of string allocation size from the user, to be impossible to result in an irrecoverable segmentation fault in case the user decides to allocate too small strings when asking for the feature names.

I changed the 2 wrappers to be closer to what @imatiach-msft had initially for LGBM_BoosterGetEvalNames, by using the return value to return the array of strings, but wrapped in a StringArrayHandle, which can then be manipulated and deallocated on the java side by using StringArrayHandle_free().

The new usage is much simpler now than it was on the first submission (nullptr is returned in case of any error). Example:

public String[] getBoosterFeatureNames() {

    long swigStringArrayHandle = lightgbmlibJNI.LGBM_BoosterGetFeatureNamesSWIG2(swig.swigBoosterHandle);
    if (swigStringArrayHandle == 0) {
        logger.error("Couldn't read feature names");
        throw new LightGBMException();
    }

    String[] featureNames = lightgbmlibJNI.StringArrayHandle_get_strings(swigStringArrayHandle);
    logger.debug("LightGBM model features: {}.", Arrays.toString(featureNames));

    logger.trace("Deallocating feature names.");
    lightgbmlibJNI.StringArrayHandle_free(swigStringArrayHandle);

    return featureNames;
}

Could someone tell me how to include the swig/StringArray.hpp file and remove its copy-pasted code from swig/StringArray_API_extensions.i?

Also, can someone review? All inputs are appreciated :)

Thanks!

@AlbertoEAF AlbertoEAF changed the title Fix SWIG methods that return char**. Pending review: Fix SWIG methods that return char**. Mar 8, 2020
@imatiach-msft
Copy link
Contributor

"Could someone tell me how to include the swig/StringArray.hpp file and remove its copy-pasted code from swig/StringArray_API_extensions.i?"

I would think that in swig/lightgbmlib.i you would be able to include StringArray.hpp using just:

 #include "./StringArray.hpp" 

similar to how the export.h file is included in swig/lightgbmlib.i:

#include "../include/LightGBM/export.h"

please let me know if I misunderstood something!

@AlbertoEAF
Copy link
Contributor Author

AlbertoEAF commented Mar 14, 2020

Hello @guolinke, @imatiach-msft and all to whom this may concern,

Changelog

  1. The first commit strived to avoid impacting any other API's and bring all needed functionality for char** to Java's SWIG wrappers and defines their final form (discussed with @imatiach-msft).
  2. To follow @guolinke's request, I had to change the internals of the C API (and had to change accordingly all impacted wrappers in Java, R and Python) so correctness is mantained across the whole codebase.

C API changes (safety)
The changed C API functions no longer result in undefined behaviour if the received storage space is not enough both in number of strings and their size. The changed functions:

  • receive both the number of strings allocated and their allocated size,
  • output the actual number of required strings and string sizes required,
  • no longer need any other previous function calls to figure out number and size of strings,
  • if the received storage space they receive is insufficient they copy all possible content that fits the received memory (both in number of strings and their capacity, handling proper null-termination in all cases, even when strings are too short).

New API usage

Since this functions are not in hot-code regions, in the Java SWIG wrappers I chose simply to pass a null pointer and 0-sizes for both number of strings and their size, just to allocate memory accordingly and then repeat the call. I could have instead allocated memory with an assumed size, check and if needed repeat the call with enough storage space.
Note that with the new API one doesn't need any other methods, by two consecutive calls it is possible to compute allocation space needed and then retrieve the data. Example from the SWIG wrappers:

int num_features;
size_t max_feature_name_size;
std::unique_ptr<StringArray> strings(nullptr);

// Retrieve required allocation space:
API_OK_OR_NULL(LGBM_BoosterGetFeatureNames(handle,
                                            0, &num_features,
                                            0, &max_feature_name_size,
                                            nullptr));

strings.reset(new StringArray(num_features, max_feature_name_size));

API_OK_OR_NULL(LGBM_BoosterGetFeatureNames(handle,
                                            num_features, &num_features,
                                            max_feature_name_size, &max_feature_name_size,
                                            strings->data()));

return strings.release();

R and Python wrappers changes

  • To minimize code changes, although the new SWIG wrappers support allocating the correct strings size, I didn't add support for resizing in the R and Python wrappers for they already assumed a fixed size of 128 and 255 characters respectively.
  • Added assert for the string size (which couldn't be computed before).
  • Instead of the assert, a warning could be issued instead, for with the new C API implementation, even if you don't allocate enough size in the strings, the strings will have the first N-1 characters initialized and the last character set with the null terminator instead of crashing.

If you want, a warning instead of an error could be triggered instead, or proper sizing of the content like in the SWIG wrappers without assumptions.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The R-specific changes look ok to me! I looked through the rest of the PR and didn't have any other comments, but I don't know enough about c_api or the SWIG stuff for my approval to count towards a merge on this.

Thanks for the contribution!

src/lightgbm_R.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for Python part!

@AlbertoEAF
Copy link
Contributor Author

Hello @guolinke and @StrikerRUS :),

Do you know if anyone else needs to approve? I'm not sure how the procedure goes now.
Should all the other 4 code owners review as well? Or is it good?

Thanks!

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlbertoEAF
Please consider fixing some style and linting issues below:

include/LightGBM/c_api.h Outdated Show resolved Hide resolved
include/LightGBM/c_api.h Outdated Show resolved Hide resolved
include/LightGBM/c_api.h Outdated Show resolved Hide resolved
include/LightGBM/c_api.h Outdated Show resolved Hide resolved
src/c_api.cpp Outdated Show resolved Hide resolved
swig/StringArray.i Show resolved Hide resolved
swig/StringArray_API_extensions.i Show resolved Hide resolved
src/lightgbm_R.cpp Outdated Show resolved Hide resolved
src/c_api.cpp Outdated Show resolved Hide resolved
src/c_api.cpp Outdated Show resolved Hide resolved
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
@AlbertoEAF
Copy link
Contributor Author

Ok @StrikerRUS :)

Anything left? Thanks!

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlbertoEAF Thank you very much! We really appreciate all your efforts!

Only one new cpplint issue has raised after my previous review.

Approving this PR right now to merge it ASAP after resolving and green CI.

python-package/lightgbm/basic.py Outdated Show resolved Hide resolved
src/lightgbm_R.cpp Outdated Show resolved Hide resolved
AlbertoEAF and others added 2 commits March 19, 2020 15:58
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
@StrikerRUS StrikerRUS changed the title Pending review: Fix SWIG methods that return char**. Fix SWIG methods that return char** Mar 20, 2020
@StrikerRUS StrikerRUS merged commit 91185c3 into microsoft:master Mar 20, 2020
@AlbertoEAF AlbertoEAF deleted the feat/swig-string-arrays branch March 20, 2020 01:43
@StrikerRUS StrikerRUS added breaking and removed fix labels Mar 20, 2020
@imatiach-msft
Copy link
Contributor

imatiach-msft commented Mar 30, 2020

@AlbertoEAF I tried to update to latest code today but I am getting a segfault when calling the new LGBM_BoosterGetEvalNamesSWIG API, any idea what might be going wrong? Will take a deeper look over this week.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f51b93942fd, pid=13026, tid=0x00007f51c0f61700
#
# JRE version: OpenJDK Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
# Java VM: OpenJDK 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [lib_lightgbm_swig.so+0x82fd]  LGBM_BoosterGetEvalNamesSWIG(void*)+0x2d

This is the new scala code I am using, but it doesn't seem to get past just the LGBM_BoosterGetEvalNamesSWIG call:

  def getEvalNames(boosterPtr: Option[SWIGTYPE_p_void]): Array[String] = {
    // Need to keep track of best scores for each metric, see callback.py in lightgbm for reference
    // For debugging, can get metric names
    val stringArrayHandle = lightgbmlib.LGBM_BoosterGetEvalNamesSWIG(boosterPtr.get)
    // LightGBMUtils.validateArray(stringArrayHandle, "Booster Get Eval Names")
    val evalNames = lightgbmlib.StringArrayHandle_get_strings(stringArrayHandle)
    lightgbmlib.StringArrayHandle_free(stringArrayHandle)
    evalNames
  }

@AlbertoEAF
Copy link
Contributor Author

Hello @imatiach-msft , just to be sure, what you did in #2958 fixes it right?

It makes sense to me that it would.

@imatiach-msft
Copy link
Contributor

@AlbertoEAF yes, thanks!

std::memcpy(out_strs[idx], name.c_str(), name.size() + 1);
if (idx < len) {
std::memcpy(out_strs[idx], name.c_str(), std::min(name.size() + 1, buffer_len));
out_strs[idx][buffer_len - 1] = '\0';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here sometimes will have memory problems, maybe it is better for

          auto cur_len = std::min(name.size(), buffer_len - 1);
          std::memcpy(out_strs[idx], name.c_str(), cur_len);
          out_strs[idx][cur_len] = '\0';

btw, is idx < len needed?

ping @AlbertoEAF @StrikerRUS

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe it is also a cause of #3398?..

Copy link
Contributor Author

@AlbertoEAF AlbertoEAF Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the change, it does the same but it's easier to read @guolinke and sets the null byte only once when the string has the limit size.

Yes, the idx < len check is to ensure if you have allocated a char** that only has space for len string pointers, you don't write outside it, even if internally we have a bigger array that would require more space. We stop the copy before writing outside allocated memory to avoid segmentation faults.

Copy link
Contributor Author

@AlbertoEAF AlbertoEAF Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S.: In other words, segfaults/memory write violations can only occur if the function receives wrong len and buffer_len arguments from the caller. I.e., the user/caller actually pre-allocated smaller/less "strings" in char **out_strs than that the values he passed in buffer_len/len respectively.

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants