Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libcudf needs a code style guide #479

Closed
jrhemstad opened this issue Dec 8, 2018 · 17 comments
Closed

libcudf needs a code style guide #479

jrhemstad opened this issue Dec 8, 2018 · 17 comments
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code

Comments

@jrhemstad
Copy link
Contributor

jrhemstad commented Dec 8, 2018

Feature request

In order to provide a high quality and consistent code base, there should be a code style guide created and enforced.

It is non-trivial to create a quality code style guide from scratch, therefore I suggest we should use the Google C++ style guide as a starting point and modify as necessary (i.e., move from C++11 to C++14 and add rules for CUDA code).

https://google.github.io/styleguide/cppguide.html

The benefit of the Google C++ style guide is that is already well established and used throughout industry and other open source projects.

Furthermore, there is already widespread linter support for ensuring conformance with the style guide, e.g., cpplint and clang-tidy.

Likewise, the ISO CPP Core Guidelines would be an equally good starting point for a libcudf code style guide: https://github.com/isocpp/CppCoreGuidelines

It also has the advantage of being natively supported by clang-tidy.

@jrhemstad jrhemstad added feature request New feature or request help wanted proposal Change current process or code labels Dec 8, 2018
@jrhemstad
Copy link
Contributor Author

jrhemstad commented Dec 8, 2018

Curious about opinions from: @mt-jones @harrism @eyalroz @williamBlazing @BradReesWork

@BradReesWork
Copy link
Member

We also need to discuss whether each sub-module a Class, Name space, or just individual functions (current mode)

@nsakharnykh
Copy link
Collaborator

Here is another popular styling guide https://github.com/isocpp/CppCoreGuidelines

@jrhemstad
Copy link
Contributor Author

Here is another popular styling guide https://github.com/isocpp/CppCoreGuidelines

Indeed, I would be equally happy using the ISO CPP guidelines. It has the same benefit of already being supported by clang-tidy.

@eyalroz
Copy link
Collaborator

eyalroz commented Dec 10, 2018

So, several things:

  1. I support having a coding standard in (the C++ part of) cuDF.
  2. I don't think a coding standard should start out very extensive. Let's add aspects/subject to the standard one at a time, or several at a time, but not all at once; and I'm not sure it should ever get to be very extensive or very long.
  3. The C++ Core guidelines are not quite a coding standard:
    • They don't standardize the things you see most often in coding standards, e.g. naming schemes, use of uppercase/lowercase, and such.
    • They are a huge document: over 116,000 words, a decent novel length.
    • Most of the guidelines express the intent of the language and library designers and are meant to help us avoid pitfalls more than to code uniformly.
      with that being said, I would be supportive of adopting the core guidelines. However note that our code is horribly and terribly breaking the guidelines, and rectifying that would require a deep and all-encompassing overhaul. It would not be just cosmetic or localized. Also, before committing to such support, any one of us should spend some time reading enough of the guidelines to be certain they know what they're signing up for.
  4. There should be an inclusive process of standardization, i.e. of adding requirements/rules to the coding standard, with an opportunity for people both to articulate positions and to express concerns/disapproval. I would tend to oppose a top-imposed coding standard - as well as adopting an entire externally-written style guide in one fell swoop (or at all).
    More specifically, we should allow for the conclusion that a proposed subject or aspect of programming does not, after all, have to be standardized strictly (or at all).
  5. We should bear in mind that several well-known coding standard document were written to accommodate specific circumstances of the organizations which adopted them, which often included inconvenient constraints we may not have; and most of them were written before C++11 was widely adopted.
    Specifically, We must absolutely not use the the Google C++ Style guide. See here and here.
  6. Our C-facing interface will breaking most reasonable C++ coding standard. Which suggests we should have an underlying purely-C++'ish interface and C language bindings that are clearly separate.
  7. Once some coding style rule is set down, let's not start enforcing it immediately, but rather, begin with encouragement and later proceed to enforcement; and allow for justified exceptions.

Finally, here are a couple of relevant videos about guidelines and conventions:

@jrhemstad
Copy link
Contributor Author

jrhemstad commented Dec 10, 2018

@eyalroz thanks for the detailed feedback.

  1. Great!
  2. How would you recommend starting then? I offered the Google C++ guide or the ISO CPP guidelines as a starting point, not something to be adopted wholesale.
  3. Indeed, I do not suggest we wholesale adopt the ISO CPP guidelines, rather use it as a starting point. We would add, remove, and simplify as necessary for libcudf development.
  4. The purpose of this issue is precisely to facilitate the community discussion for a coding standard, so thank you for contributing! I do not expect that a style guide will be top-imposed without input from the community of developers working on libcudf.
  5. If there are parts of the Google C++ guide we wanted to remove/change, that is well within what I originally proposed. Both of your linked articles are several years old and reference items that we can easily remove or adapt. I think it is heavy handed to rule it out entirely.
  6. Obviously the C interface will have to be accommodated. Likewise, most CUDA kernel code would likely break many C++ style guides and will need to be accommodated. Like you said, a style guide is written to accommodate our specific circumstances and needs. Furthermore, as more of our interface moves to using Cython, the C interface may go away entirely.
  7. Yes, I suspect that we will adopt a policy of requiring new PRs to adhere to whatever standard we adopt. I'd push it further and require any PR to update whatever files it touches to bring them into conformance with the standard. In this way, the entire codebase should be brought into conformance over time.

A few key points come to mind:

  • Creating a good style guide completely from scratch is a lot of work. We need to start from somewhere.
  • An imperfect style guide is better than nothing at all
  • We need something we can use sooner than later. We don't have the time or resources to spend many weeks or months coming up with a style guide. That said, I expect the style guide will be something that evolves over the months of development.

A possible way forward that balances both community input and speed of delivery is to appoint 2-3 motivated individuals from across the active contributors to work together to lay the foundation of the libcudf style guide.

Once those individuals have come up with the 80% solution, we can open it up for community input.

@eyalroz
Copy link
Collaborator

eyalroz commented Dec 10, 2018

If there are parts of the Google C++ guide we wanted to remove/change, that is well within what I originally proposed.

No no, I'm absolutely against that. If there is something specific in there that you find useful, I would consider it, but certainly not defaulting to accepting that style guide. Just as an example: You would not have been able to write the type_dispatcher using their style guide, because it says (in its current version): "Code should not use C++14 or C++17 features". And this illustrates how the goals of that style guide are incompatible with ours.

Now, I'm not saying I reject all rules in there. Some I would adopt, some I would be neutral about, some I would reject. But again, not a default-adopt on that thing.

Also, I believe, but am not certain, that the Google style guide is at least to some extent in disagreement with the C++ Core Guidelines. Don't take my word for it though, I'd need to do more reading to check.

Likewise, most CUDA kernel code would likely break many C++ style guides and will need to be accommodated.

I'm not at all sure this would be the case. Are you saying this because of the lack of C++14 in CUDA 9.x? Or the partial usability of the standard library? I think we could manage not to have to accommodate much, if at all.

Creating a good style guide completely from scratch is a lot of work.

  1. Only if we aim to create a comprehensive, complete, guide. But why should we do that?
  2. It will be a lot of work to debate the dozens or hundreds rules in an existing style guide.

We need to start from somewhere.

We start from the here and now, which is code of varying quality and little consistency, and no rules. I disagree that the first step should be adopting a complete rule-set.

Here's how I imagine this proceeding:

  1. Collect the cudf's contributors complaints/rants regarding coding style issues which bother them - aesthetically and functionally.
  2. (Fairly) group and form that input into issues/points to be standardized. Publish these.
  3. Allow contributors (individuals or groups) to phrase specific rules for each of the issues; "do not standardize" is also an option.
  4. Allow rebuttals of proposed rules.
  5. Perform a good-faith ordering of the issues for discussion
  6. Conduct a discussion on Slack - possibly even with a voice chat - where proponents of rules argue for them, and opponents argue against.
  7. Adopt rules on which a consensus (up to abstainers) is reached. This step assumes good faith, self-discipline, and respectfulness to be exhibited by the participants.

@thomcom
Copy link
Contributor

thomcom commented Dec 11, 2018

I am in favor of a minimal style guide that grows as it meets the needs of our specific team. As @BradReesWork mentions, there's a need to identify how sub modules should be scoped. Instead of adopting an entire style guide, perhaps we could start with a GH issue addressing that specific challenge?

The goal of a style guide is to reduce development time and cost. Exactly how much cost can be saved I think is super complicated. The cost of adopting Google's C++ guide and enforcing it is high - regular debates over whether or not the style is matched, time spent studying it, efforts put into backporting our existing code to match the style, and more.

@eyalroz's list above also has a fairly high cost - 7 steps that will take considerable time across the team to negotiate and finally settle down on. Style guides can also have a surprisingly amount of social cost as developers passionately argue. I prefer a slow approach to this challenge. :)

@eyalroz
Copy link
Collaborator

eyalroz commented Dec 12, 2018

@thomcom : The thing is, you want a bit of an involved process before preventing people from programming a certain way... but if we wanted something faster, we could have a shorter process for suggested rules with no explicit objections. Something like:

  • Allow contributors (individuals or groups) to phrase specific rules they suggest we adopt (e.g. on a Wiki page for "Style guide non-biding rule suggestions".
  • Any contributor can veto a suggested rule with a short objecting argument (or link to an objection), in which case the rule may be ignored (but not deleted from the page). Discussions for and against rules will not be held on the Wiki page itself (Slack or the Wiki page's discussion page, if it has one, can be used).
    *Any contributor can express support of a rule; this would only be an informal indication of its active public support. (But be aware that sometimes, it is more obscure, or less convenient rules that have the positive impact).
  • Contributors should be actively informed/reminded of the valid rules and the wiki page.

@vuule
Copy link
Contributor

vuule commented Apr 10, 2019

Here is the draft guide I wrote for cuIO. It's meant to be easily digestible and to highlight the most common issues. For more detailed rules and justifications, the Google style guide is available.

General structure

  • Use #pragma once or a #define guard in each header.
  • Use structs when the contained data can take any value (POD). Use classes when the data has an invariant, and enforce it using encapsulation.
  • Place variables in the narrowest scope possible, and initialize in the declaration. If the variable does not have a good default value, used empty braced initialization to initialize it to zeroes (works for basic types and aggregates).
  • Prefer using return values rather than output parameters. Use pointers for output and input/output parameters.
  • Function parameters should be ordered as follows: input, input/output, output.
  • Prefer short functions that perform a single task. Functions should be small enough to fit in the screen.
    Pass optional input parameters as a pointer to const. Required input parameters should be passed as a const reference, unless they have a primitive type. In that case, pass by value.

Use of language features

  • Prefer C++-style casts to C-style casts to make the cast more visible.
  • Use prefix form of the increment and decrement operators with iterators and other template objects.
  • Use const for variables whenever applicable. Use constexpr for compile-time constants.
  • Use the now-standardized sized integer types, such as uint32_t. Use int when the size is not relevant.
  • Avoid the use of macros to define constants and inline functions. Instead, declare constexpr (typed) variables. Leave the inlining to the compiler, if possible.
  • Use nullptr instead of NULL or 0. The nullptr literal has a pointer type, while NULL is an integer. This can lead to issues in templated code.
  • Use auto when the variable type is obvious from the context, or unimportant.
  • Avoid using goto. RAII can be used to implement a more robust "cleanup on exit" code.

Best practices

  • Code for readability and maintainability first, performance second.
  • Prefer range loops and standard algorithms to hand-crafted loops, where possible
  • Use RAII for resource management.
  • You can use exceptions, but use them only to signal really "exceptional" states, not for common errors.
  • Use the type system to make your APIs easier to use, and to detect errors during compilation, rather than at runtime. Grouping parameters into named structs and using aliases for scalar types makes the call site more readable and less error-prone.
  • Prefer explicit resource ownership. Smart pointers and standard containers can be used for this purpose, in most cases.

Naming/formatting

  • Use Google-style naming https://google.github.io/styleguide/cppguide.html#Naming
  • Use comments to explain why the code is doing what it does, not to explain how.
  • Prefer line comments to block comments. Placed the comment above the code it refers to (Doxygen comments are an exception).
  • Each line should be at most 120 characters long.

For everything that is not explicitly specified here, follow the Google C++ style guide. (https://google.github.io/styleguide/cppguide.html).

@eyalroz
Copy link
Collaborator

eyalroz commented Apr 10, 2019

A lot/most of these items are fine by me, but:

  1. I'm not sure we should just import a list of rules from another project without them having specific motivation in issues that have come up in libcudf. For example: "avoid using goto". Definitely a principle to live by, but - has anyone ever tried to use goto in libcudf? On the other hand, the coding-style issues most glaring to my eyes are not covered here. See also my comment from December 12th 2018.
  2. I again want to stress that the Google Style Guide is quite unacceptable as a basis or a default. It has numerous rules which are widely disparaged, such as avoiding forward declarations or avoiding significant work in constructors. And - a recent policy decision of ours, to use exceptions, contradicts one of those GSG dicta: GSG says not to use exceptions.

(I'm not commenting item-by-item here of course.)

Also, @vuule : Are you aware of the C++ Core Guidlines project?

@vuule
Copy link
Contributor

vuule commented Apr 10, 2019

has anyone ever tried to use goto in libcudf?

Yes. Luckily, it looks like we are now goto-free. If the consensus is that some of these rules are irrelevant, we can remove them. These are based on the code I've seen, and that is still a small part of cuDF.

As you pint out, the list is incomplete (e.g. doing work in constructors). We can always add more exception to the list. The current list is what I came up with in a short period, and a round of reviews would certainly improve it a lot.

I made the list to contain exceptions from GSG, rather than a full list of rules, because I expect that there would be (at least) an order of magnitude fewer exception than rules. Most exceptions would be related to exceptions, as you mentioned.

I am aware of the Core guidelines. I believe some of my items are inspired by those rules. I did not take them too much into account, as they are still incomplete and not that widely adopted. Also, I found that some of the rules there are hard to adhere to without using their https://github.com/Microsoft/GSL.

@eyalroz
Copy link
Collaborator

eyalroz commented Apr 11, 2019

@vuule : I wasn't suggesting we use the C++ Core Guidelines as a style guide, just as an inspiration.

I don't accept your premise: You're assuming we've silently accepted the GSG as a basis for libcudf's style, possibly with exceptions. But - we haven't. If you believe we need to accept the GSG, you need to argue for it as a whole and convince us we should accept it as a default. and I would be against that for three reasons:

  1. I don't support adopting a large guide with many rules many/most of us haven't read (same goes for the core guidelines).
  2. I believe that, at least for the time being, we should only adopt rules based on stylistic issues which have come up. (These rules could come from the GSG of course.)
  3. , The GSG is not widely accepted as a good choice but is contentious,, if not discredited.

@vuule
Copy link
Contributor

vuule commented Apr 11, 2019

I have written this guide primarily for cuIO, and we have a clear consensus that GSG is a good base style for us. If stakeholders of the libcudf as a whole are against using GSG, my proposal is not relevant in this discussion.

@eyalroz
Copy link
Collaborator

eyalroz commented Apr 11, 2019

@vuule, you're suggesting that a consensus would be needed to reject GSG as a basis for a style guide. It is actually the opposite - a style guide would require a consensus, or overwhelming majority support, to adopt.

I'm sorry for being somewhat argumentative here, but I'm quite upset with what's happening here, process-wise.

@harrism
Copy link
Member

harrism commented Apr 12, 2019

@eyalroz I don't think that's what @vuule is suggesting at all. cuIO is a team that is a major contributor to cuDF. @vuule simply said that his team has consensus.

Since the cuIO team was discussing style guides, I asked if they would propose something for cuDF. Nothing has been adopted yet.

@harrism harrism added the libcudf Affects libcudf (C++/CUDA) code. label Jun 16, 2020
@harrism
Copy link
Member

harrism commented Jun 16, 2020

Closing this. #5273 is more relevant today.

@harrism harrism closed this as completed Jun 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code
Projects
None yet
Development

No branches or pull requests

7 participants