Stdlib Development Guidelines #16

lars-reimann · 2022-10-26T10:20:56Z

This document describes general guidelines for our user-friendly data science API. In the DO/DON'T examples below we either show client code to describe the code users should/shouldn't have to write, or library code to describe the code we, as library developers, need to write to achieve readable client code. We'll continuously update this document as we find new categories of usability issues.

Prefer named functions over overloaded operators

Named functions convey the intention of the programmer better than operators do, and they enable better auto-completion.

✔️ DO (client code):

table.keep_columns("name", "age")

❌ DON'T (client code):

table[["name", "age"]]

Related issues:

lars-reimann/ast22.2-pipelines#20
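On the library side, the guideline amounts to exposing a named method instead of giving `__getitem__` a special meaning for lists of column names. The following is a minimal sketch that assumes a hypothetical `Table` backed by a dict of columns. Here `keep_columns` returns a new table and leaves the original unchanged.

```python
from __future__ import annotations


class Table:
    """Hypothetical minimal table: maps column names to lists of values."""

    def __init__(self, columns: dict[str, list]) -> None:
        self._columns = columns

    @property
    def column_names(self) -> list[str]:
        return list(self._columns)

    def keep_columns(self, *column_names: str) -> Table:
        """Return a new table with only the given columns; this table is left unchanged."""
        unknown = [name for name in column_names if name not in self._columns]
        if unknown:
            raise KeyError(f"Unknown column names: {unknown}")
        # Copy each kept column so the new table does not share lists with this one.
        return Table({name: list(self._columns[name]) for name in column_names})
```

Client code then reads `table.keep_columns("name", "age")`, and an IDE can suggest `keep_columns` after typing `table.`.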

Prefer methods over global functions

This aids discoverability and again enables better auto-completion. It also supports polymorphism.

✔️ DO (client code):

model.fit(training_data)

❌ DON'T (client code):

fit(model, training_data)
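The polymorphism argument can be illustrated with a small sketch in which every model exposes the same `fit` method, so client code never has to pick a model-specific global function. All class names and the `(feature, target)` training-data format are hypothetical and chosen only for this illustration.

```python
from abc import ABC, abstractmethod

# Hypothetical training data format: a list of (feature, target) pairs.
TrainingData = list[tuple[float, float]]


class Regressor(ABC):
    """Common interface: every regressor can be fitted and can predict."""

    @abstractmethod
    def fit(self, training_data: TrainingData) -> None: ...

    @abstractmethod
    def predict(self, feature: float) -> float: ...


class MeanRegressor(Regressor):
    """Ignores the feature and always predicts the mean target value."""

    def fit(self, training_data: TrainingData) -> None:
        self._mean = sum(target for _, target in training_data) / len(training_data)

    def predict(self, feature: float) -> float:
        return self._mean


class LinearRegressor(Regressor):
    """Fits target = slope * feature + intercept by ordinary least squares."""

    def fit(self, training_data: TrainingData) -> None:
        n = len(training_data)
        mean_x = sum(x for x, _ in training_data) / n
        mean_y = sum(y for _, y in training_data) / n
        covariance = sum((x - mean_x) * (y - mean_y) for x, y in training_data)
        variance = sum((x - mean_x) ** 2 for x, _ in training_data)
        self._slope = covariance / variance
        self._intercept = mean_y - self._slope * mean_x

    def predict(self, feature: float) -> float:
        return self._slope * feature + self._intercept
```

Because both classes share the interface, a loop such as `for model in (MeanRegressor(), LinearRegressor()): model.fit(training_data)` works without knowing which concrete model it is handling.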

Prefer separate functions over functions with a flag parameter

Some flag parameters drastically alter the semantics of a function. This can lead to confusion, and, if the parameter is optional, to errors if the default value is kept unknowingly. In such cases having two separate functions is preferable.

✔️ DO (client code):

table.drop_columns("name")

❌ DON'T (client code):

table.drop("name", axis="columns")

Related issues:

lars-reimann/ast22.2-pipelines#18
lars-reimann/ast22.2-pipelines#22
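A library-side sketch of the DO variant follows, assuming a hypothetical immutable, row-based `Table`. Instead of one `drop` method whose meaning depends on an `axis` flag, the table offers two methods whose names state exactly what they remove.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Hypothetical row-based table used only for this illustration."""

    column_names: tuple[str, ...]
    rows: tuple[tuple, ...]

    def drop_columns(self, *names: str) -> Table:
        """Return a copy of this table without the given columns."""
        keep = [i for i, name in enumerate(self.column_names) if name not in names]
        return Table(
            tuple(self.column_names[i] for i in keep),
            tuple(tuple(row[i] for i in keep) for row in self.rows),
        )

    def drop_rows(self, *indices: int) -> Table:
        """Return a copy of this table without the rows at the given positions."""
        return Table(
            self.column_names,
            tuple(row for i, row in enumerate(self.rows) if i not in indices),
        )
```

There is no default value to overlook: a caller has to choose `drop_columns` or `drop_rows` explicitly.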

Avoid uncommon abbreviations

Write full words rather than abbreviations. The increased verbosity is offset by better readability, better-functioning auto-completion, and a reduced need to consult the documentation when writing code. Common abbreviations like CSV or HTML are fine, though, since they rarely require explanation.

✔️ DO (client code):

figure.set_color_scheme(ColorScheme.AUTUMN)

❌ DON'T (client code):

figure.scs(CS.AUT)

Related issues:

lars-reimann/ast22.2-pipelines#10
lars-reimann/ast22.2-pipelines#21

Specify types of parameters and results

Use type hints to describe the types of parameters and results of functions. This enables static type checking of client code.

✔️ DO (library code):

def add_ints(a: int, b: int) -> int:
    return a + b

❌ DON'T (library code):

def add_ints(a, b):
    return a + b

Use narrow data types

Use data types that can accurately model the legal values of a declaration. This improves static detection of wrong client code.

✔️ DO (client code):

SupportVectorMachine(kernel=Kernel.LINEAR) # (enum)

❌ DON'T (client code):

SupportVectorMachine(kernel="linear") # (string)

Related issues:

lars-reimann/ast22.2-pipelines#71
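A minimal library-side sketch of the enum approach follows. `Kernel` and this `SupportVectorMachine` are hypothetical wrappers. The enum values are chosen to match the kernel strings that scikit-learn's `SVC` accepts, so the wrapper could pass `kernel.value` on to it. A static type checker flags `kernel="linear"`, and the `isinstance` check also catches callers who do not run one.

```python
from enum import Enum


class Kernel(Enum):
    """Hypothetical enum listing every legal kernel; a typo fails at attribute access."""

    LINEAR = "linear"
    POLYNOMIAL = "poly"
    RBF = "rbf"


class SupportVectorMachine:
    """Hypothetical wrapper; only the kernel parameter is modelled here."""

    def __init__(self, kernel: Kernel = Kernel.RBF) -> None:
        # Runtime guard for callers that bypass static type checking.
        if not isinstance(kernel, Kernel):
            raise TypeError(f"kernel must be a Kernel but was {kernel!r}.")
        self.kernel = kernel
```

With a plain string parameter, a misspelling such as `"lienar"` would only surface once the wrapped library rejects it, if it does so at all.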

Check preconditions of functions and fail early

Not all preconditions of functions can be described with type hints; those that cannot must be checked at runtime. This should be done as early as possible, usually right at the top of the function body. If a precondition is violated, the function should stop and either return a sensible value (if possible) or raise an exception with a descriptive message.

✔️ DO (library code):

def nth_prime(n: int) -> int:
    if n <= 0:
        raise ValueError(f"n must be at least 1 but was {n}.")

    ...  # compute nth prime

❌ DON'T (library code):

def nth_prime(n: int) -> int:
    ...  # compute nth prime
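To show why the early check matters, here is a complete version of the DO example, sketched with simple trial division (chosen for brevity, not speed). Without the guard clause, the loop below would silently return `1` for `n = 0`, a wrong result that is much harder to trace than an immediate `ValueError`.

```python
import math


def nth_prime(n: int) -> int:
    """Return the n-th prime number, counting from 1 (so nth_prime(1) == 2)."""
    # Fail early: reject illegal input before doing any work.
    if n <= 0:
        raise ValueError(f"n must be at least 1 but was {n}.")

    # Trial division: test candidates in order until n primes have been found.
    count = 0
    candidate = 1
    while count < n:
        candidate += 1
        if all(candidate % divisor != 0 for divisor in range(2, math.isqrt(candidate) + 1)):
            count += 1
    return candidate
```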

Raise either Python exceptions or custom exceptions

Users should not have to deal with exceptions that are defined in the libraries we wrap. Any exception that a third-party function may raise should therefore be caught and replaced by a core Python exception or a custom exception. The exception to this rule is a callable created by the user: when we call it, we simply pass along any exceptions it raises.

✔️ DO (library code):

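import pandas as pd

# Table and FileFormatException are defined by our library.
def read_csv(path: str) -> Table:
    try:
        return pd.read_csv(path)  # May raise a pd.errors.ParserError
    except pd.errors.ParserError as e:
        raise FileFormatException("The loaded file is not a valid CSV file.") from e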

❌ DON'T (library code):

import pandas as pd

def read_csv(path: str) -> Table:
    return pd.read_csv(path)  # May raise a pd.errors.ParserError
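The following self-contained sketch covers both halves of the rule without requiring pandas. Here `json.JSONDecodeError` stands in for an exception defined by a wrapped library, and `FileFormatException`, `parse_json`, and `transform_values` are hypothetical names. Raising with `from e` keeps the original error available as `__cause__` for debugging, while exceptions from a user-supplied callable pass through untouched.

```python
import json
from collections.abc import Callable
from typing import Any


class FileFormatException(Exception):
    """Hypothetical custom exception: the input does not have the expected format."""


def parse_json(text: str) -> Any:
    # json.JSONDecodeError plays the role of an exception from a wrapped library.
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # Translate it into our own exception and keep the original as __cause__.
        raise FileFormatException("The given text is not valid JSON.") from e


def transform_values(values: list[int], transformer: Callable[[int], int]) -> list[int]:
    # The callable comes from the user, so its exceptions are passed along unchanged.
    return [transformer(value) for value in values]
```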

Group API elements by task

Packages should correspond to a specific task like classification or imputation. This eases discovery and makes it easy to switch between different solutions for the same task.

✔️ DO (client code):

from sklearn.classification import SupportVectorMachine

❌ DON'T (client code):

from sklearn.svm import SupportVectorMachine

Group values that are used together into an object

Passing around values that are commonly used together as separate arguments is tedious, verbose, and error-prone.

✔️ DO (client code):

training_data, validation_data = split(full_data)

❌ DON'T (client code):

training_feature_vectors, validation_feature_vectors, training_target_values, validation_target_values = split(feature_vectors, target_values)

Related issues:

lars-reimann/ast22.2-pipelines#19
lars-reimann/ast22.2-pipelines#31
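The following sketch shows the grouping on the library side. It assumes a hypothetical `LabeledData` container that bundles feature vectors with their target values, so a single `split` call returns two objects instead of four loose lists. Because rows are selected by index, each feature vector stays paired with its target.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledData:
    """Hypothetical container that keeps feature vectors and target values together."""

    feature_vectors: list[list[float]]
    target_values: list[float]

    def __post_init__(self) -> None:
        if len(self.feature_vectors) != len(self.target_values):
            raise ValueError("Each feature vector needs exactly one target value.")


def split(full_data: LabeledData, ratio: float = 0.8, seed: int = 0) -> tuple[LabeledData, LabeledData]:
    """Shuffle the rows, then split them into training and validation data."""
    if not 0 <= ratio <= 1:
        raise ValueError(f"ratio must be between 0 and 1 but was {ratio}.")

    indices = list(range(len(full_data.target_values)))
    random.Random(seed).shuffle(indices)
    cut = round(len(indices) * ratio)

    def subset(chosen: list[int]) -> LabeledData:
        # The same indices select both the feature vectors and the targets.
        return LabeledData(
            [full_data.feature_vectors[i] for i in chosen],
            [full_data.target_values[i] for i in chosen],
        )

    return subset(indices[:cut]), subset(indices[cut:])
```

The `ratio` and `seed` parameters are illustrative defaults, not part of the guideline.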

Test non-trivial functions

If a function contains more code than just getting or setting a value, automated tests should be added to the tests folder. The file structure in the tests folder should mirror the file structure of the package's implementation.
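A minimal sketch of such a test follows, using the standard library's `unittest` so it runs without extra dependencies (a pytest-based suite would look similar). The function `mean` and the paths in the comments are hypothetical. They only illustrate how the test module mirrors the module under test.

```python
import unittest


# Hypothetical library function, e.g. in src/<package>/statistics.py.
def mean(values: list[float]) -> float:
    if not values:
        raise ValueError("values must not be empty.")
    return sum(values) / len(values)


# The matching tests would live in tests/<package>/test_statistics.py.
class TestMean(unittest.TestCase):
    def test_should_return_average_of_values(self) -> None:
        self.assertEqual(mean([1.0, 2.0, 3.0]), 2.0)

    def test_should_raise_on_empty_input(self) -> None:
        with self.assertRaises(ValueError):
            mean([])
```

Testing the failing precondition as well as the normal result keeps the early checks from the guideline above from silently disappearing during refactoring.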

Document API elements

All classes should have

- a short description,
- examples that show how to use them correctly,
- a description of their attributes.

All functions should have

- a short description,
- examples that show how to use them correctly,
- a description of their parameters,
- a description of their results,
- a description of any exceptions that are raised.

The documentation should follow the numpydoc format.
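The function below sketches a docstring that covers every required item in numpydoc's section layout: a short summary, then `Parameters`, `Returns`, `Raises`, and `Examples`. The `clamp` function itself is a hypothetical example and is not part of the library.

```python
def clamp(value: float, lower: float, upper: float) -> float:
    """
    Restrict a value to the closed interval [lower, upper].

    Parameters
    ----------
    value : float
        The value to restrict.
    lower : float
        The smallest allowed result.
    upper : float
        The largest allowed result.

    Returns
    -------
    float
        ``lower`` if ``value < lower``, ``upper`` if ``value > upper``, otherwise ``value``.

    Raises
    ------
    ValueError
        If ``lower`` is greater than ``upper``.

    Examples
    --------
    >>> clamp(5, 0, 3)
    3
    >>> clamp(-1.5, 0, 3)
    0
    """
    if lower > upper:
        raise ValueError(f"lower must not be greater than upper but was {lower} > {upper}.")
    return max(lower, min(value, upper))
```

The `Examples` section uses doctest syntax, so the examples can also be executed as tests.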

Prefer a usable API over simple implementation

It's more important to provide a user-friendly API to many people than to save some of our time when implementing the functionality.


Marsmaennchen221 · 2022-10-28T08:33:35Z

The description of "Prefer named functions over overloaded operators" proposes replacing table[["name", "age"]] with a function table.keep_columns("name", "age"). However, the name keep_columns suggests that the dataset instance it is called on deletes all columns that were not specified and keeps only the specified ones, whereas the original method leaves the dataset instance unchanged and only returns the specified columns as a new dataset instance.
A better name might be get_columns or return_only_columns (and, to match, for drop: return_all_columns_except).

lars-reimann · 2023-03-04T17:13:43Z

@lars-reimann Turn this into a documentation page.

Closes #16. ### Summary of Changes Add development guidelines to documentation. The original issue is no longer needed.

lars-reimann · 2023-03-24T08:25:55Z

🎉 This issue has been resolved in version 0.3.0 🎉

The release is available on:

v0.3.0
GitHub release

Your semantic-release bot 📦🚀

lars-reimann pinned this issue Oct 26, 2022

lars-reimann transferred this issue from another repository Nov 9, 2022

lars-reimann changed the title from Our Usability Guidelines to Our API Usability Guidelines Nov 9, 2022

lars-reimann pinned this issue Nov 9, 2022

lars-reimann changed the title from Our API Usability Guidelines to Stdlib Development Guidelines Nov 9, 2022

lars-reimann mentioned this issue Nov 11, 2022

feat: Create table from file: read_csv and read_json Safe-DS/DSL#164

Merged

Marvjowa unpinned this issue Jan 13, 2023

lars-reimann transferred this issue from Safe-DS/DSL Mar 4, 2023

lars-reimann added this to Library Mar 4, 2023

lars-reimann removed this from Library Mar 4, 2023

lars-reimann added this to Library Mar 4, 2023

github-project-automation bot moved this to Backlog in Library Mar 4, 2023

lars-reimann self-assigned this Mar 4, 2023

lars-reimann moved this from Backlog to Todo in Library Mar 4, 2023

lars-reimann mentioned this issue Mar 14, 2023

docs: add development guidelines #40

Merged

lars-reimann linked a pull request Mar 14, 2023 that will close this issue

docs: add development guidelines #40

Merged

lars-reimann closed this as completed in #40 Mar 14, 2023

lars-reimann added a commit that referenced this issue Mar 14, 2023

docs: add development guidelines (#40)

71dc669


github-project-automation bot moved this from Todo to ✔️ Done in Library Mar 14, 2023

lars-reimann added the released Included in a release label Mar 24, 2023
