Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Argument validation and documentation of exceptions in _from_table in _tagged_table.py #333

Closed
zzril opened this issue May 29, 2023 · 4 comments · Fixed by #334
Closed
Assignees
Labels
cleanup 🧹 Refactorings and other tasks that improve the code question Further information is requested released Included in a release

Comments

@zzril
Copy link
Contributor

zzril commented May 29, 2023

The static _from_table method in _tagged_table.py partly validates its arguments like this:

        # If no feature names are specified, use all columns except the target column
        if feature_names is None:
            feature_names = table.column_names
            if target_name in feature_names:
                feature_names.remove(target_name)

        # Validate inputs
        if target_name in feature_names:
            raise ValueError(f"Column '{target_name}' cannot be both feature and target.")
        if len(feature_names) == 0:
            raise ValueError("At least one feature column must be specified.")

One obvious case of "bad input" is not recognized here: target_name may not match any of table's columns at all.
In this case, an UnknownColumnNameError would be silently thrown a few lines later:

        result._target = result.get_column(target_name)

This possible exception is also not documented in the docstring. (It is, however, tested for in one of the tests.)

Is there a specific reason to implement it like this?

Possible solution

Test for target_name not in table.column_names at the beginning of the function and throw the appropriate exception when the condition is met.
Mention the exception in the docstring.

Can then also drop the unneccessary if target_name in feature_names: in line 84.

@zzril zzril added question Further information is requested cleanup 🧹 Refactorings and other tasks that improve the code labels May 29, 2023
@lars-reimann
Copy link
Member

lars-reimann commented May 29, 2023

Can then also drop the unneccessary if target_name in feature_names: in line 84.

That is another error: The column exists in the table but is used both as a feature and as the target.

That said, it indeed makes sense to raise the UnknownColumnNameError at the beginning of the _from_table to produce a shorter stacktrace.

@zzril
Copy link
Contributor Author

zzril commented May 29, 2023

Can then also drop the unneccessary if target_name in feature_names: in line 84.

That is another error: The column exists in the table but is used both as a feature and as the target.

I was talking about the second if here:

        # If no feature names are specified, use all columns except the target column
        if feature_names is None:
            feature_names = table.column_names
            if target_name in feature_names:
                feature_names.remove(target_name)

Once we have made sure that target_name is contained in table.column_names, the second if will always be true.

@lars-reimann
Copy link
Member

Can then also drop the unneccessary if target_name in feature_names: in line 84.

That is another error: The column exists in the table but is used both as a feature and as the target.

I was talking about the second if here:

        # If no feature names are specified, use all columns except the target column
        if feature_names is None:
            feature_names = table.column_names
            if target_name in feature_names:
                feature_names.remove(target_name)

Once we have made sure that target_name is contained in table.column_names, the second if will always be true.

I get your point now.

@zzril zzril closed this as completed in #334 Jun 9, 2023
zzril added a commit that referenced this issue Jun 9, 2023
…_table` (#334)

Closes #333.

### Summary of Changes

Explicitly throw `UnknownColumnNameError` in `TaggedTable._from_table`
for a more readable stacktrace.
Document when this exception is thrown.

Co-authored-by: Alexander <47296670+Marsmaennchen221@users.noreply.github.com>
lars-reimann pushed a commit that referenced this issue Jun 30, 2023
## [0.14.0](v0.13.0...v0.14.0) (2023-06-30)

### Features

* 290 properties for width-height of image ([#359](#359)) ([d9ebdc1](d9ebdc1)), closes [#290](#290)
* Add `find_edges` method to `Image` class ([#383](#383)) ([d14b6ce](d14b6ce)), closes [#288](#288)
* Add `StandardScaler` transformer ([#316](#316)) ([57b0572](57b0572)), closes [#142](#142)
* Add docstrings to the getter methods for hyperparameters in Regression and Classification models ([#371](#371)) ([9073f04](9073f04)), closes [#313](#313)
* Added `Table.group_by` to group a table by a given key ([#343](#343)) ([afb98be](afb98be)), closes [#160](#160)
* Added and improved errors and warnings in the table transformers ([#372](#372)) ([544e307](544e307)), closes [#152](#152)
* added crop() method in image and tests ([#365](#365)) ([eba8163](eba8163))
* added invert_colors method ([#367](#367)) ([1e4d110](1e4d110))
* adjust brightness and contrast of image ([#368](#368)) ([1752feb](1752feb)), closes [#289](#289) [#291](#291)
* blur Image method ([#363](#363)) ([c642176](c642176))
* check that methods of table can handle an empty table ([#314](#314)) ([686c2e7](686c2e7)), closes [#123](#123)
* convert image to grayscale ([#366](#366)) ([1312fe7](1312fe7)), closes [#287](#287)
* enhance `replace_column` to accept a list of new columns ([#312](#312)) ([d50c5b5](d50c5b5)), closes [#301](#301)
* Explicitly throw `UnknownColumnNameError` in `TaggedTable._from_table` ([#334](#334)) ([498999f](498999f)), closes [#333](#333)
* flip images / eq method for image ([#360](#360)) ([54f4ae1](54f4ae1)), closes [#280](#280)
* improve `table.summary`. Catch `ValueError` thrown by `column.stability` ([#390](#390)) ([dbbe0e3](dbbe0e3)), closes [#320](#320)
* improve error handling of `column.stability` when given a column that contains only None ([#388](#388)) ([1da2499](1da2499)), closes [#319](#319)
* Improve Error Handling of classifiers and regressors ([#355](#355)) ([66f5f64](66f5f64)), closes [#153](#153)
* Resize image ([#354](#354)) ([3a971ca](3a971ca)), closes [#283](#283)
* rotate_left and rotate_right added to Image ([#361](#361)) ([c877530](c877530)), closes [#281](#281)
* set kernel of support vector machine ([#350](#350)) ([1326f40](1326f40)), closes [#172](#172)
* sharpen image ([#364](#364)) ([3444700](3444700)), closes [#286](#286)

### Bug Fixes

* Keeping no columns with Table.keep_only_columns results in an empty Table with a row count above 0 ([#386](#386)) ([15dab06](15dab06)), closes [#318](#318)
* remove default value of `positive_class` parameter of classifier metrics ([#382](#382)) ([58fc09e](58fc09e))
* remove default value of `radius` parameter of `blur` ([#378](#378)) ([7f07f29](7f07f29))
@lars-reimann
Copy link
Member

🎉 This issue has been resolved in version 0.14.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup 🧹 Refactorings and other tasks that improve the code question Further information is requested released Included in a release
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants