Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autofix for RET504 UnnecessaryAssign #2599

Closed
wants to merge 10 commits into from
Closed

Autofix for RET504 UnnecessaryAssign #2599

wants to merge 10 commits into from

Conversation

Sawbez
Copy link
Contributor

@Sawbez Sawbez commented Feb 6, 2023

Works on #2263

The only problem that I'm not too sure how to work around is how to add multiple fixes to a single violation. Additionally, I would like to remove the usage of .clone on the Located<ExprKind>, but I cannot seem to figure out how.

@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 6, 2023

For more context, currently my fix only changes the return, and doesn't remove the variable assignment (and thus I haven't changed the snap tests). E.x:

def f():
    var = 2
    return var

becomes

def f():
    var = 2
    return 2

));
}
}
checker.diagnostics.push(diagnostic);
Copy link
Member

@charliermarsh charliermarsh Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the framing here should be that we want to replace the Assign statement and the Return statement with the return ${value}.

Something like:

diagnostic.amend(Fix::replacement(
    unparse_expr(assign_expr, checker.stylist),
    assign_stmt.location,
    return_stmt.end_location.unwrap(),
));

Copy link
Contributor Author

@Sawbez Sawbez Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only issue with this is that it won't work for something that is multi-line.
For example:

def f(x):
    val = 2
    print(x)
    return val

Would fail and produce:

def f(x):
    return 2

Unless we are tracking everything between those two with source_slicer or source_locator (something along those lines, I've seen it in other checkers). Additionally, do we want to leave behind empty space? The start of the expression is after the indentation, according to the AST.

Copy link
Contributor Author

@Sawbez Sawbez Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I'm slightly confused on is this code I'm trying:

if let Some(assign_loc) = stack.assigns.get(id.as_str()){
    println!("return {:?}", assign_loc.last().unwrap());
    diagnostic.amend(Fix::replacement(
        unparse_expr(assign_expr, checker.stylist),
        *assign_loc.last().unwrap(),
        expr.end_location.unwrap(),
    ));
}

This code causes ruff to hang, and the prints show why:

return Location { row: 5, column: 4 }
return Location { row: 12, column: 4 }
return Location { row: 20, column: 4 }
return Location { row: 19, column: 4 }
return Location { row: 21, column: 4 }
return Location { row: 25, column: 4 }
return Location { row: 33, column: 4 }
... (this pattern repeats for a LOT of lines)

any clue why this could be happening?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now we limit this to the version in which the assignment and the return are consecutive statements, if we can.

If we want to support fixes in which there are interleaving statements, we have to do something clever like structure the fix as a replacement with "all the content before the statement, followed by the new return" (rather than two separate fixes).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I can help debug - what file / command are you running to get that output?)

Copy link
Contributor Author

@Sawbez Sawbez Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think you're right -- I should have this finished by tomorrow. My only other thought is how to work with the Assign statement being on another line, how would you ensure that there is just the assignment + whitespace/comment on the next few lines, would you have loop through the source code directly and check if each line is whitespace/comment or is there a better way?

What I'm talking about is something like this:

def safe_average(lst):
    """There is a extra newline and extra indentation after the assignment"""
    return_value = 2.4
    # This is also a comment explaining why we need to check for comments, and there is indentation on the next line too!
    
    
    """
    Also, the next comment has *critical* information, but would the autofix remove it?
    I presume we would have to keep track of this, or just ignore this for now and 
    put it in an issue somewhere.
    """
    # We return this function in this manner because if the length was zero we would raise a ZeroDivisionError
    return sum(lst) / len(lst) if len(lst) > 0 else 0 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here's what I'd try to do (though it'll require more bookkeeping than we do now)...

When we track the assignment, track the start and end locations of the entire assignment statement.

Then, when we create the fix, structure it as: "Replace all the content from the start of the assignment with: (1) the content from the end of the assignment to the start of the return + (2) the modified return statement."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have the ability to skip the fix if there's a comment between the assign and the return -- we have a has_comments_in utility that takes a range as input, and returns true if there are any comments. So we could do that instead.

Or we could even really restrictive and say that we're only going to do this fix if it's purely whitespace between the assignment and the return. That's actually pretty reasonable IMO!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try for the first option, if it seems a bit too difficult (shouldn't be, but will require a bit more restructuring and there's also a whitespace issue, but Black would fix that... not sure what the policy is for that) it's a bit late for me right now so I'll get to it by tomorrow. Sorry if I've asked too many questions or been a hassle, but thanks for helping me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No apology needed at all. I wish I could be even more helpful! Just trying to juggle a few things at once. Appreciate your effort on the issue :)

@@ -15,6 +15,7 @@ pub struct Stack<'a> {
pub assigns: FxHashMap<&'a str, Vec<Location>>,
pub loops: Vec<(Location, Location)>,
pub tries: Vec<(Location, Location)>,
pub assign_values: FxHashMap<&'a str, rustpython_ast::Located<ExprKind>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a tip, we never need to use Located<T>. Instead, you can do Expr (which is equivalent to Located<ExprKind>).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip!

@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 9, 2023

This is nearly finished - all it needs is to dedent the output if it isn't already. @charliermarsh How would you recommend going about this?

@Sawbez Sawbez requested a review from charliermarsh February 10, 2023 00:54
@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 10, 2023

The code is working but it does have a false positive that changes the function of the code. I'm unsure how you would work around that -- the false positive originates from the check itself, and the autofix runs under the assumption that the check isn't a false positive.

We could patch the check, as it appears the problem in question isn't really refactor-able (thus why it's a false-positive in my opinion) unless you use a ternary statement, which is not the focus of this check. On the linked issue, it appears this was actually patched (no longer raises RET504) on the original plugin.

@charliermarsh
Copy link
Member

On the linked issue, it appears this was actually patched

Where can I find the linked issue?

@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 10, 2023

On the linked issue, it appears this was actually patched

Where can I find the linked issue?

@charliermarsh It’s linked in the test

# Can be refactored false positives
# https://github.com/afonasev/flake8-return/issues/47#issuecomment-1122571066
def get_bar_if_exists(obj):
    result = ""
    if hasattr(obj, "bar"):
        result = str(obj.bar)
    return result

@charliermarsh
Copy link
Member

Hmm, I think those issues still exist upstream -- I see them when I run flake8-return locally, and we have that same patch in our own code.

@charliermarsh
Copy link
Member

Spent a little time trying to fix that issue but it's really tough without more rigorous branch analysis.

@charliermarsh
Copy link
Member

I think we should try to be really conservative here, and only fix if they're consecutive statements -- otherwise, we risk breaking code, like above. What do you think?

@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 10, 2023

@charliermarsh I think there are pretty much three valid options here:

  1. Only fix if there are strictly comments.
  2. Only fix if there are no if statements, reassignments, or other complicated constructs.
  3. Only fix if there is nothing between

I can probably implement any of these. Personally, I think the strictly comments (1) is probably the best in-between that will not cause breaking fixes and still fix most of the cases that RET504 was directly supposed to catch. If you would rather opt for one of the other options I can do so.

@charliermarsh
Copy link
Member

I think (1) makes sense. Do you wanna give that a try?

@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 10, 2023

Yep. I’ll do it

@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 11, 2023

This is very close to done. All I need to do is add one more check before going through the autofix, it will also probably fail CI because it's in the middle of being completed.

@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 12, 2023

@charliermarsh This is finished. You can take a look - the implementation isn't pretty but if you have any better ideas they would be well appreciated.

@charliermarsh
Copy link
Member

@Sawbez - Awesome, thank you for all the work! I likely won't get to this tonight, but will review as soon as I can.

@Sawbez Sawbez deleted the branch astral-sh:main February 13, 2023 00:08
@Sawbez Sawbez closed this Feb 13, 2023
@Sawbez Sawbez deleted the main branch February 13, 2023 00:08
@Sawbez Sawbez restored the main branch February 13, 2023 00:08
@Sawbez Sawbez reopened this Feb 13, 2023
@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 13, 2023

Whoops- ignore that

{
let in_between_source = checker
.locator
.slice_source_code_range(&Range::new(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason I'm having trouble pushing edits but these need to change to .slice(...) now instead of . slice_source_code_range(...).

expr.location,
))
.to_owned();
// Ensure that all indentation is identical (no if or while statement).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a more rigorous way to detect this, if I'm interpreting the comment correctly. Indentation isn't a totally "safe" way to detect whether we have control flow like if or while etc., since users could do if x: y on a single line.

What's this logic guarding against? Maybe we can track this condition in some other way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I see what you mean with the one-liner. The main problem is that there's no Dedent token during the code that ensures it's comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have to be checking the line itself too (e.g. slicing the line and checking if there is any tokens there)

if let Some(assign_expr) = stack.assign_values.get(id.as_str()) {
if let Some(assign_locations) = stack.assigns.get(id.as_str()) {
let expr_loc = expr.end_location.unwrap();
if let Some(last_assign_loc) = assign_locations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this be None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you return an undefined variable (a problem in itself, but shouldn't cause a panic). The second if let can be removed however the third one is also important because it would fail if the variable was assigned to, but only after the return statement (similar problem). In my earlier testing it appeared that every assignment would be in the vector even if they were after the return statement (which was against an assumption I had made and was causing that infinite loop error because it was slicing the wrong way)

line.chars()
.take_while(|c| stylist.indentation().contains(*c))
.collect()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we perhaps use the indentation helper in crates/ruff/src/ast/whitespace.rs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? But I would guess it would be slower because I would have to create a bunch of useless intermediate objects because of the way that the function takes in input as a Located<T>, and I mainly just have a &str

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be kind of inconvenient. I might have to iterate through it in a sliding window fashion manually (it would also be more performant).

Copy link
Contributor Author

@Sawbez Sawbez Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the function you are referring to would be inconvenient (and quite possibly slower) either way, and rewriting without TupleWindows would be faster either way.


if checker.patch(diagnostic.kind.rule()) {
if let Some(assign_expr) = stack.assign_values.get(id.as_str()) {
if let Some(assign_locations) = stack.assigns.get(id.as_str()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some comments about what assign_expr, assign_locations, etc. represent here? I wrote some of this originally and even I can't remember what they mean and how they apply to this fix heh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

pub loops: Vec<(Location, Location)>,
pub tries: Vec<(Location, Location)>,
pub assign_values: FxHashMap<&'a str, Expr>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be FxHashMap<&'a str, &'a Expr>? Would that let you avoid the clone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try this.

.or_insert_with(Vec::new)
.push(expr.location);
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same as the previous version? I know we have the logic in // This is hacky ..., but what if we visit a tuple here? Then we'd iterate over the names as part of this method via the recursive call above. In the current version, the self.visit_assign_target in the Tuple case above would have no effect, I think? Given (a, b, c).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused on what you mean here

@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 16, 2023

I will start working on this later today.

@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 16, 2023

Thanks for the tips!

@charliermarsh
Copy link
Member

@Sawbez - I'm sorry for the slow review here. The autofix stuff has to be very carefully done, and the flake8-return logic is always really hard for me to reason about, so I struggled to find time to really dig into it.

@Sawbez
Copy link
Contributor Author

Sawbez commented Feb 17, 2023

@charliermarsh No problem. As long as you get to it eventually, then I'm good. Take as long (or as little) time as you would like.

@charliermarsh
Copy link
Member

I take responsibility for not getting this merged, but I suspect the code has changed enough that we'd want to revisit from scratch. Having seen more bugs and issues related to the flake8-return rules, I'd probably only want to support this behavior for trivial cases (e.g., return immediately following an assignment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants