Improve cli for feedback review #816
🚚 Moved rich console object into a separate file in utils as it can be reused whenever we want to use it in the future.
♻️ Used .format() instead of concatenating strings
…h/Mephisto into improve-cli-for-tips-review
🚸 Sorted review_feedback by question to make trends more similar
🔥 Replaced name print with print_out_task_names()
Codecov Report
@@ Coverage Diff @@
## main #816 +/- ##
==========================================
+ Coverage 64.50% 64.56% +0.05%
==========================================
Files 107 107
Lines 9281 9281
==========================================
+ Hits 5987 5992 +5
+ Misses 3294 3289 -5
Thanks for taking this on. It looks good to me! My only comment is about Python constants. I'm not sure about Python best practices, so I'm sharing what I've seen in most other languages.
One high-level thought: is a CLI the best interface for this kind of operation? Is that what researchers prefer to use? What if this were instead some kind of admin web app?
@@ -187,37 +215,34 @@ def main():
)
)

if see_unreviewed_feedback in no_response:
# Filter the toxicity feedback to get unreviewed feedback
if see_unreviewed_feedback == "r":
What does "r" mean? It's not ideal to use hard-coded strings like this.
Does Python have a concept of constants? If so, this could be something like:
if see_unreviewed_feedback == REVIEW_STATE.REVIEWED:
Good callout. This is most easily done with a Python Enum:
import enum
...
class FeedbackReviewType(enum.Enum):
"""Types of flags"""
REVIEWED = "r"
UNREVIEWED = "u"
...
choices = FeedbackReviewType.list()
...
Same feedback can also be applied to #813.
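A minimal, runnable sketch of the Enum suggestion follows. Note that `enum.Enum` has no built-in `list()` method, so the `list()` classmethod here is an assumed helper, and `parse_review_type` is a hypothetical name introduced only for illustration; neither is taken from the PR itself.

```python
import enum


class FeedbackReviewType(enum.Enum):
    """Which slice of feedback the reviewer wants to see."""

    REVIEWED = "r"
    UNREVIEWED = "u"

    @classmethod
    def list(cls):
        # Assumed helper: enum.Enum does not provide list() on its own.
        return [member.value for member in cls]


def parse_review_type(raw):
    """Hypothetical helper: map raw prompt input to an enum member.

    Raises ValueError if the input is not one of the known values.
    """
    return FeedbackReviewType(raw.strip().lower())


choices = FeedbackReviewType.list()
selected = parse_review_type("r")

# Compare against named members instead of bare "r" / "u" strings.
if selected is FeedbackReviewType.REVIEWED:
    message = "showing reviewed feedback"
else:
    message = "showing unreviewed feedback"
```

Looking a member up by value (`FeedbackReviewType("r")`) keeps the single-letter prompt input while giving the rest of the script a named constant to compare against.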
)
elif see_unreviewed_feedback in yes_response:
# Filter the toxicity feedback to get reviewed feedback
elif see_unreviewed_feedback == "u":
Same note as above.
Generally this looks great! Somya's point is solid for better engineering, and should likely also be applied to #813. Approved pending that changeset.
As for whether this is the best UX: it's certainly the one that researchers are used to when SSHed into a deploy server. In the long run it'd be amazing to create a full dashboard and include this workflow (@pringshia took a first stab at the core back here), but it hasn't been prioritized.
Summary
Improves the CLI for the feedback review script by providing default values when prompting for input, by adding color sparingly, and by using tables.
When going through reviewed feedback, all of the feedback for each agent is displayed in a single table.
Example:
Text with a toxicity above 0.5 is shown in red.
I did not filter by question in this case (I entered the default value of -1), which is why the tables contain more than one question.
In addition, I did not filter by toxicity in this case, which is why the feedback with the text "I HATE YOU" is visible in the results.
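The filtering behavior described above can be sketched roughly as follows. This is not the script's actual code: the dictionary keys, the function names, and the sample rows (including their toxicity scores) are invented for illustration. Only the -1 default and the 0.5 threshold come from the description.

```python
NO_QUESTION_FILTER = -1   # default from the description: show every question
TOXICITY_THRESHOLD = 0.5  # text above this score is highlighted in red


def select_feedback(feedback, question_index=NO_QUESTION_FILTER, hide_toxic=False):
    """Hypothetical helper: filter feedback, then sort it by question."""
    rows = [
        item
        for item in feedback
        if question_index == NO_QUESTION_FILTER
        or item["question_index"] == question_index
    ]
    if hide_toxic:
        rows = [item for item in rows if item["toxicity"] <= TOXICITY_THRESHOLD]
    # sorted() is stable, so rows for the same question keep their original order.
    return sorted(rows, key=lambda item: item["question_index"])


def is_flagged(item):
    """True when the text would be rendered in red."""
    return item["toxicity"] > TOXICITY_THRESHOLD


# Made-up sample data; the scores are illustrative, not real model output.
feedback = [
    {"question_index": 1, "text": "I HATE YOU", "toxicity": 0.97},
    {"question_index": 0, "text": "Instructions were clear", "toxicity": 0.02},
    {"question_index": 1, "text": "Add a progress bar", "toxicity": 0.01},
]

everything = select_feedback(feedback)               # defaults: no filtering
non_toxic = select_feedback(feedback, hide_toxic=True)
flagged_texts = [item["text"] for item in everything if is_flagged(item)]
```

With the defaults, every question appears and the toxic row is kept but flagged, which matches the example where "I HATE YOU" is still visible.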
When going through unreviewed feedback, each piece of feedback gets its own table, since each piece can be individually marked as reviewed.
Example
Video
The video goes through the whole process, showing the questions asked and their default options.
review_feedback_script.mov