Feature/llm custom metric #48
Conversation
❌ Changes requested.
- Reviewed the entire pull request up to 9dd5b99
- Looked at 276 lines of code in 4 files
- Took 1 minute and 0 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 1 additional comment because it didn't meet the confidence threshold of 50%
1. /continuous_eval/metrics/generation/text/custom.py:86
   - Assessed confidence: 66%
   - Grade: 0%
   - Comment: The _scoring_function is expected to handle the string output of the LLM. Ensure that it is designed for this type of input; for example, if the Numeric scoring function is used, it should correctly extract a numeric value from the LLM's string output.
   - Reasoning: The LLMBasedCustomMetric class uses the _scoring_function to parse the output of the LLM and return a score. The LLM's output is a string, and a _scoring_function that is not designed to handle that string could produce errors or incorrect scores. For example, the Numeric scoring function defined in the ScoringFunctions class uses the _numeric_matcher function to extract a numeric value from the input string, but as noted earlier, _numeric_matcher does not correctly match numeric values.
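The scoring-function contract described above can be illustrated with a minimal sketch (the function name and signature below are illustrative assumptions, not the library's actual API):

```python
import re

# Hypothetical example of a scoring function for LLMBasedCustomMetric:
# it must accept the raw string the LLM returns and turn it into a score.
def numeric_scoring_function(llm_output: str) -> float:
    """Extract the first integer from the LLM's free-form answer."""
    match = re.search(r"\d+", llm_output)
    if match is None:
        raise ValueError(f"No numeric score found in LLM output: {llm_output!r}")
    return float(match.group())

print(numeric_scoring_function("Score: 4 out of 5"))  # 4.0
```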
Workflow ID: wflow_U3tYJ2SBKDwEg3hG
Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.
```python
def _numeric_matcher(input_val, min_val, max_val):
    grp = re.search(f"[{min_val}-{max_val}]", input_val)
```
The regular expression in the _numeric_matcher function does not correctly match numeric values within a range: the character class [{min_val}-{max_val}] matches any single character whose code point falls between those of the characters representing min_val and max_val, and it breaks for multi-digit bounds. Consider using a different approach to extract the numeric value from the input string.
👍 Looks good to me!
- Performed an incremental review on 4e85d59
- Looked at 18 lines of code in 1 file
- Took 1 minute and 16 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 1 additional comment because it didn't meet the confidence threshold of 50%
1. continuous_eval/metrics/generation/text/custom.py:14
   - Assessed confidence: 50%
   - Comment: The _numeric_matcher function currently only returns the first number it finds in the input string, which might not be the correct behavior when the input contains multiple numbers. Consider modifying the function to handle multiple numbers, either by returning a list of all the numbers it finds or by taking an additional argument that specifies which number to return.
   - Reasoning: The _numeric_matcher function uses a regular expression to find a number in the input string and returns the first match. It is not clear what should happen when the input contains several numbers; always taking the first one may not be correct in all cases, so the function should either return all matches or let the caller specify which one to use.
Workflow ID: wflow_8ouRZYiA1lI1eC7u
Not what you expected? You can customize the content of the reviews using rules. Learn more here.
👍 Looks good to me!
- Performed an incremental review on d9b9fa3
- Looked at 30 lines of code in 1 file
- Took 1 minute and 15 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 1 additional comment because it didn't meet the confidence threshold of 50%
1. /docs/src/content/docs/metrics/Retrieval/LLM-Based/llm_context_precision.md:9
   - Assessed confidence: 100%
   - Grade: 0%
   - Comment: The PR description mentions changes to the _numeric_matcher function in custom.py, but this file is not included in the diff. Please include this file in the PR for review.
   - Reasoning: The PR description mentions changes to the _numeric_matcher function in custom.py, but the provided diff does not include that file, so the changes cannot be checked for correctness or new bugs.
Workflow ID: wflow_fuxFxG1uWwDPWmqC
👍 Looks good to me!
- Performed an incremental review on c11c440
- Looked at 122 lines of code in 6 files
- Took 1 minute and 1 second to review
- Skipped 0 files when reviewing
- Skipped posting 3 additional comments because they didn't meet the confidence threshold of 50%
1. continuous_eval/metrics/generation/text/__init__.py:12
   - Assessed confidence: 50%
   - Comment: Consider adding a warning message in the except block to inform the user about the missing semantic metrics.
   - Reasoning: The import of the semantic metrics is wrapped in a try-except block to handle ImportError. This is good practice so the code doesn't break when the semantic metrics are unavailable, but logging a warning in the except block would let the user know the metrics are missing.
2. pyproject.toml:3
   - Assessed confidence: 50%
   - Comment: Ensure that all the changes in this version have been properly documented in the changelog or release notes.
   - Reasoning: The version number in pyproject.toml has been updated from 0.3.3 to 0.3.4. Bumping the version is good practice, but the changes in this version should also be documented in the changelog or release notes.
3. docs/src/content/docs/index.mdx:17
   - Assessed confidence: 50%
   - Comment: Ensure that the changes in the file paths do not break any existing links in the documentation.
   - Reasoning: The PR changes the paths of several documentation files. It is important to ensure that these path changes do not break any existing links in the documentation.
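The import-guard pattern suggested in the first comment above could look like this (the imported module path and class name are assumptions for illustration):

```python
import logging

logger = logging.getLogger(__name__)

try:
    # Hypothetical optional import of the semantic metrics.
    from continuous_eval.metrics.generation.text.semantic import DebertaAnswerScores
except ImportError:
    # Instead of failing silently, tell the user what is missing.
    DebertaAnswerScores = None
    logger.warning(
        "Semantic metrics are unavailable; install the optional "
        "dependencies to enable them."
    )
```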
Workflow ID: wflow_zf9x59THGA9vrk8z
Summary:
This PR introduces a new feature for creating custom LLM-based metrics, adds new classes, updates documentation, provides an example script, and includes minor updates.
Key points:
- Adds the LLMBasedCustomMetric, ScoringFunctions, and EvaluationExample classes in /continuous_eval/metrics/generation/text/custom.py
- Adds documentation in /docs/src/content/docs/metrics/Generation/LLM-Based/custom.md
- Adds the example script /examples/llm_custom_criteria.py
- Updates the _numeric_matcher function in custom.py
Generated with ❤️ by ellipsis.dev