Feature/llm custom metric #48
Conversation
❌ Changes requested.
- Reviewed the entire pull request up to 9dd5b99
- Looked at 276 lines of code in 4 files
- Took 1 minute and 0 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 1 additional comment because it didn't meet the confidence threshold of 50%
1. /continuous_eval/metrics/generation/text/custom.py:86
   - Assessed confidence: 66%
   - Grade: 0%
   - Comment: The _scoring_function is expected to handle the string output of the LLM. Ensure that it is designed for this type of input; for example, if the Numeric scoring function is used, it should correctly extract a numeric value from the LLM's string output.
   - Reasoning: The LLMBasedCustomMetric class uses the _scoring_function to parse the output of the LLM and return a score. The LLM's output is a string, and a _scoring_function that is not designed to handle that string could produce errors or incorrect scores. For example, the Numeric scoring function defined in the ScoringFunctions class uses the _numeric_matcher function to extract a numeric value from the input string, but as noted earlier, _numeric_matcher does not correctly match numeric values.
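The scoring-function contract described above can be illustrated with a minimal sketch (the function name and signature below are illustrative assumptions, not the library's actual API):

```python
import re

# Hypothetical example of a scoring function for LLMBasedCustomMetric:
# it must accept the raw string the LLM returns and turn it into a score.
def numeric_scoring_function(llm_output: str) -> float:
    """Extract the first integer from the LLM's free-form answer."""
    match = re.search(r"\d+", llm_output)
    if match is None:
        raise ValueError(f"No numeric score found in LLM output: {llm_output!r}")
    return float(match.group())

print(numeric_scoring_function("Score: 4 out of 5"))  # 4.0
```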
Workflow ID: wflow_U3tYJ2SBKDwEg3hG
Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.
```python
def _numeric_matcher(input_val, min_val, max_val):
    grp = re.search(f"[{min_val}-{max_val}]", input_val)
```
The regular expression in the _numeric_matcher function does not correctly match numeric values within a range: the character class [{min_val}-{max_val}] matches any single character whose code point falls between those of the characters representing min_val and max_val, and it breaks for multi-digit bounds. Consider using a different approach to extract the numeric value from the input string.
👍 Looks good to me!
- Performed an incremental review on 4e85d59
- Looked at 18 lines of code in 1 file
- Took 1 minute and 16 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 1 additional comment because it didn't meet the confidence threshold of 50%
1. continuous_eval/metrics/generation/text/custom.py:14
   - Assessed confidence: 50%
   - Comment: The _numeric_matcher function currently only returns the first number it finds in the input string, which might not be the correct behavior when the input contains multiple numbers. Consider modifying the function to handle multiple numbers, either by returning a list of all the numbers it finds or by taking an additional argument that specifies which number to return.
   - Reasoning: The _numeric_matcher function uses a regular expression to find a number in the input string and returns the first match. It is not clear what should happen when the input contains several numbers; always taking the first one may not be correct in all cases, so the function should either return all matches or let the caller specify which one to use.
Workflow ID: wflow_8ouRZYiA1lI1eC7u
Not what you expected? You can customize the content of the reviews using rules. Learn more here.
👍 Looks good to me!
- Performed an incremental review on d9b9fa3
- Looked at 30 lines of code in 1 file
- Took 1 minute and 15 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 1 additional comment because it didn't meet the confidence threshold of 50%
1. /docs/src/content/docs/metrics/Retrieval/LLM-Based/llm_context_precision.md:9
   - Assessed confidence: 100%
   - Grade: 0%
   - Comment: The PR description mentions changes to the _numeric_matcher function in custom.py, but this file is not included in the diff. Please include this file in the PR for review.
   - Reasoning: The PR description mentions changes to the _numeric_matcher function in custom.py, but the provided diff does not include that file, so the changes cannot be checked for correctness or new bugs.
Workflow ID: wflow_fuxFxG1uWwDPWmqC
👍 Looks good to me!
- Performed an incremental review on c11c440
- Looked at 122 lines of code in 6 files
- Took 1 minute and 1 second to review
- Skipped 0 files when reviewing
- Skipped posting 3 additional comments because they didn't meet the confidence threshold of 50%
1. continuous_eval/metrics/generation/text/__init__.py:12
   - Assessed confidence: 50%
   - Comment: Consider adding a warning message in the except block to inform the user about the missing semantic metrics.
   - Reasoning: The import of the semantic metrics is wrapped in a try-except block to handle ImportError. This is good practice so the code doesn't break when the semantic metrics are unavailable, but logging a warning in the except block would let the user know the metrics are missing.
2. pyproject.toml:3
   - Assessed confidence: 50%
   - Comment: Ensure that all the changes in this version have been properly documented in the changelog or release notes.
   - Reasoning: The version number in pyproject.toml has been updated from 0.3.3 to 0.3.4. Bumping the version is good practice, but the changes in this version should also be documented in the changelog or release notes.
3. docs/src/content/docs/index.mdx:17
   - Assessed confidence: 50%
   - Comment: Ensure that the changes in the file paths do not break any existing links in the documentation.
   - Reasoning: The PR changes the paths of several documentation files. It is important to ensure that these path changes do not break any existing links in the documentation.
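The import-guard pattern suggested in the first comment above could look like this (the imported module path and class name are assumptions for illustration):

```python
import logging

logger = logging.getLogger(__name__)

try:
    # Hypothetical optional import of the semantic metrics.
    from continuous_eval.metrics.generation.text.semantic import DebertaAnswerScores
except ImportError:
    # Instead of failing silently, tell the user what is missing.
    DebertaAnswerScores = None
    logger.warning(
        "Semantic metrics are unavailable; install the optional "
        "dependencies to enable them."
    )
```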
Workflow ID: wflow_zf9x59THGA9vrk8z
Summary:
This PR introduces a new feature for creating custom LLM-based metrics, adds new classes, updates documentation, provides an example script, and includes minor updates.
Key points:
- Adds the LLMBasedCustomMetric, ScoringFunctions, and EvaluationExample classes in /continuous_eval/metrics/generation/text/custom.py
- Adds documentation in /docs/src/content/docs/metrics/Generation/LLM-Based/custom.md
- Adds the example script /examples/llm_custom_criteria.py
- Updates the _numeric_matcher function in custom.py
Generated with ❤️ by ellipsis.dev