This experiment framework is a utility library that uses the prompt flow SDK to conduct, track and analyze experiments.
-
Developer-Friendly Experiment Execution: Simplified APIs streamline the process of running experiments.
-
Flexible Execution Environment: Experimentation can be conducted on both local machines and on Azure Machine Learning (AML) compute, facilitating seamless switching between environments based on dataset sizes.
-
Versatile Experiment Flows: Enable the chaining of experiments, allowing easy passing of outputs from one experiment to another.
-
Efficient Experiment Tracking: Unique identifiers and tags help monitor and differentiate experiments, aiding in efficient tracking and management.
-
Variants and Connected Runs: Simplified APIs enable the creation of experiment runs with multiple variants and connected runs in a single step, this in turn will create multiple runs using PromptFlow automatically.
-
Output Management: Provides utility functions to retrieve experiment outputs in various formats (CSV or JSONL) and merge outputs from multiple runs for streamlined analysis.
-
Custom Python Tool for GPT4 with Vision or GPT4o model: Offers a custom Python tool that allows to make GPT4 vision/GPT4o with
detail
parameter to control the resolution. Read more on the implementation here.
Please follow this link for the dev setup details.
- docs : Contains the documentation.
- common: Contains common python files required for experiment execution
- keyword_correctness: Contains experiment and evaluation flow for keyword correctness use case.
Along with the framework, there is a sample use case for keyword correctness is implemented here with a two step experiment flows and also has an evaluation flow.
Experiment architecture - Read more on the details of the experiment architecture
Guidelines to execute the experiments - Follow this link to understand the experiment execution
Understand the metrics - The sample evaluation flow evaluates the results, here is the breakdown of the metrics evaluated in the evaluation flow