AGBenchmark: Codebase clean-up #6650

Merged 33 commits on Jan 2, 2024

Commits on Dec 28, 2023

  1. refactor(benchmark): Deduplicate configuration loading logic

    - Move the configuration loading logic to a separate `load_agbenchmark_config` function in the `agbenchmark/config.py` module.
    - Replace the duplicate loading logic in `conftest.py`, `generate_test.py`, `ReportManager.py`, `reports.py`, and `__main__.py` with calls to the `load_agbenchmark_config` function.
    Pwuts committed Dec 28, 2023
    Commit 20862ca
  2. fix(benchmark): Fix type errors, linting errors, and clean up CLI validation in __main__.py
    
    - Fixed type errors and linting errors in `__main__.py`
    - Improved the readability of CLI argument validation by introducing a separate function for it
    Pwuts committed Dec 28, 2023
    Commit c14cfd8
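
The second Dec 28 commit moves CLI argument validation into its own function. A minimal sketch of that pattern follows. The flag names and both rules are assumptions made for illustration; the commit message does not list them, and the exception class is hypothetical.

```python
from typing import Sequence


class InvalidInvocationError(ValueError):
    """Raised when CLI options are combined in an unsupported way (hypothetical)."""


def validate_args(
    maintain: bool,
    improve: bool,
    explore: bool,
    tests: Sequence[str],
    categories: Sequence[str],
) -> None:
    """Check option combinations in one place, before any benchmark work starts."""
    # Assumed rule 1: the three mode flags exclude each other.
    if sum([maintain, improve, explore]) > 1:
        raise InvalidInvocationError(
            "Use at most one of --maintain, --improve, --explore."
        )
    # Assumed rule 2: naming specific tests excludes category and mode filters.
    if tests and (categories or maintain or improve or explore):
        raise InvalidInvocationError(
            "--test cannot be combined with --category or a mode flag."
        )
```

Keeping these checks in one function lets the entrypoint call it once and fail fast, rather than interleaving validation with setup code.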

Commits on Dec 29, 2023

  1. refactor(benchmark): Lint and typefix app.py

    - Rearranged and cleaned up import statements
    - Fixed type errors caused by improper use of `psutil` objects
    - Simplified a number of `os.path` usages by converting to `pathlib`
    - Use `Task` and `TaskRequestBody` classes from `agent_protocol_client` instead of `.schema`
    Pwuts committed Dec 29, 2023
    Commit 4a32265
  2. refactor(benchmark): Replace .agent_protocol_client by `agent-protocol-client`, clean up schema.py
    
    - Remove `agbenchmark.agent_protocol_client` (an offline copy of `agent-protocol-client`).
       - Add `agent-protocol-client` as a dependency and change imports to `agent_protocol_client`.
    - Fix type annotation on `agent_api_interface.py::upload_artifacts` (`ApiClient` -> `AgentApi`).
    - Remove all unused types from schema.py (= most of them).
    Pwuts committed Dec 29, 2023
    Commit 60b9148
  3. Commit 9fb7b75
  4. Commit 14d52b8
  5. refactor(benchmark): Improve typing, response validation, and readability in app.py
    
    - Simplified response generation by leveraging type checking and conversion by FastAPI.
    - Introduced use of `HTTPException` for error responses.
    - Improved naming, formatting, and typing in `app.py::create_evaluation`.
    - Updated the docstring on `app.py::create_agent_task`.
    - Fixed return type annotations of `create_single_test` and `create_challenge` in generate_test.py.
    - Added default values to optional attributes on models in report_types_v2.py.
    - Removed unused imports in `generate_test.py`
    Pwuts committed Dec 29, 2023
    Commit 41b4972
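
The first Dec 29 commit mentions converting `os.path` usages to `pathlib`. The sketch below shows what such a conversion typically looks like. The function names and the directory layout are hypothetical and are not taken from `app.py`.

```python
import os.path
from pathlib import Path


def report_file_os_path(reports_dir: str, run_name: str) -> str:
    """Old style: string paths assembled with `os.path` helpers."""
    return os.path.join(reports_dir, run_name, "report.json")


def report_file_pathlib(reports_dir: Path, run_name: str) -> Path:
    """New style: the `/` operator joins segments on a `Path` object."""
    return reports_dir / run_name / "report.json"


def challenge_name_os_path(spec_file: str) -> str:
    """Old style: name of the folder that contains the spec file."""
    return os.path.basename(os.path.dirname(spec_file))


def challenge_name_pathlib(spec_file: Path) -> str:
    """New style: the same lookup through `Path` attributes."""
    return spec_file.parent.name
```

Both variants produce the same locations; the `pathlib` versions also return typed `Path` objects, which is what makes the type fixes in the later commits possible.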

Commits on Dec 30, 2023

  1. refactor(benchmark): Clean up logging and print statements

    - Introduced use of the `logging` library for unified logging and better readability.
    - Converted most print statements to use `logger.debug`, `logger.warning`, and `logger.error`.
    - Improved descriptiveness of log statements.
    - Removed unnecessary print statements.
    - Added log statements to unspecific and non-verbose `except` blocks.
    - Added `--debug` flag, which sets the log level to `DEBUG` and enables a more comprehensive log format.
    - Added `.utils.logging` module with `configure_logging` function to easily configure the logging library.
    - Converted raw escape sequences in `.utils.challenge` to use `colorama`.
    - Renamed `generate_test.py::generate_tests` to `load_challenges`.
    Pwuts committed Dec 30, 2023
    Commit 4064eb7
  2. refactor(benchmark): Remove unused server.py and agent_interface.py::run_agent
    
    - Remove unused server.py file
    - Remove unused run_agent function from agent_interface.py
    Pwuts committed Dec 30, 2023
    Commit 56d8d83
  3. refactor(benchmark): Clean up conftest.py

    - Fix and add type annotations
    - Rewrite docstrings
    - Disable or remove unused code
    - Fix definition of arguments and their types in `pytest_addoption`
    Pwuts committed Dec 30, 2023
    Commit 1aa1261
  4. refactor(benchmark): Clean up generate_test.py file

    - Refactored the `create_single_test` function for clarity and readability
       - Removed unused variables
       - Made creation of `Challenge` subclasses more straightforward
       - Made bare `except` more specific
    - Renamed `Challenge.setup_challenge` method to `run_challenge`
    - Updated type hints and annotations
    - Made minor code/readability improvements in `load_challenges`
    - Added a helper function `_add_challenge_to_module` for attaching a Challenge class to the current module
    Pwuts committed Dec 30, 2023
    Commit d89c7ea
  5. Commit 294f6ff
  6. refactor(benchmark): Simplify const determination in agent_interface.py

    - Simplify the logic that determines the value of `HELICONE_GRAPHQL_LOGS`
    Pwuts committed Dec 30, 2023
    Commit 1ea4123
  7. fix(benchmark): Register category markers to prevent warnings

    - Use the `pytest_configure` hook to register the known challenge categories as markers. Otherwise, Pytest will raise "unknown marker" warnings at runtime.
    Pwuts committed Dec 30, 2023
    Commit c7cf2c7
  8. Commit 1db4bdc
  9. refactor(benchmark): Update agent_api_interface.py

    - Add type annotations to `copy_agent_artifacts_into_temp_folder` function
    - Add note about broken endpoint in the `agent_protocol_client` library
    - Remove unused variable in `run_api_agent` function
    - Improve readability and resolve linting error
    Pwuts committed Dec 30, 2023
    Commit 420469e
  10. feat(benchmark): Improve and centralize pathfinding

    - Search path hierarchy for applicable `agbenchmark_config`, rather than assuming it's in the current folder.
    - Create the `agbenchmark.utils.path_manager` module, which defines `AGBenchmarkPathManager` and exports a `PATH_MANAGER` constant.
    - Replace path constants defined in __main__.py with usages of `PATH_MANAGER`.
    Pwuts committed Dec 30, 2023
    Commit c3f2162
  11. feat(benchmark/cli): Clean up and improve CLI

    - Updated commands, options, and their descriptions to be more intuitive and consistent
    - Moved slow imports into the entrypoints that use them to speed up application startup
    - Fixed type hints to match output types of Click options
    - Hid deprecated `agbenchmark start` command
    - Refactored code to improve readability and maintainability
    - Moved main entrypoint into `run` subcommand
    - Fixed `version` and `serve` subcommands
    - Added `click-default-group` package to allow using `run` implicitly (for backwards compatibility)
    - Renamed `--no_dep` to `--no-dep` for consistency
    - Fixed string formatting issues in log statements
    Pwuts committed Dec 30, 2023
    Commit fab5366
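
The logging clean-up commit (4064eb7) adds a `configure_logging` function and a `--debug` flag that switches to the `DEBUG` level and a more comprehensive format. A minimal sketch of such a helper, using only the standard library; the `debug` parameter and both format strings are assumptions, not the contents of `.utils.logging`.

```python
import logging

# Assumed formats; the debug one adds the logger name and line number.
SIMPLE_FORMAT = "%(asctime)s %(levelname)s  %(message)s"
DEBUG_FORMAT = "%(asctime)s %(levelname)s %(name)s:%(lineno)d  %(message)s"


def configure_logging(debug: bool = False) -> None:
    """Configure the root logger once for the whole application."""
    level = logging.DEBUG if debug else logging.INFO
    log_format = DEBUG_FORMAT if debug else SIMPLE_FORMAT
    # force=True (Python 3.8+) replaces handlers left over from earlier calls.
    logging.basicConfig(level=level, format=log_format, force=True)


# Modules then log through a named logger instead of calling print().
logger = logging.getLogger(__name__)
```

With this in place, a CLI entrypoint would call `configure_logging(debug)` once at startup, and every `logger.debug(...)` call elsewhere is shown or hidden by that single switch.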

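Commit c7cf2c7 registers the challenge categories as markers in the `pytest_configure` hook. The sketch below shows the general shape of that hook. The category names are placeholders; `config.addinivalue_line("markers", ...)` is the standard Pytest call for registering a marker at runtime.

```python
# conftest.py (sketch)

# Placeholder category names; the real list comes from the challenge specs.
CHALLENGE_CATEGORIES = ["coding", "data", "general"]


def pytest_configure(config) -> None:
    """Register every known category so Pytest does not warn about unknown markers."""
    for category in CHALLENGE_CATEGORIES:
        config.addinivalue_line("markers", f"{category}: challenge category")
```

After registration, tests decorated with `@pytest.mark.coding` can be selected with `-m coding` without triggering an unknown-marker warning.
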
Commits on Jan 1, 2024

  1. refactor(benchmark/config): Move AgentBenchmarkConfig and related functions to config.py
    
    - Move the `AgentBenchmarkConfig` class from `utils/data_types.py` to `config.py`.
    - Extract the `calculate_info_test_path` function from `utils/data_types.py` and move it to `config.py` as a private helper function `_calculate_info_test_path`.
    - Move `load_agent_benchmark_config()` to `AgentBenchmarkConfig.load()`.
    - Changed simple getter methods on `AgentBenchmarkConfig` to calculated properties.
    - Update all code references according to the changes mentioned above.
    Pwuts committed Jan 1, 2024
    Commit 956b439
  2. refactor(benchmark): Fix ReportManager init parameter types and use pathlib
    
    - Fix the type annotation of the `benchmark_start_time` parameter in `ReportManager.__init__`, which was mistyped as `str` instead of `datetime`.
    - Change the type of the `filename` parameter in the `ReportManager.__init__` method from `str` to `Path`.
    - Rename `self.filename` to `self.report_file` in `ReportManager`.
    - Change the way the report file is created, opened and saved to use the `Path` object.
    Pwuts committed Jan 1, 2024
    Commit 292ea9e
  3. refactor(benchmark): Improve typing surrounding ChallengeData and clean up its implementation
    
    - Use `ChallengeData` objects instead of untyped `dict` in app.py, generate_test.py, reports.py.
    - Remove unnecessary methods `serialize`, `get_data`, `get_json_from_path`, `deserialize` from `ChallengeData` class.
    - Remove unused methods `challenge_from_datum` and `challenge_from_test_data` from `ChallengeData` class.
    - Update function signatures and annotations of `create_challenge` and `generate_single_test` functions in generate_test.py.
    - Add types to function signatures of `generate_single_call_report` and `finalize_reports` in reports.py.
    - Remove unnecessary `challenge_data` parameter (in generate_test.py) and fixture (in conftest.py).
    Pwuts committed Jan 1, 2024
    Commit 8990b23
  4. refactor(benchmark): Clean up generate_test.py, conftest.py and __main__.py
    
    - Cleaned up generate_test.py and conftest.py
       - Consolidated challenge creation logic in the `Challenge` class itself, most notably the new `Challenge.from_challenge_spec` method.
       - Moved challenge selection logic from generate_test.py to the `pytest_collection_modifyitems` hook in conftest.py.
    - Converted methods in the `Challenge` class to class methods where appropriate.
    - Improved argument handling in the `run_benchmark` function in `__main__.py`.
    Pwuts committed Jan 1, 2024
    Commit 3ccb093
  5. refactor(benchmark/config): Merge AGBenchmarkPathManager into AgentBenchmarkConfig and reduce fragmented/global state
    
    - Merge the functionality of `AGBenchmarkPathManager` into `AgentBenchmarkConfig` to consolidate the configuration management.
    - Remove the `.path_manager` module containing `AGBenchmarkPathManager`.
    - Pass the `AgentBenchmarkConfig` and its attributes through function arguments to reduce global state and improve code clarity.
    Pwuts committed Jan 1, 2024
    Commit 6fe5149
  6. feat(benchmark/serve): Configurable port for serve subcommand

    - Added `--port` option to `serve` subcommand to allow for specifying the port to run the API on.
    - If no `--port` option is provided, the port will default to the value specified in the `PORT` environment variable, or 8080 if not set.
    Pwuts committed Jan 1, 2024
    Commit e09ec4e
  7. feat(benchmark/cli): Add config subcommand

    - Added a new subcommand `config` to the AGBenchmark CLI, to display information about the present AGBenchmark config.
    Pwuts committed Jan 1, 2024
    Commit 116f8c9
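
Several Jan 1 commits converge on one idea: `AgentBenchmarkConfig` locates its own `agbenchmark_config` folder by searching up the path hierarchy, is built through `AgentBenchmarkConfig.load()`, and exposes derived paths as calculated properties. The sketch below illustrates that shape with a plain dataclass so it has no dependencies. The `config.json` file name, the `host` field, the `find_config_folder` helper, and the `reports_folder` property are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class AgentBenchmarkConfig:
    """Dependency-free stand-in for the real config class."""

    agbenchmark_config_dir: Path
    host: str  # hypothetical field

    @classmethod
    def find_config_folder(cls, start: Optional[Path] = None) -> Path:
        """Walk up from `start` until an `agbenchmark_config` folder is found."""
        start = (start or Path.cwd()).resolve()
        for folder in [start, *start.parents]:
            candidate = folder / "agbenchmark_config"
            if (candidate / "config.json").is_file():
                return candidate
        raise FileNotFoundError(
            f"No 'agbenchmark_config' folder found in {start} or its parents"
        )

    @classmethod
    def load(cls, config_dir: Optional[Path] = None) -> "AgentBenchmarkConfig":
        """Build the config from disk; search for the folder if none is given."""
        config_dir = config_dir or cls.find_config_folder()
        with (config_dir / "config.json").open() as f:
            data = json.load(f)
        return cls(agbenchmark_config_dir=config_dir, host=data["host"])

    @property
    def reports_folder(self) -> Path:
        """Calculated property in place of a getter method."""
        return self.agbenchmark_config_dir / "reports"
```

Because every path is derived from one object, functions can take the config as an argument instead of importing module-level constants, which is the reduction in global state the last refactor describes.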

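Commit e09ec4e describes a three-step fallback for the `serve` port: the `--port` option, then the `PORT` environment variable, then 8080. That precedence can be sketched as a small pure function. `resolve_port` and its parameters are hypothetical names, and the real CLI may delegate this to its option parser instead.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 8080


def resolve_port(
    cli_port: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Return the port to serve on: CLI value, else $PORT, else the default."""
    if cli_port is not None:
        return cli_port
    env = os.environ if env is None else env
    # Environment values are strings, so convert before returning.
    return int(env.get("PORT", DEFAULT_PORT))
```

Passing `env` explicitly keeps the function easy to test without touching the process environment.
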
Commits on Jan 2, 2024

  1. fix(benchmark): Gracefully handle incompatible challenge spec files in app.py
    
    - Added a check to skip deprecated challenges
    - Added logging to allow debugging of the loading process
    - Added handling of validation errors when parsing challenge spec files
    - Added missing `spec_file` attribute to `ChallengeData`
    Pwuts committed Jan 2, 2024
    Commit fb15bf9
  2. refactor(benchmark): Move run_benchmark entrypoint to main.py, use it in `/reports` endpoint
    
    - Move `run_benchmark` and `validate_args` from __main__.py to main.py
    - Replace agbenchmark subprocess in `app.py::run_single_test` with `run_benchmark`
    - Move `get_unique_categories` from __main__.py to challenges/__init__.py
    - Move `OPTIONAL_CATEGORIES` from __main__.py to challenge.py
    - Reduce operations on updates.json (including `initialize_updates_file`) outside of API
    Pwuts committed Jan 2, 2024
    Commit b786a29
  3. refactor(benchmark): Remove unused /updates endpoint and all related code
    
    - Remove `updates_json_file` attribute from `AgentBenchmarkConfig`
    - Remove `get_updates` and `_initialize_updates_file` in app.py
    - Remove `append_updates_file` and `create_update_json` functions in agent_api_interface.py
    - Remove call to `append_updates_file` in challenge.py
    Pwuts committed Jan 2, 2024
    Commit 27c5459
  4. refactor(benchmark/config): Clean up and update docstrings on `AgentBenchmarkConfig`
    
    - Add and update docstrings
    - Change base class from `BaseModel` to `BaseSettings`, allow extras for backwards compatibility
    - Make naming of path attributes on `AgentBenchmarkConfig` more consistent
    - Remove unused `agent_home_directory` attribute
    - Remove unused `workspace` attribute
    Pwuts committed Jan 2, 2024
    Commit d6195b4
  5. fix(benchmark): Restore mechanism to select (optional) categories in agent benchmark config
    Pwuts committed Jan 2, 2024
    Commit 7b92e81
  6. Commit 2b56e67
  7. Commit 25c1aae
  8. fix(benchmark): Update agent-protocol-client to v1.1.0

    - Fixes issue with fetching task artifact listings
    Pwuts committed Jan 2, 2024
    Commit 2135019
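
The first Jan 2 commit (fb15bf9) makes spec loading tolerant: deprecated challenges are skipped, parse failures are logged instead of crashing, and each `ChallengeData` records its `spec_file`. A rough sketch of that loop follows. The dataclass is a stand-in for the real model, the `name` and `task` fields and the "deprecated" folder convention are assumptions, and `load_challenge_specs` is a hypothetical name.

```python
import json
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import List

logger = logging.getLogger(__name__)


@dataclass
class ChallengeData:
    """Minimal stand-in; the real model has more fields."""

    name: str
    task: str
    spec_file: Path


def load_challenge_specs(spec_files: List[Path]) -> List[ChallengeData]:
    """Load every usable spec file and skip the ones that cannot be parsed."""
    challenges: List[ChallengeData] = []
    for spec_file in spec_files:
        # Assumption: deprecated challenges live under a "deprecated" folder.
        if "deprecated" in spec_file.parts:
            logger.debug("Skipping deprecated challenge spec %s", spec_file)
            continue
        try:
            raw = json.loads(spec_file.read_text())
            challenge = ChallengeData(
                name=raw["name"], task=raw["task"], spec_file=spec_file
            )
        except (json.JSONDecodeError, KeyError, TypeError) as e:
            # One bad file should not prevent the others from loading.
            logger.warning("Skipping incompatible spec %s: %r", spec_file, e)
            continue
        logger.debug("Loaded challenge '%s' from %s", challenge.name, spec_file)
        challenges.append(challenge)
    return challenges
```

A model library with built-in validation would raise its own validation error in place of the `KeyError` and `TypeError` handled here; the skip-and-log structure stays the same.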