


FlagEval

FlagEval, launched by BAAI (Beijing Academy of Artificial Intelligence) in 2023, is a comprehensive evaluation system for large models that covers more than 800 open-source and closed-source models from around the world. It assesses more than 40 capability dimensions, including reasoning, mathematics, and task solving, and spans five major tasks and four categories of metrics.


Recent Developments

In 2024, FlagEval added two platforms for model-versus-model competition: the Colosseum and the Debate Arena. Both pit models directly against each other, creating a competitive setting that encourages continuous improvement.



Popular repositories

  1. FlagEval (Public)

    FlagEval is an evaluation toolkit for large AI foundation models.

    Python, 311 stars, 27 forks

  2. FlagEvalMM (Public)

    A Flexible Framework for Comprehensive Multimodal Model Evaluation

    Python, 54 stars, 4 forks

  3. CMMU (Public)

    [IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

    Python, 23 stars

  4. HalluDial (Public)

    Python, 15 stars, 1 fork

  5. FlagEval_Report (Public)

    CSS

  6. .github (Public)

Repositories

All 6 repositories in the flageval-baai organization are public. Listed from most to least recently updated:
  • FlagEvalMM: Python; 54 stars, 4 forks, 0 open issues, 0 open pull requests; updated Dec 19, 2024
  • .github: 0 stars, 0 forks, 0 open issues, 0 open pull requests; updated Nov 8, 2024
  • HalluDial: Python; 15 stars, 1 fork, 1 open issue, 0 open pull requests; updated Aug 19, 2024
  • FlagEval_Report: CSS; 0 stars, 0 forks, 0 open issues, 0 open pull requests; updated Jul 18, 2024
  • FlagEval: Python; Apache-2.0 license; 311 stars, 27 forks, 4 open issues, 2 open pull requests; updated Jul 13, 2024
  • CMMU: Python; 23 stars, 0 forks, 0 open issues, 0 open pull requests; updated Feb 1, 2024

