[BFCL] Support Multi-Model Multi-Category Generation; Add Index to Dataset; Handle vLLM Benign Error #540

Merged: 22 commits merged into ShishirPatil:main from the add-index branch on Jul 24, 2024

Conversation

@HuanzhiMao (Collaborator) commented on Jul 22, 2024

In this PR:

  1. Support Multi-Model Multi-Category Generation:

    • The openfunctions_evaluation.py script can now take a list of model names and a list of test categories as command-line input (see the sketch below).
    • Partially addresses #501.

  2. Handling vLLM's Error:

    • A benign error can occur during the cleanup phase after a generation task completes, causing the pipeline to fail even though the model results were generated. This issue stems from vLLM and is outside our control; see vllm-project/vllm#6145 in the vLLM repo.
    • This is annoying because when users attempt category-specific generation for locally-hosted models (as supported in [BFCL] Support Category-Specific Generation for OSS Model, Remove eval_data_compilation Step #512), only the first category result for the first model is generated, since the error occurs immediately afterwards.
    • To improve the user experience, we now combine all selected test categories into one task, submit that single task to vLLM, and split the results afterwards (see the sketch after this list).
    • Note: If multiple locally-hosted models are queued for inference, only the tasks of the first model will complete; subsequent tasks will still fail due to the first model's cleanup-phase error. We therefore recommend running the inference command for one model at a time until vLLM rolls out the fix.
  3. Adding Index to Dataset:

    • Each test file and possible_answer file now includes an index to help match entries (see the index-matching sketch below).

This PR will not affect the leaderboard score.
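
For item 1, here is a rough sketch of what the list-valued command-line interface could look like. The flag names (`--model`, `--test-category`) and the loop structure are assumptions for illustration, not the exact implementation:

```python
# Hypothetical sketch of list-valued CLI arguments for
# openfunctions_evaluation.py (flag names assumed, not confirmed).
import argparse

parser = argparse.ArgumentParser(description="BFCL result generation")
parser.add_argument("--model", nargs="+", required=True,
                    help="One or more model names to run inference with.")
parser.add_argument("--test-category", nargs="+", default=["all"],
                    help="One or more test categories to generate results for.")
args = parser.parse_args()

for model in args.model:                 # each requested model
    for category in args.test_category:  # each requested category
        print(f"Generating '{category}' results for {model}")
```

An invocation would then look something like `python openfunctions_evaluation.py --model model-a model-b --test-category simple multiple` (again, the names are illustrative).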
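
For item 2, a minimal sketch of the combine-then-split workaround, assuming a handler method like the `_batch_generate` mentioned in the commit log below (its signature and the entry schema are assumptions):

```python
# Sketch of the single-task workaround: tag every entry with its category,
# submit ONE combined generation task to vLLM, then split the results.
# `handler._batch_generate` and the entry dicts are hypothetical here.

def generate_all_categories(handler, datasets_by_category):
    combined = []
    for category, entries in datasets_by_category.items():
        for entry in entries:
            combined.append({**entry, "test_category": category})

    # A single submission means the benign cleanup-phase error, which fires
    # only after a task finishes, can no longer truncate later categories.
    results = handler._batch_generate(combined)

    # Assuming results come back in input order, zip them with the tagged
    # inputs and bucket them back into per-category lists.
    split = {category: [] for category in datasets_by_category}
    for entry, result in zip(combined, results):
        split[entry["test_category"]].append(result)
    return split
```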
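
For item 3, a small sketch of how the new per-entry index lets a test file be joined with its possible_answer file. The `id` field name, JSON-lines layout, and file paths are assumptions:

```python
# Hypothetical sketch: match test entries to possible answers by the new
# index instead of relying on line order. The "id" field name is assumed.
import json

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

tests = {e["id"]: e for e in load_jsonl("test_file.json")}          # test entries
answers = {e["id"]: e for e in load_jsonl("possible_answer.json")}  # ground truth

# Pair each test entry with its possible answer via the shared index.
pairs = [(tests[i], answers[i]) for i in tests if i in answers]
```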

Squashed commit of the following:

commit e65a108
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 21:50:26 2024 -0700

    update README

commit 8034aed
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 17:44:50 2024 -0700

    refactor glm_handler to simplify logic and apply fix

commit 83912f0
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 17:31:33 2024 -0700

    polish process_input section

commit 7d08daf
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 15:46:06 2024 -0700

    simplify _batch_generate logic; separate out process_input section

commit c5ac395
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 15:27:42 2024 -0700

    remove outdated gemma model name

commit b59af2c
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 14:32:23 2024 -0700

    revert, as vllm still requires ray

commit 7a275d7
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 14:27:44 2024 -0700

    remove ray from requirements.txt

commit 0d1c478
Merge: 32c1ad4 7b230df
Author: Huanzhi (Hans) Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 00:01:25 2024 -0700

    Merge branch 'main' into main

commit 32c1ad4
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Fri Jul 19 23:36:42 2024 -0700

    remove ray; use vllm tensor_parallel_size

commit 5ff790e
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Fri Jul 19 21:21:08 2024 -0700

    remove torch inference_mode
@HuanzhiMao changed the title from "[BFCL] Add Index to Datase; Handle vLLM Benign Error" to "[BFCL] Add Index to Dataset; Handle vLLM Benign Error" on Jul 22, 2024
Squashed commit of the following:

commit 407d443
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sun Jul 21 00:35:32 2024 -0700

    Add missing description field for 2 entries in the javascript dataset

commit 2c0d311
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 23:37:20 2024 -0700

    update change log

commit 0d3bc83
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 23:03:43 2024 -0700

    clean up

commit 33b9fc3
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 22:44:05 2024 -0700

    remove unused parameter string_param from all model_handlers

commit ad7c337
Author: Huanzhi Mao <huanzhimao@gmail.com>
Date:   Sat Jul 20 22:43:27 2024 -0700

    fix language_specific_pre_processing
@HuanzhiMao marked this pull request as ready for review on July 22, 2024 05:30
@CharlieJCJ (Collaborator) left a comment:

LGTM

@HuanzhiMao changed the title from "[BFCL] Add Index to Dataset; Handle vLLM Benign Error" to "[BFCL] Support Multi-Model Multi-Category Generation; Add Index to Dataset; Handle vLLM Benign Error" on Jul 23, 2024
@Fanjia-Yan (Contributor) left a comment:

Tested with various setups.

@ShishirPatil merged commit 51682f7 into ShishirPatil:main on Jul 24, 2024
@HuanzhiMao deleted the add-index branch on July 24, 2024 08:00
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024
[BFCL] Support Multi-Model Multi-Category Generation; Add Index to Dataset; Handle vLLM Benign Error (ShishirPatil#540)
