[BFCL] Support Multi-Model Multi-Category Generation; Add Index to Dataset; Handle vLLM Benign Error #540

Status: Merged (22 commits, Jul 24, 2024)
22 commits
* `f57da63` Branch off #537. (HuanzhiMao, Jul 21, 2024)
* `1391d2e` add index to dataset files (HuanzhiMao, Jul 22, 2024)
* `a974607` add index to possible_answer files (HuanzhiMao, Jul 22, 2024)
* `20d58fa` resolve merge conflicts with #538 (HuanzhiMao, Jul 22, 2024)
* `3c6e42e` remove outdated oss_file_formatter function (HuanzhiMao, Jul 22, 2024)
* `e2a2aa6` update handler write method and remove unused load_result method (HuanzhiMao, Jul 22, 2024)
* `e20f130` use 'id' instead of 'index' for consistency (HuanzhiMao, Jul 22, 2024)
* `6a0a4fe` update openfunctions_evaluation.py to combine test_cases (HuanzhiMao, Jul 22, 2024)
* `4a8225c` update changelog (HuanzhiMao, Jul 22, 2024)
* `ed47fd5` update glm_handler, simplify logic (HuanzhiMao, Jul 22, 2024)
* `94d12bc` update inference method function signature for all oss model handler (HuanzhiMao, Jul 22, 2024)
* `e2e8ad5` fix typo (HuanzhiMao, Jul 22, 2024)
* `9cf3f05` chore: clean up (HuanzhiMao, Jul 22, 2024)
* `e00ba23` include id in generation result (HuanzhiMao, Jul 22, 2024)
* `a9304b4` update write method to support single dict input as well (HuanzhiMao, Jul 22, 2024)
* `d7ea8d6` Merge remote-tracking branch 'upstream/main' into add-index (HuanzhiMao, Jul 22, 2024)
* `fbc76de` add ground_truth field in possible_answer file (HuanzhiMao, Jul 22, 2024)
* `7ef6d27` remove duplicate ground_truth in simple possible_answer file (HuanzhiMao, Jul 22, 2024)
* `efd9ed7` fix typo (HuanzhiMao, Jul 22, 2024)
* `2331526` add argument gpu_memory_utilization to inference method (HuanzhiMao, Jul 22, 2024)
* `c0d99a7` support multi-model and multi-categpry result generation in command line (HuanzhiMao, Jul 23, 2024)
* `37a4224` Merge remote-tracking branch 'upstream/main' into add-index (HuanzhiMao, Jul 24, 2024)
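Commit `c0d99a7` lets one command line name several models and several test categories. The following is a minimal sketch of that shape using `argparse`. The `--model` and `--test-category` flags are assumptions: the real flag names in `openfunctions_evaluation.py` are not shown on this page. In the sketch, each model gets one run that carries every selected category at once, which matches the #540 changelog note about submitting all selected categories as a single task.

```python
import argparse

# Hypothetical helpers for illustration only; not the project's actual CLI code.


def build_parser() -> argparse.ArgumentParser:
    # Assumed flag names; nargs="+" makes each flag accept one or more values.
    parser = argparse.ArgumentParser(description="Generate results for several models and categories.")
    parser.add_argument("--model", nargs="+", required=True, help="one or more model names")
    parser.add_argument("--test-category", nargs="+", default=["all"], help="one or more test categories")
    return parser


def plan_runs(models: list[str], categories: list[str]) -> list[tuple[str, list[str]]]:
    # Drop duplicate categories while keeping the order they were given in.
    unique_categories = list(dict.fromkeys(categories))
    # One run per (deduplicated) model, each carrying every selected category,
    # so a locally hosted model only needs to be loaded once per command.
    return [(model, unique_categories) for model in dict.fromkeys(models)]
```

For example, parsing `--model model-a model-b --test-category simple javascript` (hypothetical model names) and passing the result to `plan_runs` yields two runs, each covering both categories.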
berkeley-function-call-leaderboard/README.md (1 change: 1 addition, 0 deletions)
@@ -208,6 +208,7 @@ Some companies have proposed some optimization strategies in their models' handl

## Changelog

* [July 22, 2024] [#540](https://github.com/ShishirPatil/gorilla/pull/540): Chore: Improve handling of vLLM's cleanup phase error by combining all selected test categories into one single task to submit to the vLLM server.
* [July 21, 2024] [#538](https://github.com/ShishirPatil/gorilla/pull/538): Fix `language_specific_pre_processing` function to properly handle pre-processing for prompts and function docs in Java and JavaScript test categories. All entries in these categories are affected.
* [July 20, 2024] [#537](https://github.com/ShishirPatil/gorilla/pull/537): Update generation script for locally-hosted OSS model to use single-node multi-GPU inference method (tensor parallel). Ray is not used anymore.
* [July 16, 2024] [#525](https://github.com/ShishirPatil/gorilla/pull/525), [#536](https://github.com/ShishirPatil/gorilla/pull/536): Add new model `ibm-granite/granite-20b-functioncalling` to the leaderboard.
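The #540 entry combines every selected test category into one task for the vLLM server. Results then come back as a single batch and must be routed to per-category outputs again. The `id` field this PR adds allows that without relying on result order. The following is a minimal sketch under two assumptions. First, ids have the form `<category>_<index>`, as the `javascript_N` ids in this PR's possible_answer diff do. Second, `combine_categories` and `split_by_category` are hypothetical helper names. No vLLM call is made here.

```python
from collections import defaultdict
from typing import Union

# Hypothetical helpers for illustration only; not the project's actual code.


def combine_categories(test_sets: dict[str, list[dict]]) -> list[dict]:
    # Flatten all selected categories into one list so the backend sees a single job.
    return [entry for entries in test_sets.values() for entry in entries]


def split_by_category(results: Union[dict, list[dict]]) -> dict[str, list[dict]]:
    # Accept a single result dict as well as a list of them.
    if isinstance(results, dict):
        results = [results]
    grouped = defaultdict(list)
    for result in results:
        # Assumed id format "<category>_<index>"; rsplit keeps underscores inside the category name.
        category, _ = result["id"].rsplit("_", 1)
        grouped[category].append(result)
    for items in grouped.values():
        # Restore dataset order by numeric index, whatever order results arrived in.
        items.sort(key=lambda r: int(r["id"].rsplit("_", 1)[1]))
    return dict(grouped)
```

Results that arrive in reverse order (say `javascript_0`, `simple_1`, `simple_0`) still come out grouped by category, with `simple_0` ahead of `simple_1`.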

Large diffs are not rendered by default.


@@ -1,50 +1,50 @@
Removed lines (old format, 50 lines):
{"validateUserInput":{"inputField":["userInputField"],"isComplete":[true]}}
{"getActiveDataEntries":{"listElement":["listElement"],"attribute":["data-active", ""],"value":[true,""]}}
{"extractLastTransactionId":{"filepath":["/var/log/db.log"],"status":[["completed","failed"]],"encoding":["utf-8"],"processFunction":["processFunction"]}}
{"submitAtCoordinate":{"action":["submit"],"formId":["loginForm"],"coordinates":[[0.6,0.3]]}}
{"emailFormatValidator":{"email":["example@domain.com"],"domain":["domain.com"]}}
{"manageReactState":{"store":[{"initialState":["initialStateObject"],"reducers":["reducersMap"],"middlewares":[["loggerMiddleware"]],"enhancers":[["applyMiddleware('myMiddleWare')"]]}],"context":["React.createContext()"],"hooks":[{"useStateSelector":"useStateSelectorHook","useDispatchAction":"useDispatchActionHook"}]}}
{"mapTransitions":{"category":["transition"],"limit":[4.0]}}
{"getNextKeyValues":{"ctx":["dataAnalysisContext"],"currentKey":["userId"]}}
{"doesEmailInputExist":{"formElem":["emailForm"],"inputName":["emailAddress"]}}
{"validateApiResponse":{"jsonPayload":["responseData"],"keyToCheck":["expectedKey"],"processingCallback":["processKeyFunction"]}}
{"fetchSalesDepartmentRecords":{"databaseName":["employeeRecords"],"queryFunction":["getSales"]}}
{"prioritizeAndSort":{"items":["myItemList"],"priorityStatus":["urgent"],"ascending":[true]}}
{"performDataFetch":{"apiEndpoint":["https://api.example.com/data"],"requestConfig":[{"method":["GET"]}],"expectedResponse":[{"key":["value"]}],"handleErrors":[true]}}
{"DynamicChartGenerator":{"userData":[["userDataArray"]],"scalingFactor":[3.0],"dashboard":["dashboardElement"],"options":["", {}]}}
{"chartDataAccessorFactory":{"chart":[{"nm":["BarChart"],"mn":["chartModule"]}],"library":["visualizationLibrary"],"configObject":["config"]}}
{"ChartSeriesGenerator":{"labels":["axisLabelsArray"],"data":["dataPointsArray"],"color":["defaultColor"],"chartLayout":["chartLayoutObject"]}}
{"rotateVertices":{"vertices":[[10.0,15.0],[20.0,25.0]],"pivot":[[12.0,17.0]],"angle":[30.0]}}
{"generateNotificationHandler":{"app":["app"],"priorityLevel":[3],"messagingService":["messagingSvc"],"notificationType":[2]}}
{"calculateFinalVelocity":{"time":[5.0],"gravity":[9.81],"initialVelocity":[0.0]}}
{"configureShaderMaterial":{"property":["materialProps"],"textures":["textureList"],"object3D":["meshObject"]}}
{"buttonAddClickHandler":{"element":["myButton"],"callback":["handleButtonClick"],"options":[{"stopPropagation":[true]}]}}
{"findProductById":{"products":[["Product A","Product B","Product C"]],"id":[123]}}
{"resetStateProperty":{"stateProperty":["userSession"]}}
{"createAuthToken":{"username":["johndoe"],"validity":[3600],"options":[{"issuer":["myapp.net"],"role":["admin"],"algorithm":["HS256"]}]}}
{"getUniqueSorted":{"array":[[3,1,2,1,4,3]]}}
{"trackSubmitWithValidation":{"obj":["formHandler"],"validationFlags":[["isRequired","isValidEmail"]]}}
{"contentUpdater":{"elementID":["contentBox"],"newContent":["Hello World"],"action":["update"]}}
{"validateReactProp":{"obj":["serviceProvider"],"componentName":["UserProfile"]}}
{"filterBooksByAuthor":{"library":[["bookA","bookB","bookC"]],"author":["J.K. Rowling"]}}
{"EventScheduler":{"events":[{"setupStage": ["setupStageFunction"],"cleanupStage": ["cleanupStageFunction"]}],"concurrencyLimit":[3.0]}}
{"setText":{"newText":["Hello, World!"],"start":[5.0],"length":[7.0]}}
{"transformAllDecoratorsOfDeclaration":{"node":["myNode"],"container":["myContainer"]}}
{"pollQueue":{"queue":["fileWatchQueue"],"pollingInterval":[500.0],"pollIndex":[0.0],"chunkSize":[10.0]}}
{"emitNewLineBeforeLeadingComments":{"lineMap":["tsLineMap"],"writer":["tsWriter"],"node":[42]}}
{"forEachType":{"type":["unionTypeObj"],"f":["processType"]}}
{"areDeclarationFlagsIdentical":{"left":["parameterObjects"],"right":["variableDeclarationObject"]}}
{"updateBreak":{"node":["breakNode"],"label":["loopEnd"]}}
{"addInitializedPropertyStatements":{"statements":["shapeStatements"],"property":[["width","height"],["height","width"]],"receiver":["shape"]}}
{"getDirectoryToWatchFromFailedLookupLocationDirectory":{"dir":["/projects/myApp/node_modules/react"],"dirPath":["/projects/myApp/node_modules/react"]}}
{"maybeAddJsSyntheticRestParameter":{"declaration":["funcDeclaration"],"parameters":["funcParameters"]}}
{"assignOwnDefaults":{"objectValue":[12.0],"sourceValue":[10.0],"key":["maxItems"],"object":[{}]}}
{"queue":{"worker":["myWorkerFunction"],"concurrency":[5.0],"payload":["", 0.0]}}
{"B":{"t":[5.0]}}
{"invokeCallback":{"callback":["processResult"],"error":["null"],"value":["Operation successful"]}}
{"skipThrough":{"node":["currentNode"],"st":["nodeState"],"c":["processNode"]}}
{"Sde":{"t":["https://github.com/yarnpkg/berry"],"e":[{"startingCwd":["/home/user/projects"]}]}}
{"vOe":{"r":["packageInfo"],"e":["version"],"t":["1.2.3"]}}
{"sTe":{"r":["2023-04-01"],"e":["2023-04-15"],"t":["days"]}}
{"updateDOMListeners":{"oldVnode":["oldVirtualNode"],"vnode":["newVirtualNode"]}}
{"convertEnumeratedValue":{"key":["contenteditable"],"value":["plaintext-only"]}}
Added lines (new format with `id` and `ground_truth` fields, 50 lines):
{"id": "javascript_0", "ground_truth": {"validateUserInput": {"inputField": ["userInputField"], "isComplete": [true]}}}
{"id": "javascript_1", "ground_truth": {"getActiveDataEntries": {"listElement": ["listElement"], "attribute": ["data-active", ""], "value": [true, ""]}}}
{"id": "javascript_2", "ground_truth": {"extractLastTransactionId": {"filepath": ["/var/log/db.log"], "status": [["completed", "failed"]], "encoding": ["utf-8"], "processFunction": ["processFunction"]}}}
{"id": "javascript_3", "ground_truth": {"submitAtCoordinate": {"action": ["submit"], "formId": ["loginForm"], "coordinates": [[0.6, 0.3]]}}}
{"id": "javascript_4", "ground_truth": {"emailFormatValidator": {"email": ["example@domain.com"], "domain": ["domain.com"]}}}
{"id": "javascript_5", "ground_truth": {"manageReactState": {"store": [{"initialState": ["initialStateObject"], "reducers": ["reducersMap"], "middlewares": [["loggerMiddleware"]], "enhancers": [["applyMiddleware('myMiddleWare')"]]}], "context": ["React.createContext()"], "hooks": [{"useStateSelector": "useStateSelectorHook", "useDispatchAction": "useDispatchActionHook"}]}}}
{"id": "javascript_6", "ground_truth": {"mapTransitions": {"category": ["transition"], "limit": [4.0]}}}
{"id": "javascript_7", "ground_truth": {"getNextKeyValues": {"ctx": ["dataAnalysisContext"], "currentKey": ["userId"]}}}
{"id": "javascript_8", "ground_truth": {"doesEmailInputExist": {"formElem": ["emailForm"], "inputName": ["emailAddress"]}}}
{"id": "javascript_9", "ground_truth": {"validateApiResponse": {"jsonPayload": ["responseData"], "keyToCheck": ["expectedKey"], "processingCallback": ["processKeyFunction"]}}}
{"id": "javascript_10", "ground_truth": {"fetchSalesDepartmentRecords": {"databaseName": ["employeeRecords"], "queryFunction": ["getSales"]}}}
{"id": "javascript_11", "ground_truth": {"prioritizeAndSort": {"items": ["myItemList"], "priorityStatus": ["urgent"], "ascending": [true]}}}
{"id": "javascript_12", "ground_truth": {"performDataFetch": {"apiEndpoint": ["https://api.example.com/data"], "requestConfig": [{"method": ["GET"]}], "expectedResponse": [{"key": ["value"]}], "handleErrors": [true]}}}
{"id": "javascript_13", "ground_truth": {"DynamicChartGenerator": {"userData": [["userDataArray"]], "scalingFactor": [3.0], "dashboard": ["dashboardElement"], "options": ["", {}]}}}
{"id": "javascript_14", "ground_truth": {"chartDataAccessorFactory": {"chart": [{"nm": ["BarChart"], "mn": ["chartModule"]}], "library": ["visualizationLibrary"], "configObject": ["config"]}}}
{"id": "javascript_15", "ground_truth": {"ChartSeriesGenerator": {"labels": ["axisLabelsArray"], "data": ["dataPointsArray"], "color": ["defaultColor"], "chartLayout": ["chartLayoutObject"]}}}
{"id": "javascript_16", "ground_truth": {"rotateVertices": {"vertices": [[10.0, 15.0], [20.0, 25.0]], "pivot": [[12.0, 17.0]], "angle": [30.0]}}}
{"id": "javascript_17", "ground_truth": {"generateNotificationHandler": {"app": ["app"], "priorityLevel": [3], "messagingService": ["messagingSvc"], "notificationType": [2]}}}
{"id": "javascript_18", "ground_truth": {"calculateFinalVelocity": {"time": [5.0], "gravity": [9.81], "initialVelocity": [0.0]}}}
{"id": "javascript_19", "ground_truth": {"configureShaderMaterial": {"property": ["materialProps"], "textures": ["textureList"], "object3D": ["meshObject"]}}}
{"id": "javascript_20", "ground_truth": {"buttonAddClickHandler": {"element": ["myButton"], "callback": ["handleButtonClick"], "options": [{"stopPropagation": [true]}]}}}
{"id": "javascript_21", "ground_truth": {"findProductById": {"products": [["Product A", "Product B", "Product C"]], "id": [123]}}}
{"id": "javascript_22", "ground_truth": {"resetStateProperty": {"stateProperty": ["userSession"]}}}
{"id": "javascript_23", "ground_truth": {"createAuthToken": {"username": ["johndoe"], "validity": [3600], "options": [{"issuer": ["myapp.net"], "role": ["admin"], "algorithm": ["HS256"]}]}}}
{"id": "javascript_24", "ground_truth": {"getUniqueSorted": {"array": [[3, 1, 2, 1, 4, 3]]}}}
{"id": "javascript_25", "ground_truth": {"trackSubmitWithValidation": {"obj": ["formHandler"], "validationFlags": [["isRequired", "isValidEmail"]]}}}
{"id": "javascript_26", "ground_truth": {"contentUpdater": {"elementID": ["contentBox"], "newContent": ["Hello World"], "action": ["update"]}}}
{"id": "javascript_27", "ground_truth": {"validateReactProp": {"obj": ["serviceProvider"], "componentName": ["UserProfile"]}}}
{"id": "javascript_28", "ground_truth": {"filterBooksByAuthor": {"library": [["bookA", "bookB", "bookC"]], "author": ["J.K. Rowling"]}}}
{"id": "javascript_29", "ground_truth": {"EventScheduler": {"events": [{"setupStage": ["setupStageFunction"], "cleanupStage": ["cleanupStageFunction"]}], "concurrencyLimit": [3.0]}}}
{"id": "javascript_30", "ground_truth": {"setText": {"newText": ["Hello, World!"], "start": [5.0], "length": [7.0]}}}
{"id": "javascript_31", "ground_truth": {"transformAllDecoratorsOfDeclaration": {"node": ["myNode"], "container": ["myContainer"]}}}
{"id": "javascript_32", "ground_truth": {"pollQueue": {"queue": ["fileWatchQueue"], "pollingInterval": [500.0], "pollIndex": [0.0], "chunkSize": [10.0]}}}
{"id": "javascript_33", "ground_truth": {"emitNewLineBeforeLeadingComments": {"lineMap": ["tsLineMap"], "writer": ["tsWriter"], "node": [42]}}}
{"id": "javascript_34", "ground_truth": {"forEachType": {"type": ["unionTypeObj"], "f": ["processType"]}}}
{"id": "javascript_35", "ground_truth": {"areDeclarationFlagsIdentical": {"left": ["parameterObjects"], "right": ["variableDeclarationObject"]}}}
{"id": "javascript_36", "ground_truth": {"updateBreak": {"node": ["breakNode"], "label": ["loopEnd"]}}}
{"id": "javascript_37", "ground_truth": {"addInitializedPropertyStatements": {"statements": ["shapeStatements"], "property": [["width", "height"], ["height", "width"]], "receiver": ["shape"]}}}
{"id": "javascript_38", "ground_truth": {"getDirectoryToWatchFromFailedLookupLocationDirectory": {"dir": ["/projects/myApp/node_modules/react"], "dirPath": ["/projects/myApp/node_modules/react"]}}}
{"id": "javascript_39", "ground_truth": {"maybeAddJsSyntheticRestParameter": {"declaration": ["funcDeclaration"], "parameters": ["funcParameters"]}}}
{"id": "javascript_40", "ground_truth": {"assignOwnDefaults": {"objectValue": [12.0], "sourceValue": [10.0], "key": ["maxItems"], "object": [{}]}}}
{"id": "javascript_41", "ground_truth": {"queue": {"worker": ["myWorkerFunction"], "concurrency": [5.0], "payload": ["", 0.0]}}}
{"id": "javascript_42", "ground_truth": {"B": {"t": [5.0]}}}
{"id": "javascript_43", "ground_truth": {"invokeCallback": {"callback": ["processResult"], "error": ["null"], "value": ["Operation successful"]}}}
{"id": "javascript_44", "ground_truth": {"skipThrough": {"node": ["currentNode"], "st": ["nodeState"], "c": ["processNode"]}}}
{"id": "javascript_45", "ground_truth": {"Sde": {"t": ["https://github.com/yarnpkg/berry"], "e": [{"startingCwd": ["/home/user/projects"]}]}}}
{"id": "javascript_46", "ground_truth": {"vOe": {"r": ["packageInfo"], "e": ["version"], "t": ["1.2.3"]}}}
{"id": "javascript_47", "ground_truth": {"sTe": {"r": ["2023-04-01"], "e": ["2023-04-15"], "t": ["days"]}}}
{"id": "javascript_48", "ground_truth": {"updateDOMListeners": {"oldVnode": ["oldVirtualNode"], "vnode": ["newVirtualNode"]}}}
{"id": "javascript_49", "ground_truth": {"convertEnumeratedValue": {"key": ["contenteditable"], "value": ["plaintext-only"]}}}
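The hunk above rewrites every line of this possible_answer file. The bare `{function_name: {parameter: [acceptable values]}}` object now sits under a `ground_truth` key, next to an `id` of the form `javascript_<line index>`. The following is a minimal sketch of that transformation. It assumes the id is the category name plus the zero-based line position; any script actually used for this PR is not shown. `add_ids` and `load_by_id` are hypothetical names. Python's default `json.dumps` separators match the spacing of the added lines. As a result, applying the sketch to the first two removed lines reproduces the first two added lines exactly.

```python
import json

# Hypothetical helpers for illustration only; not the project's actual code.


def add_ids(old_lines: list[str], category: str) -> list[str]:
    # Old format: one bare {function_name: {parameter: [acceptable values]}} object per line.
    # New format: that object nested under "ground_truth", plus an assumed "<category>_<index>" id.
    new_lines = []
    for index, line in enumerate(old_lines):
        record = {"id": f"{category}_{index}", "ground_truth": json.loads(line)}
        # Default separators (", " and ": ") match the spacing used by the added lines.
        new_lines.append(json.dumps(record))
    return new_lines


def load_by_id(new_lines: list[str]) -> dict[str, dict]:
    # Key each ground truth by its id so lookups no longer depend on file order.
    records = (json.loads(line) for line in new_lines)
    return {record["id"]: record["ground_truth"] for record in records}
```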