
can-ai-code: Self-evaluating interview for AI coders #491

Open
irthomasthomas opened this issue Jan 31, 2024 · 0 comments
Labels

  • github: gh tools like cli, Actions, Issues, Pages
  • llm-applications: Topics related to practical applications of Large Language Models in various fields
  • llm-evaluation: Evaluating Large Language Models performance and behavior through human-written evaluation sets
  • New-Label: Choose this option if the existing labels are insufficient to describe the content accurately
  • openai: OpenAI APIs, LLMs, Recipes and Evals
  • source-code: Code snippets

Comments

@irthomasthomas (Owner) commented:
Title: the-crypt-keeper/can-ai-code: Self-evaluating interview for AI coders

A self-evaluating interview for AI coding models, written by humans and taken by AI.

Key Ideas

  • Interview questions written by humans, test taken by AI
  • Inference scripts for all common API providers and CUDA-enabled quantization runtimes
  • Sandbox environment (Docker-based) for untrusted Python and NodeJS code validation (see the sketch after this list)
  • Evaluate effects of prompting techniques and sampling parameters on LLM coding performance
  • Evaluate LLM coding performance degradation due to quantization
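
The sandbox bullet above is easy to picture in code. Below is a minimal sketch of a Docker-based runner for untrusted snippets, assuming docker is installed locally; the image names, resource limits, and timeout are illustrative guesses, not the repo's actual configuration.

import os
import subprocess
import tempfile

def run_untrusted(code: str, language: str = "python", timeout: int = 10):
    """Execute an untrusted snippet in a network-less, resource-capped container."""
    suffix = {"python": ".py", "nodejs": ".js"}[language]
    runner = {"python": "python3", "nodejs": "node"}[language]
    image = {"python": "python:3.11-slim", "nodejs": "node:20-slim"}[language]  # assumed images

    # Write the snippet to a temp file so it can be mounted read-only.
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as f:
        f.write(code)
        path = f.name
    try:
        return subprocess.run(
            ["docker", "run", "--rm",
             "--network", "none",   # no network access for untrusted code
             "--memory", "256m",    # cap memory
             "--cpus", "1",         # cap CPU
             "-v", f"{path}:/sandbox/snippet{suffix}:ro",  # read-only mount
             image, runner, f"/sandbox/snippet{suffix}"],
            capture_output=True, text=True, timeout=timeout,
        )
    finally:
        os.unlink(path)

# Example: run a trivial snippet and inspect its output.
result = run_untrusted("print(2 + 2)")
print(result.stdout)  # "4\n" if the snippet ran cleanly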

News

  • 2024-01-23: Evaluate mlabonne/Beyonder-4x7B-v2 (AWQ only, FP16 was mega slow).
  • 2

Suggested labels

{ "label-name": "interview-evaluation", "description": "Self-evaluating interview for AI coding models", "repo": "the-crypt-keeper/can-ai-code", "confidence": 96.49 }

irthomasthomas added the github, llm-applications, llm-evaluation, New-Label, openai, and source-code labels on Jan 31, 2024
irthomasthomas changed the title from "the-crypt-keeper/can-ai-code: Self-evaluating interview for AI coders" to "can-ai-code: Self-evaluating interview for AI coders" on Jan 31, 2024