Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple improvements: #16

Merged
merged 16 commits into from
Mar 13, 2023
Merged

Conversation

marcuswestin
Copy link
Contributor

@marcuswestin marcuswestin commented Mar 13, 2023

  • Make dalai llama script idempotent:
    1: Don't re-download files that have already been downloaded
    2: Don't re-create models that have already been created
  • Allow for specifying which model to run in the UI
    • show an error message if the chosen model hasn't been downloaded and created yet
    • and show instructions for how to do download and create it if it doesn't exist (dalai llama <MODEL>)
  • Better log statements during download, model creation, and request serving
  • Create yarn scripts yarn dalai:llama <optional MODEL> and yarn start
  • Put the dalai llama directory in cwd by default, and allow for specifying where to put it with the LLAMA_DIR environment variable
  • Remove express url encoded warning message on server startup by making the extended option explicit (see https://github.com/expressjs/body-parser#extended for more info)
  • Allow for the server to optionally end with a "\n\n" message when the response is finished
  • Add ./dalai script in dalai repo to allow for easier local development testing of the CLI
  • Exit with success code 0 when dalai llama succeeds

@fermigas
Copy link

I've confirmed that download and quantization fixes work. The UI does, too.
I'm running 65B!

@marcuswestin
Copy link
Contributor Author

Thanks @fermigas

This patch should be a meaningful improvement for everyone that is playing with this I think

@cocktailpeanut would love a code review and merge, or pointers for what to update :)

Copy link

@fermigas fermigas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I manually made the changes I reviewed below and tested them.
Downloads and quantization code changes work fine. Approved.
All 4 models work with the new UI changes. Approved.

I did not apply the changes in the files I haven't marked "viewed".

@@ -9,14 +9,14 @@ const start = (port) => {
dalai.http(httpServer)
app.use(express.static(path.resolve(__dirname, 'public')))
app.use(express.json());
app.use(express.urlencoded());
app.use(express.urlencoded({ extended: true }));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. Fixes bug where server emits this line on stating:

body-parser deprecated undefined extended: provide extended option .npm/_npx/3c737cbb02d79cc9/node_modules/dalai/bin/web/index.js:12:19

Comment on lines +33 to +39
<select id="model">
<!-- options: 7B, 13B, 30B, 65B -->
<option value="7B">7B</option>
<option value="13B">13B</option>
<option value="30B">30B</option>
<option value="65B">65B</option>
</select>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really useful, and works fine.
Might consider adding a text input box for # of threads, and temperature too. I'm finding the model pretty sensitive to temperature settings.

console.log(`Skip file download, it already exists: ${file}`)
continue;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very useful. The 6th parameter file of the 65B download kept failing for me. I had to pull it from a torrent to get it to work, then had to do the quantization manually. This and the quantization file check below improve error recover a lot.

@@ -117,12 +132,19 @@ class Dalai {
}
}
async query(req, cb) {
console.log(`> query:`, req)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very useful. You might consider logging the model's output to the console, too.

Comment on lines +218 to +224
const outputFile1 = `./models/${model}/ggml-model-f16.bin${suffix}`
const outputFile2 = `./models/${model}/ggml-model-q4_0.bin${suffix}`
if (fs.existsSync(path.resolve(this.home, outputFile1)) && fs.existsSync(path.resolve(this.home, outputFile2))) {
console.log(`Skip quantization, files already exists: ${outputFile1} and ${outputFile2}}`)
continue
}
await this.exec(`./quantize ${outputFile1} ${outputFile2} 2`, this.home)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, this is a big improvement.

Copy link

@alcalawil alcalawil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@marcuswestin
Copy link
Contributor Author

👍 🙏

mirroredkube pushed a commit to mirroredkube/dalai that referenced this pull request Mar 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants