[User] train-text-from-scratch.exe stops before "begin training" #2131

Closed
SkibaSAY opened this issue Jul 7, 2023 · 5 comments

SkibaSAY commented Jul 7, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

The command should finish after training completes, and the trained model should be saved.

Current Behavior

Execution completes before training starts; there are no errors and no trained model is produced.

Environment and Context

Windows 11 Pro (version 21H2)
Intel(R) Core(TM) i5-12600KF @ 3.70 GHz
RAM: 32.0 GB
CUDA 12.0

$ Python 3.10.11
$ cmake version 3.27.0-rc3

Failure Information (for bugs)

It looks like #1869.
I checked; my version already contains the fix suggested there (commit 5ec8dd5).

Steps to Reproduce

1. I pulled the latest master (master-481f793).
2. Ran train-text-from-scratch, but the application terminates without an error before reaching "begin training":

(from llama.cpp): build\bin\Release\train-text-from-scratch.exe --vocab-model models\ggml-vocab.bin --checkpoint-in chk-lamartine-256x16.bin --checkpoint-out chk-lamartine-256x16.bin --model-out ggml-lamartine-265x16-f32.bin --train-data "shakespeare.txt"

3. Execution ends before reaching line 3195, printf("%s: begin training\n", __func__);, in train-text-from-scratch.cpp.

Failure Logs

D:\torrents\LlamaCppTest\llama.cpp>build\bin\Release\train-text-from-scratch.exe --vocab-model models\ggml-vocab.bin --checkpoint-in chk-lamartine-256x16.bin --checkpoint-out chk-lamartine-256x16.bin --model-out ggml-lamartine-265x16-f32.bin --train-data "shakespeare.txt"
main: seed: 1688730007
llama.cpp: loading model from models\ggml-vocab.bin
llama_model_load_internal: format = ggjt v1 (pre #1405)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 1 (mostly F16)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: model size = 7B
main: tokenize training data
main: number of training tokens: 27584
print_params: n_vocab: 32000
print_params: n_ctx: 128
print_params: n_embd: 256
print_params: n_mult: 256
print_params: n_head: 8
print_params: n_ff: 768
print_params: n_layer: 16
print_params: n_rot: 32
main: number of unique tokens: 3070
main: init model
load_checkpoint: Training iterations: 0.
load_checkpoint: Training samples: 0.
load_checkpoint: Training tokens: 0.
main: opt iter 0
used_mem model+cache: 244466304 bytes

D:\torrents\LlamaCppTest\llama.cpp>

My task (I need advice or an opinion on this; anything will be useful)

I need to fine-tune a model on my own data: I have a set of HTML files from which I need to extract information about the winner of tender purchases. Parsing tools are ineffective because the HTML is generated from arbitrary documents, and we know nothing about their structure in advance. I have a set of examples for which I know the correct answer; using these examples I want to train a new model or fine-tune the existing openassistant-llama-30b-ggml-q5_1.bin.

If you have any tips or information on how I can fine-tune an existing model on my data, please share them. Thank you.
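
For illustration, one possible shape for this data-preparation step (a sketch, not from the issue: the pages/ directory, winners.json answer file, and record layout are hypothetical, and BeautifulSoup is assumed for tag stripping since the HTML structure is unknown):

```python
# Sketch: pair each HTML document's plain text with its known answer,
# producing one JSON record per line for later fine-tuning.
import json
from pathlib import Path

from bs4 import BeautifulSoup  # pip install beautifulsoup4


def html_to_text(path: Path) -> str:
    # Strip all tags: the markup carries no reliable structure here,
    # so the model is trained on plain text only.
    soup = BeautifulSoup(path.read_text(encoding="utf-8"), "html.parser")
    return " ".join(soup.get_text(separator=" ").split())


# winners.json (hypothetical): {"page1.html": "ACME Ltd", ...}
answers = json.loads(Path("winners.json").read_text(encoding="utf-8"))

with Path("pairs.jsonl").open("w", encoding="utf-8") as out:
    for name, winner in answers.items():
        record = {"text": html_to_text(Path("pages") / name), "winner": winner}
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
```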


mqy commented Jul 9, 2023

I checked out commit 481f793, rebuilt on macOS:

rm -r build/*
cd build
cmake ..
cmake --build . --config Release
./bin/train-text-from-scratch  --vocab-model ../models/ggml-vocab.bin   --checkpoint-in  chk-shakespeare-256x16.bin  --checkpoint-out chk-shakespeare-256x16.bin --model-out ggml-shakespeare-256x16-f32.bin --train-data "../models/shakespeare.txt"

...
used_mem model+cache: 244466304 bytes
main: begin training
main: opt->params.adam.sched 0.00000

Suggest you double check that:

  1. You can run main.
  2. shakespeare.txt exists in llama.cpp/ and is not empty. The program aborts with an error if the train file does not exist.

Finally, try a clean rebuild. (A quick pre-flight check for the input files is sketched below.)
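
A minimal sketch of that pre-flight check (file names taken from the commands in this thread):

```python
# Sketch: verify the training data and vocab model exist and are
# non-empty before launching train-text-from-scratch.
from pathlib import Path

for f in (Path("shakespeare.txt"), Path("models/ggml-vocab.bin")):
    size = f.stat().st_size if f.exists() else 0
    print(f"{f}: {'OK' if size > 0 else 'MISSING OR EMPTY'} ({size} bytes)")
```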


SkibaSAY commented Jul 10, 2023

Thank you! After rebuilding the project the problem is gone and training is running =)

I think the problem was that I had skipped this command: cmake --build . --config Release

@SkibaSAY

If it's not too much trouble, could you tell me: do you know how to further train an existing model using llama.cpp?

As I understand it, train-text-from-scratch is not suitable, since it creates a model from scratch.


mqy commented Jul 10, 2023

> If it's not too much trouble, could you tell me: do you know how to further train an existing model using llama.cpp?

llama.cpp targets inference.

@SkibaSAY

OK, I found information about training my own model; maybe it will be useful to someone:
https://tproger.ru/articles/kak-sozdat-prilozhenie-s-nejrosetyu-na-baze-llm-alpaca-bystro-i-prosto/
https://github.com/tatsu-lab/stanford_alpaca/tree/main
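
For anyone following the same route: stanford_alpaca fine-tunes on a JSON list of instruction/input/output records (its alpaca_data.json format), so pairs like the ones sketched earlier in this thread could be converted along these lines (a sketch; pairs.jsonl and the output file name are hypothetical):

```python
# Sketch: convert (document text, known winner) pairs into the
# instruction/input/output record list that stanford_alpaca's
# training script consumes.
import json
from pathlib import Path

records = []
with Path("pairs.jsonl").open(encoding="utf-8") as f:
    for line in f:
        pair = json.loads(line)
        records.append({
            "instruction": "Extract the winner of the tender purchase from the document.",
            "input": pair["text"],
            "output": pair["winner"],
        })

Path("my_alpaca_data.json").write_text(
    json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
)
```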

I hope I won't be banned for the links; I want to help anyone else digging in this direction.
Thanks for the help =)
