add Alpaca #191
Conversation
model.eval()
if os.environ.get("TORCH_COMPILE") == "1":
    aio = '_aio_profiler_print' in dir(torch._C)
    model = apply_compile(model, aio)
maybe let's change that function to torch_compile_maybe() and check for TORCH_COMPILE==1 there?
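For illustration, a rough sketch of that shape (apply_compile is the PR's existing helper and is assumed to be importable from its utils; the function name is just the suggestion above):

```python
import os
import torch

def torch_compile_maybe(model):
    # No-op unless the user opted in via TORCH_COMPILE=1
    if os.environ.get("TORCH_COMPILE") != "1":
        return model
    # Same AIO availability check as in the snippet above
    aio = '_aio_profiler_print' in dir(torch._C)
    # apply_compile is the helper already defined in this PR
    return apply_compile(model, aio)
```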
def run_single_pass(pytorch_runner, dataset):
    inputs = encode(dataset.get_input_array())
    outputs = pytorch_runner.run(1, inputs=inputs.input_ids, max_new_tokens=100)
So, I think the number of tokens differs across runs? I.e. the model outputs 0 <= n <= 100 tokens.
This creates a need to reimplement our functions again, as the run() call is now expected to receive a variable workload length (in the place where you have hard-coded 1 at the moment). With this kind of model we won't know the magnitude of the workload before actually calling run(), so we need to add a way to pass the length of the generated output to the runner somehow post run.
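For illustration, a hedged sketch of what reporting the real size post run could look like in run_single_pass. set_task_size is the hypothetical post-run hook discussed below, and the token count assumes the runner returns a generate-style tensor of shape (batch, sequence_length):

```python
def run_single_pass(pytorch_runner, dataset):
    inputs = encode(dataset.get_input_array())
    # Placeholder task size of 1; the real number of generated tokens is not known yet
    outputs = pytorch_runner.run(1, inputs=inputs.input_ids, max_new_tokens=100)
    # The model may emit anywhere from 0 to 100 new tokens, so report the actual
    # workload size back to the runner once generation has finished.
    num_new_tokens = outputs.shape[1] - inputs.input_ids.shape[1]
    pytorch_runner.set_task_size(num_new_tokens)
```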
Otherwise we will have an inaccurate throughput calculation.
utils/pytorch.py
Outdated
A function replacing the last value in the self._workload_size with the new_task_size.
Useful for models where the task_size is unknown before finishing the run (eg. text generation models where the number of tokens generated has an upper bound but can be lower)
"""
self._workload_size[-1] = new_task_size
Works, but the overall mechanics are not too intuitive now.
I would change this a bit so that:
- the task_size arg in the runner.run() function has a default value of None
- if the task size is not specified at the runner.run() stage (the task size value remains None) and it is then not set by this function either, and execution proceeds to the next run() call, an error is thrown. I.e. the task size has to be set either prior to or just after each call.
- runner.run() uses this function as well underneath, passing the task size value to it for the convenience of the user. Setting the task size value both through runner.run(task_size=n) and then through a direct declare_task_size(task_size=n) call (my suggested naming in place of update_task_size()) results in an error.
- all runners (tf, torch, ...) should have this functionality - maybe it can be implemented at the root class level? (a rough sketch follows below)
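A minimal sketch of that flow at the base class level, assuming "prior" means passing task_size to run() itself; the names (Runner, declare_task_size, _run_impl) and the pending flag are placeholders, not the library's actual API:

```python
import time

class Runner:
    # Sketch only - framework runners (torch, tf, ort, ...) would subclass this
    def __init__(self):
        self._start_times = []
        self._finish_times = []
        self._workload_size = []
        self._task_size_pending = False  # True while the latest run still awaits its size

    def declare_task_size(self, task_size):
        if not self._task_size_pending:
            raise RuntimeError("Task size for the latest run has already been set")
        self._workload_size.append(task_size)
        self._task_size_pending = False

    def run(self, task_size=None, *args, **kwargs):
        # Refuse to start a new run while the previous one has no declared size
        if self._task_size_pending:
            raise RuntimeError("Task size of the previous run has not been declared")
        start = time.time()
        output = self._run_impl(*args, **kwargs)
        finish = time.time()
        self._start_times.append(start)
        self._finish_times.append(finish)
        self._task_size_pending = True
        # Convenience path: run(task_size=n) declares the size itself; declaring it
        # again afterwards via declare_task_size() then raises.
        if task_size is not None:
            self.declare_task_size(task_size)
        return output

    def _run_impl(self, *args, **kwargs):
        raise NotImplementedError
```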
How I see this is:
- by default, run() has task_size=None, as it already does
- if it is None, then the run() function doesn't append anything to the workload_size array - why maintain a list of something redundant and then write expressions to check whether it is an array of legit values or not, instead of just relying on its length
- if a task size value, a positive int, has been provided to run() - append it to workload_size
- when calculating perf metrics, check that the list of finish/start times is equal in length to workload_size; if your concern is that somebody will waste time waiting for results until the very end only to learn that the run failed because of that, then check after each run that workload_size has been updated either manually or by the run() func (a sketch of this follows below)
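I.e. roughly this, as a sketch of the run() side plus the metrics-time check; _run_impl stands in for the framework-specific call and calculate_throughput is an illustrative name, not the repo's actual method:

```python
def run(self, task_size=None, *args, **kwargs):
    start = time.time()
    outputs = self._run_impl(*args, **kwargs)
    finish = time.time()
    self._start_times.append(start)
    self._finish_times.append(finish)
    # Append only real values - no None placeholders to filter out later
    if task_size is not None:
        assert isinstance(task_size, int) and task_size > 0
        self._workload_size.append(task_size)
    return outputs

def calculate_throughput(self):
    # Every finished run must have a recorded size by the time metrics are computed,
    # either via run(task_size=n) or via a post-run set_task_size(n) call
    assert len(self._workload_size) == len(self._finish_times), \
        "workload size missing for some runs"
    total_time = sum(f - s for f, s in zip(self._finish_times, self._start_times))
    return sum(self._workload_size) / total_time
```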
utils/benchmark.py
Outdated
A function appending new_task_size to the self._workload_size if not set yet or replacing the last value with the new_task_size if the last value is None.
Useful for models where the task_size is unknown before finishing the run (eg. text generation models where the number of tokens generated has an upper bound but can be lower)
"""
if len(self._finish_times) > len(self._workload_size):
Maybe it would be safer to just look for an exact difference of 1, which is what is expected.
If you did this by design so that the user can set an arbitrary number of task sizes lower than the number of runs completed at any moment, then maybe a better name for the function would be push_task_size().
@@ -34,7 +35,7 @@ def run(self, task_size, *args, **kwargs):

        self._start_times.append(start)
        self._finish_times.append(finish)
-       self._workload_size.append(task_size)
+       self.set_task_size(task_size)
If task_size is None, maybe don't push it at all and expect the user to take care of things manually.
utils/ctranslate.py
Outdated
@@ -25,7 +25,8 @@ def __init__(self, model, tokenizer, compute_type):

        print("\nRunning with CTranslate2\n")

-   def run(self, task_size, *args, **kwargs):
+   def run(self, task_size=None, *args, **kwargs):
+       assert len(self._workload_size) == 0 or self._workload_size[-1] is not None, "Task size for previous run has not been set"
Ok, so actually you don't expect the user to be able to push multiple task sizes post inference (re. "Maybe it would be safer to just look for an exact difference of 1, which is what is expected.").
Example command for PyTorch:

```
TORCH_COMPILE=1 AIO_SKIP_MASTER_THREAD=1 python3 run.py -m path/to/alpaca_recovered
```
Can we somehow help the user obtain the actual model?
OSError: LLaMA does not appear to have a file named config.json. Checkout 'https://huggingface.co/LLaMA/None' for available files.
root@78e24df80b17:/ampere/ampere_model_library/natural_language_processing/text_generation/alpaca# ls LLaMA
checklist.chk params.json tokenizer_checklist.chk
consolidated.00.pth tokenizer.model
utils/benchmark.py
Outdated
A function appending new_task_size to the self._workload_size if not set yet or replacing the last value with the new_task_size if the last value is None.
Useful for models where the task_size is unknown before finishing the run (eg. text generation models where the number of tokens generated has an upper bound but can be lower)
"""
if len(self._finish_times) - len(self._workload_size) == 1:
Currently this is vulnerable to the wrong order of execution. Why not also allow the task size to be set before the given execution?
I.e.:
assert abs(len(self._finish_times) - len(self._workload_size)) == 1
self._workload_size.append(new_task_size)
Please describe the process of running the model so that it's enough to run your snippets to get the model to work.
We also need the already converted model in our S3 for mzd use.
```bash
git clone https://huggingface.co/tatsu-lab/alpaca-7b-wdiff
cd alpaca-7b-wdiff
git lfs install
```
git lfs install
git: 'lfs' is not a git command. See 'git --help'.
Maybe add some hint on installing git-lfs first?
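For example, something along these lines (assuming a Debian/Ubuntu-based environment; other distros ship their own git-lfs package):

```bash
# install the git-lfs extension first, then enable it for the current user
sudo apt-get update && sudo apt-get install -y git-lfs
git lfs install
```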
3. Convert the LLaMA weights to HuggingFace format
```bash
git clone https://github.com/huggingface/transformers.git
python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir <path_to_llama> --model_size 7B --output_dir <output_path>
```
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
No module named 'safetensors'
python3 alpaca-7b-wdiff/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir alpaca_recovered --model_size 7B --output_dir output
FileNotFoundError: [Errno 2] No such file or directory: 'alpaca_recovered/7B/params.json'
Are you sure you're passing the correct path to this script? convert_llama_weights_to_hf.py is used to convert the original weights from Meta (the ones you have to request access to in order to download them). The params.json file should be there.
I can see that you have alpaca_recovered in your path, which suggests to me that you used this script on the final Alpaca weights (converted from the original weights to HF format and merged with the weight diff from Stanford). You should be able to use these weights to run inference with the run.py script without any further changes.
tl;dr Are you trying to run step 3 with weights that are the output of step 4?
Please add transformers with Karol's branch as a submodule so that the AIO-compatible version is used.
utils/benchmark.py
Outdated
A function appending new_task_size to the self._workload_size if not set yet or replacing the last value with the new_task_size if the last value is None.
Useful for models where the task_size is unknown before finishing the run (eg. text generation models where the number of tokens generated has an upper bound but can be lower)
"""
assert abs(len(self._finish_times) - len(self._workload_size)) == 1
This will throw an assertion error when setting the task size prior to execution. Make it so that one can use it either just before or just after a single inference pass.
utils/benchmark.py
Outdated
@@ -116,6 +116,14 @@ def abort_maybe(self):
def run(self, task_size: int, *args, **kwargs):
    raise NotImplementedError

def set_task_size(self, new_task_size):
    """
    A function appending new_task_size to the self._workload_size if not set yet or replacing the last value with the new_task_size if the last value is None.
I guess this is no longer an accurate description of what this function is doing.
utils/ort.py
Outdated
start = time.time()
outputs = self.session.run(self._output_names, self._feed_dict)
finish = time.time()

self._start_times.append(start)
self._finish_times.append(finish)
- self._workload_size.append(task_size)
+ if task_size is not None:
Maybe it would be better to handle this inside the set_task_size function? I.e. when passed None, ignore it?
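I.e. something like this sketch, which keeps the None check out of every framework runner (the assert is the one from the excerpt below):

```python
def set_task_size(self, new_task_size):
    # Ignore None so runners can call this unconditionally right after a run;
    # generative workloads then declare the real size later.
    if new_task_size is None:
        return
    assert abs(len(self._finish_times) - len(self._workload_size)) == 1
    self._workload_size.append(new_task_size)
```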
""" | ||
assert abs(len(self._finish_times) - len(self._workload_size)) == 1 | ||
self._workload_size.append(new_task_size) | ||
|
After appending, check whether len(self._finish_times) - len(self._workload_size) is in [-1, 0], i.e. whether workload_size is being kept on track with inference progress.
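Concretely, a sketch of that check (it allows declaring the size either just before or just after a run):

```python
def set_task_size(self, new_task_size):
    self._workload_size.append(new_task_size)
    # Sizes may run at most one entry ahead of the finished runs (declared up front)
    # and must never lag behind them.
    assert len(self._finish_times) - len(self._workload_size) in [-1, 0], \
        "workload size tracking is out of sync with completed runs"
```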
Is this ready for the re-review?
.gitmodules
Outdated
@@ -16,3 +16,7 @@
[submodule "speech_recognition/whisper/whisper"]
    path = speech_recognition/whisper/whisper
    url = https://github.com/AmpereComputingAI/whisper.git
[submodule "utils/nlp/transformers"]
    path = utils/nlp/transformers
This branch of transformers is dedicated to llama only, so maybe a better place for it would be nlp/text_gen/alpaca/.
We're pretty inconsistent in that matter. We have nnUNet, dlrm and DeepCTR submodules in their respective utils/<task> directories, but the segment_anything, nanoGPT and whisper submodules are in the <task>/<model> directories.
That depends on whether the datasets make use of the submodule facilities or just the model runner.
Yes, it's ready for re-review.
Seems like the transformers version mismatch between our submodule and the PyPI one will give us a hard time once we have to move to a >4.32.x version, but let's worry about that later.