Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Alpaca #191

Merged
merged 19 commits into from
Oct 11, 2023
Merged

add Alpaca #191

merged 19 commits into from
Oct 11, 2023

Conversation

dkupnicki
Copy link
Collaborator

No description provided.

@jan-grzybek-ampere jan-grzybek-ampere self-requested a review July 17, 2023 15:15
model.eval()
if os.environ.get("TORCH_COMPILE") == "1":
aio = '_aio_profiler_print' in dir(torch._C)
model = apply_compile(model, aio)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe let's change that function to torch_compile_maybe() and check for TORCH_COMPILE==1 there?


def run_single_pass(pytorch_runner, dataset):
inputs = encode(dataset.get_input_array())
outputs = pytorch_runner.run(1, inputs=inputs.input_ids, max_new_tokens=100)
Copy link
Member

@jan-grzybek-ampere jan-grzybek-ampere Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think number of tokens do differ across runs? I.e. model outputs 0 <= n <= 100 tokens.

This creates a need to reimplement our functions again as it is run() call expecting to receive variable length of workload now (in the place where you hard-coded 1 at the moment). With this kind of models we won't know the magnitude of workload before actually calling run() so we need to add a way to pass length of generated output to runner somehow post run.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise we will have inaccurate throughput calculation in place.

utils/pytorch.py Outdated
A function replacing the last value in the self._workload_size with the new_task_size.
Useful for models where the task_size is unknown before finishing the run (eg. text generation models where the number of tokens generated has an upper bound but can be lower)
"""
self._workload_size[-1] = new_task_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works but overall mechanics is not too intuitive now.

I would change this a bit so that:

  1. task_size arg in runner.run() function has default value of None
  2. if the task size is not specified at runner.run() stage (task size value remains None) and then it's not set by the function here, and the execution proceeds to the next run() call, error is thrown. I.e. task size has to be set either prior or just after each call.
  3. runner.run() uses this function as well underneath passing the task size value to it for the convenience of the user. setting task size value both through runner.run(task_size=n) and then through declare_task_size(task_size=n) (my suggested naming in place of update_task_size()) direct call, results in error.
  4. all runners (tf, torch, ...) should have this functionality - maybe it can be implemented on root class level?

Copy link
Member

@jan-grzybek-ampere jan-grzybek-ampere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How I see this is:

  • by default, run() has task_size=None as it already is
  • if it is None, then run() function doesn't append anything to work_size array, why maintain a list of something redundant and then create expressions to check if it is array of legit values or not instead of just relying on its length
  • if the task size value, a positive int, has been provided to run() - append to work_size
  • when calculating perf metrics check if list of finish/start times is equal to length of workload_size, if your concern is that somebody will waste time waiting for results till the end to learn that it failed due to that then check after each run that workload_size has been updated either manually or by run() func

A function appending new_task_size to the self._workload_size if not set yet or replacing the last value with the new_task_size if the last value is None.
Useful for models where the task_size is unknown before finishing the run (eg. text generation models where the number of tokens generated has an upper bound but can be lower)
"""
if len(self._finish_times) > len(self._workload_size):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be safer to just look for an exact difference of 1, which is what is expected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you did it by design for the user to be able to set arbitrary number of task sizes lower than runs completed at any moment, then maybe a better name for a func would be push_task_size().

@@ -34,7 +35,7 @@ def run(self, task_size, *args, **kwargs):

self._start_times.append(start)
self._finish_times.append(finish)
self._workload_size.append(task_size)
self.set_task_size(task_size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if task size is None, maybe don't push it at all and expect user to take care of things manually

@@ -25,7 +25,8 @@ def __init__(self, model, tokenizer, compute_type):

print("\nRunning with CTranslate2\n")

def run(self, task_size, *args, **kwargs):
def run(self, task_size=None, *args, **kwargs):
assert len(self._workload_size) == 0 or self._workload_size[-1] is not None, "Task size for previous run has not been set"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so actually you don't expect user to be able to push multiple task sizes post inference (re. "Maybe it would be safer to just look for an exact difference of 1, which is what is expected.")

Example command for PyTorch:

```
TORCH_COMPILE=1 AIO_SKIP_MASTER_THREAD=1 python3 run.py -m path/to/alpaca_recovered
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we help user somehow obtain the actual model?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OSError: LLaMA does not appear to have a file named config.json. Checkout 'https://huggingface.co/LLaMA/None' for available files.
root@78e24df80b17:/ampere/ampere_model_library/natural_language_processing/text_
generation/alpaca# ls LLaMA
checklist.chk params.json tokenizer_checklist.chk
consolidated.00.pth tokenizer.model

A function appending new_task_size to the self._workload_size if not set yet or replacing the last value with the new_task_size if the last value is None.
Useful for models where the task_size is unknown before finishing the run (eg. text generation models where the number of tokens generated has an upper bound but can be lower)
"""
if len(self._finish_times) - len(self._workload_size) == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently vulnerable to the wrong order of execution. Why not allow task size to be set before the given execution?
I.e.:

assert abs(len(self._finish_times) - len(self._workload_size)) == 1
self._workload_size.append(new_task_size)

Copy link
Member

@jan-grzybek-ampere jan-grzybek-ampere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please describe the process of running the model so that it's enough to run your snippets to get the model to work.

We also need the already converted model in our S3 for mzd use.

```bash
git clone https://huggingface.co/tatsu-lab/alpaca-7b-wdiff
cd alpaca-7b-wdiff
git lfs install
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git lfs install
git: 'lfs' is not a git command. See 'git --help'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some hint on installing git-lfs first maybe?

3. Convert the LLaMA weights to HuggingFace format
```bash
git clone https://github.com/huggingface/transformers.git
python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir <path_to_llama> --model_size 7B --output_dir <output_path>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
No module named 'safetensors'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python3 alpaca-7b-wdiff/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir alpaca_recovered --model_size 7B --output_dir output

FileNotFoundError: [Errno 2] No such file or directory: 'alpaca_recovered/7B/params.json'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you're passing the correct path to this script? convert_llama_weights_to_hf.py is used to convert the original weights from Meta (the ones you have to request access to download them). The params.json file should be there.
Screenshot 2023-09-11 at 14 26 10

I can see that you have alpaca_recovered in your path, which suggests to me that you used this script on the final Alpaca weights (converted from original weights to HF format and merged with weight diff from Stanford). You should be able to use these weights to run inference with the run.py script without any further changes.

tl;dr Are you trying to run step 3 with weights that are the output of step 4?

Copy link
Member

@jan-grzybek-ampere jan-grzybek-ampere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add transformers with Karol's branch as submodule so that AIO compatible version is used.

A function appending new_task_size to the self._workload_size if not set yet or replacing the last value with the new_task_size if the last value is None.
Useful for models where the task_size is unknown before finishing the run (eg. text generation models where the number of tokens generated has an upper bound but can be lower)
"""
assert abs(len(self._finish_times) - len(self._workload_size)) == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will throw assertion error when setting task size prior to execution. Make it so one can use it either just before or just after single inference pass.

@@ -116,6 +116,14 @@ def abort_maybe(self):
def run(self, task_size: int, *args, **kwargs):
raise NotImplementedError

def set_task_size(self, new_task_size):
"""
A function appending new_task_size to the self._workload_size if not set yet or replacing the last value with the new_task_size if the last value is None.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is no longer accurate on what this func is doing.

utils/ort.py Outdated
start = time.time()
outputs = self.session.run(self._output_names, self._feed_dict)
finish = time.time()

self._start_times.append(start)
self._finish_times.append(finish)
self._workload_size.append(task_size)
if task_size is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better to handle this inside the set_task_size function? i.e. when passed None, ignore?

"""
assert abs(len(self._finish_times) - len(self._workload_size)) == 1
self._workload_size.append(new_task_size)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After appending check if len(self._finish_times) - len(self._workload_size) in [-1, 0] i.e. whether workload_size is being kept on track with inference progress.

Copy link
Member

@jan-grzybek-ampere jan-grzybek-ampere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ready for the re-review?

.gitmodules Outdated
@@ -16,3 +16,7 @@
[submodule "speech_recognition/whisper/whisper"]
path = speech_recognition/whisper/whisper
url = https://github.com/AmpereComputingAI/whisper.git
[submodule "utils/nlp/transformers"]
path = utils/nlp/transformers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this branch of transformers is dedicated to llama only so maybe better place for it would be nlp/text_gen/alpaca/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're pretty inconsistent in that matter. We have nnUNet, dlrm and DeepCTR submodules in their respective utils/<task> directories, but segment_anything, nanoGPT and whisper submodules are in the <task>/<model> directories.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that depends on whether datasets make use of submodule facilities or just the model runner

@dkupnicki
Copy link
Collaborator Author

Yes, it's ready for re-review.

Copy link
Member

@jan-grzybek-ampere jan-grzybek-ampere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like transformers mismatch version of our submodule vs pypi one will be giving us a hard time once we have to move to >4.32.x version but lets worry about that later.

@dkupnicki dkupnicki merged commit 286a747 into main Oct 11, 2023
1 check passed
@dkupnicki dkupnicki deleted the daniel/llama branch October 11, 2023 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants