Document Python -> Rust Model Translation Best Practices #549

ralfbiedert · 2022-10-30T09:27:02Z

One follow up to #543, which also touches #545 and and answer you've given here (#174).

While you provide instructions how to re-create the weights from a specific Python version, would it be possible to provide a guide how to best replicate their appropriate architectures?

Background

When I tried to follow the Python instructions they worked alright to get the .ot, but then (taking the yolo example) you also need to "magically know" amongst others:

correct layers and their ordering
getting the right nn::func_t behavior
the VarStore / Path labels for each weight set
Postprocessing and label information (e.g., coco_classes.rs).
...

For simple networks this seems mildly guessable, but when I tried to re-create yolo3 I already ran into these issues, let alone when I was looking into yolo5, yolo6 or yolo7.

I tried to Python print() the models and using their outputs for guidance for a coarse outline, as well as inspecting the model blobs in a Python debugger, but for the more intricate parts I hit a wall pretty quickly having to step through the actual model source in minute detail, and still not being sure if I end up with the right thing.

Question

So, tl;dr, would you mind sharing or documenting your "best practice" how to convert "arbitrary" models to tch? In particular ...

Before you even start, are there "red flags" to tell right away converting a model won't be worth it?
1.1 When is it feasible to recreate a model with existing weights?
1.2. If recreation doesn't work, any thoughts on JIT?
Were there any technical / conversion reasons you picked YOLO_v3_tutorial_from_scratch and not, say, a model from torchhub?
What has been your general workflow to convert something like yolo3-from-scratch?
3.1 How did you determine the right layers and params (e.g., use a debugger, Python source line-by-line, ...)?
3.2 Same for custom functions? I assume these you have to get from source in any case.
3.3 Where do you get the VarStore labels from?
3.4 Do you have any debugging / QA tips (e.g., to actually verify all parameters / weights are correct w.r.t Python)?

I don't think this has to be overly long, but a few lines might help people to ensure they're on the right track and follow "best practices".

The text was updated successfully, but these errors were encountered:

LaurentMazare · 2022-10-31T09:44:36Z

It's hard to call this "best practices" but here is how I would think about it.

First, using a jit module saved from Python (your (1.2)) makes things straightforward as there is no need to write model specific Rust code in this case (see this tutorial if not already familiar with how to use this).

The only reasons I can see not to use a jit model would be customizing the model, running training rather than inference (though I think training can even be done with jit to some extent), or wanting to learn about the model by porting it to Rust.
Re (1.1) most models should be portable as PyTorch and tch share the same internal C++ library, maybe there are some tricky bits where the PyTorch version would use lots of custom Python code but I haven't seen this happening so far.

I don't think I ever tried to reverse engineer a model just by looking at the weight files or by printing the model itself, this would be quite tricky as you noticed and certainly error prone. I always used the Python implementation as a guide on how to implement the Rust version. An important point is that tch tries to mimic all the default behavior of PyTorch so as to make porting models easier, this is the case for variable initializations for example, or optional arguments in function calls.
I just pushed some Stable Diffusion example based on huggingface's diffusers library. The diffusers codebase being pretty large, this required a couple days of work but the Rust structure follows nicely the structure of the Python code and there wasn't any major issue along the way.

The VarStore label try to mimic the Python versions. PyTorch has some magic such that if in a module foo, you create self.bar = nn.Conv2d(...), this will add a bar to the path, using a sequence also adds the index of each layer to the path etc. In tch all this is explicit instead and most of the name will come from the field names of the Python implementation.

When it comes to tips and tricks, lots of issues are detected by shape mismatches when loading the weights, I use tensor-tools ls to also check the weights that are in the .ot file to check if there is some typo/anything unexpected. Usually after this things work well but if not the case it's always possible to line up the Python and Rust versions by printing the tensor values at different stages.

I'm not sure to remember how I ended up with this yolo-v3 version, my guess is that it was the first self-contained and reasonably simple implementation that I came along back then.

ralfbiedert · 2022-10-31T10:01:03Z

Thanks a lot for outlining your process, it helped me to confirm I'm vaguely on the right track, and tensor-tools ls was good to know!

Not sure if I should close this, it might be worth adding your answer to a FAQ?

LaurentMazare · 2022-10-31T21:23:59Z

Ah that's a good point, I've added a small FAQ section to the main readme.

ralfbiedert · 2022-11-01T08:12:08Z

Thanks!

ralfbiedert mentioned this issue Oct 30, 2022

Previous yolo-v3.ot outdated / yolo/main.rs needs update #543

Closed

ralfbiedert closed this as completed Nov 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document Python -> Rust Model Translation Best Practices #549

Document Python -> Rust Model Translation Best Practices #549

ralfbiedert commented Oct 30, 2022

LaurentMazare commented Oct 31, 2022

ralfbiedert commented Oct 31, 2022

LaurentMazare commented Oct 31, 2022

ralfbiedert commented Nov 1, 2022

Document Python -> Rust Model Translation Best Practices #549

Document Python -> Rust Model Translation Best Practices #549

Comments

ralfbiedert commented Oct 30, 2022

Background

Question

LaurentMazare commented Oct 31, 2022

ralfbiedert commented Oct 31, 2022

LaurentMazare commented Oct 31, 2022

ralfbiedert commented Nov 1, 2022