Commit
reduce VRAM memory usage by half during model loading
* This moves the call to half() before model.to(device) to avoid copying the full-precision model to the GPU. This improves speed and reduces memory usage dramatically.
* This fix was contributed by @mh-dm (Mihai).
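The reordering described in this commit can be illustrated with a minimal PyTorch sketch. This is not the code from the commit: the helper name `load_half` and the small `nn.Linear` stand-in model are hypothetical, chosen only to show the order of the calls. Since fp16 tensors use 2 bytes per element rather than the 4 used by fp32, converting on the CPU first means only the half-size weights are transferred to the device.

```python
import torch
import torch.nn as nn


def load_half(model: nn.Module, device: torch.device) -> nn.Module:
    """Hypothetical helper: cast to fp16 on the CPU, then move to the device.

    Calling model.to(device).half() instead would first copy the fp32
    weights to the GPU and only then convert them there.
    """
    return model.half().to(device)


# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# nn.Linear is a stand-in for the real model, which is not shown in the commit.
model = load_half(nn.Linear(8, 4), device)
```

Both orderings end with the same fp16 weights on the device; the difference is what gets copied there during loading.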