# Neural Bits Production Hub

This repository contains the code and articles from the Neural Bits Newsletter, covering:

- how to optimize and quantize models for peak performance
- how to serve models efficiently in production environments at scale

## Categories

### Model Optimization

| ID | 📝 Article | 💻 Code | Details | Complexity | Tech Stack |
|----|-----------|---------|---------|------------|------------|
| 001 | Inference Engines Profiling | Here | Profile a CNN model across PyTorch, ONNX, TensorRT, and TorchCompile | 🟩🟩⬜ | Python, Jupyter |
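Profiling an inference engine boils down to a simple harness: warm the model up, then average wall-clock latency over repeated runs. A minimal sketch of that pattern in pure-stdlib Python (the `benchmark` helper and the toy workloads are illustrative, not the article's code):

```python
import time

def benchmark(fn, *args, warmup=3, iters=20):
    """Average wall-clock latency of fn(*args) over `iters` runs."""
    # Warm-up runs exclude one-time costs (compilation, allocation, caches).
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

# Toy example: two implementations of the same computation,
# standing in for eager vs. optimized inference paths.
def naive(n):
    return sum(i * i for i in range(n))

def fast(n):
    return (n - 1) * n * (2 * n - 1) // 6  # closed-form sum of squares

t_naive = benchmark(naive, 10_000)
t_fast = benchmark(fast, 10_000)
print(f"naive: {t_naive * 1e6:.1f} us, closed-form: {t_fast * 1e6:.1f} us")
```

The same harness works for real models: pass the model's forward call as `fn` (and, on GPU, synchronize the device before reading the clock so queued kernels are counted).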

Model Deployment

ID 📝  Article 💻 Code Details Complexity Tech Stack
002 Deploying DL models with NVIDIA Triton Inference Server Here Full tutorial on how to set-up and deploy ML models with Triton Inference Server 🟩🟩🟩 Python, Docker, Bash
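Triton serves models from a model repository, where each model carries a `config.pbtxt` describing its backend and tensor shapes. A minimal sketch for an ONNX image classifier (the model name and tensor names here are hypothetical placeholders, not from the tutorial):

```
# model_repository/my_onnx_model/config.pbtxt
name: "my_onnx_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

With `max_batch_size` set, Triton can dynamically batch concurrent requests; the leading batch dimension is implicit and omitted from `dims`.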

### Quantization Techniques

| ID | 📝 Article | 💻 Code | Details | Complexity | Tech Stack |
|----|-----------|---------|---------|------------|------------|