A hands-on hub for learning how to optimize AI models and serve them efficiently in production.

neural-bits/production-hub

Neural Bits Production Hub

This repository collects the code and articles from the Neural Bits Newsletter that show:

  • how to optimize and quantize models for better inference performance
  • how to serve models efficiently in production environments at scale

Categories

Model Optimization

| ID | 📝 Article | 💻 Code | Details | Complexity | Tech Stack |
| --- | --- | --- | --- | --- | --- |
| 001 | Inference Engines Profiling | Here | Profile a CNN model across PyTorch, ONNX, TensorRT, and TorchCompile | 🟩🟩⬜ | Python, Jupyter |
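
To give a feel for what profiling a model across several backends involves, here is a minimal sketch of a latency harness using only the Python standard library. It is not the code from the linked article: `profile_latency` and `fake_backend` are hypothetical names, and the stand-in workload replaces a real PyTorch, ONNX, or TensorRT forward pass.

```python
import statistics
import time
from typing import Callable, Dict


def profile_latency(
    run_once: Callable[[], object], warmup: int = 10, iters: int = 100
) -> Dict[str, float]:
    """Time a zero-argument inference callable; return latency stats in ms."""
    # Warm-up runs let lazy initialization, caches, and JIT compilation settle
    # so they do not distort the measured iterations.
    for _ in range(warmup):
        run_once()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
    }


def fake_backend() -> int:
    # Hypothetical stand-in for a model forward pass (assumption, not a real model).
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    # In practice each entry would wrap one backend's inference call.
    backends = {"fake_backend": fake_backend}
    for name, fn in backends.items():
        print(name, profile_latency(fn, warmup=5, iters=50))
```

When the wrapped callable runs on a GPU, it should synchronize the device before returning (for example with `torch.cuda.synchronize()` in PyTorch), otherwise the timer only measures kernel launch overhead.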

Model Deployment

| ID | 📝 Article | 💻 Code | Details | Complexity | Tech Stack |
| --- | --- | --- | --- | --- | --- |
| 002 | Deploying DL models with NVIDIA Triton Inference Server | Here | Full tutorial on how to set up and deploy ML models with Triton Inference Server | 🟩🟩🟩 | Python, Docker, Bash |
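
As a rough illustration of what talking to a deployed model looks like, the sketch below builds the JSON body for Triton's HTTP inference endpoint, which follows the KServe v2 protocol (`POST /v2/models/<model_name>/infer`). It is not taken from the linked tutorial: the model name `my_cnn`, the input name `input__0`, and both helper functions are assumptions chosen for illustration, and the request is only constructed, not sent.

```python
import json
from typing import List


def build_infer_request(
    input_name: str, shape: List[int], data: List[float], datatype: str = "FP32"
) -> str:
    """Build the JSON body for a KServe v2 style inference request."""
    # The flattened data must contain exactly prod(shape) elements.
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(data):
        raise ValueError(f"shape {shape} needs {expected} values, got {len(data)}")

    body = {
        "inputs": [
            {
                "name": input_name,    # must match the input name in the model config
                "shape": shape,
                "datatype": datatype,  # e.g. FP32, INT64
                "data": data,          # tensor values flattened in row-major order
            }
        ]
    }
    return json.dumps(body)


def infer_url(host: str, model_name: str) -> str:
    # Triton's HTTP endpoint listens on port 8000 by default.
    return f"http://{host}/v2/models/{model_name}/infer"


if __name__ == "__main__":
    # Hypothetical model and input names, for illustration only.
    print(infer_url("localhost:8000", "my_cnn"))
    print(build_infer_request("input__0", [1, 3], [0.1, 0.2, 0.3]))
```

The resulting body can be posted with any HTTP client. For real workloads, the official `tritonclient` package is usually the more convenient choice.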

Quantization Techniques

| ID | Article | Code | Details | Complexity | Tech Stack |
| --- | --- | --- | --- | --- | --- |
