- Multi-camera networks research notes. Target venues: systems conferences (OSDI/SOSP/ATC/EuroSys), networking (NSDI/SIGCOMM/SoCC), mobile computing (MobiCom/MobiSys/SenSys/UbiComp), data analytics (VLDB/SIGMOD), and computer vision / machine learning (ICCV/CVPR/ECCV/ICML/ICLR/NeurIPS).
- Unlike the books, I collect papers from the systems and AI perspectives separately. To avoid diving into the details of specific vision tasks (e.g., object detection), the AI Algorithm part only covers low-resource learning, domain adaptation & continual learning, and dynamic deep neural networks, because these three topics generalize across vision tasks and are useful for deploying deep-learning-based vision applications. At the end, I list datasets and useful toolboxes.
Note: specific vision algorithms (tracking, object detection, segmentation and action recognition) are not collected in this note. If you want to learn or try them, you can refer to SenseTime-CUHK OpenMMLab, which provides a suite of toolboxes that help AI researchers/engineers implement vision algorithms. For example, you can try 50+ image-based object detection models using the same mmdetection API and 10+ video-based detection and tracking methods using the same mmtracking API.
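For a sense of how uniform these APIs are, single-image inference looks roughly like the sketch below (assuming the mmdetection 2.x Python API; the config and checkpoint paths are placeholders, so take the exact names from the mmdetection model zoo):

```python
# Minimal mmdetection 2.x inference sketch; the paths below are placeholders.
from mmdet.apis import init_detector, inference_detector

config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'    # placeholder config
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'       # placeholder weights
model = init_detector(config_file, checkpoint_file, device='cuda:0')  # build model + load weights
result = inference_detector(model, 'demo.jpg')                        # per-class bounding-box arrays
```

Swapping in a different detector only changes the two path strings; the calling code stays the same.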
- Book and Survey - a starting point to understand basic concepts behind multi-camera networks
- Researchers, Workshops and Courses - follow them to get recent research trends in multi-camera networks
- Topics - group recent papers into different sub-topics (e.g., camera calibration)
- System
- Edge video analytics - speed up the analysis pipeline
- Configuration search - search for the most suitable configuration
- Database - distributed data processing
- Video streaming - video compression
- Resource management - resource allocation and scheduling
- Prediction serving and model update - model exchange, prediction serving, model monitoring and model updates
- Multi-Camera Collaboration - improve performance and reduce deployment cost
- Privacy - data privacy, model privacy and computation privacy
- AI Algorithm
- Low-resource learning - efficient learning under limited data/annotations/computation/(time)
- In AI, low-resource learning is often called low-shot learning (few-/one-/zero-shot learning), which aims to retrain a model, or train one from scratch, with only a few new samples. Inspired by style transfer, many image/speech synthesis tasks leverage the Adaptive Instance Normalization (AdaIN) layer to calibrate the distribution of the input data (a minimal AdaIN sketch is given after this outline). In object detection, however, there are few existing works on low-resource learning; I only found two related papers (SpotTune, CVPR'19, Citation=159 and Budget-Aware Adapters, ICCV'19, Citation=10), which are not tied to detection architectures and are applicable to all CNN models.
- Domain adaptation and continual learning - robustness and sustainability
- For continual learning, most AI works focus on how to learn unseen classes while memorizing seen classes (i.e., avoiding catastrophic forgetting). Thus, it is also called incremental learning.
- For domain adaptation, AI researchers aim to improve the generalization of existing pretrained models. Depending on whether the given target data is labeled or unlabeled, existing algorithms fall into two categories: (1) supervised retraining; (2) unsupervised domain adaptation (source-free or joint source-target training); a minimal source-free example is sketched after this outline.
- Recent works on Model Exchange & Serving and Model Monitoring & Updates are summarized in the slides provided by the Architecture of ML Systems course (SS2021, Graz University of Technology).
- Dynamic deep neural networks - computing flexibility
- Dataset - test your ideas on popular datasets
- Toolbox - verify your ideas quickly with existing toolboxes
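A minimal PyTorch sketch of the AdaIN calibration mentioned under Low-resource learning (the function name and shapes are my own, for illustration): each channel of a content feature map is re-normalized so that its per-sample, per-channel statistics match those of a reference feature map.

```python
import torch

def adain(content: torch.Tensor, reference: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive Instance Normalization: shift/scale each channel of `content`
    (N, C, H, W) so its per-sample, per-channel mean/std match `reference`."""
    n, c = content.shape[:2]
    c_mean = content.reshape(n, c, -1).mean(dim=2).reshape(n, c, 1, 1)
    c_std = content.reshape(n, c, -1).std(dim=2).reshape(n, c, 1, 1) + eps
    r_mean = reference.reshape(n, c, -1).mean(dim=2).reshape(n, c, 1, 1)
    r_std = reference.reshape(n, c, -1).std(dim=2).reshape(n, c, 1, 1) + eps
    return r_std * (content - c_mean) / c_std + r_mean
```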
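And a minimal sketch of the source-free flavor of unsupervised domain adaptation mentioned above, here realized as TENT-style entropy minimization over BatchNorm affine parameters (the recipe and function name are illustrative choices, not something prescribed in these notes):

```python
import torch
import torch.nn as nn

def adapt_on_target_batch(model: nn.Module, target_images: torch.Tensor,
                          lr: float = 1e-3, steps: int = 1) -> nn.Module:
    """Source-free adaptation sketch: minimize prediction entropy on an
    unlabeled target batch, updating only BatchNorm affine parameters
    (assumes `model` returns class logits and uses affine BatchNorm2d)."""
    bn_params = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train()                      # recompute statistics from the target batch
            bn_params += [m.weight, m.bias]
    optimizer = torch.optim.SGD(bn_params, lr=lr)
    for _ in range(steps):
        probs = model(target_images).softmax(dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
    return model
```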
- Multi-Camera Networks: Principles and Applications. 2005.
- Camera Networks: The Acquisition and Analysis of Videos over Wide Areas (Synthesis Lectures on Computer Vision). 2012.
- M. Valera et al. Intelligent distributed surveillance systems: a review. 2005.
- Wang et al. Intelligent multi-camera video surveillance: a review. 2012.
- Ye et al. Wireless Video Surveillance: A Survey. 2013.
- Zhang et al. Deep Learning in Mobile and Wireless Networking: A Survey. IEEE Communications Surveys & Tutorials, 2019.
- System (live video analytics, distributed computing, video streaming, privacy, collaborative/continual learning)
- Matthias Boehm (Graz University of Technology, Austria) - data management and deep learning based data analytics
- Arun Kumar (University of California San Diego, USA) - data management and deep learning based data analytics
- Ganesh Ananthanarayanan (Microsoft Research, USA) - live video analytics, distributed computing
- Yuanchao Shu (Microsoft Research, USA) - live video analytics, collaborative/continual learning
- Feng Qian (University of Minnesota Twin Cities, USA) - video streaming
- Junchen Jiang (The University of Chicago, USA) - video streaming
- Ravi Netravali (Princeton, USA) - edge video AI
- Fengyuan Xu (Nanjing University, China) - the Internet of Video Things (IoVT) and privacy-preserving edge AI
- Shivaram Venkataraman (University of Wisconsin-Madison, USA) - real-time video processing
- Deep-learning-based algorithms (tracking, object detection, segmentation and action recognition)
- Andrea Cavallaro (Queen Mary University of London, UK) - multi-modal fusion, privacy-aware video analytics (based on adversarial-training/learning)
- Amit K. Roy-Chowdhury (UC Riverside, USA) - tracking, reID, super-resolution and domain adaptation
- Jenq-Neng Hwang (University of Washington, USA) - tracking, reID, localization and visual odometry
- Hamed Haddadi (Imperial College London, UK) - privacy-preserving edge AI
- Ying Wu (Northwestern, USA) - tracking, detection, reID and segmentation
- Gaoang Wang (Zhejiang University, China) - scene-aware multi-object tracking
- Haibin Ling (Stony Brook University, USA) - visual tracking in drones
- Mubarak Shah (University of Central Florida, USA) - zero/few-shot learning in video based tracking/segmentation/action recognition
- Ming-Hsuan Yang (UC Merced, USA) - low-resource (data or compute) learning for tracking/detection/segmentation
- The 3rd Workshop on Hot Topics in Video Analytics and Intelligent Edges (ACM MobiCom'21) - focus on deep learning based video analytics
- Multi-camera Multiple People Tracking Workshop (IEEE ICCV'21) - track multiple people from indoor scenes using multiple RGB cameras
- Multimedia Systems Conference (ACM MMSys'21) - contain multiple topics in video analysis
- CS294: Machine Learning Systems (Fall 2019, Berkeley) - covers the concepts/background behind machine learning systems (the best reference website!)
- 706.550: Architecture of ML Systems (Summer 2021, Graz University of Technology) - the architecture and essential concepts of modern ML systems for both local and large-scale machine learning (based on non-deep ML analytics)
- CS231A: Computer Vision, From 3D Reconstruction to Recognition (Winter 2021, Stanford) - focuses on basic concepts behind many computer vision tasks across multi-camera networks (camera models, calibration, single- and multiple-view geometry, stereo systems, structure from motion, stereo matching, depth estimation, optical flow and optimal estimation)
- COS 598a: Machine Learning-Driven Video Systems (Spring 2022, Princeton) - targets recent research interests in video analytics (Strong Recommendation)
- CS34702 Topics in Networks: Machine Learning for Networking and Systems (Fall 2020, UChicago) - targets recent research works on networking systems (the video streaming and cloud scheduling lectures are recommended)
- CSE 234: Data Systems for Machine Learning (Fall 2021, UCSD) - focus on the lifecycle of ML-based data analytics, including data sourcing and preparation for ML, programming models and systems for scalable ML model building, and systems for faster ML deployment
- CSE 291F: Advanced Data Analytics and ML Systems (Winter 2019, UCSD) - the emerging area of advanced data analytics and ML systems, at the intersection of data management, ML/AI, and systems.
- CS6465: Emerging Cloud Technologies and Systems Challenges (Fall 2019, Cornell) - emerging cloud computing technology, opportunities and challenges.
[1] Li et al. Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics. In SIGCOMM'20.
[2] Xu et al. Video Analytics with Zero-streaming Cameras. In ATC'21.
[3] Jha et al. Visage: Enabling Timely Analytics for Drone Imagery. In MobiCom'21.
[4] Jiang et al. Flexible High-resolution Object Detection on Edge Devices with Tunable Latency. In MobiCom'21.
[5] Han et al. LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision. In MobiCom'21.
[6] Zhang et al. Elf: Accelerate High-resolution Mobile Deep Vision with Content-aware Parallel Offloading. In MobiCom'21.
[7] Xiao et al. Towards Performance Clarity of Edge Video Analytics. In SEC'21.
[1] Y. Yan et al. Learning in situ: a randomized experiment in video streaming. In NSDI'20.
[2] Kim et al. Neural-Enhanced Live Streaming: Improving Live Video Ingest via Online Learning. In SIGCOMM'20.
[3] Du et al. Server-Driven Video Streaming for Deep Learning Inference. In SIGCOMM'20.
[4] Han et al. ViVo: Visibility-aware Mobile Volumetric Video Streaming. In MobiCom'20.
[5] Zhang et al. SENSEI: Aligning Video Streaming Quality with Dynamic User Sensitivity. In NSDI'21.
[1] Zhang et al. The Design and Implementation of a Wireless Video Surveillance System. In MobiCom'15.
[2] Xu et al. Approximate Query Service on Autonomous IoT Cameras. In MobiSys'20.
[3] Bhardwaj et al. Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers. In NSDI'22. - targets when to retrain models and how to reduce resource usage across multiple tasks (many concurrent inference and retraining tasks).
[4] Zhou et al. Octo: INT8 Training with Loss-aware Compensation and Backward Quantization for Tiny On-device Learning. In ATC'21.
[1] Suprem et al. ODIN: Automated Drift Detection and Recovery in Video Analytics. In VLDB'21. - detects domain drift and updates the corresponding models automatically.
[2] Romero et al. INFaaS: Automated Model-less Inference Serving. In ATC'21. Best paper award! - the first model-less prediction serving system
[3] Feng et al. Palleon: A Runtime System for Efficient Video Processing toward Dynamic Class Skew. In ATC'21. - model selection based on the automatically detected class skews
[4] Wang et al. SmartHarvest: Harvesting Idle CPUs Safely and Efficiently in the Cloud. In EuroSys'21. - identify and harvest idle resources
[5] Hu et al. Scrooge: A Cost-Effective Deep Learning Inference System. In SoCC'21. - consider input complexity
[6] Ling et al. RT-mDL: Supporting Real-Time Mixed Deep Learning Tasks on Edge Platforms. In SenSys'21. - scheduling multiple DL jobs on resource-constrained devices
[7] Schelter et al. Learning to Validate the Predictions of Black Box Classifiers on Unseen Data. In SIGMOD'20. - a tool to monitor models' performance without annotations
[8] Agarwal et al. Boggart: Accelerating Retrospective Video Analytics via Model-Agnostic Ingest Processing. arXiv preprint 2106.15315.
[9] Gunasekaran et al. Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in Public Cloud. In NSDI'22. - improves prediction-serving performance via ensemble learning
[1] Jain et al. Scaling Video Analytics Systems to Large Camera Deployments. In HotMobile'19.
[2] Liu et al. Who2com: Collaborative Perception via Learnable Handshake Communication. In ICRA'20.
[3] Liu et al. When2com: Multi-Agent Perception via Communication Graph Grouping. In CVPR'20.
[4] Zeng et al. Distream: Scaling Live Video Analytics with Workload-Adaptive Distributed Edge Intelligence. In SenSys'20.
[5] Jain et al. Spatula: Efficient cross-camera video analytics on large camera networks. In SEC'20. Best Paper Award!
[6] Tong et al. Large-Scale Vehicle Trajectory Reconstruction with Camera Sensing Network. In MobiCom'21.
| Useful external links | Keywords |
| --- | --- |
| Tutorial on privacy-preserving data analysis (The Alan Turing Institute) | todo |
| The Second AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI-21) | todo |
| A Dive into Privacy Preserving Machine Learning (OpML'20) | todo |
| CrypTen (Facebook AI Research) | Privacy-preserving machine learning framework, PyTorch, Multi-Party Computation (MPC) |
[1] (TAMU and Adobe Research) Wu et al. Towards Privacy-Preserving Visual Recognition via Adversarial Training: A Pilot Study. In ECCV'18.
[2] (CMU) Wang et al. Enabling Live Video Analytics with a Scalable and Privacy-Aware Framework. In 2018 ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM'18).
[3] (KAIST, USTC, Rice, NJU, SNU, PKU and MSRA) Lee et al. Occlumency: Privacy-preserving Remote Deep-learning Inference Using SGX. In MobiCom'19.
[4] (NUS) Shen et al. Human-imperceptible Privacy Protection Against Machines. In MM'19.
[5] (PSU and Facebook) Khazbak et al. TargetFinder: Privacy Preserving Target Search through IoT Cameras. In IoTDI'19 (Best Paper Award).
[6] (Tsinghua and USTC) Li et al. Invisible: Federated Learning over Non-Informative Intermediate Updates against Multimedia Privacy Leakages. In MM'20.
[7] (UCB and MSR) Poddar et al. Visor: Privacy-Preserving Video Analytics as a Cloud Service. In 29th Usenix Security Symposium (Security'20).
[8] (ICL, QMUL, Telefónica Research and Samsung AI) Mo et al. DarkneTZ: Towards Model Privacy at the Edge using Trusted Execution Environments. In MobiSys'20.
[9] (NJU, Cornell and MSRA) Wu et al. PECAM: Privacy-Enhanced Video Streaming and Analytics via Securely-Reversible Transformation. In MobiCom'21.
[10] (ASU) Hu et al. LensCap: Split-Process Framework for Fine-Grained Visual Privacy Control for Augmented Reality Apps. In MobiSys'21.
[11] (CUHK) Ouyang et al. ClusterFL: A Similarity-Aware Federated Learning System for Human Activity Recognition. In MobiSys'21.
[12] (ICL and Telefónica Research) Mo et al. PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments. In MobiSys'21 (Best paper award).
[13] (CMU, UCSD and MSR) Dsouza et al. Amadeus: Scalable, Privacy-Preserving Live Video Analytics. arXiv preprint 2011.05163.
[14] (MIT, Princeton, UChicago and Rutgers) Cangialosi et al. Privid: Practical, Privacy-Preserving Video Analytics Queries. In NSDI'22.
[1] H. Aghdam et al. Active Learning for Deep Detection Neural Networks. In ICCV'19. Public Code Note
xxx
xxx
- Duke MTMC (8 cameras, non-overlapping)
- Nvidia CityFlow (>40 cameras, overlapping and non-overlapping)
- EPFL WildTrack (7 cameras, overlapping)
- EPFL-RLC (3 cameras, overlapping)
- CMU Panoptic Dataset (>50 cameras, overlapping)
- University of Illinois STREETS (100 cameras, non-overlapping)
- Awesome reID dataset
- CUHK-mmcv: a foundational Python library for computer vision research that supports many research projects (2D/3D detection, semantic segmentation, image and video editing, pose estimation, action understanding and image classification).
- JDAI-CV-fastreid: a Python library implementing SOTA re-identification methods (including pedestrian and vehicle re-identification), with good documentation.
- Cheetah: an end-to-end deep-learning-based prediction serving server, built on the NVIDIA Triton Inference Server and Docker, that speeds up deployment of image classification, object detection, segmentation and tracking models.
- Chameleon: an efficient continuous adaptation framework based on NVIDIA TAO.