A simple Android app that performs on-device face recognition by comparing FaceNet embeddings against a vector database of user-given faces
- 2024-09: Add face-spoof detection which uses FASNet from minivision-ai/Silent-Face-Anti-Spoofing
- 2024-07: Add latency metrics on the main screen. It shows the time taken (in milliseconds) to perform face detection, face embedding and vector search.
- Produce on-device face embeddings with FaceNet and use them to perform face recognition on a user-given set of images
- Store face-embedding and other metadata on-device and use vector-search to determine nearest-neighbors
- Use modern Android development practices and recommended architecture guidelines while maintaining code simplicity and modularity
Clone the main branch,
$> git clone --depth=1 https://github.com/shubham0204/OnDevice-Face-Recognition-Android
Perform a Gradle sync, and run the application.
The app provides two FaceNet models that differ in the size of the embedding they produce: facenet.tflite outputs a 128-dimensional embedding and facenet_512.tflite a 512-dimensional embedding. In FaceNet.kt, you may change the model by modifying the path of the TFLite model,
// facenet
interpreter =
Interpreter(FileUtil.loadMappedFile(context, "facenet.tflite"), interpreterOptions)
// facenet-512
interpreter =
Interpreter(FileUtil.loadMappedFile(context, "facenet_512.tflite"), interpreterOptions)
Next, change embeddingDim in the same file,
// facenet
private val embeddingDim = 128
// facenet-512
private val embeddingDim = 512
Then, in DataModels.kt, change the dimensions of the faceEmbedding attribute,
@Entity
data class FaceImageRecord(
// primary-key of `FaceImageRecord`
@Id var recordID: Long = 0,
// personId is derived from `PersonRecord`
@Index var personID: Long = 0,
var personName: String = "",
// the FaceNet-512 model provides a 512-dimensional embedding
// the FaceNet model provides a 128-dimensional embedding
@HnswIndex(dimensions = 512)
var faceEmbedding: FloatArray = floatArrayOf()
)
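The @HnswIndex annotation lets ObjectBox answer nearest-neighbor queries over faceEmbedding without scanning every record. For intuition, here is what such a query computes, written as a brute-force scan in plain Kotlin — a minimal sketch, where StoredFace is an illustrative, annotation-free stand-in for FaceImageRecord and the function names are not taken from the project's source:

```kotlin
import kotlin.math.sqrt

// Simplified, annotation-free stand-in for FaceImageRecord.
data class StoredFace(val personName: String, val embedding: FloatArray)

// Cosine similarity: dot(a, b) / (||a|| * ||b||), in the range [-1, 1].
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return dot / (sqrt(na) * sqrt(nb))
}

// Exhaustive nearest-neighbor search: O(n * d) per query.
// An HNSW index returns (approximately) the same record in sub-linear time.
fun nearestNeighbor(query: FloatArray, records: List<StoredFace>): StoredFace? =
    records.maxByOrNull { cosine(query, it.embedding) }
```

The trade-off the HNSW index makes for its speed is approximation: it may occasionally return a neighbor that is close, but not the true closest, which is why the project re-checks similarity exactly (see the pipeline description below).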
We use the FaceNet model which, given a 160 * 160 cropped face image, produces an embedding of 128 or 512 elements that captures facial features uniquely identifying the face. The embedding model can thus be viewed as a function mapping a face image to a fixed-length vector, where embeddings of the same face lie close to each other.
1. When the user selects an image, the app uses MLKit's FaceDetector to crop faces from the image. Each image is labelled with the person's name. See MLKitFaceDetector.kt.
2. Each cropped face is transformed into a vector/embedding with FaceNet. See FaceNet.kt.
3. These face embeddings are stored in a vector database, which enables a faster nearest-neighbor search.
4. In the camera preview, for each frame, we perform face detection as in (1) and produce a face embedding for each detected face as in (2). We compare this face embedding (the query vector) with those present in the vector database and determine the name/label of the embedding (the nearest neighbor) closest to the query vector using cosine similarity.
5. The vector database performs a lossy compression on the embeddings stored in it, so the distance returned with the nearest neighbor is also an estimate. Hence, we re-compute the cosine similarity between the nearest-neighbor vector and the query vector. See ImageVectorUseCase.kt.
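The exact cosine-similarity re-check in step (5) is straightforward. A minimal sketch in plain Kotlin — the function name and signature are illustrative, not taken from ImageVectorUseCase.kt:

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embeddings of equal dimension:
// dot(a, b) / (||a|| * ||b||), in the range [-1, 1]; 1 means identical direction.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}
```

Comparing this exact score against a threshold (rather than trusting the approximate distance from the index) decides whether the nearest neighbor is a genuine match or an unknown face.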
- TensorFlow Lite as a runtime to execute the FaceNet model
- Mediapipe Face Detection to crop faces from the image
- ObjectBox for on-device vector-store and NoSQL database
See issue #1
Face-liveness detection is the process of determining whether the face captured in the camera frame is real or a spoof (a photo, a 3D model, etc.). There are many techniques for face-liveness detection, the simplest being smile or wink detection. These are effective against static spoofs (pictures or 3D models) but do not hold up against video spoofs.
While exploring the deepface library, I discovered that it had implemented anti-spoof detection using the PyTorch models from the Silent-Face-Anti-Spoofing repository. It uses a combination of two models that operate on two different scales of the same image. The models are trained with a classification loss (cross-entropy) plus a loss on the difference between the Fourier transform of the image and the intermediate features of the CNN.
The models used by the deepface library (the same as in Silent-Face-Anti-Spoofing) are in the PyTorch format. The project already uses the TFLite runtime for executing the FaceNet model, and adding another DL runtime would unnecessarily bloat the application.
I converted the PyTorch models to TFLite using this notebook: https://github.com/shubham0204/OnDevice-Face-Recognition-Android/blob/main/resources/Liveness_PT_Model_to_TF.ipynb
How does this project differ from my earlier FaceRecognition_With_FaceNet_Android project?
FaceRecognition_With_FaceNet_Android is a similar project, initiated in 2020 and iterated upon several times since then. Here are the key similarities and differences with this project:
- Use FaceNet and FaceNet-512 models executed with TensorFlow Lite
- Perform on-device face-recognition on a user-given dataset of images
- Uses ObjectBox to store face embeddings and perform nearest-neighbor search
- Does not read a directory from the file-system; instead, it allows the user to select a group of photos and label them with the name of a person
- Considers only the nearest neighbor to infer the identity of a person in the live camera-feed
- Uses the Mediapipe Face Detector instead of MLKit