pip install ultralytics==8.2.18
pip install mediapipe==0.10.11
# pip install decord
mkdir checkpoints
wget -O checkpoints/hand_landmarker.task -q https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/1/hand_landmarker.task
wget -O checkpoints/pose_landmarker_full.task -q https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_full/float16/latest/pose_landmarker_full.task
wget -O checkpoints/face_landmarker.task -q https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/latest/face_landmarker.task
- input: folder with clips
- output: spatially cropped clip of size (512, 512) and predicted keypoints in a json file with the same name as the clip
- the crop is created based on the sign space around the person
- if the input clip is not square, the shorter side is padded with the color (114, 114, 114) (see the padding sketch after this list)
- clips with no prediction or with multiple predictions are skipped
- the script can run in parallel; each process can access its own index file or select index files randomly
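A minimal sketch of the square padding described above, assuming frames are numpy image arrays; the helper name pad_to_square and the use of OpenCV are illustrative, not the script's actual implementation:

import cv2
import numpy as np

def pad_to_square(frame: np.ndarray, size: int = 512) -> np.ndarray:
    # pad the shorter side with (114, 114, 114), then resize to (size, size)
    h, w = frame.shape[:2]
    side = max(h, w)
    pad_h, pad_w = side - h, side - w
    padded = cv2.copyMakeBorder(
        frame,
        pad_h // 2, pad_h - pad_h // 2,
        pad_w // 2, pad_w - pad_w // 2,
        cv2.BORDER_CONSTANT, value=(114, 114, 114),
    )
    return cv2.resize(padded, (size, size))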
Prepare index files in advance (optional).
from pose_prediction_parallel import create_index_files
clip_folder = ""
index_folder = ""
num_index_files = 100
create_index_files(clip_folder, index_folder, num_index_files)
Run multiple parallel jobs. If the index files do not exist, the first job will create them.
If index_file_id is not specified, clips will be processed in random order.
# create 100 index files and process index_file_000
# additional processes can be run in parallel with a different index_file_id
python pose_prediction_parallel.py \
--input_folder data/clips \
--output_folder data/cropped_clips \
--tmp_folder data/tmp_clips \
--num_index_files 100 \
--index_path data/index_files \
--index_file_id 0 \
--checkpoint_folder checkpoints \
--sign_space 4 \
--yolo_sign_space 4
Parallel jobs can be run as an array job. PBS example:
#PBS -J 0-9
python pose_prediction_parallel.py \
--index_file_id "$PBS_ARRAY_INDEX"
# ...
Descriptions:
- Converts features from json files into h5
- Structure of the h5 (see the read-back sketch after the command below):
{"video_name_00": {"clip_name_00": features_00_00, "clip_name_01": features_00_01, ...}, ...}
- Shape of the features: number of frames x embedding dimension
- Face keypoints are reduced (see data/h2s.py -> How2SignDatasetJSON.face_landmarks)
- The keypoint prediction script does not save leg keypoints
- Keypoints are normalized before saving:
  - global: pose_landmarks
  - local: right_hand_landmarks
  - local: left_hand_landmarks
  - local: face_landmarks
- Local normalization: moves the keypoints to the origin, adds square padding, and normalizes the values in local space -> captures the local shape, independent of position in space and scale (a sketch follows this list)
- Global normalization: keypoints are normalized in relation to the signing space -> captures the absolute position and the relation between parts
- If the clip names are not in the format video_name.time_stamp.mp4, an annotation file with the columns SENTENCE_NAME and VIDEO_ID should be provided
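A minimal sketch of the local normalization described above, assuming the keypoints of one body part in one frame form an (n, 2) array of x/y coordinates; the helper name and the exact scaling are illustrative, not necessarily the repo's implementation:

import numpy as np

def normalize_local(keypoints: np.ndarray) -> np.ndarray:
    # keypoints: (n, 2) array of x/y coordinates for one body part in one frame
    # 1) move the keypoints to the origin
    shifted = keypoints - keypoints.min(axis=0)
    # 2) pad the bounding box to a square so proportions are preserved
    side = shifted.max()
    side = side if side > 0 else 1.0
    # 3) normalize into local space -> local shape only, no global position or scale
    return shifted / side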
python create_keypoint_features.py \
--input_folder data/cropped_clips \
--output_folder data/features \
--dataset_name h2s \
--split_name train \
--annotation_file data/how2sign_realigned_train.csv  # only needed if the clip names are not in the expected format
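Once the features are written, the resulting h5 can be checked against the structure described above; a minimal read-back sketch, using a hypothetical output file name:

import h5py

# hypothetical file name inside --output_folder
with h5py.File("data/features/h2s.train.h5", "r") as f:
    for video_name in f:
        for clip_name in f[video_name]:
            features = f[video_name][clip_name][()]  # (number of frames, embedding dimension)
            print(video_name, clip_name, features.shape)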
data/keypoint_dataset.py -> KeypointDatasetJSON
Description:
- Loads and normalizes keypoints from json files
Output:
- List of clip keypoints for one video
# output example
[
{
'data': np.empty([n_frames_00, 208]),
'video_name': 'video_name_00',
'clip_name': 'clip_name_00'
},
{
'data': np.empty([n_frames_01, 208]),
'video_name': 'video_name_00',
'clip_name': 'clip_name_01'
},
]
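A small sketch of consuming one such output item, relying only on the list-of-dicts format shown above (how the dataset itself is constructed is not shown here):

import numpy as np

def stack_video_clips(video_items: list[dict]) -> np.ndarray:
    # concatenate all clips of one video along the time axis -> (total number of frames, 208)
    return np.concatenate([item["data"] for item in video_items], axis=0)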
Description:
- Prediction script for demo
from predict_pose import predict_pose, create_mediapipe_models
# load models:
# hand_landmarker.task
# pose_landmarker_full.task
# face_landmarker.task
checkpoint_folder = ""
models = create_mediapipe_models(checkpoint_folder)
# predict
video = []  # list of video frames
prediction = predict_pose(video, models, sign_space=4, yolo_sign_space=4)
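Frames could be loaded, for example, with decord (the optional dependency from the install step); a sketch that assumes predict_pose accepts a list of RGB frame arrays and uses a hypothetical clip path:

from decord import VideoReader

vr = VideoReader("data/clips/example.mp4")         # hypothetical clip path
video = [vr[i].asnumpy() for i in range(len(vr))]  # RGB frames as numpy arrays
prediction = predict_pose(video, models, sign_space=4, yolo_sign_space=4)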