
Regarding codebook #8

Closed
SaiChandra3030 opened this issue Jul 3, 2023 · 10 comments
Labels
inference, Visualization

Comments

@SaiChandra3030

Hey, this is a fantastic repo I found during my research over the last few weeks. I am trying to understand some parts of your code; could you please help me with the questions below?

  1. The codebook is missing. Will it be generated after training the model? I have looked through the code, but it is not written there.
  2. Can I use the same codebook that is present here: CODEBOOK?
  3. After getting the BVH output, is there any way to convert it into a human avatar image?

Waiting for the solution :)

Thanks
Sai

@YoungSeng
Owner

YoungSeng commented Jul 4, 2023

Dear Sai,

Sorry for the confusing code. You should use sample.py rather than inference.py; I have deleted main/mydiffusion_zeggs/inference.py. Also, this work does not use a codebook.

Best wishes.

@SaiChandra3030
Author

SaiChandra3030 commented Jul 4, 2023

Hi YoungSeng, thanks for your reply.

Following your suggestion, I started experimenting with sample.py:

  1. It worked fine with a file named in the format 015_Happy_4_x_1_0.wav.
  2. When I tried a plain name like `1.wav`, sample.py threw the error below:
Traceback (most recent call last):
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 418, in <module>
    main(config, save_dir, config.model_path, audio_path=None, mfcc_path=None, audiowavlm_path=config.audiowavlm_path, max_len=config.max_len)
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 378, in main
    style = style2onehot[audiowavlm_path.split('/')[-1].split('_')[1]]
IndexError: list index out of range

Is there a particular format the input file name needs to follow? Could you please help me with this?

Regarding Input Format

In what format do we need to send the input, and what size and shape should the input file have? Could you please help with this as well?

Thanks
Sai

@YoungSeng
Owner

Dear Sai,

The code is a hard-coded demo. If you want to use your own audio, you can comment out

style = style2onehot[audiowavlm_path.split('/')[-1].split('_')[1]]

and uncomment any of the following lines

# style = [0, 0, 1, 0, 0, 0]
# style = style2onehot['Neutral']

to choose your own style and intensity, as defined in

style2onehot = {
    'Happy':   [1, 0, 0, 0, 0, 0],
    'Sad':     [0, 1, 0, 0, 0, 0],
    'Neutral': [0, 0, 1, 0, 0, 0],
    'Old':     [0, 0, 0, 1, 0, 0],
    'Angry':   [0, 0, 0, 0, 1, 0],
    'Relaxed': [0, 0, 0, 0, 0, 1],
}

Hope this will help you!
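Putting the two changes together, here is a minimal defensive sketch of the style selection (the helper `pick_style` and its fallback behaviour are my own illustration, not the repo's exact code): it reads the style from a ZEGGS-style filename when present and otherwise falls back to a fixed style such as 'Neutral', avoiding the IndexError on plain names like `1.wav`.

```python
# Hypothetical sketch, not the repo's exact code.
style2onehot = {
    'Happy':   [1, 0, 0, 0, 0, 0],
    'Sad':     [0, 1, 0, 0, 0, 0],
    'Neutral': [0, 0, 1, 0, 0, 0],
    'Old':     [0, 0, 0, 1, 0, 0],
    'Angry':   [0, 0, 0, 0, 1, 0],
    'Relaxed': [0, 0, 0, 0, 0, 1],
}

def pick_style(audiowavlm_path, default='Neutral'):
    # Files named like '015_Happy_4_x_1_0.wav' carry the style in field 1;
    # plain names like '1.wav' do not, so use the default instead.
    parts = audiowavlm_path.split('/')[-1].split('_')
    if len(parts) > 1 and parts[1] in style2onehot:
        return style2onehot[parts[1]]
    return style2onehot[default]

print(pick_style('015_Happy_4_x_1_0.wav'))  # [1, 0, 0, 0, 0, 0]
print(pick_style('1.wav'))                  # [0, 0, 1, 0, 0, 0]
```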

@SaiChandra3030
Author

Hi YoungSeng, thanks for your time and reply.

I am now facing a shape error. Could you please tell me the required shape and size of the input file?

Traceback (most recent call last):
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 420, in <module>
    main(config, save_dir, config.model_path, audio_path=None, mfcc_path=None, audiowavlm_path=config.audiowavlm_path, max_len=config.max_len)
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 384, in main
    inference(args, wavlm_model, mfcc, sample_fn, model, n_frames=max_len, smoothing=True, SG_filter=True, minibatch=True, skip_timesteps=0, style=style, seed=123456)      # style2onehot['Happy']
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 233, in inference
    audio_reshape = torch.from_numpy(audio).to(torch.float32).reshape(num_subdivision, int(stride_poses * 16000 / 20)).to(mydevice).transpose(0, 1)       # mfcc[:, :-2]
RuntimeError: shape '[4, 64000]' is invalid for input of size 237867

Looking forward :)

Model file: './model000450000.pt'

Thanks
Sai

@YoungSeng
Owner

It seems to be a problem with the shape of the audio. Did you set a max_len greater than the length of the real audio? You may try setting max_len to 0. If you still have this problem, please upload the audio file and I will check it.
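For intuition about the error message: the failing reshape expects the audio array to contain an exact multiple of 64000 samples (4 subdivisions × 64000 in the traceback), and 237867 is not such a multiple. A sketch of the constraint, with zero-padding of the tail as one way to satisfy it (the function name and padding strategy are my assumptions, not the repo's code):

```python
import numpy as np

def pad_to_chunks(audio, chunk_len=64000):
    # The model consumes fixed-size windows of chunk_len samples, so the
    # audio is padded with zeros up to the next whole multiple before the
    # reshape that fails in the traceback above.
    num_subdivision = int(np.ceil(len(audio) / chunk_len))
    padded = np.zeros(num_subdivision * chunk_len, dtype=np.float32)
    padded[:len(audio)] = audio
    return padded.reshape(num_subdivision, chunk_len)

audio = np.zeros(237867, dtype=np.float32)  # the length from the traceback
print(pad_to_chunks(audio).shape)  # (4, 64000)
```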

@SaiChandra3030
Author

Hi YoungSeng, thanks for your time.

  1. I ran the code and the BVH file was generated in "./sample_dir". Is there any way to convert it into an mkv video?
  2. I would like to convert the BVH directly into a rendered mp4 video of a person. Is that possible, and could you outline the process? I will work on it.

Thanks
Sai

@YoungSeng
Owner

Hey Sai,

In practice, I highly recommend using Blender to visualize the BVH. Similar software includes Maya and MotionBuilder; I have tried them and found Blender friendlier. You can easily import audio, render video, or even write a script like Trimodal.

You can also get a video of the skeleton in Python. Please refer to this issue.

There are also some repositories for visualization that you can try, such as PyMO, npybvh, and Python_BVH_viewer, although I don't particularly recommend them.

Good luck!

@SaiChandra3030
Author

Hi YoungSeng,

I have tried a lot, but I cannot figure out how to convert this BVH file into a 3D video with audio; I need a little help. Is there any repo, model, or code that does what I need?

Thanks
Sai

@YoungSeng
Owner

I recommend the method I use:

  • Download Blender (it is free!) and install it.
  • Import the .bvh file and you can play it (screenshot in original comment).
  • For rendering, set some parameters (screenshot).
  • Then render (screenshot).
  • To add audio (screenshot).
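For the last step, one common alternative to adding audio inside Blender (an assumption of mine, not a step from the screenshots above) is to mux the original .wav onto the rendered video with ffmpeg. A sketch that builds the command, copying the video stream and encoding the audio as AAC; the file names are placeholders:

```python
import subprocess

def mux_audio(video='render.mp4', audio='1.wav', out='out.mp4'):
    # -c:v copy keeps the rendered video untouched; -shortest trims the
    # output to the shorter of the two streams.
    cmd = ['ffmpeg', '-y', '-i', video, '-i', audio,
           '-c:v', 'copy', '-c:a', 'aac', '-shortest', out]
    # subprocess.run(cmd, check=True)  # uncomment to actually run ffmpeg
    return cmd

print(' '.join(mux_audio()))
```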

@YoungSeng YoungSeng added the Visualization and inference labels Jul 6, 2023
@sysu19351118

I also encountered this problem. My audio is about 2 seconds long and I set max_len=0, but I still get this error:
Traceback (most recent call last):
  File "sample.py", line 442, in <module>
    main(config, save_dir, config.model_path, audio_path=None, mfcc_path=None, audiowavlm_path=config.audiowavlm_path, max_len=config.max_len)
  File "sample.py", line 406, in main
    inference(args, wavlm_model, mfcc, sample_fn, model, n_frames=max_len, smoothing=True, SG_filter=True, minibatch=True, skip_timesteps=0, style=style, seed=123456)      # style2onehot['Happy']
  File "sample.py", line 237, in inference
    audio_reshape = torch.from_numpy(audio).to(torch.float32).reshape(num_subdivision, int(stride_poses * 16000 / 20)).to(mydevice).transpose(0, 1)       # mfcc[:, :-2]
RuntimeError: shape '[4, 64000]' is invalid for input of size 36480
