fvd compute question #10

Open
Costwen opened this issue Nov 15, 2022 · 1 comment

Comments

@Costwen

Costwen commented Nov 15, 2022

In StyleGAN-V, they resize the input images to 128x128 to compute the FVD metric.
But the official FVD metric uses 224x224 as input.
What input image size do you use in your work? It feels like everyone handles this differently. Thank you!

@JunyaoHu

JunyaoHu commented Mar 21, 2023

Hello, the following is my understanding. The treatment is indeed inconsistent, and that is a potential problem. However, the resolution used to compute metrics such as FVD can probably just be aligned with the resolution the dataset is usually used at. In this work, the datasets are read at their native resolutions: 64 for SMMNIST, 64 for KTH, 64 for BAIR, and 128 for Cityscapes.

Looking at the code, the videos are not rescaled after loading when computing FVD, SSIM and PSNR. Only the LPIPS computation rescales, down to 128, and 128 is really only relevant for Cityscapes. The paper only reports LPIPS on Cityscapes, but judging from the files packaged with the released checkpoints, LPIPS was in fact computed for all datasets; perhaps because the code rescales everything to 128, the SMMNIST, KTH and BAIR numbers were not reported in the paper. It is also worth noting that the sample in the official LPIPS README uses 64 instead. So I think the resolution used by the evaluation functions simply follows the common resolution of each dataset.



Config excerpts (the image_size each dataset is read at):

image_size: 64

image_size: 128

image_size: 64

import cv2

def read_video(video, image_size):
    cap = cv2.VideoCapture(video)
    frames = []
    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()
        # if frame is read correctly ret is True
        if not ret:
            # print("Can't receive frame (stream end?). Exiting ...")
            break
        # Our operations on the frame come here
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        image = cv2.resize(gray, (image_size, image_size))
        frames.append(image)
    cap.release()
    return frames
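
A hypothetical usage example (the file name is a placeholder): frames come back at the configured image_size, and no further rescaling is applied before FVD/SSIM/PSNR are computed.

import numpy as np

# read a clip at its dataset's native resolution, e.g. 64 for SMMNIST/KTH/BAIR
frames = read_video("example_clip.avi", image_size=64)
video = np.stack(frames)  # (T, 64, 64) greyscale uint8 frames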

def to_i3d(x):
    x = x.reshape(x.shape[0], -1, self.config.data.channels, self.config.data.image_size, self.config.data.image_size)
    if self.config.data.channels == 1:
        x = x.repeat(1, 1, 3, 1, 1)  # hack for greyscale images
    x = x.permute(0, 2, 1, 3, 4)  # BTCHW -> BCTHW
    return x
if (calc_fvd1 or (calc_fvd3 and not second_calc)) and real.shape[1] >= pred.shape[1]:
    # real
    if future == 0:
        real_fvd = torch.cat([
            cond_original[:, :self.config.data.num_frames_cond*self.config.data.channels],
            real
        ], dim=1)[::preds_per_test]  # Ignore the repeated ones
    else:
        real_fvd = torch.cat([
            cond_original[:, :self.config.data.num_frames_cond*self.config.data.channels],
            real,
            cond_original[:, -future*self.config.data.channels:]
        ], dim=1)[::preds_per_test]  # Ignore the repeated ones
    real_fvd = to_i3d(real_fvd)
    real_embeddings.append(get_fvd_feats(real_fvd, i3d=i3d, device=self.config.device))
    # fake
    if future == 0:
        fake_fvd = torch.cat([
            cond_original[:, :self.config.data.num_frames_cond*self.config.data.channels],
            pred
        ], dim=1)
    else:
        fake_fvd = torch.cat([
            cond_original[:, :self.config.data.num_frames_cond*self.config.data.channels],
            pred,
            cond_original[:, -future*self.config.data.channels:]
        ], dim=1)
    fake_fvd = to_i3d(fake_fvd)
    fake_embeddings.append(get_fvd_feats(fake_fvd, i3d=i3d, device=self.config.device))
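
For completeness, the FVD score itself is the Fréchet distance between Gaussians fitted to the real and fake I3D features. Below is a minimal sketch, not this repository's implementation, assuming real_embeddings and fake_embeddings have been converted to NumPy arrays of shape (N, D) (e.g. via .detach().cpu().numpy()):

import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    # fit a Gaussian (mean, covariance) to each set of I3D features
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    # Frechet distance between the two Gaussians
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_f, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))

feats_real = np.concatenate(real_embeddings, axis=0)  # (N, D)
feats_fake = np.concatenate(fake_embeddings, axis=0)  # (N, D)
fvd = frechet_distance(feats_real, feats_fake)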

pred_ij_pil = Transforms.ToPILImage()(pred_ij).convert("RGB")
real_ij_pil = Transforms.ToPILImage()(real_ij).convert("RGB")

# SSIM
pred_ij_np_grey = np.asarray(pred_ij_pil.convert('L'))
real_ij_np_grey = np.asarray(real_ij_pil.convert('L'))
if self.config.data.dataset.upper() == "STOCHASTICMOVINGMNIST" or self.config.data.dataset.upper() == "MOVINGMNIST":
    # ssim is the only metric extremely sensitive to gray being compared to b/w
    pred_ij_np_grey = np.asarray(Transforms.ToPILImage()(torch.round(pred_ij)).convert("RGB").convert('L'))
    real_ij_np_grey = np.asarray(Transforms.ToPILImage()(torch.round(real_ij)).convert("RGB").convert('L'))
avg_ssim += ssim(pred_ij_np_grey, real_ij_np_grey, data_range=255, gaussian_weights=True, use_sample_covariance=False)

# Calculate LPIPS
pred_ij_LPIPS = T2(pred_ij_pil).unsqueeze(0).to(self.config.device)
real_ij_LPIPS = T2(real_ij_pil).unsqueeze(0).to(self.config.device)
avg_distance += model_lpips.forward(real_ij_LPIPS, pred_ij_LPIPS)

T2 = Transforms.Compose([Transforms.Resize((128, 128)),
                         Transforms.ToTensor(),
                         Transforms.Normalize(mean=(0.5, 0.5, 0.5),
                                              std=(0.5, 0.5, 0.5))])
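
On the LPIPS resolution point: here is a minimal sketch with the official lpips pip package (not this repository's wrapper), just to show that the metric accepts arbitrary spatial sizes; the README example feeds 64x64 tensors in [-1, 1], whereas the code above resizes to 128x128 via T2.

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')   # AlexNet backbone, as in the README example
img0 = torch.zeros(1, 3, 64, 64)    # dummy RGB images in [-1, 1], 64x64 as in the README
img1 = torch.zeros(1, 3, 64, 64)
d64 = loss_fn(img0, img1)           # LPIPS distance at 64x64
d128 = loss_fn(torch.zeros(1, 3, 128, 128), torch.zeros(1, 3, 128, 128))  # also works at 128x128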
