fvd compute question #10

Open
Costwen opened this issue Nov 15, 2022 · 1 comment

Comments

@Costwen

Costwen commented Nov 15, 2022

In StyleGAN-V, they resize the input images to 128x128 to compute the FVD metric.
But the official FVD metric uses 224x224 as input.
What input image size do you use in your work? It feels like everyone handles this differently. Thank you!

@JunyaoHu

JunyaoHu commented Mar 21, 2023

Hello, the following is my understanding. The treatment is indeed inconsistent, and that is a potential problem. However, the resolution used to compute metrics such as FVD can probably just be aligned with the resolution the dataset is usually used at. In this work, the datasets are read at their native resolutions: 64 for SMMNIST, 64 for KTH, 64 for BAIR, and 128 for Cityscapes.

Looking at the code, the videos are not rescaled after loading when computing FVD, SSIM and PSNR. Only the LPIPS computation rescales, down to 128, and 128 is really only relevant for Cityscapes. The paper only reports LPIPS on Cityscapes, but judging from the files packaged with the released checkpoints, LPIPS was in fact computed for all datasets; perhaps because the code rescales everything to 128, the SMMNIST, KTH and BAIR numbers were not reported in the paper. It is also worth noting that the sample in the official LPIPS README uses 64 instead. So I think the resolution used by the evaluation functions simply follows the common resolution of each dataset.



Config excerpts (the image_size each dataset is read at):

image_size: 64

image_size: 128

image_size: 64

import cv2

def read_video(video, image_size):
    cap = cv2.VideoCapture(video)
    frames = []
    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()
        # if frame is read correctly ret is True
        if not ret:
            # print("Can't receive frame (stream end?). Exiting ...")
            break
        # Our operations on the frame come here
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        image = cv2.resize(gray, (image_size, image_size))
        frames.append(image)
    cap.release()
    return frames
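
A hypothetical usage example (the file name is a placeholder): frames come back at the configured image_size, and no further rescaling is applied before FVD/SSIM/PSNR are computed.

import numpy as np

# read a clip at its dataset's native resolution, e.g. 64 for SMMNIST/KTH/BAIR
frames = read_video("example_clip.avi", image_size=64)
video = np.stack(frames)  # (T, 64, 64) greyscale uint8 frames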

def to_i3d(x):
    x = x.reshape(x.shape[0], -1, self.config.data.channels, self.config.data.image_size, self.config.data.image_size)
    if self.config.data.channels == 1:
        x = x.repeat(1, 1, 3, 1, 1)  # hack for greyscale images
    x = x.permute(0, 2, 1, 3, 4)  # BTCHW -> BCTHW
    return x
if (calc_fvd1 or (calc_fvd3 and not second_calc)) and real.shape[1] >= pred.shape[1]:
    # real
    if future == 0:
        real_fvd = torch.cat([
            cond_original[:, :self.config.data.num_frames_cond*self.config.data.channels],
            real
        ], dim=1)[::preds_per_test]  # Ignore the repeated ones
    else:
        real_fvd = torch.cat([
            cond_original[:, :self.config.data.num_frames_cond*self.config.data.channels],
            real,
            cond_original[:, -future*self.config.data.channels:]
        ], dim=1)[::preds_per_test]  # Ignore the repeated ones
    real_fvd = to_i3d(real_fvd)
    real_embeddings.append(get_fvd_feats(real_fvd, i3d=i3d, device=self.config.device))
    # fake
    if future == 0:
        fake_fvd = torch.cat([
            cond_original[:, :self.config.data.num_frames_cond*self.config.data.channels],
            pred
        ], dim=1)
    else:
        fake_fvd = torch.cat([
            cond_original[:, :self.config.data.num_frames_cond*self.config.data.channels],
            pred,
            cond_original[:, -future*self.config.data.channels:]
        ], dim=1)
    fake_fvd = to_i3d(fake_fvd)
    fake_embeddings.append(get_fvd_feats(fake_fvd, i3d=i3d, device=self.config.device))
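
For completeness, the FVD score itself is the Fréchet distance between Gaussians fitted to the real and fake I3D features. Below is a minimal sketch, not this repository's implementation, assuming real_embeddings and fake_embeddings have been converted to NumPy arrays of shape (N, D) (e.g. via .detach().cpu().numpy()):

import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    # fit a Gaussian (mean, covariance) to each set of I3D features
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    # Frechet distance between the two Gaussians
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_f, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))

feats_real = np.concatenate(real_embeddings, axis=0)  # (N, D)
feats_fake = np.concatenate(fake_embeddings, axis=0)  # (N, D)
fvd = frechet_distance(feats_real, feats_fake)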

pred_ij_pil = Transforms.ToPILImage()(pred_ij).convert("RGB")
real_ij_pil = Transforms.ToPILImage()(real_ij).convert("RGB")

# SSIM
pred_ij_np_grey = np.asarray(pred_ij_pil.convert('L'))
real_ij_np_grey = np.asarray(real_ij_pil.convert('L'))
if self.config.data.dataset.upper() == "STOCHASTICMOVINGMNIST" or self.config.data.dataset.upper() == "MOVINGMNIST":
    # ssim is the only metric extremely sensitive to gray being compared to b/w
    pred_ij_np_grey = np.asarray(Transforms.ToPILImage()(torch.round(pred_ij)).convert("RGB").convert('L'))
    real_ij_np_grey = np.asarray(Transforms.ToPILImage()(torch.round(real_ij)).convert("RGB").convert('L'))
avg_ssim += ssim(pred_ij_np_grey, real_ij_np_grey, data_range=255, gaussian_weights=True, use_sample_covariance=False)

# Calculate LPIPS
pred_ij_LPIPS = T2(pred_ij_pil).unsqueeze(0).to(self.config.device)
real_ij_LPIPS = T2(real_ij_pil).unsqueeze(0).to(self.config.device)
avg_distance += model_lpips.forward(real_ij_LPIPS, pred_ij_LPIPS)

T2 = Transforms.Compose([Transforms.Resize((128, 128)),
                         Transforms.ToTensor(),
                         Transforms.Normalize(mean=(0.5, 0.5, 0.5),
                                              std=(0.5, 0.5, 0.5))])
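
On the LPIPS resolution point: here is a minimal sketch with the official lpips pip package (not this repository's wrapper), just to show that the metric accepts arbitrary spatial sizes; the README example feeds 64x64 tensors in [-1, 1], whereas the code above resizes to 128x128 via T2.

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')   # AlexNet backbone, as in the README example
img0 = torch.zeros(1, 3, 64, 64)    # dummy RGB images in [-1, 1], 64x64 as in the README
img1 = torch.zeros(1, 3, 64, 64)
d64 = loss_fn(img0, img1)           # LPIPS distance at 64x64
d128 = loss_fn(torch.zeros(1, 3, 128, 128), torch.zeros(1, 3, 128, 128))  # also works at 128x128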
