different setup of input_hint_block compared to paper? #698

liren-jin · 2024-08-21T14:09:42Z

Hi, I noticed that the implementation of the tiny network that converts control images into feature space differs from the structure mentioned in the paper: "In particular, we use a tiny network E(·) of four convolution layers with 4 × 4 kernels and 2 × 2 strides (activated by ReLU, using 16, 32, 64, 128, channels respectively". The corresponding implementation should be here, right? (Correct me if I am wrong.)

ControlNet/cldm/cldm.py

Lines 147 to 163 in ed85cd1

    
    self.input_hint_block = TimestepEmbedSequential(
        conv_nd(dims, hint_channels, 16, 3, padding=1),
        nn.SiLU(),
        conv_nd(dims, 16, 16, 3, padding=1),
        nn.SiLU(),
        conv_nd(dims, 16, 32, 3, padding=1, stride=2),
        nn.SiLU(),
        conv_nd(dims, 32, 32, 3, padding=1),
        nn.SiLU(),
        conv_nd(dims, 32, 96, 3, padding=1, stride=2),
        nn.SiLU(),
        conv_nd(dims, 96, 96, 3, padding=1),
        nn.SiLU(),
        conv_nd(dims, 96, 256, 3, padding=1, stride=2),
        nn.SiLU(),
        zero_module(conv_nd(dims, 256, model_channels, 3, padding=1))
    )
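To make the difference concrete, here is a rough sketch that traces channel counts and spatial sizes through both stacks using the standard convolution output-size formula. The layer settings for the implemented block are copied from the snippet above. Everything else is an assumption for illustration only: the 512-pixel input, `model_channels = 320`, and `padding=1` for the paper's 4 × 4 convolutions (the quoted sentence does not state a padding). The helper names `conv_out` and `trace` are hypothetical and not part of the repository.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Standard convolution output-size formula (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(layers, size):
    """Return (out_channels, spatial_size) after each conv layer."""
    shapes = []
    for out_channels, kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        shapes.append((out_channels, size))
    return shapes


MODEL_CHANNELS = 320  # assumption: illustrative value, not taken from the issue
INPUT_SIZE = 512      # assumption: illustrative control-image resolution

# (out_channels, kernel, stride, padding), copied from input_hint_block above.
IMPLEMENTED = [
    (16, 3, 1, 1), (16, 3, 1, 1),
    (32, 3, 2, 1), (32, 3, 1, 1),
    (96, 3, 2, 1), (96, 3, 1, 1),
    (256, 3, 2, 1), (MODEL_CHANNELS, 3, 1, 1),
]

# Four 4x4, stride-2 convs as in the quoted sentence; padding=1 is an assumption.
PAPER = [(16, 4, 2, 1), (32, 4, 2, 1), (64, 4, 2, 1), (128, 4, 2, 1)]

implemented_shapes = trace(IMPLEMENTED, INPUT_SIZE)
paper_shapes = trace(PAPER, INPUT_SIZE)

print("implemented:", implemented_shapes)
print("paper quote:", paper_shapes)
```

Under these assumptions, the implemented block has eight 3 × 3 convolutions (three of them strided, SiLU activations) and downsamples by 8×, while the four strided layers in the quoted description would downsample by 16× and end at 128 channels.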
