I'm currently trying to implement the following paper: https://research.nvidia.com/sites/default/files/publications/dnn_denoise_author.pdf
My data has the following shape: (7, 512, 512, 1), where 7 is the number of frames in the sequence, 512 is the width and height of each image, and 1 is the number of channels.
My question is: during training, is it better to feed the convolutional + recurrent network the full sequence of images at once, or to feed the frames of a sequence one by one?
I've already tried the first approach, but the results don't look very good. Is this the "correct" way of processing image sequences, or do you have any advice?
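To make the question concrete, here is a rough sketch of the two options as I understand them, written with tf.keras (TF 2.x). The ConvLSTM2D/Conv2D layers and filter sizes are just placeholders I picked for illustration, not the paper's actual architecture:

```python
# Rough sketch of the two feeding strategies (tf.keras, TF 2.x).
# Layers and filter sizes are placeholders, not the paper's architecture.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, H, W, C = 7, 512, 512, 1

# Option 1: feed the whole sequence at once.
# The recurrent layer unrolls over the time axis internally, so
# backprop-through-time covers all 7 frames of the sequence.
full_seq_model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, C)),
    layers.TimeDistributed(layers.Conv2D(8, 3, padding="same", activation="relu")),
    layers.ConvLSTM2D(8, 3, padding="same", return_sequences=True),
    layers.TimeDistributed(layers.Conv2D(C, 3, padding="same")),
])

x = np.random.rand(1, SEQ_LEN, H, W, C).astype("float32")  # batch dim in front
y_full = full_seq_model(x)                                  # (1, 7, 512, 512, 1)

# Option 2: feed one frame at a time and carry the recurrent state
# between calls (stateful RNN), so each call only sees a single frame.
frame_model = models.Sequential([
    layers.Input(shape=(1, H, W, C), batch_size=1),
    layers.TimeDistributed(layers.Conv2D(8, 3, padding="same", activation="relu")),
    layers.ConvLSTM2D(8, 3, padding="same", return_sequences=True, stateful=True),
    layers.TimeDistributed(layers.Conv2D(C, 3, padding="same")),
])

frame_model.reset_states()  # clear the hidden state before each new sequence
outputs = [frame_model(x[:, t:t + 1]) for t in range(SEQ_LEN)]
y_frames = tf.concat(outputs, axis=1)                       # (1, 7, 512, 512, 1)
```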
Thank you for your time!