Research Guide for Video Frame Interpolation with Deep Learning

In this research guide, we’ll look at deep learning papers aimed at synthesizing video frames within an existing video.

By Derrick Mwiti, Data Scientist on October 15, 2019 in Computer Vision, Deep Learning, Neural Networks, Video recognition


In this research guide, we’ll look at deep learning papers aimed at synthesizing video frames within an existing video. This could be in between video frames, known as interpolation, or after them, known as extrapolation.

Most of this guide covers interpolation. Interpolation is useful in video editing software as well as in generating video animations. It can also be used to generate clear video frames in sections where a video is blurred.

Video frame interpolation is a very common task, especially in film and video production. Optical flow is one of the common techniques used to solve this problem. Optical flow estimation is the process of estimating the motion of each pixel across a sequence of frames. In this guide, we’ll look at advanced methods of video frame interpolation using deep learning techniques.
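To make the role of optical flow concrete, here is a minimal sketch of flow-based interpolation. It is not taken from any of the papers in this guide. It assumes a dense flow field is already available at the middle frame, that motion between the two frames is linear, and that nearest-neighbour sampling is good enough. The function name `interpolate_midframe` and the argument `flow_mid_to_1` are hypothetical.

```python
import numpy as np

def interpolate_midframe(frame0, frame1, flow_mid_to_1):
    """Blend two frames into a middle frame using a per-pixel flow field.

    frame0, frame1: float arrays of shape (H, W) or (H, W, C).
    flow_mid_to_1:  array of shape (H, W, 2) holding (dx, dy), the assumed
                    displacement from each middle-frame pixel to frame1.
                    Under linear motion, the displacement to frame0 is the
                    negative of that vector.
    """
    h, w = frame0.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    dx = flow_mid_to_1[..., 0]
    dy = flow_mid_to_1[..., 1]

    # Nearest-neighbour sample positions, clamped to the image borders.
    x0 = np.clip(np.rint(xs - dx).astype(int), 0, w - 1)
    y0 = np.clip(np.rint(ys - dy).astype(int), 0, h - 1)
    x1 = np.clip(np.rint(xs + dx).astype(int), 0, w - 1)
    y1 = np.clip(np.rint(ys + dy).astype(int), 0, h - 1)

    # Average the two motion-compensated samples.
    return 0.5 * (frame0[y0, x0] + frame1[y1, x1])

# Toy example: a bright pixel moves from x=2 to x=4, so it should
# appear at x=3 in the middle frame.
f0 = np.zeros((1, 6)); f0[0, 2] = 1.0
f1 = np.zeros((1, 6)); f1[0, 4] = 1.0
flow = np.zeros((1, 6, 2)); flow[..., 0] = 1.0
mid = interpolate_midframe(f0, f1, flow)
```

A real system would estimate the flow itself, use bilinear sampling, and handle occlusions, which is where much of the difficulty lies.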

Video Frame Interpolation via Adaptive Separable Convolution (ICCV, 2017)

In this paper, the authors propose a deep fully convolutional neural network that takes two input frames and estimates pairs of 1D kernels for all pixels. The method is capable of estimating kernels and synthesizing the entire video frame at once. This makes it possible to incorporate a perceptual loss when training the neural network, in order to produce visually appealing frames.


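The paper introduces a spatially-adaptive separable convolution technique that aims to interpolate a new frame in the middle of two video frames. For each output pixel, a convolution-based interpolation method estimates a pair of 2D convolution kernels, one per input frame, and convolves them with the two frames to compute the color of the output pixel. Because estimating large 2D kernels for every pixel is expensive, this method approximates each 2D kernel with a pair of 1D kernels, one vertical and one horizontal.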
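The per-pixel computation can be sketched as follows. This is an illustrative NumPy version, not the authors' implementation. It assumes grayscale frames, odd kernel length `n`, and edge padding, and it applies the kernels as a local weighted sum without flipping them. The function and argument names are hypothetical. Each 2D kernel is the outer product of a vertical and a horizontal 1D kernel, so a kernel costs 2n numbers per pixel instead of n².

```python
import numpy as np

def separable_interpolate(frame1, frame2, k1v, k1h, k2v, k2h):
    """Synthesize a frame from two inputs with per-pixel separable kernels.

    frame1, frame2: float arrays of shape (H, W).
    k1v, k1h:       per-pixel vertical/horizontal 1D kernels for frame1,
                    each of shape (H, W, n) with n odd.
    k2v, k2h:       the same for frame2.
    """
    h, w = frame1.shape
    n = k1v.shape[-1]
    r = n // 2
    p1 = np.pad(frame1, r, mode="edge")
    p2 = np.pad(frame2, r, mode="edge")

    out = np.empty((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            patch1 = p1[y:y + n, x:x + n]  # n x n patch centered on (y, x)
            patch2 = p2[y:y + n, x:x + n]
            # v @ patch @ h equals sum(patch * outer(v, h)), i.e. applying
            # the 2D kernel outer(v, h) without ever building it.
            out[y, x] = (k1v[y, x] @ patch1 @ k1h[y, x]
                         + k2v[y, x] @ patch2 @ k2h[y, x])
    return out

# Toy check: kernels that pick the center pixel with weight 0.5 per frame
# reduce the method to a plain average of the two frames.
rng = np.random.default_rng(0)
a, b = rng.random((4, 5)), rng.random((4, 5))
center = np.zeros((4, 5, 3)); center[..., 1] = 1.0
blended = separable_interpolate(a, b, center, 0.5 * center, center, 0.5 * center)
```

Shifting the non-zero tap away from the center makes a kernel sample a displaced pixel, which is how the kernels can encode motion as well as re-sampling.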

The pixel-dependent kernels capture both the motion and the re-sampling information required for interpolation. Four sets of 1D kernels are estimated by directing the information flow into four sub-networks, each of which estimates one kernel. Rectified Linear Units (ReLU) are used with the 3x3 convolutional layers.
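The four-branch layout can be illustrated with a heavily simplified sketch. It assumes a shared feature map has already been computed, and it gives each sub-network just two 3x3 convolutions with one ReLU in between. The real network's depth, widths, and weights are not described here, so every name and size below (`init_head`, `run_head`, `estimate_kernels`, the channel counts) is a hypothetical stand-in.

```python
import numpy as np

KERNEL_NAMES = ("k1_vertical", "k1_horizontal", "k2_vertical", "k2_horizontal")

def conv3x3(x, weight, bias):
    """Zero-padded 3x3 convolution. x: (H, W, Cin), weight: (3, 3, Cin, Cout)."""
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, weight.shape[-1])) + bias
    for dy in range(3):
        for dx in range(3):
            out += xp[dy:dy + h, dx:dx + w] @ weight[dy, dx]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def init_head(rng, c_in, n):
    """Random weights for one sub-network (assumed sizes, untrained)."""
    return {
        "w1": rng.normal(0.0, 0.1, (3, 3, c_in, c_in)), "b1": np.zeros(c_in),
        "w2": rng.normal(0.0, 0.1, (3, 3, c_in, n)), "b2": np.zeros(n),
    }

def run_head(features, params):
    """One sub-network: 3x3 conv + ReLU, then a 3x3 conv to n kernel taps."""
    hidden = relu(conv3x3(features, params["w1"], params["b1"]))
    return conv3x3(hidden, params["w2"], params["b2"])

def estimate_kernels(features, heads):
    """Send the shared features through four heads, one per 1D kernel."""
    return {name: run_head(features, heads[name]) for name in KERNEL_NAMES}

# Usage: 8 feature channels in, four (H, W, 5) kernel maps out.
rng = np.random.default_rng(0)
heads = {name: init_head(rng, 8, 5) for name in KERNEL_NAMES}
kernels = estimate_kernels(rng.random((4, 6, 8)), heads)
```

Each head has its own weights, so the four outputs can differ even though they read the same features.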


Video Frame Interpolation via Adaptive Convolution (CVPR 2017)

Video Frame Synthesis using Deep Voxel Flow (ICCV 2017)

Long-Term Video Interpolation with Bidirectional Predictive Network (2017)

PhaseNet for Video Frame Interpolation (CVPR 2018)

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation (CVPR 2018)

Depth-Aware Video Frame Interpolation (CVPR 2019)

Frame Interpolation with Multi-Scale Deep Loss Functions and Generative Adversarial Networks (2019)

Conclusion
