Stopping the flicker when you restyle a video frame by frame

A developer describes a technique to eliminate flicker when applying diffusion-based restyling to video frames. The approach stylizes only a sparse set of keyframes and uses optical flow to warp those stylized frames to fill the gaps, ensuring temporal consistency. The implementation includes scene detection, keyframe selection, and a VideoSequence class for managing frame sequences.

Run a diffusion restyle on every frame of a clip, one frame at a time, and the still images look great. Then you play them back and the whole thing boils. Textures crawl, colors pulse, a brick wall shifts its grout lines every frame. The model did nothing wrong on any single frame. It just made a slightly different choice each time, and at 24 frames a second your eye reads those differences as flicker.

This is a walkthrough of the code that kills that flicker. The trick is to stop restyling every frame. Stylize a few frames, then carry the style to the rest by following the motion. The interesting part is the bookkeeping and the blending that make the carry invisible, so I will spend most of the post there.

[Image: Each frame is a separate sample from the model, so each frame lands in a slightly different place. Played in sequence, those small differences become flicker. Photo: Unsplash.]

A diffusion model starts from noise and walks toward an image. Two frames of a video that look almost identical to you are still two different starting points and two different walks. The model has no memory of what it drew on the previous frame, so it picks a fresh interpretation of "oil painting" or "anime" each time. On a still you never notice. In motion you see the model changing its mind 24 times a second.

You can lower the denoise strength so the model stays closer to the input, but then you barely restyle anything. You can feed the previous frame back in, which helps a little but drifts a lot, because small errors compound from one frame to the next.
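To make the denoise-strength trade-off concrete, here is a minimal sketch of how SDEdit-style img2img pipelines commonly turn a strength value into a schedule. It assumes the arithmetic that Hugging Face diffusers uses in its img2img get_timesteps helper; the post does not say which toolkit it uses, and img2img_schedule is a hypothetical name. The input is noised partway into the schedule, and only the last strength fraction of the steps is denoised, so a low strength leaves most of the input intact.

```python
def img2img_schedule(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (t_start, steps_run) for an SDEdit-style img2img pass.

    Hypothetical helper that mirrors the arithmetic of diffusers' img2img
    get_timesteps: skip the first t_start steps of the schedule (the input
    image stands in for them) and denoise only the remaining steps_run.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Number of steps the model will actually denoise, capped at the total.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Index in the schedule where denoising begins.
    t_start = max(num_inference_steps - init_timestep, 0)
    steps_run = num_inference_steps - t_start
    return t_start, steps_run


if __name__ == "__main__":
    for s in (0.2, 0.5, 0.8, 1.0):
        t_start, run = img2img_schedule(50, s)
        print(f"strength={s:.1f}: skip {t_start} steps, denoise {run} of 50")
```

At 50 steps, strength 0.2 skips 40 steps and denoises only 10, which is why the result stays close to the frame you fed in. Strength 1.0 runs all 50 steps and mostly discards the input, which is where the full restyle (and the per-frame disagreement) comes from.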
The cleaner answer is structural: restyle only a sparse set of frames, and fill the gaps by warping a real stylized frame into place. Warped frames cannot disagree with one another, because they are the same stylized pixels pushed along the motion. This is the idea behind Rerender A Video (Yang et al., SIGGRAPH Asia 2023) and behind EbSynth (Jamriška et al., ACM ToG 2019), and it is what the code below implements.

Keyframes come from scene detection plus a fixed interval. Inside the scene detector:

```python
if interval:
    scene_frames.extend(range(start_frame, end_frame, interval))
    scene_frames.append(end_frame - 1)
```

The default interval is 10. So inside each detected scene you take every tenth frame, starting from the scene's first frame, plus the last frame of the scene, as keyframes. Those are the only frames the diffusion model ever touches. Everything between two keyframes is synthesized by warping, not by the model. Pick the interval too large and motion outruns the warp; pick it too small and you pay for diffusion you did not need. Ten is a reasonable middle for most footage.

Once you commit to "stylize keyframes, propagate the rest," you inherit a filing problem. For every gap between keyframe i and keyframe i+1 you need the right input frames, the right output paths, the right optical-flow files, and the right guide images, in both the forward and backward directions. Get one index off and a frame lands in the wrong folder.

VideoSequence in video_sequence.py is the class that does this filing. It is constructed with the list of keyframes (here called frame_files_with_interval), and it makes one output subdirectory per keyframe:

```python
self.__frame_files_with_interval = [f for f in frame_files_with_interval if ".png" in f]
self.__n_seq = len(self.__frame_files_with_interval)
...
out_subdir = self.__get_out_subdir(frame_file)
out
```