{"slug": "stopping-the-flicker-when-you-restyle-a-video-frame-by-frame", "title": "Stopping the flicker when you restyle a video frame by frame", "summary": "A developer describes a technique to eliminate flicker when applying diffusion-based restyling to video frames. The approach stylizes only a sparse set of keyframes and uses optical flow to warp those stylized frames to fill the gaps, ensuring temporal consistency. The implementation includes scene detection, keyframe selection, and a VideoSequence class for managing frame sequences.", "body_md": "Run a diffusion restyle on every frame of a clip, one frame at a time, and the still images look great. Then you play them back and the whole thing boils. Textures crawl, colors pulse, a brick wall shifts its grout lines every frame. The model did nothing wrong on any single frame. It just made a slightly different choice each time, and at 24 frames a second your eye reads those differences as flicker.\n\nThis is a walkthrough of the code that kills that flicker. The trick is to stop restyling every frame. Stylize a few frames, then carry the style to the rest by following the motion. The interesting part is the bookkeeping and the blending that make the carry invisible, so I will spend most of the post there.\n\n*Each frame is a separate sample from the model, so each frame lands in a slightly different place. Played in sequence, those small differences become flicker. Photo: Unsplash.*\n\nA diffusion model starts from noise and walks toward an image. Two frames of a video that look almost identical to you are still two different starting points and two different walks. The model has no memory of what it drew last frame, so it picks a fresh interpretation of \"oil painting\" or \"anime\" each time. On a still you never notice. In motion you see the model changing its mind 24 times a second.\n\nYou can lower the denoise strength so the model stays close to the input, but then you barely restyle anything. 
You can feed the previous frame back in, which helps a little and drifts a lot. The cleaner answer is structural: restyle only a sparse set of frames, and fill the gaps by warping a real stylized frame into place. A warped frame cannot disagree with itself between neighbors, because it is the same pixels pushed along the motion. This is the idea behind Rerender A Video (Yang et al., SIGGRAPH Asia 2023) and behind EbSynth (Jamriška et al., ACM ToG 2019), and it is what the code below implements.\n\nKeyframes come from scene detection plus a fixed interval. Inside the scene detector:\n\n```\nif interval:\n    scene_frames.extend(range(start_frame, end_frame, interval))\n    scene_frames.append(end_frame - 1)\n```\n\nThe default `interval` is 10. So inside each detected scene you take every tenth frame, plus the last frame of the scene, as keyframes. Those are the only frames the diffusion model ever touches. Everything between two keyframes is going to be synthesized by warping, not by the model. Pick the interval too large and motion outruns the warp; pick it too small and you pay for diffusion you did not need. Ten is a reasonable middle for most footage.\n\nOnce you commit to \"stylize keyframes, propagate the rest,\" you inherit a filing problem. For every gap between keyframe `i` and keyframe `i+1` you need the right input frames, the right output paths, the right optical-flow files, and the right guide images, in both the forward and backward direction. Get one index off and a frame lands in the wrong folder. 
`VideoSequence` in `video_sequence.py` is the class that does this filing.\n\nIt is constructed with the list of keyframes (here called `frame_files_with_interval`) and it makes one output subdirectory per keyframe:\n\n```\nself.__frame_files_with_interval = [f for f in frame_files_with_interval if \".png\" in f]\nself.__n_seq = len(self.__frame_files_with_interval)\n# ...\nout_subdir = self.__get_out_subdir(frame_file)   # out_<keyframe-name>\n```\n\nThe core method is `get_input_sequence`. Given a gap index `i`, it returns the list of input frame paths in that gap:\n\n``` python\ndef get_input_sequence(self, i, is_forward=True):\n    if i + 1 > len(self.__frame_files_with_interval) - 1:\n        # last gap: run from the final keyframe to the true last frame of the video\n        last_input_frame = self.__input_frames[-1]\n        last_interval_frame = self.__frame_files_with_interval[i]\n        if last_input_frame == last_interval_frame:\n            return None\n        else:\n            beg_id = int(last_interval_frame.split(\".\")[0])\n            end_id = int(last_input_frame.split(\".\")[0])\n    else:\n        beg_id = self.get_sequence_beg_id(i)\n        end_id = self.get_sequence_beg_id(i + 1)\n    if is_forward:\n        id_list = list(range(beg_id, end_id))\n    else:\n        id_list = list(range(end_id, beg_id, -1))\n    return [os.path.join(self.__input_dir, self.__input_format % id)\n            for id in id_list if self.__input_format % id in self.__input_frames]\n```\n\nTwo things to notice. First, `beg_id` and `end_id` come straight from the keyframe filenames, which are named by frame number (`%04d.jpg`). The filename is the index. That is why the whole class can do its math on `int(name.split(\".\")[0])` instead of carrying a separate table. Second, the `is_forward` flag reverses the range. 
The pipeline propagates style from the left keyframe rightward, and from the right keyframe leftward, then meets in the middle. The backward pass needs the same frames in reverse, and this one flag gives both.\n\nThe same shape repeats for every artifact the propagation needs:\n\n- `get_output_sequence` builds the destination paths inside the keyframe's `out_` folder.\n- `get_flow_sequence` builds `flow_f_%04d.npy` for forward and `flow_b_%04d.npy` for backward.\n- `get_edge_sequence`, `get_temporal_sequence`, `get_pos_sequence` build the per-frame guide paths in the keyframe's tmp folder.\n\nOne detail worth flagging: the flow lists are one element shorter than the frame lists. There are N frames but only N-1 motions between them:\n\n```\nif is_forward:\n    id_list = list(range(beg_id, end_id - 1))      # forward flows: N-1\nelse:\n    id_list = list(range(end_id, beg_id + 1, -1))  # backward flows: N-1\n```\n\nIf you ever zip flows against frames and get an off-by-one, this is where it comes from. The class is built so the flow list and the warp loop line up.\n\nThe last gap is the awkward one. The final keyframe is rarely the literal last frame of the video, so the code special-cases it: when `i+1` runs past the keyframe list, it uses `self.__input_frames[-1]`, the true last frame, as the end of the gap. Without that branch the tail of every clip would go unstyled.\n\nPropagation here is done EbSynth-style: you give EbSynth a source stylized image and a set of guide channels, and it synthesizes the target frame so it matches the style of the source while respecting the guides. The guides live in `guide.py`. 
Each one answers a different question for the synthesizer.\n\n**The positional guide** answers \"where did this pixel come from?\" It starts from a synthetic image where each pixel encodes its own coordinate as color, then warps that image along the optical flow, frame after frame:\n\n``` python\n@staticmethod\ndef __generate_first_img(H, W):\n    Hs = np.linspace(0, 1, H)\n    Ws = np.linspace(0, 1, W)\n    i, j = np.meshgrid(Hs, Ws, indexing='ij')\n    r = (i * 255).astype(np.uint8)   # row -> red\n    g = (j * 255).astype(np.uint8)   # col -> green\n    b = np.zeros(r.shape)\n    return np.stack((b, g, r), 2)\n```\n\nRed is the row, green is the column. After you warp this map by the flow, a pixel's color tells you which original pixel ended up there. That is a dense, smooth correspondence field, which is exactly what a synthesizer wants so it does not invent new texture in moving regions.\n\n**The temporal guide** answers \"what did the previous stylized frame look like, moved to here?\" It takes the previous stylized frame and warps it forward by the flow:\n\n``` python\ndef get_cmd(self, i, weight) -> str:\n    if i == 0:\n        warped_img = self.stylized_imgs[0]\n    else:\n        prev_img = cv2.imread(self.stylized_imgs[i - 1])\n        warped_img = self.flow_calc.warp(prev_img, self.flows[i - 1], 'nearest').astype(np.uint8)\n        warped_img = cv2.inpaint(warped_img, self.masks[i - 1], 30, cv2.INPAINT_TELEA)\n        cv2.imwrite(self.imgs[i], warped_img)\n    return super().get_cmd(i, weight)\n```\n\nThis is the anti-flicker guide. It pushes the synthesizer to make frame `i` look like frame `i-1` carried along the motion, so the style stays put on a surface as it moves instead of re-rolling every frame.\n\nBoth warps leave holes. Where motion uncovers a region the camera could not see last frame, the warp has no data, and the optical-flow mask marks those pixels. 
The fix is the same in both guides:\n\n```\ncur_img = cv2.inpaint(cur_img, mask, 30, cv2.INPAINT_TELEA)\n```\n\n`cv2.INPAINT_TELEA` fills the disoccluded holes from their surroundings so the guide has no black gaps. A radius of 30 pixels is generous, which suits the smooth guide maps; you do not need sharp inpainting here, just plausible filler.\n\nThe other two guides are simpler. The edge guide runs a Laplacian-style filter so the synthesizer keeps structure aligned to the input:\n\n```\nfilter = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]])\nres = cv2.filter2D(img, -1, filter)\n```\n\nAnd the color guide is just the raw frames, so the synthesizer has the original colors to refer to. Each guide carries a `-weight`, so you can dial how strongly the synthesizer listens to motion versus structure versus color.\n\nNow you have two stylized versions of every in-between frame: one propagated forward from the left keyframe, one propagated backward from the right keyframe. They agree on geometry but rarely on color, because each picked up the tint of a different keyframe along the way. Stack them naively and you get a visible color step. 
`histogram_blend.py` fixes the color before the seam gets stitched.\n\nIt works in Lab color space, which separates lightness from color so you can match tone without muddying brightness:\n\n```\na = cv2.cvtColor(a, cv2.COLOR_BGR2Lab)\nb = cv2.cvtColor(b, cv2.COLOR_BGR2Lab)\n# normalize each to a common mean/std\nt_mean_val = 0.5 * 256\nt_std_val = (1 / 36) * 256\na = histogram_transform(a, a_mean, a_std, t_mean, t_std)\nb = histogram_transform(b, b_mean, b_std, t_mean, t_std)\n# average them, then re-key to the reference frame's statistics\nab = (a * weight1 + b * weight2 - t_mean_val) / 0.5 + t_mean_val\nab = histogram_transform(ab, ab_mean, ab_std, min_error_mean, min_error_std)\n```\n\nThe shape is: push both images to the same neutral mean and standard deviation, average them, then push the average to the statistics of `min_error`, the frame the pipeline trusts most for this position. The two odd constants are a target mean of `0.5 * 256` (mid-gray) and a target std of `(1/36) * 256`. They are just a stable common ground to average in; the final re-keying is what makes the result match a real frame rather than a washed-out midpoint. This is per-channel mean/std transfer, the same idea as classic Reinhard color transfer, done twice.\n\nColor matching makes the two halves agree on average. It does not hide the actual boundary where forward meets backward. For that the pipeline pastes in the gradient domain, the same Poisson Image Editing idea from Pérez, Gangnet and Blake (SIGGRAPH 2003) that photo tools use to drop an object into a new background without a halo.\n\nThe principle: do not copy pixels, copy differences between pixels. Build a target gradient field by taking gradients from image 1 outside the mask and image 2 inside it, then solve for the image whose gradients match that field. 
Seams disappear because you never enforce an absolute pixel value at the boundary, only the slope across it.\n\n``` python\ndef poisson_fusion(blendI, I1, I2, mask, grad_weight=[2.5, 0.5, 0.5]):\n    Iab = cv2.cvtColor(blendI, cv2.COLOR_BGR2LAB).astype(float)\n    Ia  = cv2.cvtColor(I1, cv2.COLOR_BGR2LAB).astype(float)\n    Ib  = cv2.cvtColor(I2, cv2.COLOR_BGR2LAB).astype(float)\n    m = (mask > 0).astype(float)[:, :, np.newaxis]\n\n    # gradient from I1 outside the mask, from I2 inside it\n    gx[:-1] = (Ia[:-1] - Ia[1:]) * (1 - m[:-1]) + (Ib[:-1] - Ib[1:]) * m[:-1]\n    gy[:, :-1] = (Ia[:, :-1] - Ia[:, 1:]) * (1 - m[:, :-1]) + (Ib[:, :-1] - Ib[:, 1:]) * m[:, :-1]\n```\n\nThen for each channel it solves a least-squares system `Ax = b` where `A` stacks the gradient operators and an identity term, and `b` stacks the target gradients and the original intensities:\n\n```\nA = As[i]\nb = np.vstack([im_dx * weight, im_dy * weight, im])\nout = scipy.sparse.linalg.lsqr(A, b)\n```\n\nTwo things make this practical. First, `grad_weight=[2.5, 0.5, 0.5]` weights the L (lightness) channel five times harder than the two color channels. Lightness carries the structure your eye locks onto, so the solver is told to preserve lightness gradients tightly and let color relax. Second, the big sparse matrix `A` depends only on image size and weights, not on pixel values, so it is built once and cached:\n\n```\ncrt_states = (h, w, grad_weight)\nif As is None or crt_states != prev_states:\n    As = construct_A(*crt_states)\n    prev_states = crt_states\n```\n\nBuilding `A` walks every pixel to wire up the gradient operators, which is slow. For a video you run `poisson_fusion` on hundreds of frames at the same resolution, so caching it across calls turns a per-frame cost into a one-time cost. That global cache is the difference between a restyle that finishes and one you abandon.\n\nAbout the author. 
I'm Wlad Radchenko.\n\nThe interval is the dial that matters most. With fast motion or a moving camera, optical flow gets unreliable and a wide interval lets the warp smear. Drop the interval so keyframes are closer together and the propagation has less work to do per gap.\n\nThe flow list is N-1, not N. If you write your own warp loop against `get_flow_sequence`, remember there is one fewer flow than frame, and the temporal guide already accounts for it by special-casing `i == 0`.\n\nInpaint radius is forgiving here. The TELEA radius of 30 looks large, but it fills guide maps, not final pixels, so a soft fill is fine. The real frame quality comes from the synthesizer, the histogram match, and the Poisson solve downstream.\n\nWatch the last gap. The final keyframe is almost never the last frame of the clip. The `i + 1 > len(...)` branch in every `VideoSequence` method exists to run that tail against the real last frame. If you reimplement the bookkeeping and skip it, your output will be a few frames short and the cut will be obvious.\n\nPer-frame restyle flickers because the model re-decides the look on every frame. The fix is to decide rarely and propagate. Stylize sparse keyframes, warp them across the gaps with EbSynth-style positional and temporal guides, match colors in Lab with a double histogram transfer, and stitch the forward and backward halves with a gradient-domain Poisson solve that weights lightness heavily and caches its matrix. None of the temporal-coherence work is diffusion. It is flow, inpainting, and two classic blends, sequenced carefully.\n\nThe code is in `visual_generation/restyle/blender/` in the [Wunjo Make repo](https://github.com/wladradchenko/wunjo.wladradchenko.ru): `video_sequence.py` for the bookkeeping, `guide.py` for the guides, `histogram_blend.py` and `poisson_fusion.py` for the blends. 
If your own restyle boils, start by cutting the per-frame diffusion down to keyframes.", "url": "https://wpnews.pro/news/stopping-the-flicker-when-you-restyle-a-video-frame-by-frame", "canonical_source": "https://dev.to/wladradchenko/stopping-the-flicker-when-you-restyle-a-video-frame-by-frame-5eio", "published_at": "2026-06-27 14:01:21+00:00", "updated_at": "2026-06-27 14:03:33.783166+00:00", "lang": "en", "topics": ["computer-vision", "machine-learning", "generative-ai", "developer-tools"], "entities": ["Rerender A Video", "EbSynth", "SIGGRAPH Asia 2023", "ACM ToG 2019", "Yang et al.", "Jamriška et al."], "alternates": {"html": "https://wpnews.pro/news/stopping-the-flicker-when-you-restyle-a-video-frame-by-frame", "markdown": "https://wpnews.pro/news/stopping-the-flicker-when-you-restyle-a-video-frame-by-frame.md", "text": "https://wpnews.pro/news/stopping-the-flicker-when-you-restyle-a-video-frame-by-frame.txt", "jsonld": "https://wpnews.pro/news/stopping-the-flicker-when-you-restyle-a-video-frame-by-frame.jsonld"}}