Hmm, probably:
I think you have found a second, separate problem now.
The earlier rings / nails / bracelets issue was mostly:
object / detail continuity drift
This new issue sounds more like:
color / exposure / skin-tone drift
They are related because both are continuity problems, but they need slightly different fixes.
Your description is very specific:
frames 1 → 80:
very slight gradual light increase
after 4–5 chained clips:
the small increase has accumulated
the character looks much brighter / more tanned
the scene no longer feels like the same scene
That sounds less like a bad prompt and more like cumulative clip-chain drift.
I would not try to solve this only by increasing steps or tweaking the prompt.
On an 8GB GPU, I would first test a pipeline fix:
shorter clip
+
do not pass the raw last frame forward
+
create a corrected "clean continuity frame"
+
color/exposure match that frame to a stable reference
+
use the corrected frame as the next clip's first frame
In other words, I would stop thinking of the last frame as automatically “safe.”
I would treat the last frame as:
a candidate frame that needs inspection and correction before it becomes the next first frame
There are a few reports that look close to what you are seeing.
Wan2GP has a very similar issue report:
The description there is basically:
every time the video is extended
brightness/light increases
the video becomes progressively more overexposed
That is extremely close to your “after 4 or 5 clips” observation.
There is also this one:
That report describes 5–6 second Wan2.1 I2V generations where brightness increases progressively toward the end of the video.
Wan2.1 itself also has a long-frame color-shift/flicker issue report:
That issue specifically mentions long generations such as 121 frames and says the problem appears when going beyond 81 frames.
So your 80-frame / 7-second setup is right near the area where I would be suspicious of frame-length instability.
There was also an older Wan2.1 I2V frame-count discussion:
I would not over-interpret that as “81 is always the magic number,” because workflows and implementations differ. But it does suggest that Wan2.1 I2V has historically been very sensitive around that frame-count region.
I would describe the mechanism like this:
clip 1 starts from a good frame
↓
within clip 1, exposure/skin tone drifts slightly brighter
↓
clip 1 last frame is now a little brighter than the original
↓
clip 2 uses that brighter frame as its new starting point
↓
clip 2 drifts a little brighter again
↓
repeat 4–5 times
↓
the accumulated change becomes obvious
So the problem is not necessarily that any single clip is terrible.
The problem is that a very small change becomes the new baseline each time.
This is why it can be “hardly noticeable” in one 80-frame clip, but obvious after several clips.
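To make the compounding concrete, here is a minimal sketch. It assumes each clip multiplies brightness by the same small factor and hands its last frame on unchanged. The 3% figure is a made-up illustrative number, not a measurement from Wan, and `accumulated_gain` is a hypothetical helper name.

```python
def accumulated_gain(per_clip_gain: float, n_clips: int) -> float:
    """Brightness multiplier after chaining n_clips.

    Assumption: every clip multiplies brightness by (1 + per_clip_gain)
    and its raw last frame becomes the next clip's first frame.
    """
    return (1.0 + per_clip_gain) ** n_clips


if __name__ == "__main__":
    # Hypothetical 3% brightening per clip, chained up to 5 times.
    for n in range(1, 6):
        print(f"after clip {n}: x{accumulated_gain(0.03, n):.3f}")
```

With these assumed numbers, five chained clips end up about 16% brighter than the original image, even though each single clip only moved 3%.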
I would separate the current issues like this:
| Problem | What changes | Likely cause | Best first response |
|---|---|---|---|
| Object/detail drift | rings, nails, bracelets, sleeves | model re-invents small details after motion/occlusion | shot design, clean keyframes, still inpaint |
| Color/exposure drift | skin tone, brightness, contrast, warmth | small exposure/color shift accumulates through chained clips | shorter clips, color reset, reference matching |
| Identity drift | face changes | weak identity constraint over time | better reference frames, face restore/swap, shorter clips |
| Motion drift | pose/action becomes different | long generation / weak control | shorter clips, control workflows |
| Scene drift | room/background changes | insufficient visual anchor | stronger first frames, fewer chained assumptions |
Right now, #12 is mostly the second one:
color/exposure drift
You can try adding phrases like:
consistent lighting
constant exposure
same skin tone
no exposure change
no brightness shift
same color grading
but I would not expect that to fully solve it.
The issue is happening inside the generated image sequence and then getting inherited by the next clip.
A prompt can bias the model, but it cannot reliably enforce:
frame 80 must have exactly the same exposure distribution as frame 1
So I would put prompt tweaks low on the priority list.
You said:
2-pass KSampler
cfg = 1
KS1 steps 0–2
KS2 steps 2–4
higher steps make a 7 sec clip too slow on 8GB GPU
That is important.
On an 8GB card, trying to fix this by simply raising steps may not be practical. It may help a little, or it may not, but it makes every experiment expensive.
I would first test cheaper variables:
frame count
clip length
raw last frame vs corrected first frame
color match on continuity frame
seed sensitivity
reference-frame color matching
Those tests may teach you more than just increasing steps.
Because there are Wan reports around color shift / flicker at longer frame counts, I would first test shorter clips.
For example:
| Test | Frames | Goal |
|---|---|---|
| A | 80 frames | current baseline |
| B | 48 frames | test whether drift reduces |
| C | 40 frames | stronger short-clip test |
| D | 32 frames | extreme stability test |
Keep everything else as similar as possible:
same source image
same prompt style
same sampler style
same resolution
same approximate scene
Then compare:
frame 1 vs final frame
If the brightness drift is much smaller at 40–48 frames, then the problem is probably strongly related to clip length / frame count.
That would be useful information.
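If you want a number instead of judging by eye, a rough sketch like the following can compare frame 1 with the final frame. It assumes you have exported both frames and loaded them as RGB `uint8` NumPy arrays of shape (H, W, 3). The function names are hypothetical, and mean Rec. 709 luma is only one possible brightness measure.

```python
import numpy as np

# Rec. 709 luma weights for R, G, B.
REC709 = np.array([0.2126, 0.7152, 0.0722])


def mean_luma(frame: np.ndarray) -> float:
    """Mean luma of an RGB frame with shape (H, W, 3) and values 0..255."""
    return float((frame.astype(np.float64) @ REC709).mean())


def luma_drift_percent(first: np.ndarray, last: np.ndarray) -> float:
    """Brightness change from `first` to `last`, in percent of `first`."""
    base = mean_luma(first)
    if base == 0.0:
        raise ValueError("first frame is completely black; percent drift is undefined")
    return 100.0 * (mean_luma(last) - base) / base
```

Running this on tests A to D with the same source image would give one drift percentage per frame count, which is easier to compare than four videos side by side.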
I would test this difference:
Test A:
clip A raw last frame → clip B first frame
Test B:
clip A last frame
→ color/exposure corrected against original reference
→ clip B first frame
If Test B reduces the “sun bed” effect, then the main fix is not prompt or steps.
The main fix is:
color reset before chaining
Pick one frame as the color reference.
Usually one of these:
original input image
clip 1 frame 1
best-looking frame from clip 1
a manually corrected still frame
This becomes your “visual truth” for:
skin tone
exposure
contrast
warmth
color balance
scene brightness
Then every time you create a new continuity frame, compare it against that reference.
The mental model:
reference frame = color anchor
last frame = motion anchor
corrected continuity frame = next first frame
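As a sketch of what "match to the color anchor" can mean in practice, the function below shifts and scales each RGB channel of the continuity frame so its mean and standard deviation match the reference. This is a simple statistics transfer that I am using for illustration. It is not a description of what any particular ComfyUI node does internally, and `match_mean_std` is a hypothetical name. Frames are assumed to be RGB `uint8` NumPy arrays.

```python
import numpy as np


def match_mean_std(frame: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Match per-channel mean and standard deviation of `frame` to `reference`.

    Both inputs are uint8 arrays of shape (H, W, 3). They do not need to
    have the same height and width, because only channel statistics are used.
    """
    src = frame.astype(np.float64)
    ref = reference.astype(np.float64)
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    # Guard against division by zero on completely flat channels.
    scale = ref_std / np.maximum(src_std, 1e-6)
    out = (src - src_mean) * scale + ref_mean
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Because it only looks at global statistics, this works best when the continuity frame shows roughly the same content as the reference. If the framing has changed a lot, the statistics are no longer comparable and the result can look wrong.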
Instead of:
generate clip 1
↓
use raw last frame as clip 2 first frame
↓
generate clip 2
↓
use raw last frame as clip 3 first frame
try:
generate clip 1
↓
pick a good near-final frame
↓
compare it to the reference frame
↓
correct exposure/color/skin tone
↓
save this as clean continuity frame 1
↓
use clean continuity frame 1 as clip 2 first frame
↓
generate clip 2
↓
repeat
This turns clip chaining into a controlled pipeline instead of an automatic drift amplifier.
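The loop above can be sketched in code. Everything here is a stand-in: `generate_clip` and `correct` are hypothetical callables that you would replace with your actual Wan workflow and your color correction step, and the demo uses a single brightness number as a "frame" with a made-up 3% brightening per clip.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def chain_clips(
    reference: Frame,
    n_clips: int,
    generate_clip: Callable[[Frame], Sequence[Frame]],
    correct: Callable[[Frame, Frame], Frame],
) -> List[Sequence[Frame]]:
    """Generate clips in sequence, correcting each boundary frame against
    the fixed reference before it seeds the next clip."""
    clips: List[Sequence[Frame]] = []
    first_frame = reference
    for _ in range(n_clips):
        clip = generate_clip(first_frame)
        clips.append(clip)
        # The raw last frame is only a candidate; correct it before reuse.
        first_frame = correct(clip[-1], reference)
    return clips


if __name__ == "__main__":
    # Toy stand-in: each clip brightens by an assumed 3% from first to last frame.
    def fake_generate(start: float) -> list:
        return [start * (1 + 0.03 * i / 4) for i in range(5)]

    raw = chain_clips(100.0, 5, fake_generate, lambda frame, ref: frame)
    reset = chain_clips(100.0, 5, fake_generate, lambda frame, ref: ref)
    print("raw last-frame chaining, final brightness:", round(raw[-1][-1], 1))
    print("reset to reference, final brightness:", round(reset[-1][-1], 1))
```

In the toy demo the "correction" simply returns the reference brightness, which is the ideal case. A real correction keeps the pose of the last frame and only pulls its color back, so the real result will sit somewhere between the two printed numbers.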
There are several levels of correction.
| Method | Where | Good for | Notes |
|---|---|---|---|
| Manual correction | editor / image tool | one boundary frame | very controllable |
| ColorMatch | ComfyUI | matching one frame/image to reference | quick test |
| Histogram match | ComfyUI | approximate color distribution match | can help with exposure/color drift |
| LUT / grade | DaVinci / ffmpeg / editor | final clip consistency | good for final polish |
| Full-frame batch correction | ComfyUI / editor | all frames in a clip | more work, but useful |
| Video inpainting | ComfyUI | local objects/details | not the first fix for exposure drift |
ComfyUI-related nodes/resources worth looking at:
External tools:
I would not assume a ComfyUI color-match node will be perfect. It may be enough for boundary frames, though.
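If you want to test histogram matching outside ComfyUI on a single exported frame, a minimal NumPy sketch looks like this. It is the standard CDF-mapping approach applied per channel, written from scratch for illustration, so it will not necessarily give the same output as any specific node. `match_histogram` is a hypothetical name, and frames are assumed to be RGB `uint8` arrays.

```python
import numpy as np


def _match_channel(src: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Map values of one channel so its cumulative histogram follows `ref`."""
    src_values, src_idx, src_counts = np.unique(
        src.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(ref.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / src.size
    ref_cdf = np.cumsum(ref_counts) / ref.size
    # For each source value, find the reference value at the same quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx].reshape(src.shape)


def match_histogram(frame: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Per-channel histogram match of `frame` (H, W, 3) to `reference`."""
    out = np.empty(frame.shape, dtype=np.float64)
    for c in range(frame.shape[-1]):
        out[..., c] = _match_channel(frame[..., c], reference[..., c])
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Histogram matching is more aggressive than a mean/contrast adjustment, because it forces the whole tonal distribution onto the reference. That is one reason to try it on a single boundary frame first.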
Color matching is useful, but it can also fail.
There is a KJNodes issue where Color Match caused problems when later frames were very white:
Also, if the camera angle or content changes too much, a pure color-match operation can produce weird results.
So I would test gently:
use 50% strength first if available
test on one frame
do not apply blindly to a whole long clip
compare before/after
For boundary-frame chaining, I would start with:
correct only the frame that becomes the next first frame
not:
color-match every frame aggressively
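If the tool you use has no strength setting, the "50% strength" idea can be approximated by blending the corrected frame back with the original. This is a sketch under that assumption. `apply_with_strength` is a hypothetical helper, and both frames are assumed to be RGB `uint8` NumPy arrays of the same shape.

```python
import numpy as np


def apply_with_strength(
    original: np.ndarray, corrected: np.ndarray, strength: float = 0.5
) -> np.ndarray:
    """Linear blend between the original frame (strength 0.0) and the
    fully corrected frame (strength 1.0)."""
    s = float(np.clip(strength, 0.0, 1.0))
    mixed = (1.0 - s) * original.astype(np.float64) + s * corrected.astype(np.float64)
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)
```

Starting at 0.5 and comparing before/after on one boundary frame keeps the test cheap, and it makes over-correction easier to spot than going straight to full strength.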
There are two different uses of color correction.
The first is boundary correction. Goal:
prevent drift from being passed into the next generation
This is probably the most important for you.
Pipeline:
clip A final frame
↓
color match to reference
↓
use corrected frame as clip B first frame
The second is final grading. Goal:
make the finished clips look consistent after generation
Pipeline:
all clips generated
↓
bring into editor
↓
match exposure/contrast/skin tone across clips
↓
export final movie
Both are useful, but they solve different parts of the problem.
Boundary correction prevents the next generation from inheriting the drift.
Final grading makes the already-generated clips look consistent.
I would run a small matrix.
| Test | Change | What it tells you |
|---|---|---|
| 1 | Current 80-frame generation | baseline |
| 2 | 48-frame generation | whether shorter clips reduce drift |
| 3 | 40-frame generation | stronger short-clip check |
| 4 | 80-frame generation, different seed | whether seed matters |
| 5 | 80-frame generation, same source but lower motion | whether motion drives drift |
| 6 | raw last-frame chaining | baseline chaining drift |
| 7 | color-corrected continuity-frame chaining | whether color reset helps |
| 8 | final editor grade only | whether post grade is enough |
The highest-value test is probably:
80 frames vs 40–48 frames
and then:
raw last frame vs corrected continuity frame
A 7-second clip may be fine if you only need one clip.
But if you plan to chain:
clip 1
clip 2
clip 3
clip 4
clip 5
then each clip must not only look good by itself. It must also end in a good state for the next clip.
That is much harder.
A 7-second clip with a tiny internal exposure drift may look acceptable alone, but it becomes a bad building block for chaining.
So for chained scenes, shorter clips can be more stable:
3–4 seconds
or
40–48 frames
Then use editing to assemble the scene.
This may feel slower creatively, but it gives you more control.
The key rule is probably this.
Do not pass this forward:
slightly brighter last frame
Pass this forward:
motion-continuity frame corrected back to reference exposure/color
Once a frame becomes brighter and you use it as the next input, the model may treat the brighter skin/scene as the new normal.
That is the same idea as the earlier rings/nails issue:
if a bad ring becomes part of the next first frame,
the next clip may preserve or amplify the ring
For color:
if a brighter skin tone becomes part of the next first frame,
the next clip may preserve or amplify the brighter skin tone
This is not just a “you configured something wrong” issue.
Long or chained video generation has a general problem that goes by names such as:
drift
error accumulation
exposure bias
temporal degradation
memory bottleneck
FramePack discusses this problem directly:
Another related long-video paper:
I am not saying you should switch to those immediately. With 8GB VRAM, that may not be the practical next step.
But these links are useful because they show that the problem category is real:
long video generation tends to accumulate errors over time
Your case is a practical Wan I2V version of that.
I would not start with:
raising steps a lot
trying to fix it with "no overexposure" in the negative prompt
building a giant new workflow
installing several new video systems at once
trying full video inpainting first
Those may be useful later, but they are not the cheapest diagnostic tests.
Given your 8GB GPU, I would first test:
shorter clips
color reset between clips
boundary-frame correction
final grading
Something like:
1. Pick one reference frame.
Usually original image or clip 1 frame 1.
2. Generate clip 1, but test shorter length first.
Try 40–48 frames.
3. Inspect frame 1 vs final frame.
Look at exposure, skin tone, warmth, contrast.
4. Pick a near-final frame for continuity.
Do not automatically use frame 80 if frame 80 is already drifting.
5. Color-correct that frame against the reference.
Use ColorMatch / histogram match / manual editor correction.
6. Use the corrected frame as clip 2 first frame.
7. Repeat.
8. After all clips are generated, do final color grading across the finished clips.
If you must keep 80 frames, then I would be stricter about step 5.
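Step 4 can be partly automated. The sketch below looks at the last few frames of a clip and picks the one whose mean brightness is closest to the reference. The window size of 8 is an arbitrary assumption, `pick_continuity_frame` is a hypothetical helper, and brightness is only one criterion, so I would still look at the chosen frame before using it.

```python
from typing import Sequence, Tuple

import numpy as np

# Rec. 709 luma weights for R, G, B.
REC709 = np.array([0.2126, 0.7152, 0.0722])


def mean_luma(frame: np.ndarray) -> float:
    """Mean luma of an RGB frame with shape (H, W, 3) and values 0..255."""
    return float((frame.astype(np.float64) @ REC709).mean())


def pick_continuity_frame(
    frames: Sequence[np.ndarray], reference: np.ndarray, window: int = 8
) -> Tuple[int, np.ndarray]:
    """Return (index, frame) for the frame among the last `window` frames
    whose mean luma is closest to the reference frame's mean luma."""
    if len(frames) == 0:
        raise ValueError("frames must not be empty")
    target = mean_luma(reference)
    start = max(0, len(frames) - window)
    best = min(
        range(start, len(frames)),
        key=lambda i: abs(mean_luma(frames[i]) - target),
    )
    return best, frames[best]
```

If the chosen frame is not the very last one, the clip would need to be trimmed at that index in the edit so the next clip continues from the same pose.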
For each generated clip, check:
frame 1
frame 20
frame 40
frame 60
frame 80
Ask:
Is the face brighter?
Is the skin warmer?
Is contrast lower?
Are highlights clipping?
Is the background changing?
Is the whole frame brighter or only the character?
If the whole frame is drifting:
global exposure/color drift
If only the face/skin is drifting:
subject/skin-tone drift
If only highlights are growing:
overexposure/highlight clipping
Different drift types may need different correction.
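A rough way to tell these three cases apart is to measure the subject and the background separately. The sketch below assumes you can supply a boolean mask for the character (for example, painted by hand on one frame). The thresholds `tol`, `clip_level`, and the 1% clipped-area cutoff are arbitrary illustrative values, and `describe_drift` is a hypothetical helper.

```python
import numpy as np

# Rec. 709 luma weights for R, G, B.
REC709 = np.array([0.2126, 0.7152, 0.0722])


def describe_drift(
    first: np.ndarray,
    last: np.ndarray,
    subject_mask: np.ndarray,
    tol: float = 2.0,
    clip_level: float = 250.0,
) -> str:
    """Label the brightening between two RGB uint8 frames of shape (H, W, 3).

    `subject_mask` is a boolean (H, W) array, True on the character. It must
    contain both True and False pixels. All thresholds are assumptions.
    """
    l0 = first.astype(np.float64) @ REC709
    l1 = last.astype(np.float64) @ REC709
    subject_delta = l1[subject_mask].mean() - l0[subject_mask].mean()
    background_delta = l1[~subject_mask].mean() - l0[~subject_mask].mean()
    # Change in the fraction of near-white pixels across the whole frame.
    clipped_delta = (l1 >= clip_level).mean() - (l0 >= clip_level).mean()

    if subject_delta > tol and background_delta > tol:
        return "global exposure/color drift"
    if subject_delta > tol:
        return "subject/skin-tone drift"
    if clipped_delta > 0.01:
        return "overexposure/highlight clipping"
    return "no clear brightening"
```

A global label points toward a whole-frame exposure correction, while a subject-only label suggests that a global color match alone may not be enough.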
I would name the two current problems like this:
Problem 1:
object/detail continuity drift
rings, nails, bracelets, sleeves
Problem 2:
cumulative color/exposure drift
brightness, skin tone, contrast over chained clips
Then your workflow goal becomes:
Do not let either kind of drift become the next clip's starting condition.
I would expect the best improvement from:
shorter clips
+
not using raw last frames
+
color-correcting continuity frames
I would expect smaller or inconsistent improvement from:
negative prompt
slightly more steps
minor diffusion tweaks
And I would treat model/workflow changes as later experiments.
You are probably no longer just fighting prompt adherence.
You are now fighting clip-chain drift.
Each clip slightly changes the visual state. If the next clip starts from that changed state, the change accumulates.
For your 8GB setup, the most practical first fix is probably:
try 40–48 frames
use a stable reference frame
correct the boundary frame before chaining
then do final color grading after all clips are assembled
That may not be a perfect in-model fix, but it is a realistic production-style workaround.