A behind-the-scenes look at the technical workflow that keeps eCommerce image editing consistent at scale, and what breaks when you try to fully automate it.
We process over 7,500 product images every single day.
At that volume, even a 2% error rate means 150 broken images a day going out to clients' Amazon listings, Shopify stores, and product catalogs. That is a business problem, not a rounding error.
This post is about what we learned building a pipeline that handles that volume reliably: the parts that automated well, the parts that didn't, and the specific technical decisions that made the difference.
If you're a developer building an eCommerce image processing workflow, or a technical founder trying to figure out how to scale product photo operations, this is the breakdown I wish I'd had earlier.
Every developer who builds an image processing pipeline goes through the same arc.
Phase 1: "AI background removal is incredible. I'll automate everything."
Phase 2: "Why are 30% of these outputs rejected by Amazon?"
Phase 3: "Why is my client's return rate going up?"
Phase 4: "I need to add human review back in. But where?"
The issue isn't that AI image processing is bad. It's that it's inconsistent in ways that are difficult to detect automatically, and the failures that matter most are the ones that look fine at thumbnail resolution but fail at 1:1 zoom or fail RGB verification.
Here's what a naive automated pipeline misses:
result = ai_remove_background(image)
Failures such as an off-white background or a ragged edge are invisible in a thumbnail. They become visible when a customer zooms in on an Amazon listing, and some of them trigger Amazon's automated image compliance rejection.
After significant iteration, our pipeline looks like this:
INPUT
└── Client uploads batch (RAW/JPEG/TIFF)

STAGE 1: Automated Pre-processing
├── Format normalization → TIFF for editing
├── Metadata extraction (product type, shoot group)
├── Resolution validation (reject < 1000px)
├── Color profile normalization → Adobe RGB
└── Smart grouping by shoot condition/lighting

STAGE 2: Complexity Classification
├── Edge complexity score (simple / medium / complex)
├── Color match risk score (product vs background)
├── Product category tag (jewelry, clothing, electronics...)
└── Route: simple → automated | medium/complex → human queue

STAGE 3A: Automated Processing (Simple Products)
├── AI background removal
├── Background verification (RGB check)
├── Auto crop to 85% frame fill
├── Color normalization
└── QC scoring → pass / flag for review

STAGE 3B: Human Processing (Complex Products)
├── Manual pen tool clipping path
├── Hair/fur masking (Select and Mask)
├── Ghost mannequin compositing (where applicable)
├── Color correction against reference
└── Senior editor QC sign-off

STAGE 4: Export Pipeline
├── Platform-specific export (Amazon / Shopify / print)
├── WebP conversion + JPEG fallback
├── Final RGB verification on white background
├── File naming (ASIN or client convention)
└── Delivery package assembly

OUTPUT
└── Marketplace-ready, verified images
The key insight is Stage 2: Complexity Classification. This is the decision point that determines whether a product goes through automated or human processing. Getting this right is what separates a pipeline with 98% quality from one with 70% quality.
The classifier uses a combination of heuristics and a trained model to route images:
def classify_complexity(image_path, metadata):
    # load_image, calculate_edge_complexity and
    # check_foreground_background_similarity are pipeline helpers;
    # COMPLEX_THRESHOLD and MEDIUM_THRESHOLD are tuned constants
    # defined elsewhere in the pipeline.
    img = load_image(image_path)
    edge_score = calculate_edge_complexity(img)
    color_match_risk = check_foreground_background_similarity(img)

    # Category rules take priority over image analysis.
    category_rules = {
        'jewelry': 'complex',        # Always human - chains, gems
        'fur': 'complex',            # Always human - fine edges
        'lingerie': 'complex',       # Always human - lace, mesh
        'transparent': 'complex',    # Always human - glass, acrylic
        'electronics': 'simple',     # Usually box/flat, clean edges
        'books': 'simple',           # Rectangular, clean
        'cosmetics_tube': 'medium',  # Curved but usually clean
    }

    category = metadata.get('category')
    if category in category_rules:
        base_score = category_rules[category]
    elif edge_score > COMPLEX_THRESHOLD:
        base_score = 'complex'
    elif edge_score > MEDIUM_THRESHOLD or color_match_risk > 0.3:
        base_score = 'medium'
    else:
        base_score = 'simple'

    return base_score
We started with pure image analysis and found that metadata rules outperformed it significantly on the categories we see most. Jewelry is always complex. Full stop. No edge detection algorithm reliably handles gold chains against a white background: the color similarity between a gold-lit chain and slightly warm white causes consistent AI failures.
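The color-similarity risk can be illustrated as a distance in RGB space. This is a minimal sketch, not the production classifier: the helper name `color_match_risk`, the sample colors, and the choice of plain Euclidean RGB distance are all assumptions made for the example. Only the 0.3 cutoff comes from the routing code above.

```python
import math

# Largest possible distance between two RGB colors (black vs. white).
MAX_RGB_DISTANCE = math.sqrt(3 * 255 ** 2)


def color_match_risk(product_rgb, background_rgb):
    """Return a 0..1 risk score: 1.0 for identical colors, near 0.0 for opposites.

    Hypothetical helper: it compares two representative colors, whereas a
    real check would have to look at the whole image.
    """
    distance = math.dist(product_rgb, background_rgb)
    risk = 1.0 - distance / MAX_RGB_DISTANCE
    return min(1.0, max(0.0, risk))


# Illustrative colors only (not measured from real product shots).
gold_chain = (230, 200, 120)
warm_white = (255, 250, 240)
navy_fabric = (20, 30, 60)

# Gold on warm white lands above the 0.3 cutoff; navy on warm white does not.
print(color_match_risk(gold_chain, warm_white) > 0.3)
print(color_match_risk(navy_fabric, warm_white) > 0.3)
```

A single representative color per region is a deliberate simplification here. It is enough to show why a gold chain on a warm white backdrop scores as risky while a dark garment does not.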
Amazon requires background RGB of exactly 255, 255, 255. Not 253, 254, 254. Not 250, 252, 255. Exactly 255.
The problem: AI background removal tools frequently produce backgrounds that look white on screen but fail the actual RGB check. A background of RGB 248, 250, 248 looks white at normal monitor brightness. It causes listing rejection.
from PIL import Image


def verify_background_compliance(image_path, sample_points=50, min_channel=255):
    """
    Sample multiple background points and verify against
    Amazon's pure white requirement (RGB 255,255,255).
    """
    img = Image.open(image_path).convert('RGB')
    width, height = img.size

    # generate_perimeter_samples is a pipeline helper that returns
    # (x, y) coordinates near the image border.
    sample_coords = generate_perimeter_samples(width, height, sample_points)

    failures = []
    for x, y in sample_coords:
        r, g, b = img.getpixel((x, y))
        # Any channel below min_channel is not pure white. The default
        # of 255 enforces the "exactly 255" rule described above.
        if min(r, g, b) < min_channel:
            failures.append({
                'coord': (x, y),
                'rgb': (r, g, b),
                'delta': (255 - r, 255 - g, 255 - b)
            })

    # Divide by the number of points actually sampled.
    compliance_rate = 1 - (len(failures) / len(sample_coords))

    if compliance_rate < 0.95:
        return {
            'status': 'FAIL',
            'compliance_rate': compliance_rate,
            'failures': failures,
            'action': 'route_to_correction'
        }

    return {
        'status': 'PASS',
        'compliance_rate': compliance_rate
    }
This runs on every image before it leaves the pipeline. Simple, fast, catches the most common Amazon rejection cause.
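The `generate_perimeter_samples` helper is not shown above. One possible implementation, offered as a sketch rather than the version used in production, walks the image border and picks evenly spaced points. The `inset` parameter and its default of 2 pixels are assumptions made for this example.

```python
def generate_perimeter_samples(width, height, sample_points, inset=2):
    """Return `sample_points` (x, y) coordinates spread around the image border.

    Points lie on a rectangle `inset` pixels inside the image edge, walking
    clockwise from the top-left corner.
    """
    if sample_points <= 0:
        raise ValueError("sample_points must be positive")

    left, top = inset, inset
    right, bottom = width - 1 - inset, height - 1 - inset
    if right <= left or bottom <= top:
        raise ValueError("image is too small for the requested inset")

    side_w = right - left
    side_h = bottom - top
    perimeter = 2 * (side_w + side_h)

    coords = []
    for i in range(sample_points):
        # Distance travelled along the rectangle for the i-th sample.
        d = (i * perimeter) // sample_points
        if d < side_w:                      # top edge, left to right
            coords.append((left + d, top))
        elif d < side_w + side_h:           # right edge, top to bottom
            coords.append((right, top + (d - side_w)))
        elif d < 2 * side_w + side_h:       # bottom edge, right to left
            coords.append((right - (d - side_w - side_h), bottom))
        else:                               # left edge, bottom to top
            coords.append((left, bottom - (d - 2 * side_w - side_h)))
    return coords


samples = generate_perimeter_samples(100, 100, 50)
print(len(samples), samples[0])
```

Sampling only the border assumes the product never touches the frame edge. An image cropped tight to the product would need a different sampling strategy.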
Ghost mannequin (hollow mannequin) is one of the most requested services in fashion eCommerce. You photograph the garment on a mannequin, then remove the mannequin so the garment appears three-dimensional and self-supporting.
Every attempt to automate this properly has failed, including our own experiments with segmentation models. Here's why:
The ghost mannequin problem requires:
1. Segmenting the garment from the mannequin (reasonably solvable with AI)
2. Identifying what garment areas were obscured by the mannequin body (not solvable)
3. Using a second "inside label" photo to fill the neck/armhole gaps (requires human composition)
4. Blending interior and exterior photos seamlessly (requires contextual judgment)
5. Adjusting for how the fabric would naturally fall without a body (requires physics/experience)
Step 2 is the blocker. An AI model can identify that there's a gap where the mannequin neck was. It cannot reliably reconstruct what the interior collar or neckline looks like based on the exterior shot alone.
The result of automated attempts: composited images where the neckline looks incorrect, fabric doesn't meet properly, or interior details are hallucinated in ways that don't match the actual garment.
For a fashion brand listing on Amazon or running paid ads, a wrong-looking neckline on a sweater photograph is a conversion killer and a brand credibility issue.
This is why ghost mannequin service at scale still requires human compositing editors: the step that AI cannot reliably handle is precisely the most visible part of the final image.
One of the more tedious parts of eCommerce image processing is managing platform-specific export requirements. We codify these as configurations rather than hardcoding them:
PLATFORM_CONFIGS = {
    'amazon_main': {
        'background': 'pure_white',  # RGB 255,255,255
        'min_dimension': 1000,
        'recommended_dimension': 2000,
        'format': 'JPEG',
        'quality': 95,
        'color_space': 'sRGB',
        'product_fill_min': 0.85,  # 85% of frame
        'max_file_size_mb': 10,
        'naming': 'ASIN'
    },
    'amazon_secondary': {
        'background': 'any',
        'min_dimension': 1000,
        'recommended_dimension': 2000,
        'format': 'JPEG',
        'quality': 90,
        'color_space': 'sRGB',
        'max_file_size_mb': 10
    },
    'shopify': {
        'background': 'white_recommended',
        'recommended_dimension': 2048,
        'format': 'WebP',
        'quality': 85,
        'color_space': 'sRGB',
        'aspect_ratio': '1:1'
    },
    'print_catalog': {
        'background': 'any',
        'format': 'TIFF',
        'bit_depth': 16,
        'color_space': 'AdobeRGB',
        'dpi': 300,
        'sizing': 'full_resolution'
    },
    'google_shopping': {
        'background': 'white_or_light_grey',
        'min_dimension': 100,
        'recommended_dimension': 800,
        'format': 'WebP',
        'quality': 85,
        'color_space': 'sRGB'
    }
}
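from PIL import Image


def export_for_platform(image_path, platform, asin=None):
    # resize_for_platform, convert_color_space, ensure_pure_white_background,
    # generate_filename, save_with_config and verify_output_compliance are
    # pipeline helpers defined elsewhere.
    config = PLATFORM_CONFIGS[platform]
    img = Image.open(image_path)

    img = resize_for_platform(img, config)
    img = convert_color_space(img, config['color_space'])

    if config['background'] == 'pure_white':
        img = ensure_pure_white_background(img)

    filename = generate_filename(platform, asin, config)
    output_path = f"/exports/{platform}/{filename}"

    save_with_config(img, output_path, config)
    # Re-check the written file, not the in-memory image.
    verify_output_compliance(output_path, config)

    return output_path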
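The sizing decision inside `resize_for_platform` is not shown in this post. The sketch below is one way it could work, under stated assumptions: `compute_target_size` is a hypothetical helper, and both policies (never upscale, reject anything under `min_dimension`) are guesses that match the Stage 1 resolution check rather than confirmed behavior.

```python
def compute_target_size(width, height, config):
    """Return the (width, height) an image should be resized to for a platform.

    Assumed policy: scale the longest side down to `recommended_dimension`,
    never upscale, and reject images below `min_dimension`.
    """
    longest = max(width, height)

    min_dim = config.get('min_dimension')
    if min_dim is not None and longest < min_dim:
        raise ValueError(
            f"longest side {longest}px is below the {min_dim}px minimum"
        )

    target = config.get('recommended_dimension')
    if target is None or longest <= target:
        # No recommended size for this platform, or already small enough.
        return width, height

    scale = target / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# Config fragment mirroring the 'amazon_main' dimension keys.
amazon_like = {'min_dimension': 1000, 'recommended_dimension': 2000}

print(compute_target_size(4000, 3000, amazon_like))  # longest side scaled to 2000
print(compute_target_size(1500, 1200, amazon_like))  # left unchanged
```

Using `config.get` means a config with no dimension keys, such as `print_catalog`, passes through at full resolution without a special case.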
Rather than binary pass/fail, we use a scoring system that determines routing:
def calculate_qc_score(image_path, platform_config):
    # Four components, each expected to contribute up to 25 points.
    scores = {}

    bg_result = verify_background_compliance(image_path)
    scores['background'] = 25 if bg_result['status'] == 'PASS' else \
        int(25 * bg_result['compliance_rate'])

    # analyze_edge_quality is a pipeline helper defined elsewhere.
    edge_result = analyze_edge_quality(image_path)
    scores['edge_quality'] = edge_result['score']

    # Use the platform's own fill requirement when it defines one.
    fill_target = platform_config.get('product_fill_min', 0.85)
    fill_result = check_product_fill(image_path, platform_config)
    scores['product_fill'] = 25 if fill_result['fill_ratio'] >= fill_target else \
        int(25 * (fill_result['fill_ratio'] / fill_target))

    color_result = check_color_accuracy(image_path)
    scores['color_accuracy'] = color_result.get('score', 25)

    total_score = sum(scores.values())

    return {
        'total': total_score,
        'breakdown': scores,
        'routing': 'auto_approve' if total_score >= 90 else \
                   'human_review' if total_score >= 70 else \
                   'reject_and_redo'
    }
Images scoring 90+ ship automatically. 70-89 go to a human reviewer who makes the final call. Below 70 go back to the editing queue.
This routing system is what keeps the 7,500 daily volume manageable without a QC team that's checking every single image.
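To make the thresholds concrete, here is the routing rule on its own, applied to a made-up score breakdown. `route_for_score` is a name introduced for this example, and the breakdown values are illustrative rather than taken from a real image. The sketch also assumes each of the four components is worth up to 25 points.

```python
def route_for_score(total_score):
    """Map a 0-100 QC score to a routing decision using the thresholds above."""
    if total_score >= 90:
        return 'auto_approve'
    if total_score >= 70:
        return 'human_review'
    return 'reject_and_redo'


# Hypothetical breakdown: clean background and fill, slightly soft edges,
# minor color drift.
breakdown = {
    'background': 25,
    'edge_quality': 18,
    'product_fill': 25,
    'color_accuracy': 20,
}
total = sum(breakdown.values())

print(total, route_for_score(total))  # 88 human_review
```

An image like this one is compliant but imperfect, so it is sent to a reviewer instead of shipping or being redone automatically.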
After several years of iterating on this pipeline, here's the honest list of what still requires human judgment:
Color accuracy against physical product. We can normalize colors and correct color casts algorithmically, but verifying that the final image matches the physical product in the box requires a human with a calibrated monitor and the actual product.
Ghost mannequin compositing. As described above, the interior/exterior blend step is not reliably automatable at quality levels that fashion brands accept.
Complex edge products. Jewelry, fine lace, transparent products, and multi-colored products against similar-colored backgrounds: automated edge detection fails too often at 1:1 zoom.
Brand-specific aesthetic judgment. Some clients have specific retouching styles, shadow treatments, or color profiles that require knowing what "looks right for this brand", which is not something a general model captures well.
Professional photo retouching at the detail level. Dust removal, surface scratch correction, and reflection management on jewelry remain human tasks because they require understanding what should and shouldn't be in the final image.
The rule of thumb that's held up: automate the repeatable decisions. Human-review the judgment calls. Verify everything before it ships.
We track one metric above all others: client return rate change after implementing our edited images.
Not throughput. Not AI adoption percentage. Not cost per image.
Whether our clients are seeing fewer returns after switching to our edited product images. Because if our "optimized" pipeline is producing images that look better than the product, we're optimizing for the wrong outcome. Higher conversion from better-looking images + higher return rate from misleading images = zero net benefit, with added customer trust damage.
The only win condition is images that are accurate, compliant, and fast to produce. All three. Not two out of three.
Building a product image pipeline at scale taught us that the question is never "AI or human." It's "which decisions can be automated reliably and which require judgment."
The answer is roughly: pre-processing, routing, format conversion, and QC scoring automate well. Background removal on simple products automates well. Color space management automates well.
Complex edge removal, ghost mannequin, final color accuracy verification, and brand-specific aesthetic decisions do not automate well, at least not to the quality level that fashion brands and Amazon listings require.
The pipeline that works is the one that's honest about those limits and routes accordingly.
Clipp Out Line (clippoutline.com) has provided product photo editing for eCommerce brands since 2010. Services: background removal · clipping path · ghost mannequin · color correction · photo retouching · Amazon photo editing
#python
#imageprocessing
#automation
#webdev