
AI for Ecommerce: Product Data, Feeds, and Beyond

AI for Ecommerce · Mar 4, 2026 · 7 min read · Doreid Haddad

The Google AI Overview for "AI for ecommerce product data" lists four categories — product data enrichment, automated content generation, data standardization, image recognition — and names five vendor tools: Describely, Zoovu, Hypotenuse AI, Feedonomics, Salsify. That's a useful map of where AI is showing up in ecommerce in 2026, but it's missing the broader picture. AI in ecommerce isn't one thing. It's a four-layer stack that runs from raw product data at the bottom up to dynamic pricing and personalization at the top. Each layer has different costs, different vendors, and different failure modes.

This article is that layered architecture map: what lives at each layer, where the spend actually goes, and the buying mistakes that show up at each level.

Layer 1: Product data and feed quality

The bottom layer. Everything else depends on it. Product data is your titles, descriptions, attributes (color, size, material), images, categories, variants, GTINs, and the structured fields that go into every channel — your own site, Google Merchant Center, Amazon, Meta catalog, Pinterest, TikTok Shop. Bad data here makes every layer above it worse.

AI's job at this layer is enrichment and standardization. The current Google AI Overview names the canonical applications: filling in missing attributes, normalizing data from multiple suppliers into a unified format, image recognition for tagging, and SEO-optimized title and description generation. The mature vendors in this space — Describely, Salsify with AI features, Hypotenuse AI, Zoovu — handle this through a combination of LLMs (for content generation) and classical ML (for attribute extraction and standardization).
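To make the standardization half of that concrete, here is a minimal sketch of the classical-ML-adjacent part: normalizing free-text supplier values into one canonical vocabulary. The mapping table and field names are illustrative assumptions, not any vendor's schema.

```python
# Hypothetical supplier-attribute standardization: map the free-text values
# different suppliers send into one canonical catalog vocabulary.

CANONICAL_COLORS = {
    "navy": "Navy", "navy blue": "Navy", "dk blue": "Navy",
    "offwhite": "Off-White", "off white": "Off-White", "ivory": "Off-White",
}

def normalize_color(raw: str) -> str:
    """Return the canonical color, falling back to title-cased input."""
    key = raw.strip().lower()
    return CANONICAL_COLORS.get(key, raw.strip().title())

def standardize(product: dict) -> dict:
    """Normalize one supplier record into the unified catalog format."""
    return {
        "sku": product["sku"].upper(),
        "color": normalize_color(product.get("color", "")),
        "material": product.get("material", "").strip().title(),
    }

print(standardize({"sku": "ab-123", "color": "dk blue", "material": "  cotton "}))
# → {'sku': 'AB-123', 'color': 'Navy', 'material': 'Cotton'}
```

In production the mapping table is where the real work lives: vendors maintain these vocabularies per category, and LLMs are increasingly used to propose mappings for values the table has never seen.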

The single most underrated lever in this layer: the rarely-populated fields in Google Merchant Center. The product_highlights and product_details fields are the ones almost no Shopify-default store fills in. Most ecommerce content generators stop at title and description. The teams who use AI to populate these underused fields end up owning them permanently across every channel that consumes the GMC feed. That's compounding leverage from work most competitors won't do.
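A sketch of what populating those fields from existing attribute data might look like. The field names (`product_highlights`, `product_details` with section/attribute/value triples, the 150-character highlight limit) are real Merchant Center attributes; the derivation templates are my own illustrative assumption, not a documented workflow.

```python
# Illustrative derivation of the underused GMC fields from structured
# attributes. Templates and attribute keys are assumptions for the sketch.

def build_highlights(attrs: dict, max_len: int = 150) -> list[str]:
    """Turn structured attributes into short highlight bullets."""
    templates = {
        "material": "Made from {}",
        "care": "Care: {}",
        "origin": "Designed in {}",
    }
    highlights = [templates[k].format(v) for k, v in attrs.items() if k in templates]
    return [h[:max_len] for h in highlights]  # GMC caps each highlight at 150 chars

def build_product_details(attrs: dict) -> list[dict]:
    """Map attributes into GMC product_detail triples."""
    return [
        {"section_name": "General", "attribute_name": k.title(), "attribute_value": v}
        for k, v in attrs.items()
    ]

attrs = {"material": "organic cotton", "care": "machine wash cold"}
print(build_highlights(attrs))
# → ['Made from organic cotton', 'Care: machine wash cold']
```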

Buy at this layer. Don't build. The vendor space is mature, prices are reasonable for what they deliver, and the data structures (PIM schemas, feed formats) are standardized enough that vendor lock-in is moderate. Target spend: 0.1-0.3% of catalog revenue at small scale, lower at enterprise.

Layer 2: Content generation at scale

Sitting on top of clean product data, content generation produces the customer-facing copy — long-form descriptions, category landing pages, blog content tied to products, email subject lines, ad creative variants. This is generative AI's home turf and the layer where ROI is easiest to measure.

The pattern that works: feed your enriched product data to an LLM with a carefully tuned brand voice prompt, generate variants, run them through human review for the first batch, then auto-publish for repeat patterns. Vendors like Hypotenuse AI handle this end-to-end. The build-it-yourself version uses Claude Sonnet 4.6 or GPT-5 directly with your own prompt library and structured output schemas.
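The generate-then-review pattern can be sketched provider-agnostically. `call_llm` below is a stand-in for whatever model client you use, and the prompt shape and review flag are assumptions about the workflow, not any vendor's API.

```python
# Provider-agnostic sketch of the generate-then-review pattern.
# `call_llm` is a placeholder; swap in a real model client in production.
import json

BRAND_VOICE = "Confident, plain-spoken, no superlatives, second person."

def build_prompt(product: dict) -> str:
    return (
        f"Brand voice: {BRAND_VOICE}\n"
        f"Product data: {json.dumps(product)}\n"
        "Write a 2-sentence product description. "
        'Return JSON: {"description": "..."}'
    )

def generate_description(product: dict, call_llm) -> dict:
    raw = call_llm(build_prompt(product))
    out = json.loads(raw)       # structured output: fail loudly on bad JSON
    out["needs_review"] = True  # first batches always go through human review
    return out

# Fake model response for demonstration only.
fake_llm = lambda prompt: '{"description": "Soft organic cotton tee. Built to last."}'
print(generate_description({"title": "Organic Tee"}, fake_llm))
```

The `needs_review` flag is the point: auto-publish is something you earn per content pattern after the first human-reviewed batch, not a default.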

The trap at this layer: thin content that all sounds the same. Generative AI on default settings produces forgettable copy. The teams who win at this layer treat prompt engineering as a discipline — a curated style guide, banned-words lists, brand voice examples, and an eval set built from copy that actually converts. That work is the difference between AI content that helps SEO and AI content that gets penalized.
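One small, mechanical piece of that discipline is a banned-words gate that generated copy must pass before review. A minimal sketch, with an illustrative word list:

```python
# Banned-words gate for generated copy -- one piece of the eval discipline.
# The word list is illustrative; a real one comes from your style guide.
import re

BANNED = {"game-changing", "elevate", "unleash", "revolutionary"}

def banned_hits(copy: str) -> list[str]:
    """Return banned phrases found in the copy (case-insensitive)."""
    text = copy.lower()
    return sorted(w for w in BANNED if re.search(r"\b" + re.escape(w) + r"\b", text))

def passes_gate(copy: str) -> bool:
    return not banned_hits(copy)

print(passes_gate("Unleash your revolutionary style."))  # → False
print(passes_gate("A plain cotton tee that lasts."))     # → True
```

The eval set from converting copy is the harder, more valuable half; the gate just catches the cheap failures before a human ever reads them.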

Layer 3: Recommendations and personalization

Where traditional AI does the heavy lifting. Recommender systems are mostly classical machine learning — collaborative filtering, matrix factorization, gradient-boosted scoring — sometimes augmented with deep learning for sequence modeling and embeddings. They take user behavior history (clicks, purchases, dwell time) plus product attributes and predict what each user will engage with.
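To show the classical-ML shape of this, here is a toy item-item collaborative filter: cosine similarity over a tiny interaction matrix. The data is illustrative, and real systems use matrix factorization or learned embeddings at scale, but the prediction logic is the same in spirit.

```python
# Toy item-item collaborative filtering via cosine similarity.
# Interaction scores (clicks/purchases) are illustrative.
from math import sqrt

interactions = {  # user -> {item: implicit score}
    "u1": {"shirt": 3, "jeans": 1},
    "u2": {"shirt": 2, "jeans": 2, "belt": 1},
    "u3": {"belt": 2, "jeans": 3},
}

def item_vector(item: str) -> dict:
    """The item's column of the user-item matrix, as a sparse dict."""
    return {u: r[item] for u, r in interactions.items() if item in r}

def cosine(a: dict, b: dict) -> float:
    common = set(a) & set(b)
    num = sum(a[u] * b[u] for u in common)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def similar_items(item: str, candidates: list) -> list:
    """Rank candidate items by similarity to `item`, most similar first."""
    v = item_vector(item)
    scored = [(c, cosine(v, item_vector(c))) for c in candidates if c != item]
    return sorted(scored, key=lambda x: -x[1])

print(similar_items("shirt", ["shirt", "jeans", "belt"]))
# jeans ranks above belt: more shared buyers with shirt
```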

The build-vs-buy decision at this layer is more nuanced than at Layer 1. Off-the-shelf recommendation engines (Algolia, Klevu, Constructor, Bloomreach, native platforms in Shopify and BigCommerce) are excellent if your business is generic ecommerce — they ship working recommendations in days. Custom-built recommenders earn their seat when you have proprietary data signals (subscription history, B2B account dynamics, niche category structures) that off-the-shelf engines can't model.

Cost picture at this layer: vendor pricing typically 0.5-2% of recommendation-attributed revenue. Custom build cost: 6-12 months of an ML team plus ongoing maintenance, justified at scales where vendor pricing exceeds the build cost.

Layer 4: Dynamic pricing and merchandising

The top layer and the most operationally sensitive. Dynamic pricing engines adjust prices based on demand signals, competitor prices, inventory levels, and margin targets. Done right, they capture revenue most static-pricing systems leave on the table. Done wrong, they trigger price wars, regulatory complaints, or trust-eroding price discrimination.

Most production dynamic pricing in 2026 is rules-based with ML-augmented inputs — not pure ML — for a specific reason: regulators and brand teams need explainability that pure ML can't provide. The AI's job is to surface signals (demand forecast, competitor moves, inventory turn) that humans encode into pricing rules.
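The division of labor looks something like this sketch: the model supplies the signals, humans own the rules, and every adjustment carries a human-readable reason. Thresholds and signal names are illustrative assumptions.

```python
# Rules-based pricing with ML-supplied inputs. The signals dict is what a
# forecasting/scraping pipeline would produce; thresholds are illustrative.

def price_adjustment(signals: dict) -> tuple:
    """Return (multiplier, reason) so every price change is explainable."""
    if signals["inventory_weeks"] > 12:
        return 0.90, "overstock: >12 weeks of cover"
    if signals["demand_forecast"] > 1.2 and signals["inventory_weeks"] < 4:
        return 1.05, "high demand, low stock"
    if signals["competitor_delta"] < -0.10:
        return 0.97, "competitor undercutting by >10%"
    return 1.00, "no rule fired"

signals = {"inventory_weeks": 3, "demand_forecast": 1.3, "competitor_delta": 0.0}
print(price_adjustment(signals))
# → (1.05, 'high demand, low stock')
```

The reason string is not decoration: it is the audit trail that regulators and brand teams ask for, and the thing pure ML pricing cannot hand them.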

This layer rarely fits an off-the-shelf product. The vendors in this space (Pricefx, Vendavo, BlackCurve) are mostly enterprise-only. Mid-market dynamic pricing is usually a custom build by an analytics team using competitive scrape data plus internal margin and inventory feeds.

How the four layers connect

The layers compound. Bad data at Layer 1 makes Layer 2 generate bad content, makes Layer 3 recommend wrong products, makes Layer 4 misprice based on bad signals. Most ecommerce AI projects that fail in 2026 fail because the team invested in Layer 3 or 4 without fixing Layer 1 first. The vendor demos at the top look impressive. The production systems disappoint because the data underneath is messy.

A working sequence for an ecommerce AI roadmap:

  1. Audit Layer 1 first. Run your product feed through a data-quality check. Count missing attributes. Sample 20 product descriptions and ask whether you'd buy from them. Score image quality. The audit is the baseline.
  2. Fix Layer 1 before moving up. Enrichment vendors or in-house automation. Target: every required and recommended field populated, every image tagged, every category standardized.
  3. Build Layer 2. Generate content for the cleaned-up products. Validate against your brand voice and conversion baseline.
  4. Add Layer 3. Recommendations on a clean catalog with rich content perform dramatically better than the same recommender on dirty data.
  5. Approach Layer 4 carefully. Only after Layers 1-3 are mature. Pricing experiments need a control group, regulatory review, and brand-team alignment.
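Step 1 above is mostly counting. A minimal audit sketch that tallies missing required and recommended fields across the catalog; the field lists are illustrative, and your channels' actual requirements should drive them.

```python
# Minimal Layer 1 audit: count missing fields per attribute to get the
# baseline the roadmap starts from. Field lists are illustrative.
from collections import Counter

REQUIRED = ["title", "description", "gtin", "image_link"]
RECOMMENDED = ["product_highlights", "material", "color"]

def audit(catalog: list) -> Counter:
    """Return how many products are missing (or have empty) each field."""
    missing = Counter()
    for product in catalog:
        for field in REQUIRED + RECOMMENDED:
            if not product.get(field):
                missing[field] += 1
    return missing

catalog = [
    {"title": "Tee", "description": "Soft tee", "gtin": "123", "image_link": "x"},
    {"title": "Jeans", "gtin": "", "image_link": "y", "color": "navy"},
]
print(audit(catalog).most_common())
```

Run it before buying anything at Layers 2-4: the `most_common()` output is a priority list for enrichment work, and rerunning it after each batch is the cheapest progress metric you will get.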

Where the budget actually goes

A reasonable AI budget allocation for a mid-market ecommerce business doing $20-100M in annual revenue:

  • Layer 1 (data and feeds): 30-40%
  • Layer 2 (content): 15-25%
  • Layer 3 (recommendations): 25-35%
  • Layer 4 (pricing): 5-15%
  • Engineering glue: 10-15%

The exact mix depends on your business. A fashion brand with constant new products weights Layer 1 heavily. A subscription business with stable catalog weights Layer 3. A high-margin luxury brand may skip Layer 4 entirely. The mix isn't universal but the layered architecture is.

Where I see teams overspend

The most common ecommerce AI mistakes I see in 2026:

Buying a recommendation engine before fixing the catalog. The recommender works on whatever data it has. If the data has gaps, the recommendations will. The investment doesn't compound until the data is right.

Skipping vendor solutions for content generation. Custom-built ecommerce content generators rarely beat the dedicated tools (Describely, Hypotenuse) at any scale below tens of thousands of SKUs. Buy at this layer; build at higher layers.

Putting LLMs behind layer-3 recommendation decisions. Asking an LLM "which products should I show this user" produces fluent, expensive, miscalibrated answers. Recommendations belong in classical ML or specialized recommender systems, not in LLMs.

Dynamic pricing without the legal review. Personalized pricing by user attributes can violate consumer protection rules in EU, UK, and several US states. Don't move forward without legal sign-off on the specific signals being used.

The teams who run ecommerce AI well in 2026 don't pick a single layer. They build out the stack — data first, content second, recommendations third, pricing if and when the rest is mature. Match the architecture to the work and the AI starts compounding instead of just costing.

Frequently Asked Questions

What's the most underrated AI use case in ecommerce?

Product data enrichment. The Google Merchant Center fields like product_highlights and product_details are the most underutilized in ecommerce — default Shopify setups almost never populate them, so once you write them with AI, you own those fields across every channel that consumes the feed. Better feed data lifts everything downstream.

Should I build my own AI ecommerce tools or buy them?

Buy for product data enrichment, content generation, and basic recommendations — vendors like Describely, Hypotenuse, Salsify, and Feedonomics are mature and cost-effective. Build only for proprietary advantages like personalization tied to your specific customer history or pricing strategy tied to your margin model.

Where do most ecommerce AI projects fail?

On the data layer underneath, not the AI on top. Recommendation models work only as well as the product taxonomy and behavioral data feeding them. Pricing engines work only as well as the competitive scrape and inventory data. Most failed projects had clean models running on dirty data.

Written by Doreid Haddad

Founder, Tech10

Doreid Haddad is the founder of Tech10. He has spent over a decade designing AI systems, marketing automation, and digital transformation strategies for global enterprise companies. His work focuses on building systems that actually work in production, not just in demos. Based in Rome.

