Amazon-Like E-Commerce System Design
This system models a simplified Amazon-style marketplace supporting sellers, buyers, product discovery, and order placement. The design emphasizes scalability, availability, and correctness for critical operations.
Functional Requirements
The platform must support the following core user capabilities:
Seller Operations
- Sellers can add new items to the catalog
- Sellers can define attributes such as name, description, price, and quantity
- Item onboarding should propagate to downstream systems (search, recommendations)
Buyer Operations
Buyers interact with the platform through several workflows:
- Search Items: Discover products using keywords
- View Item Details: Retrieve complete product information
- Add Item to Cart: Select desired items for purchase
- Checkout / Place Order: Complete purchase with payment processing
- View Orders: Inspect order history and status
Checkout involves coordination with:
- Inventory validation & reservation
- Payment processing
- Order creation
Low-Priority / Secondary Features
While important, these features are not latency-critical:
- Add items to wishlist
- Recommend items on homepage
- Notify users about order updates
These can tolerate relaxed consistency and asynchronous processing.
Non-Functional Requirements
The system must satisfy several critical quality attributes:
- Low Latency: Especially for search and browsing
- High Availability: Search and read paths must remain responsive
- Strong Consistency: Required for order placement & inventory updates
- Eventual Consistency: Acceptable for item onboarding, search indexing, etc.
Core Entities
The system revolves around several fundamental entities:
- Item: Product metadata & pricing
- Seller: Product owners / merchants
- Buyer: End users / customers
- Cart: Temporary selection of items
- Order: Finalized purchase record
API Design
The platform exposes APIs for catalog management, discovery, cart updates, and checkout.
Add Item (Seller)
Creates a new item in the catalog.
Endpoint:
POST /items
Request Body:
{
"name": "Wireless Mouse",
"description": "Ergonomic Bluetooth Mouse",
"price": 999,
"quantity": 100
}
What it does:
- Validates seller permissions
- Persists item in catalog database
- Publishes item-created event for downstream systems
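The three steps above can be sketched as a small handler. This is a minimal illustration under stated assumptions, not the platform's actual implementation: `Catalog`, `add_item`, the `allowed_sellers` set, and the event shape are hypothetical names, and plain in-memory structures stand in for the catalog database and the message broker.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Catalog:
    items: dict = field(default_factory=dict)   # stands in for the catalog database
    events: list = field(default_factory=list)  # stands in for the message broker


def add_item(catalog, seller_id, allowed_sellers, payload):
    """Validate the seller and payload, persist the item, then publish an event."""
    if seller_id not in allowed_sellers:
        raise PermissionError("seller may not add items")
    if payload["price"] <= 0 or payload["quantity"] < 0:
        raise ValueError("price must be positive and quantity non-negative")
    item_id = str(uuid.uuid4())
    catalog.items[item_id] = {**payload, "sellerId": seller_id}
    # Downstream systems (search, recommendations) would consume this event.
    catalog.events.append({"type": "item-created", "itemId": item_id})
    return item_id
```

In a real deployment the write and the publish would need to be made reliable together (for example with an outbox table); the sketch ignores that concern.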
Search Items
Retrieves items matching a search term with pagination support.
Endpoint:
GET /items?term={searchTerm}&cursor={cursorToken}&size={pageSize}
Query Parameters
- term: User search keyword
- cursor: Pagination token
- size: Number of results per page
What it does:
- Served from search infrastructure (e.g., Elasticsearch)
- Optimized for low latency and high availability
- Eventual consistency is acceptable
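To make the cursor parameter concrete, here is a rough sketch of cursor-based pagination over an in-memory list. The cursor format (base64-encoded JSON holding the last returned id) and the function names are assumptions made for illustration; Elasticsearch offers a `search_after` parameter for the same style of paging, which this sketch does not call.

```python
import base64
import json


def encode_cursor(last_id):
    """Pack the last returned id into an opaque token."""
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode()).decode()


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]


def search_page(items, term, cursor=None, size=2):
    """Return (results, next_cursor) for items whose name contains term, ordered by id."""
    after = decode_cursor(cursor) if cursor else ""
    matches = sorted(
        (i for i in items if term.lower() in i["name"].lower() and i["id"] > after),
        key=lambda i: i["id"],
    )
    page = matches[:size]
    # Only hand out a cursor when more results remain after this page.
    next_cursor = encode_cursor(page[-1]["id"]) if len(matches) > size else None
    return page, next_cursor
```

Unlike offset-based paging, the cursor stays stable when new items are inserted ahead of the current position.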
Add Item to Cart
Adds or updates items in the buyer's cart.
Endpoint:
PATCH /cart
Request Body:
{
"itemId": "item123",
"quantity": 2
}
What it does:
- Updates cart storage
- May perform soft inventory checks
- Designed for high-frequency updates
Checkout / Place Order
Converts a cart into a finalized order.
Endpoint:
POST /checkout
Request Body:
{
"cartId": "cart456",
"amount": 1998,
"paymentInfo": {
"paymentType": "CARD",
"cardNo": "**** **** **** 1234"
}
}
Critical transactional workflow:
- Validate inventory availability
- Atomically reserve / deduct inventory
- Process payment
- Create order record
- Trigger downstream workflows
This workflow requires strong consistency and careful failure handling.
Scale Assumptions
An Amazon-scale system must support extreme usage patterns. Assume the following numbers:
- Daily Active Users (DAU): 50M to 200M+ globally
- Peak Requests per Second (RPS): 100K to 500K+
- Product Catalog Size: 100M+ SKUs
- Orders per Day: 5M to 15M+ (event spikes)
- Concurrent Sessions: 1M+ active users
- Data Volume: Petabytes of catalog & logs
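A quick back-of-the-envelope calculation helps put these numbers in proportion. The sketch below converts daily orders to per-second rates; the 10x peak factor is an assumption introduced here for illustration, not a figure from the list above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_count):
    """Average events per second for a given daily total."""
    return daily_count / SECONDS_PER_DAY


avg_orders_low = per_second(5_000_000)    # about 58 orders/s
avg_orders_high = per_second(15_000_000)  # about 174 orders/s

PEAK_FACTOR = 10  # assumed ratio of peak to average order rate
peak_orders = avg_orders_high * PEAK_FACTOR  # about 1,736 orders/s

PEAK_RPS_LOW = 100_000  # low end of the peak RPS range
write_share = peak_orders / PEAK_RPS_LOW  # fraction of peak traffic that is order writes
```

Even with a generous peak factor, order writes stay under 2% of peak requests under these assumptions, which is why the read path (search, browsing) dominates capacity planning while the write path dominates correctness concerns.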
High Level Design
Architectural Implications of Scale
Such scale introduces several engineering challenges:
- Search systems must deliver low-latency results
- Inventory management must prevent overselling
- Traffic spikes require elastic scaling
- Databases must handle high concurrency
- Failures must be gracefully handled
This architecture illustrates a classic microservices-based e-commerce platform similar to Amazon.
The core idea behind this design is:
Separation of concerns + Independent scalability + Fault isolation
Each service owns a specific responsibility and database.
API Gateway (System Entry Point)
API Gateway is the front door of the system. It has many responsibilities:
- Routes requests to appropriate services
- Handles authentication / authorization
- Rate limiting & throttling
- Centralized logging & monitoring
- Request aggregation
- Prevents clients from calling internal services directly
- Centralizes security rules
- Simplifies frontend interactions
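As one illustration of the rate limiting responsibility, a token bucket is a common choice. The article does not say which algorithm the gateway uses, so treat this as a sketch of one option; the class name and the injectable `clock` parameter are assumptions made so the behavior can be tested deterministically.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would typically keep one bucket per client or API key and reject requests (for example with HTTP 429) when `allow()` returns False.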
User Service
User Service handles user accounts & identity.
- Registration / login
- Profile management
- Address storage
- Authentication / tokens
- Account settings
Database: User DB
User data can be stored in a separate database because it has different scaling patterns and security isolation needs.
Search Service
Search Service owns product discovery & search.
- Keyword search
- Filters & sorting
- Autocomplete
- Relevance ranking
- Synonym / typo handling
Database: Elasticsearch
We can use Elasticsearch here because:
- Optimized for full-text search
- Low latency queries
- Built-in ranking & scoring
Why Elasticsearch for search?
- Its inverted index data structure makes text search extremely fast.
- A typical relational database would need a LIKE '%term%' query, which cannot use an ordinary B-tree index and degrades to a full scan. Elasticsearch typically answers such queries in milliseconds.
- Supports fuzzy matching, partial words, typo tolerance, stemming, synonyms, etc.
- Near-real-time indexing: with the default refresh interval of 1 second, new documents become searchable about a second after they are indexed.
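The inverted index idea can be shown in a few lines. This is a toy sketch, not how Elasticsearch is implemented internally: it tokenizes on whitespace only and has no ranking, stemming, or fuzzy matching. The function names are illustrative.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

A lookup touches only the posting sets for the query tokens instead of scanning every document, which is where the speed advantage over a substring scan comes from.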
Item (Product) Service
Item Service owns the product catalog.
- Product details
- Descriptions / attributes
- Pricing metadata
- Variants (size/color/etc.)
- Availability flags
Database: Item Mongo DB
Why MongoDB?
- Highly variable schema
- Nested attributes
- Heavy read patterns
- MongoDB's document model is a good fit for these characteristics
Example product documents (MongoDB shell notation):
{
"_id": ObjectId("..."),
"sku": "TSHIRT-001",
"name": "Basic Cotton T-Shirt",
"brand": "ComfortWear",
"category": "apparel",
"type": "variant",
"description": "Soft 100% cotton T-shirt available in multiple colors and sizes.",
"tags": ["t-shirt", "casual", "cotton", "unisex"],
"attributes": {
"material": "cotton",
"gender": "unisex",
"fit": "regular"
},
"variants": [
{
"variant_id": "TSHIRT-001-RD-M",
"color": "red",
"size": "M",
"stock": 20,
"price": 15.99,
"images": [
"https://cdn.site.com/products/tshirt-red-front.jpg",
"https://cdn.site.com/products/tshirt-red-back.jpg"
]
},
...
],
...
"created_at": "2025-06-15T10:00:00Z",
"updated_at": "2025-06-15T10:00:00Z"
}
{
"_id": ObjectId("..."),
"sku": "TV-4K-001",
"name": "42-inch 4K Smart TV",
"brand": "ViewLux",
"category": "electronics",
"type": "simple",
"description": "Smart 4K UHD TV with HDR, built-in WiFi, and voice assistant support.",
"tags": ["tv", "4k", "smart", "electronics"],
"specifications": {
"screen_size": "42 inches",
"resolution": "3840 x 2160",
"panel_type": "LED",
"smart_tv": true,
"hdmi_ports": 3,
"usb_ports": 2,
"os": "Android TV"
},
"price": 349.99,
"stock": 8,
"images": [
"https://cdn.site.com/products/tv-front.jpg",
"https://cdn.site.com/products/tv-back.jpg"
],
"status": "active",
"created_at": "2025-06-15T10:00:00Z",
"updated_at": "2025-06-15T10:00:00Z"
}
Cart Service
Cart Service owns shopping carts. Responsibilities:
- Add / remove items
- Quantity updates
- Price snapshot at add time
- Cart expiration
- Cross-device cart merging
Database: Cart DB
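The cart stores the price at the time an item was added, NOT the live price. Whether checkout should honor the cart price or the current price is a design decision to settle explicitly; keeping the snapshot lets the system detect price changes and prevents pricing inconsistencies during checkout.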
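One way to surface the cart-price versus current-price question at checkout is to compare each line's snapshot with the live price. The sketch below is only illustrative: `CartLine`, `price_at_add`, and `price_changes` are hypothetical names, and prices are assumed to be integers in minor currency units.

```python
from dataclasses import dataclass


@dataclass
class CartLine:
    item_id: str
    quantity: int
    price_at_add: int  # snapshot in minor units, taken when the item was added


def price_changes(cart_lines, live_prices):
    """Return {item_id: (snapshot, live)} for lines whose price moved since add time."""
    return {
        line.item_id: (line.price_at_add, live_prices[line.item_id])
        for line in cart_lines
        if live_prices[line.item_id] != line.price_at_add
    }
```

Checkout can then apply whichever policy the business chooses, for example asking the buyer to confirm when the returned mapping is non-empty.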
Checkout Service (System Orchestrator)
The most critical component in the purchase flow is Checkout Service which handles the whole order flow.
- Validates cart
- Calculates totals / taxes / shipping
- Calls Inventory Service to check whether inventory is available
- Calls Payment Service to make the actual payment
- Creates the order entry
- Handles retries / failures
Checkout coordinates multiple services and acts like a distributed transaction manager.
Inventory Service
Inventory Service owns stock management.
- Check availability of items
- Reserve inventory
- Deduct stock
- Prevent overselling
- Release reservations
Database: Inventory DB
Inventory is:
- Highly contested
- Correctness-critical
- Consistency-sensitive
So a separate database is warranted at this scale.
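A common way to prevent overselling is to fold the availability check and the deduction into a single conditional update, so that two concurrent buyers cannot both pass the check. The sketch below uses SQLite from the Python standard library purely for illustration; the table layout and function names are assumptions, and the article does not specify which database backs the Inventory Service.

```python
import sqlite3


def make_inventory(stock):
    """Create an in-memory inventory table from an {item_id: stock} mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (item_id TEXT PRIMARY KEY, stock INTEGER NOT NULL)")
    conn.executemany("INSERT INTO inventory VALUES (?, ?)", stock.items())
    conn.commit()
    return conn


def reserve(conn, item_id, quantity):
    """Deduct stock only if enough remains; check and update are one statement."""
    cur = conn.execute(
        "UPDATE inventory SET stock = stock - ? WHERE item_id = ? AND stock >= ?",
        (quantity, item_id, quantity),
    )
    conn.commit()
    # rowcount is 1 when the row matched the stock condition, 0 otherwise.
    return cur.rowcount == 1
```

Because the `stock >= ?` condition is evaluated inside the update itself, a request that would drive stock negative simply affects zero rows and is rejected.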
Payment Service
Payment Service owns payment processing & gateway integrations.
- Gateway communication (Razorpay, etc.)
- Payment retries / timeouts
- Fraud detection
- Refund handling
- Payment state tracking
Payments involve:
- External dependencies
- High failure probability
- Strict security requirements
Hence, a separate service is necessary.
Order View Service
Owns order listing and order detail views. It reads from the Order database; a separate service is introduced to isolate read traffic from write traffic.
Blob Storage (Media Layer)
Stores:
- Product images
- Media assets
Why Not Store in DB?
- Large binary objects
- CDN compatibility
- Better scalability & performance
Example Request Flows
User Searches Product
Client → API Gateway → Search Service → Elasticsearch
User Views Product
Client → API Gateway → Item Service → Item DB
User Adds to Cart
Client → API Gateway → Cart Service → Cart DB
User Checks Out
Checkout Service →
- Inventory Service
- Payment Service
- Order Service
This architecture reflects real-world production practices:
- Microservices architecture
- Database per service (no shared DB)
- Independent scaling
- Fault isolation
- Storage optimized per workload
- Orchestration via Checkout
Key Design Principles
- Separation of Concerns: Search vs transactional workflows
- Event-Driven Propagation: Catalog → Search / Recommendations
- Strong Consistency Where Required: Checkout & Inventory
- Eventual Consistency Where Acceptable: Search & Discovery
- Independent Scaling: Read-heavy vs write-heavy paths
Item Onboarding Flow
Item onboarding is the process where sellers add new products to the platform.
Seller Page
The Seller Page is the primary entry point for all item creations and updates.
Typical actions include:
- Creating new listings
- Updating product details
- Uploading images
- Setting pricing & inventory
Seller-submitted data is considered untrusted and must pass through verification and validation layers before becoming part of the system.
Inbound Layer & Topic
After submission, item data flows through multiple internal services responsible for:
- Validation & verification
- Business rule enforcement
- Data normalization
- Compliance / fraud checks
Once validated, the system publishes the item event to an Inbound Topic.
This event-driven step enables:
- Loose coupling between services
- Independent scaling of consumers
- Failure isolation
- Asynchronous processing
Downstream services (Item, Inventory, Warehouse, Search, etc.) consume the topic and process item information independently.
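The fan-out from the Inbound Topic to independent consumers can be sketched with an in-memory stand-in for the broker. This is a simplified illustration: a real broker delivers asynchronously and persists messages, whereas this `Topic` class (a hypothetical name) calls handlers synchronously and merely records failures.

```python
from collections import defaultdict


class Topic:
    """In-memory stand-in for a message broker topic."""

    def __init__(self):
        self.subscribers = {}
        self.failures = defaultdict(list)

    def subscribe(self, name, handler):
        self.subscribers[name] = handler

    def publish(self, event):
        for name, handler in self.subscribers.items():
            try:
                handler(event)
            except Exception as exc:
                # One failing consumer must not block the others.
                self.failures[name].append((event, exc))
```

The point of the sketch is the isolation property: a search indexing outage is recorded for retry while item and inventory consumers still process the same event.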
Item Service
Responsible for:
- Storing product metadata
- Managing attributes & updates
Uses a flexible DB (e.g., MongoDB) due to variable schemas.
Blob Storage
Images are stored separately for:
- Scalability
- CDN delivery
- Reduced DB load
Inventory Service
Stores:
- Item → Quantity mapping
- Stock updates
Requires strong consistency.
Search Consumer
Updates search indices asynchronously to avoid blocking onboarding.
Deep Dive - Item Onboarding Flow
This deep dive shows how item-related events power a near-real-time recommendation engine through a streaming pipeline.
Spark Streaming (Real-Time Processing)
Spark Streaming continuously consumes events from the Inbound Topic.
Responsibilities:
- Parse item events
- Enrich / transform data
- Filter relevant signals
- Generate analytical features
This layer converts raw system events into structured data for downstream analytics.
Hadoop Cluster (Durable Data Lake)
Processed streams are stored in Hadoop.
Why Hadoop?
- Cheap large-scale storage
- Historical event retention
- Batch analytics compatibility
- Replay capability
Acts as the long-term behavioral and item-event repository.
Spark Cluster (Batch / ML Processing)
The Spark Cluster performs deeper computation:
- Feature engineering
- Model training
- Aggregations & correlations
- User-item affinity calculations
This stage transforms stored data into recommendation-ready signals.
Recommendation Service
Consumes outputs from Spark jobs.
Responsibilities:
- Serve recommendations
- Personalize results
- Rank items
- Respond with low latency
Optimized for fast read-heavy workloads.
Why Streaming Is Critical for Recommendations
Recommendations benefit from near-real-time updates:
- New items become discoverable quickly
- Price / availability changes are reflected quickly
- User behavior captured continuously
Without streaming, recommendations go stale.
Key Design Insight
System events → Stream processing → Data lake → ML computation → Live recommendations
Order Checkout Flow
The order flow coordinates multiple services to safely convert a user's cart into a confirmed purchase.
Step 1: Checkout Initiation
The process starts when the user clicks Place Order on the Checkout Page.
Request path:
Client → API Gateway → Checkout Service
The Checkout Service acts as the orchestrator of the transaction.
Step 2: Inventory Validation
Checkout Service → Inventory Service
Responsibilities:
- Verify stock availability
- Reserve inventory (temporary lock)
- Prevent overselling
If inventory validation fails, order creation stops.
Step 3: Order Creation
If validation succeeds:
Checkout Service → Order DB
Stored fields typically include:
- id
- user_id
- amount
- currency
- status
- payment_status
- payment_txn_id
The initial status can simply be 'CREATED'.
Step 4: Payment Processing
Checkout Service → Payment Service
Responsibilities:
- Initiate payment with gateway
- Handle retries / failures
- Return payment status
Payment is treated as an external dependency and may fail independently.
Step 5: Update Order Status
After payment succeeds or fails, update the order's status in the Order table to 'CONFIRMED' or 'PAYMENT_FAILED' respectively.
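The status updates can be guarded by a small state machine so that, for example, a late payment callback cannot flip an already finalized order. The sketch below uses the statuses named in this article (CREATED, CONFIRMED, PAYMENT_FAILED, and the EXPIRED status from the order expiry discussion); the transition table itself is an assumption about which moves should be legal.

```python
from enum import Enum


class OrderStatus(Enum):
    CREATED = "CREATED"
    CONFIRMED = "CONFIRMED"
    PAYMENT_FAILED = "PAYMENT_FAILED"
    EXPIRED = "EXPIRED"


# Assumed rule: only a CREATED order may move; the other states are terminal.
ALLOWED = {
    OrderStatus.CREATED: {
        OrderStatus.CONFIRMED,
        OrderStatus.PAYMENT_FAILED,
        OrderStatus.EXPIRED,
    },
}


def transition(current, target):
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

In a database-backed service the same rule is often enforced with a conditional update such as "set status where id matches and status is CREATED".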
Order Expiry:
Order expiry is a critical safeguard in payment-aware systems because orders may be created while payment is still pending. Users frequently abandon checkout flows due to failures, distractions, or connectivity issues, yet inventory may already be reserved. Without an expiry mechanism, these unpaid orders would indefinitely lock stock, distort inventory accuracy, and degrade system state. By enforcing a time limit, the system can automatically invalidate stale orders and release reserved resources, ensuring inventory remains available to legitimate buyers and the platform stays operationally consistent.
A practical way to implement order expiry is by leveraging cache TTL (Time-To-Live) instead of relying solely on database scans or cron jobs. When an order is created, the system writes a lightweight entry to a distributed cache (such as Redis) with a TTL equal to the payment window. For example, an order created at checkout can be stored with a 5-minute TTL.
If payment succeeds, the cache entry is explicitly cleared.
If the TTL expires, the cache removes the key, and that expiry can trigger handling logic such as marking the order as EXPIRED and releasing inventory. In Redis this relies on keyspace notifications, which must be enabled explicitly, are delivered on a best-effort basis, and fire when the server actually deletes the key rather than at the exact TTL instant, so a low-frequency reconciliation job is still a sensible backstop.
This approach avoids expensive polling on the hot path and scales well under high traffic with little extra infrastructure. Cache-driven expiry works well because order timeout is inherently time-based, which makes TTL semantics a natural fit.
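The expiry lifecycle (start the payment window, clear it on success, act on it when it lapses) can be sketched without a real cache. The class below is an in-memory stand-in with hypothetical names; it does not use Redis, and its explicit `sweep` call plays the role that an expiry notification or reconciliation job would play in production.

```python
import time


class PaymentWindow:
    """In-memory stand-in for a TTL cache tracking pending orders."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.deadlines = {}

    def start(self, order_id):
        """Record the deadline when the order is created."""
        self.deadlines[order_id] = self.clock() + self.ttl

    def payment_succeeded(self, order_id):
        """Clear the entry so the order never expires."""
        self.deadlines.pop(order_id, None)

    def sweep(self, on_expire):
        """Expire every order past its deadline and invoke the handler for each."""
        now = self.clock()
        expired = [o for o, d in self.deadlines.items() if d <= now]
        for order_id in expired:
            del self.deadlines[order_id]
            on_expire(order_id)  # e.g. mark EXPIRED and release inventory
        return expired
```

The handler should be idempotent, since in a distributed setup the same expiry may be observed more than once.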

The Problem: Distributed Transactions
Order processing in a microservices architecture is inherently a distributed transaction.
Challenges:
- Multiple services participate (Checkout, Inventory, Payment, Logistics)
- Each service owns its own database
- No shared ACID transaction boundary
- Network calls may fail or timeout
- Partial failures can corrupt system state
Example failure scenario:
- Inventory reserved (succeeded)
- Payment failed
Without proper coordination:
- Stock remains locked
- Order state becomes inconsistent
- System correctness breaks
Traditional database transactions cannot solve this across services.
The Solution: Saga Pattern
Saga models the workflow as a sequence of event-driven steps with compensating actions.
Example Saga Flow
- User places order
- CheckoutService creates order → emits OrderCreated
- InventoryService listens → reserves stock → emits InventoryReserved
- PaymentService listens → charges payment → emits PaymentCompleted
- LogisticsService listens → prepares shipment → emits ShipmentReady
- CheckoutService finalizes → marks OrderCompleted
Failure Handling (Compensation)
If any step fails:
- InventoryService → Un-reserve stock
- PaymentService → Refund payment
- CheckoutService → Cancel order
Each service reverses its own side effects.
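The compensation idea can be sketched as a runner that undoes completed steps in reverse order when a later step fails. Note the simplification: the flow above is event-driven, with services reacting to each other's events, whereas this sketch drives the steps from a single function for brevity. `run_saga` and the step tuple shape are hypothetical.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    On failure, call the compensation of every completed step in reverse
    order and return (False, log); otherwise return (True, log).
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"{name}:done")
            completed.append((name, compensate))
        except Exception:
            log.append(f"{name}:failed")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}:compensated")
            return False, log
    return True, log
```

Compensations are assumed to succeed here; in practice they need retries and idempotency, because a failed undo leaves exactly the inconsistent state the saga is meant to avoid.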
Why Saga Works
- No cross-service locking required
- Failures are recoverable
- Services remain loosely coupled
- Workflow becomes resilient
- Consistency achieved eventually
Order View Flow - Read Path
The viewing order flow is a read-only, latency-sensitive path that allows users to access their order history and details. Unlike checkout, this flow is optimized for speed and availability.
Request Flow
When a user opens the Order View Page:
Client → API Gateway → Order Service
The API Gateway routes authenticated requests to the Order Service.
Order Service Responsibilities
The Order Service handles:
- Fetching orders for a user
- Retrieving order details by order_id
- Filtering by status if needed
All queries are performed against the Order DB, which acts as the source of truth.
Performance Considerations
Since order views are frequent:
- Queries must be indexed by user_id and order_id
- Responses should be low latency
- Read load must not impact write-heavy flows
Caching Strategy
Caching is commonly applied to reduce DB pressure:
orders:user:{userId}
order:{orderId}

The same cache used for order-expiry TTL entries can also serve these lookups.
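A cache-aside read path using the key formats above might look like the following sketch. A plain dictionary stands in for the distributed cache, and `load_from_db` is a hypothetical callback representing the Order DB query; invalidating on every order update is one possible policy, assumed here for simplicity.

```python
def get_order(order_id, cache, load_from_db):
    """Serve from cache when possible, otherwise load and populate the cache."""
    key = f"order:{order_id}"
    if key in cache:
        return cache[key]
    order = load_from_db(order_id)
    if order is not None:
        cache[key] = order
    return order


def on_order_updated(order, cache):
    """Drop cached copies so the next read sees the new status."""
    cache.pop(f"order:{order['id']}", None)
    cache.pop(f"orders:user:{order['user_id']}", None)
```

With a real cache, entries would normally also carry a TTL so that a missed invalidation cannot serve stale data indefinitely.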
Order Archival
We can introduce an order archival service that moves old orders out of the Order database into an archival DB.
If a user wants to view such old orders, they can be fetched from the archival DB.

Overall Architecture
