Amazon-Like E-Commerce System Design

This system models a simplified Amazon-style marketplace supporting sellers, buyers, product discovery, and order placement. The design emphasizes scalability, availability, and correctness for critical operations.

Functional Requirements

The platform must support the following core user capabilities:

Seller Operations

Sellers can add new items to the catalog
Sellers can define attributes such as name, description, price, and quantity
Item onboarding should propagate to downstream systems (search, recommendations)

Buyer Operations

Buyers interact with the platform through several workflows:

Search Items – Discover products using keywords
View Item Details – Retrieve complete product information
Add Item to Cart – Select desired items for purchase
Checkout / Place Order – Complete purchase with payment processing
View Orders – Inspect order history and status

Checkout involves coordination with:

Inventory validation & reservation
Payment processing
Order creation

Low-Priority / Secondary Features

While important, these features are not latency-critical:

Add items to wishlist
Recommend items on homepage
Notify users about order updates

These can tolerate relaxed consistency and asynchronous processing.

Non-Functional Requirements

The system must satisfy several critical quality attributes:

Low Latency → Especially for search and browsing
High Availability → Search and read paths must remain responsive
Strong Consistency → Required for order placement & inventory updates
Eventual Consistency → Acceptable for item onboarding, search indexing, etc.

Core Entities

The system revolves around several fundamental entities:

Item → Product metadata & pricing
Seller → Product owners / merchants
Buyer → End users / customers
Cart → Temporary selection of items
Order → Finalized purchase record

API Design

The platform exposes APIs for catalog management, discovery, cart updates, and checkout.

Add Item (Seller)

Creates a new item in the catalog.

Endpoint:

POST /items

Request Body:

{
  "name": "Wireless Mouse",
  "description": "Ergonomic Bluetooth Mouse",
  "price": 999,
  "quantity": 100
}

What it does?

Validates seller permissions
Persists item in catalog database
Publishes item-created event for downstream systems
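
The three steps above can be sketched as follows. This is a minimal in-memory illustration, not the platform's actual implementation: `CatalogService`, `approved_sellers`, and the shape of the `item-created` event are hypothetical names introduced here, and a dict and a list stand in for the catalog database and the message topic.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogService:
    """In-memory stand-in for the catalog DB and the event topic (assumptions)."""
    approved_sellers: set
    items: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def add_item(self, seller_id: str, payload: dict) -> str:
        # 1. Validate seller permissions.
        if seller_id not in self.approved_sellers:
            raise PermissionError(f"seller {seller_id} may not list items")
        # Basic payload validation: seller input is untrusted.
        for key in ("name", "description", "price", "quantity"):
            if key not in payload:
                raise ValueError(f"missing field: {key}")
        if payload["price"] <= 0 or payload["quantity"] < 0:
            raise ValueError("price must be positive and quantity non-negative")
        # 2. Persist the item.
        item_id = uuid.uuid4().hex
        self.items[item_id] = {**payload, "sellerId": seller_id}
        # 3. Publish an item-created event for search, recommendations, etc.
        self.events.append({"type": "item-created", "itemId": item_id})
        return item_id


catalog = CatalogService(approved_sellers={"seller-1"})
new_id = catalog.add_item(
    "seller-1",
    {"name": "Wireless Mouse", "description": "Ergonomic Bluetooth Mouse",
     "price": 999, "quantity": 100},
)
```

In a real deployment the write and the event publish would need to be made reliable together (for example with an outbox table); the sketch ignores that concern.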

Search Items

Retrieves items matching a search term with pagination support.

Endpoint:

GET /items?term={searchTerm}&cursor={cursorToken}&size={pageSize}

Query Parameters

term → User search keyword
cursor → Pagination token
size → Number of results per page

What it does?

Served from search infrastructure (e.g., Elasticsearch)
Optimized for low latency and high availability
Eventual consistency is acceptable
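
One way to picture the `cursor` parameter is as an opaque token that encodes the last item returned. The sketch below is an assumption about how such a token could work, not the platform's actual scheme: `encode_cursor`, `decode_cursor`, and `search_page` are hypothetical helpers, and a plain list with substring matching stands in for the search index.

```python
import base64
import json


def encode_cursor(last_id: str) -> str:
    """Pack the last returned item id into an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode()).decode()


def decode_cursor(token: str) -> str:
    """Recover the last returned item id from a token."""
    return json.loads(base64.urlsafe_b64decode(token.encode()))["after"]


def search_page(items, term, cursor=None, size=2):
    """Return (results, next_cursor) for items whose name contains `term`.

    `items` is a list of dicts with unique "id" values. Results are ordered
    by id, so the cursor stays valid even if items are added between requests.
    """
    after = decode_cursor(cursor) if cursor else ""
    matches = sorted(
        (i for i in items if term.lower() in i["name"].lower() and i["id"] > after),
        key=lambda i: i["id"],
    )
    page = matches[:size]
    # Only hand out a cursor when more results remain.
    next_cursor = encode_cursor(page[-1]["id"]) if len(matches) > size else None
    return page, next_cursor


ITEMS = [
    {"id": "item001", "name": "Wireless Mouse"},
    {"id": "item002", "name": "Gaming Mouse"},
    {"id": "item003", "name": "Mouse Pad"},
    {"id": "item004", "name": "USB Keyboard"},
]
first_page, token = search_page(ITEMS, "mouse", size=2)
second_page, end = search_page(ITEMS, "mouse", cursor=token, size=2)
```

Unlike offset-based paging, the cursor does not shift when new items are indexed between two requests, which suits an eventually consistent index.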

Add Item to Cart

Adds or updates items in the buyer’s cart.

Endpoint:

PATCH /cart

Request Body:

{
  "itemId": "item123",
  "quantity": 2
}

What it does?

Updates cart storage
May perform soft inventory checks
Designed for high-frequency updates

Checkout / Place Order

Converts a cart into a finalized order.

Endpoint:

POST /checkout

Request Body:

{
  "cartId": "cart456",
  "amount": 1998,
  "paymentInfo": {
    "paymentType": "CARD",
    "cardNo": "**** **** **** 1234"
  }
}

Critical transactional workflow:

Validate inventory availability
Atomically reserve / deduct inventory
Process payment
Create order record
Trigger downstream workflows

This workflow requires strong consistency and careful failure handling.

Scale Assumptions

An Amazon-scale system must support extreme usage patterns. Assume the following numbers:

Daily Active Users (DAU) → 50M – 200M+ globally
Peak Requests per Second (RPS) → 100K – 500K+
Product Catalog Size → 100M+ SKUs
Orders per Day → 5M – 15M+ (event spikes)
Concurrent Sessions → 1M+ active users
Data Volume → Petabytes of catalog & logs
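
To put the order volume in perspective, the daily figures can be converted into an average per-second rate. This is back-of-the-envelope arithmetic on the assumed numbers above; `average_per_second` is a helper introduced only for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(events_per_day: int) -> float:
    """Convert a daily volume into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


low = average_per_second(5_000_000)    # about 58 orders/s
high = average_per_second(15_000_000)  # about 174 orders/s
print(f"average order rate: {low:.0f} to {high:.0f} per second")
```

Even allowing for peaks well above the average, order writes are a small fraction of the 100K+ peak request rate, which is consistent with scaling the read-heavy and transactional paths separately.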

High Level Design

Architectural Implications of Scale

Such scale introduces several engineering challenges:

Search systems must deliver low-latency results
Inventory management must prevent overselling
Traffic spikes require elastic scaling
Databases must handle high concurrency
Failures must be gracefully handled

This architecture illustrates a classic microservices-based e-commerce platform similar to Amazon.

The core idea behind this design is:

Separation of concerns + Independent scalability + Fault isolation

Each service owns a specific responsibility and database.

API Gateway (System Entry Point)

API Gateway is the front door of the system. It has many responsibilities:

Routes requests to appropriate services
Handles authentication / authorization
Rate limiting & throttling
Centralized logging & monitoring
Request aggregation
Prevents clients from calling internal services directly
Centralizes security rules
Simplifies frontend interactions

User Service

User Service handles user accounts and identity:

Registration / login
Profile management
Address storage
Authentication / tokens
Account settings

Database: User DB

User data is stored in a separate database because it has different scaling patterns and security isolation needs.

Search Service

Search Service owns product discovery and search:

Keyword search
Filters & sorting
Autocomplete
Relevance ranking
Synonym / typo handling

Database: Elasticsearch

We can use Elasticsearch here because:

Optimized for full-text search
Low latency queries
Built-in ranking & scoring

Why Elasticsearch for search?

Its inverted index data structure makes text search extremely fast.
A relational database would typically fall back to LIKE '%term%', which cannot use an ordinary index and forces a scan. Elasticsearch answers the same query from its inverted index, usually in milliseconds.
Supports fuzzy matching, partial words, typo tolerance, stemming, synonyms, etc.
Near-real-time indexing: new documents become searchable after the next index refresh, which by default happens roughly every second.
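
The inverted index idea can be illustrated with a toy version. This is a deliberately simplified sketch and not how Elasticsearch is implemented internally: `tokenize`, `build_index`, and `search` are names introduced here, and the tokenizer only lowercases and splits on whitespace.

```python
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase and split on whitespace; real analyzers also stem and strip punctuation."""
    return text.lower().split()


def build_index(docs: dict) -> dict:
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


DOCS = {
    1: "Wireless Mouse",
    2: "Ergonomic Bluetooth Mouse",
    3: "Wireless Keyboard",
}
INDEX = build_index(DOCS)
hits = search(INDEX, "wireless mouse")
```

A query only touches the posting sets for its own terms, so lookup cost depends on how many documents match those terms rather than on the total catalog size.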

Item (Product) Service

Item Service owns the product catalog:

Product details
Descriptions / attributes
Pricing metadata
Variants (size/color/etc.)
Availability flags

Database: Item DB (MongoDB)

Why MongoDB?

Highly variable schema
Nested attributes
Heavy read patterns

MongoDB's document model is a good fit for all three.

Example product documents:

{
  "_id": ObjectId("..."),
  "sku": "TSHIRT-001",
  "name": "Basic Cotton T-Shirt",
  "brand": "ComfortWear",
  "category": "apparel",
  "type": "variant",
  "description": "Soft 100% cotton T-shirt available in multiple colors and sizes.",
  "tags": ["t-shirt", "casual", "cotton", "unisex"],
  "attributes": {
    "material": "cotton",
    "gender": "unisex",
    "fit": "regular"
  },
  "variants": [
    {
      "variant_id": "TSHIRT-001-RD-M",
      "color": "red",
      "size": "M",
      "stock": 20,
      "price": 15.99,
      "images": [
        "https://cdn.site.com/products/tshirt-red-front.jpg",
        "https://cdn.site.com/products/tshirt-red-back.jpg"
      ]
    },
    ...
  ],
  "created_at": "2025-06-15T10:00:00Z",
  "updated_at": "2025-06-15T10:00:00Z"
}


{
  "_id": ObjectId("..."),
  "sku": "TV-4K-001",
  "name": "42-inch 4K Smart TV",
  "brand": "ViewLux",
  "category": "electronics",
  "type": "simple",
  "description": "Smart 4K UHD TV with HDR, built-in WiFi, and voice assistant support.",
  "tags": ["tv", "4k", "smart", "electronics"],
  "specifications": {
    "screen_size": "42 inches",
    "resolution": "3840 x 2160",
    "panel_type": "LED",
    "smart_tv": true,
    "hdmi_ports": 3,
    "usb_ports": 2,
    "os": "Android TV"
  },
  "price": 349.99,
  "stock": 8,
  "images": [
    "https://cdn.site.com/products/tv-front.jpg",
    "https://cdn.site.com/products/tv-back.jpg"
  ],
  "status": "active",
  "created_at": "2025-06-15T10:00:00Z",
  "updated_at": "2025-06-15T10:00:00Z"
}

Cart Service

Cart Service owns shopping cart responsibilities:

Add / remove items
Quantity updates
Price snapshot at add time
Cart expiration
Cross-device cart merging

Database: Cart DB

The cart stores the price at the time of add, not the live price. Whether checkout should honor the cart price or the current price is a design decision to settle explicitly; keeping the snapshot makes any price change visible at checkout instead of silently altering the total.
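
A small sketch of the snapshot idea follows. It assumes prices are held in minor units and that the snapshot is taken when a line is first added; `CartLine`, `add_to_cart`, and `price_changes` are hypothetical names, and the sketch only detects a price change rather than deciding which price wins.

```python
from dataclasses import dataclass


@dataclass
class CartLine:
    item_id: str
    quantity: int
    unit_price_at_add: int  # price snapshot in minor units (assumption)


def add_to_cart(cart: dict, item_id: str, quantity: int, live_prices: dict) -> None:
    """Add or update a line; the snapshot is taken when the line is first added."""
    if item_id in cart:
        cart[item_id].quantity = quantity
    else:
        cart[item_id] = CartLine(item_id, quantity, live_prices[item_id])


def price_changes(cart: dict, live_prices: dict) -> dict:
    """Return {item_id: (snapshot, live)} for lines whose price moved since add time."""
    return {
        line.item_id: (line.unit_price_at_add, live_prices[line.item_id])
        for line in cart.values()
        if line.unit_price_at_add != live_prices[line.item_id]
    }


prices = {"item123": 999}
cart = {}
add_to_cart(cart, "item123", 2, prices)
prices["item123"] = 1099           # seller raises the price after the add
changed = price_changes(cart, prices)
```

At checkout, a non-empty result from `price_changes` is the point where the chosen policy applies: honor the snapshot, charge the live price, or ask the buyer to confirm.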

Checkout Service (System Orchestrator)

The Checkout Service is the most critical component in the purchase flow. It handles the entire order flow:

Validates cart
Calculates totals / taxes / shipping
Calls Inventory Service to check and reserve stock
Calls Payment Service to process the payment
Creates the order entry
Handles retries / failures

Checkout coordinates multiple services and acts like a distributed transaction manager.

Inventory Service

Inventory Service owns stock management:

Check availability of items
Reserve inventory
Deduct stock
Prevent overselling
Release reservations

Database: Inventory DB

Inventory is:

Highly contested
Correctness-critical
Consistency-sensitive

So a dedicated database is warranted at this scale.
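
The reserve, deduct, and release operations can be sketched with an in-memory table. This is an illustration under stated assumptions: the `Inventory` class and its method names are hypothetical, and a process-local lock stands in for whatever atomic conditional update the real Inventory DB provides.

```python
import threading


class Inventory:
    """In-memory stock table; a lock stands in for the DB's atomic conditional update."""

    def __init__(self, stock: dict):
        self._available = dict(stock)
        self._reserved = {}  # order_id -> (item_id, quantity)
        self._lock = threading.Lock()

    def reserve(self, order_id: str, item_id: str, quantity: int) -> bool:
        # Check and decrement happen under one lock, so two buyers cannot
        # both claim the last unit.
        with self._lock:
            if self._available.get(item_id, 0) < quantity:
                return False
            self._available[item_id] -= quantity
            self._reserved[order_id] = (item_id, quantity)
            return True

    def release(self, order_id: str) -> None:
        """Return a reservation to available stock (payment failed or order expired)."""
        with self._lock:
            held = self._reserved.pop(order_id, None)
            if held:
                item_id, quantity = held
                self._available[item_id] += quantity

    def commit(self, order_id: str) -> None:
        """Make the deduction permanent once payment succeeds."""
        with self._lock:
            self._reserved.pop(order_id, None)

    def available(self, item_id: str) -> int:
        with self._lock:
            return self._available.get(item_id, 0)


inv = Inventory({"item123": 1})
results = []
threads = [
    threading.Thread(target=lambda oid=oid: results.append(inv.reserve(oid, "item123", 1)))
    for oid in ("order-a", "order-b")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With one unit in stock and two concurrent buyers, exactly one reservation succeeds. Releasing an unknown or already released order is a no-op, which keeps compensation calls safe to retry.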

Payment Service

Payment Service owns payment processing and gateway integrations:

Gateway communication (Razorpay, etc.)
Payment retries / timeouts
Fraud detection
Refund handling
Payment state tracking

Payments involve:

External dependencies
High failure probability
Strict security requirements

Hence, a separate service is necessary.

Order View Service

Order View Service owns order listing and order detail views. It reads from the Order database; it exists as a separate service so that read traffic stays isolated from the write path.

Blob Storage (Media Layer)

Stores:

Product images
Media assets

Why Not Store in DB?

Large binary objects
CDN compatibility
Better scalability & performance

Example Request Flows

User Searches Product

Client → API Gateway → Search Service → Elasticsearch

User Views Product

Client → API Gateway → Item Service → Item DB

User Adds to Cart

Client → API Gateway → Cart Service → Cart DB

User Checks Out

Client → API Gateway → Checkout Service, which then calls:

Inventory Service
Payment Service
Order Service

This architecture reflects real-world production practices:

Microservices architecture
Database per service (no shared DB)
Independent scaling
Fault isolation
Storage optimized per workload
Orchestration via Checkout

Key Design Principles

Separation of Concerns → Search vs transactional workflows
Event-Driven Propagation → Catalog → Search / Recommendations
Strong Consistency Where Required → Checkout & Inventory
Eventual Consistency Where Acceptable → Search & Discovery
Independent Scaling → Read-heavy vs write-heavy paths

Item Onboarding Flow

Item onboarding is the process where sellers add new products to the platform.

Seller Page

The Seller Page is the primary entry point for all item creations and updates.

Typical actions include:

Creating new listings
Updating product details
Uploading images
Setting pricing & inventory

Seller-submitted data is considered untrusted and must pass through verification and validation layers before becoming part of the system.

Inbound Layer & Topic

After submission, item data flows through multiple internal services responsible for:

Validation & verification
Business rule enforcement
Data normalization
Compliance / fraud checks

Once validated, the system publishes the item event to an Inbound Topic.

This event-driven step enables:

Loose coupling between services
Independent scaling of consumers
Failure isolation
Asynchronous processing

Downstream services (Item, Inventory, Warehouse, Search, etc.) consume the topic and process item information independently.
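
The fan-out from the Inbound Topic to independent consumers can be sketched in-process. This is only an illustration: `Topic` and the three consumer functions are hypothetical, delivery here is synchronous, and a real deployment would use a durable message broker with asynchronous consumers.

```python
from collections import defaultdict


class Topic:
    """Tiny in-process pub/sub; a real deployment would use a durable message broker."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, name, handler):
        self._subscribers.append((name, handler))

    def publish(self, event: dict) -> dict:
        """Deliver to every subscriber; one consumer's failure does not block the others."""
        failures = {}
        for name, handler in self._subscribers:
            try:
                handler(event)
            except Exception as exc:  # isolate consumer failures
                failures[name] = exc
        return failures


item_store, stock, search_index = {}, {}, defaultdict(set)


def item_consumer(event):
    item_store[event["itemId"]] = event["name"]


def inventory_consumer(event):
    stock[event["itemId"]] = event["quantity"]


def search_consumer(event):
    for word in event["name"].lower().split():
        search_index[word].add(event["itemId"])


inbound = Topic()
inbound.subscribe("item", item_consumer)
inbound.subscribe("inventory", inventory_consumer)
inbound.subscribe("search", search_consumer)
failed = inbound.publish({"itemId": "item123", "name": "Wireless Mouse", "quantity": 100})
```

The publisher knows nothing about what each consumer does with the event, which is the loose coupling the list above describes.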

Item Service

Responsible for:

Storing product metadata
Managing attributes & updates

Uses a flexible DB (e.g., MongoDB) due to variable schemas.

Blob Storage

Images are stored separately for:

Scalability
CDN delivery
Reduced DB load

Inventory Service

Stores:

Item → Quantity mapping
Stock updates

Requires strong consistency.

Search Consumer

Updates search indices asynchronously to avoid blocking onboarding.

Deep Dive - Item Onboarding Flow

This deep dive shows how item-related events power a near-real-time recommendation engine through a streaming pipeline.

Spark Streaming (Real-Time Processing)

Spark Streaming continuously consumes events from the Inbound Topic.

Responsibilities:

Parse item events
Enrich / transform data
Filter relevant signals
Generate analytical features

This layer converts raw system events into structured data for downstream analytics.

Hadoop Cluster (Durable Data Lake)

Processed streams are stored in Hadoop.

Why Hadoop?

Cheap large-scale storage
Historical event retention
Batch analytics compatibility
Replay capability

Acts as the long-term behavioral and item-event repository.

Spark Cluster (Batch / ML Processing)

The Spark Cluster performs deeper computation:

Feature engineering
Model training
Aggregations & correlations
User-item affinity calculations

This stage transforms stored data into recommendation-ready signals.

Recommendation Service

Consumes outputs from Spark jobs.

Responsibilities:

Serve recommendations
Personalize results
Rank items
Respond with low latency

Optimized for fast read-heavy workloads.

Why Streaming Is Critical for Recommendations

Recommendations benefit from near-real-time updates:

New items become discoverable quickly
Price / availability changes are reflected in near real time
User behavior captured continuously

Without streaming → stale recommendations.

Key Design Insight

System events → Stream processing → Data lake → ML computation → Live recommendations

Order Checkout Flow

The order flow coordinates multiple services to safely convert a user’s cart into a confirmed purchase.

Step 1 – Checkout Initiation

The process starts when the user clicks Place Order on the Checkout Page.

Request path:

Client → API Gateway → Checkout Service

The Checkout Service acts as the orchestrator of the transaction.

Step 2 – Inventory Validation

Checkout Service → Inventory Service

Responsibilities:

Verify stock availability
Reserve inventory (temporary lock)
Prevent overselling

If inventory fails → order creation stops.

Step 3 – Order Creation

If validation succeeds:

Checkout Service → Order DB

Stored fields typically include:

id
user_id
amount
currency
status
payment_status
payment_txn_id

The initial status is 'CREATED'.

Step 4 – Payment Processing

Checkout Service → Payment Service

Responsibilities:

Initiate payment with gateway
Handle retries / failures
Return payment status

Payment is treated as an external dependency and may fail independently.

Step 5 – Update Order Status

After the payment succeeds or fails, the order entry's status is updated to either 'CONFIRMED' or 'PAYMENT_FAILED'.
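
Steps 2 to 5 can be sketched as a single orchestration function. This is a simplified illustration, not the actual Checkout Service: `checkout` and `OrderStatus` are names introduced here, the `reserve`, `charge`, and `release` callables stand in for service clients, and a dict stands in for the Order DB.

```python
from enum import Enum


class OrderStatus(Enum):
    CREATED = "CREATED"
    CONFIRMED = "CONFIRMED"
    PAYMENT_FAILED = "PAYMENT_FAILED"


def checkout(order_id, reserve, charge, release, orders):
    """Run steps 2 to 5 and return the final status, or None if stock was unavailable.

    `reserve`, `charge`, and `release` are callables standing in for the
    Inventory and Payment service clients; `orders` is a dict standing in
    for the Order DB.
    """
    if not reserve(order_id):                 # Step 2: inventory validation
        return None                           # no order row is created
    orders[order_id] = OrderStatus.CREATED    # Step 3: order creation
    try:
        paid = charge(order_id)               # Step 4: payment processing
    except Exception:
        paid = False                          # simplification: errors count as failure
    if paid:                                  # Step 5: update order status
        orders[order_id] = OrderStatus.CONFIRMED
    else:
        orders[order_id] = OrderStatus.PAYMENT_FAILED
        release(order_id)                     # free the reserved stock
    return orders[order_id]


orders = {}
released = []
ok = checkout("o1", lambda o: True, lambda o: True, released.append, orders)
failed = checkout("o2", lambda o: True, lambda o: False, released.append, orders)
no_stock = checkout("o3", lambda o: False, lambda o: True, released.append, orders)
```

Treating every payment error as a failure is a shortcut. A timeout is ambiguous because the charge may have gone through, so a production flow would query the gateway for the final state before releasing stock.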

Order Expiry:

Order expiry is a critical safeguard in payment-aware systems because orders may be created while payment is still pending. Users frequently abandon checkout flows due to failures, distractions, or connectivity issues, yet inventory may already be reserved. Without an expiry mechanism, these unpaid orders would indefinitely lock stock, distort inventory accuracy, and degrade system state. By enforcing a time limit, the system can automatically invalidate stale orders and release reserved resources, ensuring inventory remains available to legitimate buyers and the platform stays operationally consistent.

A practical way to implement order expiry is by leveraging cache TTL (Time-To-Live) instead of relying solely on database scans or cron jobs. When an order is created, the system writes a lightweight entry to a distributed cache (such as Redis) with a TTL equal to the payment window. For example, an order created at checkout can be stored with a 5-minute TTL.

If payment succeeds, the cache entry is explicitly cleared.

If the TTL expires, the cache evicts the key, and that eviction can trigger expiry handling logic, such as marking the order as EXPIRED and releasing inventory.

This approach avoids expensive polling and scales naturally under high traffic, because order timeout is inherently time-based and maps directly onto TTL semantics. One caveat: cache expiry notifications (for example, Redis keyspace notifications) are delivered best-effort and fire when the key is actually removed, not at the exact deadline, so a low-frequency database sweep is still worth keeping as a safety net.
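
The TTL lifecycle (track on creation, clear on payment, expire on timeout) can be modelled in plain Python. Note the substitution: this sketch does not call Redis and uses an explicit `sweep` instead of eviction callbacks; `PendingOrders` and its methods are hypothetical, and the clock is injectable so the example does not need to wait five minutes.

```python
import time


class PendingOrders:
    """In-memory stand-in for cache entries with a TTL (assumption: no real Redis here)."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._deadlines = {}  # order_id -> expiry timestamp

    def track(self, order_id: str) -> None:
        """Called when the order is created and payment is pending."""
        self._deadlines[order_id] = self._clock() + self._ttl

    def clear(self, order_id: str) -> None:
        """Called when payment succeeds, so the order never expires."""
        self._deadlines.pop(order_id, None)

    def sweep(self, on_expired) -> list:
        """Expire every order whose deadline has passed and invoke the handler."""
        now = self._clock()
        expired = [oid for oid, deadline in self._deadlines.items() if deadline <= now]
        for oid in expired:
            del self._deadlines[oid]
            on_expired(oid)  # e.g. mark EXPIRED and release inventory
        return expired


fake_now = [0.0]                       # controllable clock for the example
pending = PendingOrders(ttl_seconds=300, clock=lambda: fake_now[0])
pending.track("order-1")
pending.track("order-2")
pending.clear("order-1")               # payment for order-1 succeeded
fake_now[0] = 301.0                    # five minutes pass
released = []
expired = pending.sweep(released.append)
```

Whatever mechanism detects the timeout, the expiry handler should be idempotent, since the same order may be reported more than once.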

The Problem – Distributed Transactions

Order processing in a microservices architecture is inherently a distributed transaction.

Challenges:

Multiple services participate (Checkout, Inventory, Payment, Logistics)
Each service owns its own database
No shared ACID transaction boundary
Network calls may fail or time out
Partial failures can corrupt system state

Example failure scenario:

Inventory reserved ✅
Payment failed ❌

Without proper coordination:

Stock remains locked
Order state becomes inconsistent
System correctness breaks

Traditional database transactions cannot solve this across services.

The Solution – Saga Pattern

Saga models the workflow as a sequence of event-driven steps with compensating actions.

Example Saga Flow

User places order
CheckoutService creates order → emits OrderCreated
InventoryService listens → reserves stock → emits InventoryReserved
PaymentService listens → charges payment → emits PaymentCompleted
LogisticsService listens → prepares shipment → emits ShipmentReady
CheckoutService finalizes → marks OrderCompleted

❌ Failure Handling (Compensation)

If any step fails:

InventoryService → Un-reserve stock
PaymentService → Refund payment
CheckoutService → Cancel order

Each service reverses its own side effects.
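
The compensation rule can be sketched with a small saga runner. This is an in-process, orchestration-style illustration rather than the event-driven flow listed above: `run_saga` and the step names are hypothetical, and log entries stand in for real service calls.

```python
def run_saga(steps):
    """Execute (name, action, compensation) steps in order.

    If an action raises, run the compensations of the steps that already
    completed, in reverse order, and report which step failed.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            for _, undo in reversed(completed):
                undo()
            return f"failed at {name}"
        completed.append((name, compensate))
    return "completed"


log = []


def decline_payment():
    raise RuntimeError("card declined")


outcome = run_saga([
    ("create_order", lambda: log.append("order created"),
     lambda: log.append("order cancelled")),
    ("reserve_stock", lambda: log.append("stock reserved"),
     lambda: log.append("stock un-reserved")),
    ("charge_payment", decline_payment,
     lambda: log.append("payment refunded")),
])
```

Only steps that actually completed are compensated: the payment was never captured, so no refund is issued, while the stock reservation and the order are undone in reverse order.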

Why Saga Works

No cross-service locking required
Failures are recoverable
Services remain loosely coupled
Workflow becomes resilient
Consistency achieved eventually

Order View Flow - Read Path

The order view flow is a read-only, latency-sensitive path that lets users access their order history and details. Unlike checkout, this flow is optimized for speed and availability.

Request Flow

When a user opens the Order View Page:

Client → API Gateway → Order Service

The API Gateway routes authenticated requests to the Order Service.

Order Service Responsibilities

The Order Service handles:

Fetching orders for a user
Retrieving order details by order_id
Filtering by status if needed

All queries are performed against the Order DB, which acts as the source of truth.

Performance Considerations

Since order views are frequent:

Queries must be indexed by user_id and order_id
Responses should be low latency
Read load must not impact write-heavy flows

Caching Strategy

Caching is commonly applied to reduce DB pressure:

orders:user:{userId}
order:{orderId}

The same cache used for order TTL tracking can also serve these keys.
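
A cache-aside read path using the two key patterns might look like the sketch below. It is an illustration under assumptions: `OrderReader` is a hypothetical class, dicts stand in for the cache and the Order DB, and invalidation on change is one possible policy among several.

```python
class OrderReader:
    """Cache-aside reads; dicts stand in for the cache and the Order DB."""

    def __init__(self, order_db: dict):
        self._db = order_db
        self._cache = {}
        self.db_reads = 0

    def get_order(self, order_id: str):
        key = f"order:{order_id}"
        if key in self._cache:
            return self._cache[key]          # cache hit
        self.db_reads += 1
        order = self._db.get(order_id)       # cache miss: read the source of truth
        if order is not None:
            self._cache[key] = order
        return order

    def list_orders(self, user_id: str):
        key = f"orders:user:{user_id}"
        if key not in self._cache:
            self.db_reads += 1
            self._cache[key] = sorted(
                oid for oid, o in self._db.items() if o["user_id"] == user_id
            )
        return self._cache[key]

    def on_order_changed(self, order_id: str, user_id: str) -> None:
        """Invalidate both keys when an order is created or its status changes."""
        self._cache.pop(f"order:{order_id}", None)
        self._cache.pop(f"orders:user:{user_id}", None)


db = {"o1": {"user_id": "u1", "status": "CREATED"}}
reader = OrderReader(db)
reader.get_order("o1")
reader.get_order("o1")                       # served from cache
reads_before_update = reader.db_reads
db["o1"] = {"user_id": "u1", "status": "CONFIRMED"}
reader.on_order_changed("o1", "u1")
fresh = reader.get_order("o1")
```

Invalidating on every status change keeps order views from showing a stale status after payment, at the cost of one extra database read per change.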

Order Archival

An order archival service can move old orders out of the Order database into a separate archival DB, keeping the primary tables small.

If a user wants to view such old orders, they are fetched from the archival DB.
