CortexCookie

Amazon-Like E-Commerce System Design

This system models a simplified Amazon-style marketplace supporting sellers, buyers, product discovery, and order placement. The design emphasizes scalability, availability, and correctness for critical operations.

Functional Requirements

The platform must support the following core user capabilities:

Seller Operations

  • Sellers can add new items to the catalog
  • Sellers can define attributes such as name, description, price, and quantity
  • Item onboarding should propagate to downstream systems (search, recommendations)

Buyer Operations

Buyers interact with the platform through several workflows:

  • Search Items – Discover products using keywords
  • View Item Details – Retrieve complete product information
  • Add Item to Cart – Select desired items for purchase
  • Checkout / Place Order – Complete purchase with payment processing
  • View Orders – Inspect order history and status

Checkout involves coordination with:

  • Inventory validation & reservation
  • Payment processing
  • Order creation

Low-Priority / Secondary Features

While important, these features are not latency-critical:

  • Add items to wishlist
  • Recommend items on homepage
  • Notify users about order updates

These can tolerate relaxed consistency and asynchronous processing.

Non-Functional Requirements

The system must satisfy several critical quality attributes:

  • Low Latency → Especially for search and browsing
  • High Availability → Search and read paths must remain responsive
  • Strong Consistency → Required for order placement & inventory updates
  • Eventual Consistency → Acceptable for item onboarding, search indexing, etc.

Core Entities

The system revolves around several fundamental entities:

  • Item → Product metadata & pricing
  • Seller → Product owners / merchants
  • Buyer → End users / customers
  • Cart → Temporary selection of items
  • Order → Finalized purchase record

API Design

The platform exposes APIs for catalog management, discovery, cart updates, and checkout.

Add Item (Seller)

Creates a new item in the catalog.

Endpoint:

POST /items

Request Body:

{
  "name": "Wireless Mouse",
  "description": "Ergonomic Bluetooth Mouse",
  "price": 999,
  "quantity": 100
}

What it does?

  • Validates seller permissions

  • Persists item in catalog database

  • Publishes item-created event for downstream systems

Search Items

Retrieves items matching a search term with pagination support.

Endpoint:

GET /items?term={searchTerm}&cursor={cursorToken}&size={pageSize}

Query Parameters

  • term → User search keyword

  • cursor → Pagination token

  • size → Number of results per page

What it does?

  • Served from search infrastructure (e.g., Elasticsearch)

  • Optimized for low latency and high availability

  • Eventual consistency is acceptable
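
The cursor parameter can be illustrated with a minimal sketch. It assumes the cursor is an opaque token wrapping the sort key of the last returned item; the helper names (encode_cursor, decode_cursor, search_page) and the in-memory catalog are hypothetical, and a real search service would push both the term and the cursor down to the search index instead of filtering a list.

```python
import base64
import json


def encode_cursor(last_sort_key):
    """Pack the sort key of the last returned item into an opaque token."""
    raw = json.dumps(last_sort_key).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token):
    """Reverse encode_cursor; a missing cursor means 'first page'."""
    if not token:
        return None
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def search_page(items, term, cursor=None, size=2):
    """Return one page of matches plus the cursor for the next page.

    `items` must already be sorted by "id" (the assumed sort key).
    """
    after_id = decode_cursor(cursor)
    matches = [
        item for item in items
        if term.lower() in item["name"].lower()
        and (after_id is None or item["id"] > after_id)
    ]
    page = matches[:size]
    has_more = len(matches) > size
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return page, next_cursor


CATALOG = [
    {"id": "item001", "name": "Wireless Mouse"},
    {"id": "item002", "name": "Wired Mouse"},
    {"id": "item003", "name": "Mouse Pad"},
    {"id": "item004", "name": "USB Keyboard"},
]

first_page, cursor = search_page(CATALOG, "mouse", size=2)
second_page, end_cursor = search_page(CATALOG, "mouse", cursor=cursor, size=2)
```

Because the token encodes a position in the sort order and not an offset, results stay stable when new items are indexed between two page requests.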

Add Item to Cart

Adds or updates items in the buyer’s cart.

Endpoint:

PATCH /cart

Request Body:

{
  "itemId": "item123",
  "quantity": 2
}

What it does?

  • Updates cart storage

  • May perform soft inventory checks

  • Designed for high-frequency updates

Checkout / Place Order

Converts a cart into a finalized order.

Endpoint:

POST /checkout

Request Body:

{
  "cartId": "cart456",
  "amount": 1998,
  "paymentInfo": {
    "paymentType": "CARD",
    "cardNo": "**** **** **** 1234"
  }
}

Critical transactional workflow:

  • Validate inventory availability

  • Atomically reserve / deduct inventory

  • Process payment

  • Create order record

  • Trigger downstream workflows

This workflow requires strong consistency and careful failure handling.

Scale Assumptions

An Amazon-scale system must support extreme usage patterns. Assume the following numbers.

  • Daily Active Users (DAU) → 50M – 200M+ globally
  • Peak Requests per Second (RPS) → 100K – 500K+
  • Product Catalog Size → 100M+ SKUs
  • Orders per Day → 5M – 15M+ (event spikes)
  • Concurrent Sessions → 1M+ active users
  • Data Volume → Petabytes of catalog & logs
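
To put the order volume in perspective, the daily figures can be converted into average per-second rates. This is back-of-the-envelope arithmetic only; the PEAK_FACTOR of 10 is an assumed multiplier for event spikes, not a number from this lesson.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(events_per_day):
    """Average rate if events were spread evenly over the whole day."""
    return events_per_day / SECONDS_PER_DAY


low_orders_per_second = average_per_second(5_000_000)    # about 58
high_orders_per_second = average_per_second(15_000_000)  # about 174

# Assumption: spikes concentrate traffic roughly 10x above the daily average.
PEAK_FACTOR = 10
peak_orders_per_second = high_orders_per_second * PEAK_FACTOR
```

Even with the assumed spike factor, order writes stay in the low thousands per second, a small share of the 100K – 500K+ peak RPS. That is consistent with treating the read paths (search, browsing) and the write path (checkout) as separately scaled workloads.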

High Level Design

Architectural Implications of Scale

Such scale introduces several engineering challenges:

  • Search systems must deliver low-latency results

  • Inventory management must prevent overselling

  • Traffic spikes require elastic scaling

  • Databases must handle high concurrency

  • Failures must be gracefully handled

This architecture illustrates a classic microservices-based e-commerce platform similar to Amazon.

The core idea behind this design is:

Separation of concerns + Independent scalability + Fault isolation

Each service owns a specific responsibility and database.

API Gateway (System Entry Point)

API Gateway is the front door of the system. It has many responsibilities:

  • Routes requests to appropriate services
  • Handles authentication / authorization
  • Rate limiting & throttling
  • Centralized logging & monitoring
  • Request aggregation
  • Prevents clients from calling internal services directly
  • Centralizes security rules
  • Simplifies frontend interactions

User Service

User Service handles User accounts & identity.

  • Registration / login
  • Profile management
  • Address storage
  • Authentication / tokens
  • Account settings

Database: User DB

User data can be stored in a separate database because it has different scaling patterns and security isolation needs.

Search Service

Search Service owns Product discovery & search.

  • Keyword search
  • Filters & sorting
  • Autocomplete
  • Relevance ranking
  • Synonym / typo handling

Database: ElasticSearch DB

We can use ElasticSearch here because:

  • Optimized for full-text search
  • Low latency queries
  • Built-in ranking & scoring

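Why Elasticsearch for search?

  • Inverted index data structure makes text search extremely fast.
  • A typical relational database would fall back to LIKE '%term%', which cannot use a standard index because of the leading wildcard and therefore scans every row. Elasticsearch typically answers the same query in milliseconds.
  • Supports fuzzy matching, partial words, typo tolerance, stemming, synonyms, etc.
  • New data becomes searchable in near real time, roughly 1 second after being indexed with the default refresh interval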
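
The inverted index idea can be shown with a toy sketch. It is not how Elasticsearch is implemented internally (there is no scoring, stemming, or fuzzy matching here), and the function names are hypothetical; it only shows why looking up a term is a dictionary access instead of a scan over every product.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into punctuation-free words."""
    words = (word.strip(".,!?").lower() for word in text.split())
    return [word for word in words if word]


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """AND-search: ids of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


DOCS = {
    "item1": "Wireless Mouse",
    "item2": "Ergonomic Bluetooth Mouse",
    "item3": "Wireless Keyboard",
}
INDEX = build_inverted_index(DOCS)
```

Here search(INDEX, "wireless mouse") intersects two small posting sets and never touches item2's or item3's full text.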

Item (Product) Service

Item Service owns Product catalog.

  • Product details
  • Descriptions / attributes
  • Pricing metadata
  • Variants (size/color/etc.)
  • Availability flags

Database: Item Mongo DB

Why MongoDB?

MongoDB is a good fit for:

  • Highly variable schema
  • Nested attributes
  • Heavy read patterns

Example Product Documents:

{
  "_id": ObjectId("..."),
  "sku": "TSHIRT-001",
  "name": "Basic Cotton T-Shirt",
  "brand": "ComfortWear",
  "category": "apparel",
  "type": "variant",
  "description": "Soft 100% cotton T-shirt available in multiple colors and sizes.",
  "tags": ["t-shirt", "casual", "cotton", "unisex"],
  "attributes": {
    "material": "cotton",
    "gender": "unisex",
    "fit": "regular"
  },
  "variants": [
    {
      "variant_id": "TSHIRT-001-RD-M",
      "color": "red",
      "size": "M",
      "stock": 20,
      "price": 15.99,
      "images": [
        "https://cdn.site.com/products/tshirt-red-front.jpg",
        "https://cdn.site.com/products/tshirt-red-back.jpg"
      ]
    },
    ...
  ],
  "created_at": "2025-06-15T10:00:00Z",
  "updated_at": "2025-06-15T10:00:00Z"
}


{
  "_id": ObjectId("..."),
  "sku": "TV-4K-001",
  "name": "42-inch 4K Smart TV",
  "brand": "ViewLux",
  "category": "electronics",
  "type": "simple",
  "description": "Smart 4K UHD TV with HDR, built-in WiFi, and voice assistant support.",
  "tags": ["tv", "4k", "smart", "electronics"],
  "specifications": {
    "screen_size": "42 inches",
    "resolution": "3840 x 2160",
    "panel_type": "LED",
    "smart_tv": true,
    "hdmi_ports": 3,
    "usb_ports": 2,
    "os": "Android TV"
  },
  "price": 349.99,
  "stock": 8,
  "images": [
    "https://cdn.site.com/products/tv-front.jpg",
    "https://cdn.site.com/products/tv-back.jpg"
  ],
  "status": "active",
  "created_at": "2025-06-15T10:00:00Z",
  "updated_at": "2025-06-15T10:00:00Z"
}

Cart Service

Cart Service owns shopping carts. Its responsibilities:

  • Add / remove items
  • Quantity updates
  • Price snapshot at add time
  • Cart expiration
  • Cross-device cart merging

Database: Cart DB

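The cart stores the price at the time the item was added, not the live price. This raises a design question that should be settled explicitly: whether checkout charges the cart price or the current price. Keeping the snapshot lets checkout detect a price change and handle it deliberately, which prevents pricing inconsistencies during checkout.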
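
A minimal sketch of how the snapshot can be used at checkout follows. The field name priceAtAdd, the function find_price_changes, and the sample prices are assumptions for illustration; prices are integers in minor units, matching the style of the API examples in this lesson.

```python
def find_price_changes(cart_lines, live_prices):
    """Return (itemId, priceAtAdd, livePrice) for every line whose price moved."""
    changes = []
    for line in cart_lines:
        live = live_prices[line["itemId"]]
        if live != line["priceAtAdd"]:
            changes.append((line["itemId"], line["priceAtAdd"], live))
    return changes


cart = [
    {"itemId": "item123", "quantity": 2, "priceAtAdd": 999},
    {"itemId": "item456", "quantity": 1, "priceAtAdd": 2500},
]
live_prices = {"item123": 999, "item456": 2700}

changes = find_price_changes(cart, live_prices)

# Totals under the two possible policies (cart price vs current price).
snapshot_total = sum(line["quantity"] * line["priceAtAdd"] for line in cart)
live_total = sum(line["quantity"] * live_prices[line["itemId"]] for line in cart)
```

Whichever policy the platform picks, a non-empty changes list is the signal to either honor the snapshot or ask the buyer to confirm the new total before payment.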

Checkout Service (System Orchestrator)

The most critical component in the purchase flow is Checkout Service which handles the whole order flow.

  • Validates cart
  • Calculates totals / taxes / shipping
  • Calls inventory service to check if inventory available
  • Calls payment service to make actual payment
  • Creates orders entry
  • Handles retries / failures

Checkout coordinates multiple services and acts like a distributed transaction manager.

Inventory Service

Inventory Service owns Stock management.

  • Check availability of items
  • Reserve inventory
  • Deduct stock
  • Prevent overselling
  • Release reservations

Database: Inventory DB

Inventory is:

  • Highly contested
  • Correctness-critical
  • Consistency-sensitive

So a separate, dedicated database for inventory is justified at this scale.
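
One common way to prevent overselling is a conditional update that deducts stock only when enough is available, so the check and the deduction happen in a single atomic statement. The sketch below uses an in-memory SQLite database purely as a stand-in; the table name, column names, and helper functions are hypothetical, and the lesson does not prescribe a specific inventory store.

```python
import sqlite3


def open_inventory(stock_by_item):
    """Create an in-memory inventory table seeded with the given stock."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        "item_id TEXT PRIMARY KEY, available INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO inventory VALUES (?, ?)", stock_by_item.items())
    conn.commit()
    return conn


def reserve(conn, item_id, quantity):
    """Deduct stock only if enough is available; return True on success."""
    cur = conn.execute(
        "UPDATE inventory SET available = available - ? "
        "WHERE item_id = ? AND available >= ?",
        (quantity, item_id, quantity),
    )
    conn.commit()
    # rowcount is 0 when the WHERE clause rejected the update (not enough stock).
    return cur.rowcount == 1


def release(conn, item_id, quantity):
    """Give reserved stock back, e.g. after a failed payment."""
    conn.execute(
        "UPDATE inventory SET available = available + ? WHERE item_id = ?",
        (quantity, item_id),
    )
    conn.commit()


def available(conn, item_id):
    row = conn.execute(
        "SELECT available FROM inventory WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0]


conn = open_inventory({"item123": 3})
first = reserve(conn, "item123", 2)   # succeeds, 1 left
second = reserve(conn, "item123", 2)  # rejected, stock stays at 1
```

The design choice is to avoid a separate "read stock, then write stock" sequence, which would let two concurrent buyers both pass the check.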

Payment Service

Payment Service owns Payment processing & gateway integrations.

  • Gateway communication (Razorpay, etc.)
  • Payment retries / timeouts
  • Fraud detection
  • Refund handling
  • Payment state tracking

Payments involve:

  • External dependencies
  • High failure probability
  • Strict security requirements

Hence, a separate service is necessary.

Order View Service

The Order View Service owns order listing and order detail views. It reads from the Order database; it is a separate service so that read traffic stays isolated from write traffic.

Blob Storage (Media Layer)

Stores:

  • Product images
  • Media assets

Why Not Store in DB?

  • Large binary objects
  • CDN compatibility
  • Better scalability & performance

Example Request Flows

User Searches Product

Client → API Gateway → Search Service → ElasticSearch DB

User Views Product

Client → API Gateway → Item Service → Item DB

User Adds to Cart

Client → API Gateway → Cart Service → Cart DB

User Checks Out

Checkout Service →

  • Inventory Service
  • Payment Service
  • Order Service

This architecture reflects real-world production practices:

  • Microservices architecture
  • Database per service (no shared DB)
  • Independent scaling
  • Fault isolation
  • Storage optimized per workload
  • Orchestration via Checkout

Key Design Principles

  • Separation of Concerns → Search vs transactional workflows

  • Event-Driven Propagation → Catalog → Search / Recommendations

  • Strong Consistency Where Required → Checkout & Inventory

  • Eventual Consistency Where Acceptable → Search & Discovery

  • Independent Scaling → Read-heavy vs write-heavy paths

Item Onboarding Flow

Item onboarding is the process where sellers add new products to the platform.

Seller Page

The Seller Page is the primary entry point for all item creations and updates.

Typical actions include:

  • Creating new listings
  • Updating product details
  • Uploading images
  • Setting pricing & inventory

Seller-submitted data is considered untrusted and must pass through verification and validation layers before becoming part of the system.

Inbound Layer & Topic

After submission, item data flows through multiple internal services responsible for:

  • Validation & verification
  • Business rule enforcement
  • Data normalization
  • Compliance / fraud checks

Once validated, the system publishes the item event to an Inbound Topic.

This event-driven step enables:

  • Loose coupling between services
  • Independent scaling of consumers
  • Failure isolation
  • Asynchronous processing

Downstream services (Item, Inventory, Warehouse, Search, etc.) consume the topic and process item information independently.
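
The fan-out from the Inbound Topic can be sketched with a tiny in-memory topic. This is a synchronous stand-in for illustration only: a real message broker delivers asynchronously, retains events, and retries failed consumers. The class InMemoryTopic, the consumer functions, and the event fields are all hypothetical.

```python
class InMemoryTopic:
    """Synchronous stand-in for a message broker topic (illustration only)."""

    def __init__(self):
        self._subscribers = []  # list of (consumer_name, handler)

    def subscribe(self, consumer_name, handler):
        self._subscribers.append((consumer_name, handler))

    def publish(self, event):
        """Deliver the event to every consumer and report which ones failed."""
        failed = []
        for consumer_name, handler in self._subscribers:
            try:
                handler(event)
            except Exception:
                # Failure isolation: one broken consumer must not block the rest.
                failed.append(consumer_name)
        return failed


item_store = {}
search_index = {}


def item_consumer(event):
    item_store[event["itemId"]] = event


def search_consumer(event):
    for word in event["name"].lower().split():
        search_index.setdefault(word, set()).add(event["itemId"])


def warehouse_consumer(event):
    # Simulated outage to show that other consumers still make progress.
    raise RuntimeError("warehouse system unavailable")


topic = InMemoryTopic()
topic.subscribe("item", item_consumer)
topic.subscribe("warehouse", warehouse_consumer)
topic.subscribe("search", search_consumer)

failed_consumers = topic.publish(
    {"itemId": "item123", "name": "Wireless Mouse", "price": 999}
)
```

The publisher knows nothing about what each consumer does with the event, which is the loose coupling described above; adding a new downstream system means adding a subscriber, not changing the onboarding path.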

Item Service

Responsible for:

  • Storing product metadata
  • Managing attributes & updates

Uses a flexible DB (e.g., MongoDB) due to variable schemas.

Blob Storage

Images are stored separately for:

  • Scalability
  • CDN delivery
  • Reduced DB load

Inventory Service

Stores:

  • Item → Quantity mapping
  • Stock updates

Requires strong consistency.

Search Consumer

Updates search indices asynchronously to avoid blocking onboarding.


Deep Dive - Item Onboarding Flow

This deep dive shows how item-related events power a real-time recommendation engine through a streaming pipeline.

Spark Streaming (Real-Time Processing)

Spark Streaming continuously consumes events from the Inbound Topic.

Responsibilities:

  • Parse item events
  • Enrich / transform data
  • Filter relevant signals
  • Generate analytical features

This layer converts raw system events into structured data for downstream analytics.

Hadoop Cluster (Durable Data Lake)

Processed streams are stored in Hadoop.

Why Hadoop?

  • Cheap large-scale storage
  • Historical event retention
  • Batch analytics compatibility
  • Replay capability

Acts as the long-term behavioral and item-event repository.

Spark Cluster (Batch / ML Processing)

The Spark Cluster performs deeper computation:

  • Feature engineering
  • Model training
  • Aggregations & correlations
  • User-item affinity calculations

This stage transforms stored data into recommendation-ready signals.

Recommendation Service

Consumes outputs from Spark jobs.

Responsibilities:

  • Serve recommendations
  • Personalize results
  • Rank items
  • Respond with low latency

Optimized for fast read-heavy workloads.

Why Streaming Is Critical for Recommendations

Recommendations benefit from near-real-time updates:

  • New items become discoverable quickly
  • Price / availability changes are reflected with minimal delay
  • User behavior captured continuously

Without streaming, recommendations go stale.

Key Design Insight

System events → Stream processing → Data lake → ML computation → Live recommendations


Order Checkout Flow

The order flow coordinates multiple services to safely convert a user’s cart into a confirmed purchase.

Step 1 – Checkout Initiation

The process starts when the user clicks Place Order on the Checkout Page.

Request path:

Client → API Gateway → Checkout Service

The Checkout Service acts as the orchestrator of the transaction.

Step 2 – Inventory Validation

Checkout Service → Inventory Service

Responsibilities:

  • Verify stock availability
  • Reserve inventory (temporary lock)
  • Prevent overselling

If the inventory check fails, order creation stops.

Step 3 – Order Creation

If validation succeeds:

Checkout Service → Order DB

Stored fields typically include:

  • id
  • user_id
  • amount
  • currency
  • status
  • payment_status
  • payment_txn_id

Initial status can be just 'CREATED'

Step 4 – Payment Processing

Checkout Service → Payment Service

Responsibilities:

  • Initiate payment with gateway
  • Handle retries / failures
  • Return payment status

Payment is treated as an external dependency and may fail independently.

Step 5 – Update Order Status

After payment succeeds or fails, the order's status in the Order table is updated to 'CONFIRMED' or 'PAYMENT_FAILED' respectively.
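
The status updates can be guarded with a small transition table so that an order never moves backwards, for example from CONFIRMED to PAYMENT_FAILED after a late gateway callback. This is a sketch under assumptions: the statuses are the ones named in this lesson (EXPIRED belongs to the order expiry mechanism), while treating PAYMENT_FAILED as terminal is an assumption, since a platform that allows payment retries would model it differently.

```python
# Assumed transition table; only CREATED can move to another status.
ALLOWED_TRANSITIONS = {
    "CREATED": {"CONFIRMED", "PAYMENT_FAILED", "EXPIRED"},
    "CONFIRMED": set(),
    "PAYMENT_FAILED": set(),
    "EXPIRED": set(),
}


def transition(order, new_status):
    """Return a copy of the order with the new status, or raise if not allowed."""
    current = order["status"]
    if new_status not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move order from {current} to {new_status}")
    return {**order, "status": new_status}


order = {"id": "order789", "user_id": "user1", "amount": 1998, "status": "CREATED"}
confirmed = transition(order, "CONFIRMED")

try:
    transition(confirmed, "PAYMENT_FAILED")
    late_failure_rejected = False
except ValueError:
    late_failure_rejected = True
```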

Order Expiry:

Order expiry is a critical safeguard in payment-aware systems because orders may be created while payment is still pending. Users frequently abandon checkout flows due to failures, distractions, or connectivity issues, yet inventory may already be reserved. Without an expiry mechanism, these unpaid orders would indefinitely lock stock, distort inventory accuracy, and degrade system state. By enforcing a time limit, the system can automatically invalidate stale orders and release reserved resources, ensuring inventory remains available to legitimate buyers and the platform stays operationally consistent.

A practical way to implement order expiry is by leveraging cache TTL (Time-To-Live) instead of relying solely on database scans or cron jobs. When an order is created, the system writes a lightweight entry to a distributed cache (such as Redis) with a TTL equal to the payment window. For example, an order created at checkout can be stored with a 5-minute TTL.

If payment succeeds, the cache entry is explicitly cleared.

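If the TTL expires, the cache removes the key. Removal by itself runs no application code, so the system must subscribe to the cache's expiry notifications (in Redis, keyspace notifications, which have to be enabled) to trigger expiry handling logic, such as marking the order as EXPIRED and releasing inventory.

This approach avoids expensive polling and scales naturally under high traffic with minimal infrastructure overhead. Note that Redis delivers expiry notifications on a best-effort basis and may emit them somewhat after the TTL elapses, so a low-frequency reconciliation job over the Order DB is a sensible backstop. Cache-driven expiry remains a good fit because order timeout is inherently time-based.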
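
The expiry logic can be sketched without a real cache. The class below is an in-memory stand-in for the TTL entries, with the current time passed in explicitly so the behavior is easy to follow; the name PaymentWindowTracker and its methods are hypothetical, and in production the deadlines would live in the distributed cache instead of process memory.

```python
import heapq


class PaymentWindowTracker:
    """In-memory stand-in for TTL entries that guard unpaid orders."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._deadlines = {}  # order_id -> deadline (seconds)
        self._heap = []       # (deadline, order_id), earliest first

    def start(self, order_id, now):
        """Called at order creation: open the payment window."""
        deadline = now + self.ttl
        self._deadlines[order_id] = deadline
        heapq.heappush(self._heap, (deadline, order_id))

    def payment_succeeded(self, order_id):
        """Clear the entry so the order can no longer expire."""
        self._deadlines.pop(order_id, None)

    def collect_expired(self, now):
        """Return ids of orders whose payment window has elapsed."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            deadline, order_id = heapq.heappop(self._heap)
            # Skip entries that were cleared (paid) after being queued.
            if self._deadlines.get(order_id) == deadline:
                del self._deadlines[order_id]
                expired.append(order_id)
        return expired


tracker = PaymentWindowTracker(ttl_seconds=300)  # 5-minute payment window
tracker.start("orderA", now=0)
tracker.start("orderB", now=10)
tracker.payment_succeeded("orderA")

still_pending = tracker.collect_expired(now=299)
expired_orders = tracker.collect_expired(now=310)
```

Each id returned by collect_expired is where the real system would mark the order as EXPIRED and release its reserved inventory.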


The Problem – Distributed Transactions

Order processing in a microservices architecture is inherently a distributed transaction.

Challenges:

  • Multiple services participate (Checkout, Inventory, Payment, Logistics)
  • Each service owns its own database
  • No shared ACID transaction boundary
  • Network calls may fail or timeout
  • Partial failures can corrupt system state

Example failure scenario:

  • Inventory reserved ✅
  • Payment failed ❌

Without proper coordination:

  • Stock remains locked
  • Order state becomes inconsistent
  • System correctness breaks

Traditional database transactions cannot solve this across services.

The Solution – Saga Pattern

Saga models the workflow as a sequence of event-driven steps with compensating actions.

Example Saga Flow

  • User places order
  • CheckoutService creates order → emits OrderCreated
  • InventoryService listens → reserves stock → emits InventoryReserved
  • PaymentService listens → charges payment → emits PaymentCompleted
  • LogisticsService listens → prepares shipment → emits ShipmentReady
  • CheckoutService finalizes → marks OrderCompleted
❌ Failure Handling (Compensation)

If any step fails:

  • InventoryService → Un-reserve stock
  • PaymentService → Refund payment
  • CheckoutService → Cancel order

Each service reverses its own side effects.

Why Saga Works
  • No cross-service locking required
  • Failures are recoverable
  • Services remain loosely coupled
  • Workflow becomes resilient
  • Consistency achieved eventually
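
The compensation idea can be sketched as a small saga runner. Note the assumption: the flow above is event-driven (each service reacts to the previous service's event), whereas this sketch uses a single in-process orchestrator for brevity. The step functions only append to a log; in a real system each one would be a call to the owning service, and all names here are hypothetical.

```python
class StepFailed(Exception):
    """Raised by a saga step that could not complete."""


def run_saga(steps, context):
    """Run (action, compensation) pairs in order.

    If an action fails, the compensations of the steps that already
    completed are run in reverse order, and False is returned.
    """
    completed = []
    for action, compensation in steps:
        try:
            action(context)
        except StepFailed:
            for undo in reversed(completed):
                undo(context)
            return False
        completed.append(compensation)
    return True


def create_order(ctx):
    ctx["log"].append("OrderCreated")


def cancel_order(ctx):
    ctx["log"].append("OrderCancelled")


def reserve_stock(ctx):
    ctx["log"].append("InventoryReserved")


def unreserve_stock(ctx):
    ctx["log"].append("InventoryReleased")


def charge_payment(ctx):
    if ctx["card_declined"]:
        raise StepFailed("payment declined")
    ctx["log"].append("PaymentCompleted")


def refund_payment(ctx):
    ctx["log"].append("PaymentRefunded")


CHECKOUT_SAGA = [
    (create_order, cancel_order),
    (reserve_stock, unreserve_stock),
    (charge_payment, refund_payment),
]

failed_ctx = {"card_declined": True, "log": []}
failed_ok = run_saga(CHECKOUT_SAGA, failed_ctx)

happy_ctx = {"card_declined": False, "log": []}
happy_ok = run_saga(CHECKOUT_SAGA, happy_ctx)
```

In the declined-card run, only the steps that actually completed are compensated: stock is released and the order is cancelled, but no refund is issued because no charge went through.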

Order View Flow - Read Path

The order view flow is a read-only, latency-sensitive path that allows users to access their order history and details. Unlike checkout, this flow is optimized for speed and availability.

Request Flow

When a user opens the Order View Page:

Client → API Gateway → Order Service

The API Gateway routes authenticated requests to the Order Service.

Order Service Responsibilities

The Order Service handles:

  • Fetching orders for a user
  • Retrieving order details by order_id
  • Filtering by status if needed

All queries are performed against the Order DB, which acts as the source of truth.

Performance Considerations

Since order views are frequent:

  • Queries must be indexed by user_id and order_id
  • Responses should be low latency
  • Read load must not impact write-heavy flows

Caching Strategy

Caching is commonly applied to reduce DB pressure:

orders:user:{userId}
order:{orderId}


The same cache used for order TTL can also serve these entries.
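
A minimal cache-aside sketch for the order:{orderId} key follows. A plain dictionary stands in for the distributed cache, and the class OrderReadCache, its db_reads counter, and the loader function are hypothetical names used only to make the read path visible.

```python
class OrderReadCache:
    """Cache-aside reads: check the cache first, fall back to the Order DB."""

    def __init__(self, load_order_from_db):
        self._load = load_order_from_db
        self._cache = {}   # stand-in for the distributed cache
        self.db_reads = 0  # counts how often the DB was actually hit

    def get_order(self, order_id):
        key = f"order:{order_id}"
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        order = self._load(order_id)
        if order is not None:
            self._cache[key] = order
        return order

    def invalidate(self, order_id):
        """Call after a status change so readers do not see a stale order."""
        self._cache.pop(f"order:{order_id}", None)


ORDER_DB = {"order789": {"id": "order789", "status": "CREATED"}}
reader = OrderReadCache(ORDER_DB.get)

first_read = reader.get_order("order789")   # miss: goes to the DB
second_read = reader.get_order("order789")  # hit: served from the cache
reads_after_two_calls = reader.db_reads
```

Since the Order DB stays the source of truth, the write path should invalidate (or overwrite) the cached entry whenever the order status changes.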

Order Archival

An order archival service can move old orders out of the Order database and into a separate archival DB.

If a user wants to view such old orders, they are fetched from the archival DB.
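
The archival move and the fallback read can be sketched as follows. Dictionaries stand in for the two databases, and the function names, the created_at field, and the cutoff date are assumptions for illustration.

```python
from datetime import date


def archive_old_orders(order_db, archive_db, cutoff):
    """Move orders created before `cutoff` into the archive; return their ids."""
    old_ids = [oid for oid, order in order_db.items() if order["created_at"] < cutoff]
    for oid in old_ids:
        archive_db[oid] = order_db[oid]  # copy first
        del order_db[oid]                # delete only after the copy exists
    return old_ids


def get_order(order_id, order_db, archive_db):
    """Read from the primary Order DB, falling back to the archive."""
    order = order_db.get(order_id)
    if order is None:
        order = archive_db.get(order_id)
    return order


order_db = {
    "order1": {"id": "order1", "created_at": date(2021, 3, 1)},
    "order2": {"id": "order2", "created_at": date(2025, 6, 15)},
}
archive_db = {}

moved = archive_old_orders(order_db, archive_db, cutoff=date(2024, 1, 1))
old_order = get_order("order1", order_db, archive_db)
```

Copying before deleting means a crash between the two steps leaves a duplicate, which is harmless, instead of a lost order.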


Overall Architecture

[Figure: overall architecture diagram]
