December 15, 2024
3 min read

Building AI at Scale: Lessons from 1B+ Users

Key lessons learned from scaling Adobe's AI Assistant to serve over a billion users. From architecture decisions to team dynamics.

AI · Scale · Engineering · Leadership

When we set out to build Adobe's AI Assistant, we knew it had to serve our entire user base - over a billion monthly active users. Here's what I learned along the way.

The Challenge

Building AI products is hard. Building AI products that serve a billion users is exponentially harder. You're dealing with:

  • Massive scale: Every millisecond of latency costs millions in user engagement
  • Global distribution: Users across 100+ countries with varying network conditions
  • Cost constraints: LLM API costs can spiral out of control quickly
  • Quality expectations: Users expect ChatGPT-level quality, but with PDF-specific knowledge

Architecture Decisions

1. Multi-Cloud from Day One

We built on AWS, Azure, and GCP simultaneously. This wasn't premature optimization - it was essential:

class ModelGateway:
    def __init__(self, azure_client, aws_bedrock, gcp_vertex):
        # Thin wrappers around each provider's SDK, configured at startup
        self.azure_client = azure_client
        self.aws_bedrock = aws_bedrock
        self.gcp_vertex = gcp_vertex

    def route_request(self, model: str, prompt: str) -> str:
        # Intelligent routing based on cost, latency, availability
        if model.startswith("gpt"):
            return self.azure_client.complete(prompt)
        elif model.startswith("claude"):
            return self.aws_bedrock.complete(prompt)
        else:
            return self.gcp_vertex.complete(prompt)
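
Callers never think about providers; they just ask for a model. Something like the following, where the client objects are illustrative stand-ins for the real SDK wrappers:

gateway = ModelGateway(azure_client, aws_bedrock, gcp_vertex)
answer = gateway.route_request("gpt-4", "Summarize this contract in two sentences.")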

Result: $2M+ saved annually through intelligent routing and cost optimization.

2. Hybrid RAG Architecture

We didn't just use vector search. We combined several retrieval signals (see the sketch after this list):

  • Dense retrieval: For semantic similarity
  • Sparse retrieval: For exact keyword matches
  • Reranking: To surface the most relevant chunks
  • Conversational memory: For multi-turn coherence
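
Conceptually, the retrieval blend looks something like the sketch below. This is a toy version, not our production code: the scoring functions stand in for a real embedding model and a BM25 index, and rerank_fn is whatever cross-encoder you plug in.

from collections import Counter
import math

def sparse_score(query: str, doc: str) -> float:
    # Keyword overlap weighted by term frequency (a crude stand-in for BM25)
    q_terms, d_terms = Counter(query.lower().split()), Counter(doc.lower().split())
    return float(sum(q_terms[t] * d_terms[t] for t in q_terms))

def dense_score(query_vec: list[float], doc_vec: list[float]) -> float:
    # Cosine similarity between precomputed embeddings
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    norms = math.sqrt(sum(q * q for q in query_vec)) * math.sqrt(sum(d * d for d in doc_vec))
    return dot / norms if norms else 0.0

def hybrid_retrieve(query, query_vec, corpus, embeddings, rerank_fn, k=5, alpha=0.5):
    # Blend dense and sparse scores, then let a cross-encoder reorder the shortlist
    scored = [
        (alpha * dense_score(query_vec, vec) + (1 - alpha) * sparse_score(query, doc), doc)
        for doc, vec in zip(corpus, embeddings)
    ]
    shortlist = [doc for _, doc in sorted(scored, key=lambda s: s[0], reverse=True)[: k * 4]]
    return rerank_fn(query, shortlist)[:k]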

3. Streaming Everything

Sub-2 second response times at billion-user scale required streaming from end to end:

  • Stream from LLM
  • Stream through our processing pipeline
  • Stream to the client

Users see words appearing instantly, not a loading spinner.
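
Here's the shape of that pipeline as an async sketch. stream_completion and send_chunk are placeholder names for your LLM client and your transport layer (SSE, WebSocket), not specific APIs:

async def stream_answer(prompt: str, llm, client) -> None:
    # 1. Tokens arrive from the LLM as soon as they are generated
    async for token in llm.stream_completion(prompt):
        # 2. Per-chunk processing must stay cheap: filtering, citation tagging, etc.
        chunk = postprocess(token)
        # 3. Each chunk goes straight to the client instead of waiting for the full answer
        await client.send_chunk(chunk)
    await client.close()

def postprocess(token: str) -> str:
    # Placeholder for lightweight per-token transforms
    return token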

Team Dynamics

Scaling the product also meant scaling the team. At peak, I was directing 30+ engineers across:

  • San Jose (ML pipelines & experimentation)
  • Bangalore (Research)
  • Noida (Infra)

Key lesson: Over-communicate. What works for 5 people breaks at 30. We instituted:

  • Daily async standups (timezone-friendly)
  • Weekly architecture reviews
  • Bi-weekly demo days

What I'd Do Differently

Mistake #1: We built too many models internally before leveraging LLMs. We should have started with GPT-4 and moved to custom models only when justified.

Mistake #2: We underestimated the importance of evaluation. Build your eval harness before your product.
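
Even a tiny harness like the one below, run on every change, beats shipping blind. The dataset format and grading function here are illustrative, not what we actually used.

def run_evals(model_fn, eval_set):
    # eval_set: list of {"prompt": ..., "expected": ...} cases curated by the team
    # model_fn: callable that takes a prompt and returns the model's answer
    results = [
        {"prompt": case["prompt"], "passed": grade(model_fn(case["prompt"]), case["expected"])}
        for case in eval_set
    ]
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return pass_rate, results

def grade(answer: str, expected: str) -> bool:
    # Simplest possible check; swap in rubric scoring or an LLM judge as needed
    return expected.lower() in answer.lower()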

Mistake #3: We didn't invest enough in observability early. When things break at scale, you need deep visibility.

Results

  • 🏆 TIME Best Invention 2024
  • 💰 $5B revenue potential unlocked
  • 👥 1B+ MAU served
  • ⚡ Sub-2s response times
  • 📊 95%+ user satisfaction

Takeaways for Builders

  1. Start simple: MVP with LLM APIs like Gemini, then optimize
  2. Multi-cloud from start: Don't get locked in
  3. Obsess over latency: Every 100ms matters
  4. Build evaluation first: You can't improve what you can't measure
  5. Stream everything: Users perceive streaming as 10x faster

Building at this scale taught me that the hard problems aren't technical - they're about coordination, communication, and keeping quality high while moving fast.

Want to chat about scaling AI? Reach out or ask my AI Twin
