ππ§ How DropBox Built Multimedia Search for 700M Users
PLUS: Design a Rate Limiter β³, What is Cache Warming β¨οΈ, Instagram Staff Career Gold π
Todayβs issue of Hungry Minds is brought to you by:
Happy Monday! βοΈ
Welcome to the 489 new hungry minds who have joined us since last Monday!
If you arenβt subscribed yet, join smart, curious, and hungry folks by subscribing below.
π Software Engineering Articles
Learn how a rate limiter works with this practical system design guide
Build the world's fastest VIN decoder from scratch
Smart cache warming strategies to boost application performance
How to handle duplicates in retry mechanisms
N+1 query problem explained and solved in 4 minutes
ποΈ Tech and AI Trends
Reddit transforms into a search engine powerhouse
Anthropic blocks OpenAI from accessing Claude models
Google launches Gemini Deep Think for parallel reasoning
π¨π»βπ» Coding Tip
Use partial indexes in PostgreSQL to optimize query performance and reduce storage
Time-to-digest: 5 minutes
Using Jenkins? Find a better workflow!
Bitbucket Pipelines is not just a tool, itβs a DevOps force multiplier.
Unlock cloud-native CI/CD with 70% reduction in operational costs, no installation or plugins to manage, and efficient autoscaling.
Let engineers focus on delivering value β not maintaining Jenkins.
How Dropbox Built Multimedia Search For Dash π«¨
Searching through thousands of unlabeled images and videos is a nightmare for knowledge workers. Dropbox tackled this by building a scalable multimedia search system that makes finding visual content as easy as finding text documents in their universal search product, Dash.
The challenge: Build a cost-effective system that can process and search through massive amounts of media files while maintaining low latency and high relevance, despite limited metadata and heavy compute requirements.
Implementation highlights:
Metadata-first indexing: Extract lightweight features first (file paths, EXIF data) to enable basic search with minimal overhead
Just-in-time previews: Generate previews on-demand rather than upfront to optimize storage and compute costs
Location-aware queries: Built custom geocoding logic to enable searching by location where photos were taken
Parallel processing: Run preview generation, ranking, and permission checks concurrently to minimize latency
Smart caching: Store previews for 30 days and leverage existing Dropbox infrastructure for efficiency
Results and learnings:
Scalable performance: Successfully processes 97% of media files while maintaining responsive search
Cost optimization: Achieved 3x storage efficiency for images and 13x for video compared to raw storage
Developer velocity: Parallel workflows and clear API boundaries enabled faster team execution
This solution shows how thoughtful architecture decisions around caching, parallel processing, and metadata extraction can make multimedia search both performant and cost-effective.
Design a Rate Limiter
Written by
Bitbucket Pipelines
Bitbucket Pipelines brings continuous delivery to Bitbucket Cloud, empowering teams with full branching to deployment visibility and faster feedback loops
How to Optimize Performance with Cache Warming?
Written by
How we built the world's fastest VIN decoder
Nine VS Code (or Cursor) Extensions That Make My Daily Work Much Easier
Written by
Write a postmortem like a Senior Engineer
Written by
Instagram Staff (IC6) Promo Despite 10 Team Switches in 9 Years (Career Story)
Written by
Career Frameworks πͺ β part 2
Written by
andESSENTIAL (testing gone wrong)
The #1 Mistake in Unit Testing (and How to Fix It)
ESSENTIAL (retry, retry, oops)
Retries Have an Evil Twin: Duplicates
ESSENTIAL (metrics go brrr)
Essential System Design Performance Metrics
ARTICLE (vim-wizardry unlocked)
Why I Switched to Vim Keybindings
GITHUB REPO (trade like a boss)
A high-performance algorithmic trading platform and event-driven backtester
ARTICLE (database drama solved)
What is the N+1 Query Problem and How to Solve it?
ARTICLE (claudeβs coding spree)
6 Weeks of Claude Code
ARTICLE (chrome goes zoom)
Making Sense of the Performance Extensibility API
ARTICLE (AI hype debunked)
No, AI is not Making Engineers 10x as Productive
ARTICLE (flashy list magic)
FlashList v2: A Ground-Up Rewrite for React Native's New Architecture
Want to reach 190,000+ engineers?
Letβs work together! Whether itβs your product, service, or event, weβd love to help you connect with this awesome community.
π Reddit Aims to Unify Search Interface in Bid to Rival Major Search Engines (2 min)
Brief: Reddit is working on a unified search interface as part of its strategy to compete with dominant search engines, potentially leveraging its vast community-driven content for better results.
π€ Anthropic Blocks OpenAIβs Access to Claude Models in Legal Dispute (2 min)
Brief: Anthropic has revoked OpenAIβs access to its Claude AI models amid an escalating legal dispute, leaving OpenAI to seek alternatives for its AI research and services.
π€ Google Unveils Gemini DeepThink AI for Parallel Reasoning (2 min)
Brief: Google introduces Gemini DeepThink, an AI model designed to evaluate multiple ideas simultaneously, enhancing reasoning and problem-solving efficiency.
π€ OpenAI Launches Customizable Open-Weight AI Models (2 min)
Brief: OpenAI releases gpt-oss-120b and gpt-oss-20b, two open-weight models under Apache 2.0 license, designed for agentic tasks, customization, and commercial use with full chain-of-thought reasoning.
π€ GPT-5 Arrives: Faster, Cheaper, and More Reliable Than Ever (4 min)
Brief: OpenAI's GPT-5 offers three model tiers (regular, mini, nano) with competitive pricing and reduced hallucinations, making it a strong contender in the AI race.
π€ OpenAIβs GPT-OSS Models Now Available on AWS (4 min)
Brief: AWS adds OpenAIβs open-weight GPT-OSS-120B and GPT-OSS-20B models to Bedrock and SageMaker, enabling developers to build AI applications with full infrastructure control and advanced reasoning capabilities.
This weekβs coding challenge:
This weekβs tip:
Create efficient database indices by using partial indexes with WHERE clauses to significantly reduce index size and improve query performance on specific data subsets. PostgreSQL's partial indices allow you to create smaller, more focused indexes that only include rows matching certain conditions.
Wen?
Large tables with skewed data: When certain status values represent a small, frequently-queried subset of all rows.
Time-series data management: Index recent data more aggressively while maintaining lighter indices for historical data.
Multi-tenant systems: Create focused indices for high-traffic customers while keeping the overall index footprint manageable.
βThe harder you work, the harder it is to surrender.β
Vince Lombardi
Thatβs it for today! βοΈ
Enjoyed this issue? Send it to your friends here to sign up, or share it on Twitter!
If you want to submit a section to the newsletter or tell us what you think about todayβs issue, reply to this email or DM me on Twitter! π¦
Thanks for spending part of your Monday morning with Hungry Minds.
See you in a week β Alex.
Icons by Icons8.
*I may earn a commission if you get a subscription through the links marked with βaff.β (at no extra cost to you).
Really smart build.
Starting with metadata and using on-demand previews is a clever way to keep it fast and cheap without overcomplicating things.
Thanks for featuring my article, Alex!