Creating scalable systems is a fun challenge. It is very satisfying to dig into a system, figure out where the bottlenecks are, and open them up to get better performance. With modern systems adding more CPUs, effective use of concurrency is necessary to get good scalability.
When I was running mail servers, we used a cluster of machines behind a load balancer to give good performance to all of our 600,000 users. As we brought more users on, we were able to simply add more machines to our cluster to accommodate them. We also carefully tuned our mail software to minimize the amount of hardware we had to purchase.
At the Whereabouts Project, I created a scalable system for handling location information. Written in Java, it used lock-free data structures from the Java concurrency classes to process large amounts of data with minimal contention, allowing our system to scale well to multi-CPU systems.
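To give a flavor of what this looks like, here is a minimal sketch using the standard java.util.concurrent classes. It is not the Whereabouts code: the LocationUpdates class, the Update type, and the demo method are hypothetical names made up for illustration. Several producer threads hand location updates to a ConcurrentLinkedQueue (a non-blocking queue) and bump an AtomicLong, so no thread ever waits on a lock held by another.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; names and structure are assumptions, not the original system.
class LocationUpdates {
    // A single location report from one device.
    static final class Update {
        final int deviceId;
        final double latitude;
        final double longitude;

        Update(int deviceId, double latitude, double longitude) {
            this.deviceId = deviceId;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // Runs `threads` producers that each enqueue `perThread` updates,
    // then drains the queue and returns how many updates were received.
    static long demo(int threads, int perThread) {
        ConcurrentLinkedQueue<Update> queue = new ConcurrentLinkedQueue<>();
        AtomicLong accepted = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            final int deviceId = t;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    // offer() uses compare-and-swap internally rather than a lock.
                    queue.offer(new Update(deviceId, 37.0 + i * 1e-6, -122.0));
                    accepted.incrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("producers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }

        // Drain everything the producers enqueued.
        long drained = 0;
        while (queue.poll() != null) {
            drained++;
        }
        if (drained != accepted.get()) {
            throw new IllegalStateException("lost updates: " + drained + " vs " + accepted.get());
        }
        return drained;
    }

    public static void main(String[] args) {
        System.out.println("updates processed: " + demo(4, 100_000));
    }
}
```

In a real system a consumer would drain the queue continuously instead of waiting for the producers to finish. The point is the same either way: because the threads coordinate through atomic operations instead of locks, adding CPUs adds throughput rather than contention.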
At Locomatix, our server software is designed to support a large number of users on minimal hardware. It uses Boost.Asio to manage high-performance I/O and thread pools, and lock-free data structures from Intel Threading Building Blocks to share data efficiently between threads.