Driving individual processor performance to the limit in a given implementation technology is never easy or efficient. Faster clocks, deeper pipelines and bigger caches carry silicon area and power ...