Other than the occasional pedant like myself, most people don’t use Moore’s Law in its literal sense: Intel co-founder Gordon Moore’s 1965 observation that each year saw a doubling of transistors in an integrated circuit. This doubling in turn meant that transistors kept getting proportionately cheaper and smaller.
But it’s the performance increases, the speed gains that come from denser integrated circuits, that most people focus on when it comes to Moore’s Law. The process shrinks that enable those density increases are only part of the overall performance ramp-up story of microprocessors and other integrated circuits, yet they’ve mostly overshadowed other engineering advances.
However, those process shrinks are slowing down and are running up against fundamental physical limits as features approach a few atoms in length.
Thus, it’s perhaps not surprising that the recent IEEE Hot Chips Conference chose to highlight the many ways “Moore’s Law” is alive and well through a variety of design innovations while largely glossing over the now sedate pace at which components are continuing to shrink.
Throughout the presentations on specific chips as well as the keynotes by Intel Chief Architect Raja Koduri and DeepMind Distinguished Engineer Dan Belov, several related themes about the future of microprocessors emerged.
Moving from the megahertz era to the architecture era
In his keynote, Intel’s Koduri described “exploding heterogeneity” in the context of an evolution from the megahertz era to the multi-core era to the architecture era. While we’ve always had specialized functional units for graphics, floating point, networking tasks, and so forth, the overarching theme for x86 systems during the past few decades has been standardization around a common instruction set.
This dynamic arguably first started to break down when GPUs, originally developed for graphics, became so important for running machine learning (ML) workloads.
Machine learning operations are heavily dependent on linear algebra (such as multiplying together large matrices). This is compute-intensive but simple and is almost tailor-made for GPUs, which have been developed for years by companies like Nvidia for use as video cards.
But today, in addition to specialized ML processors like Google’s Tensor Processing Units (TPUs) and the increased use of field-programmable gate arrays (FPGAs), microprocessor vendors are also enhancing their on-chip vector capabilities. (Vector computing, which dominated early supercomputers, applies the same operation repeatedly across an array of data. It’s very efficient for a narrow set of workloads and survives mostly as specialty operations on modern, more general-purpose scalar processors.)
Koduri describes the need for a scalable model and software for scalar, vector, and matrix (as in GPUs) operations to combat the complexity of shifting from a compute model where generality rules to one where heterogeneity does.
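To make the scalar, vector, and matrix distinction a little more concrete, here is a minimal sketch in plain Python. It is only an illustration of the three styles of operation, not anything shown at Hot Chips: the function names and the four-element `width` are hypothetical, and real hardware would process each group of elements in parallel rather than in a Python loop.

```python
def scalar_add(a, b):
    """Scalar style: one addition per step."""
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out


def vector_add(a, b, width=4):
    """Vector style: each step applies the same operation to `width` elements.

    `width` is a made-up lane count; hardware would handle a chunk at once.
    """
    out = []
    for start in range(0, len(a), width):
        chunk_a = a[start:start + width]
        chunk_b = b[start:start + width]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
    return out


def matmul(A, B):
    """Matrix style: every output cell is an independent dot product."""
    inner = len(B)
    cols = len(B[0])
    return [
        [sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for row in A
    ]


print(scalar_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

All three produce ordinary arithmetic results; the difference is how much independent work each step exposes, which is what specialized hardware exploits.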
Generalized computing is fine for keeping things simple from a software perspective when processing power is growing at a steady clip. It didn’t really make sense to design specialized hardware and write software for it if the general-purpose CPU was just going to catch up in 18 months or so. But, absent that kind of metronomic advance, chip and system makers need to optimize their hardware even if it makes life harder for developers and others.
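As a rough illustration of that reasoning, the sketch below estimates how long a specialized design stays ahead if general-purpose performance doubles on a fixed cadence. The function name, the 18-month default (borrowed from the figure above), and the assumption that the specialized part never improves are all simplifications made for illustration.

```python
import math


def months_until_catch_up(speedup, doubling_months=18.0):
    """Months until a general-purpose CPU matches a fixed specialized speedup.

    Assumes (hypothetically) that general-purpose performance doubles every
    `doubling_months` and that the specialized design does not improve.
    """
    if speedup <= 1:
        return 0.0  # no head start to erode
    return doubling_months * math.log2(speedup)


# A 4x head start is two doublings, so it lasts 36 months at an 18-month cadence.
print(months_until_catch_up(4))
# Stretch the cadence to 36 months and the same head start lasts twice as long.
print(months_until_catch_up(4, doubling_months=36))
```

The slower the doubling cadence, the longer specialized hardware keeps its advantage, which is why the trade-off looks different today.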
For software makers, this means that they need to write for (and optimize for) a wider variety of hardware types. In some respects, though not others, it’s a throwback to the era before x86 hardware and a small number of operating systems commoditized the datacenter.
For users, this provides hope of performance and price/performance improvements even in the face of slowing process advances. But it also means they will have to make more choices to optimize their particular applications.
In a similar vein, operating a chip as if all workloads have the same needs simplifies design, but it may not be the most efficient use of the transistors that make up a microprocessor. Furthermore, modern high-end microprocessors like Intel’s upcoming Xeon Ice Lake and IBM’s upcoming POWER10 have large numbers of cores and other hardware resources; they’re effectively large systems on a chip.
So, it can make sense to dynamically configure resources based on the needs of individual applications and processes. The system configuration that’s best tuned to run a compute-heavy workload is probably different from one that needs lots of I/O or one that has massive memory needs. Therefore, high-end microprocessors increasingly have the means to parcel out hardware resources and adjust clocks in order to best match up with the workloads.
It’s another case where, if large increases in transistor counts aren’t going to be so easy to come by in the future, hardware designers need to make sure that they’re wringing the most performance they can out of the transistors they do have.
Automating the complex optimizations across hardware and software to take advantage of features like these is an area of active research in academia, in industry, and in collaborations between the two.
Although it’s not as well known as Moore’s Law, an interrelated concept is Dennard scaling: This states that as transistors get smaller, their power density stays constant, so that the power use stays in proportion with area. For a long time, this meant that increases in density provided by process shrinks didn’t lead to a proportional increase in power requirements and cooling needs.
Dennard scaling began to break down in 2006 or so. Today, to the degree that we can continue to fabricate incrementally smaller transistors, powering and cooling them is an increasingly critical need.
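The arithmetic behind Dennard scaling can be sketched in a few lines. This uses the standard first-order estimate of switching power (capacitance times voltage squared times frequency) in normalized units. The function name, the `scale_voltage` flag, and the idea of simply freezing voltage to mimic the breakdown are simplifying assumptions for illustration; they ignore effects such as leakage current.

```python
def relative_power_density(k, scale_voltage=True):
    """Power density after shrinking linear dimensions by a factor of k.

    Returns the density relative to the unshrunk transistor (1.0 = unchanged).
    """
    capacitance = 1.0 / k                       # capacitance shrinks with size
    voltage = 1.0 / k if scale_voltage else 1.0  # ideal scaling lowers voltage too
    frequency = 1.0 * k                         # smaller transistors switch faster
    area = 1.0 / k**2                           # area shrinks with the square of k

    power = capacitance * voltage**2 * frequency
    return power / area


# Ideal Dennard scaling: power per transistor falls as fast as area does.
print(relative_power_density(2))
# If voltage can no longer be lowered, the same shrink concentrates power.
print(relative_power_density(2, scale_voltage=False))
```

With ideal scaling the density stays at 1.0; with voltage held fixed, a 2x shrink gives 4.0 in this simplified model, which hints at why power and cooling have become limiting factors.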
As with reconfigurability, very fine-grained and dynamic control of power is increasingly a requirement of microprocessors. It doesn’t do any good to have more transistors if you can’t turn them on.
How will future microprocessors handle applications like machine learning?
CMOS process shrinks have been a singularly important foundation for great swaths of modern technology. The slowing and, presumably, eventual ending of that particular technology tree will have effects. However, as we saw at the Hot Chips conference, there are other paths to progress, even if they may not be as regular or as predictable as shrinking transistors have been.
They may require rethinking some of our assumptions, especially given that important application classes, such as machine learning, have characteristics so different from those of many of our more traditional applications.
Architectures built around computing may not be so suitable for applications built more around data, as ML apps often are. DeepMind’s Belov even wonders whether we may need to rethink the datacenter as a sort of unitary computer and revisit past decades of research with that sort of newly reimagined, data-centric distributed system in mind, for example by reconsidering all the research that went into dataflow computers in the past.