The first personal computers were called microcomputers, and most of them were powered by an 8-bit chip like the 2MHz Zilog Z80. If you had a couple thousand dollars to spare, you could load the machine up with a whopping 64 kilobytes of RAM.

Then Moore’s Law kicked in.

Actually, it’s not a law; it’s an observation. In 1965, Gordon E. Moore, who would go on to cofound Intel, published a paper in which he noted that the number of components in integrated circuits had doubled every year from 1958 to 1965. He predicted that the trend would continue for at least another 10 years.

He understated the case.

Before the end of the 1970s, it was obvious that an 8-bit microprocessor was insufficient. The 16-bit 8088 powered the first IBM PCs. Actually, it was a 16-bit chip with an external 8-bit data bus so it could use cheaper supporting logic chips, but that was the beginning of the x86 processor line.

The 286 chip ran at 6MHz, then 8MHz, and eventually 12.5MHz. This was the real beginning of the race for power. The 286 was followed by the 386, which eventually ran at 33MHz. It was the first 32-bit chip in the x86 line, and it was fast enough to make Windows a practical operating system.

The 486 reached 50MHz. In those days, every advance in speed was significant. It made a noticeable difference in the response time of every piece of software, and that fueled the hunger for ever more powerful upgrades.

Instead of a 586, Intel released the Pentium chip, which went through several iterations over the next few years, each time getting faster and more powerful. The first Pentium premiered in 1993 and ran at 60MHz. By 2000, the Pentium III was running as fast as 1.13GHz. The Pentium 4, introduced in 2000, eventually hit 3.6GHz on the straightaway, and the dual-core Pentium D line topped out at 3.73GHz.

Throughout the 1990s, processor speeds continued to climb. All those extra clock cycles allowed programmers to add more features to their software and even to create whole new categories of software. We moved from word processors and spreadsheets to speech recognition, image processing, CGI rendering, video editing, and powerful 3D games.

At the end of the 1990s, many pundits (myself included) were predicting that chip speeds would continue to improve for at least another decade. Extrapolating the trend lines of the past, it seemed safe to assume that a state-of-the-art computer would be running 10GHz chips by the year 2010.

Oops. We forgot about the heat ceiling.

Shrinking the die size lets you put more transistors on a chip, which makes it possible for the chip to do more work. And as the distance between the transistors decreases, signals have less far to travel, so the chip can run faster. A large part of the acceleration of our chips over the last 30 years has come from decreasing the size of the transistors and the paths between them.

But after a certain point, a different set of physics kicks in. A microprocessor is like the burner on an electric stove: the burner is a piece of metal or ceramic with a circuit running through it, and when you run a current through that circuit, its resistance shows up as heat. A chip works the same way. The more transistors packed into an area, the more current flows through that area and the more heat is generated. This is why your phone heats up when you play a processor-intensive game or use it as a GPS.

There’s only so much you can do with heat sinks, fans, and water-cooling. Above a certain density, the heat isn’t just problematic; it’s a showstopper. Until we get some dramatic new materials, quantum computing, or optical chips, we’re going to be pretty much stuck somewhere below 4GHz. Speed is no longer the path toward more power.

If you’re using a chip that can do only one thing at a time, then yes, you do need more speed—and that was the linear thinking that dominated most of 20th-century chip design. But what if you can break a task down into many component parts and assign each of them to a different processor? That’s the 21st-century approach to creating more powerful chips.
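Here’s a minimal sketch of that idea, using nothing but Python’s standard library; the chunking scheme and the sum-of-squares “work” are stand-ins invented for illustration, not anyone’s production code. The job gets chopped into pieces, each piece goes to a separate worker process, one per core, and the partial results are combined at the end.

```python
# A minimal sketch of breaking one task into parts and handing each part
# to a different core. The "work" (summing squares) is just a stand-in
# for any job that can be divided.
import os
from concurrent.futures import ProcessPoolExecutor

def sum_of_squares(chunk):
    # The piece of work assigned to one worker.
    return sum(n * n for n in chunk)

def parallel_sum_of_squares(numbers, workers=None):
    workers = workers or os.cpu_count() or 1      # one worker per available core
    size = max(1, len(numbers) // workers)        # split the input into chunks
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))  # combine the partial results

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1_000_000))))
```

The interesting part isn’t the arithmetic; it’s that the pool, not the programmer, decides which core gets which chunk.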

So we have gone from dual-core to quad-core to six- and eight-core chips. On the horizon are chips with as many as 48 or 64 cores. And not all the cores on a chip have to be identical. Why waste a high-powered core on a small task? Some cores can be optimized for graphics, others for sound, others for floating-point math. Some can be ancillary cores for controlling data flow; others can be high-powered, number-crunching, data-diddling earth movers. And if and when any core is not being used, it can be turned off or left to idle.

The issue has never really been speed; it’s been gigaflops and teraflops and someday petaflops. How many operations can a chip perform in a second? Not just how many, but what tasks are its cores best suited for? Multicore chips will likely be the backbone of all processing for the foreseeable future.

Moore’s Law is still in play. It’s no longer about the number of transistors on a chip; it’s about the number of operations per second a chip is able to perform. It’s about petaflops. Parallel processing allows us to escape the heat ceiling. As long as there is a demand for more power, the processing scale of our chips will continue to advance. The petaflop processor is inevitable.

But there is still a bottleneck, and it’s a far more serious bottleneck than the heat ceiling. It’s actually a two-part bottleneck, and both parts are equally daunting.

The first part is the need for programming languages and compilers that can efficiently sort out all the various tasks of a program so they can be delivered to whatever processor cores are available: two, four, or more. We need tools that can manage that bookkeeping while still allowing programmers to think about the larger goals of the software.

The second part of the bottleneck lives between the chair and the keyboard. Once upon a time, programming was linear; the result was spaghetti code. Then we had procedures and functions; that gave us structured code. Then we moved to object-oriented programming, and now we have event-driven programming. Next…?

We are stumbling over the threshold into a still mostly unknown continent of parallel processing. How do we create software with multiple interdependent threads, software that can actually use the power of a 48-core chip effectively? Some problems will break down easily into their component parts: image rendering, for instance. But other problems, like do-what-I-mean speech recognition, will require a complex intermix of audio-decoding, word-sorting, grammar-processing, and context-perception functions, each of which requires a different level of processing, and all of which have to reach a consensus in a reasonably short amount of time. Multiple threads will have to sort through multiple meanings while a meta-processor decides which one is contextually appropriate.
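As a toy illustration of that consensus step, and not as real speech recognition (the analyzer names and confidence scores below are invented purely for the sketch), imagine several threads each proposing a scored interpretation of the same utterance, with a final meta step picking the winner:

```python
# A toy version of the consensus pattern: several analyzer threads each
# propose an (interpretation, confidence) pair for the same utterance,
# and a meta step keeps whichever one scored highest. The analyzers are
# placeholders, not real speech-recognition code.
from concurrent.futures import ThreadPoolExecutor

def acoustic_guess(utterance):
    return ("recognize speech", 0.62)

def grammar_guess(utterance):
    return ("recognize speech", 0.71)

def context_guess(utterance):
    return ("wreck a nice beach", 0.44)

ANALYZERS = [acoustic_guess, grammar_guess, context_guess]

def interpret(utterance):
    # Run every analyzer as a parallel task on the same input.
    with ThreadPoolExecutor(max_workers=len(ANALYZERS)) as pool:
        candidates = list(pool.map(lambda analyze: analyze(utterance), ANALYZERS))
    # The meta step: keep whichever interpretation scored highest.
    best, _score = max(candidates, key=lambda pair: pair[1])
    return best

if __name__ == "__main__":
    print(interpret("recognize speech"))  # -> "recognize speech"
```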

With games, the problem becomes even more complex. You have audio and video processing, animation choices, physics, 3D imagery, and artificial intelligence for the actors within the game. Again, these are all different-sized tasks, and they all have to be ready to go at least 30 times a second.
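A skeletal version of that kind of frame loop might look like the sketch below; the subsystems are placeholders and the structure is my own invention, but the shape is the point: launch everything in parallel, wait for all of it, and hold the 30-frames-per-second budget.

```python
# A toy game loop: each frame, the audio, physics, animation, and AI
# subsystems run as parallel tasks, and the frame isn't finished until
# all of them are. The subsystem bodies are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

def update_audio(dt): pass      # mix and queue this frame's sound
def update_physics(dt): pass    # integrate motion, resolve collisions
def update_animation(dt): pass  # advance the animation rigs
def update_ai(dt): pass         # choose actions for the in-game actors

SUBSYSTEMS = [update_audio, update_physics, update_animation, update_ai]
FRAME_TIME = 1.0 / 30           # budget: at least 30 frames per second

def run_frames(frame_count):
    with ThreadPoolExecutor(max_workers=len(SUBSYSTEMS)) as pool:
        for _ in range(frame_count):
            start = time.perf_counter()
            # Launch every subsystem for this frame, then wait for all of them.
            list(pool.map(lambda update: update(FRAME_TIME), SUBSYSTEMS))
            elapsed = time.perf_counter() - start
            time.sleep(max(0.0, FRAME_TIME - elapsed))  # hold the frame rate

if __name__ == "__main__":
    run_frames(3)
```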

Ultimately, the real challenge lies in the software arena. Will we have software that can use all that power efficiently, software designed for multiple cores? Will we have programmers who think in multiple threads? I don’t think it’s impossible. I do think it’s a game-changer.

I look forward to the next big conceptual breakthrough in programming. Whatever it is, it might be deceptively simple and obvious (in hindsight, anyway), or it might be a mind-boggling paradigm shift. But whatever happens, it will happen for the simple reason that it has to happen.

What do you think?

David Gerrold is the author of over 50 books, several hundred articles and columns, and over a dozen television episodes, including the famous “Star Trek” episode, “The Trouble with Tribbles.” He is also an authority on computer software and programming, and takes a broad view of the evolution of advanced technologies. Readers may remember Gerrold from the Computer Language Magazine forum on CompuServe, where he was a frequent and prolific contributor in the 1990s.