Trends & Tech
A Terrible Thing to Waste
"Waste not, want not" might be an old saying, but its application to CPUs will help us to make the most of processing.
by Paul Rolich
We live in an age and a world where squandering natural resources is the standard. Fossil fuels will be depleted before the next millennium; the polar ice caps will have melted by then; we have littered this planet from one end to the other. During my time at sea in the navy, I rarely was able to view an ocean unscarred by floating man-made detritus. There is no need to carry that same philosophy of abuse into the world of data processing. I like my solutions to be clean and elegant. Wasting hardware in applications where it is ill used just goes against my grain. It doesn't make any difference whether you are running the IT shop for a billion-dollar insurance carrier or a midsize agency–there always are efficiencies to be gained by judicious use of computer hardware. Moore's Law may imply we always will have a new pool of computing resources to play in, but that doesn't mean you always need the biggest, newest pool.
The current state of computer hardware is a sliding window with ever-changing parameters. I remember (painfully) paying $750 of my own money not too long ago for a 16 MB stick of RAM. I spent many days thinking about that expense before I took the plunge. For that much money today, I can purchase a killer off-the-shelf system with 32 times as much RAM–and that is not even using inflation-adjusted dollars. The relatively low cost of hardware has created a generation of lazy programmers and bloated software, neither much concerned with optimization. Yet that overpriced, RAM-hungry office suite you are so dependent on, even as it slows down everything else on your computer, is not managing to use all the CPU cycles available to it. Even on hard-running 24/7 servers, average real CPU utilization is estimated to be somewhere around 30 percent. What about that other 70 percent? Is there any way we can get to that resource? Sure there is, but it will take a lot of hard work and cooperation among CPU makers, hardware manufacturers, OS vendors, and software developers. Let's take a quick look at ways we can optimize CPU use right now.
Big Machines
I love commuting. I speed, yet I constantly am being passed by monster trucks that clearly are not designed for highways or commuting. (Don't let the leather and the chrome fancies fool you.) They come blasting by at 85 mph with huge off-road tires whining in agony; 6+ liter V8s running at 4,500 RPM because they are not geared for highway use; air conditioners struggling to drive off the engine heat; and wind buffeting everywhere from the nonaerodynamic bodies (the truck bodies, not the drivers). What is our problem? Why do we buy big, powerful machines and then not use them for the purpose for which they were intended? We do the same thing with computers. Do you really need a 3.1 GHz, 500 MB RAM, MT-processor machine to send e-mail and run the occasional spreadsheet? Look at it from a business point of view. Suppose every June your claims department handled double the number of claims it does any other month. Would you double the claims staff so that it can work efficiently in June and then look for work the rest of the year? Probably not–that is, not unless the bottom line is unimportant to you.
Multitasking
I used to run a sort routine on an IBM XT with a few megs of RAM using Lotus 1-2-3 v. 2.0. There were about 30K line items, and the sort would take all day. There was not even a hint of multitasking. All that poor machine did all day was compare and move, over and over and over again. The processor probably was idle most of that time. Moving all that data around was the real resource killer in those days–slow, minuscule RAM work spaces, almost nonexistent cache, and no virtual memory made for very slow data-intensive processing. On the other hand, that machine was a killer when recalculating spreadsheets–it was darn good at floating-point math.
Cooperation?
Enter Windows 3.x and early versions of the Mac OS, and with them the world of cooperative multitasking. The theory is that an individual program or process will run for a while, check the queue to see who else is in line, and then relinquish the processor. All running programs must cooperate and agree to share processor time. OK, now we all know how that works. Cooperation in a queue is not an innate human quality. You see the sign: "Right Lane Closed 1/2 Mile Ahead–MERGE LEFT NOW." That is a signal for all those pickup drivers we saw earlier to dash into the right lane and pass everyone already in line, clogging up the whole mess. The same thing goes for software developers. Who in their right mind would release software that willingly would give up CPU time to another program? Of course you had to do it, but that didn't mean you had to do it fairly.
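For the programmers in the audience, here is a toy sketch of what cooperation looked like. Everything in it is my own illustration rather than any real Windows 3.x or Mac OS API (the task functions and the scheduling loop are made up for the example): each task does a small slice of work and then returns so the next one can run, and the "scheduler" is powerless if a task simply refuses to hand the CPU back.

/* A toy cooperative scheduler: each task must volunteer to give up the CPU. */
#include <stdio.h>

#define NUM_TASKS 3

typedef int (*task_fn)(void);   /* returns 0 when the task is finished */

static int counters[NUM_TASKS];

/* Each "program" does a small chunk of work, then returns so another can run. */
static int task(int id)
{
    counters[id]++;
    printf("task %d did a slice of work (%d slices total)\n", id, counters[id]);
    return counters[id] < 5;    /* pretend we finish after 5 slices */
}

static int task0(void) { return task(0); }
static int task1(void) { return task(1); }
static int task2(void) { return task(2); }

int main(void)
{
    task_fn tasks[NUM_TASKS] = { task0, task1, task2 };
    int alive = NUM_TASKS;

    /* The "scheduler" just walks the queue; it has no power to interrupt a
     * task that refuses to return -- that is the flaw in cooperation. */
    while (alive > 0) {
        alive = 0;
        for (int i = 0; i < NUM_TASKS; i++) {
            if (tasks[i] && !tasks[i]())
                tasks[i] = NULL;     /* task finished; drop it from the queue */
            else if (tasks[i])
                alive++;
        }
    }
    return 0;
}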
A new generation of OSs introduced preemptive multitasking. Windows 95, OS/2, UNIX, and later versions of the Mac OS gave the power to the operating system. The operating system assigns each running process a slice of CPU time. A running process does not need to have any knowledge of any other process running on the machine. As far as it is concerned, it has sole access to the CPU, RAM, virtual memory, hardware devices, etc. This is pretty cool for software vendors. They don't need to be concerned with multitasking at all because it is totally transparent. In fact, it even allows developers to spawn multiple processes or threads at the same time from a running program. I can create two processes to handle a particularly time-consuming task, and the OS may give my program more CPU time because it sees two processes from my program instead of one. This does not guarantee efficiency–it only guarantees your program may get more time, which it can then squander just as inefficiently.
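To put that in code, here is a minimal sketch of the "spawn two workers and let the OS sort it out" approach, written with POSIX threads (compile with gcc -pthread). It is my illustration, not anything from the article; the worker function and the amount of busy work are arbitrary stand-ins for a time-consuming task.

/* Spawn two worker threads and let the preemptive scheduler decide when each
 * one actually gets the CPU; the workers never yield voluntarily. */
#include <pthread.h>
#include <stdio.h>

#define CHUNK 50000000UL

static void *worker(void *arg)
{
    unsigned long id = (unsigned long)arg;
    volatile double sum = 0.0;

    /* Busy work; the OS simply preempts this thread at the end of each
     * time slice and hands the CPU to someone else. */
    for (unsigned long i = 0; i < CHUNK; i++)
        sum += (double)i * 0.5;

    printf("worker %lu done (sum=%.0f)\n", id, sum);
    return NULL;
}

int main(void)
{
    pthread_t t[2];

    for (unsigned long i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return 0;
}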
Swapping out processes is not just a matter of slipping in and out of a queue. Each process has an entire "state" or "context" it runs in (all those things we talked about, such as memory, CPU state, etc.) that must be saved and then restored when that process gets a new time slice. Preemptive multitasking gives the impression many processes are running on the same box at once, since we have fast machines with lots of RAM and good virtual memory, but it still is a very inefficient process. A typical CPU can execute three instructions per cycle–something that rarely occurs. In fact, all available CPU cycles rarely are used. There simply are too many roadblocks on a preemptive multitasking machine to keep that CPU queue full of instructions. Think of the queue as a series of little bins (three abreast) on a rapidly moving conveyor belt. A single supervisor, or controller, fills the bins with CPU instructions as they flow by. A fully utilized CPU would have all the bins filled all the time.
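Those context switches are countable events, not just an analogy. On Linux and the BSDs (an assumption on my part; the extra rusage fields are not guaranteed everywhere), a process can ask the kernel how many times it has been switched out. This little sketch of mine does exactly that with getrusage().

/* Count how often this process was context-switched while doing busy work.
 * ru_nvcsw = voluntary switches (we gave up the CPU), ru_nivcsw = involuntary
 * (the scheduler preempted us). */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    volatile double x = 0.0;
    struct rusage ru;

    for (long i = 0; i < 100000000L; i++)   /* enough work to get preempted a few times */
        x += i * 0.001;

    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("voluntary switches: %ld  involuntary switches: %ld\n",
               ru.ru_nvcsw, ru.ru_nivcsw);
    return 0;
}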
More CPUs!!!
Machines still were not running fast enough to meet ever-increasing demands, so Symmetric Multiprocessing (SMP) machines came to the rescue. An SMP machine has multiple processors, and any idle processor can be used to run any process. That means multithreaded apps now really can execute more quickly: I can spawn multiple threads for a processor-intensive application, and each thread will be able to run independently. All we have done is throw another engine into the mix, though. Each processor on an SMP machine still is restricted and throttled by the operating system as it preemptively schedules processes out to the individual CPUs. Using our conveyor belt analogy, each individual belt still is feeding partially empty bins to its CPU. So, now we have a faster machine running multiple inefficient processors.
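If you want to see the "any thread on any processor" idea made explicit, here is a Linux-specific sketch (my own, assuming a box with at least two CPUs and the GNU extensions pthread_attr_setaffinity_np and sched_getcpu) that pins each worker thread to its own processor and reports where it actually ran.

#define _GNU_SOURCE          /* for CPU_SET, pthread_attr_setaffinity_np, sched_getcpu */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    /* sched_getcpu() reports which processor this thread landed on */
    printf("worker %ld running on CPU %d\n", (long)arg, sched_getcpu());
    return NULL;
}

int main(void)
{
    pthread_t t[2];

    for (long i = 0; i < 2; i++) {
        cpu_set_t set;
        pthread_attr_t attr;

        CPU_ZERO(&set);
        CPU_SET((int)i, &set);              /* thread 0 -> CPU 0, thread 1 -> CPU 1 */
        pthread_attr_init(&attr);
        pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
        pthread_create(&t[i], &attr, worker, (void *)i);
        pthread_attr_destroy(&attr);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return 0;
}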
Both Intel, with its Xeon processors, and Motorola/IBM, with the Power G5 chip, introduced a concept called thread-level parallelism (TLP) on a single processor. Called Hyper-Threading by Intel and Simultaneous Multithreading by IBM, it is a form of simultaneous multithreading (SMT) technology that allows multiple threads to run simultaneously on one processor. Taking the conveyor belt one step further, we can imagine two supervisors dumping instructions into the bins instead of one. Thus, the heart of the CPU–the execution unit–is able to achieve greater utilization. This is accomplished by sharing some resources on the processor and replicating others. The working "core" parts of the processor–the execution units and caches–are shared, while the other bits (registers, pointers, queues, buffers, stacks, etc.) either are shared or replicated so each thread keeps its own state. The shared and replicated resources work independently, feeding instructions to the working bits.
This was, and is, a very nice enhancement to processor efficiency, but it still doesn't make for 100 percent utilization. Intel admits to maybe a 30 percent increase in efficiency. Scale that back a bit for reality, and it still is a nice little chunk of power to grab from a single CPU. Unfortunately, very little software is written to take advantage of SMT. In the first place, most software isn't even aware it is running in an SMT environment instead of an SMP environment. In a multiprocessor world, it makes sense to spawn multiple floating-point-intensive threads because they will run on separate processors. In a hyperthreaded world, it does not make sense, because two threads hammering the same CPU resource simultaneously just contend for the same execution units and may be less efficient than a single thread. Enabling software to take full advantage of SMT machines means dropping down to assembly code to query the processor directly and determine just what it is capable of doing. The machine BIOS hides a lot of information from the OS, so it is likely your application may not be able to distinguish an SMP box from an SMT box. Of course, most of us are running multiprocessor, hyperthreaded servers these days, which adds to the confusion.
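On x86, that low-level query is the CPUID instruction, and you don't even need raw assembly for it: GCC and Clang expose it through <cpuid.h>. Here is a small sketch of mine that reads the HTT flag (bit 28 of EDX from leaf 1), which says only that the package can expose more than one logical processor; by itself it does not distinguish hyperthreading from multiple cores in one package, which is exactly the confusion described above.

/* Ask the processor about itself via CPUID (GCC/Clang helper __get_cpuid). */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        printf("CPUID leaf 1 not supported\n");
        return 1;
    }

    int htt     = (edx >> 28) & 1;      /* HTT capability flag */
    int logical = (ebx >> 16) & 0xff;   /* logical processors per package (if HTT is set) */

    printf("HTT flag: %d, logical processors per package: %d\n",
           htt, htt ? logical : 1);
    return 0;
}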
Enter the Dragon
The latest and greatest from our friends at Intel is Hyper-Threaded, dual-core technology. A dual-core processor is a single physical package that contains two microprocessors. These beasts will share some resources, such as high-level cache. So, now we need to write software that is optimized to use simultaneous multiple threads on a single CPU as well as to use multiple processors that share some common resources. Then we will have machines with multiple dual-core processors. I get freaked out now when I look at the performance monitor. I don't know if I could handle that.
I should have been a pair of ragged claws scuttling across the floors of silent seas.
OK, maybe I am a nut. Why should I care about efficient use of CPUs or RAM or anything else? After all, it's not my money, and it isn't even expensive–buy the biggest and best box you can and load it up. Nobody seems to care about global warming, so that would push computer efficiency way down the list of things we should worry about. Maybe I care because I have been around computers for a long time. I remember reading Claude Shannon's work on information theory while I was cranking out assembly code on an IBM mainframe and being struck by the beautiful simplicity of computers and data processing. Unlike most science, there is always not just a better but a best way to do things, and I think we have lost sight of that elegance.