
Introducing Intel’s Clarkdale Core i5-661


Intel’s next-generation CPU arrives, ringing in the era of the integrated graphics core

In the Intel galaxy, the CPU is an inexorable black hole. A gravity well so strong that nothing can escape it as it consumes every function of the PC.

Don’t believe us? Witness add-in MPEG-2 decoders, hardware modems, hardware-accelerated soundcards, and Ethernet controllers, all of which have been swallowed by the all-powerful CPU. With Intel’s last CPU, the Lynnfield LGA1156 processor, the memory controller and even PCI-E functions were eaten by the CPU, too.

Now with Intel’s new Clarkdale (and its mobile equivalent, Arrandale) the company is taking the first step in trying to eat a gas-giant of functionality by moving a GPU core directly inside of the CPU.

But not only is Clarkdale the first Intel chip with graphics, it’s also our first glimpse at a CPU using Intel’s new, smaller-process technology. Current Core i7 and Core i5 CPUs are based on the original 45nm Nehalem design that Intel introduced more than a year ago. Clarkdale uses a newer 32nm process that is part of the Westmere family. For the most part, Westmere is an evolutionary step forward and a simple die-shrink of Nehalem, but Intel did add some interesting performance enhancements.

Read on for details about what makes Clarkdale unique.

   

 Clarkdale Desktop Lineup

| Spec | Core i5-670 | Core i5-661* | Core i5-660 | Core i5-650 | Core i3-540 | Core i3-530 |
| --- | --- | --- | --- | --- | --- | --- |
| Base Clock | 3.46GHz | 3.33GHz | 3.33GHz | 3.20GHz | 3.06GHz | 2.93GHz |
| Turbo Clock | 3.73GHz | 3.60GHz | 3.60GHz | 3.46GHz | N/A | N/A |
| Cores / Threads | 2/4 | 2/4 | 2/4 | 2/4 | 2/4 | 2/4 |
| Cache | 4MB | 4MB | 4MB | 4MB | 4MB | 4MB |
| Socket | LGA1156 | LGA1156 | LGA1156 | LGA1156 | LGA1156 | LGA1156 |
| Memory Controller | Dual channel DDR3/1333 | Dual channel DDR3/1333 | Dual channel DDR3/1333 | Dual channel DDR3/1333 | Dual channel DDR3/1333 | Dual channel DDR3/1333 |
| TDP | 73 watts | 87 watts | 73 watts | 73 watts | 73 watts | 73 watts |
| Volume Price | $284 | $196 | $196 | $176 | $133 | $113 |

*Graphics core runs at 900MHz

Hey, You Got Graphics in My Processor 

Until now, PC graphics have either resided in the PCI-E slot or in the motherboard’s core-logic chipset. With Clarkdale, Intel moves the GPU core directly into the CPU socket. It does this by packaging a new 45nm GPU core alongside the 32nm compute core, connecting the two via a high-speed QPI. It’s a method reminiscent of the company’s first quad-core proc, the Core 2 Extreme QX6700. Back then, Intel took a shortcut to quad-core land by combining two 65nm dual-core Core 2 Duo dies to make a “quad-core.” While chip purists scoffed that the multichip package was an inelegant hack, and AMD fanboys called it cheating, the move gave Intel a year-and-a-half lead over AMD to store shelves. (Interestingly, a parallel scenario exists today: AMD is working on its own integration of GPU and CPU, dubbed Fusion. As before, AMD’s plan is far more ambitious and elegant in its integration of GPU and CPU functionality. That product won’t see the light of day until 2011. See more on AMD’s Fusion efforts below.)

Clarkdale’s setup puts most of the logic in the GPU, which has a built-in single x16 PCI-E 2.0 controller, as well as the memory controller for both the graphics and compute core. Why use a multichip package instead of building a 32nm chip with graphics in it? It’s likely a matter of cost, technology, and timing. This move, again, gets Intel a CPU with graphics capability more than a year before AMD will deliver its version.

Got Speed?

You probably only care about one thing: How fast is the GPU inside the chip? By rough estimates, it’s about 1.5 times faster than the graphics in a current Intel G45 chipset found in most laptops and mainstream motherboards. If that sounds great, remember that Intel’s integrated graphics history hasn’t been stellar. Put plainly, Intel’s integrated graphics have stunk up the joint for years and it’s probably an insult to graphics cards to actually call Intel’s integrated parts graphics accelerators. A 3-year-old $65 discrete graphics card with a hairball jammed in the fan is slightly faster than what you get from the G45 chipset. In fact, we’ve long blamed Intel’s subpar integrated graphics for helping to push mainstream gamers to console gaming.

Intel’s reasoning is that if people are buying systems with integrated graphics, they probably don’t care about graphics. Sadly, that’s probably true. Mainstream consumers browse the web, use Microsoft Works, and don’t play anything more graphically intensive than Yahoo Bingo before heading down to the social hall for a game of bridge with Madge, Maude, and Betty.

Intel bluntly says the graphics core in Clarkdale is definitely not meant for hardcore gamers. We wholeheartedly agree. We first tested the Clarkdale using 3DMark Vantage on default and after getting a score of 0, abandoned all hope of it being capable of serious gaming.

To see if it was even capable of playing more moderate games, we fired up Left 4 Dead 2 and found the frame rate almost playable at 800×600 with the graphics set on maximum ugly. Borderlands at 1280×1024 was also over the Clarkdale’s head, but almost playable at a mobile phone resolution of 800×600. We did actually see 60fps in Counter-Strike: Source at 1280×1024, with somewhat compromised graphics. Still, that’s better than nothing. As easy as it is to make fun of integrated graphics, it’s a moot point for someone who doesn’t play games.


Makes Good with Media

While Clarkdale may not shine in gaming, it certainly holds its own at media acceleration. Intel paid attention to the shortcomings of the Core 2’s G45 chipset. When released, the G45 accelerated Blu-ray content but it was short on features. Clarkdale adds a new sharpness filter, 24Hz refresh rates, HDMI 1.3a with Deep Color support, lossless Dolby TrueHD and DTS HD audio, and even dual HDMI output. With a 3.33GHz Core i5-661 Clarkdale, we were able to watch a Blu-ray disc with the processor running in its SpeedStep low-power mode.
 
By moving the GPU into the CPU, beefier centralized cooling can be used to keep both parts cool. (Unified cooling has even greater ramifications for notebooks with Arrandale.) The relocation of graphics and PCI-E in the core also reduces the core-logic chipset from a north- and south-bridge design to a single chip. This lets board vendors design more compact boards with greater capability than traditional chipset-based graphics. If you don’t care about games, Clarkdale could let you build an extremely small yet Blu-ray-capable HTPC that can outperform many budget quad-cores.

One thing home builders will need to remember, though: Clarkdale is a unique part that could make shopping for a motherboard confusing. Today, pretty much any motherboard with integrated graphics will work with any CPU that fits into the LGA775 or AM3 socket. With Clarkdale, not only will you have to pair it with an LGA1156 board, but you’ll also need to make sure it supports the graphics capabilities of the CPU. And if you decide to, say, replace the Clarkdale with a quad-core Core i7 in a year, you will also have to install a graphics card in the system because you’ve just yanked out the GPU with the old CPU. You should be able to run Clarkdale in motherboards that don’t have graphics ports on the back, but the GPU will be disabled.

Clarkdale as a CPU

But enough about the GPU. The real gem here is the 32nm Westmere core in Clarkdale. Why? It basically gives us a preview of the performance we’ll see next year when Intel releases quad-core and hexa-core CPUs based on the Westmere core. So far, we like what we see. For starters, the CPU offers six new instructions to accelerate Advanced Encryption Standard encryption and decryption. In addition to that, Westmere retains Intel’s auto-overclocking mode, now called Turbo Boost. Originally dubbed Turbo Mode when Core i7 was first released, the feature has been fine-tuned to its current Turbo Boost iteration (yes, we know, Knight Rider’s KITT had a Turbo Boost, too).

One thing we’re not too impressed by is the CPU’s memory performance. In Clarkdale, the dual-channel DDR3 memory controller resides in the GPU side, and it’s apparently not up to snuff compared with the memory controllers in 45nm Core i7 and Core i5 parts. With our Clarkdale sample CPU, memory bandwidth was about 25 to 30 percent lower than with a 45nm part. Latency was also significantly worse at 82ns in the 32nm Clarkdale versus 53ns in a 45nm Core i7-870. Why? We suspect it’s the result of having one memory controller manage both graphics and compute cores, but it’s not really clear to us yet.

Still, it’s easy to imagine what you can get if you put six of the Westmere cores along with a better triple-channel DDR3 memory controller and a high-thermal budget in the Core i9 next year.

Under the Clarkdale Heat Spreader

The transistor count and die size of a CPU have long been a fascination of chip addicts trying to glean insights about a processor’s capabilities. With Clarkdale, the simple question of how many transistors it packs gets quite complicated since there is now a GPU under the heat spreader, too. We dug into the spec sheets of the new processor and found out that the new 32nm core measures a very diminutive 81mm² but packs 383 million transistors. The 45nm GPU is 114mm² yet has just 177 million transistors in it. So the short answer is 560 million transistors.
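For anyone who wants to check the math, here is a minimal Python sketch that adds up the two dies and works out how densely each one is packed. The die areas and transistor counts are the figures quoted above; the dictionary layout and the `density` helper are our own illustrative choices, not anything from Intel’s spec sheets.

```python
# Die figures quoted in the article: area in mm^2, transistors in millions.
dies = {
    "32nm CPU core": {"area_mm2": 81, "transistors_m": 383},
    "45nm GPU": {"area_mm2": 114, "transistors_m": 177},
}


def density(die):
    """Return millions of transistors per square millimeter for one die."""
    return die["transistors_m"] / die["area_mm2"]


# Totals for everything under the heat spreader.
total_transistors_m = sum(d["transistors_m"] for d in dies.values())
total_area_mm2 = sum(d["area_mm2"] for d in dies.values())

for name, die in dies.items():
    print(f"{name}: {density(die):.2f}M transistors per mm^2")
print(f"Package total: {total_transistors_m}M transistors on {total_area_mm2}mm^2")
```

By this rough measure, the 32nm compute die packs roughly three times as many transistors into each square millimeter as the 45nm graphics die.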

 

AMD’s Take on the Hybrid Processor

AMD hasn’t been in the CPU hunt for several years, but that could change in 2011, when the company’s combination CPU and GPU is released. AMD calls its hybrid part an APU, or accelerated processing unit, and it looks to be a far more elegant approach than Intel’s method of jamming a graphics core and a compute core into the same CPU package. AMD’s Fusion platform is a true integration of the functions of a CPU and GPU. And we don’t mean simply because both are built on the same contiguous die. AMD’s vision is to closely enmesh the strengths of the GPU at running parallel code with the strengths of the CPU for general-purpose code. The first part is code-named Llano and will feature 1 billion transistors—roughly twice the number of transistors of Intel’s Clarkdale—and support for DirectCompute and OpenCL, which will let applications leverage the parallel portions of the APU for such tasks as encoding. Compared to Clarkdale’s GPU-in-a-CPU trick, Fusion appears to be far more forward-thinking. However, recall that AMD’s Phenom also seemed elegant and advanced when compared to Intel’s clunky Core 2 Quad. Though cruder, the Core 2 Quad was still faster, which is all that will matter in 2011 when Fusion hits the shelves.

 

Clarkdale in Action

After running the benchmarks, we declare it the fastest dual-core ever!

For our testing, we used an Intel DH55TC motherboard, 4GB of Corsair DDR3/1333, a Core i5-661, a Western Digital Raptor 150, and 64-bit Windows Vista Home Premium. Our benchmarks consisted of a standard suite of 3D rendering, encoding, photo editing, gaming, and memory-bandwidth and -latency benchmarks.

To be frank, we didn’t expect much from the Core i5-661. After all, a dual-core CPU in a quad-core world is asking for a beat-down, right? We’ve seen overclocked Core 2 Duos get spanked or barely break even with far lower-clocked quad-cores, so we didn’t think this was much of a match. Well, as Gomer says, “Surprise, surprise, surprise!”

The 3.33GHz Core i5-661 is actually faster than AMD’s budget quad-core, the $99 2.6GHz Athlon II X4 620, as well as its own sibling, the 2.33GHz Core 2 Quad Q8200. Against both chips, the Core i5-661 plowed ahead in the multithreaded tests thanks to its Hyper-Threading, and was significantly faster in gaming thanks to its Turbo Boost. So mark it an eight for dual-cores, dude.

The battle wasn’t so easy once we compared Clarkdale with CPUs in its own price range. At roughly $200, the Core i5-661’s real competition is against the $266 2.83GHz Core 2 Quad Q9550, the $200 2.66GHz Core i5-750, and the $200 3.2GHz Phenom II X4 965 Black Edition. Surprisingly, the 2.83GHz Core 2 Quad didn’t surpass Clarkdale in everything. While the Core 2 Quad was faster in multithreaded tasks, the higher clocks of the Core i5-661 give it the edge in gaming. So if you’re still not convinced that Core 2 is dead, this should give you another sign.

Against the Core i5-750 and the Phenom II X4 965 BE, the dual-core Core i5-661 is clearly outclassed. Since all three CPUs are $200, we had to wonder if Intel didn’t fire a blank when pricing the Core i5-661. Sure, it’s a good chip for folks looking to build an ultra-quiet and ultra-small home theater PC, but it simply can’t run with the other $200 chips. The chip should really be priced about $30 to $40 cheaper.

We were far more interested in seeing how the lower-clocked Core i5 and Turbo Boost–denied Core i3 Clarkdales would do, but that wasn’t possible. Our Intel board didn’t allow us to underclock our sample processor and none of our P55 boards had BIOSes that support the new chip yet.

The upshot is that Clarkdale is the fastest dual-core today—and competitive with quad-cores, too. That’s damned impressive, and a testament to the power of the new Westmere core. Based on this glimpse of next year’s six-core Core i9, we can tell that it’s going to be a monster.

The problem is the pricing on the Core i5 dual-cores. With more competitive quad-cores priced the same, the Core i5-661’s only advantage is in HTPC or small form-factor designs.

   

 
BENCHMARKS

| Benchmark | 3.33GHz Core i5-661 | 2.6GHz Athlon II X4 620 | 2.33GHz Core 2 Quad Q8200 | 3.2GHz Phenom II X4 965 BE | 2.66GHz Core i5-750 | 2.83GHz Core 2 Quad Q9550 |
| --- | --- | --- | --- | --- | --- | --- |
| Volume Pricing | $196 | $99 | $163 | $195 | $196 | $266 |
| Main Concept Reference 1.0 (sec) | 1,717 | 1,772 | 1,976 | 1,388 | 1,337 | 1,644 |
| Premiere Pro CS3 (sec) | 837 | 899 | 888 | 733 | 620 | 741 |
| Cinebench 10 64-bit | 10,812 | 9,941 | 10,184 | 14,083 | 14,442 | 12,280 |
| Handbrake iPod Classic (sec) | 1,569 | 1,559 | 1,681 | 1,220 | 1,198 | 1,366 |
| PCMark Vantage 64-bit Overall | 6,802 | 5,792 | 5,299 | 6,824 | 7,208 | 6,241 |
| POV Ray 3.7 b33 | 2,150 | 2,334 | 2,191 | 3,045 | 2,773 | 2,669 |
| Photoshop CS3 (sec) | 122 | 165 | 146 | 123 | 128 | 132 |
| ProShow Producer (sec) | 1,045 | 1,224 | 997 | 911 | 700 | 862 |
| Everest Ultimate MEM Copy (MB/s) | 9,244 | 10,028 | 7,397 | 10,246 | 14,684 | 7,455 |
| Everest Ultimate MEM Latency (ns) | 82.3 | 52.5 | 66.7 | 54.3 | 30.9 | 64 |
| Sisoft Sandra RAM Bandwidth (GB/s) | 12 | 12.3 | 7.2 | 12.7 | 16.8 | 7.2 |
| Fritz Chess Benchmark | 13.07 | 12.93 | 13.79 | 17.04 | 17.38 | 16.97 |
| 3DMark Vantage Overall | 14,848 | 13,727 | 14,260 | 14,544 | 14,947 | 14,681 |
| 3DMark Vantage CPU | 38,149 | 36,269 | 36,863 | 40,679 | 44,066 | 40,644 |
| Valve Particle Test (fps) | 107 | 71 | 81 | 95 | 124 | 99 |
| Valve Map Compilation (sec) | 152 | 157 | 163 | 125 | 121 | 129 |
| Crysis CPU 10×7 Low (fps) | 118 | 83.1 | 99.5 | 104 | 147 | 119 |
| Resident Evil 5 Fixed DX10 (fps) | 90.7 | 69.5 | 69.5 | 89.2 | 109.4 | 83.8 |
| Resident Evil 5 Variable DX10 (fps) | 142.9 | 113.7 | 112.2 | 140.2 | 160 | 133.9 |
| World in Conflict (fps) | 168 | 137 | 155 | 160 | 266 | 159 |
| WinRar 3.20 (sec) | 889 | 1,067 | 1,110 | 805 | 706 | 868 |

Best scores are bolded.
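To put the gap between the two $196 chips in perspective, the following Python sketch turns a handful of rows from the chart into percentages. The scores are copied from the benchmark table; the `speedup_pct` helper and the choice of rows are ours, and the sketch treats timed tests and latency as lower-is-better, which is our reading of the units.

```python
# Scores from the benchmark table: (Core i5-661, Core i5-750, higher_is_better).
# Timed tests (sec) and latency (ns) are treated as lower-is-better.
results = {
    "Cinebench 10 64-bit": (10812, 14442, True),
    "Premiere Pro CS3 (sec)": (837, 620, False),
    "Everest MEM Latency (ns)": (82.3, 30.9, False),
    "Sandra RAM Bandwidth (GB/s)": (12.0, 16.8, True),
    "Crysis CPU 10x7 Low (fps)": (118, 147, True),
}


def speedup_pct(dual, quad, higher_is_better):
    """Percent by which the quad-core result beats the dual-core result."""
    ratio = quad / dual if higher_is_better else dual / quad
    return (ratio - 1) * 100


for test, (dual, quad, higher_is_better) in results.items():
    pct = speedup_pct(dual, quad, higher_is_better)
    print(f"{test}: Core i5-750 ahead by {pct:.1f} percent")
```

The same helper works for any pair of columns in the chart, as long as each row is flagged correctly as higher-is-better or lower-is-better.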

White Paper: Building a Modern CPU


From concept to design to manufacturing and everything in between, the processor inside your rig was years in the making

Designing and manufacturing a modern CPU is a huge project. It requires both backward compatibility and an understanding of where PC workloads are going in the future—a delicate balancing act made more difficult by the huge engineering staffs and massive dollar outlays involved. Let’s take a look at the steps needed to build a Core i7 or AMD Phenom II processor.

Before the manufacturing plant starts churning out chips, there are a few critical preliminary steps. Prior to the first circuit being laid out or the first simulation run, the designers need to know exactly what it is they’re designing. This phase takes input from many sources. Marketing gets involved, with predictions of what users will need when the CPU actually ships, usually two to four years in the future. Engineering and performance teams feed in billions of traces of actual applications being run on current-gen CPUs, so the designers can see how existing CPUs perform under real-world conditions.

The Design Process

After the specification phase, the design phase begins in earnest. Design involves creating a design document, validating the design with simulations, and laying out the design.

The architecture team begins by defining how the CPU is supposed to work. How many registers will it have? What’s the power budget? How many cores? How much cache? These and thousands of smaller details are all ironed out in the design document, which becomes the bible from which the final product is created.

Once the design is in place, it needs to be tested. How do you test a CPU that doesn’t exist yet? You run simulations. There are specific programming languages that chip designers use to build simulations of a CPU. Actual code is compiled and run on the simulated CPU, albeit much more slowly than on the final product. Those applications-code traces collected during the specification process are re-run on the simulation to make sure everything works as expected.

In the layout phase, the real process of building the CPU begins. Engineers use special software to route circuits into patterns that can then be processed in the lithography step. With high-performance PC processors, some elements of the logic layout are hand-tuned, while other aspects, such as cache line layout, may be automated. Chip companies often have prebuilt blocks in libraries that can just be dropped into the overall CPU layout.

Today’s processors also utilize multiple layers of semiconductors. Each layer needs to be laid out so that it can be connected to the others. The primary goal of the layout step is to create circuit patterns that are efficient yet simple enough that they can be manufactured. The first draft of the design undergoes verification, which runs more virtual tests on the layout to make sure connections are correctly made and circuits completed. The final layout is known as tape out, where the layout is compiled into an industry standard format and sent to manufacturing.

Note that these design-phase steps aren’t linear. Simulations, for example, will be run constantly, up until the first working silicon returns from the fab. Design is an iterative process, continuing to the point when the first chips come off the assembly line.

The Manufacturing Process

Here’s where we get into the physical processes of building our CPU. First, ultra-pure wafers of silicon are coated with the conductive material that will make up the final circuitry. Then the chip is baked at temperatures above 200 degrees C to remove any water or volatile contaminants.

Building a chip is essentially a photographic process. Photoresist—material that is light sensitive—is applied uniformly to the wafer, usually by spraying it onto the wafer while it’s spinning at high speed. The layer must be thin and very uniform. Once applied, the chip is again baked to dry the photoresist and make it more uniform.



The lithography step marks the chip’s design on the wafer by exposing the photoresist to light of specific frequencies. These intense beams of light, which shine through masks, define the layout of the circuits on the chip. Note that these beams are very narrow, so either the beam scans across the wafer, or the wafer is moved slightly (stepped) under the light beam. Today’s modern process technologies often use a hybrid of the scanning and stepping techniques. Another bake cycle removes imperfections left over from the lithography process.

The develop step removes the exposed photoresist, leaving behind patterns of circuits. Now the wafer has a layer of material with narrow “channels” laid out in the pattern of the CPU circuitry. But these patterns are not yet circuits. Next, chemicals are applied to the wafer that permanently remove the now exposed conductive material, which was initially coated on the chip in the wafer prep phase. The photoresist still on the chip resists the etching process, so only the circuit patterns are implanted into the wafer substrate.

The final step in the actual chip making process is stripping the remaining photoresist from the wafer surface. What’s left are many dies on the wafer, cleaned and ready to be processed.

Final Steps

Next, the entire wafer is tested to ensure it meets quality standards. The dies are then cut and sent to the packaging line, where the different layers are assembled into the chip packages we’re all familiar with. During the packaging process, function and validation tests are performed, which allow the manufacturer to sort according to clock speed and functional bins. This is where a Core 2 Quad Q9650 may be differentiated from a lower-clocked Q9550, for example.
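The sorting step can be pictured with a short Python sketch. This is only an illustration under loud assumptions: real binning weighs voltage, power draw, and other factors that manufacturers don’t publish, and the `assign_bin` function and its inputs are hypothetical. The thresholds are simply the shipping clock speeds of the two parts named above.

```python
from typing import Optional

# Hypothetical bins, fastest first: (product name, minimum validated clock in GHz).
# The clocks are the shipping speeds of these parts; real bin criteria are more complex.
BINS = [("Core 2 Quad Q9650", 3.00), ("Core 2 Quad Q9550", 2.83)]


def assign_bin(passed_functional_test: bool, max_stable_ghz: float) -> Optional[str]:
    """Return the fastest bin a die qualifies for, or None if it fits no bin."""
    if not passed_functional_test:
        return None  # Dies that fail the function tests never reach a speed bin.
    for name, min_ghz in BINS:
        if max_stable_ghz >= min_ghz:
            return name
    return None  # Works, but too slow for any bin in this sketch.


# Made-up test results for four dies: (passed function test, max stable clock).
for die in [(True, 3.10), (True, 2.90), (True, 2.50), (False, 3.20)]:
    print(die, "->", assign_bin(*die))
```

Checking the bins from fastest to slowest means each die lands in the most valuable product it can reliably run as.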

Of course, this is a simplified overview of the process for building a modern CPU. You can find more details online, including Wikipedia entries for photolithography, photoresist, wafer creation, and more. One fairly technical but still understandable overview of the lithography process can be found at Lithoguru (www.lithoguru.com/scientist/lithobasics.html).

Freeware Files: Five Free Distributed Computing Projects for your Idle PC!

Distributed computing is one of the wonderful ways that you can use your PC to contribute to more thoughtful, worldly causes than keeping your room warm during a cloudy summer day. These projects, made up of members from all corners of the world (even Maximum PC’s own forums), make use of your computer during its idle periods. Whether they come as a screensaver that launches after a set period of time, or a background application that launches after a certain period of CPU inactivity, these free applications divvy up the tasks of a large, complicated project among a number of people at once.

Why should you care? Because distributed computing is a nice way to use a minimal amount of your system’s resources–resources that you wouldn’t be using anyway–to contribute to something greater than yourself. It’s entirely altruistic in its purpose. Very, very few distributed computing projects have some kind of monetary award attached to the work, and you’d have to score a major knock-out in your individual contribution to the project to see the result. That is, your computer would have to be the one that finds the next huge prime number, or major breakthrough in protein analysis, or something to that effect. If you’re in it for a reward, you might as well develop a program that estimates lottery odds.

You’ll find that entities like Maximum PC, amongst others, have teams of people contributing to these distributed computing projects. It’s a great way to make friends and fellow geeks–in fact, I’d probably be strung up by this site’s forum folk if I didn’t include a shout-out to their work on the Folding@Home project. +10 Light Side points for you.

Folding@home

What it is: Stanford University says it best. "Proteins are biology’s workhorses — its "nanomachines." Before proteins can carry out these important functions, they assemble themselves, or "fold." The process of protein folding, while critical and fundamental to virtually all of biology, in many ways remains a mystery.

Moreover, when proteins do not fold correctly (i.e. "misfold"), there can be serious consequences, including many well known diseases, such as Alzheimer’s, Mad Cow (BSE), CJD, ALS, Huntington’s, Parkinson’s disease, and many Cancers and cancer-related syndromes."

Your goal? Use your computer to fold proteins (as a part of Maximum PC’s team, if you so desire). You can set the program to use as much or as little of your CPU as you desire, and you can even download versions of Folding@home that make use of your GPU as well. Crazy, high-performance stuff–for a good cause, of course.


 

Climateprediction.net

 

What it is: Unlike chaos theory’s Butterfly Effect, popularized by the speculation that the beating of a butterfly’s wings could trigger a tornado in a distant location on the Earth, Climateprediction.net has nothing to do with trying to plot out storm predictions or anything super-fun like that. Instead, the program helps scientists gain a deeper understanding of the variables that affect future climate change. You’re helping them to run the subtle tweaks in their experiments on a grand scale, improving the ability of these complex projections to accurately reflect future possibilities.

Still, no tornados.


 

GIMPS

What it is: You’re too late to earn the $100,000 cash prize, but the Electronic Frontier Foundation still has other monetary rewards up for grabs. The catch? You have to be the person that helps discover prime numbers with exceedingly large numbers of digits in them. Give ‘er a shot as part of the GIMPS distributed computing network–many, many computers all contributing to the goal of finding increasingly larger prime numbers. How large? The $100,000 winner’s 3.0 GHz Intel Core 2 Duo-based PC took 29 days to run the calculations on the 12,837,064-digit prime number. That’s quite a hefty number.
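Curious what the number crunching looks like? The "M" in GIMPS stands for Mersenne, meaning primes of the form 2^p - 1, and the classic way to check one is the Lucas-Lehmer test. The Python sketch below is a toy version under that assumption: the function names are ours, and the real GIMPS client is heavily optimized software that works very differently under the hood.

```python
import math


def is_mersenne_prime(p: int) -> bool:
    """Lucas-Lehmer test: True if 2**p - 1 is prime. Assumes p itself is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs an odd p.
    m = (1 << p) - 1  # The Mersenne number 2**p - 1.
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # One big squaring per step.
    return s == 0


def mersenne_digits(p: int) -> int:
    """Number of decimal digits in 2**p - 1."""
    return math.floor(p * math.log10(2)) + 1


for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 31):
    verdict = "prime" if is_mersenne_prime(p) else "not prime"
    print(f"2^{p} - 1 has {mersenne_digits(p)} digits and is {verdict}")
```

The test needs p - 2 squarings of a number as long as the candidate itself, which helps explain why a multi-million-digit candidate can tie up a desktop PC for weeks.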


 

SETI@home

What it is: Insert your favorite science-fiction theme here. SETI@home is a distributed computing project that uses the computers of many to help scan the stars for signs of extraterrestrial life. It’s not your computer that’s doing the star-searching per se, though. Rather, you’re merely helping to analyze the data that’s already been collected by radio telescopes. Who knows–you could be the one to start a war with an intergalactic species!


 

Muon1

What it is: Ever feel like turning your PC into a particle accelerator? That’s one mighty overclock. Sadly, you won’t be crashing real atoms into each other as part of the Muon1 project. However, you will be helping to run simulations of the following scenario: "You are simulating the part of the process where the proton beam hits the target rod and causes pions to be emitted, which decay into muons. These would then proceed to a storage ring and decay into electrons and the neutrinos that are used for experiments."

But don’t think that you’re just doing this for the heck of it. The results of the distributed computing effort will affect the chances of funding for the project’s ultimate goal: firing particles through Earth’s interior, then measuring the changes to determine a neutrino’s mass.

Just try not to create any black holes, eh?


David Murphy (@ Acererak) is a technology journalist and former Maximum PC editor. He writes weekly columns about the wide world of open-source and roundups of awesome, freebie software. Shoot him a message via Twitter, especially if you have an awesome app or game you’re dying to recommend!