Saturday, June 30, 2007

Intel's low power plays

There is an interesting "Power Plays" discussion of Intel's focus on low power at Ars Technica.

There is a lot of work going on to optimize current enterprise server designs to use less power. This is good, but it's not the order-of-magnitude difference that a move to Millicomputing-based designs would provide.

The most interesting new technology described is an interconnect that uses very low power and which scales its clock rate and power consumption according to the bandwidth demand. This brings variable capacity to the network layer, and I'd love to see some very low power Intel CPUs with this technology integrated.

Thursday, June 28, 2007

The Flashiest Storage for the Millicluster



Per-module flash uses the tiny microSDHC format, which is about half an inch square (the picture shown is about three times actual size); see http://www.getflashmemory.info/category/microsdhc/. The older microSD format is limited to 2 GB (available one-off for less than $20 each), and microSDHC raises this limit to 32 GB using a FAT32-derived on-card filesystem. At present 4 GB cards are available and 8 GB cards have been announced. Streaming read and write performance for microSDHC is much higher than before, at about 20 MB/s. Writes are just as fast as reads, and the file-system automatically avoids wearing out any one location in the flash memory.

There is no seek time! Random access at thousands of IOPS is limited only by device driver efficiency, and will be benchmarked. Raw capacity is 112 x 4 GB = 448 GB/RU, or 18.8 TB/rack. Raw streaming bandwidth is 112 x 20 MB/s = 2240 MB/s/RU, or 94 GB/s/rack. The implications for storage performance in general are profound. The reason it is so fast is that the storage is solid state, in a single chip, and directly connected to the CPU chip. There is nothing getting in the way!
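As a rough sanity check, the per-RU and per-rack totals can be re-derived with a few lines of Python. This is only a sketch: the inputs are the figures quoted in this post plus the 42RU rack used for the system specifications, the names are illustrative, and decimal units (1 TB = 1000 GB) are assumed.

```python
# Paper-design figures from the post; names are illustrative only.
MODULES_PER_RU = 112   # OPiuM modules per rack unit
RU_PER_RACK = 42       # rack units in a full rack
CARD_GB = 4            # microSDHC capacity per module, in GB
CARD_MB_PER_S = 20     # streaming throughput per card, in MB/s


def storage_totals(modules_per_ru=MODULES_PER_RU, ru_per_rack=RU_PER_RACK,
                   card_gb=CARD_GB, card_mb_per_s=CARD_MB_PER_S):
    """Aggregate flash capacity and streaming bandwidth per RU and per rack."""
    gb_per_ru = modules_per_ru * card_gb
    mb_s_per_ru = modules_per_ru * card_mb_per_s
    return {
        "gb_per_ru": gb_per_ru,
        "tb_per_rack": gb_per_ru * ru_per_rack / 1000,      # decimal TB
        "mb_s_per_ru": mb_s_per_ru,
        "gb_s_per_rack": mb_s_per_ru * ru_per_rack / 1000,  # decimal GB/s
    }


totals = storage_totals()
for key, value in totals.items():
    print(f"{key}: {value}")
```

Swapping in 8 for `card_gb` shows what the announced 8 GB cards would do to the same rack.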

Enterprise Millicomputer Server Comparisons



To provide a competitive comparison, two high-end 1U enterprise servers were priced online at http://www.sun.com: one Opteron and one low-power Niagara SPARC. The Sun x4100 Opteron server uses about 400W. Its CPU performance is probably double that of an ARM at the same GHz, so 2.8 GHz x 4 cores x 2 = 22.4 GHz. Configured with the maximum of 32 GB RAM, it has a $13K list price. The Sun T1000 Niagara uses about 200W of power. Its 1.0 GHz 8-core CPU has 32 threads. Let's call this 32 GHz, which is quite optimistic. With a maximum of 16 GB RAM it has a $15K list price. The Enterprise Millicomputer with OPiuM i.MX31-based modules uses less than 160W, probably much less. 532 MHz x 112 is about 60 GHz, and 256 MB x 112 = 28 GB RAM. With modules and flash costing perhaps $130 each, lower-cost power supplies etc., a similar price point around $15K seems plausible.
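To put the three configurations side by side, here is a small Python sketch that turns the numbers above into "GHz per watt" and "GHz per $1000" ratios. The `Server` class and its method names are made up for illustration. The aggregate GHz score is the same crude paper metric used in this post, not a benchmark, and the Millicomputer's 160W and $15K are the post's upper bound and guess respectively, so its ratios are estimates.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """One row of the paper comparison (hypothetical helper type)."""
    name: str
    aggregate_ghz: float    # crude "GHz" score as estimated in the post
    watts: float            # approximate power draw
    ram_gb: float
    list_price_usd: float

    def ghz_per_watt(self) -> float:
        return self.aggregate_ghz / self.watts

    def ghz_per_kilodollar(self) -> float:
        return self.aggregate_ghz / (self.list_price_usd / 1000)


servers = [
    # 2.8 GHz x 4 cores, doubled for the assumed per-clock advantage over ARM
    Server("Sun x4100 (Opteron)", 2.8 * 4 * 2, 400, 32, 13_000),
    # 32 threads at 1.0 GHz, counted optimistically as 32 GHz
    Server("Sun T1000 (Niagara)", 1.0 * 32, 200, 16, 15_000),
    # 112 modules at 532 MHz and 256 MB each; 160 W is an upper bound
    Server("Enterprise Millicomputer", 0.532 * 112, 160, 0.25 * 112, 15_000),
]

for s in servers:
    print(f"{s.name:26s} {s.aggregate_ghz:6.1f} GHz  "
          f"{s.ghz_per_watt():.3f} GHz/W  "
          f"{s.ghz_per_kilodollar():.2f} GHz/$K")
```

Because the Millicomputer power figure is a ceiling, its GHz-per-watt ratio here is a floor; real measurements could move any of these numbers.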

The Millicomputer has higher network bandwidth, and there is a big additional saving as there is no need for an external load balancer appliance.

Millicomputer storage wins, no contest! Two 146 GB disks at about 240 IOPS vs. 448 GB of flash at ~500,000 IOPS.

This is all on paper and actual benchmarks are needed, but the point is that the raw performance looks interesting enough to make it worth running the benchmarks.

Wednesday, June 27, 2007

Enterprise Millicluster System Specifications



The specifications below are very tentative. I have tried to be conservative, but this is a paper design at this point, and a real design could end up significantly more or less dense in terms of compute and power usage per rack unit (RU).

The Millicluster board I described is half the width of a typical enterprise motherboard. It's only 0.4" thick, so we can stack four of them high (4 x 0.4 = 1.6") in the 1.75" height of a 1U package.

Hence a standard 1U enterprise server package contains eight Milliclusters. This gives a compute density of 112 OPiuM modules per RU, or 4,704 modules in a 42RU rack. Power consumption peaks under 160 Watts/RU and idles at less than 24 Watts/RU. Maximum rating would be about 6.7 kW/rack, which is quite reasonable. CPU performance totals 60 GHz/RU, or 2,520 GHz/rack. There is 28 GBytes/RU of RAM, or 1,176 GBytes/rack.
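The density figures follow from the board dimensions and per-module specs, and a short Python sketch makes the chain of multiplications explicit. The constant names are illustrative, the two-boards-side-by-side layout is inferred from each board being half a motherboard wide, and the 14 modules per Millicluster, 532 MHz and 256 MB figures come from the earlier posts. The unrounded clock total lands a little under the rounded 60 GHz/RU used above.

```python
import math

# Paper-design inputs; names are illustrative only.
RU_HEIGHT_IN = 1.75          # height of a 1U package, inches
BOARD_THICKNESS_IN = 0.4     # Millicluster board thickness, inches
BOARDS_SIDE_BY_SIDE = 2      # assumed: each board is half a motherboard wide
MODULES_PER_BOARD = 14       # i.MX31 modules per Millicluster
RU_PER_RACK = 42
CPU_MHZ = 532                # clock per module
RAM_MB = 256                 # RAM per module

boards_high = math.floor(RU_HEIGHT_IN / BOARD_THICKNESS_IN)
boards_per_ru = boards_high * BOARDS_SIDE_BY_SIDE
modules_per_ru = boards_per_ru * MODULES_PER_BOARD
modules_per_rack = modules_per_ru * RU_PER_RACK

ghz_per_ru = modules_per_ru * CPU_MHZ / 1000
ghz_per_rack = modules_per_rack * CPU_MHZ / 1000
ram_gb_per_ru = modules_per_ru * RAM_MB / 1024
ram_gb_per_rack = modules_per_rack * RAM_MB / 1024

print(f"boards per RU:    {boards_per_ru}")
print(f"modules per RU:   {modules_per_ru}")
print(f"modules per rack: {modules_per_rack}")
print(f"CPU: {ghz_per_ru:.1f} GHz/RU, {ghz_per_rack:.0f} GHz/rack")
print(f"RAM: {ram_gb_per_ru:.0f} GB/RU, {ram_gb_per_rack:.0f} GB/rack")
```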

The network has 8 load balancer/bridge-routers per RU, with 8 Gbits/RU module bandwidth on 16 redundant Gbit ports. An Ethernet switch could be added to the design to reduce the port count at a cost of a few watts and dollars. For storage, a microSDHC flash memory socket at each module would hold a 2 GB microSD card for very low cost, a 4 GB card for capacity, or an 8 GB card in 2008.

There are many optional interfaces that could be used for specific applications. All modules include an ATA disk controller if needed, so each Millicluster could have connectors to support hard disks and DVD-ROM players. For graphics the i.MX31 modules include an OpenGL based 3D graphics accelerator and an LCD display driver with touch screen input. There is a camera input and video compression engine, with stereo audio and video playback. Modules also include multiple USB and serial interfaces.

Saturday, June 23, 2007

Enterprise Millicluster Board Layout

I came up with the board layout shown to indicate how we might arrange the millicomputer modules, interface bridge, USB network and microSD card holders on a conveniently sized board. It's just under half the size of a typical enterprise server motherboard. The total size is about 5.5” Wide x 12” Deep x 0.4” High. The Ethernet network bridge increases the idle power consumption to no more than 3 Watts, and fully active power consumption is unlikely to exceed 20 Watts, so I don't think heat sinks will be needed.
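These per-board power ceilings are what the per-RU and per-rack power figures in the system specifications are built from. A minimal Python sketch, assuming the eight boards per 1U package and 42RU rack from that post (the function name is illustrative, and the results are upper bounds, not measurements):

```python
# Per-board power ceilings from the post; packaging figures from the
# system specifications post. Names are illustrative only.
BOARDS_PER_RU = 8
RU_PER_RACK = 42
BOARD_IDLE_W = 3     # idle: "no more than 3 Watts"
BOARD_PEAK_W = 20    # active: "unlikely to exceed 20 Watts"


def scale_power(board_watts, boards_per_ru=BOARDS_PER_RU,
                ru_per_rack=RU_PER_RACK):
    """Scale one board's power to watts per RU and kilowatts per rack."""
    watts_per_ru = board_watts * boards_per_ru
    kw_per_rack = watts_per_ru * ru_per_rack / 1000
    return watts_per_ru, kw_per_rack


idle_ru_w, idle_rack_kw = scale_power(BOARD_IDLE_W)
peak_ru_w, peak_rack_kw = scale_power(BOARD_PEAK_W)

print(f"idle: {idle_ru_w} W/RU, {idle_rack_kw:.2f} kW/rack")
print(f"peak: {peak_ru_w} W/RU, {peak_rack_kw:.2f} kW/rack")
```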

Enterprise Millicluster


Taking the components I have already mentioned in previous posts, we can assemble them into a small cluster that seems to be a useful size and specification for an Enterprise Server building block. It's a cluster of millicomputers, so we may as well coin the name Millicluster as we go along (and register the millicluster.com etc. domains to point here :-)

Using 8-port USB switches, we could lay out 14 i.MX31-based Millicomputer modules behind a PPC440EPx-based Ethernet bridge. The bridge runs Linux, so it is general purpose, but it will be pre-configured as a load balancer. This gives us a 1 Gbit/sec redundant network (it has two 1 Gbit links in, but only two 480 Mbit links to the millicomputer modules). There is a total of 7.5 GHz of CPU, 3.5 GBytes of RAM, and 56 GBytes of storage using 4 GByte microSDHC flash memory cards on each Millicomputer.

This depends upon having a high speed 8-port switch, and so far I have found some products from Belkin and D-link that have one upstream port and seven downstream. I'm not sure what chipset they use, but they are inexpensive and have been available for a few years, so this seems reasonable.
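To make the Millicluster totals and the USB bottleneck concrete, here is a rough Python sketch. It assumes each of the two 480 Mbit links feeds one hub with seven downstream ports (which is how 14 modules fit), and that bandwidth is shared evenly when every module is busy. It uses the raw USB signaling rate and ignores protocol overhead, so real per-module throughput would be lower. All names are illustrative.

```python
# Paper-design figures for one Millicluster; names are illustrative only.
USB_LINKS = 2               # 480 Mbit links from the bridge to the modules
USB_LINK_MBIT = 480         # raw USB 2.0 high-speed signaling rate
HUB_DOWNSTREAM_PORTS = 7    # assumed: one hub per link, seven modules each
CPU_MHZ = 532
RAM_MB = 256
FLASH_GB = 4

modules = USB_LINKS * HUB_DOWNSTREAM_PORTS

cluster = {
    "modules": modules,
    "cpu_ghz": modules * CPU_MHZ / 1000,
    "ram_gb": modules * RAM_MB / 1024,
    "flash_gb": modules * FLASH_GB,
    "usb_total_mbit": USB_LINKS * USB_LINK_MBIT,
    # Even share of one link when all seven modules on a hub are active
    "usb_per_module_mbit": USB_LINK_MBIT / HUB_DOWNSTREAM_PORTS,
}

for key, value in cluster.items():
    print(f"{key}: {value:.1f}" if isinstance(value, float) else f"{key}: {value}")
```

The aggregate USB rate sits just under the 1 Gbit Ethernet side, which is why the design is described as a 1 Gbit/sec network rather than a 2 Gbit/sec one.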

Friday, June 8, 2007

x86 Millicomputers are on the way

There is news about the x86 architecture working its way down into the millicomputer space via the Pico ITX. This board is 1.8 x 3" with a 1 GHz CPU and 256 or 512 MB RAM, but no power consumption figure is given. However, they are positioning it for use in mobile devices, so it's likely to end up under a watt once they get the chip count down a bit further.