Millicomputing - Open Hardware by the milliWatt

Wednesday, September 5, 2007

Qualcomm Scorpion - ARM Cortex based CPU

Qualcomm has announced the first devices based on the new ARM Cortex generation. Their Scorpion design runs at 1GHz and consumes 250-500mW, and its Cortex-based pipeline is dual issue, so its raw instruction issue rate per MHz is twice that of older ARM designs such as the PXA320.
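The arithmetic behind the "twice per MHz" claim, plus a peak instructions-per-milliwatt figure, can be sketched in Python using only the numbers quoted above. These are theoretical peaks. Real code rarely issues two instructions every cycle, so treat the output as an upper bound rather than a benchmark result.

```python
# Back-of-envelope peak issue rate for a dual-issue core, using the
# figures quoted above: 1GHz clock, 2 instructions per cycle, 250-500mW.
CLOCK_HZ = 1_000_000_000
ISSUE_WIDTH = 2  # dual issue: at most two instructions per clock cycle


def peak_issue_rate(clock_hz: int, issue_width: int) -> int:
    """Peak instructions per second, ignoring stalls and cache misses."""
    return clock_hz * issue_width


def peak_per_milliwatt(rate: float, milliwatts: float) -> float:
    """Peak instructions per second for each milliwatt consumed."""
    return rate / milliwatts


rate = peak_issue_rate(CLOCK_HZ, ISSUE_WIDTH)
for mw in (250, 500):
    per_mw = peak_per_milliwatt(rate, mw) / 1e6
    print(f"{mw} mW: {per_mw:.0f} million instructions/s per mW")

# Per-MHz comparison with an older single-issue design.
older = peak_issue_rate(1_000_000, 1)  # single issue, 1 MHz
newer = peak_issue_rate(1_000_000, 2)  # dual issue, 1 MHz
print(newer / older)  # 2.0
```

At the low end of the quoted power range this works out to 8 million peak instructions per second per milliwatt, halving to 4 million at 500mW.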

Tuesday, August 28, 2007

Millicomputing Applications - ETL

Millicomputers have a very different balance of compute/memory/network/io resources compared to more conventional architectures. They are lower in absolute terms for compute/memory/network, but much higher for random access io.

Performance per watt and price/performance are very competitive for compute/memory/network, as long as applications can run at a smaller "grain size". For io, however, the aggregate random access performance of a large number of directly attached flash devices is amazing.

One possible application comes from the data warehousing space. Known as ETL, this is the Extract, Transform and Load step that pulls data from online transaction processing systems (such as the collection of database back-ends for a web site) and puts it into a form that can be queried to answer questions about the business. A lot of work has gone into turning ETL processes into decomposable parallel applications, and there is an open source ETL implementation in Java called KETL. KETL was originally written several years ago, when typical systems were similar in capacity to today's millicomputers, so I'm hopeful that its grain size will fit.
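The idea of a decomposable ETL job can be illustrated with a minimal Python sketch, assuming a hash-partitioned source table: each partition is extracted and transformed independently (the part that could be spread across many small nodes), and only the final load step merges the partial results. This is not KETL's API or design. The row layout, customer names and partition count are hypothetical, and a thread pool stands in for a cluster of networked millicomputers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical OLTP rows: (order_id, customer, amount_in_cents)
SOURCE_ROWS = [(i, f"cust{i % 3}", i * 100) for i in range(1, 10)]


def extract(partition: int, num_partitions: int) -> list:
    """Extract only the rows belonging to one partition (hash on order_id)."""
    return [row for row in SOURCE_ROWS if row[0] % num_partitions == partition]


def transform(rows: list) -> dict:
    """Aggregate revenue per customer within a single partition."""
    totals = {}
    for _, customer, cents in rows:
        totals[customer] = totals.get(customer, 0) + cents
    return totals


def load(warehouse: dict, partial: dict) -> None:
    """Merge one partition's partial aggregates into the warehouse table."""
    for customer, cents in partial.items():
        warehouse[customer] = warehouse.get(customer, 0) + cents


def run_etl(num_partitions: int = 4) -> dict:
    warehouse = {}
    # Each partition's extract+transform is independent, so it could run on
    # a separate small node; threads stand in for those nodes here.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = pool.map(
            lambda p: transform(extract(p, num_partitions)),
            range(num_partitions),
        )
        for partial in partials:
            load(warehouse, partial)  # the load step is serialized
    return warehouse


print(run_etl())  # {'cust1': 1200, 'cust2': 1500, 'cust0': 1800} (key order may vary)
```

The grain size here is one partition. Raising `num_partitions` makes each unit of work smaller, which is the knob that would need to match a millicomputer's memory and CPU, while the result stays the same.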

KETL is io intensive and it also supports running on a cluster of networked computers, so overall it looks like a plausible fit for an enterprise millicomputer application.

Monday, August 27, 2007

The Future of Millicomputing

There is a gap between the performance and memory capacity of Millicomputers and mainstream CPUs. That gap is shrinking, but how fast, and what are the next steps?

The base technology from ARM can be seen in its Cortex designs. These were disclosed in late 2005 but have yet to appear in actual products. Their overall performance is around 3-4 times that of the current generation of ARM based devices.

Intel's sale of its ARM based XScale CPU business to Marvell leaves it clear to move its core 32-bit x86 platform architecture down into the millicomputing space.

So in the next few years, I expect to see x86 and ARM based system on a chip architectures with overlapping performance and power consumption characteristics in the millicomputing space.

Friday, August 24, 2007

Lower Power x86 Systems

There is quite a lot of activity in the low power x86 compatible space. The latest CPU from VIA is touted as a 1W CPU with 0.1W standby power, but once the chipset and RAM are added the total is substantially higher, more like 10W. This article on LinuxDevices.com surveys the whole space very nicely.

The power trend is downwards. These chipsets are aimed at consumer devices, but not at the battery powered space. Commodity devices can be divided by order of magnitude of power consumption and by environmental conditions:

100-1000W Datacenter servers (air conditioned room)

10-100W Home PC/laptop space (fan cooled, on when in use, ambient room temperature)

1-10W Home consumer devices (fanless, always on, ambient room temperature)

100-1000mW Battery powered millicomputers (always on, cool enough for your pocket)
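Because each tier spans one decade of power, classifying a device is a simple threshold check. The Python sketch below encodes the tiers listed above; treating a value exactly on a boundary (for example 10W) as belonging to the higher tier is an assumption, since the ranges as written overlap at their endpoints.

```python
# Power tiers from the list above, checked from the highest lower bound down.
# Each tier spans one order of magnitude; a value exactly on a boundary
# (e.g. 10W) is assigned to the higher tier, which is an assumption.
TIERS = [
    (100.0, "Datacenter server (air conditioned room)"),
    (10.0, "Home PC/laptop (fan cooled, on when in use)"),
    (1.0, "Home consumer device (fanless, always on)"),
    (0.1, "Battery powered millicomputer (always on, pocketable)"),
]
MAX_WATTS = 1000.0


def power_tier(watts: float) -> str:
    """Return the tier whose decade [lower, 10 * lower) contains watts."""
    if watts >= MAX_WATTS:
        raise ValueError(f"{watts} W is above the 1000 W datacenter ceiling")
    for lower_bound, name in TIERS:
        if watts >= lower_bound:
            return name
    raise ValueError(f"{watts} W is below the 100 mW millicomputer floor")


for watts in (0.25, 5, 50, 500):
    print(f"{watts} W -> {power_tier(watts)}")
```

For example, a 0.25W device (the low end of the Scorpion's quoted range) lands in the millicomputer tier, while a complete VIA-based system at around 10W sits right on the consumer/PC boundary.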

Interesting technology and product disruptions occur when we mix these spaces. In some ways, the original compute farms that Google built brought the low-end home PC power/price/performance point into the datacenter. There are additional opportunities to bring home consumer devices and millicomputers into the enterprise space.

Sunday, July 22, 2007

Millicomputer Performance Benchmarks

Several Homebrew Mobile Club members are measuring performance benchmarks for a range of Millicomputer CPUs and posting the results on our wiki.

The benchmark used is nbench, an old, simple CPU benchmark that is basically the original Byte magazine benchmark collection from about ten years ago. It's very easy to get running. I looked at the industry standard SPEC benchmarks, but they are more vendor oriented, are not freely available, and have no useful results posted for ARM architecture systems.

At this point the results are hard to compare, since the compiler options in use vary from run to run and no results have yet been posted for recent enterprise server CPUs.
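One way to make comparisons like these fairer is to express each CPU's results as ratios against a common baseline run, summarize them with a geometric mean, and refuse to compare runs built with different compiler options. The Python sketch below illustrates that approach. The `Run` record, test names and scores are hypothetical (the numbers are synthetic, not measurements), and this is not nbench's own index calculation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Run:
    """One benchmark run (hypothetical record format)."""
    cpu: str
    compiler_flags: str  # results are only comparable if these match
    scores: dict = field(default_factory=dict)  # test name -> score, higher is better


def relative_index(run: Run, baseline: Run) -> float:
    """Geometric mean of per-test ratios of run to baseline."""
    if run.compiler_flags != baseline.compiler_flags:
        raise ValueError("runs built with different compiler flags are not comparable")
    ratios = [run.scores[test] / baseline.scores[test] for test in baseline.scores]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Synthetic numbers chosen so the answer is easy to check by hand.
baseline = Run("cpu_a", "-O2", {"test_1": 10.0, "test_2": 10.0})
candidate = Run("cpu_b", "-O2", {"test_1": 20.0, "test_2": 80.0})
print(round(relative_index(candidate, baseline), 6))  # ratios 2 and 8 -> 4.0
```

A geometric mean is used because it keeps any single test with a very large ratio from dominating the summary, and the ranking it produces does not depend on which machine is picked as the baseline.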

Saturday, June 30, 2007

Intel's low power plays

There is an interesting "Power Plays" discussion of Intel's focus on low power at Ars Technica.

A lot of work is going on to optimize current enterprise server designs to use less power. This is good, but it's not the order-of-magnitude difference that a move to Millicomputing based designs would provide.

The most interesting new technology described is an interconnect that uses very low power and which scales its clock rate and power consumption according to the bandwidth demand. This brings variable capacity to the network layer, and I'd love to see some very low power Intel CPUs with this technology integrated.
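The bandwidth-proportional behavior described there can be sketched as a simple rate governor: pick the lowest link rate that still carries the offered load, so power falls when traffic does. This is a hypothetical illustration, not Intel's design. The available link rates and the linear watts-per-Gbit/s figure are made-up assumptions chosen only to make the idea concrete.

```python
# Hypothetical link rates (Gbit/s) the interconnect can switch between.
LINK_RATES_GBPS = [1.0, 2.5, 5.0, 10.0]
# Assumed linear power model; illustrative only, not a measured figure.
WATTS_PER_GBPS = 0.05


def select_link_rate(demand_gbps: float) -> float:
    """Pick the lowest link rate that still carries the offered load."""
    for rate in LINK_RATES_GBPS:
        if rate >= demand_gbps:
            return rate
    return LINK_RATES_GBPS[-1]  # demand exceeds capacity: run flat out


def link_power_watts(demand_gbps: float) -> float:
    """Power drawn at the rate chosen for this demand (assumed model)."""
    return select_link_rate(demand_gbps) * WATTS_PER_GBPS


for demand in (0.2, 3.0, 12.0):
    rate = select_link_rate(demand)
    print(f"demand {demand} Gbit/s -> link at {rate} Gbit/s, "
          f"{link_power_watts(demand):.3f} W")
```

A real link would also need some hysteresis so it does not flap between rates when demand hovers near a threshold; that is left out here to keep the sketch short.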

Thursday, June 28, 2007

The Flashiest Storage for the Millicluster

Per-module flash uses the tiny microSDHC format, which is about half an inch square (the picture shown is about three times actual size); see http://www.getflashmemory.info/category/microsdhc/. The older microSD format is limited to 2GB (available in single quantities for less than $20 each), and microSDHC raises this limit to 32GB using a FAT32 filesystem on the card. At present 4GB cards are available and 8GB cards have been announced. Streaming read and write performance for microSDHC is much higher than before, at about 20MB/s. Writes are just as fast as reads, and the card's controller automatically spreads writes so that no single location in the flash memory wears out.

There is no seek time! Random access at thousands of IOPS is limited only by device driver efficiency, which will be benchmarked. With 112 modules per rack unit (RU), raw capacity is 112 x 4GB = 448GB/RU, or 18.8TB/rack, and raw streaming bandwidth is 112 x 20MB/s = 2240MB/s/RU, or 94GB/s/rack. The implications for storage performance in general are profound. The reason it is so fast is that the storage is solid state, sits in a single chip, and is directly connected to the CPU chip. There is nothing getting in the way!
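The capacity and bandwidth figures above can be reproduced with a few lines of Python. The 42 RU rack height is not stated explicitly but is implied by the numbers (448GB/RU x 42 = 18.8TB), and decimal units (1TB = 1000GB, 1GB/s = 1000MB/s) are assumed.

```python
MODULES_PER_RU = 112
GB_PER_MODULE = 4      # microSDHC card size available today
MB_S_PER_MODULE = 20   # streaming read/write rate per card
RU_PER_RACK = 42       # implied by 448 GB/RU -> 18.8 TB/rack

capacity_gb_per_ru = MODULES_PER_RU * GB_PER_MODULE          # 448
bandwidth_mb_s_per_ru = MODULES_PER_RU * MB_S_PER_MODULE     # 2240

# Decimal units: 1 TB = 1000 GB and 1 GB/s = 1000 MB/s.
capacity_tb_per_rack = capacity_gb_per_ru * RU_PER_RACK / 1000        # 18.816
bandwidth_gb_s_per_rack = bandwidth_mb_s_per_ru * RU_PER_RACK / 1000  # 94.08

print(f"{capacity_gb_per_ru} GB/RU, {capacity_tb_per_rack:.1f} TB/rack")
print(f"{bandwidth_mb_s_per_ru} MB/s/RU, {bandwidth_gb_s_per_rack:.0f} GB/s/rack")
```

Changing `GB_PER_MODULE` to 8 shows the effect of the announced 8GB cards: capacity doubles while streaming bandwidth, which depends only on the number of cards, stays the same.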