Menlow looks as if it's heading down into millicomputer territory (i.e. the entire CPU and memory using less than 1000mW), and Moorestown is another 10x reduction in power consumption.
Monday, December 17, 2007
Intel's Low Power Roadmap from CES - Menlow, Moorestown, PATA Z-P140
Some new information released by Intel emphasizes low power devices, with their Menlow architecture for 2008 and Moorestown for 2009/10, and states that Solid State Disks based on Flash are the future. The Z-P140 has a high speed parallel ATA (PATA) interface in a module that is 12x18mm, i.e. around the size of the miniSD format and bigger than microSD. However, it runs at 40MByte/s read and 30MByte/s write speed with an ATA command set, compared with microSDHC C4 at 13MByte/s or C6 at 20MByte/s. The Z-P140 uses 1.1mW idle and 300mW operating, and has a 2.5 million hour MTBF.
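To put those rates side by side, here is a rough sketch of how long a sequential transfer takes at each quoted speed, plus the energy the Z-P140 would use for the read at its 300mW operating power. The 1 GByte transfer size is an assumption for illustration, and command and filesystem overhead are ignored.

```python
# Sequential transfer times at the rates quoted above (MByte/s).
RATES_MB_PER_S = {
    "Z-P140 read": 40.0,
    "Z-P140 write": 30.0,
    "microSDHC C4": 13.0,
    "microSDHC C6": 20.0,
}


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to stream size_mb at a sustained rate, ignoring command overhead."""
    return size_mb / rate_mb_per_s


def energy_joules(seconds: float, milliwatts: float) -> float:
    """Energy used by a device drawing a constant power for the given time."""
    return seconds * milliwatts / 1000


size_mb = 1000.0  # assumed 1 GByte transfer
for name, rate in RATES_MB_PER_S.items():
    print(f"{name:>13}: {transfer_seconds(size_mb, rate):5.1f} s")

read_s = transfer_seconds(size_mb, RATES_MB_PER_S["Z-P140 read"])
print(f"Z-P140 energy for the read at 300mW: {energy_joules(read_s, 300):.1f} J")
```

The 1GB read takes 25 seconds on the Z-P140 versus about 77 seconds for a C4 card, so the faster interface also finishes sooner and can drop back to its 1.1mW idle state earlier.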
Sunday, November 11, 2007
Flash based SSD from Samsung
A nice review of a Flash based SSD in Engadget. This 64GB drive from Samsung is a drop-in replacement for a 2.5" hard drive. It's fast, but still too expensive for common use.
Monday, November 5, 2007
The Tera-Millicomputer from Sicortex
If you take a 1GHz, 1 GFLOP MIPS based core that uses 600mW of power, that fits the definition of a millicomputer (depending upon how much RAM you also add). Now if you put 6 of these cores on a single chip, along with DRAM and PCI-Express controllers and a high speed fabric interconnect, for 10W, it's quite interesting. Packaging almost 6000 cores on an interconnect fabric in a single rack takes it to the logical conclusion. That's what Sicortex have done.
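A back-of-envelope sketch shows where the "Tera" comes from. It assumes exactly 6000 cores (the text says "almost 6000"), 6 cores and 10W per chip, and 1 GFLOP peak per core; power for anything outside the chips (fans, power supply losses, external memory) is not counted.

```python
CORES_PER_CHIP = 6
WATTS_PER_CHIP = 10.0     # chip power quoted above
GFLOPS_PER_CORE = 1.0     # peak, per core


def rack_estimate(total_cores: int) -> dict:
    """Rough rack totals from per-chip figures; ignores off-chip power."""
    chips = total_cores // CORES_PER_CHIP
    return {
        "chips": chips,
        "watts": chips * WATTS_PER_CHIP,
        "peak_gflops": total_cores * GFLOPS_PER_CORE,
        "watts_per_core": WATTS_PER_CHIP / CORES_PER_CHIP,
    }


est = rack_estimate(6000)
print(est)
```

That is about 1000 chips drawing roughly 10kW for around 6 TFLOPS of peak floating point, or about 1.7W per core once the memory controllers and fabric on each chip are shared out.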
Even better, they have the *best* web site architecture tour I have ever seen. It's worth browsing just for fun. Visit the page here, and click on the buttons to animate the architecture. Don't miss the "Linux" button, and step through the Hardware Environment starting at the "Node Level" button.
Wednesday, October 31, 2007
New advances in non-volatile memory - PMC, PRAM and NRAM
The end is in sight for spinning rust....
While flash based memory is nibbling at the edges of the disk industry, some new techniques are opening up prospects of even greater capacity and speeds in the next few years.
PRAM stands for Phase-change Random Access Memory. Some recent news seems to indicate that good progress is being made on PRAM, as well as on larger and faster Flash memory.
Update:
This SC07 article from The Register states that NRAM stands for Nanotube RAM. The makers claim that it will beat Flash on every metric in a few years' time, and several of the big semiconductor companies are looking into the technology.
Wednesday, October 3, 2007
More on the Cortex A9
Another update from Ashlee Vance at The Register gives more details on the Cortex A9. They say it's a four issue superscalar design (the Cortex A8 is dual issue) with 8 times the performance of the iPhone CPU, runs at 250mW, and should be in devices around 2010. The A8 was announced in 2005, and three years is a typical lead time for CPU architectures to get into production, so I expect A8 based devices sometime next year, perhaps even a faster iPhone...
There is also some discussion of Intel moving down into this space over the next few years. Excellent! Competition will drive the market to develop faster and I don't personally care what the instruction set is any more (I used to...)
Multicore ARM Chips - Cortex A9
Performance is cranking up, now we have four core ARM chips on the horizon...
Infoworld article on the announcement.
Each of these cores seems to be based on the dual issue 1GHz Cortex A8 design that was announced a few years ago, and which isn't quite shipping yet in products.
So to put this in perspective, the CPU in the Gumstix Verdex and the iPhone is around 600MHz single issue, the Cortex A8 is around three times the raw performance and the new announcement is about twelve times the raw performance. These all seem to be around the same levels of power consumption, in the few hundred milliwatt range.
I would not expect to have the multicore ARM Cortex A9 in actual products in my pocket for a few years, but it's good to have a roadmap into the future of millicomputing.
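The "three times" and "twelve times" figures above are raw instruction issue rates, i.e. clock rate times issue width times core count. A minimal sketch of that arithmetic, assuming (as the post does) that each A9 core is roughly an A8-class 1GHz dual issue core; it says nothing about caches, memory bandwidth or achieved instructions per clock.

```python
def raw_issue_rate(mhz: float, issue_width: int, cores: int = 1) -> float:
    """Peak issue rate in millions of instructions/s: clock x issue width x cores."""
    return mhz * issue_width * cores


baseline = raw_issue_rate(600, 1)            # ~600 MHz single issue (Verdex, iPhone)
cortex_a8 = raw_issue_rate(1000, 2)          # 1 GHz dual issue
cortex_a9_quad = raw_issue_rate(1000, 2, 4)  # assumed: four A8-class cores

print(f"Cortex A8: {cortex_a8 / baseline:.1f}x")            # 3.3x
print(f"Quad Cortex A9: {cortex_a9_quad / baseline:.1f}x")  # 13.3x
```

The quad core result of 13.3x is in line with the "about twelve times" estimate.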
Wednesday, September 5, 2007
Qualcomm Scorpion - ARM Cortex based CPU
The first devices based on the new ARM Cortex generation have been announced by Qualcomm. Their Scorpion design runs at 1GHz and 250-500mW, and the Cortex based pipeline is dual issue, so the raw instruction issue rate per MHz is twice that of older ARM designs such as the PXA320.
Tuesday, August 28, 2007
Millicomputing Applications - ETL
Millicomputers have a very different balance of compute/memory/network/io resources compared to more conventional architectures. They are lower in absolute terms for compute/memory/network, but much higher for random access io.
The performance per watt and the price/performance are very competitive for compute/memory/network, as long as applications can be run at a smaller "grain size". However for io, the aggregate performance of a large number of direct attached flash devices is amazing.
One possible application is from the data warehousing space. Known as ETL, this is the Extract, Transform and Load step that pulls data from online transaction processing systems, such as the collection of database back-ends for a web site, and puts it into a form that can be queried to answer questions about the business. There has been a lot of work put into making the ETL processes into decomposable parallel applications, and there is an open source ETL implementation in Java called KETL. KETL was originally written several years ago, when the typical systems of the day were similar in capacity to the millicomputers we have today, so I'm hopeful that the grain size will fit.
KETL is io intensive and it also supports running on a cluster of networked computers, so overall it looks like a plausible fit for an enterprise millicomputer application.
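To make "decomposable" and "grain size" concrete, here is a minimal sketch of a partitioned ETL step: extract rows, split them into independent chunks, transform each chunk on its own worker, then merge the partial results in the load step. This is not KETL or its API; the `Order` record, the per-customer totals and all function names are hypothetical, and threads stand in for separate millicomputer modules.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Order:
    # Hypothetical OLTP record, not a KETL type.
    order_id: int
    customer: str
    cents: int


def extract() -> List[Order]:
    # Stand-in for pulling rows from the web site's database back-ends.
    return [Order(i, f"c{i % 3}", 100 * i) for i in range(1, 10)]


def partition(rows: List[Order], grain: int) -> List[List[Order]]:
    # Split into independent chunks of at most `grain` rows (the grain size).
    return [rows[i:i + grain] for i in range(0, len(rows), grain)]


def transform(chunk: List[Order]) -> Dict[str, int]:
    # Each chunk is summarized on its own: total spend per customer.
    totals: Dict[str, int] = {}
    for order in chunk:
        totals[order.customer] = totals.get(order.customer, 0) + order.cents
    return totals


def load(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    # Merge the partial summaries into one warehouse table.
    warehouse: Dict[str, int] = {}
    for part in partials:
        for customer, cents in part.items():
            warehouse[customer] = warehouse.get(customer, 0) + cents
    return warehouse


def run_etl(grain: int, workers: int = 4) -> Dict[str, int]:
    chunks = partition(extract(), grain)
    # Threads stand in for separate millicomputer modules here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return load(pool.map(transform, chunks))


print(run_etl(grain=3))
```

Because the transform only ever sees its own chunk and the merge is a simple sum, the answer is the same for any grain size; the grain size just decides how much memory and CPU each worker needs, which is the property that matters for small nodes.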
Monday, August 27, 2007
The Future of Millicomputing
There is a gap between the performance and memory capacity of Millicomputers and mainstream CPUs. That gap is shrinking, but how fast, and what are the next steps?
The base technology from ARM can be seen in their Cortex designs. These were disclosed in late 2005, but have yet to appear in actual products. The overall performance is around 3-4 times the performance of the current generation of ARM based devices.
Since Intel sold off their ARM based CPU business to Marvell, it leaves them clear to move their core 32bit x86 platform architecture down into the millicomputing space.
So in the next few years, I expect to see x86 and ARM based system on a chip architectures with overlapping performance and power consumption characteristics in the millicomputing space.
Friday, August 24, 2007
Lower Power x86 Systems
There is quite a lot of activity in the low power x86 compatible space. The latest CPU from VIA is touted as a 1W CPU, with 0.1W standby power, but when the complete chipset and RAM are added it's substantially higher, more like 10W. This article on LinuxDevices.com surveys the whole space very nicely.
The power trend is downwards. These chip sets are aimed at consumer devices, but not in the battery powered space. For commodity devices we can divide the market into bands by order of magnitude of power consumption, each with its own environmental conditions:
100-1000W Datacenter Server (air conditioned room)
10-100W Home PC/Laptop Space (fan cooled, on when in use, ambient room temp)
1-10W Home consumer devices (fanless, always on, ambient room temp)
100-1000 mW Battery powered millicomputers (always on, cool enough for your pocket)
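The bands above are just decades of power draw, so a device can be placed by comparing its typical consumption against each lower bound. A small sketch; the `power_band` helper is hypothetical, and treating each lower bound as inclusive is my assumption.

```python
from typing import Optional

# Lower bound in watts (inclusive) of each band, highest first.
POWER_BANDS = [
    (100.0, "Datacenter Server (air conditioned room)"),
    (10.0, "Home PC/Laptop (fan cooled, on when in use)"),
    (1.0, "Home consumer device (fanless, always on)"),
    (0.1, "Battery powered millicomputer (always on, pocketable)"),
]


def power_band(watts: float) -> Optional[str]:
    """Band for a device's typical power draw, or None outside 0.1-1000 W."""
    if watts >= 1000.0:
        return None
    for lower_bound, band in POWER_BANDS:  # checked from highest to lowest
        if watts >= lower_bound:
            return band
    return None


print(power_band(0.5))   # a few hundred milliwatt module
print(power_band(400))   # a 1U enterprise server
```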
Interesting technology and product disruptions occur when we mix these spaces. In some ways, the original compute farms that Google built were leveraging the low end home PC power/price/performance point into the datacenter. There are additional opportunities to leverage home consumer devices and millicomputers into the enterprise space.
Sunday, July 22, 2007
Millicomputer Performance Benchmarks
Performance benchmarks for several Millicomputer CPUs are being measured by Homebrew Mobile Club members and collected on our wiki.
The benchmark used is an old, simple CPU benchmark called nbench, which is basically the original Byte magazine benchmark collection from ten years ago. It's very easy to get running. I looked at the industry standard SPEC benchmarks, but they are more vendor oriented and are not freely available, and SPEC doesn't have any useful results posted for ARM architecture systems.
At this point the results are hard to compare since the compiler options in use are somewhat varied and results for recent enterprise server CPUs have not been posted.
Saturday, June 30, 2007
Intel's low power plays
There is an interesting "Power Plays" discussion of Intel's focus on low power at Ars Technica.
There is a lot of work going on to optimize current enterprise server designs to use less power. This is good, but it's not the order of magnitude difference that a move to Millicomputing based designs would provide.
The most interesting new technology described is an interconnect that uses very low power and which scales its clock rate and power consumption according to the bandwidth demand. This brings variable capacity to the network layer, and I'd love to see some very low power Intel CPUs with this technology integrated.
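The idea of scaling link speed with demand can be sketched as a simple policy: at each interval, run the link at the slowest operating point that still covers the bandwidth being asked for. The operating points below are made-up numbers for illustration, not Intel's figures, and a real design would also need hysteresis to avoid flapping between states.

```python
from typing import List, Tuple

# Hypothetical link operating points: (bandwidth in Gbit/s, power in watts).
LINK_STATES: List[Tuple[float, float]] = [
    (1.0, 0.05),
    (2.5, 0.10),
    (5.0, 0.20),
    (10.0, 0.40),
]


def pick_link_state(demand_gbps: float) -> Tuple[float, float]:
    """Choose the slowest (lowest power) state that still covers the demand."""
    for rate, watts in LINK_STATES:  # sorted slowest first
        if rate >= demand_gbps:
            return rate, watts
    return LINK_STATES[-1]  # demand exceeds the link, so run flat out


def energy_joules(demand_trace: List[float], seconds_per_sample: float) -> float:
    """Energy for a bandwidth demand trace, re-picking the state each sample."""
    return sum(pick_link_state(d)[1] * seconds_per_sample for d in demand_trace)


trace = [0.2, 0.4, 3.0, 9.0, 0.5]  # Gbit/s demand, one sample per second
adaptive = energy_joules(trace, 1.0)
always_max = LINK_STATES[-1][1] * len(trace)
print(f"adaptive: {adaptive:.2f} J, fixed at max rate: {always_max:.2f} J")
```

With a bursty load the link spends most of its time in the cheapest state, which is where the variable capacity pays off.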
Thursday, June 28, 2007
The Flashiest Storage for the Millicluster
Per-module Flash uses the tiny microSDHC format, which is about half an inch square (the picture shown is about three times actual size), see http://www.getflashmemory.info/category/microsdhc/. The older microSD format is limited to 2GB (available one-off for less than $20 each), and microSDHC expands this limit to 32GB using FAT32 formatted cards. At present 4GB cards are available and 8GB cards have been announced. Streaming read and write performance for microSDHC is much higher than before, at about 20MByte/s. Writes are just as fast as reads, and the controller built into each card automatically spreads writes around (wear leveling) so that no single location in the flash memory wears out.
There is no seek time! Random access at thousands of IOPS is limited only by device driver efficiency, and will be benchmarked. Raw capacity is 112 x 4 GB = 448 GBytes/RU, 18.8 TB/Rack, and raw bandwidth is 112 x 20 MB/s = 2240 MB/s/RU, 94 GB/s/Rack. The implications for storage performance in general are profound. The reason it is so fast is that the storage is solid state, in a single chip, and directly connected to the CPU chip. There is nothing getting in the way!
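The capacity and bandwidth totals can be reproduced directly. This sketch uses decimal units (1 TB = 1000 GB) to match the round numbers above, and it assumes aggregate bandwidth scales linearly because each card sits on its own module, so no shared storage bus is involved.

```python
MODULES_PER_RU = 112
RU_PER_RACK = 42
CARD_GB = 4          # 4 GByte microSDHC per module
CARD_MB_PER_S = 20   # streaming rate per card

capacity_gb_per_ru = MODULES_PER_RU * CARD_GB
capacity_tb_per_rack = capacity_gb_per_ru * RU_PER_RACK / 1000
bandwidth_mb_per_ru = MODULES_PER_RU * CARD_MB_PER_S
bandwidth_gb_per_rack = bandwidth_mb_per_ru * RU_PER_RACK / 1000

print(f"{capacity_gb_per_ru} GB/RU, {capacity_tb_per_rack:.1f} TB/rack")
print(f"{bandwidth_mb_per_ru} MB/s/RU, {bandwidth_gb_per_rack:.1f} GB/s/rack")
```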
Enterprise Millicomputer Server Comparisons
To provide a competitive comparison, two high end 1U Enterprise servers were priced online at http://www.sun.com: one Opteron and one low power Niagara SPARC. The Sun x4100 Opteron server uses about 400W. Its CPU performance is probably double that of an ARM at the same GHz, so 2.8 GHz x four cores x 2 = 22.4 GHz equivalent. Configured with the maximum of 32 GB RAM, it has a $13K list price. The Sun T1000 Niagara uses about 200W of power. Its 1.0 GHz 8 core CPU has 32 threads. Let's call this 32 GHz, which is quite optimistic. With a maximum of 16 GB RAM it has a $15K list price. The Enterprise Millicomputer with OPiuM i.MX31 based modules uses less than 160W, probably much less. 532 MHz x 112 = 60 GHz and 256MB x 112 = 28GB RAM. With modules and flash costing perhaps $130 each, plus lower cost power supplies etc., a similar price point around $15K seems plausible.
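Normalizing those paper figures per watt and per dollar makes the comparison easier to read. This sketch uses exactly the estimates in the paragraph above: the "equivalent GHz" fudge factors, 160W as an upper bound for the millicomputer, and its ~$15K price estimate (112 x $130 = $14,560 covers only the modules and flash). None of this is measured data.

```python
# name: (equivalent GHz as estimated in the text, watts, list price USD, RAM GB)
SERVERS = {
    "Sun x4100 (Opteron)": (2.8 * 4 * 2, 400, 13_000, 32),
    "Sun T1000 (Niagara)": (32.0, 200, 15_000, 16),
    "Enterprise Millicomputer": (0.532 * 112, 160, 15_000, 28),
}

for name, (ghz, watts, price, ram_gb) in SERVERS.items():
    print(f"{name:26} {ghz:5.1f} GHz  {ghz / watts:.3f} GHz/W  "
          f"${price / ghz:,.0f}/GHz  {ram_gb} GB RAM")
```

On these assumptions the millicomputer delivers roughly 0.37 GHz per watt versus 0.16 for the T1000 and 0.056 for the x4100, at under half the price per GHz.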
Millicomputer networking has higher network bandwidth and there is a big additional saving as there is no need for an external load balancer appliance.
Millicomputer Storage wins with no contest! Two 146GB disks deliver about 240 IOPS, versus ~500,000 IOPS from 448 GB of Flash.
This is all on paper, actual benchmarks are needed, but the point is that the raw performance looks interesting enough to make it worth running the benchmarks....
Labels:
comparisons,
millicluster,
niagara,
opteron
Wednesday, June 27, 2007
Enterprise Millicluster System Specifications
The specifications below are very tentative, I have tried to be conservative, but this is a paper design at this point, and a real design could end up significantly more or less dense in terms of compute and power usage per rack unit (RU).
The Millicluster board I described is half the width of a typical enterprise motherboard. It's only 0.4" thick, so we can stack four of them high (4x0.4=1.6") in the 1.75" height of a 1U package.
Hence a standard 1U Enterprise Server Package contains eight Milliclusters (two side by side, stacked four high). This gives a compute density of 112 OPiuM modules per RU, 4704 modules in a 42RU rack. The power consumption peaks under 160 Watts/RU and idles at less than 24 Watts/RU. The maximum rating would be less than 6.7KW/Rack, which is quite reasonable. The CPU performance totals 60 GHz/RU, 2,520 GHz/Rack. There is 28 GBytes/RU of RAM, 1,176 GBytes/Rack.
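The per-RU and per-rack figures all follow from a handful of per-board numbers. This sketch takes the 20W peak and 3W idle per board from the board layout post, and 532MHz and 256MB per i.MX31 module; RAM is counted in binary units (1 GB = 1024 MB).

```python
BOARDS_PER_RU = 8        # half-width boards, two across and four high
MODULES_PER_BOARD = 14   # i.MX31 modules per Millicluster
RU_PER_RACK = 42
MHZ_PER_MODULE = 532
RAM_MB_PER_MODULE = 256
BOARD_PEAK_W = 20        # per-board peak from the board layout post
BOARD_IDLE_W = 3         # per-board idle from the board layout post

modules_per_ru = BOARDS_PER_RU * MODULES_PER_BOARD
modules_per_rack = modules_per_ru * RU_PER_RACK
peak_w_per_ru = BOARDS_PER_RU * BOARD_PEAK_W
idle_w_per_ru = BOARDS_PER_RU * BOARD_IDLE_W
peak_kw_per_rack = peak_w_per_ru * RU_PER_RACK / 1000
ghz_per_ru = modules_per_ru * MHZ_PER_MODULE / 1000
ghz_per_rack = ghz_per_ru * RU_PER_RACK
ram_gb_per_ru = modules_per_ru * RAM_MB_PER_MODULE / 1024
ram_gb_per_rack = ram_gb_per_ru * RU_PER_RACK

print(modules_per_ru, modules_per_rack, peak_w_per_ru, idle_w_per_ru)
print(f"{peak_kw_per_rack:.2f} kW/rack, {ghz_per_ru:.1f} GHz/RU, "
      f"{ghz_per_rack:,.0f} GHz/rack, {ram_gb_per_rack:,.0f} GB RAM/rack")
```

The exact CPU total is about 2,503 GHz per rack; rounding 59.6 GHz/RU up to 60 before multiplying gives the 2,520 figure.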
The network has 8 Load balancer/bridge-routers per RU with 8 Gbits/RU module bandwidth on 16 redundant Gbit ports. An Ethernet switch could be added to the design to reduce the port count at a cost of a few watts and dollars. For storage a microSDHC flash memory socket at each module would hold a 2 GB microSD for very low cost, 4 GB for capacity, 8 GB in 2008.
There are many optional interfaces that could be used for specific applications. All modules include an ATA disk controller if needed, so each Millicluster could have connectors to support hard disks and DVD-ROM players. For graphics the i.MX31 modules include an OpenGL based 3D graphics accelerator and an LCD display driver with touch screen input. There is a camera input and video compression engine, with stereo audio and video playback. Modules also include multiple USB and serial interfaces.
Labels:
millicluster,
millicomputing,
MX31,
OPiuM,
rack,
system
Saturday, June 23, 2007
Enterprise Millicluster Board Layout
I came up with the board layout shown to indicate how we might arrange the millicomputer modules, interface bridge, USB network and microSD card holders on a conveniently sized board. It's just under half the size of a typical enterprise server motherboard. The total size is about 5.5” Wide x 12” Deep x 0.4” High. The Ethernet network bridge increases the idle power consumption to no more than 3 Watts, and fully active power consumption is unlikely to exceed 20 Watts, so I don't think heat sinks will be needed.
Enterprise Millicluster
Taking the components I have already mentioned in previous posts, we can assemble them into a small cluster that seems to be a useful size and specification for an Enterprise Server building block. It's a cluster of millicomputers, so we may as well coin the name Millicluster as we go along (and register the millicluster.com etc. domains to point here :-)
Using 8-port USB switches, we could lay out 14 i.MX31 based Millicomputer modules behind a PPC440EPx based Ethernet Bridge that runs Linux so it is general purpose, but it will be pre-configured as a Load Balancer. This gives us a 1 Gbit/sec redundant network (it has two 1 Gbit links in, but only two 480Mbit links to the millicomputer modules). There is a total of 7.5 GHz of CPU, 3.5 GBytes of RAM, and 56 Gbytes of Storage using 4 GByte microSDHC flash memory cards on each Millicomputer.
This depends upon having a high speed 8-port switch, and so far I have found some products from Belkin and D-link that have one upstream port and seven downstream. I'm not sure what chipset they use, but they are inexpensive and have been available for a few years, so this seems reasonable.
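Here is a quick sketch of how those numbers fit together, assuming each of the bridge's two 480Mbit USB links feeds one of these 1-up/7-down switches. Note that 480Mbit/s is the USB 2.0 high speed signaling rate; protocol overhead makes usable throughput noticeably lower.

```python
HUB_DOWNSTREAM_PORTS = 7       # 8-port switch: one upstream, seven downstream
BRIDGE_USB_LINKS = 2           # assumed: each bridge USB link feeds one switch
USB2_HIGH_SPEED_MBIT = 480     # raw signaling rate; usable throughput is lower
MHZ, RAM_MB, FLASH_GB = 532, 256, 4  # per i.MX31 module

modules = HUB_DOWNSTREAM_PORTS * BRIDGE_USB_LINKS
uplink_mbit = BRIDGE_USB_LINKS * USB2_HIGH_SPEED_MBIT
share_mbit = uplink_mbit / modules  # per-module share when all are busy at once

print(f"{modules} modules, {uplink_mbit} Mbit/s to the bridge, "
      f"{share_mbit:.1f} Mbit/s each when all are busy")
print(f"{modules * MHZ / 1000:.2f} GHz CPU, {modules * RAM_MB / 1024:.1f} GB RAM, "
      f"{modules * FLASH_GB} GB flash")
```

With every module busy, each gets a share comparable to a 100Mbit Ethernet port, and an idle module's share is available to the others.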
Labels:
microSDHC,
millicluster,
millicomputer,
OPiuM,
PPC440EPx,
USBNet
Friday, June 8, 2007
x86 Millicomputers are on the way
News about the x86 architecture working its way down into the millicomputer space via the Pico ITX. This board is 1.8x3" with a 1GHz CPU and 256 or 512MB RAM, but the announcement doesn't give power consumption. However, they are positioning it for use in mobile devices, so it's likely to end up under a watt once they get the chip count down a bit further.
Friday, May 18, 2007
PowerTop monitoring tool
A new tool for Intel/Linux systems monitors power use on a per-process basis. This is excellent: a few months ago I decided I wanted a tool like this, and this saves me from writing it!
There is also a new NO_HZ (tickless) feature in the latest Linux kernel, 2.6.21, that stops the periodic timer tick when the CPU is idle. PowerTop can be used to see how much power that saves.
Thursday, May 17, 2007
microSD and microSDHC Storage for Millicomputers
It's been a while since I last posted here, mostly due to my change in employer, as mentioned on my personal blog... Now that is settled, I have a backlog of things to discuss here. Also, I'll be at the Techshop pavilion at Maker Faire this Saturday and Sunday afternoons.
There is an interesting price graph for 2GByte microSD at Nextag. It shows an introduction at about $120 last September, trending down to about $25 now. Amazon has it for $25 and Amazon Marketplace for $16.50. Meanwhile, larger devices are shipping using the microSDHC format at 4GByte, and announcements have been made for 8GByte. The microSD format gives a streaming read performance of around 20MBytes/s, so for 2KB reads that works out at a maximum of 10,000 random reads/second. Writing at 6MB/s works out at a maximum of 3000 random 2KB writes/s. Even allowing for operating system overhead, several thousand IOPS should be possible at under ten cents per IO per second and around $10/GByte.
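The IOPS ceiling above is just streaming bandwidth divided by transfer size. A small sketch of that bound and the resulting cost figures, using the 2GB card prices quoted above and decimal units to match the round numbers. It is an upper bound: real random I/O on flash cards is likely to be lower because of per-command latency and, for writes, the erase block handling inside the card.

```python
def max_random_iops(mb_per_s: float, block_kb: float) -> float:
    """Upper bound on random I/Os per second if only streaming bandwidth limits them."""
    return mb_per_s * 1000 / block_kb  # decimal units, as in the text


read_iops = max_random_iops(20, 2)   # streaming read rate
write_iops = max_random_iops(6, 2)   # streaming write rate

for label, price in (("Amazon", 25.00), ("Amazon Marketplace", 16.50)):
    print(f"{label}: ${price / 2:.2f}/GB, "
          f"{100 * price / write_iops:.2f} cents per write IOPS, "
          f"{100 * price / read_iops:.2f} cents per read IOPS")
```

Even at the write bound a $25 card works out under a cent per IO per second, so there is plenty of headroom under the ten cent figure for operating system overhead.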
Of course, most millicomputer CPUs interface directly to devices like microSD, so there is no support device or interface needed. Just a microSD carrier or two, per millicomputer module.
Labels:
microSD,
microSDHC,
millicomputer,
storage
Tuesday, April 17, 2007
Millicomputer Based Load Balancers
If we build systems that contain hundreds of modules for web based applications, we need a way to distribute incoming network traffic across them. Commercial load balancers cost more than the millicomputing modules we want to send load to, so I've been looking around for open source projects that implement various kinds of load balancing. I have found a very good, detailed summary article on this subject by Willy Tarreau, author of HAProxy, which he describes as:
HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications. It is particularly suited for web sites crawling under very high loads while needing persistence or Layer7 processing. Supporting tens of thousands of connections is clearly realistic with today's hardware. Its mode of operation makes its integration into existing architectures very easy and riskless, while still offering the possibility not to expose fragile web servers to the Net...
At the http/application level, I found a description of a simple but powerful tool called balance.
Balance is our surprisingly successful load balancing solution being a simple but powerful generic tcp proxy with round robin load balancing and failover mechanisms. Its behaviour can be controlled at runtime using a simple command line syntax.
Another HTTP load balancer that claims high performance and more features is XLB. It states:
XLB is a high performance HTTP load balancer. connection management, caching, ssl, scripting. 300 mbit/sec / 4000 reqs/sec takes 30% cpu on a 2GhZ Xeon. connection pooling to backend servers reduces memory and cpu usage on backends.
One problem with load balancers is that if they fail, a potentially large number of modules would be out of action. The Ultra Monkey load balancer addresses this issue:
Ultra Monkey 3 makes use of The Linux Virtual Server (LVS) to provide fast load balancing. The Linux-HA framework is used to monitor the linux-directors - the hosts running LVS and doing the load balancing. This is combined with ldirectord which monitors real-server - the hosts that accept end-user's connections. These three core components allow Ultra Monkey 3 to provide highly available and/or load balanced network services.
I haven't used any of these options, so I'm very interested to get recommendations. Please comment if you have experience or alternatives to share, and I'll update this post.
In the array of modules scenario, I would dedicate a few modules to provide load balancing services. If the modules are all connected via Ethernet, then any module can be used. If we use the USB network then the central USB master that provides an Ethernet gateway is the natural place to install load balancer services.
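The "round robin load balancing and failover" that balance describes can be sketched in a few lines: hand out backends in turn and skip any that have been marked as failed. This is an illustration of the technique only, not code from balance, HAProxy or LVS, and the module names are hypothetical. The balancer itself is still a single point of failure, which is the problem Ultra Monkey's monitoring addresses.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Hand out backends in turn, skipping any marked as failed."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.backends: List[str] = list(backends)
        self.healthy: Set[str] = set(self.backends)
        self._next = 0

    def mark_down(self, backend: str) -> None:
        # Called when a health check or a failed connection says the module is gone.
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self) -> str:
        # Try each backend at most once, starting where the last pick stopped.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["module01", "module02", "module03"])
lb.mark_down("module02")
print([lb.pick() for _ in range(4)])  # ['module01', 'module03', 'module01', 'module03']
```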
Labels:
balance,
haproxy,
load balancer,
millicomputer,
Ultra Monkey,
XLB
Gumstix initial bringup
I bought a Gumstix Verdex GS-270, along with a small motherboard that has serial, USB and power connectors. For initial bringup I also installed Ubuntu 6.10 Linux on a PC to act as my development host. I've figured out how to get logged in to Linux on the Gumstix, and I'm documenting it step by step here.
This was my goal!
# uname -a
Linux gumstix 2.6.18gum #1 Wed Feb 28 18:05:43 PST 2007 armv5tel unknown
The basic sequence included getting at the serial port on the Dell, configuring it correctly, and figuring out which of the two serial ports on the motherboard has the console output.
- Download the 600MB Ubuntu CD image. I used OS X, burned it to a CD with Disk Utility, and installed it on the PC; quite straightforward.
- Ubuntu doesn't include serial communications programs by default. I ran aptitude to search for programs and found the comms programs minicom and cu, so I installed both of them.
- The gumstix wiki eventually revealed these setup instructions: use minicom, turn off hardware and software flow control, and set 115200 baud, 8N1.
- This picture shows a similar motherboard, with the console on the second serial port, which also worked for me.
- I connected the serial and USB cables and plugged in the power supply, and a small green LED glowed on the motherboard, a nice confirmation that it's on.
- After watching various boot messages, I logged in as root with the initial password gumstix.
Configuration messages at boot:
U-Boot 1.1.4 (Mar 1 2007 - 17:10:55) - PXA270@600 MHz - 1321
*** Welcome to Gumstix ***
U-Boot code: A3F00000 -> A3F25850 BSS: -> A3F5AE70
RAM Configuration:
Bank #0: a0000000 128 MB
Flash: 32 MB
.... some more messages then:
Linux version 2.6.18gum (craig@azazel) (gcc version 4.1.1) #1 Wed Feb 28 18:05:7
CPU: XScale-PXA270 [69054117] revision 7 (ARMv5TE), cr=0000397f
Machine: The Gumstix Platform
Memory policy: ECC disabled, Data cache writeback
Run Mode clock: 208.00MHz (*16)
Turbo Mode clock: 624.00MHz (*3.0, active)
Memory clock: 104.00MHz (/2)
System bus clock: 104.00MHz
CPU0: D VIVT undefined 5 cache
CPU0: I cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
CPU0: D cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
Interesting information on 32-way cache associativity, which I did not see mentioned in the specs.
The 32MB flash memory is mounted as a filesystem, with 8MB taken up by the default installation.
# df
Filesystem Size Used Available Use% Mounted on
/dev/mtdblock1 31.8M 8.0M 23.8M 25% /
The system supports IP networking over USB, which I have plugged in but don't have working yet (it's supposed to come up automatically, but doesn't). That's next.
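The cache lines in the boot log are internally consistent: 32768 bytes divided by 32 ways of 32 byte lines leaves exactly 32 sets. This sketch checks that and shows how an address splits into tag, set index and byte offset for that geometry. The example address is arbitrary (picked from the RAM bank at a0000000), and since the log reports the D cache as VIVT (virtually indexed, virtually tagged), the real split applies to virtual addresses.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets implied by total size, associativity and line size."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a whole number of sets")
    return sets


def split_address(addr: int, line_bytes: int = 32, sets: int = 32):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


sets = cache_sets(32768, ways=32, line_bytes=32)
offset_bits = (32 - 1).bit_length()  # 32 byte lines
index_bits = (sets - 1).bit_length()
print(sets, offset_bits, index_bits, 32 - offset_bits - index_bits)  # 32 5 5 22
print([hex(x) for x in split_address(0xA0001234)])  # ['0x280004', '0x11', '0x14']
```

With 32 ways, only 5 bits of a 32 bit address select the set, and the remaining 22 bits are tag that has to be compared across all 32 ways.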
Thursday, April 12, 2007
Vertical and Horizontal Module Arrangements
Modules are available with edge connectors that can be mounted in bulk on a motherboard as shown in the image below. The dimensions match the standard motherboard found in 1U Enterprise server designs, about 12x13 inches. The diagram shows 120 modules, but it will probably be possible to pack them more densely than this.
The alternative is to mount modules flat on the boards as shown in the second diagram. This has the same 12x13 inch area, but is a very thin board, and at least four of them could be stacked in a 1U package, which also comes out to 120 modules.
In practice these board sizes and layouts will need to be adjusted to take into account the mechanical problems of flexing, mounting, cable routing etc. In each case the power and cooling management should be relatively simple, since there is a total peak power of around 100 watts for the entire 1U package, and no localized hot spots.
Some of the module designs have built-in temperature sensors and they all have power voltage sensors, so they can detect and report on environmental conditions across the motherboard.
Wednesday, April 11, 2007
Millicomputer Module Interconnects
There are two basic approaches.
One is to get modules that have ethernet built-in (or to add ethernet interfaces to a motherboard) and use ethernet switch chips such as the 8-24 port solutions from Vitesse to cluster the modules together. The individual modules would connect at 100Mbit, and the switches and external interfaces would interconnect at 1Gbit. The single chip ethernet switches have lots of features but can be run as unmanaged devices, so there is very little software needed to implement or manage the network. By directly connecting the networks on a motherboard there is no need to drive the full physical ethernet wire standard between the devices, saving a lot of power. These devices cost a few dollars a port, and dissipate about half a watt per port for fully driven gigabit links. If we can avoid using the Ethernet "PHY" (physical driver), a lot more power can be saved.
Another option is to use the built-in high speed USB2.0 interfaces, which run at up to 480Mbit/s, connect them to a USB based central router that has ethernet support, and then run IP over USB. This is a bit more complex to implement, but could be faster, lower power and cheaper, since it uses an interface that is built directly into the millicomputer CPU. There are other kinds of devices like the AMCC PPC440EPx that are more PC-like, and have ethernet, PCI-bus and high speed USB built-in, that could be used to implement a board level controller/router/interface. This device is more powerful than the mobile oriented millicomputer CPUs but dissipates about 3W, so it's in the next bracket up from a power consumption viewpoint.
PXA270 Module for testing
I just ordered a Gumstix GS270-XL6P module with 600MHz PXA270 and 128MB RAM. I'll run benchmarks on it, then end up building it into one of the mobile phone designs I'm working on. More later...
Thursday, April 5, 2007
Millicomputer Module Specifications
Here is another Google spreadsheet table of millicomputing module specifications.
There are several approaches, but some of these modules are edge-connector based, include on-board ethernet, and could be stacked on a motherboard in a very dense array.
I think that on a standard 1U motherboard, if we could fit five rows of 24 connectors, that is 120 individual modules using less than 100W maximum. The motherboard would just need to provide power and ethernet switch chips. If we also want per-node storage, there are many very dense NAND flash chips in the multi-gigabyte range that could be added to the design.
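The density and power figures can be checked with a little arithmetic. This is a minimal sketch, assuming the whole 100W ceiling is shared evenly by the modules; in practice the switch chips and power conversion on the motherboard would take a share of it too.

```python
# Rough 1U power budget for a dense millicomputer motherboard.
# Assumption: the full 100W ceiling is split evenly across the modules;
# switch chips and power-supply losses are ignored in this sketch.
ROWS = 5
CONNECTORS_PER_ROW = 24
BOARD_BUDGET_W = 100.0


def modules_per_board(rows: int, per_row: int) -> int:
    """Total module slots on the board."""
    return rows * per_row


def watts_per_module(budget_w: float, modules: int) -> float:
    """Average power available to each module, in watts."""
    return budget_w / modules


n = modules_per_board(ROWS, CONNECTORS_PER_ROW)
w = watts_per_module(BOARD_BUDGET_W, n)
print(f"{n} modules, {w * 1000:.0f} mW each")
```

That works out to roughly 830mW per module, just under the one-watt mark, so the modules themselves need to stay in the sub-watt class for the whole board to fit the budget.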
So is that interesting? I think so...
Tuesday, April 3, 2007
Millicomputer CPU Specifications
I've started a table of CPU specifications as a Google spreadsheet.
I'm mostly interested in the CPU clock rate, CPU caches, RAM bandwidth and size.
All these devices are very flexible, and are mostly configured with relatively small amounts of memory for embedded applications. However, they have a decently fast clock rate and can interface to at least two SDRAM chips. These chips are 32 bits wide and currently hold 128MB each. The CPUs support up to 256MB per chip, so next-generation SDRAM devices can double overall capacity.
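The capacity doubling can be sketched in a few lines. This assumes a board populated with exactly two SDRAM chips, the minimum mentioned above; boards with more chips scale the same way.

```python
# SDRAM capacity per CPU, current parts vs the largest parts the CPUs support.
# Assumption: exactly two SDRAM chips per CPU; sizes are in megabytes.
CHIPS_PER_CPU = 2
CURRENT_CHIP_MB = 128
MAX_SUPPORTED_CHIP_MB = 256  # largest chip size the CPUs can address


def node_capacity_mb(chips: int, chip_mb: int) -> int:
    """Total RAM attached to one CPU, in megabytes."""
    return chips * chip_mb


today = node_capacity_mb(CHIPS_PER_CPU, CURRENT_CHIP_MB)
next_gen = node_capacity_mb(CHIPS_PER_CPU, MAX_SUPPORTED_CHIP_MB)
print(f"today: {today}MB, next generation: {next_gen}MB ({next_gen // today}x)")
```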
Compared to current "enterprise CPUs" they are much slower than Opterons but probably comparable to a single thread on a Niagara.
The next comparison table I want to put together is for board level devices, such as the Gumstix range.
Friday, March 30, 2007
Freescale i.MX31 Overview
The full specifications are available on the web, without needing an NDA. Freescale should be commended for their open attitude.
The block diagram below, taken from the Freescale web site, shows the functions of this device; a picture really is worth a thousand words in this case...
All this and the price is about $20 in small volume!
Thursday, March 29, 2007
Flash Solid State Disk (SSD) for Millicomputers
The days of keeping bits on spinning rust are coming to an end....
Both Samsung and SanDisk have announced 32GB SSDs. SanDisk's consumes 0.9W max (competing disks take 1.9W) and fits in a 1.8" or 2.5" drive form factor with an ATA interface. The sequential performance of these SSDs is similar to normal disks for reads and a bit slower for pure writes, but as soon as you start doing random reads or writes they are an order of magnitude faster than disks. The smaller the random accesses, the bigger the relative speedup. The latest announcement from Samsung is a 1.8" 64GB version, and the press release includes some discussion about the growth of this market.
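Why smaller random accesses give a bigger relative speedup follows from a simple service-time model: every access pays a fixed positioning cost plus a transfer time. This is a minimal sketch; the latency and bandwidth figures are illustrative assumptions, not measurements of the Samsung or SanDisk drives.

```python
# Service-time model: time per random access = fixed latency + size / bandwidth.
# All figures below are illustrative assumptions, not vendor specifications.
DISK_LATENCY_S = 0.008            # assumed seek + rotational delay for a laptop disk
SSD_LATENCY_S = 0.0002            # assumed flash access overhead
BANDWIDTH_BYTES_PER_S = 40e6      # assume both devices stream at about 40MB/s


def access_time(latency_s: float, size_bytes: int, bandwidth: float) -> float:
    """Seconds to complete one random access of size_bytes."""
    return latency_s + size_bytes / bandwidth


def ssd_speedup(size_bytes: int) -> float:
    """How many times faster the SSD completes one random access than the disk."""
    disk = access_time(DISK_LATENCY_S, size_bytes, BANDWIDTH_BYTES_PER_S)
    ssd = access_time(SSD_LATENCY_S, size_bytes, BANDWIDTH_BYTES_PER_S)
    return disk / ssd


for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    print(f"{size // 1024:5d}KB random access: {ssd_speedup(size):5.1f}x faster")
```

Under these assumptions small accesses are dominated by the disk's positioning delay, so the SSD wins by more than an order of magnitude, while large transfers are dominated by streaming bandwidth and the two devices converge, matching the similar sequential performance described above.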
This makes perfect sense for millicomputers. Small millicomputers can be directly connected to gigabytes of NAND flash via the SDIO interface, and larger millicomputers can use ATA interfaces to connect to flash SSDs. The extra random performance of the SSD offsets the lack of disk spindles in a compact design, and will make I/O-intensive workloads extremely competitive for millicomputing.
The MTBF (reliability) of SSDs is also far higher than that of disks. A mirrored pair of disks may be replaced with a single SSD, since it has much higher reliability. This helps offset the current price premium paid for the SSD.
In the past, SSDs have been built using technologies that were far more expensive than disks. Flash-based SSDs have now narrowed the gap, and the trend is that SSDs will eventually become bigger and cheaper than disks. The only question is when, and my answer is: sooner than you think!
Update: here is a detailed benchmark review from Tomshardware.com.
What is a Millicomputer? Why talk about Millicomputing?
While researching devices for my homebrew mobile phone, I've realized that the current generation of CPUs for mobile devices is actually seriously powerful, very low cost, and uses almost no power. The performance per watt and per dollar seems to be an order of magnitude better than the PC-class CPUs that are common in commodity servers nowadays. The absolute performance and memory capacity are lower, but comparable to common PC hardware from a few years ago, and could be useful for more than running a high-end phone or portable games machine. Devices such as the Marvell PXA270 and Freescale i.MX31 run at over 500MHz, some include floating point units, they support at least 128MB of RAM (a single chip), and they have a myriad of I/O interfaces, with Linux 2.6 support.
While the current mainstream CPUs were driven by the development of the home PC market, this generation is driven by the development of the mobile, battery-powered device market, which is very large. For example, the worldwide cellphone market is something like a billion devices a year.
I think that there could be some interesting general purpose computer systems built from low power devices (CPUs that use less than one watt). I looked around but wasn't sure what to search for... I do know about the systems that are sold for embedded use, but they are typically configured using lower speed and lower memory options.
Does anyone know of vendors selling general purpose millicomputer based systems?
I need a name for this class of system, so I'm going to call them Millicomputers. I'm going to explore this area in public on this blog and, using the principles of open hardware that we have adopted for the homebrew mobile phone club, I expect to help build some.
I originally asked this question on my main blog, and asked a lot of people in person, but didn't find a pre-existing name or any objections to this concept.