Tuesday, August 28, 2007

Millicomputing Applications - ETL

Millicomputers have a very different balance of compute, memory, network, and I/O resources compared to more conventional architectures. In absolute terms they offer less compute, memory, and network capacity, but much more random-access I/O.

The performance per watt and the price/performance are very competitive for compute, memory, and network, as long as applications can be run at a smaller "grain size", that is, split into pieces small enough to fit on a single node. For I/O, however, the aggregate performance of a large number of direct-attached flash devices is amazing.

One possible application comes from the data warehousing space. Known as ETL, this is the Extract, Transform, and Load step that pulls data from online transaction processing (OLTP) systems, such as the collection of database back-ends for a web site, and puts it into a form that can be queried to answer questions about the business. A lot of work has gone into making ETL processes decomposable parallel applications, and there is an open source ETL implementation in Java called KETL. KETL was originally written several years ago, when the typical systems of the day were similar in capacity to the millicomputers we have today, so I'm hopeful that the grain size will fit.

KETL is I/O intensive and also supports running on a cluster of networked computers, so overall it looks like a plausible fit as an enterprise millicomputer application.
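To make the "grain size" idea concrete, here is a minimal sketch of an ETL job decomposed into small independent partitions. It is not KETL and does not use KETL's API; the row layout, the function names (`partition`, `transform`, `run_etl`), and the sample data are all hypothetical, and threads stand in for what would be separate millicomputer nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

# Hypothetical OLTP rows: (order_id, customer, amount_cents).
SOURCE_ROWS = [
    (1, "alice", 1250),
    (2, "bob", 400),
    (3, "alice", 300),
    (4, "carol", 999),
    (5, "bob", 100),
]


def partition(rows, n_parts):
    """Extract: split rows by a stable hash of the customer key.

    All rows for one customer land in the same partition, so each
    partition can be processed without talking to the others.
    """
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[crc32(row[1].encode()) % n_parts].append(row)
    return parts


def transform(rows):
    """Transform: total the spend per customer within one partition."""
    totals = defaultdict(int)
    for _order_id, customer, amount in rows:
        totals[customer] += amount
    return dict(totals)


def run_etl(rows, n_parts=4):
    """Run one worker per partition, then load the results into one table."""
    warehouse = {}
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for partial in pool.map(transform, partition(rows, n_parts)):
            # Load: customer keys never overlap across partitions.
            warehouse.update(partial)
    return warehouse


if __name__ == "__main__":
    print(run_etl(SOURCE_ROWS))
```

Raising `n_parts` shrinks the work each worker has to hold, which is the property that would let a job like this spread across many small nodes, each reading from its own local flash.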
