Money, Money, Money…
In the financial sector, revenue is all about numbers, speed and making the best decision at the right time while controlling risk.
We are seeing that in financial services firms, data capture, algorithm development, testing and risk management projects are all pushing the performance boundaries of traditional storage. Hedge funds and trading firms are starting to take advantage of parallelism in order to analyze more positions faster and deploy competitive trading strategies. Using scalable systems that support massively parallel data access, researchers can analyze larger data sets and test more scenarios, delivering faster, more effective models. Similarly, risk managers are increasing the frequency at which they assess total market exposure from once or twice a day to much shorter intervals.
All of this goes straight to the bottom line and provides competitive advantage.
Extremely Cloudy Applications
If there is such a thing as “normal” cloud storage today, it’s considered to be slower than “Web speed.” But it makes sense that businesses considering extreme applications will seek the agility and elasticity of cloud hosting rather than building internal infrastructure, especially where the main source of data is a Web 2.0 application.
As cloud providers like Amazon Web Services overcome data IO and storage challenges to provide cloud hosting for IO-intensive big data and video translation, we expect to see many service providers vying to support even more extreme applications.
Parallel File Systems to the Rescue/Rescue/Rescue/…
Extreme applications provide several interesting storage system challenges that can be answered by parallel file systems.
Parallel file systems are based on scale-out storage nodes, with the ability to spread huge files across many nodes and spindles and then serve them from all of those nodes at once. Unlike scale-out clustered NAS, which is designed to serve many independent files to many different clients at the same time (e.g. hosting home directories in a large enterprise or fully partitioned/shared big data blocks), fully parallel file systems are great for serving huge shared files to many inter-related processing nodes at once.
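To make that access pattern concrete, here is a minimal Python sketch of the idea: several workers open the same large shared file and each reads only its own byte-range “stripe,” so the I/O load spreads across whatever storage nodes back the file. The mount path, file name and worker count are hypothetical, and a real deployment would use the file system’s native client (or MPI-IO) rather than plain open().

```python
import os
from concurrent.futures import ProcessPoolExecutor

# Hypothetical shared file on a parallel file system mount (e.g. Lustre or GPFS).
SHARED_FILE = "/mnt/pfs/market_data.bin"   # assumed path for illustration
NUM_WORKERS = 8                            # assumed degree of parallelism

def read_stripe(stripe):
    """Open the same shared file and read only this worker's byte range,
    spreading the I/O load across the file system's storage nodes."""
    offset, length = stripe
    with open(SHARED_FILE, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    return len(data)                       # stand-in for real per-stripe processing

def parallel_scan():
    size = os.path.getsize(SHARED_FILE)
    stripe_size = -(-size // NUM_WORKERS)  # ceiling division
    stripes = [(i * stripe_size, stripe_size) for i in range(NUM_WORKERS)]
    with ProcessPoolExecutor(max_workers=NUM_WORKERS) as pool:
        return sum(pool.map(read_stripe, stripes))

if __name__ == "__main__":
    print("bytes scanned:", parallel_scan())
```

The point is simply that every worker sees one shared namespace and one shared file; the file system, not the application, decides where the blocks physically live.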
Big data solutions based on Apache Hadoop (with HDFS) are also designed around scale-out storage. But these essentially carve up data into distributed chunks. They are aimed at analytics that can be performed by isolated “mapped” jobs on each node’s assigned local data chunk. This batch-style approach enables a commodity-hardware architecture because localized failures are simply reprocessed asynchronously before cluster-wide results are collected and “reduced” to an answer.
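That contrast can be shown with a toy sketch (plain Python, not the Hadoop API): each “map” job works only on its node’s local chunk with no cross-node traffic, and a failed chunk can simply be reprocessed before the partial results are collected and reduced. The sample data and function names are invented for illustration.

```python
from collections import Counter
from functools import reduce

# Toy stand-in for data that HDFS would have split into per-node local chunks.
local_chunks = [
    ["buy", "sell", "buy"],    # chunk stored on node 1
    ["sell", "hold", "buy"],   # chunk stored on node 2
    ["hold", "hold", "sell"],  # chunk stored on node 3
]

def map_job(chunk):
    """Runs in isolation against one node's local chunk; if that node fails,
    this job is simply rerun elsewhere, asynchronously."""
    return Counter(chunk)

def reduce_jobs(partials):
    """Cluster-wide partial results are collected and 'reduced' to one answer."""
    return reduce(lambda a, b: a + b, partials, Counter())

if __name__ == "__main__":
    partials = [map_job(c) for c in local_chunks]   # independent "mapped" jobs
    print(dict(reduce_jobs(partials)))              # {'buy': 3, 'sell': 3, 'hold': 3}
```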
However, extreme apps, including many machine-learning and simulation algorithms, rely on high levels of inter-node communication and sharing globally accessed files. This synchronized cluster processing requires high parallel access throughput, low latency to shared data, and enterprise-class data protection and availability—far different characteristics than HDFS provides.
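By contrast, here is a minimal sketch of that synchronized pattern, with local Python processes standing in for cluster nodes and invented values throughout: every worker must finish its update, hit a barrier, read shared state, and synchronize again before the next step, so latency to shared data and the slowest participant gate every single iteration.

```python
import multiprocessing as mp

NUM_WORKERS = 4   # assumed "cluster" size for the sketch
ITERATIONS = 3    # assumed number of synchronized simulation steps

def worker(rank, barrier, state):
    """Each iteration: compute an update, wait for everyone, read shared
    state, wait again, then write. Latency to shared data is paid every step."""
    for _ in range(ITERATIONS):
        local = state[rank] + 1.0                       # stand-in for a model update
        barrier.wait()                                  # all ranks finish the prior step
        neighbor = state[(rank + 1) % NUM_WORKERS]      # read globally shared data
        barrier.wait()                                  # everyone has read before writes
        state[rank] = local + 0.001 * neighbor

if __name__ == "__main__":
    barrier = mp.Barrier(NUM_WORKERS)
    state = mp.Array("d", NUM_WORKERS)                  # shared array of doubles
    procs = [mp.Process(target=worker, args=(r, barrier, state))
             for r in range(NUM_WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(list(state))
```

Scaled up to a real cluster, that shared state lives in large files on a parallel file system, which is exactly where HDFS-style chunk locality stops being enough.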
Industrialization of Extreme Performance
Robust supercomputer parallel file systems are emerging from academia and research and are ready to deploy in commercial enterprise data centers. There are now a number of commercialized parallel file systems based on open source Lustre (e.g. from DDN, Terascala, et al.) that target Linux-based cluster computing. And for enterprise IT adoption of extreme applications that span multiple operating systems and require enterprise data protection, we see GPFS (General Parallel File System from IBM) setting the gold standard.
Parallel file systems can be procured and deployed on many kinds of storage nodes, from homegrown clusters to complete appliances. For example, DDN has industrialized a number of parallel file systems to host extreme applications in the enterprise market. Their GRIDScaler solution integrates and leverages parallel file services on their specialized HPC-class storage hardware. This kind of integrated “appliance” solution can provide a lower TCO for enterprises due to baked-in management, optimized performance, reduced complexity, and full system support.
Extremely Compelling
Big data analysis is one type of extreme application, but it is only the tip of the iceberg when it comes to processing large amounts of new data in new ways. New applications that demand parallel file access, high throughput, low latency, and high availability are also on the rise, and more and more enterprises (and service providers) will be tasked to deploy and support them.
Luckily, IT can support these challenging extreme applications by leveraging the vendor trends in industrializing technologies like parallel file systems. Technical excuses are diminishing, and the competition is heating up—it is definitely time for all enterprises to move forward with their own extreme applications.
If you are in IT and haven’t been asked to support an extreme application yet, you should expect to very soon.