
[H5part] Re: [H5part] A high performance implementation of MPI-IO for a Lustre file system environment


  • From: John Shalf <jshalf AT lbl.gov>
  • To: Andreas Adelmann <andreas.adelmann AT psi.ch>
  • Cc: h5part AT lists.psi.ch, Mark Howison <MHowison AT lbl.gov>
  • Subject: [H5part] Re: [H5part] A high performance implementation of MPI-IO for a Lustre file system environment
  • Date: Mon, 29 Mar 2010 02:49:44 -0700
  • List-archive: <https://lists.web.psi.ch/pipermail/h5part/>
  • List-id: H5Part development and discussion <h5part.lists.psi.ch>

Very strange...
I would like to see the text of the article to see more specifically what they are claiming.
It all depends on their definition of "large" and "small".  If you think about collective buffering, it does look like it's breaking some large I/Os into a bunch of small, sparse (1-4 MB) operations when viewed from the collective buffering processes.  I'd be pretty surprised if their solution uses transactions that are < 1 MB in size (the Lustre stripe width).
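
For reference, those knobs are normally exposed as MPI-IO (ROMIO) hints. Below is a minimal sketch of how one might set them before a collective open; the hint names (cb_buffer_size, cb_nodes, romio_cb_write, striping_unit, striping_factor) are standard ROMIO hints, but the file name and the values (4 MB aggregator buffers, 8 aggregators, 1 MB stripes over 8 OSTs) are placeholders for illustration, not a recommendation and not what the paper used.

/* Sketch only: set collective-buffering and Lustre striping hints,
 * then open a file collectively.  Values are illustrative. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);
    MPI_Info_create(&info);

    /* Collective buffering: size of each aggregator's intermediate
       buffer and how many aggregator processes to use. */
    MPI_Info_set(info, "cb_buffer_size", "4194304");   /* 4 MB */
    MPI_Info_set(info, "cb_nodes", "8");
    MPI_Info_set(info, "romio_cb_write", "enable");

    /* Lustre striping requested at file creation:
       1 MB stripe size spread over 8 OSTs. */
    MPI_Info_set(info, "striping_unit", "1048576");
    MPI_Info_set(info, "striping_factor", "8");

    MPI_File_open(MPI_COMM_WORLD, "stripetest.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* ... collective writes, e.g. MPI_File_write_at_all(), go here ... */

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}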

Do you have the full paper?

On Mar 29, 2010, at 12:18 AM, Andreas Adelmann wrote:
Is that relevant for us?

Here is the abstract:

CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. (2009)
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.1491

A high performance implementation of MPI-IO for a Lustre file system environment
Phillip M. Dickens and Jeremy Logan

It is often the case that MPI-IO performs poorly in a Lustre file system environment, although the reasons for such performance have heretofore not been well understood. We hypothesize that such performance is a direct result of the fundamental assumptions upon which most parallel I/O optimizations are based. In particular, it is almost universally believed that parallel I/O performance is optimized when aggregator processes perform large, contiguous I/O operations in parallel. Our research, however, shows that this approach can actually provide the worst performance in a Lustre environment, and that the best performance may be obtained by performing a large number of small, non-contiguous I/O operations. In this paper, we provide empirical results demonstrating these non-intuitive results and explore the reasons for such unexpected performance. We present our solution to the problem, which is embodied in a user-level library termed Y-Lib, which redistributes the data in a way that conforms much more closely with the Lustre storage architecture than does the data redistribution pattern employed by MPI-IO. We provide a large body of experimental results, taken across two large-scale Lustre installations, demonstrating that Y-Lib outperforms MPI-IO by up to 36% on one system and 1000% on the other. We discuss the factors that impact the performance improvement obtained by Y-Lib, which include the number of aggregator processes and Object Storage Devices, as well as the power of the system's communications infrastructure. We also show that the optimal data redistribution pattern for Y-Lib is dependent upon these same factors.


This is my favorite: "We hypothesize that such performance is a direct result of the fundamental assumptions upon which most parallel I/O optimizations are based. In particular, it is almost universally believed that parallel I/O performance is optimized when aggregator processes perform large, contiguous I/O operations in parallel. Our research, however, shows that this approach can actually provide the worst performance in a Lustre environment, and that the best performance may be obtained by performing a large number of small, non-contiguous I/O operations."


Their report does not go beyond 1024 cores.

What about the Cray implementation?

Thoughts?

AA

-- 
Dr. sc. math. Andreas (Andy) Adelmann
Staff Scientist
Paul Scherrer Institut WBGB/132 CH-5232 Villigen PSI
Phone Office: xx41 56 310 42 33 Fax: xx41 56 310 50 90
Phone Home: xx41 62 891 91 44
-------------------------------------------------------
Wednesday: ETH CAB H ??.?  xx41 44 632 ?? ??
=======================================================
"The more exotic, the more abstract the knowledge,
the more profound will be its consequences."
Leon Lederman 
=======================================================



