
[H5part] [Fwd: Re: [Fwd: Re: H5Part]]


  • From: Andreas Adelmann <andreas.adelmann AT psi.ch>
  • To: h5part AT lists.psi.ch
  • Subject: [H5part] [Fwd: Re: [Fwd: Re: H5Part]]
  • Date: Fri, 11 Apr 2008 19:41:48 +0200
  • List-archive: <https://lists.web.psi.ch/pipermail/h5part/>
  • List-id: H5Part development and discussion <h5part.lists.psi.ch>

Dear colleagues, I am forwarding important comments and suggestions from John Shalf, FYI.

Andreas

--
Dr. sc. math. Andreas (Andy) Adelmann
Staff Scientist
Paul Scherrer Institut WLGB/132 CH-5232 Villigen PSI
Phone Office: xx41 56 310 42 33 Fax: xx41 56 310 31 91
Phone Home: xx41 62 891 91 44
-------------------------------------------------------
Wednesday: ETH CAB F 10.1 xx41 44 632 82 76
=======================================================
"The more exotic, the more abstract the knowledge, the more profound will be its consequences."
-- Leon Lederman
=======================================================

--- Begin Message ---
  • From: John Shalf <JShalf AT lbl.gov>
  • To: Andreas Adelmann <andreas.adelmann AT psi.ch>
  • Cc: oswald <benedikt.oswald AT psi.ch>, mike folk <mfolk AT ncsa.uiuc.edu>, Elena Pourmal <epourmal AT hdfgroup.org>
  • Subject: Re: [Fwd: Re: H5Part]
  • Date: Fri, 11 Apr 2008 10:15:15 -0700
Hi Andreas,
I saw your earlier email about John Biddiscombe's comments, but was too busy to respond appropriately. I wanted to make the following comments.

Regarding the use of COLLECTIVE vs. INDEPENDENT I/O: the choice of independent I/O is an artifact of our benchmarking on Lustre filesystems and AIX systems, where collective I/O actually performed slower than independent I/O (not what we expected). However, it should not have become a baked-in feature of H5Part. It's good that John B. noticed this. It's also a good sign that a single flag in H5Part enabled tuning for BG/L. We should make sure that we keep track of architecture-dependent tunable parameters. Some other important tunables are the block-alignment parameters.
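
To make the tuning knobs concrete, here is a minimal sketch of where the collective/independent choice and the block-alignment parameters live in the plain HDF5 C API. This is generic HDF5/MPI-IO usage rather than the exact code inside H5Part, and the 64 KiB / 1 MiB alignment values are only placeholders:

#include <hdf5.h>
#include <mpi.h>

/* Sketch: the dataset-transfer property list selects collective vs.
 * independent MPI-IO for a single write.  Generic HDF5 calls, not
 * the actual H5Part internals. */
static void write_field(hid_t dset, hid_t memspace, hid_t filespace,
                        const double *buf, int use_collective)
{
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, use_collective ? H5FD_MPIO_COLLECTIVE
                                          : H5FD_MPIO_INDEPENDENT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);
    H5Pclose(dxpl);
}

/* Sketch: the file-access property list carries the block-alignment
 * tunables; here objects above 64 KiB are aligned on 1 MiB boundaries
 * (placeholder values, to be tuned per filesystem). */
static hid_t create_file_aligned(const char *name, MPI_Comm comm)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    H5Pset_alignment(fapl, 65536, 1048576);
    hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);
    return file;
}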

%%%%

Regarding the use of compound types, we considered this early in the design process. Certainly, using the compound type enables us to write datasets using larger transactions and therefore to improve performance. However, H5T_COMPOUND restricts your ability to pull out individual fields for data analysis. If you store all of your fields in the COMPOUND structure, then you are forced to read them all back in when analyzing the data, even if you only want one of the fields.

Secondly, we chose not to use the COMPOUND structure because we wanted to develop a file format that is useful across more than one simulation code. COMPOUND implies a particular ordering of data fields in memory that can be very specific to one code, and it would make things very difficult for codes that have different numbers of fields or different orderings of fields in memory. Consider the type-management headaches if every code had to define its own H5T_COMPOUND structure to describe its own native set of fields and data structures. That sounded unmanageable, so we chose to lose some performance for the sake of portability and long-term data provenance.
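
To make the trade-off concrete, here is a minimal sketch of the two layouts being compared. The struct, field names and dataset names are purely illustrative, not anything prescribed by H5Part or by any particular code:

#include <hdf5.h>

/* Hypothetical per-particle record, for illustration only. */
typedef struct { double x, y, z, px, py, pz; } particle_t;

/* Layout A: one interleaved H5T_COMPOUND dataset, written in a single
 * call.  Pulling one field back out later still drags whole records
 * through the I/O layer, because the fields are interleaved on disk. */
static void write_compound(hid_t file, const particle_t *p, hsize_t n)
{
    hid_t type = H5Tcreate(H5T_COMPOUND, sizeof(particle_t));
    H5Tinsert(type, "x",  HOFFSET(particle_t, x),  H5T_NATIVE_DOUBLE);
    H5Tinsert(type, "y",  HOFFSET(particle_t, y),  H5T_NATIVE_DOUBLE);
    H5Tinsert(type, "z",  HOFFSET(particle_t, z),  H5T_NATIVE_DOUBLE);
    H5Tinsert(type, "px", HOFFSET(particle_t, px), H5T_NATIVE_DOUBLE);
    H5Tinsert(type, "py", HOFFSET(particle_t, py), H5T_NATIVE_DOUBLE);
    H5Tinsert(type, "pz", HOFFSET(particle_t, pz), H5T_NATIVE_DOUBLE);
    hid_t space = H5Screate_simple(1, &n, NULL);
    hid_t dset  = H5Dcreate(file, "particles", type, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, type, H5S_ALL, H5S_ALL, H5P_DEFAULT, p);
    H5Dclose(dset); H5Sclose(space); H5Tclose(type);
}

/* Layout B (H5Part-style): one contiguous 1-D dataset per field, so an
 * analysis tool can read "x" alone without touching the other fields. */
static void write_one_field(hid_t file, const char *name,
                            const double *data, hsize_t n)
{
    hid_t space = H5Screate_simple(1, &n, NULL);
    hid_t dset  = H5Dcreate(file, name, H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
    H5Dclose(dset); H5Sclose(space);
}

Layout A moves all fields in one large transaction (the source of its performance edge), while Layout B is what keeps each field independently readable and keeps the file layout identical across codes with different in-memory structures.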

%%%%

I would like to understand the motivation for storing metadata in an external XML format (e.g. the Xdmf approach, which stores heavy data in one file and light data in another?). This doesn't make as much sense to me.

1) You are replacing a compact, hierarchical object-database representation of the metadata with what amounts to an ASCII flat file. This seems backward.

2) It is much better for data provenance to have a single definitive source for your metadata; there is the danger that the metadata and data files get mismatched or misplaced. You can always generate an XML dump of the HDF5 metadata using h5dump -xml.

3) ASCII text in XML is an extraordinarily inaccurate way to represent floating-point data, and it is very slow to parse.

There are a number of archival systems, such as SRM, that benefit from having a replicated copy of the metadata to facilitate searching of data where the bulk of the data is on tape.  However, it is bad practice to allow the possibility of inconsistency in the representation of the metadata.

-john

On Apr 2, 2008, at 8:47 AM, Andreas Adelmann wrote:
FYI, Andreas



From: John Biddiscombe <biddisco AT cscs.ch>
Date: April 2, 2008 7:43:14 AM PDT
To: Jean Favre <jfavre AT cscs.ch>
Cc: Achim Gsell <achim.gsell AT psi.ch>, Andreas Adelmann <andreas.adelmann AT psi.ch>
Subject: Re: H5Part


Jean, Andreas, Achim, please forward to others if necessary. (I have not posted this to the H5Part list, but please do so if you think it is of interest to other readers).

In response to a query from Jean, and in time to provoke discussion at the CSCS User assembly on Friday where I believe you will meet, here's a brief synopsis of my H5Part related experiences, both recently and further back.

I have been using H5Part for some time now and have developed a reader and writer class for vtk/paraview which I use on a daily basis. I have also developed converters which allow me to convert virtually any ASCII file (which is common in the SPH community, since most codes are still in their relative infancy and run tests on small numbers of particles with ASCII IO) into H5Part. Full details can be found on the pv-meshless wiki at https://twiki.cscs.ch/twiki/bin/view/ParaViewMeshless in the section on Data formats.

On the whole I have had no problems with H5Part and find it a convenient library to use. I really only use the open file and open time step (group) functions within the library as I have implemented most of the hyperslab selection stuff myself within vtk reader/writer classes. I have not yet looked at H5Block - and I do not see myself doing so as I have managed to store my volume data using hdf5/Xdmf calls and then using the Xdmf Readers within vtk/paraview to read the data (subject to some fixes I made and am in the process of extending).
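
For what it's worth, the hyperslab selection in the reader boils down to the standard HDF5 pattern below; the dataset path and extents are made up for illustration (H5Part stores each step in a group such as Step#0, if I remember the naming correctly):

#include <hdf5.h>

/* Sketch: read particles [start, start+count) of one scalar field.
 * "/Step#0/x" is an illustrative path, not a guaranteed layout. */
static int read_particle_range(hid_t file, hsize_t start, hsize_t count,
                               double *buf)
{
    hid_t dset      = H5Dopen(file, "/Step#0/x", H5P_DEFAULT);
    hid_t filespace = H5Dget_space(dset);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &start, NULL, &count, NULL);
    hid_t memspace  = H5Screate_simple(1, &count, NULL);
    herr_t status   = H5Dread(dset, H5T_NATIVE_DOUBLE, memspace, filespace,
                              H5P_DEFAULT, buf);
    H5Sclose(memspace); H5Sclose(filespace); H5Dclose(dset);
    return (int)status;
}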

My most recent tests were at EDF in France, where I had access to the BlueGene machine. They converted their IO to dump data in HDF5 files, but did not use H5Part; instead they used an H5T_COMPOUND structure to write all data out in a single call. I modified their IO to use an H5Part form (for my convenience/compatibility), but we quickly found that performance was very poor in comparison. After some further testing last week, I discovered that my stripped-down test was setting the collective IO flag always false, so the expected speed-up never happened. Having fixed this, the speed difference between compound and H5Part dropped to a factor of about 2.

I set a number of tests running on BG using 1,2,3...20 scalar arrays, on 1000, 5000, 10000, 50000, 100000, 500000, 1E6, 5E6, 1E7, 5E7 particles, using collective/independent IO, and running on 32, 64, 128, 256, 512, 1024, 2048 processors - giving a total of 10*7*20*2 combinations of timings (I might have got these numbers wrong as I'm writing from memory, and some of the larger particle write tests failed and were skipped). The timing results are sitting on the BG at Montpellier and I'm awaiting an IBM engineer to send them to me as I cannot access the machine from here. I expect the results to maintain a 2:1 difference (or thereabouts), but I'll compile a full document when I have the data. If the results prove interesting enough and worth discussing, then I will try writing a short paper to submit somewhere with observations about efficient data IO, strategies, etc. I will rerun tests with different cache options and other configuration tweaks as and when I can.

Faced with the choice of implementing IO using H5Part, or using the H5 compound type with a factor-of-2 speed difference, and based on my estimate of a few days' work to implement a reader for paraview to support the new type, we decided to use the new format instead of H5Part for further IO (writes). This decision can easily be changed by simply swapping the IO calls in their SPH code to use the H5Part-compatible version, should we find that the timings do in fact come out in favour of H5Part. Typically they anticipate using anything from 1000 to 100,000 particles per processor, but on very large numbers of processors; the tests I performed were designed to exercise this pattern, which I suspect differs somewhat from the anticipated use cases of PSI and their colleagues. Now that I have a working reader for the new H5 compound data format (which I refer to as H5SPH), I will put together a set of code snippets that other SPH users can use for their IO.
In fact, the H5Part-style interface for OpenFile, SetTimeStep, SetNumberOfParticles and SetView is basically the same for H5Part and H5SPH, but these four functions themselves are not much work to implement, so whilst it's a shame to redo this work for an alternative format/library, it isn't actually much work (a rough sketch of that write sequence is below).
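
For reference, the write sequence I mean is roughly the following; I am quoting the H5Part function names from memory, so treat the exact names and signatures as approximate rather than definitive:

#include <H5Part.h>
#include <mpi.h>

/* Rough sketch of an H5Part-style dump of one time step.  API names are
 * quoted from memory and may differ slightly from the current library. */
static void dump_step(MPI_Comm comm, int step, long long nlocal,
                      const double *x, const double *y, const double *z)
{
    H5PartFile *file = H5PartOpenFileParallel("particles.h5part",
                                              H5PART_WRITE, comm);
    H5PartSetStep(file, step);             /* open/create the step group   */
    H5PartSetNumParticles(file, nlocal);   /* sets this rank's view/offset */
    H5PartWriteDataFloat64(file, "x", x);  /* one 1-D dataset per field    */
    H5PartWriteDataFloat64(file, "y", y);
    H5PartWriteDataFloat64(file, "z", z);
    H5PartCloseFile(file);
}

The equivalent H5SPH writer replaces the per-field writes with a single H5T_COMPOUND write, which is where the factor-of-2 difference discussed above comes from.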

As mentioned, I have also been using the Xdmf format to store volume blocks of data. Xdmf is simply an hdf5 file for heavy data, plus xml for light data. This has proven to be very flexible, and I am considering adopting it as a wrapper around all my hdf5-based data (including H5Part) - implying that I could probably read H5Part formatted particle data using the XdmfReader (though I have not actually tried this; it would require the generation of an xml wrapper for H5Part files, but then the vtkH5PartReader would not be needed and all maintenance could be shifted to one place). I'm not sure at the moment if Xdmf is capable of wrapping compound data types as used by the new format, so for now I still have too many readers and too many formats; I will be looking into this during the coming weeks.
One reason for liking Xdmf is that within Xdmf it is simple to store N timesteps of data in one hdf5 file, the next N in another, and so on. One issue I've had with H5Part is that file sizes keep growing, and limiting individual files to around 50GB is convenient for us, so (say) 50 time steps in one file, 50 in the next, etc. is useful. It would of course be quite simple to add this functionality to the H5Part libraries (and perhaps it is already in there), but since I am already a heavy vtk/paraview and now xdmf user with commit privileges to these repositories, it makes my life easier to focus now on Xdmf.

I therefore see myself moving towards Xdmf in the longer term as it allows a greater variety of storage forms and flexibility. I will not stop using H5Part for existing data and will continue to follow the developments of the extra features that keep going in... but for now, it already does all that I need and needs no real improvement. For our partners who are now switching to much bigger simulations, I keenly await the timing results from BG and the opportunity to run more tests, which should also include timings for reading data back into visualization or other post-processing software, which will usually occur on fewer processors.

-------------

To summarize: H5Part works very well for all the data I already have and any new stuff that comes my way. H5SPH may be used by future big-data generators, and I will gradually shift to using it myself if I get more H5SPH data than H5Part. For other hdf data types, I will focus on Xdmf so that as much code as possible can be unified into a single package with, hopefully, less maintenance.

Disclaimer: I have not explored some of the recent 'features' or developments for extracting subsets of data in H5Part; however, I will look into these as and when users have requests that could benefit from them.

JB

-- 
John Biddiscombe,                            email:biddisco @ cscs.ch
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Cantonale, 6928 Manno, Switzerland      | Fax:  +41 (91) 610.82.82






--- End Message ---


