Re: [H5part] SVN access available for contributions?


  • From: John Shalf <JShalf AT lbl.gov>
  • To: John Biddiscombe <biddisco AT cscs.ch>
  • Cc: h5part AT lists.psi.ch
  • Subject: Re: [H5part] SVN access available for contributions?
  • Date: Thu, 25 Jan 2007 21:54:10 -0800
  • List-archive: <https://lists.web.psi.ch/pipermail/h5part/>
  • List-id: H5Part development and discussion <h5part.lists.psi.ch>

On Jan 25, 2007, at 1:42 AM, John Biddiscombe wrote:
> John

>> So I assume you want a convenience routine that can reassemble a list of arrays (stored on disk as scalar fields) and interlace them as vectors in memory. I assume you are *not* storing the data as N-component vector fields in the file, though (am I correct?). If you've already done that, then it would be an excellent addition to the API. But we definitely want the disk image of the fields to be distinct scalar arrays, for various reasons.

>> So, to see if I understand this correctly: you would like a convenience function that lets you specify a vector of dataset names (rather than a single name), and it would naturally interlace them in memory upon read? Is that the request? I think that should be pretty reasonable, given HDF5's support for strided memory spaces. Do you have a proposed appearance for this API (something that doesn't use var_args, since that would complicate the F90 bindings)? Overall, this is quite reasonable provided the on-disk image has the components laid out as scalars (they can be reconstituted in memory as vectors).
> I have already implemented the write and read back of N-component variables as N single-component fields. I've tested it in parallel, with the combination of the memory space for the N-tuple arrays in memory and the dataspace for parallel I/O, and it works.
> I would like to contribute this to the main H5Part API. At the moment the writing out is fine, but I am still playing with the read back; the interface is something along these lines:
>
>     ReadNComponentArray(int NComponents, float/double/etc *dest, char **arrayOfNames)
>
> so the array of names chosen to be read is passed in and is assumed to be the same length as the number of components desired.

That's great. We should get it integrated straight away.
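
For concreteness, here is a minimal sketch of such an interlacing read, written against the plain HDF5 C API rather than the H5Part sources; the name read_interlaced() and the double-only signature are illustrative assumptions. The file keeps N scalar datasets, and a strided selection on the memory space drops each one into its slot of the interleaved buffer:

    #include <hdf5.h>

    /* Sketch only: read ncomp scalar datasets (one per component name)
       and interlace them into dest, which holds npoints * ncomp doubles
       laid out as x0,y0,z0, x1,y1,z1, ... */
    static herr_t read_interlaced(hid_t loc, int ncomp, hsize_t npoints,
                                  const char **names, double *dest)
    {
        hsize_t mdims = npoints * (hsize_t)ncomp;   /* whole buffer */
        hid_t memspace = H5Screate_simple(1, &mdims, NULL);

        for (int c = 0; c < ncomp; c++) {
            /* Strided selection picks out every ncomp-th slot,
               starting at component c. */
            hsize_t start = (hsize_t)c, stride = (hsize_t)ncomp;
            hsize_t count = npoints, block = 1;
            H5Sselect_hyperslab(memspace, H5S_SELECT_SET,
                                &start, &stride, &count, &block);

            hid_t dset = H5Dopen(loc, names[c], H5P_DEFAULT);
            herr_t status = H5Dread(dset, H5T_NATIVE_DOUBLE, memspace,
                                    H5S_ALL, H5P_DEFAULT, dest);
            H5Dclose(dset);
            if (status < 0) { H5Sclose(memspace); return status; }
        }
        H5Sclose(memspace);
        return 0;
    }

A parallel version would additionally apply the usual hyperslab selection to the file dataspace; the memory-side striding shown here is independent of that.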


>> The current sorting algorithm for the file format will be able to accommodate the different numbering formats you propose, so your proposed change would be backward compatible with existing readers (that's a good thing). This addition could therefore be implemented as a convention rather than a requirement.
> OK. I'm not familiar with the existing sorting algorithms in the context of H5Part. I find that I often browse files using NCSA's (I think) HDF5 viewer package, and it lists things using a straight alphabetical sort.

It's not so much for the viewer as for how the API sorts the steps into a sequence. The algorithm currently used within the API to sort the steps for reading will accept either format (the sorting algorithm used by h5dump and similar tools is more picky).
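
To make the distinction concrete: sorting steps as a sequence means comparing the trailing index numerically, which a straight alphabetical sort gets wrong ("Step#10" sorts before "Step#2"). A small illustrative comparator, not taken from the H5Part sources (the group names are just examples):

    #include <ctype.h>
    #include <stdlib.h>
    #include <string.h>

    /* Parse the integer suffix of a group name ("Step#12" -> 12);
       returns 0 when the name has no trailing digits. */
    static long trailing_index(const char *name)
    {
        size_t i = strlen(name);
        while (i > 0 && isdigit((unsigned char)name[i - 1]))
            i--;
        return strtol(name + i, NULL, 10);
    }

    /* qsort() comparator over an array of group-name strings. */
    static int step_cmp(const void *a, const void *b)
    {
        long ia = trailing_index(*(const char *const *)a);
        long ib = trailing_index(*(const char *const *)b);
        return (ia > ib) - (ia < ib);
    }

Sorting an array of group names with qsort(names, n, sizeof *names, step_cmp) then yields the step sequence whatever the prefix is, which is why either numbering format can be accepted.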

>> We do encourage liberal use of attributes to serve the individual needs of groups, though, so you should definitely implement storage of the TimeValue attribute. We should probably document the attributes that various groups have proposed for their own local conventions.
> OK.


>> When I say "convention", I mean additional features that extend the content of the file format using attributes. When something is a convention, readers can be coded to look for it for added value, but they should also be prepared for its absence. We are trying to minimize the "requirements" so as to keep the readers as simple as possible.
> I do think that a primary dataset group "Name"/prefix should be a requirement. You already have backward incompatibility between "particles1" and "steps1" - had this been a requirement previously, the files would be compatible. ($0.02)

Well "partlclesX" was in the prototype (not the production release per se).
The dataset group must be a requirement because it is the schema for the object storage format (irrespective of whether HDF5 is the underlying storage format). Attributes are quite different from data schemas.
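
As a sketch of what "prepared for their absence" can look like in a reader, using plain HDF5 calls (H5Aexists needs HDF5 1.8 or later; step_time() is a made-up helper, not part of the H5Part API):

    #include <hdf5.h>

    /* Return the step's time: the TimeValue attribute if the writer
       stored it, otherwise fall back to the step index itself. */
    static double step_time(hid_t step_group, long step_index)
    {
        double t = (double)step_index;       /* default when absent */
        if (H5Aexists(step_group, "TimeValue") > 0) {
            hid_t attr = H5Aopen(step_group, "TimeValue", H5P_DEFAULT);
            H5Aread(attr, H5T_NATIVE_DOUBLE, &t);
            H5Aclose(attr);
        }
        return t;
    }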

>> The decision to go with limited type support for the API was for two reasons
> Understood. I'd still like to add prototypes for the main types commonly supported on all platforms, though - not complex user-defined structures.

No problem. We expected to expand the API on an as-needed basis as more types were encountered. So if you are encountering those types, then the API should be expanded to accommodate them.
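
Purely as a sketch of how an API can grow typed entry points without duplicating logic (the names write_data() and WriteData* are invented here, not the H5Part functions): keep one generic writer parameterized by an HDF5 type id, and stamp out thin typed wrappers as new types are needed.

    #include <hdf5.h>
    #include <stdint.h>

    /* Generic core: create a 1-D dataset of 'nelem' elements of HDF5
       type 'type' under 'name' and write 'data' into it. */
    static herr_t write_data(hid_t loc, const char *name, hid_t type,
                             const void *data, hsize_t nelem)
    {
        hid_t space = H5Screate_simple(1, &nelem, NULL);
        hid_t dset  = H5Dcreate(loc, name, type, space,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        herr_t status = H5Dwrite(dset, type, H5S_ALL, H5S_ALL,
                                 H5P_DEFAULT, data);
        H5Dclose(dset);
        H5Sclose(space);
        return status;
    }

    /* One macro per supported native type keeps the public API typed
       while the implementation stays in one place. */
    #define DEFINE_WRITER(suffix, ctype, h5type)                       \
        herr_t WriteData##suffix(hid_t loc, const char *name,          \
                                 const ctype *data, hsize_t nelem)     \
        { return write_data(loc, name, h5type, data, nelem); }

    DEFINE_WRITER(Float64, double,  H5T_NATIVE_DOUBLE)
    DEFINE_WRITER(Float32, float,   H5T_NATIVE_FLOAT)
    DEFINE_WRITER(Int32,   int32_t, H5T_NATIVE_INT32)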


>> For various reasons, it is better for us to keep each vector component as a separate scalar array on disk
> I'm happy with this and am already doing it.

> I did find some other bugs which I'm hoping have been fixed. I had problems when I compiled the code with parallel support but was not using parallel I/O, and I put a couple of extra checks in. I also found a bug when the number of particles is dynamic and new memory/data spaces are needed, which wasn't handled correctly.

I think those are bugs that Achim has dealt with. Also, Achim made the error checking far more robust.
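
For reference, the varying-count case looks roughly like this from the caller's side, assuming the usual H5Part calls (exact signatures may differ between H5Part versions). The key point is that H5PartSetNumParticles() is called again whenever the count changes, so the library has to rebuild its memory and file dataspaces rather than reuse stale ones:

    #include <H5Part.h>
    #include <stdlib.h>

    void write_varying_steps(const char *fname, int nsteps)
    {
        H5PartFile *f = H5PartOpenFile(fname, H5PART_WRITE);
        for (int step = 0; step < nsteps; step++) {
            h5part_int64_t n = 1000 + 100 * step;   /* count changes */
            h5part_float64_t *x = malloc(n * sizeof *x);
            for (h5part_int64_t i = 0; i < n; i++)
                x[i] = (h5part_float64_t)i;

            H5PartSetStep(f, step);
            H5PartSetNumParticles(f, n);  /* new count => new spaces */
            H5PartWriteDataFloat64(f, "x", x);
            free(x);
        }
        H5PartCloseFile(f);
    }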

> I can see the H5Part repository, but cannot access it. May I please have access, so that I can bring my current code base up to date with your SVN head version?

I think the folks at PSI/CSCS are working on that. (should be fixed soon)

-john




