
h5part - Re: [H5part] H5Part performance problem

h5part AT lists.psi.ch

Subject: H5Part development and discussion




  • From: Kurt Stockinger <KStockinger AT lbl.gov>
  • To: Kurt Stockinger <KStockinger AT lbl.gov>
  • Cc: h5part AT lists.psi.ch
  • Subject: Re: [H5part] H5Part performance problem
  • Date: Tue, 05 Dec 2006 10:05:52 -0800
  • List-archive: <https://lists.web.psi.ch/pipermail/h5part/>
  • List-id: H5Part development and discussion <h5part.lists.psi.ch>

One correction below:
> Hi Thomas,
>
> Thanks for the detailed code description. It's an excellent way for me
> to see how H5Part is currently being used. I have a few comments below.
>
> Thomas Schietinger wrote:
>
>> Dear all,
>>
>> let me add some more info on the performance problem reported
>> at yesterday's phone meeting.
>>
>> The routines I use for reading in an HDF5 file in H5root are the following
>> (see file TH5Dataset.cc):
>>
>> get the file handle:
>>
>> fH5File = H5PartOpenFile(fFullFilename.Data(),H5PART_READ);
>>
>> (fFullFilename.Data() evaluates to a char string)
>>
>> get the number of datasets:
>>
>> H5PartSetStep(fH5File,0);
>> Int_t nDataset = H5PartGetNumDatasets(fH5File);
>>
>> get the dataset names:
>>
>> for(Int_t i=0;i<nDataset;i++){
>>     H5PartGetDatasetName(fH5File,i,name,maxLength);
>>     ...
>> }
>>
>> get the number of steps, step attributes and file attributes:
>>
>> fNStep = H5PartGetNumSteps(fH5File);
>> fNStepAttr = H5PartGetNumStepAttribs(fH5File);
>> fNFileAttr = H5PartGetNumFileAttribs(fH5File);
>>
>> get file attribute info:
>>
>> for (Int_t n = 0; n < fNFileAttr; n++) {
>>     H5PartGetFileAttribInfo(fH5File,n,an,32,0,nElem);
>>
> Is it intentional that you don't retrieve the type of the attribute (the
> 5th parameter), or do you assume that you know the type, i.e. that all are
> of type char?
>
>>     ...
>> }
>>
>> get step attribute info for step 0 - ASSUME THEY WILL BE THE SAME
>> FOR ALL STEPS!
>>
>> H5PartSetStep(fH5File,0);
>>
>> for (Int_t n=0; n<fNStepAttr; ++n) {
>>     H5PartGetStepAttribInfo(fH5File,n,an,32,0,nElem);
>>
> Same comment about type as above.
>
>>     ...
>>     // distinguish scalar and vector attributes, ignore others:
>>     TString type("?");
>>     if (*nElem==1 ) { // scalar variable
>>         fScalarName.AddLast(new TObjString(attrName));
>>         fNScalarAttr++;
>>         type = TString(" (scalar)");
>>     } else if (*nElem==3 ) { // vector variable
>>         fVectorName.AddLast(new TObjString(attrName));
>>         fNVectorAttr++;
>>         type = TString(" (vector)");
>>     } else {
>>         ...
>>     }
>>
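The scalar/vector split above keys only on the element count returned in `nElem`. A minimal self-contained sketch of that classification, assuming the counts have already been obtained (in H5root they come from `H5PartGetStepAttribInfo` for step 0). The names `AttribKind`, `ClassifyAttrib` and `SplitAttribs` are hypothetical and belong to neither H5Part nor ROOT:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical classification mirroring the nElem test in TH5Dataset.cc:
// 1 element -> scalar, 3 elements -> vector, anything else is ignored.
enum class AttribKind { Scalar, Vector, Other };

AttribKind ClassifyAttrib(std::int64_t nElem) {
    if (nElem == 1) return AttribKind::Scalar;
    if (nElem == 3) return AttribKind::Vector;
    return AttribKind::Other;
}

// Split attribute names into scalar and vector lists, given parallel
// arrays of names and element counts (assumed already queried for step 0).
void SplitAttribs(const std::vector<std::string>& names,
                  const std::vector<std::int64_t>& nElems,
                  std::vector<std::string>& scalars,
                  std::vector<std::string>& vectors) {
    for (std::size_t i = 0; i < names.size() && i < nElems.size(); ++i) {
        switch (ClassifyAttrib(nElems[i])) {
            case AttribKind::Scalar: scalars.push_back(names[i]); break;
            case AttribKind::Vector: vectors.push_back(names[i]); break;
            case AttribKind::Other:  break;  // ignored, as in the original
        }
    }
}
```

Like the original, this assumes that the attribute layout found at step 0 holds for all steps.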
>> Read in the unit strings for the attributes found:
>>
>> for (Int_t i = 0; i < 9; i++)
>>     fPartVarUnit[i] = GetUnit(fPartVarName[i]);
>> for (Int_t i = 0; i < fNScalarAttr; i++)
>>     fScalarUnit.AddLast(new TObjString(GetUnit(
>>         static_cast<TObjString*>(fScalarName.At(i))->GetString())));
>> for (Int_t i = 0; i < fNVectorAttr; i++)
>>     fVectorUnit.AddLast(new TObjString(GetUnit(
>>         static_cast<TObjString*>(fVectorName.At(i))->GetString())));
>>
>> where GetUnit(...) is a method that contains a loop over the file attributes
>> to find the associated unit name:
>>
> In the H5Part doc I didn't find anything specific about unit names. Can
> you explain this a bit? Do you assume that attribute names always fit
> into a char[32]?
>
>> for (Int_t i = 0; i < fNFileAttr; i++) {
>>
>>     TObjString* s = static_cast<TObjString*>(fFileAttr.At(i));
>>     if (s->GetString() == varNameU) {
>>         char u[32];
>>
>>         H5PartReadFileAttrib(fH5File,const_cast<char*>(s->GetString().Data()),
>>                              &u);
>>         unit = TString(u);
>>     }
>>     ...
>> }
>>
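`GetUnit` performs a linear scan over the file attributes for every variable, so filling the unit tables costs roughly (number of variables) × (number of file attributes) name comparisons. The sketch below reproduces that lookup in isolation, assuming the file attributes have already been read into memory as name/value pairs. How the key `varNameU` is derived from the variable name is not shown in the message, so the key is simply passed in. `FileAttr`, `FindUnit` and `IndexFileAttrs` are hypothetical names:

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical in-memory copy of the file attributes: (name, string value).
using FileAttr = std::pair<std::string, std::string>;

// Linear scan, as GetUnit does: compare every file attribute name with the
// key and return the value of the match (empty string if nothing matches).
std::string FindUnit(const std::vector<FileAttr>& attrs, const std::string& key) {
    for (const auto& a : attrs) {
        if (a.first == key) return a.second;
    }
    return std::string();
}

// Alternative: index the file attributes once; each later lookup is then an
// average O(1) hash-table probe instead of a scan over all attributes.
std::unordered_map<std::string, std::string>
IndexFileAttrs(const std::vector<FileAttr>& attrs) {
    std::unordered_map<std::string, std::string> index;
    for (const auto& a : attrs) index.emplace(a.first, a.second);  // first occurrence wins
    return index;
}
```

With only 21 file attributes the scan is probably cheap compared with the per-step reads; the indexed variant would only matter if the number of attributes grew substantially.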
>> Now loop over steps and retrieve attribute and particle data for each
>> step:
>>
>> for (Int_t step=0; step<fNStep; step++) {
>>     H5PartSetStep(fH5File,step);
>>     unsigned long n = H5PartGetNumParticles(fH5File);
>>
>>     // read in scalar attributes
>>     for (int i=0; i<fNScalarAttr; ++i) {
>>         TObjString* s = static_cast<TObjString*>(fScalarName.At(i));
>>
>>         H5PartReadStepAttrib(fH5File,const_cast<char*>(s->GetString().Data()),
>>                              &val);
>>         ...
>>     }
>>
>>     // read in vector attributes
>>     for (int i=0; i<fNVectorAttr; ++i) {
>>         TObjString* s = static_cast<TObjString*>(fVectorName.At(i));
>>
>>         H5PartReadStepAttrib(fH5File,const_cast<char*>(s->GetString().Data()),
>>                              arr);
>>         ...
>>     }
>>
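The step loop issues one attribute read per scalar and per vector attribute at every step, so the number of attribute reads grows linearly with the number of steps. A minimal sketch of the same loop structure that also counts the reads, with a hypothetical `ReadAttribFn` callback standing in for `H5PartSetStep` + `H5PartReadStepAttrib` (no H5Part calls are made here; `StepData` and `ReadAllSteps` are likewise hypothetical):

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for H5PartReadStepAttrib: writes the values of
// attribute `name` at step `step` to `out` (1 value for scalars, 3 for vectors).
using ReadAttribFn = std::function<void(long step, const std::string& name, double* out)>;

struct StepData {
    std::vector<double> scalars;                 // one value per scalar attribute
    std::vector<std::array<double, 3>> vectors;  // three values per vector attribute
};

// Mirrors the per-step loop: for every step, read every scalar and every
// vector attribute. Returns the per-step data; `nReads` counts the calls.
std::vector<StepData> ReadAllSteps(long nSteps,
                                   const std::vector<std::string>& scalarNames,
                                   const std::vector<std::string>& vectorNames,
                                   const ReadAttribFn& readAttrib,
                                   long& nReads) {
    std::vector<StepData> steps;
    if (nSteps > 0) steps.reserve(static_cast<std::size_t>(nSteps));
    nReads = 0;
    for (long step = 0; step < nSteps; ++step) {
        StepData d;
        for (const auto& name : scalarNames) {
            double val = 0.0;
            readAttrib(step, name, &val);
            d.scalars.push_back(val);
            ++nReads;
        }
        for (const auto& name : vectorNames) {
            std::array<double, 3> arr{};
            readAttrib(step, name, arr.data());
            d.vectors.push_back(arr);
            ++nReads;
        }
        steps.push_back(std::move(d));
    }
    return steps;
}
```

The read count is steps × (scalar + vector step attributes), independent of how many particles each step holds.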
>> Sorry if this is too much information; I just wanted to let you know which
>> routines I am using, so you can see whether I am doing something very
>> inefficient. I have not given much thought to the choice of routines: I just
>> looked for what I needed, typically found it rather quickly, and then only
>> made sure it does what I want it to do, without measuring performance or
>> anything. The files I look at load very quickly anyway; it is only when
>> Andreas tries to simulate the whole world that he ends up waiting a couple
>> of minutes ;-)
>>
>> Now some numbers: a 3 GB file with some 500 time steps (21 file attributes,
>> 12 step attributes) takes 19.1 s to read in (the second time only 0.5 s,
>> since the file is buffered).
>>
> Is this the time for reading the attributes only? If yes, then the whole
> 3 GB is not read, only the 21*12 (char) attributes, right?
>
Just looked at your code again: In total, you read ~500 * (21+12) =
16500 attributes, right?
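Spelled out, Kurt's estimate and the same count applied to the 18263-step file mentioned further down in Thomas's message (assuming, as the estimate does, that all 21 file and 12 step attributes are read at every step):

```latex
\begin{align*}
N_{3\,\mathrm{GB}}  &\approx 500 \times (21 + 12) = 500 \times 33 = 16\,500 \\
N_{85\,\mathrm{GB}} &\approx 18\,263 \times (21 + 12) = 18\,263 \times 33 = 602\,679
\end{align*}
```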

Kurt
> Can you send me a pointer to the file to download so that we can look at
> this together? This would also help me understand your performance a bit
> better.
>
> Thanks,
> Kurt
>
>> An 85 GB file with 18263 time steps (same number of attributes) takes
>> 29:38.52 to read in (half an hour). That's where it starts to hurt!
>> It should be noted that even a simple h5dump struggles with that file and
>> doesn't produce any output for several minutes (I am still waiting, in fact).
>> (These figures are for our merlin00 machine at PSI.)
>>
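A rough per-step comparison of the two timings above, reading 29:38.52 as minutes:seconds and taking "some 500" steps as exactly 500. The message does not say whether the 85 GB run started from a cold cache, so this is only indicative:

```latex
\begin{align*}
t_{\mathrm{step}}^{(3\,\mathrm{GB})}  &= \frac{19.1\ \mathrm{s}}{500} \approx 38.2\ \mathrm{ms} \\
t_{\mathrm{step}}^{(85\,\mathrm{GB})} &= \frac{29 \cdot 60\ \mathrm{s} + 38.52\ \mathrm{s}}{18\,263}
  = \frac{1778.52\ \mathrm{s}}{18\,263} \approx 97.4\ \mathrm{ms}
\end{align*}
```

On these two data points the cost per step is about 2.5 times higher for the larger file, so the total time grows faster than the number of steps alone would suggest.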
>> Achim suggested using H5PartGetStepAttribInfo instead of H5PartReadStepAttrib,
>> but I don't see how I can replace the functionality of ReadStepAttrib with
>> GetStepAttribInfo, i.e. reading a value...
>>
>> Regards,
>>
>> Thomas
>>
>> _______________________________________________
>> H5Part mailing list
>> H5Part AT lists.psi.ch
>> https://lists.web.psi.ch/mailman/listinfo/h5part
>>
>
>
>


--
Kurt Stockinger
Computational Research Division
Lawrence Berkeley National Laboratory
Mail Stop 50B-3238, 1 Cyclotron Road
Berkeley, California 94720, USA

Tel: +1 (510) 486 5208, Fax: +1 (510) 486 4004
email: KStockinger AT lbl.gov
http://sdm.lbl.gov/kurts/




