h5part - [H5part] H5Part performance problem

h5part AT lists.psi.ch

Subject: H5Part development and discussion

  • From: Thomas Schietinger <thomas.schietinger AT psi.ch>
  • To: h5part AT lists.psi.ch
  • Subject: [H5part] H5Part performance problem
  • Date: Tue, 05 Dec 2006 10:58:44 +0100
  • List-archive: <https://lists.web.psi.ch/pipermail/h5part/>
  • List-id: H5Part development and discussion <h5part.lists.psi.ch>

Dear all,

let me add some more info on the performance problem reported
at yesterday's phone meeting.

The routines I use for reading in an HDF5 file in H5root are:
(see file TH5Dataset.cc)

get the file handle:

fH5File = H5PartOpenFile(fFullFilename.Data(),H5PART_READ);

(fFullFilename.Data() evaluates to a char string)

get the number of datasets:

H5PartSetStep(fH5File,0);
Int_t nDataset = H5PartGetNumDatasets(fH5File);

get the dataset names:

for (Int_t i = 0; i < nDataset; i++) {
  H5PartGetDatasetName(fH5File,i,name,maxLength);
  ...
}

get the number of steps, step attributes and file attributes:

fNStep = H5PartGetNumSteps(fH5File);
fNStepAttr = H5PartGetNumStepAttribs(fH5File);
fNFileAttr = H5PartGetNumFileAttribs(fH5File);

get file attribute info:

for (Int_t n = 0; n < fNFileAttr; n++) {
  H5PartGetFileAttribInfo(fH5File,n,an,32,0,nElem);
  ...
}

get step attribute info for step 0 - ASSUME THEY WILL BE THE SAME
FOR ALL STEPS!

H5PartSetStep(fH5File,0);

for (Int_t n=0; n<fNStepAttr; ++n) {
  H5PartGetStepAttribInfo(fH5File,n,an,32,0,nElem);
  ...
  // distinguish scalar and vector attributes, ignore others:
  TString type("?");
  if (*nElem==1 ) { // scalar variable
    fScalarName.AddLast(new TObjString(attrName));
    fNScalarAttr++;
    type = TString(" (scalar)");
  } else if (*nElem==3 ) { // vector variable
    fVectorName.AddLast(new TObjString(attrName));
    fNVectorAttr++;
    type = TString(" (vector)");
  } else {
    ...
  }
}

Read in the unit strings for the attributes found:

for (Int_t i = 0; i < 9; i++) fPartVarUnit[i] = GetUnit(fPartVarName[i]);
for (Int_t i = 0; i < fNScalarAttr; i++)
  fScalarUnit.AddLast(new TObjString(GetUnit(
      static_cast<TObjString*>(fScalarName.At(i))->GetString())));
for (Int_t i = 0; i < fNVectorAttr; i++)
  fVectorUnit.AddLast(new TObjString(GetUnit(
      static_cast<TObjString*>(fVectorName.At(i))->GetString())));

where GetUnit(...) is a method that contains a loop over the file attributes
to find the associated unit name:

for (Int_t i = 0; i < fNFileAttr; i++) {
  TObjString* s = static_cast<TObjString*>(fFileAttr.At(i));
  if (s->GetString() == varNameU) {
    char u[32];
    H5PartReadFileAttrib(fH5File,const_cast<char*>(s->GetString().Data()),
                         &u);
    unit = TString(u);
  }
  ...
}
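
A note on the GetUnit loop above: it scans the file attributes again for every
variable. With 21 file attributes this is probably a small cost next to the
per-step loop, but the lookups can be done once and kept in a map. The sketch
below is only an illustration under stated assumptions: AttribReader,
BuildUnitCache and LookupUnit are hypothetical names, and the reader callback
merely stands in for a call such as H5PartReadFileAttrib. None of this is
part of the H5Part API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a call such as H5PartReadFileAttrib: given an
// attribute name, return its value as a string. Not part of H5Part.
using AttribReader = std::function<std::string(const std::string&)>;

// Read every listed file attribute exactly once and remember the result.
std::map<std::string, std::string> BuildUnitCache(
    const std::vector<std::string>& attribNames, const AttribReader& read) {
  assert(static_cast<bool>(read) && "reader callback must be set");
  std::map<std::string, std::string> cache;
  for (const std::string& name : attribNames) {
    cache.emplace(name, read(name));
  }
  return cache;
}

// Look up a unit without touching the file again; returns the fallback
// string when the attribute was not in the cache.
std::string LookupUnit(const std::map<std::string, std::string>& cache,
                       const std::string& attribName,
                       const std::string& fallback = "?") {
  const auto it = cache.find(attribName);
  return it == cache.end() ? fallback : it->second;
}
```

After the cache is built, each GetUnit-style query becomes a map lookup and
the file is not read again for units.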

Now loop over steps and retrieve attribute and particle data for each step:

for (Int_t step=0; step<fNStep; step++) {
  H5PartSetStep(fH5File,step);
  unsigned long n = H5PartGetNumParticles(fH5File);

  // read in scalar attributes
  for (int i=0; i<fNScalarAttr; ++i) {
    TObjString* s = static_cast<TObjString*>(fScalarName.At(i));
    H5PartReadStepAttrib(fH5File,const_cast<char*>(s->GetString().Data()),
                         &val);
    ...
  }

  // read in vector attributes
  for (int i=0; i<fNVectorAttr; ++i) {
    TObjString* s = static_cast<TObjString*>(fVectorName.At(i));
    H5PartReadStepAttrib(fH5File,const_cast<char*>(s->GetString().Data()),
                         arr);
    ...
  }
}

Sorry if this is too much information; I just wanted to let you know which
routines I am using, so you can see if I am doing something very inefficient.
I have not given much thought to the choice of routines: I just looked for
what I needed, typically found it rather quickly, and then only made sure it
does what I want it to do, without measuring performance or anything. The
files I look at load very quickly anyway; it is only when Andreas tries to
simulate the whole world that he ends up waiting a couple of minutes ;-)
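
Since the routines have not been timed individually yet, a small wall-clock
accumulator may help to see which call dominates. The following is a minimal
sketch, not a definitive tool: PhaseTimer and its methods are hypothetical
names introduced here, and the work passed to Measure is a placeholder for
the real H5Part calls.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <map>
#include <string>

// Accumulates wall-clock time and call counts per labelled phase.
class PhaseTimer {
 public:
  // Runs work() and adds its elapsed time to the total for this label.
  void Measure(const std::string& label, const std::function<void()>& work) {
    assert(static_cast<bool>(work) && "work callback must be set");
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    Entry& e = totals_[label];
    e.seconds += std::chrono::duration<double>(stop - start).count();
    e.calls += 1;
  }

  // Total seconds spent under a label (0 if the label was never used).
  double Seconds(const std::string& label) const {
    const auto it = totals_.find(label);
    return it == totals_.end() ? 0.0 : it->second.seconds;
  }

  // Number of Measure calls made under a label.
  long Calls(const std::string& label) const {
    const auto it = totals_.find(label);
    return it == totals_.end() ? 0 : it->second.calls;
  }

 private:
  struct Entry {
    double seconds = 0.0;
    long calls = 0;
  };
  std::map<std::string, Entry> totals_;
};
```

In the step loop one could, for instance, wrap H5PartSetStep under one label
and the two attribute-reading loops under another, then print Seconds() and
Calls() for each label after the loop to see where the time goes.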

Now some numbers: a 3 GB file with some 500 time steps (21 file attributes,
12 step attributes) takes 19.1 s to read in (the second time only 0.5 s,
since the file is then buffered). An 85 GB file with 18263 time steps (same
number of attributes) takes 29:38.52 to read in (half an hour). That's where
it starts to hurt! It should be noted that a simple h5dump also groans under
that file and won't produce anything before several minutes (I am still
waiting, in fact). (These figures are for our merlin00 machine at PSI.)
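
To put the two measurements side by side, here is a small worked calculation
using only the figures quoted above. It is a rough estimate, since "some 500"
steps is itself approximate and only two data points are available; the
function and constant names are made up for this illustration. The first
file comes out at about 38 ms per step and the second at about 97 ms per
step, i.e. the read time grows roughly like the number of steps to the power
1.26 rather than linearly.

```cpp
#include <cassert>
#include <cmath>

// Average wall-clock time per step, in seconds.
double SecondsPerStep(double totalSeconds, long steps) {
  assert(steps > 0);
  return totalSeconds / static_cast<double>(steps);
}

// Exponent p such that time ~ steps^p between two measurements.
// p == 1 would mean the read time grows linearly with the step count.
double ScalingExponent(double t1, long n1, double t2, long n2) {
  assert(t1 > 0 && t2 > 0 && n1 > 0 && n2 > 0 && n1 != n2);
  return std::log(t2 / t1) /
         std::log(static_cast<double>(n2) / static_cast<double>(n1));
}

// Figures quoted in the message.
const double kSmallSeconds = 19.1;             // 3 GB file, first read
const long kSmallSteps = 500;                  // "some 500", approximate
const double kLargeSeconds = 29 * 60 + 38.52;  // 29:38.52 = 1778.52 s
const long kLargeSteps = 18263;
```

Calling SecondsPerStep on each pair and ScalingExponent on both gives the
numbers quoted in the paragraph above.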

Achim suggested using H5PartGetStepAttribInfo instead of
H5PartReadStepAttrib, but I don't see how I can replace the functionality of
ReadStepAttrib with GetStepAttribInfo, i.e. read a value...

Regards,

Thomas
