Re: [Opal] overhead


  • From: "Adelmann Andreas (PSI)" <andreas.adelmann AT psi.ch>
  • To: Nicole Neveu <nneveu AT hawk.iit.edu>
  • Cc: "opal AT lists.psi.ch" <opal AT lists.psi.ch>, Christof Metzger-Kraus <christof.j.kraus AT gmail.com>
  • Subject: Re: [Opal] overhead
  • Date: Tue, 17 Jul 2018 05:48:07 +0000

Just for the record: 

Wall time is ~30% larger than CPU time, but only on KNL. This hints at a KNL-specific problem;
hopefully we will hear soon from the ANL consultants.
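For reference, the wall/CPU gap can be tabulated directly from the mainTimer lines in the *.out files. A minimal sketch in the same grep/awk style as the commands quoted below, assuming the line format shown there:

# print the wall/CPU ratio per run from the mainTimer lines
# (a ratio well above 1.0 means wall-clock time not counted as CPU time)
grep -r --include='*.out' -e "mainTimer" . \
  | awk -F'=' '{ split($2, a, ","); wall = a[1]; cpu = $3;
                 file = substr($1, 1, index($1, ":") - 1);
                 printf "%-55s wall/cpu = %.2f\n", file, wall / cpu }'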

Cheers Andreas 



------
Dr. sc. math. Andreas (Andy) Adelmann
Head a.i. Labor for Scientific Computing and Modelling 
Paul Scherrer Institut OHSA/D17 CH-5232 Villigen PSI
Phone Office: xx41 56 310 42 33 Fax: xx41 56 310 31 91
Phone Home: xx41 62 891 91 44
-------------------------------------------------------
Friday: ETH HPK G 28   +41 44 632 75 22
============================================
The more exotic, the more abstract the knowledge, 
the more profound will be its consequences.
Leon Lederman 
============================================

On 17 Jul 2018, at 00:11, Nicole Neveu <nneveu AT hawk.iit.edu> wrote:

Hi Christof and Andreas,

HDF5 output is off for those runs. I was hoping it was the flag...

I rebuilt making sure -O3 was forced, and didn't see any timing difference there.
I'm starting to suspect my physics setup, but I also noticed something weird:
the BDW (Broadwell) nodes do not show the same problem/overhead.
It's only on KNL. Does this point to the compiler or toolchain?

Also, none of the values in the KNL timing file seem odd except the main timer (see attached file).
I think it's safe to say it's not the stat or h5 writes.
What could be slowing down on KNL only?
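One quick cross-check along those lines, as a sketch: sum the other timers and compare against mainTimer. This assumes every timer line in the *.out file follows the same "Wall tot = X, CPU tot = Y" format as mainTimer, and timers can nest, so the sum is only a rough bound:

# how much of mainTimer's wall time is unaccounted for by the other timers?
awk '/Wall tot/ {
       split($0, a, "="); split(a[2], b, ","); w = b[1] + 0;
       if ($0 ~ /mainTimer/) main = w; else sum += w
     }
     END { printf "mainTimer = %.1f s, other timers = %.1f s, gap = %.1f s\n",
           main, sum, main - sum }' optLinac_40nC.569053.knld-0046.out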

Thanks,

Nicole


On Sat, Jul 14, 2018 at 5:59 AM, Adelmann Andreas (PSI) <andreas.adelmann AT psi.ch> wrote:
I agree, the space charge solver has not changed.
Please check the compiler flags and make sure you compile with -O3.
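One way to double-check what actually reached the compiler, as a sketch (assuming a CMake build tree; the cache variables below are standard CMake, nothing OPAL-specific):

# inspect the flags recorded in the build directory's CMake cache
grep -E 'CMAKE_(C|CXX)_FLAGS|CMAKE_BUILD_TYPE' CMakeCache.txt

# or force one file to recompile verbosely and read the full command line
touch src/some_source.cpp   # hypothetical path, just to trigger a rebuild
make VERBOSE=1 | grep -m1 -- '-O'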
Cheers Andreas 
------
Dr. sc. math. Andreas (Andy) Adelmann
Head a.i. Labor for Scientific Computing and Modelling 
Paul Scherrer Institut OHSA/D17 CH-5232 Villigen PSI
Phone Office: xx41 56 310 42 33 Fax: xx41 56 310 31 91
Phone Home: xx41 62 891 91 44
-------------------------------------------------------
Friday: ETH HPK G 28   +41 44 632 75 22
============================================
The more exotic, the more abstract the knowledge, 
the more profound will be its consequences.
Leon Lederman 
============================================



On 13 Jul 2018, at 08:22, Christof Metzger-Kraus <christof.j.kraus AT gmail.com> wrote:

Hi Nicole, 

I have to analyze your numbers in detail. Your description suggests that the space charge solver behaves differently in version 2.0. As far as I know, this part of the code is still the same as in version 1.9.

Christof 


Nicole Neveu <nneveu AT hawk.iit.edu> wrote on Fri, 13 Jul 2018 at 06:28:
Hi All,

I typically run small cases on 8 cores, and I'm noticing more overhead in 2.0 than in 1.9 for xy parallelization:

1.9 vs 2.0 xy parallelization on 8 cores:
[neveu@beboplogin1 old_opal]$ grep --include=*.out -rnw './' -e "mainTimer"
2.0-> ./standalone_newopal/optLinac_40nC.569053.knld-0046.out:315:Timings{0}> mainTimer........... Wall tot =    570.886, CPU tot =     348.68
1.9-> ./standalone_myopal/optLinac_40nC.569040.knld-0037.out:295:Timings{0}> mainTimer........... Wall tot =    313.199, CPU tot =     313.04


2.0 z parallel only:
[neveu@beboplogin1 rand_sample_small_test]$ grep --include=*.out -rnw './' -e "mainTimer"
./optLinac_40nC_CORES=1/optLinac_40nC.569081.knld-0043.out:217:Timings> mainTimer........... Wall tot =     938.19, CPU tot =     935.99
./optLinac_40nC_CORES=2/optLinac_40nC.569080.knld-0042.out:303:Timings{0}> mainTimer........... Wall tot =    632.148, CPU tot =     631.86
./optLinac_40nC_CORES=4/optLinac_40nC.569079.knld-0041.out:307:Timings{0}> mainTimer........... Wall tot =    491.218, CPU tot =     432.94
./optLinac_40nC_CORES=8/optLinac_40nC.569078.knld-0037.out:315:Timings{0}> mainTimer........... Wall tot =    504.177, CPU tot =     329.67

2.0 xy parallel only:
./optLinac_40nC_CORES=1/optLinac_40nC.569059.knld-0041.out:217:Timings> mainTimer........... Wall tot =    934.293, CPU tot =      930.2
./optLinac_40nC_CORES=2/optLinac_40nC.569050.knld-0043.out:303:Timings{0}> mainTimer........... Wall tot =    654.175, CPU tot =     654.01
./optLinac_40nC_CORES=4/optLinac_40nC.569049.knld-0042.out:307:Timings{0}> mainTimer........... Wall tot =    522.647, CPU tot =     461.41
./optLinac_40nC_CORES=8/optLinac_40nC.569048.knld-0041.out:315:Timings{0}> mainTimer........... Wall tot =    520.827, CPU tot =     383.14
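
As a quick back-of-the-envelope on the wall times above, the 8-core speedups relative to the 1-core runs are:

echo "scale=2; 938.19  / 504.177" | bc   # z  parallel: ~1.86
echo "scale=2; 934.293 / 520.827" | bc   # xy parallel: ~1.79
# against an ideal of 8.00, i.e. roughly 23% parallel efficiency,
# with the wall-vs-CPU gap growing as cores are added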


Did something change in the way the parallel decomposition is done?

Thanks,

Nicole



<timing.dat>



