opal AT lists.psi.ch
Subject: The OPAL Discussion Forum
List archive
- From: Nicole Neveu <nneveu AT hawk.iit.edu>
- To: "Adelmann Andreas (PSI)" <andreas.adelmann AT psi.ch>
- Cc: opal <opal AT lists.psi.ch>, Christof Metzger-Kraus <christof.j.kraus AT gmail.com>
- Subject: Re: [Opal] overhead
- Date: Mon, 16 Jul 2018 17:11:34 -0500
Hi Christof and Andreas,
HDF5 output is off for those runs. I was hoping it was the flag...
I rebuilt making sure -O3 was forced, and didn't see any timing difference there.
I'm starting to suspect my physics setup, but also noticed something weird.
The BDW nodes do not have the same problem/overhead.
It's only on KNL. Does this point to the compiler or tool chain?
Also, none of the values in the KNL timing file seem odd except the main timer (see attached file).
I think it's safe to say it's not stat or h5 write.
What could be slowing down on KNL only?
Thanks,
Nicole
On Sat, Jul 14, 2018 at 5:59 AM, Adelmann Andreas (PSI) <andreas.adelmann AT psi.ch> wrote:
I agree the space charge solver has not changed.
Please check the compiler flags and make sure you compile with -O3.
Cheers, Andreas
------
Dr. sc. math. Andreas (Andy) Adelmann
Head a.i. Labor for Scientific Computing and Modelling
Paul Scherrer Institut OHSA/D17 CH-5232 Villigen PSI
Phone Office: xx41 56 310 42 33 Fax: xx41 56 310 31 91
Phone Home: xx41 62 891 91 44
-------------------------------------------------------
Friday: ETH HPK G 28 +41 44 632 75 22
============================================
The more exotic, the more abstract the knowledge,
the more profound will be its consequences.
Leon Lederman
============================================
On 13 Jul 2018, at 08:22, Christof Metzger-Kraus <christof.j.kraus AT gmail.com> wrote:
Hi Nicole,
I have to analyze your numbers in detail. Your description suggests that the space charge solver behaves differently in version 2.0. As far as I know, this part of the code is still the same as in version 1.9.
Christof
Nicole Neveu <nneveu AT hawk.iit.edu> wrote on Fri, 13 July 2018, 06:28:
Hi All,
I typically run small cases on 8 cores, and I'm noticing more overhead on 2.0 than 1.9 for xy parallelization:
1.9 vs 2.0 xy parallelization on 8 cores:
[neveu@beboplogin1 old_opal]$ grep --include=*.out -rnw './' -e "mainTimer"
2.0-> ./standalone_newopal/optLinac_40nC.569053.knld-0046.out:315:Timings{0}> mainTimer........... Wall tot = 570.886, CPU tot = 348.68
1.9-> ./standalone_myopal/optLinac_40nC.569040.knld-0037.out:295:Timings{0}> mainTimer........... Wall tot = 313.199, CPU tot = 313.04
2.0 z parallel only:
[neveu@beboplogin1 rand_sample_small_test]$ grep --include=*.out -rnw './' -e "mainTimer"
./optLinac_40nC_CORES=1/optLinac_40nC.569081.knld-0043.out:217:Timings> mainTimer........... Wall tot = 938.19, CPU tot = 935.99
./optLinac_40nC_CORES=2/optLinac_40nC.569080.knld-0042.out:303:Timings{0}> mainTimer........... Wall tot = 632.148, CPU tot = 631.86
./optLinac_40nC_CORES=4/optLinac_40nC.569079.knld-0041.out:307:Timings{0}> mainTimer........... Wall tot = 491.218, CPU tot = 432.94
./optLinac_40nC_CORES=8/optLinac_40nC.569078.knld-0037.out:315:Timings{0}> mainTimer........... Wall tot = 504.177, CPU tot = 329.67
2.0 xy parallel only:
./optLinac_40nC_CORES=1/optLinac_40nC.569059.knld-0041.out:217:Timings> mainTimer........... Wall tot = 934.293, CPU tot = 930.2
./optLinac_40nC_CORES=2/optLinac_40nC.569050.knld-0043.out:303:Timings{0}> mainTimer........... Wall tot = 654.175, CPU tot = 654.01
./optLinac_40nC_CORES=4/optLinac_40nC.569049.knld-0042.out:307:Timings{0}> mainTimer........... Wall tot = 522.647, CPU tot = 461.41
./optLinac_40nC_CORES=8/optLinac_40nC.569048.knld-0041.out:315:Timings{0}> mainTimer........... Wall tot = 520.827, CPU tot = 383.14
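To make the overhead in the listings above easier to compare, the mainTimer lines can be tabulated with a short script. The following is only a sketch: the function names `parse_timings` and `summarize` are hypothetical helpers, not part of OPAL, and the regular expression assumes the exact `CORES=N/... mainTimer... Wall tot = ..., CPU tot = ...` layout shown in the grep output. It reports the speedup relative to the 1-core run and the difference between wall and CPU time for each run.

```python
import re

# Matches the mainTimer lines exactly as they appear in the grep output above.
# The CORES=<n> directory name is assumed to give the number of cores used.
TIMER_RE = re.compile(
    r"CORES=(?P<cores>\d+)/.*mainTimer\.+\s*"
    r"Wall tot =\s*(?P<wall>[\d.]+),\s*CPU tot =\s*(?P<cpu>[\d.]+)"
)


def parse_timings(text):
    """Return {cores: (wall_seconds, cpu_seconds)} for every mainTimer line."""
    timings = {}
    for line in text.splitlines():
        match = TIMER_RE.search(line)
        if match:
            timings[int(match["cores"])] = (
                float(match["wall"]),
                float(match["cpu"]),
            )
    return timings


def summarize(timings):
    """Compute speedup, parallel efficiency, and wall-minus-CPU time per run."""
    base_wall = timings[1][0]  # the 1-core wall time is the reference
    rows = []
    for cores in sorted(timings):
        wall, cpu = timings[cores]
        rows.append({
            "cores": cores,
            "speedup": base_wall / wall,
            "efficiency": base_wall / (wall * cores),
            "wall_minus_cpu": wall - cpu,
        })
    return rows


# The 2.0 "xy parallel only" lines quoted above.
XY_LOG = """\
./optLinac_40nC_CORES=1/optLinac_40nC.569059.knld-0041.out:217:Timings> mainTimer........... Wall tot = 934.293, CPU tot = 930.2
./optLinac_40nC_CORES=2/optLinac_40nC.569050.knld-0043.out:303:Timings{0}> mainTimer........... Wall tot = 654.175, CPU tot = 654.01
./optLinac_40nC_CORES=4/optLinac_40nC.569049.knld-0042.out:307:Timings{0}> mainTimer........... Wall tot = 522.647, CPU tot = 461.41
./optLinac_40nC_CORES=8/optLinac_40nC.569048.knld-0041.out:315:Timings{0}> mainTimer........... Wall tot = 520.827, CPU tot = 383.14
"""

if __name__ == "__main__":
    for row in summarize(parse_timings(XY_LOG)):
        print(
            f"{row['cores']:>2} cores: speedup {row['speedup']:.2f}, "
            f"efficiency {row['efficiency']:.2f}, "
            f"wall - CPU = {row['wall_minus_cpu']:.1f} s"
        )
```

For the xy numbers, the wall-minus-CPU difference stays below 5 s on 1 and 2 cores and grows to roughly 61 s on 4 cores and 138 s on 8 cores. This suggests the extra time is spent outside of computation (for example waiting), although the timings alone do not say where.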
Did something change with the way decomposition is done?
Thanks,
Nicole
Attachment:
timing.dat
Description: Binary data