opal - Re: [Opal] Scaling OPAL to larger # of cores

opal AT lists.psi.ch

Subject: The OPAL Discussion Forum

  • From: Nicole Neveu <nneveu AT hawk.iit.edu>
  • To: christof kraus <christof.j.kraus AT gmail.com>
  • Cc: opal <opal AT lists.psi.ch>
  • Subject: Re: [Opal] Scaling OPAL to larger # of cores
  • Date: Tue, 22 Nov 2016 10:11:24 -0600

Hi Christof, 

So after I looked at the log files, I realized the 6 min run time was for a different phase.
Attached is the timing info for the 16 and 32 core runs with all the same settings except the parallel direction. 
I get errors if I try to run 32 cores with the z direction parallel.

The difference surprised me: 
16 core = 5 min 14 sec
32 core = 5 min 05 sec
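
As a rough illustration of what these two walltimes mean for scaling, the sketch below computes the speedup and the relative parallel efficiency from the numbers quoted above. The helper name `to_seconds` is made up for this example; it is plain arithmetic and not part of OPAL.

```python
# Walltimes quoted above, converted to seconds.
def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) walltime to seconds."""
    return 60 * minutes + seconds

t16 = to_seconds(5, 14)   # 16-core run
t32 = to_seconds(5, 5)    # 32-core run

# Speedup of the 32-core run relative to the 16-core run.
speedup = t16 / t32

# Relative efficiency: speedup divided by the factor by which
# the core count was increased (32 / 16 = 2).
efficiency = speedup / (32 / 16)

print(f"speedup = {speedup:.3f}, relative efficiency = {efficiency:.1%}")
```

Doubling the core count gives a speedup of only about 1.03 here, so roughly half of the added resources go unused on this test case.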

Thanks!

Nicole

On Tue, Nov 22, 2016 at 12:42 AM, christof kraus <christof.j.kraus AT gmail.com> wrote:
Hi Nicole,

could you send me the last few lines of the output (lines starting with 'Timing>') for both cases?

christof

On Mon, Nov 21, 2016 at 8:16 PM, Nicole Neveu <nneveu AT hawk.iit.edu> wrote:
Hi All, 

I have a question about scaling simulations up. What can I do to ensure a run on more cores is faster and more efficient? Are there some general changes to the OPAL input file that help with the transition, e.g. SC grid, parallelization, etc.?

When trying this on a small test case (16 to 32 cores), so far I only changed my parallelization (based on an old email from Andreas).

16 core run: 
FS_SC: Fieldsolver, FSTYPE = FFT, 
MX = 32, MY = 32, MT = 32,
PARFFTX = false, 
PARFFTY = false, 
PARFFTT = true,

32 core run: 
FS_SC: Fieldsolver, FSTYPE = FFT, 
MX = 32, MY = 32, MT = 32, 
PARFFTX = true, 
PARFFTY = true, 
PARFFTT = false,

I see a 1 minute difference in this case (6 min to 5 min).
What else could I do to help with the walltime?  
Or is this near the limit because the test problem is so small?

Thanks! 

Nicole


Timings{0}> -----------------------------------------------------------------
Timings{0}> Timing results for 32 nodes:
Timings{0}> -----------------------------------------------------------------
Timings{0}> mainTimer........... Wall tot = 300.758, CPU tot = 296.02
Timings{0}>
Timings{0}> AP: field evaluatio. Wall max = 0.004818, CPU max = 0.01
Timings{0}> Wall avg = 0.000150563, CPU avg = 0.0003125
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> AP: time integratio. Wall max = 0.007899, CPU max = 0
Timings{0}> Wall avg = 0.000246844, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Binaryrepart........ Wall max = 0.145679, CPU max = 0.16
Timings{0}> Wall avg = 0.144123, CPU avg = 0.123437
Timings{0}> Wall min = 0.134339, CPU min = 0.09
Timings{0}>
Timings{0}> Boundingbox......... Wall max = 6.62576, CPU max = 7.2
Timings{0}> Wall avg = 6.50314, CPU avg = 6.49094
Timings{0}> Wall min = 6.34977, CPU min = 5.86
Timings{0}>
Timings{0}> ComputePotential.... Wall max = 84.5662, CPU max = 84.58
Timings{0}> Wall avg = 83.737, CPU avg = 83.2778
Timings{0}> Wall min = 82.4327, CPU min = 80.67
Timings{0}>
Timings{0}> Create Distr........ Wall max = 8.01616, CPU max = 8
Timings{0}> Wall avg = 8.00246, CPU avg = 7.98938
Timings{0}> Wall min = 7.98854, CPU min = 7.98
Timings{0}>
Timings{0}> Fast inside test.... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Fieldeval........... Wall max = 11.8855, CPU max = 12.04
Timings{0}> Wall avg = 10.1749, CPU avg = 10.3362
Timings{0}> Wall min = 8.1999, CPU min = 8.3
Timings{0}>
Timings{0}> H5PartTimer......... Wall max = 9.48303, CPU max = 5.93
Timings{0}> Wall avg = 9.44561, CPU avg = 5.64875
Timings{0}> Wall min = 9.41114, CPU min = 5.26
Timings{0}>
Timings{0}> Histogram........... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Initialize geometry. Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Inside test......... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Load Distr.......... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Particle Inside..... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Ray tracing......... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> RC-CC1Dfft.......... Wall max = 4.68195, CPU max = 4.86
Timings{0}> Wall avg = 4.0449, CPU avg = 4.06781
Timings{0}> Wall min = 3.8521, CPU min = 3.66
Timings{0}>
Timings{0}> RC-CCtranspose...... Wall max = 50.3237, CPU max = 50.64
Timings{0}> Wall avg = 48.1107, CPU avg = 48.0537
Timings{0}> Wall min = 45.9717, CPU min = 44.72
Timings{0}>
Timings{0}> RC-RC1Dfft.......... Wall max = 2.97123, CPU max = 3.35
Timings{0}> Wall avg = 2.94408, CPU avg = 2.8425
Timings{0}> Wall min = 2.89989, CPU min = 2.41
Timings{0}>
Timings{0}> RC-RRtranspose...... Wall max = 15.3354, CPU max = 15.27
Timings{0}> Wall avg = 12.728, CPU avg = 12.6725
Timings{0}> Wall min = 9.676, CPU min = 9.29
Timings{0}>
Timings{0}> RC-total............ Wall max = 71.4212, CPU max = 71.41
Timings{0}> Wall avg = 68.2156, CPU avg = 67.9878
Timings{0}> Wall min = 64.627, CPU min = 63.88
Timings{0}>
Timings{0}> Secondary emission.. Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> SelfField total..... Wall max = 132.036, CPU max = 132.12
Timings{0}> Wall avg = 128.822, CPU avg = 128.557
Timings{0}> Wall min = 125.867, CPU min = 125.11
Timings{0}>
Timings{0}> SF: GreensFTotal.... Wall max = 47.2673, CPU max = 47.5
Timings{0}> Wall avg = 46.6207, CPU avg = 46.4644
Timings{0}> Wall min = 45.7379, CPU min = 45.5
Timings{0}>
Timings{0}> SF: IntGreenF1...... Wall max = 7.55262, CPU max = 6.68
Timings{0}> Wall avg = 5.9067, CPU avg = 5.73031
Timings{0}> Wall min = 5.4701, CPU min = 4.9
Timings{0}>
Timings{0}> SF: IntGreenF2...... Wall max = 10.5682, CPU max = 10.49
Timings{0}> Wall avg = 6.61645, CPU avg = 6.50031
Timings{0}> Wall min = 1.87389, CPU min = 1.68
Timings{0}>
Timings{0}> SF: IntGreenF3...... Wall max = 26.5837, CPU max = 26.77
Timings{0}> Wall avg = 25.0645, CPU avg = 25.0838
Timings{0}> Wall min = 23.1955, CPU min = 22.98
Timings{0}>
Timings{0}> SF: MirrorRho1...... Wall max = 8.29078, CPU max = 8.65
Timings{0}> Wall avg = 4.41462, CPU avg = 4.49937
Timings{0}> Wall min = 1.68001, CPU min = 1.68
Timings{0}>
Timings{0}> SF: MirrorRho2...... Wall max = 0.804512, CPU max = 0.76
Timings{0}> Wall avg = 0.513592, CPU avg = 0.485625
Timings{0}> Wall min = 0.239989, CPU min = 0.23
Timings{0}>
Timings{0}> SF: Potential....... Wall max = 92.7865, CPU max = 92.78
Timings{0}> Wall avg = 91.8932, CPU avg = 91.3728
Timings{0}> Wall min = 90.4418, CPU min = 88.74
Timings{0}>
Timings{0}> SF: ShIntGreenF1.... Wall max = 0.629902, CPU max = 0.62
Timings{0}> Wall avg = 0.479041, CPU avg = 0.479375
Timings{0}> Wall min = 0.446493, CPU min = 0.33
Timings{0}>
Timings{0}> SF: ShIntGreenF2.... Wall max = 0.63397, CPU max = 0.8
Timings{0}> Wall avg = 0.479042, CPU avg = 0.45125
Timings{0}> Wall min = 0.446217, CPU min = 0.33
Timings{0}>
Timings{0}> SF: ShIntGreenF3.... Wall max = 1.23779, CPU max = 1.31
Timings{0}> Wall avg = 0.947167, CPU avg = 0.985625
Timings{0}> Wall min = 0.497403, CPU min = 0.51
Timings{0}>
Timings{0}> SF: ShIntGreenF4.... Wall max = 2.33835, CPU max = 2.44
Timings{0}> Wall avg = 2.14176, CPU avg = 2.19562
Timings{0}> Wall min = 1.99385, CPU min = 1.92
Timings{0}>
Timings{0}> Statistics.......... Wall max = 22.8767, CPU max = 23.66
Timings{0}> Wall avg = 21.862, CPU avg = 21.9663
Timings{0}> Wall min = 20.6187, CPU min = 20.4
Timings{0}>
Timings{0}> StatMarkerTimer..... Wall max = 13.2213, CPU max = 12.76
Timings{0}> Wall avg = 9.83543, CPU avg = 9.90062
Timings{0}> Wall min = 9.55846, CPU min = 9.33
Timings{0}>
Timings{0}> TIntegration1....... Wall max = 17.974, CPU max = 18.65
Timings{0}> Wall avg = 17.7899, CPU avg = 17.7
Timings{0}> Wall min = 17.6067, CPU min = 16.63
Timings{0}>
Timings{0}> TIntegration1Loop1.. Wall max = 11.499, CPU max = 12.11
Timings{0}> Wall avg = 11.3574, CPU avg = 11.2572
Timings{0}> Wall min = 11.2614, CPU min = 10.65
Timings{0}>
Timings{0}> TIntegration1Loop2.. Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> TIntegration2....... Wall max = 22.1891, CPU max = 23.2
Timings{0}> Wall avg = 22.0634, CPU avg = 22.0503
Timings{0}> Wall min = 21.962, CPU min = 21.1
Timings{0}>
Timings{0}> TIntegration2Loop1.. Wall max = 11.3764, CPU max = 12.09
Timings{0}> Wall avg = 11.2245, CPU avg = 11.3209
Timings{0}> Wall min = 11.0921, CPU min = 10.39
Timings{0}>
Timings{0}> TIntegration2Loop2.. Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> WakeField........... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> -----------------------------------------------------------------
Finished at: Mon Nov 21 13:05:10 CST 2016
##############=- Blues Job Resource Usage -=##############
Job ID: 1499055.bmgt1.lcrc.anl.gov
User ID: neveu
Group ID: collab
Job Name: optgun32coretest
Session ID: 20996
Resources: neednodes=2:ppn=16,nodes=2:ppn=16,size=32,walltime=00:10:00
Resources Used: cput=02:38:26,mem=12836kb,vmem=118776kb,walltime=00:05:05
Queue: batch
Account: AWA-beam-dynamics
##########################################################
Timings{0}> -----------------------------------------------------------------
Timings{0}> Timing results for 16 nodes:
Timings{0}> -----------------------------------------------------------------
Timings{0}> mainTimer........... Wall tot = 310.754, CPU tot = 307.86
Timings{0}>
Timings{0}> AP: field evaluatio. Wall max = 0.004756, CPU max = 0.01
Timings{0}> Wall avg = 0.00029725, CPU avg = 0.000625
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> AP: time integratio. Wall max = 0.007864, CPU max = 0
Timings{0}> Wall avg = 0.0004915, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Binaryrepart........ Wall max = 0.061988, CPU max = 0.07
Timings{0}> Wall avg = 0.059573, CPU avg = 0.045
Timings{0}> Wall min = 0.051421, CPU min = 0.03
Timings{0}>
Timings{0}> Boundingbox......... Wall max = 7.39124, CPU max = 7.54
Timings{0}> Wall avg = 7.04649, CPU avg = 6.92375
Timings{0}> Wall min = 6.77905, CPU min = 6.5
Timings{0}>
Timings{0}> ComputePotential.... Wall max = 72.3649, CPU max = 72.74
Timings{0}> Wall avg = 71.3606, CPU avg = 71.2787
Timings{0}> Wall min = 70.2156, CPU min = 69.7
Timings{0}>
Timings{0}> Create Distr........ Wall max = 3.46927, CPU max = 3.42
Timings{0}> Wall avg = 3.42905, CPU avg = 3.415
Timings{0}> Wall min = 3.41356, CPU min = 3.41
Timings{0}>
Timings{0}> Fast inside test.... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Fieldeval........... Wall max = 32.995, CPU max = 33.2
Timings{0}> Wall avg = 18.5241, CPU avg = 18.4519
Timings{0}> Wall min = 0.67601, CPU min = 0.65
Timings{0}>
Timings{0}> H5PartTimer......... Wall max = 3.63982, CPU max = 2.09
Timings{0}> Wall avg = 3.63328, CPU avg = 1.94438
Timings{0}> Wall min = 3.62535, CPU min = 1.76
Timings{0}>
Timings{0}> Histogram........... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Initialize geometry. Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Inside test......... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Load Distr.......... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Particle Inside..... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> Ray tracing......... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> RC-CC1Dfft.......... Wall max = 8.45858, CPU max = 9.14
Timings{0}> Wall avg = 7.96502, CPU avg = 8.09937
Timings{0}> Wall min = 7.71823, CPU min = 7.5
Timings{0}>
Timings{0}> RC-CCtranspose...... Wall max = 26.2579, CPU max = 26.32
Timings{0}> Wall avg = 23.7507, CPU avg = 23.3513
Timings{0}> Wall min = 21.7257, CPU min = 21.04
Timings{0}>
Timings{0}> RC-RC1Dfft.......... Wall max = 5.899, CPU max = 6.17
Timings{0}> Wall avg = 5.83536, CPU avg = 5.92375
Timings{0}> Wall min = 5.76832, CPU min = 5.39
Timings{0}>
Timings{0}> RC-RRtranspose...... Wall max = 13.3477, CPU max = 13.47
Timings{0}> Wall avg = 9.93063, CPU avg = 10.0519
Timings{0}> Wall min = 7.34289, CPU min = 6.94
Timings{0}>
Timings{0}> RC-total............ Wall max = 50.5179, CPU max = 50.17
Timings{0}> Wall avg = 47.8983, CPU avg = 47.8287
Timings{0}> Wall min = 45.5044, CPU min = 45.45
Timings{0}>
Timings{0}> Secondary emission.. Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> SelfField total..... Wall max = 144.456, CPU max = 144.53
Timings{0}> Wall avg = 133.723, CPU avg = 133.706
Timings{0}> Wall min = 119.422, CPU min = 119.27
Timings{0}>
Timings{0}> SF: GreensFTotal.... Wall max = 44.9175, CPU max = 45.08
Timings{0}> Wall avg = 43.8023, CPU avg = 43.6519
Timings{0}> Wall min = 41.7957, CPU min = 41.93
Timings{0}>
Timings{0}> SF: IntGreenF1...... Wall max = 16.4665, CPU max = 16.23
Timings{0}> Wall avg = 12.5308, CPU avg = 12.5556
Timings{0}> Wall min = 12.1118, CPU min = 11.56
Timings{0}>
Timings{0}> SF: IntGreenF2...... Wall max = 9.76892, CPU max = 10.18
Timings{0}> Wall avg = 6.61787, CPU avg = 6.56812
Timings{0}> Wall min = 2.98334, CPU min = 2.96
Timings{0}>
Timings{0}> SF: IntGreenF3...... Wall max = 18.4099, CPU max = 18.69
Timings{0}> Wall avg = 15.6008, CPU avg = 15.5406
Timings{0}> Wall min = 14.2737, CPU min = 13.87
Timings{0}>
Timings{0}> SF: MirrorRho1...... Wall max = 9.23684, CPU max = 8.84
Timings{0}> Wall avg = 4.31382, CPU avg = 4.2725
Timings{0}> Wall min = 1.83326, CPU min = 1.91
Timings{0}>
Timings{0}> SF: MirrorRho2...... Wall max = 0.962362, CPU max = 1.05
Timings{0}> Wall avg = 0.203628, CPU avg = 0.20625
Timings{0}> Wall min = 0.06624, CPU min = 0.04
Timings{0}>
Timings{0}> SF: Potential....... Wall max = 79.9031, CPU max = 80.22
Timings{0}> Wall avg = 78.8561, CPU avg = 78.7481
Timings{0}> Wall min = 77.5925, CPU min = 77.16
Timings{0}>
Timings{0}> SF: ShIntGreenF1.... Wall max = 1.35594, CPU max = 1.48
Timings{0}> Wall avg = 0.940222, CPU avg = 0.95125
Timings{0}> Wall min = 0.900464, CPU min = 0.83
Timings{0}>
Timings{0}> SF: ShIntGreenF2.... Wall max = 1.37591, CPU max = 1.4
Timings{0}> Wall avg = 0.938656, CPU avg = 0.89125
Timings{0}> Wall min = 0.903518, CPU min = 0.7
Timings{0}>
Timings{0}> SF: ShIntGreenF3.... Wall max = 1.49185, CPU max = 1.62
Timings{0}> Wall avg = 1.29236, CPU avg = 1.29125
Timings{0}> Wall min = 0.497132, CPU min = 0.44
Timings{0}>
Timings{0}> SF: ShIntGreenF4.... Wall max = 1.37615, CPU max = 1.58
Timings{0}> Wall avg = 1.30332, CPU avg = 1.31063
Timings{0}> Wall min = 1.16954, CPU min = 1.06
Timings{0}>
Timings{0}> Statistics.......... Wall max = 29.3409, CPU max = 29.37
Timings{0}> Wall avg = 24.1167, CPU avg = 24.0119
Timings{0}> Wall min = 20.1437, CPU min = 19.55
Timings{0}>
Timings{0}> StatMarkerTimer..... Wall max = 11.2081, CPU max = 11.12
Timings{0}> Wall avg = 8.25124, CPU avg = 8.13312
Timings{0}> Wall min = 8.01322, CPU min = 7.35
Timings{0}>
Timings{0}> TIntegration1....... Wall max = 29.1665, CPU max = 29.71
Timings{0}> Wall avg = 29.1125, CPU avg = 28.8856
Timings{0}> Wall min = 29.0189, CPU min = 28.41
Timings{0}>
Timings{0}> TIntegration1Loop1.. Wall max = 22.4934, CPU max = 22.97
Timings{0}> Wall avg = 22.2278, CPU avg = 22.1019
Timings{0}> Wall min = 21.7869, CPU min = 21.13
Timings{0}>
Timings{0}> TIntegration1Loop2.. Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> TIntegration2....... Wall max = 38.7885, CPU max = 39.27
Timings{0}> Wall avg = 38.4704, CPU avg = 38.5738
Timings{0}> Wall min = 38.0679, CPU min = 37.83
Timings{0}>
Timings{0}> TIntegration2Loop1.. Wall max = 22.4379, CPU max = 23.18
Timings{0}> Wall avg = 22.1603, CPU avg = 22.3675
Timings{0}> Wall min = 21.7358, CPU min = 21.53
Timings{0}>
Timings{0}> TIntegration2Loop2.. Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> WakeField........... Wall max = 0, CPU max = 0
Timings{0}> Wall avg = 0, CPU avg = 0
Timings{0}> Wall min = 0, CPU min = 0
Timings{0}>
Timings{0}> -----------------------------------------------------------------
Finished at: Tue Nov 22 09:47:26 CST 2016
##############=- Blues Job Resource Usage -=##############
Job ID: 1500310.bmgt1.lcrc.anl.gov
User ID: neveu
Group ID: collab
Job Name: optgun16coretest
Session ID: 87493
Resources: neednodes=1:ppn=16,nodes=1:ppn=16,size=16,walltime=00:10:00
Resources Used: cput=01:22:10,mem=8640kb,vmem=76748kb,walltime=00:05:14
Queue: sSerial
Account: AWA-beam-dynamics
##########################################################
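
For comparing the two logs timer by timer, a small script along the following lines may help. It is only a sketch: the regular expression assumes the `Timings{0}> <name>.... Wall max = <value>, ...` layout seen above, the function name `wall_max` is made up for this example, and only a few lines copied from the two logs are embedded as sample input.

```python
import re

# Matches lines such as
#   Timings{0}> RC-CCtranspose...... Wall max = 50.3237, CPU max = 50.64
# and captures the timer name (without the dot padding) and the wall max.
LINE = re.compile(r"^Timings\{\d+\}> (.+?)\.* Wall max = ([0-9.eE+-]+),")

def wall_max(log_text):
    """Return {timer name: wall max in seconds} for one OPAL timing log."""
    timers = {}
    for line in log_text.splitlines():
        m = LINE.match(line)
        if m:
            timers[m.group(1)] = float(m.group(2))
    return timers

# A few lines copied from the 32-core and 16-core logs above.
log32 = """\
Timings{0}> ComputePotential.... Wall max = 84.5662, CPU max = 84.58
Timings{0}> Fieldeval........... Wall max = 11.8855, CPU max = 12.04
Timings{0}> RC-CCtranspose...... Wall max = 50.3237, CPU max = 50.64
Timings{0}> TIntegration1....... Wall max = 17.974, CPU max = 18.65
"""
log16 = """\
Timings{0}> ComputePotential.... Wall max = 72.3649, CPU max = 72.74
Timings{0}> Fieldeval........... Wall max = 32.995, CPU max = 33.2
Timings{0}> RC-CCtranspose...... Wall max = 26.2579, CPU max = 26.32
Timings{0}> TIntegration1....... Wall max = 29.1665, CPU max = 29.71
"""

t32 = wall_max(log32)
t16 = wall_max(log16)

# List timers from largest saving to largest slowdown (32 vs 16 cores).
print(f"{'timer':20s} {'16 cores':>9s} {'32 cores':>9s} {'diff':>9s}")
for name in sorted(t16, key=lambda n: t32[n] - t16[n]):
    diff = t32[name] - t16[name]
    print(f"{name:20s} {t16[name]:9.3f} {t32[name]:9.3f} {diff:+9.3f}")
```

On the sample lines, the particle-side timers (Fieldeval, TIntegration1) drop with 32 cores while RC-CCtranspose and ComputePotential grow, which would be consistent with the gains being offset by the FFT transposes. Note that the two runs also differ in the PARFFT settings, so this is not a pure core-count comparison.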

