Re: [Opal] Re: Enabling Parallel Computing on OPAL-T


  • From: Andreas Adelmann <andreas.adelmann AT psi.ch>
  • To: Norman Huang <norman-huang AT hotmail.com>
  • Cc: <christof.j.kraus AT gmail.com>, <opal AT lists.psi.ch>, Yi-Nong Rao <raoyn AT triumf.ca>, Rick Bartman <krab AT triumf.ca>, Thomas Planche <tplanche AT triumf.ca>, <fwj AT triumf.ca>
  • Subject: Re: [Opal] Re: Enabling Parallel Computing on OPAL-T
  • Date: Thu, 1 Dec 2011 23:18:10 +0100
  • List-archive: <https://lists.web.psi.ch/pipermail/opal/>
  • List-id: The OPAL Discussion Forum <opal.lists.psi.ch>

Hello Norman,

yes, the number of cores has to scale with the problem size
if you want to see a speedup; this is also known as weak scaling.
All FFT-based (spectral) methods follow this principle. Furthermore,
because of the FFT involved, the parallel case needs an interconnect
with high bandwidth, since the distributed FFT has to transpose
the matrix.
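
To get a feeling for the data volume involved, here is a rough,
back-of-the-envelope Python sketch; it assumes a simple slab
decomposition and 16 bytes per complex grid value, which is only an
estimate and not necessarily how OPAL is implemented internally:

    # Rough estimate of the data shuffled in one FFT transpose (all-to-all),
    # assuming a slab decomposition and 16 bytes per complex value.
    def transpose_bytes(mx, my, mt, ranks, bytes_per_value=16):
        total = mx * my * mt * bytes_per_value
        # each rank sends almost its whole slab to the other ranks
        per_rank = total / ranks * (ranks - 1) / ranks
        return total, per_rank

    total, per_rank = transpose_bytes(32, 32, 2048, 8)
    print("%.1f MB total, %.1f MB per rank per transpose"
          % (total / 1e6, per_rank / 1e6))

On Gigabit Ethernet this communication quickly dominates; on a fast
interconnect it does not.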

If you tell me the initial beam conditions, I can suggest a
suitable grid size and number of particles.
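
In the meantime, as a rough guide, here is a minimal Python sketch; it
only encodes the rule of thumb of roughly 10 to 40 particles per grid
point (see my earlier mail below), nothing OPAL-specific:

    # Rule-of-thumb check: aim for roughly 10-40 particles per grid point.
    def particles_per_point(n_particles, mx, my, mt):
        return n_particles / float(mx * my * mt)

    # Current setup: 5e4 particles on a 32 x 32 x 2048 grid
    print(particles_per_point(5e4, 32, 32, 2048))  # ~0.02 -> grid far too large
    # Suggested setup: the same particles on a 16 x 16 x 16 grid
    print(particles_per_point(5e4, 16, 16, 16))    # ~12   -> within 10-40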


Cheers Andreas 



On Dec 1, 2011, at 9:22 PM, Norman Huang wrote:

Hi, thanks for all the responses.

trcompdev.triumf.ca has 8 cores, and trcomp01.triumf.ca has 24 cores.

I ran a few more tests, and it seems that the effectiveness of the
number of cores depends on the grid size and possibly the number of
particles. For a 32x32x2048 grid with 50000 particles, the optimal
configuration was 4 cores. Specifically, a short tracking test produced
these results:

1 Core: 8:30min
2 Cores: 6:57min
4 Cores: 5:38min
8 Cores: 5:40min
16 Cores: 7:40min

Then I did the same test, but with the grid size reduced to 16x16x16,
and I got:

1 Core: 3:00min
2 Cores: 4:12min
4 Cores: 8:01min

My guess is that the number of cores should scale with the
computational workload, if that makes sense?
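
To put numbers on that trend, here is a quick Python check of the
speedup and parallel efficiency implied by the timings above (plain
arithmetic, nothing OPAL-specific; times converted to minutes):

    # Speedup and parallel efficiency relative to the single-core run.
    def report(times):  # times: {cores: minutes}
        t1 = times[1]
        for p, t in sorted(times.items()):
            speedup = t1 / t
            print("%2d cores: speedup %.2f, efficiency %3.0f%%"
                  % (p, speedup, 100.0 * speedup / p))

    # 32x32x2048 grid, 50000 particles
    report({1: 8.5, 2: 6.95, 4: 5.63, 8: 5.67, 16: 7.67})
    # 16x16x16 grid: adding cores makes the run slower,
    # i.e. communication overhead dominates
    report({1: 3.0, 2: 4.2, 4: 8.02})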

Regards,
Norman 
 

> Date: Thu, 1 Dec 2011 08:55:52 +0100
> Subject: Re: [Opal] Re: Enabling Parallel Computing on OPAL-T
> From: christof.j.kraus AT gmail.com
> To: andreas.adelmann AT psi.ch
> CC: norman-huang AT hotmail.com; opal AT lists.psi.ch; raoyn AT triumf.ca; krab AT triumf.ca; tplanche AT triumf.ca; fwj AT triumf.ca
> 
> Hi,
> 
> suppose it were the interconnect, does this then make sense:
> 
> * start cite Norman's first email *******
> When it's being run, all 8 cpu cores are active, but only one is doing the
> calculations. Furthermore, this run actually took an hour more than a previous
> run using a single core
> * end cite Norman's first email *********
> ?
> 
> christof
> On Tue, Nov 29, 2011 at 7:03 AM, Andreas Adelmann
> <andreas.adelmann AT psi.ch> wrote:
> > Hello Norman
> >
> > I need to know how many cores trcompdev.triumf.ca has.
> > You are requesting 8 cores on one node (trcompdev.triumf.ca);
> > maybe you are simply overloading trcompdev.
> >
> > One other thing you have to be aware of is the interconnect bandwidth:
> > Gigabit Ethernet will not work, and in such a case you get exactly the
> > effect that you are observing. Again, Fred knows all this. And YES, the
> > FFT-based solver scales very well, iff you have a fast interconnect.
> >
> > I also observe that you are using O(2E6) grid points with only 5E4
> > particles. Your grid is far too large! Per grid point you should have,
> > on average, 10 ... 40 particles.
> >
> > My advice:
> >
> > a) check how much steam your nodes have
> >
> > b) do you use a fast interconnect?
> >
> > c) reduce the size of the grid or increase the number of particles.
> > NP=5E4 -> G= 16x16x16
> >
> >  Cheers Andreas
> >
> >
> >
> > On Nov 28, 2011, at 10:40 PM, Norman Huang wrote:
> >
> > Hi Andreas,
> >
> > Here's the stdout output.
> >
> > Using this configuration, a run takes about 3.5 hours, while using a
> > single core only takes 2 hours.
> >
> > The Pi calculation test was successful, so I think OpenMPI is installed
> > correctly. Does the FFT solver actually exploit parallel processing, or do
> > I have to use the multigrid solver instead?
> >
> > Regards,
> > Norman
> >
> > ________________________________
> > Subject: Re: Enabling Parallel Computing on OPAL-T
> > From: andreas.adelmann AT psi.ch
> > Date: Sat, 26 Nov 2011 08:45:35 +0100
> > CC: opal AT lists.psi.ch; raoyn AT triumf.ca; krab AT triumf.ca; tplanche AT triumf.ca; fwj AT triumf.ca
> > To: norman-huang AT hotmail.com
> >
> > Hello Norman, the input file is just fine; I made one small change not
> > related to your problem.
> >
> > Can you send me the std-output of your run? On which cluster do you run?
> >
> > Are you sure that MPI is installed properly, i.e. can you run a canonical
> > MPI test, for example calculating Pi? (Such a test comes with every MPI
> > distribution.)
> >
> > You should also contact Fred Jones; I remember having similar problems
> > when I was at TRIUMF at the beginning of this year.
> >
> > Cheers Andreas
> >
> > On Nov 26, 2011, at 2:54 AM, Norman Huang wrote:
> >
> > Hi Andreas,
> >
> > I'm trying to do space charge calculations using parallel computing on
> > OPAL-T, but
> > it appears only 1 core is being used.
> >
> > These are my fieldsolver parameters:
> >
> > Fs1:FIELDSOLVER, FSTYPE=fft, MX=32, MY=32, MT=2024,
> > PARFFTX=true, PARFFTY=true, PARFFTT=false,
> > BCFFTX=open, BCFFTY=open, BCFFTT=open,
> > BBOXINCR=1, GREENSF=INTEGRATED;
> >
> >
> > Command used to run the test 'isis.in': mpirun -np 8 opal --commlib mpi
> > isis.in
> >
> > When it's being run, all 8 cpu cores are active, but only one is doing the
> > calculations. Furthermore, this run actually took an hour more than a
> > previous
> > run using a single core.
> >
> > Am I missing some configurations?
> >
> > Regards,
> > Norman
> > <isis.in>
> >
> >
> > <stdout.txt>

------
Dr. sc. math. Andreas (Andy) Adelmann
Staff Scientist
Paul Scherrer Institut WLGB/132 CH-5232 Villigen PSI
Phone Office: xx41 56 310 42 33 Fax: xx41 56 310 31 91
Phone Home: xx41 62 891 91 44
-------------------------------------------------------
Thursdays: ETH CAB H 83.1  +41 44 632 36 72
============================================
The more exotic, the more abstract the knowledge, 
the more profound will be its consequences.
Leon Lederman 
============================================





