
opal - Re: [Opal] Problems with mpirun (and without)

opal AT lists.psi.ch

Subject: The OPAL Discussion Forum



Chronological Thread  
  • From: "Adelmann Andreas (PSI)" <andreas.adelmann AT psi.ch>
  • To: "Taubert, Sebastian" <taubert AT uni-mainz.de>
  • Cc: "opal AT lists.psi.ch" <opal AT lists.psi.ch>
  • Subject: Re: [Opal] Problems with mpirun (and without)
  • Date: Thu, 26 Nov 2020 10:08:03 +0000
  • Accept-language: en-US, de-CH

Hi Sebastian, yes, you can ignore the MCA messages.

Your parallel run should work fine with up to 4 cores. With your
field solver configuration

Fs1: FIELDSOLVER,
    FSTYPE=FFT,
    MX=8, MY=8, MT=8,
    PARFFTX=FALSE, PARFFTY=FALSE, PARFFTT=TRUE,
    BCFFTX=open, BCFFTY=open, BCFFTT=open,
    BBOXINCR=1, GREENSF=STANDARD;

you cannot use more than 8 cores. Your grid is too small for the
chosen grid distribution. For real runs with space charge you will
probably use larger grids and hence can use more cores.
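
The constraint above can be sketched as follows (a minimal illustration, not OPAL code; the assumption is that with a domain-decomposed FFT solver the number of MPI ranks cannot exceed the product of the grid sizes along the parallelized dimensions):

```python
# Sketch (assumption): with a slab/domain decomposition, each rank needs
# at least one grid plane along every parallelized axis, so the usable
# rank count is bounded by the product of the parallelized grid sizes.

def max_usable_ranks(grid, parallel):
    """grid: axis -> number of cells; parallel: axis -> parallelized?"""
    ranks = 1
    for axis, n in grid.items():
        if parallel.get(axis, False):
            ranks *= n
    return ranks

# Fs1 from the input file: MX=MY=MT=8, only the T axis parallelized
# (PARFFTT=TRUE), so at most 8 ranks can each own a slice of the grid.
print(max_usable_ranks({"X": 8, "Y": 8, "T": 8},
                       {"X": False, "Y": False, "T": True}))  # -> 8
```

With a larger grid, say MT=64, the same rule would allow up to 64 ranks.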

I hope this makes sense! 


Cheers A
------
Dr. sc. math. Andreas (Andy) Adelmann
Head a.i. Labor for Scientific Computing and Modelling 
Paul Scherrer Institut OHSA/ CH-5232 Villigen PSI
Phone Office: +41 56 310 42 33 Fax: +41 56 310 31 91
Zoom ID: 470-582-4086 Password: AdA
-------------------------------------------------------
Friday: ETH HPK G 28   +41 44 633 3076
============================================
The more exotic, the more abstract the knowledge, 
the more profound will be its consequences.
Leon Lederman 
============================================

On 26 Nov 2020, at 10:30, Taubert, Sebastian <taubert AT uni-mainz.de> wrote:

Dear all,


I use the precompiled OPAL binary, version 2.4. When I start this binary with "opal" (without mpirun and without an input file) I get the following messages:



[Debby2:01402] mca_base_component_repository_open: unable to open mca_oob_ud: libosmcomp.so.3: cannot open shared object file: No such file or directory (ignored)
[Debby2:01398] mca_base_component_repository_open: unable to open mca_oob_ud: libosmcomp.so.3: cannot open shared object file: No such file or directory (ignored)
[Debby2:01398] mca_base_component_repository_open: unable to open mca_btl_openib: librdmacm.so.1: cannot open shared object file: No such file or directory (ignored)
[Debby2:01398] mca_base_component_repository_open: unable to open mca_pml_ucx: libucp.so.0: cannot open shared object file: No such file or directory (ignored)
[Debby2:01398] mca_base_component_repository_open: unable to open mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
[Debby2:01398] mca_base_component_repository_open: unable to open mca_osc_ucx: libucp.so.0: cannot open shared object file: No such file or directory (ignored)
Ippl> CommMPI: Parent process waiting for children ...
Ippl> CommMPI: Initialization complete.

Despite that, the code runs and gives reasonable results when it is run with an input file.


But as soon as I try to use mpirun with the attached input file, I get the following error:



Error{0}> All Fields in an _expression_ must be aligned.  (Do you have enough guard cells?)
Error{0}> This error occurred while evaluating an _expression_ for an LField with domain {[0:7:1],[0:7:1],[0:0:1]}
Warning{0}> CommMPI: Found extra message from node 1, tag 20015: msg = Message contains 6 items (0 removed).  Contents:
Warning{0}>   Item 0: 3 elements, 12 bytes total, needDelete = 0
Warning{0}>   Item 1: 3 elements, 12 bytes total, needDelete = 0
Warning{0}>   Item 2: 3 elements, 12 bytes total, needDelete = 0
Warning{0}>   Item 3: 1 elements, 4 bytes total, needDelete = 0
Warning{0}>   Item 4: 3 elements, 12 bytes total, needDelete = 0
Warning{0}>   Item 5: 64 elements, 1536 bytes total, needDelete = 0
Warning{0}>
Error{9}> All Fields in an _expression_ must be aligned.  (Do you have enough guard cells?)
Error{9}> This error occurred while evaluating an _expression_ for an LField with domain {[0:7:1],[0:7:1],[7:7:1]}


I tried this on different systems, always with the same result. On another system I additionally got the following errors:



OPAL{0}> Track start at: 10:09:41, t= 0.000 [fs]; zstart at: 0.000 [um]
OPAL{0}> Executing ParallelTTracker, initial dt= 1.000 [ps];
OPAL{0}> max integration steps 10000000000, next step= 0
Error{0}> All Fields in an _expression_ must be aligned.  (Do you have enough guard cells?)
Error{0}> This error occurred while evaluating an _expression_ for an LField with domain {[0:7:1],[0:7:1],[0:0:1]}
Segfault
amrex::Error::2::Sorry, out of memory, bye ... !!!
SIGABRT
amrex::Error::5::Sorry, out of memory, bye ... !!!
SIGABRT
amrex::Error::7::Sorry, out of memory, bye ... !!!
SIGABRT
/usr/bin/addr2line: '/gpfs/fs1/home/sthomas/temp/opal': No such file
/usr/bin/addr2line: '/gpfs/fs1/home/sthomas/temp/opal': No such file


Sorry if this is confusing and a bit much, but I don't know what to do next. Is there a problem in my input file? Why does OPAL alone produce these strange errors at the beginning? My system takes roughly two minutes for that file.


Thanks for your input! Cheers

Sebastian 


Doctoral Student

Accelerator Physics

Institut für Kernphysik
Johannes Gutenberg-Universität Mainz
Johann-Joachim-Becher-Weg 45
D - 55128 Mainz


E-Mail: sthomas AT uni-mainz.de
Office: Due to Covid-19, temporarily not in office
Mobile: +49 1515 0535622

<drift.in>



