opal AT lists.psi.ch
Subject: The OPAL Discussion Forum
- From: "Adelmann Andreas (PSI)" <andreas.adelmann AT psi.ch>
- To: Philippe Piot <philippe.piot AT gmail.com>
- Cc: "opal AT lists.psi.ch" <opal AT lists.psi.ch>
- Subject: Re: [Opal] optimizer sometime gets stuck
- Date: Thu, 27 May 2021 14:15:09 +0000
- Accept-language: en-US, de-CH
Hello Philippe,
on Bebop most of the simulations were done by Nicole. Indeed, we had a lot of
“stability”-related problems; by that I mean that a run which had failed was
successful after a resubmission. Some of them were related to MPI environment
variables (you also set some). Needless to say, I also suspect that there
could be a deadlock in OPAL.
We needed the help of the computing consultant to sort out some of these
issues.
The submission script looks good to me.
So here are a few things to try out:
1. Does the job run when using fewer cores?
2. What about the KNL partition? The cores are slower, but you need
to use fewer nodes.
3. In case of a deadlock, maybe someone from the computing center can find out where in the code it happens
(OPAL would need to be compiled with -O3 -g).
4. As a last resort, we could try to debug this at our retreat. A configuration with the minimal
number of cores that still exhibits the problem would be interesting.
Does that make sense?
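For suggestion 3, a common way to see where a hung MPI job is stuck is to attach gdb to each running rank and print the backtraces of all its threads; with -g in the build, the frames carry file and line information. The function below is a minimal sketch of that idea, not an OPAL or Bebop tool: the function name, the output directory, and the assumptions that gdb is installed on the compute nodes and that the site lets you attach a debugger to your own processes are all hypothetical.

```shell
#!/bin/bash
# Sketch (assumptions): gdb is installed on the compute nodes, the MPI ranks
# run an executable named "opal" (as in the submission script), and the site
# allows attaching a debugger to your own processes.
# Run it on each node of the hung job; it writes one backtrace file per rank.
dump_opal_backtraces() {
    local exe_name="${1:-opal}"      # process name to match exactly
    local outdir="${2:-backtraces}"  # hypothetical output directory
    local pids pid count=0
    mkdir -p "$outdir" || return 2
    # List this user's processes whose name is exactly $exe_name.
    pids=$(pgrep -u "$(id -un)" -x "$exe_name") || {
        echo "no running '$exe_name' processes on $HOSTNAME"
        return 1
    }
    for pid in $pids; do
        # Attach, print the stack of every thread, then detach so the rank
        # keeps running; -batch makes gdb exit after the -ex commands.
        gdb -batch -p "$pid" -ex "thread apply all bt" -ex "detach" \
            > "$outdir/$HOSTNAME.$pid.bt" 2>&1
        count=$((count + 1))
    done
    echo "wrote $count backtrace file(s) to $outdir"
}
```

For example, running dump_opal_backtraces opal bt_hang on every node of the stuck job collects one file per rank. If most ranks wait inside the same MPI collective while a few sit somewhere else in the code, the place where those few are is a good candidate for the deadlock.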
Cheers A
------
Dr. sc. math. Andreas (Andy) Adelmann
Head a.i. Labor for Scientific Computing and Modelling
Paul Scherrer Institut OHSA/ CH-5232 Villigen PSI
Phone Office: xx41 56 310 42 33 Fax: xx41 56 310 31 91
Zoom ID: 470-582-4086 Password: AdA
Zoom Link:
https://ethz.zoom.us/j/4705824086?pwd=dFcvT1pMMGY0bHg0dTNncUNZZTJkZz09
-------------------------------------------------------
Friday: ETH HPK G 28 +41 44 633 3076
============================================
The more exotic, the more abstract the knowledge,
the more profound will be its consequences.
Leon Lederman
============================================
On 27 May 2021, at 14:45, Philippe Piot <philippe.piot AT gmail.com> wrote:
Andreas,
Did you ever encounter this type of problem on Bebop? This is the cluster I am using; below is my submission script in case you have a good suggestion. Thank you!
Philippe
#!/bin/bash -l
#SBATCH -A Bright-Beams
#SBATCH --job-name=awa_optim
#SBATCH -o optim.%j.%N.out
#SBATCH -e optim.%j.%N.error
#SBATCH --time=18:00:00
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=36
#SBATCH --partition=bdwall
#
#export I_MPI_SLURM_EXT=0
#export I_MPI_FABRICS=shm:tmi
ulimit -s unlimited
export OPAL_EXE_PATH=/lcrc/project/Bright-Beams/software/opal/build_gcc/src
#
# cd $SLURM_SUBMIT_DIR
#
rm -rf *.0 tmp *_0
#
# mkdir results tmp
#
# Setup My Environment
module load gcc/7.1.0-4bgguyp
module load boost # needs Boost > 1.66
module load mpich
module load hdf5/1.10.5-fuzylbv # needs to be a parallel (MPI-enabled) HDF5 build
module load libszip
module load gsl #/2.4
# Run My Program
mpirun -n "$SLURM_NTASKS" "$OPAL_EXE_PATH/opal" awaDrive_optimEmit.in --info 5
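Because the script above takes its rank count from $SLURM_NTASKS, the reduced-core and KNL runs suggested in the reply can be tried without editing it: options given on the sbatch command line override the #SBATCH lines in the script. The function below is a sketch of such a sweep. The script file name awa_optim.sh, the function name, the DRY_RUN switch and the KNL settings (partition knlall, 64 tasks per node) are assumptions to check against the Bebop documentation.

```shell
#!/bin/bash
# Sketch: submit the same job script at several node counts to find the
# smallest configuration that still hangs.
# Arguments: script partition tasks_per_node node_count...
# Assumption: command-line sbatch options override the #SBATCH lines.
# Set DRY_RUN=1 to print the sbatch commands instead of submitting them.
submit_scaling_sweep() {
    local script="$1" partition="$2" tasks_per_node="$3"
    shift 3
    local nodes
    local -a cmd
    for nodes in "$@"; do
        # Set --ntasks explicitly so $SLURM_NTASKS matches the new layout.
        cmd=(sbatch --partition="$partition" --nodes="$nodes"
             --ntasks-per-node="$tasks_per_node"
             --ntasks="$((nodes * tasks_per_node))"
             --job-name="awa_optim_n${nodes}" "$script")
        if [[ "${DRY_RUN:-0}" == 1 ]]; then
            echo "${cmd[*]}"
        else
            "${cmd[@]}"
        fi
    done
}
```

For example, DRY_RUN=1 submit_scaling_sweep awa_optim.sh bdwall 36 1 2 4 prints the three sbatch commands for 1, 2 and 4 Broadwell nodes; without DRY_RUN=1 they are submitted, and submit_scaling_sweep awa_optim.sh knlall 64 1 2 would try the (assumed) KNL partition.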
On Thu, May 27, 2021 at 7:41 AM Adelmann Andreas (PSI) <andreas.adelmann AT psi.ch> wrote:
Hi Philippe, I tend to agree with Jochem (I misinterpreted the output snippet in your original email).
Cheers A
------
On 27 May 2021, at 14:32, Philippe Piot <philippe.piot AT gmail.com> wrote:
<pilot.trace.0>