
opal - Re: [Opal] MPI issue

opal AT lists.psi.ch

Subject: The OPAL Discussion Forum

  • From: Adelmann Andreas <andreas.adelmann AT psi.ch>
  • To: Robert Nagler <nagler AT radiasoft.net>
  • Cc: "opal AT lists.psi.ch" <opal AT lists.psi.ch>
  • Subject: Re: [Opal] MPI issue
  • Date: Thu, 13 Jul 2023 07:56:19 +0000

Hi Rob, I never got that error.

Does the problem arise only when you run OPAL from the container on Perlmutter, or
does it also show up when you compile and run on your own cluster?

Cheers A
------
Dr. sc. math. Andreas (Andy) Adelmann
Paul Scherrer Institut OHSA/D17 CH-5232 Villigen PSI
Phone Office: xx41 56 310 42 33 Fax: xx41 56 310 31 91
Zoom ID: 470-582-4086 Password: AdA
Zoom Link: https://ethz.zoom.us/j/4705824086?pwd=dFcvT1pMMGY0bHg0dTNncUNZZTJkZz09

-------------------------------------------------------
Friday: ETH HPK G 28   +41 44 633 3076
============================================
The more exotic, the more abstract the knowledge, 
the more profound will be its consequences.
Leon Lederman 
============================================

On 11 Jul 2023, at 21:23, Robert Nagler <nagler AT radiasoft.net> wrote:

We're trying to run Opal with the attached input on NERSC Perlmutter via Shifter (NERSC's container technology).

The first problem we ran into is that NERSC's Cray MPICH ABI DSOs do not include the C++ bindings, since those were deprecated in MPI 2.2 and removed in MPI 3.0. We worked around this by switching to mpicc (instead of mpicxx), which doesn't link libmpic++.so. With that change, Opal loads on Perlmutter with our image.

The current problem is this:
PMPI_Allreduce(497).....: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x7fff418617a7, count=1, datatype=dtype=0x4c000133, op=MPI_LOR, comm=MPI_COMM_WORLD) failed
MPIR_LOR_check_dtype(92): MPI_Op MPI_LOR operation not defined for this datatype
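
One way to narrow this down is to decode the datatype handle printed in the error. The sketch below is not part of OPAL or MPICH; it assumes the handle bit layout used by MPICH's internal macros (top two bits for the handle kind, next four bits for the object type, and, for builtin datatypes, the size in bytes in bits 8 to 15). The function name and the dictionary keys are made up for illustration, and the layout should be checked against the mpi.h in the image.

```python
# Hypothetical helper: decode an MPICH-style integer handle.
# ASSUMPTION: bit layout follows MPICH's internal handle macros
# (kind in bits 30-31, object type in bits 26-29, and for builtin
# datatypes the size in bytes in bits 8-15). Verify against your mpi.h.

HANDLE_KINDS = {0: "invalid", 1: "builtin", 2: "direct", 3: "indirect"}
OBJECT_TYPE_COMM = 1      # assumed MPICH object-type code for communicators
OBJECT_TYPE_DATATYPE = 3  # assumed MPICH object-type code for datatypes


def decode_mpich_handle(handle: int) -> dict:
    """Split a 32-bit MPICH handle into its assumed bit fields."""
    kind = (handle >> 30) & 0x3
    object_type = (handle >> 26) & 0xF
    info = {"kind": HANDLE_KINDS[kind], "object_type": object_type}
    # Only builtin datatypes encode their size and an index in the low bits.
    if kind == 1 and object_type == OBJECT_TYPE_DATATYPE:
        info["size_bytes"] = (handle >> 8) & 0xFF
        info["index"] = handle & 0xFF
    return info


if __name__ == "__main__":
    # Datatype handle copied from the error message above.
    print(decode_mpich_handle(0x4C000133))
```

Under that layout, 0x4c000133 decodes to a builtin datatype of size 1 byte, which is consistent with a boolean being reduced with MPI_LOR. From memory, MPICH's mpi.h assigns this value to MPI_CXX_BOOL, but that is an assumption to confirm by grepping the header in both the container and the Cray MPICH installation.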

We switched to Opal 2022.1.0 and Trilinos 13.0.1 and updated other dependencies before we got this error. We are using Fedora 36 as the base container image, which comes with gcc 12.2.1 and mpich-3.4.3.

I will debug this further, but I was wondering whether anyone has run into this issue and whether it is specific to 2022.1.0.

Thanks,
Rob

Robert Nagler
CTO | RadiaSoft LLC

<eic_test_wig.txt>



