OpenMPI MPI_Init error occurs when importing NetKet
Created by: AndyMc629
I have installed NetKet on Ubuntu 16.04 LTS, with all the required prerequisites already in place.
I opened a Jupyter notebook and ran the following to check the installation:
# Import netket library
import netket as nk
The kernel then dies and I get the following stderr output in my terminal:
[andrew-ThinkStation-P300:02140] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[andrew-ThinkStation-P300:02140] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: /usr/lib/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[andrew-ThinkStation-P300:02140] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[andrew-ThinkStation-P300:2140] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[I 16:28:49.165 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
kernel c58795c7-4474-45a7-8fe8-c444fc16b2fe restarted
[I 16:30:40.190 NotebookApp] Saving file at /netket-playground/tutorials-and-examples/Heisenberg1d.ipynb
Has anyone else run into this issue? I've rebuilt Open MPI from source, and I can compile examples with mpicc and run them.
I would appreciate any advice, even just on where in the source code I should look for the fault. It has been very hard to diagnose; most online reports of this issue just suggest recompiling Open MPI.
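A minimal sketch for narrowing this down, assuming the stderr output above has been saved to a string. It is not part of NetKet or Open MPI, and `summarize_component_errors` and `tools_on_path` are hypothetical helper names. It reports the directory the failing MCA plugins were loaded from and the symbols they could not resolve, then shows which MPI tools come first on PATH. If the plugin directory (here /usr/lib/openmpi/lib/openmpi) does not belong to the source-built installation that mpicc comes from, two Open MPI installations may be getting mixed, which is one possible cause of unresolved opal_* symbols.

```python
import re
import shutil
from collections import defaultdict

# Matches lines of the form seen in the log above:
#   "... unable to open <component>: <file>.so: undefined symbol: <name> (ignored)"
_PATTERN = re.compile(r"unable to open (\S+): (\S+): undefined symbol: (\S+)")

# Two lines copied from the stderr output in this report.
sample_log = """\
[andrew-ThinkStation-P300:02140] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[andrew-ThinkStation-P300:02140] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: /usr/lib/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
"""


def summarize_component_errors(stderr_text):
    """Group unresolved symbols by the directory the failing plugins live in."""
    missing = defaultdict(set)
    for match in _PATTERN.finditer(stderr_text):
        _component, so_path, symbol = match.groups()
        plugin_dir = so_path.rsplit("/", 1)[0]
        missing[plugin_dir].add(symbol)
    return {directory: sorted(symbols) for directory, symbols in missing.items()}


def tools_on_path(names=("mpicc", "mpirun", "ompi_info")):
    """Return the first match on PATH for each tool, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for directory, symbols in summarize_component_errors(sample_log).items():
        print(f"plugins in {directory} could not resolve: {', '.join(symbols)}")
    for name, path in tools_on_path().items():
        print(f"{name}: {path}")
```

Comparing the printed plugin directory with the install prefix of the reported mpicc and mpirun shows whether the Python process and the command-line tools are using the same Open MPI build.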
Thanks in advance!