Debugging Tips
This page covers a few topics that are not required to run nekRS by default, but that can be a source of problems when running nekRS in more complex environments or when making several changes at once.
Just-in-time Compilation
nekRS uses just-in-time compilation to build the functions in the .udf and .oudf
case files, as well as for compiling certain fixed-size arrays based on the order of
the polynomial approximation or other problem settings.
In most cases, the user does not need to take any special action for this process to work correctly. However, a high-level understanding of just-in-time compilation helps you know how to fully clear the cached build files, and how to run the precompilation separately from a full run to obtain more accurate runtime measurements.
When nekRS performs just-in-time compilation, object files are created in the
.cache directory within the current case directory. To completely clear the
cached state of the case, simply delete the .cache directory:
user$ rm -rf .cache/
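If you keep several cases side by side, a small shell helper can make this cleanup less error-prone. The following is only a sketch and is not part of nekRS: the function name clear_nekrs_cache is hypothetical, and the only assumption is that the cache lives in .cache inside the case directory, as described above.

```shell
# Hypothetical helper (not shipped with nekRS): remove the JIT cache of a case.
# Usage: clear_nekrs_cache [case_dir]   (defaults to the current directory)
clear_nekrs_cache() {
  case_dir="${1:-.}"
  if [ -d "$case_dir/.cache" ]; then
    rm -rf -- "$case_dir/.cache"
    echo "removed $case_dir/.cache"
  else
    echo "no cache found in $case_dir"
  fi
}
```

Calling clear_nekrs_cache with no argument acts on the current directory, which matches the manual command above.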
Tip
If you experience strange behavior when running your case during the precompilation
step (such as failures to build in COMMON blocks or other parts of the code that you
are not touching in the .udf and .oudf files), try deleting the .cache
directory and trying again. It is not uncommon for the precompilation process to miss
the need to build new versions of object files if you are making frequent changes to
the nekRS source. This is also sometimes encountered if you are using multiple nekRS
versions in different projects (such as standalone nekRS or nekRS wrapped within
a multiphysics coupling application such as ENRICO), but don’t have your
environment completely self-consistent.
The precompilation process usually takes on the order of one minute. Depending on
the use case, it may be advantageous to force the precompilation separately from the run itself.
To precompile the case, pass the --build-only option to the nrsmpi script. See the
Scripts That Ship with nekRS section for where to find this script.
As an example, the following commands first precompile a case named
my_case for a later run with at least 4 GPUs. After the precompilation
step, you can run as usual with the nrsmpi script; the precompiled case
will be reused and the build step skipped:
# precompilation
user$ nrsmpi my_case 4 --build-only 4
# actual run
user$ nrsmpi my_case 32
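To see how much wall-clock time each phase takes, the two commands can be wrapped in a simple timer. This is a minimal sketch: time_phase is a hypothetical helper, not a nekRS tool, and the commented usage assumes that nrsmpi is on your PATH and accepts the arguments shown above.

```shell
# Hypothetical helper: run a command and report its wall-clock time in seconds.
# Usage: time_phase <label> <command> [args...]
time_phase() {
  label="$1"
  shift
  start=$(date +%s)
  "$@"
  rc=$?
  end=$(date +%s)
  echo "$label: $((end - start)) s (exit status $rc)"
  return "$rc"
}

# Example usage (commented out; assumes nrsmpi and a case named my_case):
# time_phase "precompile" nrsmpi my_case 4 --build-only 4
# time_phase "run"        nrsmpi my_case 32
```

The timer only has one-second resolution, which is usually enough to separate a build step of about a minute from the run itself.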
Verbose and Debug Mode
In the .par file, you can enable verbose output by setting
verbose = true in the [GENERAL] section. This prints additional
informational messages during the run, such as detailed option settings and
norm/residual information from the linear solvers.
NekRS also supports a --debug mode (see Run NekRS), which
enables verbose output and activates extra runtime checks, including additional
floating-point exception trapping (via FE_ALL_EXCEPT). This can help detect
NaNs, overflows, and other numerical issues early in the run.
Check Device Arrays
You can quickly inspect device data by printing minima, maxima, and norms with the functions described in Array Operators.
Alternatively, you can dump a field to a checkpoint file as described in Adding Custom Output File and then visualize it with your preferred post-processing tool.
Kernel in Serial Mode
When developing a new kernel, it is often useful to test it with the CPU
backend using the command-line option --backend serial. In this mode,
OCCA translates the OKL kernel to C++, so you can use standard host I/O
(e.g., printf or std::cout) inside the kernel for debugging. The only
extra step is to include the C/C++ I/O headers inside the same __okl__
block so that the translated C++ code can see them:
#ifdef __okl__
#include <cstdio>
#include <iostream>
@kernel void fillProp(const dlong Nelements,
                      const dfloat Re,
                      const dfloat Pe,
                      @ restrict dfloat* MUE,
                      @ restrict dfloat* RHO,
                      @ restrict dfloat* K,
                      @ restrict dfloat* RHOCP)
{
  for (dlong e = 0; e < Nelements; ++e; @outer(0)) {
    for (int n = 0; n < p_Np; ++n; @inner(0)) {
      const int id = e * p_Np + n;
      MUE[id] = 1.0 / Re;
      K[id] = 1.0 / Pe;
      RHO[id] = 1.0;
      RHOCP[id] = 1.0;
      printf("debug: id = %d, RHO = %g\n", id, RHO[id]);
      fflush(stdout);
      std::cout << "debug: id = " << id
                << ", MUE = " << MUE[id] << std::endl;
    }
  }
}
#endif
Flushing explicitly (with fflush(stdout) or std::endl) helps ensure
that debug output appears immediately, which is useful when diagnosing hangs
or crashes.
Synchronization
In parallel programming, it is sometimes helpful to insert explicit barriers to isolate or block sections of code while debugging.
In legacy Nek5000, you can force a global synchronization in userchk using nekgsync():
call nekgsync()
In a NekRS udf file, you can insert an MPI barrier explicitly:
MPI_Barrier(platform->comm.mpiComm());
For host–device synchronization, you can ensure all device work is completed after a kernel launch by synchronizing the device:
platform->device.finish();
For CUDA or HIP backends, setting the following environment variables forces all kernel launches to be synchronous (mainly useful for debugging, as it can significantly reduce performance):
export CUDA_LAUNCH_BLOCKING=1
export HIP_LAUNCH_BLOCKING=1
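Because synchronous launches slow the run down, you may prefer to set the variable for a single command rather than for the whole shell session. The sketch below shows the standard shell scoping behaviour with a stand-in command. The commented nrsmpi line assumes a case named my_case, and whether the launcher forwards the variable to every rank depends on your MPI setup.

```shell
# Set the variable for one command only; the surrounding shell is unaffected.
# Assumed usage with nekRS (commented out, requires nrsmpi and a case):
# CUDA_LAUNCH_BLOCKING=1 nrsmpi my_case 4

# Self-contained demonstration of the scoping behaviour:
unset CUDA_LAUNCH_BLOCKING
inside=$(CUDA_LAUNCH_BLOCKING=1 sh -c 'echo "${CUDA_LAUNCH_BLOCKING:-unset}"')
outside="${CUDA_LAUNCH_BLOCKING:-unset}"
echo "inside the command: $inside"
echo "after the command:  $outside"
```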
Using gdb
Using a debugger helps locate segmentation faults, set breakpoints, detect problematic conditions, and obtain backtraces. Here we demonstrate some typical workflows with gdb. Other debugging tools can be used in a similar manner.
Attaching to a Process
After starting NekRS, you can attach gdb to the running process.
First, find the process ID (PID), for example using top or ps:
top
# or
ps aux | grep nekrs
Then attach gdb to the PID:
gdb -p <PID>
Once attached, you can obtain a backtrace:
(gdb) bt
This is particularly useful when the code appears to hang and you still have a chance to inspect where a process is stopped. Note that, because the code runs in parallel, it is common for some ranks to be waiting (e.g., in an MPI call) for a problematic rank to reach a synchronization point. Attaching to a single process may therefore only show that it is blocked in MPI, rather than the original source location of the error.
You can also inspect variables, move between stack frames, monitor threads, set breakpoints, and so on. See the gdb documentation for details.
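When several ranks run on the same node, it can be quicker to collect backtraces from all of them non-interactively than to attach to each one by hand. The following sketch uses the standard gdb options -batch and -ex together with the thread apply all bt command. The helper name bt_all is hypothetical, and the commented usage assumes the executable is named nekrs.

```shell
# Hypothetical helper: print a backtrace of every thread of each given PID,
# then detach so that the processes keep running.
# Usage: bt_all <pid> [<pid> ...]
bt_all() {
  for pid in "$@"; do
    echo "=== backtrace for PID $pid ==="
    gdb -p "$pid" -batch -ex "thread apply all bt" 2>&1
  done
}

# Example usage (commented out; assumes the process name is exactly "nekrs"):
# bt_all $(pgrep -x nekrs)
```

Comparing the output across ranks often shows which rank is not waiting in MPI, which is a good candidate for the rank that caused the hang.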
Tip
On modern Linux, gdb -p can be blocked by the ptrace policy (ptrace_scope),
so you may need to run gdb with sudo.
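To check whether this policy applies on your machine, you can read the Yama setting directly. This is a sketch that assumes a Linux kernel with the Yama security module, which exposes the value under /proc/sys/kernel/yama/ptrace_scope; on other systems the file is simply absent.

```shell
# Report the Yama ptrace policy, if the kernel exposes it.
ptrace_scope_file=/proc/sys/kernel/yama/ptrace_scope
if [ -r "$ptrace_scope_file" ]; then
  scope=$(cat "$ptrace_scope_file")
  echo "ptrace_scope = $scope"
else
  scope="unavailable"
  echo "ptrace_scope is not exposed on this system"
fi

# Meaning of the common values:
#   0: a process may attach to any other process of the same user
#   1: a process may only attach to its own descendants
# To relax the policy until the next reboot (requires root):
#   sudo sysctl -w kernel.yama.ptrace_scope=0
```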
Attach Mode
For an alternative workflow that attaches a debugger right at startup, you can use
attach mode by running NekRS with the --attach option (see Run NekRS).
This pauses NekRS and prints the PID of each rank. For example:
$HOME/bin/nrsmpi ethier.par 8 --backend serial --attach
rank 0 on pop-os: pid<3255824>
Attach debugger, then send <SIGCONT> to rank0 to continue
rank 1 on pop-os: pid<3255825>
rank 2 on pop-os: pid<3255826>
rank 3 on pop-os: pid<3255827>
rank 4 on pop-os: pid<3255828>
rank 5 on pop-os: pid<3255829>
rank 6 on pop-os: pid<3255830>
rank 7 on pop-os: pid<3255831>
In another terminal, attach gdb to rank 0, set breakpoints, and then
continue execution:
gdb -p 3255824
(gdb) break some_function # optional
(gdb) continue
Finally, from a third terminal (or after detaching from gdb), send
SIGCONT to rank 0 so that NekRS leaves the pause state and proceeds under
debugger control:
kill -CONT 3255824
This allows you to attach gdb right from the beginning of the run, which is
especially useful when the breakpoint is in the early setup stage.
Launch NekRS under gdb
You can also run NekRS directly under gdb on all ranks. For example, the
following command starts NekRS under gdb and prints a backtrace when an
error is detected:
mpirun -np 8 gdb -ex "run" -ex "bt" --args $NEKRS_HOME/bin/nekrs --setup ethier.par
Note
This makes every process print its backtrace, which can be overwhelming.
You can manually configure gdb logging (for example, directing each
rank’s output to a separate file), but in many cases it is simpler to use
NEKRS_SIGNUM_BACKTRACE. See Runtime Control with Signals.
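If you do want one gdb log per rank, a small wrapper can choose the file name from the rank variable set by the launcher. The following is a sketch under assumptions: rank_log_name and run_under_gdb are hypothetical helpers, and the rank is read from OMPI_COMM_WORLD_RANK (Open MPI), PMI_RANK (MPICH-style launchers), or SLURM_PROCID (srun), whichever is set.

```shell
# Hypothetical helper: build a per-rank log file name.
# Falls back to rank 0 if no known rank variable is set.
rank_log_name() {
  rank="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-${SLURM_PROCID:-0}}}"
  echo "gdb.rank${rank}.log"
}

# Hypothetical helper: run a program under gdb in batch mode and send all
# output of this rank to its own log file.
run_under_gdb() {
  log=$(rank_log_name)
  gdb -batch -ex "run" -ex "bt" --args "$@" > "$log" 2>&1
}

# Example usage (commented out): save these functions in a script, say
# gdb_rank.sh, ending with the line   run_under_gdb "$@"   and launch it as
# mpirun -np 8 ./gdb_rank.sh "$NEKRS_HOME/bin/nekrs" --setup ethier.par
```

Batch mode is used here because an interactive gdb prompt is of little use once the output is redirected to a file.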
Tip
NekRS monitors OCCA errors and assertions and will print an OCCA error
message and abort when a fatal condition is detected (e.g., OUT_OF_MEMORY,
DEVICE_NOT_FOUND). When this happens, the code may terminate via
MPI_Abort or an internal error handler, so a simple backtrace on exit is
not always enough to see where the error originated.
To catch such failures more reliably, you can set breakpoints on common error paths before running:
mpirun -np 8 gdb \
-ex "break __cxa_throw" \
-ex "catch catch" \
-ex "break 'occa::exception'" \
-ex "break 'occa::error'" \
-ex "break 'occa::memory::assertInitialized'" \
-ex "break MPI_Abort" \
-ex "run" \
--args $NEKRS_HOME/bin/nekrs --setup ethier.par
With these breakpoints in place, gdb will stop as soon as one of these
error paths is triggered, allowing you to inspect the call stack and local
variables at the point where the failure occurs.
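If you use these breakpoints often, they can be kept in a gdb command file and loaded with the standard -x option instead of repeating the -ex arguments. This is a sketch: the helper name write_nekrs_gdb_commands and the file name nekrs_break.gdb are hypothetical, the breakpoints are the ones listed above, and set breakpoint pending on is added on the assumption that some symbols live in libraries that are not loaded yet when gdb starts.

```shell
# Hypothetical helper: write the breakpoint list above into a gdb command file.
# Usage: write_nekrs_gdb_commands <file>
write_nekrs_gdb_commands() {
  cat > "$1" <<'EOF'
set breakpoint pending on
break __cxa_throw
catch catch
break 'occa::exception'
break 'occa::error'
break 'occa::memory::assertInitialized'
break MPI_Abort
run
EOF
}

# Example usage (commented out; assumes the nekrs invocation shown above):
# write_nekrs_gdb_commands nekrs_break.gdb
# mpirun -np 8 gdb -x nekrs_break.gdb --args "$NEKRS_HOME/bin/nekrs" --setup ethier.par
```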