MPS

NVIDIA Multi-Process Service (MPS) is a facility that enables compute kernels submitted from multiple CPU processes to execute simultaneously on the same GPU. This overlap can improve GPU resource utilization and overall throughput.

MPS can also improve application scaling across multiple GPUs by overlapping hardware resource use more efficiently and making better use of CPU-based parallelism.

Enable MPS when the application starts and disable it when the application stops. If MPS is enabled for an application that does not use it, performance may degrade. The following script starts the MPS control daemon only when more than one worker process shares a GPU (GPU_WORKERS, default 2) and HOST_MPS is unset, and it stops the daemon when the script exits:

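# Number of worker processes per GPU; MPS only helps when several share a device.
readonly procs_per_gpu=${GPU_WORKERS:-2}
# A non-empty HOST_MPS skips the startup below (presumably because the host already runs MPS).
readonly host_mps=${HOST_MPS:-}
if (( procs_per_gpu > 1 )) && [[ -z "${host_mps}" ]]; then
    # Keep the MPS control pipes and logs in a job-local directory.
    export CUDA_MPS_PIPE_DIRECTORY="${PWD}/.mps"
    export CUDA_MPS_LOG_DIRECTORY="${PWD}/.mps"
    # Create the directory up front in case the daemon does not create it.
    mkdir -p "${CUDA_MPS_PIPE_DIRECTORY}"
    if ! nvidia-cuda-mps-control -d; then
        echo "ERROR: Failed to start MPS daemon. Please resolve issue or set GPU_WORKERS to 1" >&2
        exit 1
    fi
    echo "INFO: MPS server daemon started"
    # Shut the daemon down when this script exits.
    trap 'echo quit | nvidia-cuda-mps-control' EXIT
fi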
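
Once the daemon is running, the worker processes have to be started from the same environment so that they inherit CUDA_MPS_PIPE_DIRECTORY and connect to it. The following is a minimal sketch of one way to do that, not a prescribed method: it assumes GPUs are numbered from 0, that the daemon can see all of them, and that each worker is pinned to a device through CUDA_VISIBLE_DEVICES. The function names gpu_for_worker and launch_workers are hypothetical helpers introduced only for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map a worker index to a GPU index so that
# procs_per_gpu consecutive workers share one device.
gpu_for_worker() {
    echo $(( $1 / $2 ))
}

# Hypothetical launcher: start num_gpus * procs_per_gpu copies of a command
# in the background, each pinned to its GPU through CUDA_VISIBLE_DEVICES.
# Usage: launch_workers NUM_GPUS PROCS_PER_GPU COMMAND [ARGS...]
launch_workers() {
    num_gpus=$1
    procs_per_gpu=$2
    shift 2
    total=$(( num_gpus * procs_per_gpu ))
    worker=0
    while [ "${worker}" -lt "${total}" ]; do
        gpu=$(gpu_for_worker "${worker}" "${procs_per_gpu}")
        # The assignment applies only to this one child process.
        CUDA_VISIBLE_DEVICES="${gpu}" "$@" &
        worker=$(( worker + 1 ))
    done
    # Block until every worker has exited.
    wait
}
```

For example, `launch_workers 2 "${GPU_WORKERS:-2}" ./my_app` would start four copies of a placeholder binary ./my_app, two on GPU 0 and two on GPU 1. Because the launcher waits for all workers, the EXIT trap in the script above stops the daemon only after they have finished.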
