Why nvv4l2h264enc Fails Inside Docker on Jetson Orin NX with JetPack 6.2.2

After a fresh JetPack 6.2.2 (L4T R36.5.0) installation on a Jetson Orin NX, I ran into an issue: hardware H.264 encoding worked perfectly on the host but consistently failed inside Docker containers. What followed was a deep dive through strace logs, DRM ioctls, and NVIDIA's closed-source library internals; the root cause turned out to be a missing two-megabyte package.

The Setup

  • Board: Jetson Orin NX (Engineering Reference Developer Kit)
  • JetPack: 6.2.2 (L4T R36.5.0, kernel 5.15.185-tegra, OOT kernel modules)
  • Container: nvcr.io/nvidia/deepstream-l4t:7.1-samples-multiarch
  • Docker runtime: --runtime=nvidia --privileged

The Symptom

On the host, the canonical GStreamer hardware encoding pipeline runs without issues:

gst-launch-1.0 videotestsrc ! nvvidconv ! 'video/x-raw(memory:NVMM), framerate=5/1' \
  ! nvv4l2h264enc ! fakesink
Setting pipeline to PAUSED ...
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NvVideo: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
H264: Profile = 66 Level = 0
NVMEDIA: Need to set EMC bandwidth : 21000
Pipeline is PREROLLED ...

Inside Docker — same image, same hardware, --privileged, --runtime=nvidia — it dies:

Opening in BLOCKING MODE
ENC_CTX(0xffff88008460) Error in initializing nvenc context
ERROR: from element /GstPipeline:pipeline0/nvv4l2h264enc:nvv4l2h264enc0:
  Could not get/set settings from/on resource.
  Device is in streaming mode

No NvMMLiteOpen. No ===== NvVideo: NVENC =====. The NVMM library gives up before even attempting to talk to the hardware.

What I Ruled Out

Device nodes — all present

The container sees every relevant device: /dev/nvmap, /dev/host1x-fence, /dev/dri/renderD128 (tegra host1x DRM), /dev/nvgpu/igpu0/*, /dev/v4l2-nvenc. The major:minor numbers match the host exactly.

Libraries — identical

Every NVIDIA library is bind-mounted from the host by the nvidia-container-runtime. I checksummed the critical ones (libnvtvmr.so, libnvrm_host1x.so, libtegrav4l2.so, libv4l2_nvvideocodec.so, libnvmmlite_video.so) — all identical between host and container.

DRM access — works fine

I wrote a quick Python test that opens /dev/dri/renderD128 and calls DRM_IOCTL_TEGRA_CHANNEL_OPEN for the NVENC host1x class (0x21). It succeeds in both environments, returning context=1, version=35, capabilities=1.

Permissions and cgroups — not the issue

With --privileged, Docker disables seccomp, AppArmor, and cgroup device filtering. NVIDIA_VISIBLE_DEVICES=all made no difference.

sysfs — accessible

/sys/bus/nvmem/devices/fuse/nvmem, /sys/devices/soc0/*, /sys/firmware/devicetree/base/* — all readable inside the container. The chip correctly identifies as Tegra234 (Orin), soc_id=35.

The Investigation

Comparing DRM ioctl traces

I ran full strace -f -e trace=ioctl captures on both host and container. The difference was stark:

Metric                         Host   Container
Total ioctl calls              2016   1644
DRM_IOCTL calls                236    36
DRM_IOCTL_TEGRA_CHANNEL_OPEN   Yes    Never called

On the host, the encoder thread opens a DRM tegra channel to the NVENC engine. In the container, the library queries DRM_IOCTL_VERSION on the render nodes but never proceeds to open a channel. The initialization is aborted at a higher level.

Tracing the encoder thread

By tracing all syscalls with strace -f, I identified the exact thread responsible for encoder initialization. On the host, this thread prints NvMMLiteOpen and then opens DRM channels. In the container, the thread spawns two child processes via clone() + execve() and then immediately reports failure.

The child processes were:

  1. lsmod — exits with code 127 (“command not found”)
  2. grep nvgpu — reads from the pipe, gets nothing, exits with non-zero

The NVIDIA NVMM library calls something equivalent to popen("lsmod | grep nvgpu") to verify that the nvgpu kernel module is loaded before initializing the encoder context.

The failure chain

libnvv4l2.so
  └─ libv4l2_nvvideocodec.so
       └─ libnvtvmr.so (Tegra Video Resource Manager)
            └─ system("lsmod | grep nvgpu")
                 └─ sh: lsmod: not found
                      └─ grep reads empty pipe → exit 1
                           └─ libnvtvmr concludes: no GPU → abort NVENC init

The library doesn’t check /proc/modules directly. It doesn’t query the kernel via sysfs. It shells out to lsmod — a userspace utility from the kmod package — and when that binary is missing, it silently concludes the hardware doesn’t exist.

The Fix

apt-get install -y kmod

That’s it. After installing kmod (which provides lsmod, modprobe, modinfo, etc.), the encoder initializes successfully inside the container:

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NvVideo: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
H264: Profile = 66 Level = 0
NVMEDIA: Need to set EMC bandwidth : 21000
NvVideo: bBlitMode is set to TRUE
Pipeline is PREROLLED ...
Got EOS from element "pipeline0".

For Dockerfiles, add kmod to your package list:

RUN apt-get update && apt-get install -y kmod && rm -rf /var/lib/apt/lists/*

Why This Only Surfaces Now

On JetPack 5.x (Xavier-era), the base container images typically included kmod as a dependency of other packages. The DeepStream L4T 7.1 container for JetPack 6.x ships a leaner base that doesn’t pull it in. The host always has kmod installed as part of the Ubuntu base system — so the problem is container-only.

The error message Error in initializing nvenc context gives zero indication that the issue is a missing userspace utility. The sh: 1: lsmod: not found warning does appear in the GStreamer plugin scanner output, but it’s buried among dozens of other harmless warnings and is easily dismissed as irrelevant since it appears during plugin scanning, not during the actual encoding attempt. What makes it worse is that the NVMM library runs its own lsmod check silently — the stderr from the failed shell-out is redirected to /dev/null.

Takeaways

  1. --privileged doesn’t help when the problem is a missing binary, not missing permissions. Every device node, sysfs entry, and DRM ioctl was accessible. The library simply refused to use them.

  2. Closed-source libraries with shell-out dependencies are a debugging nightmare. Without source access, the only path is binary-level tracing. Comparing the strace output of the host and container encoder threads was what ultimately revealed the lsmod dependency.

  3. Always install kmod in Jetson Docker containers. If your container runs any NVIDIA multimedia workload (encoding, decoding, VIC), the NVMM library stack may depend on lsmod for hardware discovery. This isn’t documented anywhere in NVIDIA’s container guides.

  4. The lsmod: not found warning during gst-plugin-scanner is a red flag. If you see it, your encoding/decoding pipelines will likely fail at runtime even though the GStreamer elements appear to load successfully.