Step 0: Define your latency budget
“Glass-to-glass” means: photon hits the camera sensor → operator sees it on the display. For teleop, your budget is usually dominated by buffering and decode/render delays — not just encoding.
Step 1: Measure end-to-end latency
You need a repeatable method you can run after every change to prevent chasing ghosts. Choose one of these practical options:
The “LED + camera” method
Blink an LED in view of the camera and film both the LED and operator screen with a high-FPS phone camera. Count frames between LED turning on and screen showing it.
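The arithmetic is just a frame count divided by the recording's frame rate. A minimal sketch (the function name and frame indices are hypothetical):

```python
# Hypothetical helper: compute glass-to-glass latency from an LED test
# recording. Assumes you reviewed the footage and noted the frame index
# where the LED lights and the frame index where the screen shows it.

def glass_to_glass_ms(led_on_frame: int, screen_on_frame: int, recording_fps: float) -> float:
    """Latency in milliseconds from LED-on to screen-on."""
    if screen_on_frame < led_on_frame:
        raise ValueError("screen frame must not precede LED frame")
    return (screen_on_frame - led_on_frame) / recording_fps * 1000.0

# Example: 240 fps phone recording, LED lights at frame 1200,
# operator screen shows it at frame 1248 -> 48 frames of delay.
print(glass_to_glass_ms(1200, 1248, 240.0))  # → 200.0 ms
```

At 240 fps each frame is ~4.2 ms, so the method's resolution is limited by the phone camera's frame rate; record at the highest frame rate you can.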
On-frame timestamps
Overlay a sender timestamp into the video frame before encoding, then read it on the receiver and compute delta. Great for graphs; fast for iteration.
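A minimal sketch of the send/receive arithmetic, assuming sender and receiver share a clock (same host, or NTP/PTP-synced). A real pipeline would render the timestamp as visible digits or a binary pixel pattern so it survives lossy encoding; here we just stamp the first bytes of a raw frame buffer:

```python
import struct
import time

# Sketch only: stamp a raw frame buffer with a send timestamp, then
# recover it on the "receiver" and compute the delta. In production the
# timestamp must be burned into visible pixels before encoding.

def stamp_frame(frame: bytearray) -> None:
    frame[:8] = struct.pack("<d", time.monotonic())

def latency_ms(frame: bytes) -> float:
    (sent,) = struct.unpack("<d", frame[:8])
    return (time.monotonic() - sent) * 1000.0

frame = bytearray(640 * 480)   # stand-in for one grayscale frame
stamp_frame(frame)
time.sleep(0.05)               # pretend: encode -> network -> decode
print(f"{latency_ms(frame):.0f} ms")
```

Because it produces a number per frame, this method is ideal for plotting latency over time and spotting slow buffer growth.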
Step 2: Sane video settings
Teleop video should prioritize responsiveness over cinematic quality. Your goal is stable frame pacing, low buffering, and predictable recovery.
- Start at 720p (or lower) before attempting 1080p+
- 30 fps is a good baseline; 60 fps is harder on LTE
- H.264 is interoperable; use hardware encode
- Short keyframe interval for recovery (e.g., ~1s)
- Allow bitrate to downshift under congestion
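The settings above can be captured as a config sketch. Field names here are hypothetical, not any particular library's API:

```python
# Baseline teleop video settings, mirroring the list above.
TELEOP_VIDEO = {
    "width": 1280, "height": 720,    # 720p before attempting 1080p+
    "fps": 30,
    "codec": "h264",                 # interoperable
    "hardware_encode": True,
    "keyframe_interval_s": 1.0,      # short GOP for fast recovery
    "bitrate_kbps": 2500,
    "min_bitrate_kbps": 300,         # floor for congestion downshift
}

def keyframe_interval_frames(cfg: dict) -> int:
    # e.g. 30 fps * 1 s = a keyframe every 30 frames
    return round(cfg["fps"] * cfg["keyframe_interval_s"])

print(keyframe_interval_frames(TELEOP_VIDEO))  # → 30
```

Whatever encoder you use, map these intents onto its actual parameters rather than copying names literally.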
Step 3: Keep buffering tight
Buffers smooth jitter, but every millisecond of buffering is a millisecond of added glass-to-glass delay. Keep them as small as the network allows without making the stream unusable. What to watch, and the failure it signals:
- Jitter buffer size (ms) → buffer grows under loss and never shrinks
- Packet loss & retransmissions → receiver adds latency to avoid stutter
- Frame drops vs. added delay → decode can't keep up
- Decode/render queue depth → render is vsync-locked
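One way to keep a jitter buffer "tight but adaptive" is to target a high percentile of recent inter-arrival jitter and clamp it, so the buffer can shrink again after a loss burst. A sketch with illustrative numbers:

```python
# Sketch: size the jitter buffer from a rolling window of jitter
# samples (ms). Floor and ceiling values are illustrative, not tuned.

def target_jitter_buffer_ms(jitter_samples_ms, floor_ms=20, ceiling_ms=150):
    if not jitter_samples_ms:
        return floor_ms
    ranked = sorted(jitter_samples_ms)
    p95 = ranked[min(len(ranked) - 1, int(0.95 * len(ranked)))]
    return max(floor_ms, min(ceiling_ms, p95))

calm  = [5, 8, 6, 7, 9, 5, 6, 8]       # steady network
burst = calm + [60, 220, 180]          # LTE jitter burst
print(target_jitter_buffer_ms(calm))   # → 20 (floor dominates)
print(target_jitter_buffer_ms(burst))  # → 150 (ceiling caps the burst)
```

The ceiling is the key teleop decision: it bounds worst-case added delay, accepting some stutter instead of unbounded latency.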
Step 4: Design for LTE reality
LTE has jitter bursts, variable uplink, and sudden loss. Prefer UDP transport (like WebRTC), ensure STUN/TURN works, and always prefer dropping quality over adding delay when networks get congested.
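"Prefer dropping quality over adding delay" can be sketched as a bitrate ladder: step down fast when loss or RTT rises, step back up slowly when the link is clean. Thresholds below are illustrative, not tuned values:

```python
# Sketch of congestion-driven downshift. A real WebRTC stack does this
# internally (congestion control); this just shows the policy shape.

LADDER_KBPS = [2500, 1500, 800, 300]   # 720p30 down to thumbnail-grade

def next_rung(current_idx: int, loss_pct: float, rtt_ms: float) -> int:
    congested = loss_pct > 2.0 or rtt_ms > 250
    if congested:
        return min(current_idx + 1, len(LADDER_KBPS) - 1)  # downshift fast
    return max(current_idx - 1, 0)                          # recover slowly

idx = next_rung(0, loss_pct=5.0, rtt_ms=120)   # loss burst
print(LADDER_KBPS[idx])  # → 1500
idx = next_rung(idx, loss_pct=0.1, rtt_ms=60)  # link clean again
print(LADDER_KBPS[idx])  # → 2500
```

The asymmetry (downshift in one step, recover one rung at a time) avoids oscillating on a flapping LTE link.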
Step 5: Hardware Encode/Decode
Software encoding can introduce latency spikes whenever the CPU is loaded. On the vehicle (sender), use a hardware encoder to keep CPU headroom and avoid extra frame copies. On the operator side (receiver), prefer hardware decode and watch for silent decode-queue growth.
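"Silent decode queue growth" is worth guarding against explicitly: if frames pile up ahead of a slow decoder, latency creeps without any visible error. A hypothetical watchdog sketch (class and names are illustrative):

```python
from collections import deque

# Sketch: bound the receiver's pre-decode queue. If it overflows, shed
# old frames rather than letting glass-to-glass latency grow unbounded.

class DecodeQueue:
    def __init__(self, max_depth=3):
        self.frames = deque()
        self.max_depth = max_depth   # ~100 ms of queue at 30 fps

    def push(self, frame):
        self.frames.append(frame)
        if len(self.frames) > self.max_depth:
            # Decoder can't keep up: keep only the freshest frame.
            while len(self.frames) > 1:
                self.frames.popleft()

q = DecodeQueue(max_depth=3)
for f in range(6):           # simulate a burst the decoder can't drain
    q.push(f)
print(list(q.frames))        # → [3, 4, 5]
```

Note that shedding compressed frames mid-GOP breaks decode until the next keyframe, which is another reason Step 2 recommends a short keyframe interval.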
Step 6: Regression checklist
- Baseline glass-to-glass test
- Impairment test (jitter + loss)
- CPU stress test on sender/receiver
- Drop network for 30s & check recovery
- Operator UX test: does it still “feel drivable”?
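To make the checklist enforceable rather than aspirational, gate changes on the glass-to-glass number from Step 1. A minimal regression-gate sketch (function name and tolerance are hypothetical):

```python
# Sketch: fail a test run if measured glass-to-glass latency regresses
# beyond a tolerance versus the stored baseline. 15% is illustrative.

def latency_regressed(baseline_ms: float, measured_ms: float,
                      tolerance_pct: float = 15.0) -> bool:
    return measured_ms > baseline_ms * (1 + tolerance_pct / 100.0)

print(latency_regressed(200.0, 210.0))  # → False (within 15%)
print(latency_regressed(200.0, 260.0))  # → True  (regression)
```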
Starter Configuration
Treat this as a baseline checklist for your stack.
Goal: stable operator view under LTE jitter/loss
Video:
- 720p @ 30fps (start here)
- Hardware H.264 encode if available
- Keyframe interval ~1s (recovery-friendly)
- Bitrate: allow downshift under congestion
Buffers:
- Keep jitter buffer tight
- Avoid multi-stage buffering (decoder + render + display)
Transport:
- WebRTC / UDP with congestion control
- NAT traversal works (STUN/TURN)
- Fast reconnect behavior
Quality Strategy:
- Drop quality over adding delay
- Instrument: loss, jitter, buffer size, frame drops
Frequently Asked Questions
Why not just increase the buffer to remove stutter?
Because buffers remove stutter by adding delay. For teleoperation, delay is often worse than visual noise. Let quality drop temporarily and keep latency bounded.
What’s the most common mistake?
Optimizing encoding while ignoring receiver buffering and render delay. The experience is end-to-end, so measure the whole pipeline.
How do we handle bad LTE moments?
Design for downshift: lower bitrate/resolution temporarily, keep keyframes frequent for recovery, and never let jitter buffers grow without bound.