RTP in Gossipper scenarios

This document explains how to send and receive RTP media from within XML scenario files.

How media endpoints are resolved

Gossipper automatically extracts the remote RTP endpoint from the last received SIP message. The engine parses the SDP body and reads the c= line together with the m=audio, m=video, or m=image line to determine the destination IP and port. No manual endpoint configuration is required inside the scenario XML.

The local bind IP and port are derived from [local_ip] and [media_port].
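
Because the endpoint comes from the last received SIP message, the exec action is normally attached to the <recv> whose message carries the SDP. The fragment below is a minimal sketch of that placement. The SDP in the comment, including the address 192.0.2.10 and port 40000, is an illustrative assumption and not engine output.

```xml
<recv response="200">
  <action>
    <!-- Example answer SDP (illustrative values):
           c=IN IP4 192.0.2.10
           m=audio 40000 RTP/AVP 0
         With this body the audio stream would target 192.0.2.10:40000. -->
    <exec rtp_stream="audio.raw"/>
  </action>
</recv>
```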

WebRTC-style SDP (ICE placeholders and SRTP)

For m=audio, m=video, or m=image, if the classic c= / m= pair carries placeholders (0.0.0.0, ::, or m= port 9), the engine prefers the best a=candidate: line (UDP, component 1, which is RTP) inside that media section. If the SIP body has no matching m= line and contains only ICE lines (for example, a trickle fragment with a=candidate:), the same parser can still supply the IP and port: for audio when you run exec rtp_stream start / mic, and for video or image when you run play_pcap_video or play_pcap_image on that message.

BUNDLE, JSON trickle, TURN: When the message is application/trickle-ice+json, Gossipper converts the JSON payload to SDP-style lines (EffectiveMediaSDPBody) before extracting addresses or ICE candidates. a=group:BUNDLE is applied when the media section belongs to the bundle and still uses placeholders, so the RTP IP and port can follow the first bundled MID. If ICE selects a typ relay candidate, pass -turn_server, -turn_user, and -turn_pass (plus -turn_realm if your server needs it); Gossipper then allocates a UDP TURN relay and sends RTP/SRTP over that path. Details: srtp.md.

-media_srtp enables SDES (a=crypto:) or DTLS-SRTP (a=fingerprint:), local ICE material for offers, STUN handling on the media socket, and DTLS role from a=setup:. After the first full SRTP answer, a follow-up SIP body that contains only ICE attributes (no SAVP/fingerprint hint) does not tear down negotiated keys; see srtp.md for flags, DTLS client vs server, TURN/relay, JSON trickle, and remaining limits (no full ICE nomination, no TCP TURN in CLI).


SRTP and DTLS-SRTP (scenario media)

exec rtp_stream (file or synthetic) and mic negotiate media security from the last SIP body when you start the stream: enable -media_srtp on the process for SDES (a=crypto:) or DTLS-SRTP (a=fingerprint:), or -media_reject_srtp to refuse encrypted SDP.

| Mode | SDP signal | Notes |
| --- | --- | --- |
| SDES | a=crypto: with inline: | RFC 4568-style keys; supported suites per srtp.md. |
| DTLS-SRTP | a=fingerprint: (SHA-256 / SHA-384) | DTLS 1.2 on the RTP PacketConn; demux DTLS vs RTP; SRTP keys from the negotiated DTLS-SRTP profile. Role from a=setup: (Gossipper as DTLS client by default; DTLS server when the peer is a=setup:active). |

If the peer hints SRTP and you pass neither -media_srtp nor -media_reject_srtp, rtp_stream start and mic fail with a message pointing at those flags (see configureMediaSRTPForRTPStream in the engine).

PCAP replay (exec play_pcap_audio / video / image) can encrypt outbound RTP when an SRTP send context is active on the Session; the engine clears SRTP state before starting PCAP, so treat srtp.md as the source of truth for PCAP + SRTP ordering and limits.

ICE placeholders, BUNDLE, JSON trickle, and TURN interact with the same media socket used for RTP/SRTP/DTLS; see srtp.md and the WebRTC-style SDP subsection above.


exec rtp_stream

The primary way to control an audio RTP stream from a scenario is the exec rtp_stream action inside any command that supports <action> blocks (<nop>, <recv>, etc.).

Synthetic streams — to stream without a media file, use the synthetic keyword instead of a file path. See synthetic-rtp-sender.md for the full reference.

Start a stream from a file

<nop>
  <action>
    <exec rtp_stream="audio.raw"/>
  </action>
</nop>

The file path is resolved relative to the scenario directory. Both .raw and .wav files are supported.

WAV files must be PCM mono 8 kHz (8-bit or 16-bit samples). Raw files are read as-is and split into fixed-size chunks.

Full parameter syntax

rtp_stream="<path>,<loop_count>,<payload_type>,<payload_name>"

| Parameter | Default | Description |
| --- | --- | --- |
| path | required | Path to the audio file |
| loop_count | 1 | Number of times the file is played; -1 loops indefinitely |
| payload_type | 0 | RTP payload type number |
| payload_name | PCMU/8000 | Codec descriptor |

Supported payload descriptors:

| Descriptor | PT | Clock rate | Packet duration |
| --- | --- | --- | --- |
| PCMU/8000 | 0 | 8000 Hz | 20 ms |
| PCMA/8000 | 8 | 8000 Hz | 20 ms |
| G722/8000 | 9 | 8000 Hz | 20 ms |
| ILBC/8000 | 97 | 8000 Hz | 30 ms |
| H264/90000 | 96 | 90000 Hz | 33 ms |
| OPUS/48000 | 111 | 48000 Hz | 20 ms |

Examples:

<!-- PCMU, play once -->
<exec rtp_stream="audio.raw,1,0,PCMU/8000"/>

<!-- PCMA WAV, loop 3 times -->
<exec rtp_stream="voice.wav,3,8,PCMA/8000"/>

<!-- Loop indefinitely until stop -->
<exec rtp_stream="hold_music.raw,-1,0,PCMU/8000"/>

Control commands

<!-- Pause the running stream -->
<exec rtp_stream="pause"/>

<!-- Resume a paused stream -->
<exec rtp_stream="resume"/>

<!-- Stop the stream and close the socket -->
<exec rtp_stream="stop"/>

<!-- Echo mode: reflects every received RTP packet back to the sender -->
<exec rtp_stream="echo"/>

<!-- Live microphone to PCMU/8000 RTP.
     Default builds capture with arecord on Linux and with ffmpeg on macOS/Windows.
     Builds with -tags audio (plus CGO and libportaudio) capture through PortAudio;
     the device is then an index or a name substring.
     An optional device follows the first comma; commas inside the device string are preserved. -->
<exec rtp_stream="mic"/>
<exec rtp_stream="mic,plughw:1,0"/>
<exec rtp_stream="mic,ffmpeg:-f pulse -i default"/>
<!-- PortAudio (-tags audio) examples -->
<exec rtp_stream="mic,2"/>
<exec rtp_stream="mic,USB Audio"/>
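
The control commands are usually combined with <pause> elements between them. The fragment below is a sketch of a hold-and-resume sequence. The file name hold_music.raw and the pause durations are placeholder values, and the fragment assumes the remote endpoint was already resolved from an earlier SDP answer.

```xml
<!-- Start looping hold music -->
<nop>
  <action>
    <exec rtp_stream="hold_music.raw,-1,0,PCMU/8000"/>
  </action>
</nop>

<pause milliseconds="3000"/>

<!-- Suspend sending for two seconds, then continue -->
<nop>
  <action>
    <exec rtp_stream="pause"/>
  </action>
</nop>

<pause milliseconds="2000"/>

<nop>
  <action>
    <exec rtp_stream="resume"/>
  </action>
</nop>

<pause milliseconds="3000"/>

<!-- Stop the stream and close the socket -->
<nop>
  <action>
    <exec rtp_stream="stop"/>
  </action>
</nop>
```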

exec rtp_record

Writes decoded incoming audio to a 16-bit PCM mono 8 kHz WAV file. G.711 PCMU/PCMA (PT 0 / 8) is decoded to PCM. Other payload types and RFC 2833 telephone-event frames advance the timeline as silence (one 20 ms frame for unknown codecs; RFC 2833 uses the event duration field). A small RTP sequence reorder buffer reduces gaps when packets arrive out of order.

<nop>
  <action>
    <exec rtp_record="start,./captures/call.wav"/>
  </action>
</nop>

Forms:

| Value | Meaning |
| --- | --- |
| start,<path> | Begin recording to path (mono remote leg). |
| start,<path>,duplex | Stereo WAV: left channel = sent (local) samples, right channel = received; the shorter channel is padded with silence so both lengths match. |
| stop | Stop and flush the WAV file. |

Paths are resolved like other media files (relative to the scenario directory). Start recording after rtp_stream (or mic / play_pcap_*) has started the media session, or use CLI -record_wav_dir for automatic per-call files named from Call-ID.
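
The ordering described above can be sketched as follows: start the stream first, then start recording, and stop the recording before stopping the stream. The file names and the 4-second hold are placeholder values. Stopping the recorder before the stream is a cautious assumption, not a documented requirement.

```xml
<!-- 1. Start the media session -->
<nop>
  <action>
    <exec rtp_stream="audio.raw,1,0,PCMU/8000"/>
  </action>
</nop>

<!-- 2. Record both legs into a stereo WAV -->
<nop>
  <action>
    <exec rtp_record="start,./captures/call_duplex.wav,duplex"/>
  </action>
</nop>

<pause milliseconds="4000"/>

<!-- 3. Flush the WAV file, then stop the stream -->
<nop>
  <action>
    <exec rtp_record="stop"/>
  </action>
</nop>
<nop>
  <action>
    <exec rtp_stream="stop"/>
  </action>
</nop>
```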


exec play_pcap_audio

Replays a pre-recorded PCAP capture to the remote audio endpoint discovered from SDP. Inter-packet timing from the original capture is preserved.

<nop>
  <action>
    <exec play_pcap_audio="capture.pcap"/>
  </action>
</nop>

The engine looks for the m=audio line in the last SDP to determine the destination.


exec play_pcap_video

Replays a PCAP capture to the video endpoint discovered from m=video in SDP.

<nop>
  <action>
    <exec play_pcap_video="video.pcap"/>
  </action>
</nop>

This is a pragmatic implementation: the raw RTP payload from the PCAP is forwarded as-is without any codec-specific processing.


exec play_pcap_image

Same as play_pcap_video but targets the m=image SDP line.

<nop>
  <action>
    <exec play_pcap_image="fax.pcap"/>
  </action>
</nop>

exec rtpcheck

Blocks scenario execution until a minimum number of RTP packets have been observed, or a timeout expires.

<nop>
  <action>
    <exec rtpcheck="min_packets=10 timeout_ms=2000 direction=any"/>
  </action>
</nop>

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| min_packets | 1 | Minimum RTP packet count required |
| timeout_ms | 1000 | Timeout in milliseconds |
| direction | any | any, send, recv, or both |

direction=both requires at least min_packets in each direction.

Short form (just a packet count, 1-second timeout, any direction):

<exec rtpcheck="10"/>
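
When choosing min_packets and timeout_ms, it helps to work from the packet duration. At 20 ms per PCMU packet, 50 packets correspond to about 1 second of audio, so a 3-second timeout leaves some margin. The fragment below is a sketch with those illustrative numbers. It assumes the peer also sends RTP (for example, an echoing UAS), since direction=both needs min_packets in each direction.

```xml
<recv response="200">
  <action>
    <!-- Loop until an explicit stop so packets keep flowing during the check -->
    <exec rtp_stream="audio.raw,-1,0,PCMU/8000"/>
  </action>
</recv>

<!-- 50 packets x 20 ms = about 1 s of audio in each direction; wait up to 3 s -->
<nop>
  <action>
    <exec rtpcheck="min_packets=50 timeout_ms=3000 direction=both"/>
  </action>
</nop>
```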

exec send_dtmf

Sends DTMF tones as RFC 2833 telephone-event RTP packets to the audio endpoint discovered from SDP.

<nop>
  <action>
    <exec send_dtmf="1234#"/>
  </action>
</nop>

Supported digits: 0–9, *, #, A–D.
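
Digits can be sent in several groups with a <pause> between them, for example to step through an IVR menu. The fragment below is a sketch only. The digit strings and the 2-second gap are placeholder values, and it assumes the audio endpoint was already resolved from SDP.

```xml
<!-- First group: PIN followed by # -->
<nop>
  <action>
    <exec send_dtmf="1234#"/>
  </action>
</nop>

<!-- Give the remote side time to react before the next group -->
<pause milliseconds="2000"/>

<!-- Second group: uses * and a letter digit -->
<nop>
  <action>
    <exec send_dtmf="*9A"/>
  </action>
</nop>
```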


RTCP

RTCP sender reports are emitted automatically by the media session every 500 ms while a stream is running. Incoming RTCP packets are parsed and counted. RTCP statistics are aggregated into the engine summary and JSON export.


Complete example: UAC audio call

<?xml version="1.0" encoding="UTF-8" ?>
<scenario name="audio-uac">

  <send retrans="500">
    <![CDATA[
INVITE sip:[service]@[remote_ip]:[remote_port] SIP/2.0
Via: SIP/2.0/[transport] [local_ip]:[local_port];branch=[branch]
From: sipp <sip:sipp@[local_ip]:[local_port]>;tag=[call_number]
To: sut <sip:[service]@[remote_ip]:[remote_port]>
Call-ID: [call_id]
CSeq: [cseq] INVITE
Contact: sip:sipp@[local_ip]:[local_port]
Max-Forwards: 70
Content-Type: application/sdp
Content-Length: [len]

v=0
o=user1 53655765 2353687637 IN IP[local_ip_type] [local_ip]
s=-
c=IN IP[local_ip_type] [local_ip]
t=0 0
m=audio [media_port] RTP/AVP 0
a=rtpmap:0 PCMU/8000
    ]]>
  </send>

  <recv response="100" optional="true"/>

  <recv response="200" rtd="true">
    <action>
      <!-- Start RTP stream to the endpoint found in the 200 OK SDP -->
      <exec rtp_stream="audio.raw,1,0,PCMU/8000"/>
    </action>
  </recv>

  <send>
    <![CDATA[
ACK sip:[service]@[remote_ip]:[remote_port] SIP/2.0
Via: SIP/2.0/[transport] [local_ip]:[local_port];branch=[branch]
From: sipp <sip:sipp@[local_ip]:[local_port]>;tag=[call_number]
To: sut <sip:[service]@[remote_ip]:[remote_port]>[peer_tag_param]
Call-ID: [call_id]
CSeq: [cseq] ACK
Max-Forwards: 70
Content-Length: 0
    ]]>
  </send>

  <!-- Wait for the stream to finish, or hold for a fixed duration -->
  <pause milliseconds="4000"/>

  <!-- Stop media before hanging up -->
  <nop>
    <action>
      <exec rtp_stream="stop"/>
    </action>
  </nop>

  <send retrans="500">
    <![CDATA[
BYE sip:[service]@[remote_ip]:[remote_port] SIP/2.0
Via: SIP/2.0/[transport] [local_ip]:[local_port];branch=[branch]
From: sipp <sip:sipp@[local_ip]:[local_port]>;tag=[call_number]
To: sut <sip:[service]@[remote_ip]:[remote_port]>[peer_tag_param]
Call-ID: [call_id]
CSeq: [cseq] BYE
Max-Forwards: 70
Content-Length: 0
    ]]>
  </send>

  <recv response="200"/>

</scenario>

Complete example: UAS with echo

<?xml version="1.0" encoding="UTF-8" ?>
<scenario name="audio-uas-echo">

  <recv request="INVITE"/>

  <send>
    <![CDATA[
SIP/2.0 100 Trying
[last_Via:]
[last_From:]
[last_To:]
[last_Call-ID:]
[last_CSeq:]
Content-Length: 0
    ]]>
  </send>

  <send>
    <![CDATA[
SIP/2.0 200 OK
[last_Via:]
[last_From:]
[last_To:];tag=[call_number]
[last_Call-ID:]
[last_CSeq:]
Contact: <sip:[local_ip]:[local_port]>
Content-Type: application/sdp
Content-Length: [len]

v=0
o=Gossipper 0 0 IN IP[local_ip_type] [local_ip]
s=-
c=IN IP[local_ip_type] [local_ip]
t=0 0
m=audio [media_port] RTP/AVP 0
a=rtpmap:0 PCMU/8000
    ]]>
  </send>

  <recv request="ACK">
    <action>
      <!-- Reflect incoming RTP back to the caller -->
      <exec rtp_stream="echo"/>
    </action>
  </recv>

  <recv request="BYE">
    <action>
      <exec rtp_stream="stop"/>
    </action>
  </recv>

  <send>
    <![CDATA[
SIP/2.0 200 OK
[last_Via:]
[last_From:]
[last_To:]
[last_Call-ID:]
[last_CSeq:]
Content-Length: 0
    ]]>
  </send>

</scenario>

Known limits

  • SRTP / DTLS-SRTP: supported for rtp_stream, mic, and PCAP replay when -media_srtp matches the peer SDP. It is not a full browser WebRTC stack: ICE nomination, TCP TURN, and richer suites are missing (see srtp.md and media-roadmap.md).
  • rtpcheck performs pragmatic activity counting only; full SIPp rtpcheck parity with jitter and loss metrics is deferred.
  • No dedicated video/image codec pipeline; PCAP replay forwards raw RTP payloads.
  • HEP mirroring of RTP/RTCP is deferred; only SIP signaling is currently forwarded to Homer.