RTP in Gossipper scenarios

This document explains how to send and receive RTP media from within XML scenario files.

How media endpoints are resolved

Gossipper automatically extracts the remote RTP endpoint from the last received SIP message. The engine parses the SDP body and reads the c= connection line together with the m=audio, m=video, or m=image line to determine the destination IP address and port. No manual endpoint configuration is required inside the scenario XML.

The local bind IP and port are derived from [local_ip] and [media_port].
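A minimal sketch of the classic c= / m= lookup described above, assuming a well-formed SDP body. `media_endpoint` is a hypothetical helper, not Gossipper's own parser, and it ignores the ICE placeholder and BUNDLE handling covered in the next subsection. It follows the RFC 4566 rule that a c= line inside a media section overrides the session-level one.

```python
def media_endpoint(sdp: str, media: str = "audio") -> tuple[str, int]:
    """Return (ip, port) for the first m=<media> section of an SDP body.

    A c= line inside the media section overrides the session-level c= line
    (RFC 4566). ICE placeholders and BUNDLE are NOT handled here.
    """
    session_ip = None   # c= seen before any m= line
    section_ip = None   # c= seen inside the target m= section
    port = None
    seen_media = False  # True once any m= line has been read
    in_target = False   # True while inside the target m= section
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            if in_target:
                break  # the next m= line ends the target section
            seen_media = True
            fields = line[2:].split()
            if fields and fields[0] == media:
                in_target = True
                port = int(fields[1])
        elif line.startswith("c="):
            # c=IN IP4 192.0.2.10 (strip an optional /ttl suffix)
            addr = line[2:].split()[2].split("/")[0]
            if not seen_media:
                session_ip = addr
            elif in_target:
                section_ip = addr
    if port is None:
        raise ValueError(f"no m={media} line in SDP")
    ip = section_ip or session_ip
    if ip is None:
        raise ValueError("no c= line applies to the media section")
    return ip, port


if __name__ == "__main__":
    answer = (
        "v=0\r\n"
        "o=- 0 0 IN IP4 192.0.2.1\r\n"
        "s=-\r\n"
        "c=IN IP4 192.0.2.1\r\n"
        "t=0 0\r\n"
        "m=audio 49170 RTP/AVP 0\r\n"
        "a=rtpmap:0 PCMU/8000\r\n"
    )
    print(media_endpoint(answer))  # ('192.0.2.1', 49170)
```

The o= line also contains an address, which is why the sketch only looks at lines that start with c=.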

WebRTC-style SDP (ICE placeholders and SRTP)

For m=audio, m=video, or m=image sections whose classic c= / m= pair uses placeholders (0.0.0.0, ::, or m= port 9), the engine prefers the best a=candidate: line (UDP, RTP component 1) inside that media section. If the SIP body has no matching m= line but carries only ICE lines (for example, a trickle fragment with a=candidate:), the same parser can still supply the IP and port: for audio when you start exec rtp_stream or mic, and for video or image when you run play_pcap_video or play_pcap_image on that message.
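The placeholder fallback can be sketched as follows. This is a hedged illustration, not the engine's code: it assumes "best" means the highest ICE priority among UDP candidates for component 1, and the helper names `is_placeholder`, `best_candidate`, and `resolve_endpoint` are hypothetical. The candidate field layout follows the a=candidate grammar in RFC 8839.

```python
PLACEHOLDER_IPS = {"0.0.0.0", "::"}
PLACEHOLDER_PORT = 9


def is_placeholder(ip: str, port: int) -> bool:
    """True when c=/m= carry WebRTC-style placeholders instead of a real address."""
    return ip in PLACEHOLDER_IPS or port == PLACEHOLDER_PORT


def best_candidate(section_lines):
    """Pick the highest-priority UDP, component-1 a=candidate: line.

    Candidate grammar (RFC 8839):
      a=candidate:<foundation> <component> <transport> <priority> <addr> <port> typ <type> ...
    Returns (addr, port, type) or None.
    """
    best = None
    for line in section_lines:
        if not line.startswith("a=candidate:"):
            continue
        parts = line[len("a=candidate:"):].split()
        if len(parts) < 8 or parts[6] != "typ":
            continue  # malformed candidate
        component, transport, priority = int(parts[1]), parts[2].lower(), int(parts[3])
        if component != 1 or transport != "udp":
            continue  # only the RTP component over UDP is usable here
        if best is None or priority > best[0]:
            best = (priority, parts[4], int(parts[5]), parts[7])
    return None if best is None else best[1:]


def resolve_endpoint(ip, port, section_lines):
    """Keep a real c=/m= address; fall back to the best ICE candidate for placeholders."""
    if not is_placeholder(ip, port):
        return ip, port
    cand = best_candidate(section_lines)
    return (cand[0], cand[1]) if cand else (ip, port)
```

For example, a section with `c=IN IP4 0.0.0.0` and `m=audio 9 ...` resolves to its host candidate, while a section with a real address keeps it even if candidates are present. A returned type of `relay` is the case where the TURN flags below come into play.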

BUNDLE, JSON trickle, TURN: when the message is application/trickle-ice+json, Gossipper converts the JSON payload to SDP-style lines (EffectiveMediaSDPBody) before extracting addresses or ICE candidates. a=group:BUNDLE is applied when a media section belongs to the bundle and still uses placeholders, so its RTP IP and port can follow the first bundled MID. If ICE selects a typ relay candidate, pass -turn_server, -turn_user, and -turn_pass (and -turn_realm if your server requires it); Gossipper then allocates a UDP TURN relay and sends RTP/SRTP over that path. Details: srtp.md.

-media_srtp enables SDES (a=crypto:) or DTLS-SRTP (a=fingerprint:), local ICE material for offers, STUN handling on the media socket, and DTLS role from a=setup:. After the first full SRTP answer, a follow-up SIP body that contains only ICE attributes (no SAVP/fingerprint hint) does not tear down negotiated keys; see srtp.md for flags, DTLS client vs server, TURN/relay, JSON trickle, and remaining limits (no full ICE nomination, no TCP TURN in CLI).

SRTP and DTLS-SRTP (scenario media)

exec rtp_stream (file or synthetic) and mic negotiate media security from the last SIP body when you start the stream: enable -media_srtp on the process for SDES (a=crypto:) or DTLS-SRTP (a=fingerprint:), or -media_reject_srtp to refuse encrypted SDP.

Mode	SDP signal	Notes
SDES	`a=crypto:` with `inline:`	RFC 4568-style keys; supported suites per srtp.md.
DTLS-SRTP	`a=fingerprint:` (SHA-256 / SHA-384)	DTLS 1.2 on the RTP `PacketConn`; demux DTLS vs RTP; SRTP keys from the negotiated DTLS-SRTP profile. Role from `a=setup:` (Gossipper as DTLS client by default; DTLS server when the peer is `a=setup:active`).
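To make the SDES row concrete, here is a sketch of how an RFC 4568 `inline:` key parameter splits into a master key and master salt. The suite table is an assumption covering two common suites; srtp.md lists what Gossipper actually accepts, and `parse_sdes` is a hypothetical helper.

```python
import base64

# (master key bytes, master salt bytes) for two common RFC 4568 suites.
# Which suites Gossipper accepts is listed in srtp.md; this table is illustrative.
SUITE_KEY_SALT = {
    "AES_CM_128_HMAC_SHA1_80": (16, 14),
    "AES_CM_128_HMAC_SHA1_32": (16, 14),
}


def parse_sdes(line: str):
    """Parse 'a=crypto:<tag> <suite> inline:<key||salt>[|lifetime][|MKI:len]'.

    Returns (tag, suite, master_key, master_salt).
    """
    if not line.startswith("a=crypto:"):
        raise ValueError("not an a=crypto: line")
    tag, suite, key_params = line[len("a=crypto:"):].split()[:3]
    if suite not in SUITE_KEY_SALT:
        raise ValueError(f"unsupported crypto suite {suite}")
    key_param = key_params.split(";")[0]  # use the first key-param only
    if not key_param.startswith("inline:"):
        raise ValueError("RFC 4568 only defines the inline: key method")
    key_b64 = key_param[len("inline:"):].split("|")[0]  # drop lifetime / MKI
    material = base64.b64decode(key_b64)
    key_len, salt_len = SUITE_KEY_SALT[suite]
    if len(material) != key_len + salt_len:
        raise ValueError(f"expected {key_len + salt_len} bytes of key material, got {len(material)}")
    return int(tag), suite, material[:key_len], material[key_len:]
```

For the AES_CM_128 suites the 30 bytes of key material base64-encode to exactly 40 characters, which is a quick sanity check when reading captured SDP.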

If the peer hints SRTP and you pass neither -media_srtp nor -media_reject_srtp, rtp_stream start and mic fail with a message pointing at those flags (see configureMediaSRTPForRTPStream in the engine).
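The decision described above can be sketched as a small function. This only mirrors the documented rules and is not configureMediaSRTPForRTPStream itself. Three points are assumptions: a SAVP profile, a=crypto:, or a=fingerprint: counts as an SRTP hint; a=fingerprint: takes precedence when both attributes appear; and the "reject" result is a hypothetical stand-in for whatever -media_reject_srtp does.

```python
def srtp_hinted(lines) -> bool:
    """True when the SDP hints at SRTP: a SAVP/SAVPF profile, a=crypto:, or a=fingerprint:."""
    for line in lines:
        if line.startswith("m=") and "SAVP" in line:
            return True
        if line.startswith(("a=crypto:", "a=fingerprint:")):
            return True
    return False


def choose_media_security(sdp: str, media_srtp=False, media_reject_srtp=False):
    """Return (mode, dtls_role) for the last SIP body.

    mode is "rtp", "sdes", "dtls-srtp", or "reject"; dtls_role is
    "client", "server", or None.
    """
    lines = [ln.strip() for ln in sdp.splitlines()]
    if not srtp_hinted(lines):
        return "rtp", None
    if not (media_srtp or media_reject_srtp):
        # Documented behavior: fail and point the user at the two flags.
        raise RuntimeError("peer SDP hints SRTP: pass -media_srtp or -media_reject_srtp")
    if media_reject_srtp:
        return "reject", None  # hypothetical outcome: refuse encrypted media
    if any(ln.startswith("a=fingerprint:") for ln in lines):
        # Client by default; server only when the peer is a=setup:active.
        role = "server" if "a=setup:active" in lines else "client"
        return "dtls-srtp", role
    if any(ln.startswith("a=crypto:") for ln in lines):
        return "sdes", None
    raise RuntimeError("SAVP profile without a=crypto: or a=fingerprint:")
```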

PCAP replay (exec play_pcap_audio, play_pcap_video, play_pcap_image) can encrypt outbound RTP when an SRTP send context is active on the Session. However, the engine clears SRTP state before starting PCAP replay, so the order in which you start media matters; treat srtp.md as the source of truth for PCAP + SRTP ordering and limits.

ICE placeholders, BUNDLE, JSON trickle, and TURN interact with the same media socket used for RTP/SRTP/DTLS; see srtp.md and the WebRTC-style SDP subsection above.

exec rtp_stream

The primary way to control an audio RTP stream from a scenario is the exec rtp_stream action inside any command that supports <action> blocks (<nop>, <recv>, etc.).

Synthetic streams: to stream without a media file, use the synthetic keyword instead of a file path. See synthetic-rtp-sender.md for the full reference.

Start a stream from a file

<nop>
  <action>
    <exec rtp_stream="audio.raw"/>
  </action>
</nop>

The file path is resolved relative to the scenario directory. Both .raw and .wav files are supported.

WAV files must be PCM mono 8 kHz (8-bit or 16-bit samples). Raw files are read as-is and split into fixed-size chunks.
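As an illustration of "split into fixed-size chunks", the sketch below packetizes raw 8 kHz G.711 bytes into RTP packets with the RFC 3550 fixed header. It assumes a PCMU raw file and a 160-byte chunk (20 ms); `rtp_packets` and the default SSRC are hypothetical, and the WAV-to-G.711 encoding step is not shown.

```python
import struct


def rtp_packets(payload: bytes, payload_type=0, chunk=160, ssrc=0x12345678, seq=0, ts=0):
    """Split raw 8 kHz G.711 bytes into RTP packets of `chunk` bytes each.

    G.711 carries one byte per sample, so the RTP timestamp advances by `chunk`
    per packet (160 bytes = 20 ms at 8000 Hz). The final short chunk is sent
    as-is here; whether Gossipper pads or drops it is not specified.
    """
    packets = []
    for off in range(0, len(payload), chunk):
        header = struct.pack(
            "!BBHII",
            0x80,                 # V=2, no padding, no extension, CC=0
            payload_type & 0x7F,  # marker bit clear
            seq & 0xFFFF,         # sequence number wraps at 16 bits
            ts & 0xFFFFFFFF,      # timestamp wraps at 32 bits
            ssrc,
        )
        packets.append(header + payload[off:off + chunk])
        seq += 1
        ts += chunk
    return packets
```

A 400-byte raw file, for instance, yields two full 160-byte packets and one 80-byte tail.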

Full parameter syntax

rtp_stream="<path>,<loop_count>,<payload_type>,<payload_name>"

Parameter	Default	Description
`path`	required	Path to the audio file
`loop_count`	`1`	Number of times to loop the file; `-1` loops indefinitely
`payload_type`	`0`	RTP payload type number
`payload_name`	`PCMU/8000`	Codec descriptor
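The defaults in the table above can be applied with a parser along these lines. It is a sketch only: `StreamSpec` and `parse_rtp_stream` are hypothetical names, and it assumes the path itself contains no commas.

```python
from dataclasses import dataclass

# Values handled by other code paths (control commands, mic, synthetic streams).
NON_FILE_KEYWORDS = {"pause", "resume", "stop", "echo", "mic", "synthetic"}


@dataclass
class StreamSpec:
    path: str
    loop_count: int = 1              # -1 loops indefinitely
    payload_type: int = 0
    payload_name: str = "PCMU/8000"

    @property
    def loops_forever(self) -> bool:
        return self.loop_count == -1


def parse_rtp_stream(value: str) -> StreamSpec:
    """Parse '<path>,<loop_count>,<payload_type>,<payload_name>' with defaults."""
    fields = [f.strip() for f in value.split(",")]
    if not fields[0]:
        raise ValueError("path is required")
    if fields[0] in NON_FILE_KEYWORDS:
        raise ValueError(f"{fields[0]!r} is a keyword, not a file path")
    spec = StreamSpec(fields[0])
    if len(fields) > 1 and fields[1]:
        spec.loop_count = int(fields[1])
    if len(fields) > 2 and fields[2]:
        spec.payload_type = int(fields[2])
    if len(fields) > 3 and fields[3]:
        spec.payload_name = fields[3]
    return spec
```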

Supported payload descriptors:

Descriptor	PT	Clock rate	Packet duration
`PCMU/8000`	0	8000 Hz	20 ms
`PCMA/8000`	8	8000 Hz	20 ms
`G722/8000`	9	8000 Hz	20 ms
`ILBC/8000`	97	8000 Hz	30 ms
`H264/90000`	96	90000 Hz	33 ms
`OPUS/48000`	111	48000 Hz	20 ms
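The clock rate and packet duration columns together fix the RTP timestamp increment per packet (clock rate x duration). The arithmetic below is derived from the table only; it is not taken from the engine. Note that G.722 uses an 8000 Hz RTP clock (RFC 3551) even though the codec samples at 16 kHz, so its step matches G.711.

```python
# (payload type, RTP clock rate in Hz, packet duration in ms), from the table above.
DESCRIPTORS = {
    "PCMU/8000": (0, 8000, 20),
    "PCMA/8000": (8, 8000, 20),
    "G722/8000": (9, 8000, 20),
    "ILBC/8000": (97, 8000, 30),
    "H264/90000": (96, 90000, 33),
    "OPUS/48000": (111, 48000, 20),
}


def timestamp_step(descriptor: str) -> int:
    """RTP timestamp increment per packet = clock rate x packet duration."""
    _, clock_hz, duration_ms = DESCRIPTORS[descriptor]
    return clock_hz * duration_ms // 1000


for name in DESCRIPTORS:
    print(f"{name:<11} {timestamp_step(name):>5} ticks/packet")
```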

Examples:

<!-- PCMU, play once -->
<exec rtp_stream="audio.raw,1,0,PCMU/8000"/>

<!-- PCMA WAV, loop 3 times -->
<exec rtp_stream="voice.wav,3,8,PCMA/8000"/>

<!-- Loop indefinitely until stop -->
<exec rtp_stream="hold_music.raw,-1,0,PCMU/8000"/>

Control commands

<!-- Pause the running stream -->
<exec rtp_stream="pause"/>

<!-- Resume a paused stream -->
<exec rtp_stream="resume"/>

<!-- Stop the stream and close the socket -->
<exec rtp_stream="stop"/>

<!-- Echo mode: reflects every received RTP packet back to the sender -->
<exec rtp_stream="echo"/>

<!-- Live microphone capture sent as PCMU/8000 RTP. Default builds capture with arecord on Linux and ffmpeg on macOS/Windows. An optional device follows the first comma; commas inside the device string are preserved. -->
<exec rtp_stream="mic"/>
<exec rtp_stream="mic,plughw:1,0"/>
<exec rtp_stream="mic,ffmpeg:-f pulse -i default"/>
<!-- Builds with -tags audio (CGO + libportaudio) capture through PortAudio; the device is a PortAudio device index or a substring of the device name. -->
<exec rtp_stream="mic,2"/>
<exec rtp_stream="mic,USB Audio"/>
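The mic device rule (split at the first comma only, keep any later commas) and the PortAudio index-or-substring lookup can be sketched as below. Both helpers are hypothetical; in particular, taking the first substring match and matching case-sensitively are assumptions.

```python
def parse_mic(value: str):
    """Split 'mic' or 'mic,<device>' at the FIRST comma only.

    Returns the device string, or None for the platform default.
    """
    keyword, sep, device = value.partition(",")
    if keyword.strip() != "mic":
        raise ValueError(f"not a mic spec: {value!r}")
    return device if sep else None


def pick_portaudio_device(device: str, names: list[str]):
    """Resolve a PortAudio device as an index or a name substring (first match)."""
    if device.isdigit():
        index = int(device)
        return index if index < len(names) else None
    for index, name in enumerate(names):
        if device in name:
            return index
    return None
```

With this rule, `mic,plughw:1,0` keeps `plughw:1,0` intact as the ALSA device name.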

exec rtp_record

Writes decoded incoming audio to a 16-bit PCM mono 8 kHz WAV file. G.711 PCMU/PCMA (PT 0 / 8) is decoded to PCM. Other payload types and RFC 2833 telephone-event frames advance the timeline as silence (one 20 ms frame for unknown codecs; RFC 2833 uses the event duration field). A small RTP sequence reorder buffer reduces gaps when packets arrive out of order.

<nop>
  <action>
    <exec rtp_record="start,./captures/call.wav"/>
  </action>
</nop>
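The G.711 decoding step mentioned above is standard. The sketch below uses the classic reference formulation of mu-law (PT 0) and A-law (PT 8) expansion to 16-bit PCM. It also follows the documented rule that unknown payload types advance the timeline by one 20 ms frame of silence (160 samples at 8 kHz). RFC 2833 event-duration handling and the reorder buffer are omitted, and `decode_payload` is a hypothetical name.

```python
def ulaw_to_linear(u: int) -> int:
    """Decode one G.711 mu-law byte (PT 0) to a 16-bit linear sample."""
    u = ~u & 0xFF
    sign, exponent, mantissa = u & 0x80, (u >> 4) & 0x07, u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def alaw_to_linear(a: int) -> int:
    """Decode one G.711 A-law byte (PT 8) to a 16-bit linear sample."""
    a ^= 0x55
    sign, exponent, mantissa = a & 0x80, (a >> 4) & 0x07, a & 0x0F
    if exponent == 0:
        sample = (mantissa << 4) + 8
    else:
        sample = ((mantissa << 4) + 0x108) << (exponent - 1)
    return sample if sign else -sample


SILENCE_FRAME = [0] * 160  # one 20 ms frame at 8 kHz


def decode_payload(payload_type: int, payload: bytes) -> list[int]:
    """PCMU/PCMA decode to PCM; any other payload type advances the timeline as silence."""
    if payload_type == 0:
        return [ulaw_to_linear(b) for b in payload]
    if payload_type == 8:
        return [alaw_to_linear(b) for b in payload]
    return list(SILENCE_FRAME)
```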

Forms:

Value	Meaning
`start,<path>`	Begin recording to `path` (mono remote leg).
`start,<path>,duplex`	Stereo WAV: L = sent (local) samples, R = received; lengths padded with silence to match.
`stop`	Stop and flush the WAV file.
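The duplex layout (L = sent, R = received, shorter leg padded with silence) maps directly onto interleaved 16-bit little-endian stereo frames. The sketch below uses Python's standard `wave` module. The helper names are hypothetical and it is not Gossipper's writer.

```python
import struct
import wave


def interleave_duplex(sent, received) -> bytes:
    """Build stereo 16-bit frames: left = sent (local), right = received.

    The shorter leg is padded with silence (0) so both channels match.
    """
    n = max(len(sent), len(received))
    left = list(sent) + [0] * (n - len(sent))
    right = list(received) + [0] * (n - len(received))
    return b"".join(struct.pack("<hh", ls, rs) for ls, rs in zip(left, right))


def write_duplex_wav(path, sent, received, rate=8000):
    """Write a 16-bit stereo WAV at 8 kHz, matching the mono recorder's format."""
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)     # 16-bit PCM
        w.setframerate(rate)
        w.writeframes(interleave_duplex(sent, received))
```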

Paths are resolved like other media files (relative to the scenario directory). Start recording after rtp_stream (or mic / play_pcap_*) has started the media session, or use the -record_wav_dir CLI flag for automatic per-call files named from the Call-ID.

exec play_pcap_audio

Replays a pre-recorded PCAP capture to the remote audio endpoint discovered from SDP. Inter-packet timing from the original capture is preserved.

<nop>
  <action>
    <exec play_pcap_audio="capture.pcap"/>
  </action>
</nop>

The engine looks for the m=audio line in the last SDP to determine the destination.
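"Inter-packet timing is preserved" can be illustrated with a minimal reader for the classic libpcap file format plus a pacing loop. This is a hedged sketch: it handles only the microsecond-resolution magic, returns whole link-layer frames (extracting the RTP payload from Ethernet/IP/UDP and retargeting it to the SDP endpoint is not shown), and `read_pcap` / `replay` are hypothetical names.

```python
import struct
import time


def read_pcap(data: bytes):
    """Return [(timestamp_us, frame_bytes), ...] from a classic libpcap file.

    Only the microsecond-resolution magic is handled; pcapng and the
    nanosecond variant are out of scope for this sketch.
    """
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"       # file written on a little-endian host
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a microsecond classic pcap file")
    records, off = [], 24  # skip the 24-byte global header
    while off + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", data[off:off + 16])
        off += 16
        records.append((ts_sec * 1_000_000 + ts_usec, data[off:off + incl_len]))
        off += incl_len
    return records


def replay(records, send):
    """Call send(frame) for each record, preserving the original inter-packet gaps."""
    if not records:
        return
    t0, start = records[0][0], time.monotonic()
    for ts_us, frame in records:
        # Schedule against the start time so sleep overshoot does not accumulate.
        delay = (ts_us - t0) / 1e6 - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        send(frame)
```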

exec play_pcap_video

Replays a PCAP capture to the video endpoint discovered from m=video in SDP.

<nop>
  <action>
    <exec play_pcap_video="video.pcap"/>
  </action>
</nop>

This is a pragmatic implementation: the raw RTP payload from the PCAP is forwarded as-is without any codec-specific processing.

exec play_pcap_image

Same as play_pcap_video but targets the m=image SDP line.

<nop>
  <action>
    <exec play_pcap_image="fax.pcap"/>
  </action>
</nop>

exec rtpcheck

Blocks scenario execution until a minimum number of RTP packets have been observed, or a timeout expires.

<nop>
  <action>
    <exec rtpcheck="min_packets=10 timeout_ms=2000 direction=any"/>
  </action>
</nop>

Parameters

Parameter	Default	Description
`min_packets`	`1`	Minimum RTP packet count required
`timeout_ms`	`1000`	Timeout in milliseconds
`direction`	`any`	`any`, `send`, `recv`, or `both`

direction=both requires at least min_packets in each direction.

Short form (just a packet count, 1-second timeout, any direction):

<exec rtpcheck="10"/>
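The rtpcheck options, short form, and blocking behavior can be sketched as a parser plus a polling wait. The helper names are hypothetical. Treating direction=any as the combined sent + received count is an assumption, since the text does not say whether "any" sums both directions or accepts either one.

```python
import time

DEFAULTS = {"min_packets": 1, "timeout_ms": 1000, "direction": "any"}
DIRECTIONS = {"any", "send", "recv", "both"}


def parse_rtpcheck(value: str) -> dict:
    """Parse 'min_packets=10 timeout_ms=2000 direction=any' or the short form '10'."""
    opts = dict(DEFAULTS)
    value = value.strip()
    if value.isdigit():  # short form: packet count only
        opts["min_packets"] = int(value)
        return opts
    for token in value.split():
        key, _, val = token.partition("=")
        if key not in opts:
            raise ValueError(f"unknown rtpcheck option {key!r}")
        opts[key] = val if key == "direction" else int(val)
    if opts["direction"] not in DIRECTIONS:
        raise ValueError(f"bad direction {opts['direction']!r}")
    return opts


def satisfied(opts: dict, sent: int, received: int) -> bool:
    n, d = opts["min_packets"], opts["direction"]
    if d == "send":
        return sent >= n
    if d == "recv":
        return received >= n
    if d == "both":
        return sent >= n and received >= n  # min_packets in each direction
    return sent + received >= n              # "any": combined count (assumption)


def wait_for_rtp(opts: dict, counters, poll_s=0.01) -> bool:
    """Poll counters() -> (sent, received) until satisfied or timeout_ms expires."""
    deadline = time.monotonic() + opts["timeout_ms"] / 1000
    while True:
        if satisfied(opts, *counters()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```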

exec send_dtmf

Sends DTMF tones as RFC 2833 telephone-event RTP packets to the audio endpoint discovered from SDP.

<nop>
  <action>
    <exec send_dtmf="1234#"/>
  </action>
</nop>

Supported digits: 0–9, *, #, A–D.
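For reference, the telephone-event payload is 4 bytes: event code, end bit plus volume, and a 16-bit duration in RTP clock ticks (RFC 2833, updated by RFC 4733). The sketch below builds the payloads for one digit. The 100 ms tone, 20 ms update interval, volume 10, and triple end packet are assumptions taken from common practice, not Gossipper's documented values. The telephone-event payload type itself is negotiated in SDP and is not shown.

```python
import struct

# RFC 4733 telephone-event codes for DTMF.
EVENT_CODES = {**{str(d): d for d in range(10)}, "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}


def dtmf_payload(digit: str, duration: int, end: bool = False, volume: int = 10) -> bytes:
    """4-byte telephone-event payload: event, E|R|volume, 16-bit duration in RTP ticks."""
    event = EVENT_CODES[digit.upper()]
    flags = (0x80 if end else 0x00) | (volume & 0x3F)  # R bit stays 0
    return struct.pack("!BBH", event, flags, duration & 0xFFFF)


def dtmf_sequence(digit: str, tone_ms=100, step_ms=20, clock_hz=8000):
    """Return [(marker, payload), ...] for one digit.

    All packets of one event share an RTP timestamp; duration grows each step,
    and the end packet is sent three times, as RFC 4733 recommends.
    """
    step = clock_hz * step_ms // 1000
    total = clock_hz * tone_ms // 1000
    packets = []
    duration = step
    while duration < total:
        packets.append((not packets, dtmf_payload(digit, duration)))  # marker on first
        duration += step
    for _ in range(3):
        packets.append((not packets, dtmf_payload(digit, total, end=True)))
    return packets
```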

RTCP

RTCP sender reports are emitted automatically by the media session every 500 ms while a stream is running. Incoming RTCP packets are parsed and counted. RTCP statistics are aggregated into the engine summary and JSON export.
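For orientation, an RTCP sender report without reception blocks is 28 bytes (RFC 3550, section 6.4.1). The sketch below builds one. `sender_report` is a hypothetical helper. A real session would normally bundle it with an SDES/CNAME item in a compound packet, which is not shown.

```python
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def ntp_timestamp(unix_time: float):
    """Split a Unix time into 32-bit NTP seconds and 32-bit fraction."""
    secs = int(unix_time)
    frac = int((unix_time - secs) * (1 << 32)) & 0xFFFFFFFF
    return secs + NTP_EPOCH_OFFSET, frac


def sender_report(ssrc, rtp_ts, packet_count, octet_count, unix_time=None) -> bytes:
    """RTCP SR with no report blocks: V=2, RC=0, PT=200, length=6 (28 bytes / 4 - 1)."""
    if unix_time is None:
        unix_time = time.time()
    ntp_sec, ntp_frac = ntp_timestamp(unix_time)
    return struct.pack(
        "!BBHIIIIII",
        0x80, 200, 6,
        ssrc,
        ntp_sec & 0xFFFFFFFF, ntp_frac,
        rtp_ts & 0xFFFFFFFF,
        packet_count & 0xFFFFFFFF,
        octet_count & 0xFFFFFFFF,
    )
```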

Complete example: UAC audio call

<?xml version="1.0" encoding="UTF-8" ?>
<scenario name="audio-uac">

  <send retrans="500">
    <![CDATA[
INVITE sip:[service]@[remote_ip]:[remote_port] SIP/2.0
Via: SIP/2.0/[transport] [local_ip]:[local_port];branch=[branch]
From: sipp <sip:sipp@[local_ip]:[local_port]>;tag=[call_number]
To: sut <sip:[service]@[remote_ip]:[remote_port]>
Call-ID: [call_id]
CSeq: [cseq] INVITE
Contact: sip:sipp@[local_ip]:[local_port]
Max-Forwards: 70
Content-Type: application/sdp
Content-Length: [len]

v=0
o=user1 53655765 2353687637 IN IP[local_ip_type] [local_ip]
s=-
c=IN IP[local_ip_type] [local_ip]
t=0 0
m=audio [media_port] RTP/AVP 0
a=rtpmap:0 PCMU/8000
    ]]>
  </send>

  <recv response="100" optional="true"/>

  <recv response="200" rtd="true">
    <action>
      <!-- Start RTP stream to the endpoint found in the 200 OK SDP -->
      <exec rtp_stream="audio.raw,1,0,PCMU/8000"/>
    </action>
  </recv>

  <send>
    <![CDATA[
ACK sip:[service]@[remote_ip]:[remote_port] SIP/2.0
Via: SIP/2.0/[transport] [local_ip]:[local_port];branch=[branch]
From: sipp <sip:sipp@[local_ip]:[local_port]>;tag=[call_number]
To: sut <sip:[service]@[remote_ip]:[remote_port]>[peer_tag_param]
Call-ID: [call_id]
CSeq: [cseq] ACK
Max-Forwards: 70
Content-Length: 0
    ]]>
  </send>

  <!-- Wait for the stream to finish, or hold for a fixed duration -->
  <pause milliseconds="4000"/>

  <!-- Stop media before hanging up -->
  <nop>
    <action>
      <exec rtp_stream="stop"/>
    </action>
  </nop>

  <send retrans="500">
    <![CDATA[
BYE sip:[service]@[remote_ip]:[remote_port] SIP/2.0
Via: SIP/2.0/[transport] [local_ip]:[local_port];branch=[branch]
From: sipp <sip:sipp@[local_ip]:[local_port]>;tag=[call_number]
To: sut <sip:[service]@[remote_ip]:[remote_port]>[peer_tag_param]
Call-ID: [call_id]
CSeq: [cseq] BYE
Max-Forwards: 70
Content-Length: 0
    ]]>
  </send>

  <recv response="200"/>

</scenario>

Complete example: UAS with echo

<?xml version="1.0" encoding="UTF-8" ?>
<scenario name="audio-uas-echo">

  <recv request="INVITE"/>

  <send>
    <![CDATA[
SIP/2.0 100 Trying
[last_Via:]
[last_From:]
[last_To:]
[last_Call-ID:]
[last_CSeq:]
Content-Length: 0
    ]]>
  </send>

  <send>
    <![CDATA[
SIP/2.0 200 OK
[last_Via:]
[last_From:]
[last_To:];tag=[call_number]
[last_Call-ID:]
[last_CSeq:]
Contact: <sip:[local_ip]:[local_port]>
Content-Type: application/sdp
Content-Length: [len]

v=0
o=Gossipper 0 0 IN IP[local_ip_type] [local_ip]
s=-
c=IN IP[local_ip_type] [local_ip]
t=0 0
m=audio [media_port] RTP/AVP 0
a=rtpmap:0 PCMU/8000
    ]]>
  </send>

  <recv request="ACK">
    <action>
      <!-- Reflect incoming RTP back to the caller -->
      <exec rtp_stream="echo"/>
    </action>
  </recv>

  <recv request="BYE">
    <action>
      <exec rtp_stream="stop"/>
    </action>
  </recv>

  <send>
    <![CDATA[
SIP/2.0 200 OK
[last_Via:]
[last_From:]
[last_To:]
[last_Call-ID:]
[last_CSeq:]
Content-Length: 0
    ]]>
  </send>

</scenario>

Known limits

SRTP / DTLS-SRTP: supported for rtp_stream, mic, and PCAP replay when -media_srtp matches the peer SDP, but this is not a full browser WebRTC stack (no ICE nomination, no TCP TURN, and a limited set of SRTP suites; see srtp.md and media-roadmap.md).
rtpcheck performs pragmatic activity counting only; full SIPp rtpcheck parity with jitter and loss metrics is deferred.
No dedicated video/image codec pipeline; PCAP replay forwards raw RTP payloads.
HEP mirroring of RTP/RTCP is deferred; only SIP signaling is currently forwarded to Homer.