Homer Server Storage Layout¶
Documentation for on-disk data storage structure.
Overview¶
Homer Server uses DuckLake — a lakehouse format that combines: - Parquet files for data storage - SQLite catalog for metadata (snapshots, schemas, file statistics) - Hive-style partitioning for efficient time-range queries
Directory Structure¶
/data/homer/
├── homer_catalog.sqlite # DuckLake catalog (metadata)
└── parquet/ # Data directory
└── main/ # Schema "main"
├── hep_proto_1_call/ # SIP calls (INVITE, BYE, etc.)
│ ├── date=2025-01-26/
│ │ ├── data_00001.parquet
│ │ └── data_00002.parquet
│ └── date=2025-01-27/
│ ├── data_00003.parquet
│ └── data_00004.parquet
│
├── hep_proto_1_registration/ # SIP registrations (REGISTER)
│ └── date=2025-01-27/
│ └── data_00001.parquet
│
├── hep_proto_1_default/ # Other SIP (OPTIONS, NOTIFY, etc.)
├── hep_proto_5_default/ # RTCP JSON
├── hep_proto_34_default/ # RTCP binary
├── hep_proto_35_default/ # RTP
├── hep_proto_53_default/ # DNS
└── hep_proto_100_default/ # Logs
Catalog (homer_catalog.sqlite)¶
SQLite database containing:
| Table | Description |
|---|---|
ducklake_table |
List of tables |
ducklake_column |
Column schemas |
ducklake_snapshot |
Snapshot history |
ducklake_data_file |
List of Parquet files |
ducklake_file_column_stats |
Min/max column statistics |
ducklake_partition_column |
Partitioning settings |
ducklake_partition_info |
Partition values in files |
Catalog Contents Example¶
-- Tables
SELECT table_name FROM ducklake_table;
-- hep_proto_1_call
-- hep_proto_1_registration
-- hep_proto_5_default
-- ...
-- Files
SELECT file_path, row_count FROM ducklake_data_file WHERE table_id = 1;
-- main/hep_proto_1_call/date=2025-01-27/hour=14/data_00001.parquet, 50000
-- Partitioning
SELECT column_name, partition_type FROM ducklake_partition_column WHERE table_id = 1;
-- date, identity
Tables by Protocol Type¶
SIP (proto_type=1)¶
SIP messages are split into three tables by transaction type:
| Table | Methods | Description |
|---|---|---|
hep_proto_1_call |
INVITE, ACK, PRACK, UPDATE, BYE, CANCEL, INFO | Call messages |
hep_proto_1_registration |
REGISTER | Registrations |
hep_proto_1_default |
OPTIONS, NOTIFY, SUBSCRIBE, PUBLISH, MESSAGE, REFER | Other |
hep_proto_1_siprec |
SIPREC SRS signaling (in-process receiver) | Recording metadata + SIP |
Important: SIP responses (200 OK, 180 Ringing) are routed by the method from CSeq header.
Other Protocols¶
| Proto Type | Table | Description |
|---|---|---|
| 5 | hep_proto_5_default |
RTCP JSON (quality statistics) |
| 34 | hep_proto_34_default |
RTCP binary |
| 35 | hep_proto_35_default |
RTP packets |
| 53 | hep_proto_53_default |
DNS queries |
| 100 | hep_proto_100_default |
Logs |
Dedicated ingest tables¶
| Table | Source | Description |
|---|---|---|
vqrtcpxr_stats |
VQRTCP SIP receiver (ingest.vqrtcp) |
VQ-RTCPXR QoS reports (Call-ID correlated) |
otlp_traces / otlp_metrics / otlp_logs |
OTLP receiver | OpenTelemetry signals |
Partitioning¶
Partitioning Scheme¶
ALTER TABLE homer_lake.hep_proto_1_call
SET PARTITIONED BY (date);
Creates structure:
date=2025-01-27/data.parquet
Benefits¶
- Partition pruning — queries with date filter scan only required directories
- Minimal nesting — 1 level only (date)
- Readable paths — date is immediately visible in directory name
- Simple retention — easy to delete old data by removing date directories
Query Optimization Example¶
-- DuckDB will automatically skip all partitions except date=2025-01-27
SELECT * FROM homer_lake.hep_proto_1_call
WHERE date = '2025-01-27'
AND timestamp >= '2025-01-27 10:00:00'
AND timestamp < '2025-01-27 12:00:00';
Column Schemas¶
SIP Call (hep_proto_1_call)¶
uuid VARCHAR, -- Unique record ID
date DATE, -- Date (for partitioning)
timestamp TIMESTAMP, -- Packet timestamp
session_id VARCHAR, -- Call-ID
caller VARCHAR, -- From user
callee VARCHAR, -- To user
src_ip VARCHAR, -- Source IP
dst_ip VARCHAR, -- Destination IP
src_port UINTEGER, -- Source port
dst_port UINTEGER, -- Destination port
method VARCHAR, -- SIP method (INVITE, BYE, etc.)
response_code VARCHAR, -- Response code (200, 404, etc.)
cseq_method VARCHAR, -- Method from CSeq header
protocol UINTEGER, -- Transport (17=UDP, 6=TCP)
node_id VARCHAR, -- HEP node ID
cid VARCHAR, -- Correlation ID
payload VARCHAR, -- Raw SIP message
data_extra JSON -- Additional headers (via, user_agent, etc.)
SIP Registration (hep_proto_1_registration)¶
uuid VARCHAR,
date DATE,
timestamp TIMESTAMP,
session_id VARCHAR, -- Call-ID
aor VARCHAR, -- Address of Record (user@domain)
contact VARCHAR, -- Contact header
expires VARCHAR, -- Expires value
user_agent VARCHAR, -- User-Agent header
src_ip VARCHAR,
dst_ip VARCHAR,
src_port UINTEGER,
dst_port UINTEGER,
method VARCHAR,
response_code VARCHAR,
protocol UINTEGER,
node_id VARCHAR,
payload VARCHAR,
data_extra JSON
Generic (RTCP, RTP, DNS, LOG)¶
uuid VARCHAR,
date DATE,
timestamp TIMESTAMP,
session_id VARCHAR, -- Correlation ID
src_ip VARCHAR,
dst_ip VARCHAR,
src_port UINTEGER,
dst_port UINTEGER,
protocol UINTEGER,
node_id VARCHAR,
cid VARCHAR,
payload VARCHAR, -- Raw payload / JSON
data_extra JSON -- Metadata
Parquet Files¶
Format¶
- Compression: Snappy (DuckDB default)
- Row groups: ~100K rows
- Column statistics: Min/max for each column
Sizes¶
Typical file sizes: - SIP message: ~2-5 KB (with payload) - RTCP JSON: ~500 bytes - RTP: ~100-200 bytes (without payload)
With batch size 1000 and flush every 10 seconds: - ~10-50 MB per file for SIP - ~5-10 MB per file for RTCP
Throughput and Compaction Tuning (Single Node)¶
Recommended baseline for ~10k packets/sec on local disk:
- writer.worker_count: 8 (or match CPU cores if higher)
- writer.queue_size: 80000-200000 (depends on RAM, smooths bursts)
- ducklake.batch_size: 50000 (target 128-256 MB Parquet files)
- ducklake.flush_interval_sec: 15-30 (controls latency and file size)
- ducklake.compaction.check_interval_sec: 900-1800
- ducklake.compaction.snapshot_expire_interval_sec: 3600-7200
- ducklake.compaction.retention_days: set to policy (e.g. 7/30) — see Data retention
Goal: fewer small files, predictable compaction cycles, stable write latency.
Adjust batch_size and flush_interval_sec together to hit desired file size.
If needed, tune these values while monitoring files/hour and target 128-256 MB per file.
DuckLake Operations¶
Reading Data¶
-- Via DuckDB CLI
duckdb -c "
INSTALL ducklake; LOAD ducklake;
ATTACH 'ducklake:sqlite:/data/homer/homer_catalog.sqlite' AS homer_lake
(DATA_PATH '/data/homer/parquet');
SELECT * FROM homer_lake.hep_proto_1_call
WHERE date = '2025-01-27'
LIMIT 10;
"
Maintenance¶
-- Expire old snapshots
CALL ducklake_expire_snapshots('homer_lake',
older_than => CAST(NOW() - INTERVAL '1 hour' AS TIMESTAMPTZ));
-- Merge small files
CALL ducklake_merge_adjacent_files('homer_lake');
-- Delete files from expired snapshots
CALL ducklake_cleanup_old_files('homer_lake');
-- Delete orphan files
CALL ducklake_delete_orphaned_files('homer_lake');
Time Travel¶
-- Query specific snapshot
SELECT * FROM homer_lake.hep_proto_1_call AT SNAPSHOT 5;
-- Query at specific time
SELECT * FROM homer_lake.hep_proto_1_call AT TIMESTAMP '2025-01-27 10:00:00';
-- List snapshots
SELECT * FROM ducklake_snapshots('homer_lake.hep_proto_1_call');
Configuration¶
Writer config (homer-core-writer.json)¶
{
"ducklake": {
"catalog_path": "/data/homer/homer_catalog.sqlite",
"data_path": "/data/homer/parquet",
"lake_name": "homer_lake",
"batch_size": 1000,
"flush_interval": "10s"
}
}
Disk Recommendations¶
| Component | Recommendation |
|---|---|
| Catalog (SQLite) | SSD, low latency |
| Parquet data | HDD/SSD, capacity over speed |
| Temporary files | SSD for merge operations |
Storage Estimation¶
At 1000 SIP msg/sec: - ~86M messages/day - ~200-500 GB/day (with payload) - ~20-50 GB/day (without payload, metadata only)
Modular Architecture¶
┌─────────────────────────────────────────────────────────────────────┐
│ Homer Server │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Writer │ │ Node │ │ Coordinator │ │
│ │ │ │ │ │ │ │
│ │ HEP receivers│ │ FlightSQL │ │ REST API + UI │ │
│ │ UDP/TCP/TLS │ │ Server │ │ FlightSQL Client │ │
│ │ │ │ │ │ │ │
│ │ DuckLake │ │ DuckLake │ │ Aggregates queries │ │
│ │ Writer │ │ Read-only │ │ from multiple Nodes │ │
│ └──────┬───────┘ └──────┬───────┘ └──────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ DuckLake Storage │ │
│ │ │ │
│ │ SQLite Catalog ◄──────────────────► Parquet Files │ │
│ │ (metadata) (data) │ │
│ │ │ │
│ │ /data/homer/homer_catalog.sqlite /data/homer/parquet/ │ │
│ └──────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
See Also¶
- DuckLake README — DuckLake integration details
- STORAGE_ARCHITECTURE.md — General architecture
- DuckLake Documentation — Official documentation