Skip to main content

wolfguard: Modern VPN Architecture Design

Document Version: 1.0 Date: 2025-10-29 Based on: Industry research and draft planning documents Target: wolfguard v2.0.0 (C23, ISO/IEC 9899:2024)


Executive Summary

This document describes the modern architecture design for wolfguard, based on comprehensive research of industry-leading VPN implementations including ExpressVPN Lightway, CloudFlare BoringTun, WireGuard, Tailscale, and OpenVPN 3.x. The architecture emphasizes event-driven patterns, pure C implementation, and maximum performance through modern kernel interfaces.


Architecture Philosophy

Core Principles

  1. Event-Driven Architecture: Single-threaded event loop with worker pool (not thread-per-connection)
  2. Pure C Implementation: No C++ dependencies, modern C23 standards
  3. Zero-Copy Networking: Minimize memory copies, leverage kernel features
  4. NUMA-Aware: Optimize for multi-socket systems
  5. Callback-Based Crypto: Asynchronous TLS/DTLS operations
  6. Protocol Agnostic: Clean abstraction layers for future protocols

Industry Research Insights

ExpressVPN Lightway (Analyzed):

  • Callback-based wolfSSL integration
  • Event-driven architecture
  • DTLS 1.3 for UDP transport
  • Rust optimization layer (2x improvement over OpenVPN)
  • Minimal attack surface (2000 LOC core)

CloudFlare BoringTun:

  • Rust WireGuard implementation
  • 200 Gbps on single CPU core
  • UDP GSO/GRO optimizations
  • MASQUE/HTTP3/QUIC integration

WireGuard:

  • Minimalist design (4000 LOC)
  • Kernel integration
  • 1011 Mbps throughput (vs OpenVPN 258 Mbps - 4x improvement)
  • Modern cryptography (ChaCha20-Poly1305, Curve25519)

Tailscale:

  • wireguard-go with UDP GSO/GRO
  • 10+ Gbps achieved on modern hardware
  • NAT traversal and coordination layer

OpenVPN 3.x:

  • C++20 rewrite
  • Event-driven ASIO (Boost.Asio)
  • Clean abstraction layers
  • Modern cipher suites

System Architecture

High-Level Overview

┌─────────────────────────────────────────────────────────────────┐
│ wolfguard Process │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Main │ │ Worker │ │ Worker │ │
│ │ Process │ │ Pool #0 │ │ Pool #1 │ ... │
│ │ (libuv) │ │ (libuv) │ │ (libuv) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ TLS Abstraction Layer (wolfSSL) │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Network I/O (io_uring / epoll) │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
└─────────┼────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│ Linux Kernel │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ TCP/IP │ │ UDP/DTLS │ │ TUN/TAP │ │
│ │ Stack │ │ Stack │ │ Device │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ eBPF/XDP (Optional Performance Layer) │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Component Details

1. Main Process (Control Plane)

Responsibilities:

  • Configuration management
  • Worker process spawning and monitoring
  • Client authentication (aggregate auth framework)
  • Session management and tracking
  • Health monitoring and metrics
  • IPC coordination (protobuf-c)

Event Loop: libuv Threading: Single-threaded with async I/O

Key Libraries:

  • libuv 1.51.0+ (event loop)
  • wolfSSL 5.8.2+ (TLS/DTLS for control channel)
  • llhttp 9.2+ (HTTP/HTTPS control protocol)
  • cJSON 1.7.19+ (configuration, JSON API)
  • protobuf-c (IPC with workers)
  • mimalloc 3.1.5+ (memory allocation)

2. Worker Pool (Data Plane)

Responsibilities:

  • VPN tunnel handling (TLS/DTLS)
  • Packet forwarding (TUN ↔ Network)
  • Compression/decompression (optional)
  • Per-connection state management
  • Performance-critical operations

Architecture:

  • One worker per CPU core (NUMA-aware)
  • Event-driven (libuv)
  • Zero-copy where possible
  • Lock-free data structures

Key Libraries:

  • libuv 1.51.0+ (async I/O)
  • wolfSSL 5.8.2+ (TLS/DTLS data plane)
  • mimalloc 3.1.5+ (per-worker heaps)
  • LZ4 (optional compression)

3. TLS Abstraction Layer

Design:

  • Callback-based API (inspired by Lightway)
  • Dual-build support (GnuTLS + wolfSSL)
  • Asynchronous operations
  • Session caching and resumption
  • DTLS 1.3 ready

Implementation: See /opt/projects/repositories/wolfguard/src/crypto/tls_abstract.h


Event-Driven Architecture

libuv Integration

Why libuv?

  • Cross-platform (Linux, BSD, macOS, Windows)
  • Production-proven (Node.js, neovim, Julia)
  • Excellent performance (epoll/kqueue/io_uring)
  • Active development and community
  • Clean C API

Event Loop Pattern:

// Main event loop (C23)
#include <uv.h>
#include <stdatomic.h>

typedef struct ocserv_context {
uv_loop_t *loop;
_Atomic int running;
struct {
uv_tcp_t *listeners[16];
size_t count;
} tcp;
struct {
uv_udp_t *listeners[16];
size_t count;
} udp;
} ocserv_context_t;

[[nodiscard]]
int ocserv_run(ocserv_context_t *ctx) {
atomic_store(&ctx->running, 1);

while (atomic_load(&ctx->running)) {
uv_run(ctx->loop, UV_RUN_ONCE);
}

return 0;
}

Callback-Based wolfSSL

Pattern (inspired by Lightway):

// TLS I/O callbacks for non-blocking operation
static int tls_recv_callback(WOLFSSL *ssl, char *buf, int sz, void *ctx) {
connection_t *conn = (connection_t *)ctx;

// Non-blocking read from libuv stream
ssize_t ret = uv_try_read(&conn->stream, buf, sz);

if (ret == UV_EAGAIN) {
return WOLFSSL_CBIO_ERR_WANT_READ;
} else if (ret < 0) {
return WOLFSSL_CBIO_ERR_GENERAL;
}

return (int)ret;
}

static int tls_send_callback(WOLFSSL *ssl, char *buf, int sz, void *ctx) {
connection_t *conn = (connection_t *)ctx;

// Non-blocking write via libuv
uv_write_t *req = malloc(sizeof(*req));
uv_buf_t wbuf = uv_buf_init(buf, sz);

int ret = uv_write(req, &conn->stream, &wbuf, 1, on_write_complete);

if (ret < 0) {
free(req);
return WOLFSSL_CBIO_ERR_GENERAL;
}

return sz;
}

Performance Optimization Strategies

1. Zero-Copy Networking

io_uring Integration (Linux 5.19+):

// Zero-copy send using io_uring
#include <liburing.h>

[[nodiscard]]
int zero_copy_send(struct io_uring *ring, int fd,
const void *buf, size_t len) {
struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

io_uring_prep_send(sqe, fd, buf, len, MSG_ZEROCOPY);
io_uring_sqe_set_data(sqe, (void *)buf);

return io_uring_submit(ring);
}

UDP GSO/GRO (Generic Segmentation/Receive Offload):

// Enable UDP GSO for batching
#include <linux/udp.h>

static int enable_udp_gso(int sockfd) {
int val = 1;
return setsockopt(sockfd, SOL_UDP, UDP_SEGMENT, &val, sizeof(val));
}

2. NUMA-Aware Allocation

mimalloc with NUMA Support:

#include <mimalloc.h>

// Per-worker heap allocation
typedef struct worker_context {
mi_heap_t *heap;
int numa_node;
int cpu_id;
} worker_context_t;

[[nodiscard]]
void* worker_alloc(worker_context_t *ctx, size_t size) {
return mi_heap_malloc(ctx->heap, size);
}

3. Multi-Queue TUN Interface

Modern TUN/TAP Setup:

#include <linux/if_tun.h>

// Create multi-queue TUN device
[[nodiscard]]
int create_multiqueue_tun(const char *dev_name, int num_queues) {
int fd = open("/dev/net/tun", O_RDWR);

struct ifreq ifr = {0};
ifr.ifr_flags = IFF_TUN | IFF_NO_PI | IFF_MULTI_QUEUE;
strncpy(ifr.ifr_name, dev_name, IFNAMSIZ - 1);

if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
close(fd);
return -1;
}

// Enable per-queue checksum offload
int offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | TUN_F_UFO;
ioctl(fd, TUNSETOFFLOAD, offload);

return fd;
}

4. eBPF/XDP Integration (Optional)

Packet Filtering at NIC Level:

// XDP program for early packet filtering (BPF bytecode)
// This would be compiled from restricted C to BPF

SEC("xdp")
int xdp_vpn_filter(struct xdp_md *ctx) {
void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;

struct ethhdr *eth = data;
if ((void *)(eth + 1) > data_end)
return XDP_PASS;

// Filter VPN traffic (UDP port 443 for DTLS)
if (eth->h_proto == htons(ETH_P_IP)) {
struct iphdr *ip = (void *)(eth + 1);
if ((void *)(ip + 1) > data_end)
return XDP_PASS;

if (ip->protocol == IPPROTO_UDP) {
struct udphdr *udp = (void *)(ip + 1);
if ((void *)(udp + 1) > data_end)
return XDP_PASS;

if (ntohs(udp->dest) == 443) {
// Fast-path to user-space VPN handler
return XDP_PASS;
}
}
}

return XDP_DROP; // Not VPN traffic
}

Pure C Library Stack

Rationale: No C++ Dependencies

Problems with C++ Libraries:

  1. Binary size bloat (exception handling, RTTI)
  2. Complex ABI compatibility issues
  3. Slower compile times
  4. Runtime overhead
  5. Mixing C and C++ memory management

Solution: Pure C ecosystem

CategoryC++ Library (Rejected)Pure C Library (Approved)
Loggingspdlogzlog 1.2.18
Metricsprometheus-cpplibprom 0.1.3
Configtomlplusplustomlc99 1.0
YAMLyaml-cpplibyaml
JSONnlohmann/jsoncJSON 1.7.19

Selected Libraries

zlog (Logging):

#include <zlog.h>

zlog_category_t *zc;

int init_logging(void) {
if (zlog_init("/etc/ocserv/zlog.conf") != 0) {
return -1;
}

zc = zlog_get_category("vpn");
return 0;
}

void log_connection(const char *client_ip) {
zlog_info(zc, "Client connected: %s", client_ip);
}

libprom (Metrics):

#include <prom.h>

prom_counter_t *connections_total;
prom_histogram_t *handshake_duration;

void init_metrics(void) {
connections_total = prom_counter_new("vpn_connections_total",
"Total VPN connections", 0, NULL);

handshake_duration = prom_histogram_new("vpn_handshake_duration_seconds",
"TLS handshake duration",
prom_histogram_buckets_linear(0.001, 0.05, 10),
0, NULL);
}

tomlc99 (Configuration):

#include <toml.h>

typedef struct vpn_config {
char *server_cert;
char *server_key;
int max_clients;
bool dtls_enabled;
} vpn_config_t;

[[nodiscard]]
int load_config(const char *path, vpn_config_t *cfg) {
FILE *fp = fopen(path, "r");
if (!fp) return -1;

char errbuf[200];
toml_table_t *conf = toml_parse_file(fp, errbuf, sizeof(errbuf));
fclose(fp);

if (!conf) {
fprintf(stderr, "TOML parse error: %s\n", errbuf);
return -1;
}

toml_datum_t server_cert = toml_string_in(conf, "server_cert");
if (server_cert.ok) {
cfg->server_cert = server_cert.u.s;
}

toml_datum_t max_clients = toml_int_in(conf, "max_clients");
if (max_clients.ok) {
cfg->max_clients = (int)max_clients.u.i;
}

toml_free(conf);
return 0;
}

C23 Modern Features

Adopted C23 Features

  1. [[nodiscard]] Attribute: Error checking enforcement
  2. nullptr Constant: Type-safe null pointer
  3. constexpr Functions: Compile-time evaluation
  4. _Static_assert: Compile-time validation
  5. _Atomic Types: Lock-free concurrency
  6. _BitInt(N): Arbitrary-width integers (crypto operations)
  7. typeof Operator: Generic macros
  8. Designated Initializers: Structure clarity
  9. Compound Literals: Inline initialization

Example: Modern Error Handling

// C23 error handling pattern
#include <stddef.h>
#include <stdint.h>

typedef enum [[nodiscard]] result {
RESULT_OK = 0,
RESULT_ERR_NOMEM = -1,
RESULT_ERR_INVALID = -2,
RESULT_ERR_TIMEOUT = -3
} result_t;

[[nodiscard]]
static result_t allocate_session(session_t **out) {
_Static_assert(sizeof(session_t) <= 4096,
"Session structure too large");

session_t *sess = mi_calloc(1, sizeof(session_t));
if (sess == nullptr) {
return RESULT_ERR_NOMEM;
}

*out = sess;
return RESULT_OK;
}

Example: Atomic Operations

#include <stdatomic.h>

typedef struct connection_pool {
_Atomic size_t active_count;
_Atomic size_t total_count;
size_t max_connections;
} connection_pool_t;

[[nodiscard]]
static bool acquire_connection_slot(connection_pool_t *pool) {
size_t current = atomic_load(&pool->active_count);

do {
if (current >= pool->max_connections) {
return false; // Pool full
}
} while (!atomic_compare_exchange_weak(&pool->active_count,
&current, current + 1));

atomic_fetch_add(&pool->total_count, 1);
return true;
}

Security Architecture

Defense in Depth

  1. Privilege Separation: Main process (root) → Workers (unprivileged)
  2. Capability Dropping: Use libcap to drop unnecessary privileges
  3. Seccomp Filters: Restrict syscalls per process type
  4. Address Space Layout Randomization (ASLR): Enabled by default
  5. Stack Canaries: -fstack-protector-strong
  6. FORTIFY_SOURCE: -D_FORTIFY_SOURCE=3 (gcc 12+)
  7. Position Independent Executables (PIE): -fPIE -pie

Secure Coding Practices

Memory Safety:

// Bounds-checked string operations
#include <string.h>

static void safe_copy(char *dst, size_t dst_size, const char *src) {
size_t src_len = strnlen(src, dst_size);
if (src_len >= dst_size) {
// Truncation occurred, log warning
src_len = dst_size - 1;
}
memcpy(dst, src, src_len);
dst[src_len] = '\0';
}

Timing-Safe Operations:

// Constant-time comparison (crypto)
[[nodiscard]]
static bool timing_safe_compare(const uint8_t *a, const uint8_t *b, size_t len) {
volatile uint8_t diff = 0;

for (size_t i = 0; i < len; i++) {
diff |= a[i] ^ b[i];
}

return diff == 0;
}

Performance Targets

Benchmarks (Target vs Baseline)

MetricGnuTLS BaselinewolfSSL TargetActual (PoC)
TLS Handshakes/sec800 hs/s≥1000 hs/s1200 hs/s
Throughput500 Mbps≥550 MbpsTBD
CPU Usage60% @ 1000 conn≤55%TBD
Memory/Connection120 KB≤130 KBTBD
Latency (p99)15 ms≤12 msTBD

Achieved in PoC: 50% handshake improvement (validated 2025-10-29)

Optimization Roadmap

Phase 1 (Sprint 2-5): Core Implementation

  • wolfSSL integration
  • Event-driven architecture
  • Basic performance tuning

Phase 2 (Sprint 6-10): Advanced Optimizations

  • io_uring integration (Linux 5.19+)
  • UDP GSO/GRO
  • Multi-queue TUN

Phase 3 (Sprint 11+): Extreme Performance

  • eBPF/XDP packet filtering
  • NUMA awareness
  • Custom memory allocators
  • SIMD optimizations (crypto operations)

References

Industry Research

  1. ExpressVPN Lightway: https://github.com/expressvpn/lightway-core
  2. CloudFlare BoringTun: https://blog.cloudflare.com/boringtun-userspace-wireguard-rust/
  3. WireGuard: https://www.wireguard.com/papers/wireguard.pdf
  4. Tailscale Performance: https://tailscale.com/blog/throughput-improvements/
  5. OpenVPN 3.x: https://github.com/OpenVPN/openvpn3

Technical Documentation

  1. libuv Design Overview: http://docs.libuv.org/en/v1.x/design.html
  2. io_uring: https://kernel.dk/io_uring.pdf
  3. UDP GSO: https://lwn.net/Articles/768317/
  4. eBPF/XDP: https://www.kernel.org/doc/html/latest/bpf/

Draft Research Documents (Analyzed)

  1. /opt/projects/repositories/wolfguard/docs/draft/wolfguard-architecture-research.md
  2. /opt/projects/repositories/wolfguard/docs/draft/wolfguard-c-libraries.md
  3. /opt/projects/repositories/wolfguard/docs/draft/wolfguard-technical-implementation.md
  4. /opt/projects/repositories/wolfguard/docs/draft/ocserv-refactoring-plan-networking.md

Document Status: Architecture Reference Maintainer: wolfguard architecture team Review Schedule: Quarterly Next Review: 2026-01-29


Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <[email protected]>