A high-performance, encrypted UDP tunnel implementing a SOCKS5-compatible proxy with session multiplexing, optimized for traversing restrictive network environments.
- Architecture Overview
- Key Design Decisions & Trade-Offs
- Core Components Deep Dive
- Failure Modes & Reliability
- Security & Compliance
- Performance Insights
- Extensibility & Future Roadmap
- Setup Instructions
The system implements a split-architecture proxy where:
- Client runs locally, accepting SOCKS5 connections from browsers/applications
- Server runs on a remote VPS, relaying traffic to the internet
All traffic between client and server is tunneled over a single UDP socket using a custom binary protocol with XChaCha20-Poly1305 AEAD encryption.
```mermaid
flowchart LR
    subgraph Local["Local Machine"]
        Browser["Browser/App"]
        Client["proxy-vpn client"]
        SOCKS5["SOCKS5 Handler"]
        CMux["Multiplexer"]
        CDemux["Demultiplexer"]
    end
    subgraph Remote["Remote VPS"]
        Server["proxy-vpn server"]
        SDemux["Demultiplexer"]
        SMux["Multiplexer"]
        Relay["TCP Relay"]
    end
    subgraph Internet
        Target["Target Website"]
    end
    Browser -->|"TCP (SOCKS5)"| SOCKS5
    SOCKS5 --> Client
    Client --> CMux
    CMux -->|"UDP (Encrypted)"| SDemux
    SDemux --> Relay
    Relay -->|"TCP"| Target
    Target -->|"TCP"| Relay
    Relay --> SMux
    SMux -->|"UDP (Encrypted)"| CDemux
    CDemux --> Client
    Client -->|"TCP"| Browser
```
| Pattern | Implementation | Rationale |
|---|---|---|
| Multiplexer/Demultiplexer | Channel-based goroutines for all UDP I/O | Single UDP socket handles N concurrent sessions |
| Session-per-Connection | SessionContext with sliding window | Enables packet reordering over unreliable UDP |
| Interface-based Abstraction | Codec, Crypto interfaces | Hot-swappable serialization and encryption |
| Singleton with Lazy Init | Global codec.C(), crypto.C() accessors | Avoids dependency injection complexity |
| Object Pool | sync.Pool for 1500-byte buffers | Zero-allocation hot path |
```
┌─────────────────────────────────────────────────────────────────────┐
│                          Encrypted Packet                           │
├──────────────┬─────────────────────────────────┬────────────────────┤
│ Nonce (24B)  │           Ciphertext            │ Poly1305 Tag (16B) │
└──────────────┴─────────────────────────────────┴────────────────────┘
```

Decrypted payload structure:

```
┌────────────┬──────┬────────────┬──────────┬────────────────────────┐
│ SessionID  │ Type │   SeqID    │  Length  │        Payload         │
│   (4B)     │ (1B) │   (4B)     │  (2B)    │      (variable)        │
└────────────┴──────┴────────────┴──────────┴────────────────────────┘
                 │
                 └─► TYPE_CONNECT=1, TYPE_DATA=2, TYPE_FIN=3, TYPE_PING=4, TYPE_PONG=5

Header Size: 11 bytes fixed
Max Payload: 1449 bytes (1500 MTU - 11 header - 24 nonce - 16 tag)
```
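As a concrete illustration of the 11-byte header layout above, here is a minimal encode/decode sketch using `encoding/binary`; the `Header` struct and function names are illustrative, not the project's actual types in `internal/protocol`:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Header mirrors the fixed layout: SessionID(4) Type(1) SeqID(4) Length(2).
type Header struct {
	SessionID uint32
	Type      byte
	SeqID     uint32
	Length    uint16
}

const HeaderSize = 11

// encodeHeader writes h into the first 11 bytes of buf at fixed offsets.
func encodeHeader(h Header, buf []byte) {
	binary.BigEndian.PutUint32(buf[0:4], h.SessionID)
	buf[4] = h.Type
	binary.BigEndian.PutUint32(buf[5:9], h.SeqID)
	binary.BigEndian.PutUint16(buf[9:11], h.Length)
}

// decodeHeader reads the same fixed offsets back out.
func decodeHeader(buf []byte) Header {
	return Header{
		SessionID: binary.BigEndian.Uint32(buf[0:4]),
		Type:      buf[4],
		SeqID:     binary.BigEndian.Uint32(buf[5:9]),
		Length:    binary.BigEndian.Uint16(buf[9:11]),
	}
}

func main() {
	buf := make([]byte, HeaderSize)
	in := Header{SessionID: 7, Type: 2, SeqID: 42, Length: 1449}
	encodeHeader(in, buf)
	fmt.Println(decodeHeader(buf) == in) // round-trip preserves all fields
}
```

Fixed offsets make the codec branch-free and allocation-free, at the cost of any schema flexibility.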
Choice: UDP transport between client and server.
Rationale:
- Avoids TCP-over-TCP meltdown (retransmission amplification)
- Lower latency for real-time applications
- Better NAT traversal characteristics
- Mimics legitimate UDP traffic patterns (VoIP, gaming)
Trade-off: Required implementing an application-level ordering layer (sequence-based reordering) over UDP.
Choice: XChaCha20-Poly1305 (extended nonce variant)
| Factor | XChaCha20-Poly1305 | AES-GCM |
|---|---|---|
| Nonce size | 24 bytes (safe random) | 12 bytes (requires counter) |
| Hardware accel | Software-only | AES-NI available |
| Nonce collision risk | ~2^96 birthday bound | ~2^48 birthday bound |
Rationale: 24-byte random nonce eliminates nonce-management complexity—critical for UDP where packet ordering isn't guaranteed. Performance difference is marginal for tunnel workloads.
Choice: Custom binary codec with fixed offsets.
```go
// internal/protocol/codec/binary.go
binary.BigEndian.PutUint32(buf[0:4], h.SessionID)
buf[4] = h.Type
binary.BigEndian.PutUint32(buf[5:9], h.SeqID)
binary.BigEndian.PutUint16(buf[9:11], h.Length)
```

Rationale:
- Zero allocation on encode/decode
- Deterministic 11-byte header
- No schema evolution needed (protocol is internal)
- Protobuf/MsgPack stubs exist but are disabled—future extensibility preserved
Choice: Per-session sequence-based window with strict in-order delivery.
```go
// internal/session/session.go
func (s *SessionContext) InsertPacket(seqID uint32, payload []byte, originalBuffer []byte) {
	if seqID < s.NextSeqID {
		// Drop late/duplicate packet
		return
	}
	s.Window[seqID] = item{payload: payload, original: originalBuffer}
	// Non-blocking notify: Signal has capacity 1, so extra signals coalesce
	select {
	case s.Signal <- struct{}{}:
	default:
	}
}
```

Trade-off:
- ✅ Out-of-order arrival is buffered
- ✅ TCP byte-stream correctness is preserved (no sequence skipping)
- ❌ No retransmission (missing packets cause head-of-line blocking until they arrive)
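The contiguous-flush behavior these trade-offs describe can be shown with a toy version of the window (this is a sketch, not the project's `flush`): packets are delivered only while the next expected sequence is present, so a gap stalls everything behind it.

```go
package main

import "fmt"

// window buffers out-of-order payloads keyed by sequence number.
type window struct {
	buf  map[uint32][]byte
	next uint32 // next sequence number expected by the TCP side
}

func (w *window) insert(seq uint32, p []byte) {
	if seq < w.next {
		return // late/duplicate: drop
	}
	w.buf[seq] = p
}

// flush returns payloads in order, stopping at the first gap
// (head-of-line blocking: a missing seq stalls everything behind it).
func (w *window) flush() [][]byte {
	var out [][]byte
	for {
		p, ok := w.buf[w.next]
		if !ok {
			return out
		}
		delete(w.buf, w.next)
		out = append(out, p)
		w.next++
	}
}

func main() {
	w := &window{buf: map[uint32][]byte{}}
	w.insert(1, []byte("b")) // arrives early
	w.insert(0, []byte("a"))
	w.insert(3, []byte("d")) // seq 2 is missing
	for _, p := range w.flush() {
		fmt.Print(string(p)) // delivers "ab"; seq 3 stays buffered behind the gap
	}
	fmt.Println()
}
```

Once seq 2 arrives, a subsequent flush would release both 2 and 3 in order.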
Choice: All sessions share one UDP socket.
Architecture implications:
- Client: `Multiplexer.SendChan` aggregates all outbound packets
- Server: `Demultiplexer` routes incoming packets by `SessionID`
Trade-off: Simplifies NAT pinhole management but requires careful channel sizing (2000-5000 capacity) to prevent backpressure.
Packet → codec.Encode() → plaintext frame → crypto.Encrypt() → wire bytes
```go
func (b *Builder) Build(p *Packet) (OutboundWork, error) {
	encoded, _ := codec.C().Encode(p.Header, p.Buffer)    // Header into buffer
	encrypted, _ := crypto.C().Encrypt(p.Buffer, encoded) // In-place encrypt
	return OutboundWork{Data: encrypted, OriginalBuffer: p.Buffer}, nil
}
```

Key insight: buffer reuse—`p.Buffer` is the only allocation, and all operations write into it.
wire bytes → crypto.Decrypt() → plaintext → codec.Decode() → Packet
In-place decryption: `aead.Open(enc[:0], nonce, enc, nil)` overwrites the ciphertext buffer.
Each browser connection produces one SessionContext:
```go
type SessionContext struct {
	TargetConn net.Conn        // Browser (client) or Website (server)
	Window     map[uint32]item // SeqID → payload for reordering
	NextSeqID  uint32          // Expected sequence
	Signal     chan struct{}   // Flush trigger
	Quit       chan struct{}   // Shutdown signal
	ClientAddr *net.UDPAddr    // Server-side: client's UDP address
}
```

Flusher goroutine pattern:
```go
func (s *SessionContext) runFlusher() {
	for {
		select {
		case <-s.Signal:
			s.flush() // Flush only contiguous seq: NextSeqID, NextSeqID+1, ...
		case <-s.Quit:
			return
		}
	}
}
```

Thread-safe session lookup with `sync.RWMutex`:
```go
func (r *Registry) Get(sessionID uint32) (*SessionContext, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	sess, ok := r.sessions[sessionID]
	return sess, ok
}
```

Full RFC 1928 implementation supporting:

- IPv4 (`0x01`)
- Domain name (`0x03`)
- IPv6 (`0x04`)
```go
func PerformSOCKS5Handshake(conn net.Conn) (string, error) {
	// 1. Read greeting (version + methods)
	// 2. Reply with no-auth (0x05, 0x00)
	// 3. Read CONNECT request
	// 4. Parse address type and extract target
	// 5. Send success reply
	return net.JoinHostPort(host, port), nil
}
```

No authentication is implemented—suitable for local proxy use only.
```go
func HandleBrowserSession(browserConn, registry, multiplexer, builder) {
	targetAddr := PerformSOCKS5Handshake(browserConn)
	sessID := GenerateSessionID() // atomic increment
	sess := session.NewSession(browserConn)
	registry.Add(sessID, sess)

	// Send CONNECT packet
	multiplexer.SendChan <- builder.Build(connectPacket)

	// Relay loop: Browser → UDP
	for {
		n := browserConn.Read(buf[11:1460]) // Offset for header
		pkt := NewPacket(sessID, TYPE_DATA, seqID++, payload, buf)
		multiplexer.SendChan <- builder.Build(pkt)
	}
}
```

```go
func (d *Demultiplexer) handlePacket(buf []byte, n int, clientAddr *net.UDPAddr) {
	pkt := d.Parser.Parse(buf[:n], buf)
	sess, ok := d.Registry.Get(pkt.Header.SessionID)
	switch pkt.Header.Type {
	case TYPE_CONNECT:
		if !ok {
			go d.setupAndRelay(sessionID, targetAddr, clientAddr)
		}
	case TYPE_DATA:
		if ok {
			sess.InsertPacket(seqID, payload, buf)
		}
	case TYPE_FIN:
		if ok {
			d.Registry.Delete(sessionID)
			sess.Close()
		}
	}
}
```

Synchronous relay loop per session:
```go
func (d *Demultiplexer) runTCPRelay(sess, sessionID) {
	for {
		n := sess.TargetConn.Read(buf[11:1460]) // Read from website
		pkt := NewPacket(sessionID, TYPE_DATA, seqID++, payload, buf)
		d.Multiplexer.SendChan <- OutboundPacket{Data, Addr, Buffer}
	}
}
```

```go
type TokenBucket struct {
	rate      float64 // tokens/second
	burst     float64 // max capacity
	tokens    float64 // current
	lastCheck time.Time
}

func (tb *TokenBucket) Wait(tokensToConsume int) {
	// Blocks until tokens available
	// Refills at `rate` tokens/second
}
```

Currently disabled in main.go, but the infrastructure is in place for bandwidth limiting.
| Failure | Detection | Recovery |
|---|---|---|
| Packet corruption | Poly1305 auth tag verification | Drop packet, return to pool |
| Out-of-order arrival | SeqID gap in window | Buffer until missing seq arrives |
| Session timeout | Configurable idle deadline (default 120s) | Send FIN, cleanup |
| UDP write failure | Error from WriteToUDP | Log, continue (best-effort) |
| Crypto init failure | Key length validation | panic() at startup |
```go
defer func() {
	registry.Delete(sessID)
	sess.Close()
}()
```

All handlers use deferred cleanup. Session close is guarded and idempotent, and buffered packets are returned to the pool:
```go
func (s *SessionContext) Close() {
	// Mark closed once, close conn, and release queued buffers safely.
}
```

Logging is pervasive but unsophisticated:

```go
log.Printf("[session %d] connection established: client=%s → target=%s (local=%s)",
	sessionID, clientAddr, targetAddr, conn.LocalAddr())
```

Current gaps: no structured logging, no metrics, no tracing.
| Property | Implementation |
|---|---|
| Confidentiality | XChaCha20 stream cipher |
| Integrity | Poly1305 MAC (16 bytes) |
| Authenticity | AEAD construction prevents tampering |
| Nonce uniqueness | 24-byte random per packet |
| Key derivation | Raw 32-byte hex from environment |
| Threat | Mitigation |
|---|---|
| Replay attacks | None: packets are stateless, so a captured packet can be replayed |
| Traffic analysis | Partial - fixed header size, but payload length leaked |
| Key compromise | Single pre-shared key compromise is catastrophic |
| Denial of service | Rate limiting infrastructure present but unused |
```
# .env file
KEY="<64-hex-chars>"  # 64 hex chars = 32-byte (256-bit) key
```

Weaknesses:
- No key rotation mechanism
- Plaintext in environment file
- No authentication handshake—any party with the key can impersonate
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Packet encode | O(1) | O(1) (in-place) |
| Packet decrypt | O(n) | O(1) (in-place) |
| Session lookup | O(1) avg | O(n) sessions |
| Window insert | O(1) | O(w) window size |
| Window flush | O(k) consecutive | O(1) per item |
```go
var bytePool = sync.Pool{
	New: func() any { return make([]byte, 1500) },
}
```

The critical packet processing path is allocation-free:
- Acquire buffer from pool (`pool.Get`)
- Read data into buffer with reserved header space
- Build packet using the same buffer
- Perform in-place encryption
- Send over UDP via channel
- Return buffer to pool after transmission
This design minimizes GC pressure and ensures consistent performance under load.
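The pooled hot path can be sketched end to end as follows (constants match the packet layout; `buildFrame` is an illustrative name, and the header/encryption steps are elided):

```go
package main

import (
	"fmt"
	"sync"
)

const (
	maxPacket  = 1500
	headerSize = 11
)

var bytePool = sync.Pool{
	New: func() any { return make([]byte, maxPacket) },
}

// buildFrame copies payload after a reserved header region and returns
// the frame plus the pooled backing buffer for later release.
func buildFrame(payload []byte) (frame, backing []byte) {
	buf := bytePool.Get().([]byte)
	n := copy(buf[headerSize:], payload) // a socket read would land here directly
	// ... header encode and in-place encryption would go here ...
	return buf[:headerSize+n], buf
}

func main() {
	frame, backing := buildFrame([]byte("hello"))
	fmt.Println(len(frame)) // 11-byte header region + 5-byte payload = 16
	bytePool.Put(backing)   // return after transmission; reuse avoids allocation
}
```

Reserving the header region up front means the payload is never copied a second time to make room for the header.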
```go
const MaxPacketSize = 1500 // MTU-sized
```

- Header: 11 bytes
- Payload: up to 1449 bytes
- Encryption overhead: 40 bytes (24-byte nonce + 16-byte tag)
- Maximum ciphertext fits within standard MTU
| Component | Capacity | Purpose |
|---|---|---|
| Client Multiplexer | 2000 | Absorb burst traffic from many sessions |
| Server Multiplexer | 5000 | Handle higher concurrency on server side |
| Session Signal | 1 | Non-blocking flush notification |
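The capacity-1 `Signal` channel works as a coalescing notification: a non-blocking send either queues one pending flush or detects that one is already pending. A minimal sketch:

```go
package main

import "fmt"

func main() {
	signal := make(chan struct{}, 1) // capacity 1, as in the table above

	notify := func() {
		select {
		case signal <- struct{}{}: // queue a flush
		default: // one is already pending; extra signals coalesce
		}
	}

	notify()
	notify() // would block (or deadlock) with a plain send on a full channel
	notify()
	fmt.Println(len(signal)) // 1: many notifications collapse into one
}
```

The flusher drains the whole contiguous window per wakeup, so collapsing multiple signals into one loses no work.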
These wrk measurements come from a controlled local setup and do not represent packet-loss behavior of the current strict in-order/no-retransmission implementation.
- Target: `http://example.com`
- Duration: 30 seconds
- Tool: `wrk` (same binary for consistency)
- Mode: proxychains → proxy-vpn (SOCKS5 over UDP)
- Threads: 8
- Note: Client and server were running on the same machine (no real network latency)
| Mode | Concurrency | Requests/sec | Transfer/sec |
|---|---|---|---|
| Direct | 100 | 662.92 | 545.10 KB/s |
| UDP Proxy | 100 | 552.17 | 454.03 KB/s |
| Direct | 50 | 280.89 | 230.96 KB/s |
| UDP Proxy | 50 | 672.78 | 553.21 KB/s |
| Metric | Direct | UDP Proxy |
|---|---|---|
| Avg | 130.56 ms | 111.62 ms |
| P50 | 115.09 ms | 101.73 ms |
| P75 | 139.60 ms | 134.96 ms |
| P90 | 184.07 ms | 162.36 ms |
| P99 | 382.21 ms | 214.89 ms |
| Metric | Direct | UDP Proxy |
|---|---|---|
| Avg | 101.69 ms | 35.64 ms |
| P50 | 63.53 ms | 30.72 ms |
| P75 | 93.45 ms | 41.34 ms |
| P90 | 220.86 ms | 54.85 ms |
| P99 | 589.59 ms | 90.93 ms |
| Mode | Concurrency | Read Errors | Timeouts |
|---|---|---|---|
| Direct | 100 | 0 | 96 |
| UDP Proxy | 100 | 55 | 68 |
| Direct | 50 | 0 | 87 |
| UDP Proxy | 50 | 48 | 0 |
- At high concurrency (100), the proxy achieves approximately 83% of direct throughput
- At moderate concurrency (50), the proxy outperforms the direct path in this environment
- Lower median and tail latency observed through the proxy
- At 50 concurrency, latency improves by 2–3×
- Significant reduction in P99 latency
- UDP tunneling avoids TCP-over-TCP contention
- Multiplexing reduces per-connection overhead
- Channel buffering and batching smooth traffic bursts
- Direct execution exhibits higher burst instability
- These local figures do not model packet loss behavior of the current strict in-order build
- Tests were performed locally; real-world WAN conditions are not reflected
- No real packet loss, jitter, or latency variation was present
- `proxychains` modifies connection behavior
- Results are influenced by buffering and scheduling effects
These results should not be directly generalized to real-world deployments.
- Minimal overhead under high load
- Improved latency characteristics in this setup
- Better tail latency due to traffic smoothing
- Not representative of real packet-loss behavior in the current strict in-order/no-retransmission design
To evaluate real network behavior under strict in-order delivery (without retransmission), Fast.com was used.

Through the tunnel:

- Download Speed: ~6.4 Mbps
- Latency (Loaded): ~350 ms
Observations:
- Consistent and correct page loads
- No corrupted or partial responses
- Occasional stalls under packet loss
- Reduced throughput due to head-of-line blocking
For comparison, a direct connection (no tunnel) measured:

- Download Speed: ~18 Mbps
- Latency (Loaded): ~309 ms
The performance difference reflects the current design:
- Strict in-order delivery is enforced
- No retransmission layer exists
- Missing packets block subsequent data
This results in:
- Reduced throughput under packet loss
- Increased latency during stalled periods
| Aspect | Behavior |
|---|---|
| Data correctness | Guaranteed |
| Packet loss recovery | Not implemented |
| Throughput | Reduced under loss |
| Latency under load | Increased |
The system intentionally avoids TCP-style retransmission mechanisms to prevent:
- TCP-over-TCP inefficiencies
- Complex congestion control interactions
Instead, it focuses on:
- Ordered delivery over UDP
- Minimal protocol complexity
- Clear separation between transport and reliability
The system demonstrates a UDP-based transport that preserves TCP correctness while avoiding full TCP complexity within the tunnel. This approach prioritizes correctness and simplicity, with the trade-off of reduced throughput under packet loss conditions.
- Codec Interface

```go
type Codec interface {
	Encode(h *header.Header, payload []byte) ([]byte, error)
	Decode(b []byte) (*header.Header, []byte, error)
}
```

- Crypto Interface

```go
type Crypto interface {
	Encrypt(dst, plaintext []byte) ([]byte, error)
	Decrypt(ciphertext []byte) ([]byte, error)
}
```

- Token Bucket (pre-built, currently disabled)
| Area | Improvement | Complexity |
|---|---|---|
| Reliability | Selective retransmission (ARQ) | High |
| Security | ECDH key exchange | Medium |
| Observability | Metrics and structured logging | Low |
| Performance | UDP batch I/O (recvmmsg) | Medium |
| NAT Traversal | STUN/TURN integration | High |
| Compression | LZ4 pre-encryption | Low |
- MsgPack codec implementation
- AES-GCM crypto implementation (commented)
- Session manager with rate limiting (commented)
- Go 1.21+ (uses `golang.org/x/crypto`)
- UDP port accessible on server
Create .env in project root:
```
SERVER_ADDR=<VPS_IP>:<PORT>   # Client: where to connect
SERVER_PORT=8000              # Server: port to listen on
CODEC=binary
CRYPTO=chacha20
KEY=<64-hex-chars>            # 32 bytes = 256-bit key
CLIENT_ADDR=127.0.0.1:1080    # Client: SOCKS5 listen address
IDLE_TIMEOUT_SECONDS=120      # Optional: idle timeout for client/server session reads
```

Generate a key:

```sh
openssl rand -hex 32
```

```sh
# Server (on VPS)
go build -o vpn-server ./cmd/server
./vpn-server

# Client (locally)
go build -o vpn-client ./cmd/client
./vpn-client
```

Configure browser to use SOCKS5 proxy at `127.0.0.1:1080`.
```
proxy-vpn/
├── cmd/
│   ├── client/main.go         # Client entrypoint
│   └── server/main.go         # Server entrypoint
├── internal/
│   ├── client/
│   │   ├── demultiplexer.go   # UDP → Session routing
│   │   ├── handler.go         # Per-browser session handler
│   │   ├── multiplexer.go     # Session → UDP aggregation
│   │   ├── socks5.go          # SOCKS5 protocol implementation
│   │   └── utils.go           # Session ID generation
│   ├── pool/
│   │   └── pool.go            # sync.Pool for byte buffers
│   ├── protocol/
│   │   ├── builder.go         # Packet → wire format
│   │   ├── parser.go          # Wire format → Packet
│   │   ├── packet.go          # Packet struct definitions
│   │   ├── codec/             # Serialization implementations
│   │   ├── crypto/            # Encryption implementations
│   │   └── header/            # Header constants and types
│   ├── server/
│   │   ├── congestion.go      # Token bucket rate limiter
│   │   ├── demultiplexer.go   # UDP → TCP relay per session
│   │   └── multiplexer.go     # TCP → UDP response aggregation
│   └── session/
│       ├── registry.go        # Thread-safe session lookup
│       └── session.go         # Reordering window implementation
├── .env.example
├── go.mod
└── go.sum
```

