Crypto Analytics Platform
Quantitative cryptocurrency analytics platform combining a C# scoring engine with Python data pipelines for real-time market intelligence.
Overview
Collaborated on a cryptocurrency analytics platform that combines quantitative scoring, real-time data processing, and algorithmic insights to evaluate digital assets. The system integrates a high-performance C# scoring engine with Python data pipelines to provide comprehensive market, fundamental, and technical analysis across multiple cryptocurrencies.
Built with a microservices architecture, the platform processes real-time market data via WebSocket connections and historical data through REST APIs, storing processed information in PostgreSQL with Redis caching for sub-second query performance.
System Architecture
High-Level Design
```
┌─────────────────────────────────────────────────────────────┐
│                    Data Ingestion Layer                     │
├─────────────────────────────────────────────────────────────┤
│  REST APIs (Historical)      │  WebSocket (Real-time)       │
│  - Binance                   │  - Price feeds               │
│  - CoinGecko                 │  - Order book updates        │
│  - CryptoCompare             │  - Trade streams             │
└──────────────┬───────────────────────────┬──────────────────┘
               │                           │
               ▼                           ▼
┌──────────────────────────┐   ┌──────────────────────────────┐
│  Python ETL Pipeline     │   │  C# Scoring Engine           │
│  - Data normalization    │   │  - Market analysis           │
│  - Feature engineering   │   │  - Fundamental scoring       │
│  - Time-series processing│   │  - Technical indicators      │
└──────────┬───────────────┘   └────────────┬─────────────────┘
           │                                │
           ▼                                ▼
┌─────────────────────────────────────────────────────────────┐
│                     Data Storage Layer                      │
├─────────────────────────────────────────────────────────────┤
│  PostgreSQL (Persistence)    │  Redis (Cache + Real-time)   │
│  - Historical data           │  - Live prices               │
│  - Computed scores           │  - Session data              │
│  - Backtesting results       │  - Hot data cache            │
└─────────────────────────────────────────────────────────────┘
```
Quantitative Scoring Methodology
Composite Score Formula
The platform calculates a weighted composite score combining three analytical dimensions:
Composite Score:
S_composite = w₁·S_market + w₂·S_fundamental + w₃·S_technical
where:
w₁ + w₂ + w₃ = 1 (weights sum to 1)
S_market ∈ [0, 100]
S_fundamental ∈ [0, 100]
S_technical ∈ [0, 100]
Confidence Interval:
CI = 1 - (σ_scores / μ_scores)
where:
σ_scores = standard deviation of component scores
μ_scores = mean of component scores
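A minimal sketch of this scoring step in Python (NumPy assumed; the weights and example scores are illustrative, not the engine's production values):

```python
import numpy as np

def composite_score(s_market, s_fundamental, s_technical,
                    weights=(0.4, 0.3, 0.3)):
    """Return (composite score, confidence) for one asset."""
    w = np.asarray(weights)
    assert np.isclose(w.sum(), 1.0), "weights must sum to 1"
    scores = np.array([s_market, s_fundamental, s_technical])
    composite = float(w @ scores)
    # Confidence = 1 - coefficient of variation: tightly clustered
    # component scores give values near 1, divergent ones near 0.
    confidence = float(1.0 - scores.std() / scores.mean())
    return composite, confidence

# Example: strong market score, weaker fundamentals
print(composite_score(82.0, 55.0, 68.0))   # -> (~69.7, ~0.84)
```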
Market Analysis Metrics
1. Liquidity Score
Bid-Ask Spread:
Spread% = ((P_ask - P_bid) / P_bid) × 100
where:
P_ask = best ask price
P_bid = best bid price
Market Depth:
Depth(n) = Σ(i=1 to n) (V_bid,i + V_ask,i)
where:
V_bid,i = volume at i-th bid level
V_ask,i = volume at i-th ask level
n = number of order book levels
Slippage Estimation:
Slippage(Q) = (P_execution - P_mid) / P_mid
where:
P_execution = average execution price for quantity Q
P_mid = (P_bid + P_ask) / 2
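A sketch of these three liquidity calculations over a simplified order-book snapshot; the (price, volume) list layout is an assumption for illustration, not the platform's book format:

```python
def liquidity_metrics(bids, asks, depth_levels=10):
    """bids/asks: lists of (price, volume) tuples, best price first."""
    best_bid, best_ask = bids[0][0], asks[0][0]
    spread_pct = (best_ask - best_bid) / best_bid * 100
    depth = (sum(v for _, v in bids[:depth_levels]) +
             sum(v for _, v in asks[:depth_levels]))
    return spread_pct, depth

def slippage(bids, asks, quantity):
    """Average execution price for a market buy of `quantity`, vs. mid."""
    mid = (bids[0][0] + asks[0][0]) / 2
    remaining, cost = quantity, 0.0
    for price, volume in asks:           # walk the ask side level by level
        take = min(remaining, volume)
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("order exceeds visible book depth")
    return (cost / quantity - mid) / mid

bids = [(100.0, 2.0), (99.9, 5.0)]
asks = [(100.1, 1.0), (100.3, 4.0)]
print(liquidity_metrics(bids, asks))     # spread ~0.1%, depth = 12.0
print(slippage(bids, asks, 3.0))         # ~0.0018 (0.18% above mid)
```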
2. Volume Profile Analysis
Volume-Weighted Average Price (VWAP):
VWAP = Σ(P_i × V_i) / Σ(V_i)
where:
P_i = price at time i
V_i = volume at time i
Volume Distribution:
VD(p) = V_total(p) / V_total
where:
V_total(p) = total volume traded at price level p
V_total = total volume across all price levels
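In pandas, both reduce to a couple of vectorized expressions; the trade-tape column names below are assumptions, not the platform's schema:

```python
import pandas as pd

trades = pd.DataFrame({
    "price":  [100.0, 100.5, 99.8, 100.2],
    "volume": [2.0,   1.0,   3.0,  0.5],
})

# VWAP: volume-weighted average of traded prices
vwap = (trades.price * trades.volume).sum() / trades.volume.sum()

# Volume distribution: share of total volume at each price level
vd = trades.groupby("price").volume.sum() / trades.volume.sum()
print(round(vwap, 3), vd.to_dict())
```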
Fundamental Analysis Metrics
1. On-Chain Metrics
Network Value to Transactions (NVT) Ratio:
NVT = Market_Cap / Daily_Transaction_Volume
Interpretation:
NVT < 50 → Undervalued
NVT > 100 → Overvalued
Market Value to Realized Value (MVRV) Ratio:
MVRV = Market_Cap / Realized_Cap
where:
Realized_Cap = Σ(UTXO_i × Price_at_creation_i)
Interpretation:
MVRV > 3.5 → Potential top
MVRV < 1.0 → Potential bottom
Active Address Momentum:
AAM = (AA_current - AA_avg) / σ_AA
where:
AA_current = current active addresses
AA_avg = 30-day moving average
σ_AA = standard deviation of active addresses
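A brief sketch of these on-chain calculations, assuming daily series are already available from the ETL layer (the function names are illustrative):

```python
import pandas as pd

def nvt_ratio(market_cap: float, daily_tx_volume: float) -> float:
    """Network Value to Transactions, per the formula above."""
    return market_cap / daily_tx_volume

def active_address_momentum(active_addresses: pd.Series,
                            window: int = 30) -> pd.Series:
    """Z-score of active addresses vs. their 30-day rolling statistics."""
    rolling = active_addresses.rolling(window)
    return (active_addresses - rolling.mean()) / rolling.std()
```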
2. Tokenomics Scoring
Supply Inflation Rate:
IR = (S_current - S_previous) / S_previous × 100
where:
S_current = current circulating supply
S_previous = supply 1 year ago
Holder Concentration (Gini Coefficient):
G = (Σ(i=1 to n) Σ(j=1 to n) |x_i - x_j|) / (2n²μ)
where:
x_i = holdings of address i
μ = mean holdings
G ∈ [0, 1] (0 = perfect equality, 1 = perfect inequality)
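The double-sum definition is O(n²); a sketch using the equivalent sorted closed form, which scales to millions of addresses:

```python
import numpy as np

def gini(holdings) -> float:
    """Gini coefficient of token holdings, in [0, 1]."""
    x = np.sort(np.asarray(holdings, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    # Sorted form: G = sum((2i - n - 1) * x_i) / (n * sum(x))
    return float((2 * ranks - n - 1) @ x / (n * x.sum()))

print(gini([1, 1, 1, 1]))      # 0.0  -> perfect equality
print(gini([0, 0, 0, 100]))    # 0.75 -> highly concentrated
```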
Technical Analysis Indicators
1. Momentum Indicators
Relative Strength Index (RSI):
RSI = 100 - (100 / (1 + RS))
where:
RS = Average_Gain / Average_Loss
Average_Gain = Σ(Gains over n periods) / n
Average_Loss = Σ(Losses over n periods) / n
Interpretation:
RSI > 70 → Overbought
RSI < 30 → Oversold
Moving Average Convergence Divergence (MACD):
MACD = EMA₁₂ - EMA₂₆
Signal = EMA₉(MACD)
Histogram = MACD - Signal
where:
EMA_n = Exponential Moving Average with period n
EMA_t = α·P_t + (1-α)·EMA_(t-1)
α = 2/(n+1) (smoothing factor)
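Both momentum indicators map directly onto pandas; `ewm(span=n)` uses the same α = 2/(n+1) smoothing defined above. The RSI here uses the simple n-period averages from the formula; Wilder's smoothed variant is a common alternative:

```python
import pandas as pd

def rsi(close: pd.Series, n: int = 14) -> pd.Series:
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(n).mean()
    loss = (-delta.clip(upper=0)).rolling(n).mean()
    return 100 - 100 / (1 + gain / loss)

def macd(close: pd.Series, fast=12, slow=26, signal=9):
    macd_line = (close.ewm(span=fast, adjust=False).mean()
                 - close.ewm(span=slow, adjust=False).mean())
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line, macd_line - signal_line  # + histogram
```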
2. Volatility Indicators
Bollinger Bands:
BB_middle = SMA_n(P)
BB_upper = BB_middle + k·σ_n
BB_lower = BB_middle - k·σ_n
where:
SMA_n = Simple Moving Average with period n
σ_n = standard deviation over n periods
k = 2 (typically)
%B = (P - BB_lower) / (BB_upper - BB_lower)
Average True Range (ATR):
TR = max(H - L, |H - C_prev|, |L - C_prev|)
ATR = (ATR_prev × (n-1) + TR) / n
where:
H = current high
L = current low
C_prev = previous close
n = period (typically 14)
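A pandas sketch of both volatility indicators; note that Wilder's ATR recursion is exactly an exponential average with α = 1/n:

```python
import pandas as pd

def bollinger(close: pd.Series, n: int = 20, k: float = 2.0) -> pd.DataFrame:
    mid = close.rolling(n).mean()
    sd = close.rolling(n).std()
    upper, lower = mid + k * sd, mid - k * sd
    pct_b = (close - lower) / (upper - lower)
    return pd.DataFrame({"middle": mid, "upper": upper,
                         "lower": lower, "%B": pct_b})

def atr(high: pd.Series, low: pd.Series, close: pd.Series,
        n: int = 14) -> pd.Series:
    prev_close = close.shift(1)
    tr = pd.concat([high - low,
                    (high - prev_close).abs(),
                    (low - prev_close).abs()], axis=1).max(axis=1)
    # ATR_t = (ATR_{t-1}*(n-1) + TR_t)/n  ==  EMA of TR with alpha = 1/n
    return tr.ewm(alpha=1 / n, adjust=False).mean()
```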
3. Trend Indicators
Exponential Moving Average (EMA):
EMA_t = α·P_t + (1-α)·EMA_(t-1)
where:
α = 2/(n+1)
P_t = price at time t
n = period
Parabolic SAR:
SAR_t = SAR_(t-1) + AF × (EP - SAR_(t-1))
where:
AF = Acceleration Factor (starts at 0.02, max 0.20)
EP = Extreme Point (highest high or lowest low)
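The EMA recursion is short enough to show directly (a plain-Python sketch; the full Parabolic SAR trend-flip logic is omitted here for brevity):

```python
def ema(prices, n: int):
    """EMA_t = a*P_t + (1-a)*EMA_{t-1}, seeded with the first price."""
    alpha = 2 / (n + 1)
    out = [float(prices[0])]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

print(ema([10, 11, 12, 11, 13], n=3))   # alpha = 0.5
```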
Statistical Models & Risk Metrics
Return Analysis
Log Returns:
r_t = ln(P_t / P_(t-1))
Properties:
- Time-additive: r_(t,t+n) = Σ(i=t+1 to t+n) r_i
- Approximately normal for small returns
Volatility (Standard Deviation):
σ = √(Σ(r_i - μ)² / (n-1))
where:
μ = mean return
n = number of observations
Annualized: σ_annual = σ_daily × √252
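Expressed in pandas (√252 mirrors the trading-day convention above; 365 is sometimes substituted for crypto's 24/7 markets):

```python
import numpy as np
import pandas as pd

def log_returns(prices: pd.Series) -> pd.Series:
    return np.log(prices / prices.shift(1)).dropna()

def annualized_vol(daily_returns: pd.Series, periods: int = 252) -> float:
    # Sample standard deviation (ddof = 1), scaled by sqrt(time)
    return float(daily_returns.std(ddof=1) * np.sqrt(periods))
```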
Risk-Adjusted Performance Metrics
Sharpe Ratio:
SR = (R_p - R_f) / σ_p
where:
R_p = portfolio return
R_f = risk-free rate
σ_p = portfolio standard deviation
Interpretation:
SR > 1.0 → Good
SR > 2.0 → Very good
SR > 3.0 → Excellent
Sortino Ratio:
Sortino = (R_p - R_target) / σ_downside
where:
R_target = target return
σ_downside = √(Σ(min(0, r_i - R_target)²) / n)
Advantage: Only penalizes downside volatility
Maximum Drawdown:
MDD = max(DD_t) for all t
where:
DD_t = (Peak_t - P_t) / Peak_t
Peak_t = max(P_i) for i ≤ t (running peak price)
Calmar Ratio:
Calmar = Annual_Return / |MDD|
Interpretation:
Higher is better (return per unit of drawdown risk)
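A compact sketch of these four metrics on per-period return and price series (ratios are per-period; scale Sharpe/Sortino by √252 to annualize):

```python
import numpy as np
import pandas as pd

def sharpe(returns: pd.Series, rf: float = 0.0) -> float:
    excess = returns - rf
    return float(excess.mean() / excess.std(ddof=1))

def sortino(returns: pd.Series, target: float = 0.0) -> float:
    downside = np.minimum(returns - target, 0.0)   # only losses vs. target
    return float((returns.mean() - target) / np.sqrt(np.mean(downside**2)))

def max_drawdown(prices: pd.Series) -> float:
    peak = prices.cummax()                          # running peak
    return float(((peak - prices) / peak).max())

def calmar(annual_return: float, mdd: float) -> float:
    return annual_return / abs(mdd)
```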
Correlation & Covariance
Pearson Correlation:
ρ(X,Y) = Cov(X,Y) / (σ_X × σ_Y)
where:
Cov(X,Y) = E[(X - μ_X)(Y - μ_Y)]
ρ ∈ [-1, 1]
Beta (Market Sensitivity):
β = Cov(R_asset, R_market) / Var(R_market)
Interpretation:
β = 1 → Moves with market
β > 1 → More volatile than market
β < 1 → Less volatile than market
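Both statistics come straight from the sample covariance matrix (NumPy sketch; the return series are assumed aligned and of equal length):

```python
import numpy as np

def beta(asset_returns, market_returns) -> float:
    cov = np.cov(asset_returns, market_returns, ddof=1)
    return float(cov[0, 1] / cov[1, 1])   # Cov(asset, market) / Var(market)

def correlation(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])
```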
Backtesting Framework
Walk-Forward Optimization
Process:
1. Split data: Training (T) | Validation (V) | Out-of-sample (O)
2. Optimize parameters on T
3. Validate on V
4. Test on O
5. Roll window forward
6. Repeat
Window sizes:
Training: 60% of total period
Validation: 20% of total period
Out-of-sample: 20% of total period
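A sketch of the rolling split as an index generator; the window size and roll step are illustrative parameters, with the 60/20/20 fractions above applied within each window:

```python
def walk_forward_windows(n: int, window: int, step: int | None = None,
                         train_frac: float = 0.6, val_frac: float = 0.2):
    """Yield (train, validation, out-of-sample) index ranges, rolling forward."""
    t = int(window * train_frac)
    v = int(window * val_frac)
    o = window - t - v                       # out-of-sample span
    step = step or o                         # default: roll by the test span
    for start in range(0, n - window + 1, step):
        yield (range(start, start + t),
               range(start + t, start + t + v),
               range(start + t + v, start + window))

# Example: 1000 daily bars, 500-bar windows -> 300/100/100 splits
for tr, va, te in walk_forward_windows(1000, window=500):
    print(tr.start, va.start, te.start, te.stop)
```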
Monte Carlo Simulation
Price Path Simulation:
S_t = S_(t-1) × exp((μ - σ²/2)Δt + σ√Δt × Z)
where:
S_t = price at time t
μ = drift (expected return)
σ = volatility
Δt = time step
Z ~ N(0,1) (standard normal)
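The discretized GBM above vectorizes cleanly in NumPy (the drift, volatility, and path count below are illustrative):

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, days, n_paths, seed=42):
    """Simulate GBM price paths with daily steps (dt = 1/252)."""
    rng = np.random.default_rng(seed)
    dt = 1 / 252
    z = rng.standard_normal((n_paths, days))            # Z ~ N(0, 1)
    log_steps = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_steps, axis=1))

paths = simulate_gbm(s0=50_000, mu=0.10, sigma=0.60, days=30, n_paths=10_000)
print(paths[:, -1].mean())   # Monte Carlo estimate of the 30-day mean price
```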
Value at Risk (VaR):
VaR_α = -Quantile_(1-α)(Returns)
where:
α = confidence level (e.g., 0.95, 0.99)
Example: VaR_0.95 = $10,000
→ 95% confidence that the one-period loss won't exceed $10,000
Conditional Value at Risk (CVaR):
CVaR_α = E[Loss | Loss > VaR_α]
Advantage: Captures tail risk beyond VaR
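Historical VaR/CVaR from a simulated or observed return sample (a sketch; losses are reported as positive fractions):

```python
import numpy as np

def var_cvar(returns, alpha: float = 0.95):
    returns = np.asarray(returns)
    cutoff = np.quantile(returns, 1 - alpha)    # e.g. 5th percentile at alpha=0.95
    var = -cutoff
    cvar = -returns[returns <= cutoff].mean()   # mean loss beyond the VaR cutoff
    return float(var), float(cvar)
```

Applied to the simulated GBM paths above, `var_cvar(paths[:, -1] / 50_000 - 1)` would give the 30-day VaR and CVaR of the position.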
Feature Engineering
Time-Series Features
Lag Features:
X_t-1, X_t-2, ..., X_t-n
Captures temporal dependencies in the input window
Rolling Statistics:
μ_rolling(n) = (1/n) × Σ(i=t-n+1 to t) X_i
σ_rolling(n) = √((1/n) × Σ(i=t-n+1 to t) (X_i - μ)²)
Rate of Change:
ROC_n = (P_t - P_(t-n)) / P_(t-n) × 100
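All three feature families reduce to a few pandas operations (column names are illustrative):

```python
import pandas as pd

def make_features(close: pd.Series, lags: int = 5,
                  window: int = 20) -> pd.DataFrame:
    feats = pd.DataFrame(index=close.index)
    for k in range(1, lags + 1):
        feats[f"lag_{k}"] = close.shift(k)              # lag features
    feats["roll_mean"] = close.rolling(window).mean()   # rolling statistics
    feats["roll_std"] = close.rolling(window).std()
    feats["roc"] = close.pct_change(window) * 100       # rate of change (%)
    return feats.dropna()
```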
Fourier Transform for Cyclical Patterns
Discrete Fourier Transform:
X_k = Σ(n=0 to N-1) x_n × e^(-2πikn/N)
where:
k = frequency component
N = number of samples
Identifies dominant cycles in price data
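A sketch of cycle detection with NumPy's real FFT; the synthetic series exists only to show that a known 30-sample cycle is recovered:

```python
import numpy as np

def dominant_cycle(prices, sample_spacing: float = 1.0) -> float:
    """Period (in samples) of the strongest cycle in the series."""
    x = np.asarray(prices, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean()))   # drop the DC component
    freqs = np.fft.rfftfreq(x.size, d=sample_spacing)
    peak = spectrum[1:].argmax() + 1               # skip the zero-frequency bin
    return 1.0 / freqs[peak]

t = np.arange(360)
noisy = (100 + 5 * np.sin(2 * np.pi * t / 30)
         + np.random.default_rng(0).normal(0, 1, t.size))
print(dominant_cycle(noisy))   # ~30.0
```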
Data Pipeline Architecture
ETL Process Flow
Extract:
- REST API polling (1-minute intervals)
- WebSocket streaming (real-time)
- Rate limiting: 1200 requests/minute
- Retry logic: Exponential backoff
Transform:
- Data normalization (z-score, min-max)
- Missing value imputation (forward-fill, interpolation)
- Outlier detection (IQR method, z-score > 3)
- Feature scaling for ML models
Load:
- Batch inserts (PostgreSQL): 10,000 records/batch
- Stream updates (Redis): Sub-millisecond latency
- Data partitioning: By symbol and date range
- Indexing strategy: B-tree on (symbol, timestamp)
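The Extract stage's retry behavior can be sketched as a generic wrapper; `fetch` stands in for any exchange REST call, and the delay constants are illustrative rather than the pipeline's actual settings:

```python
import random
import time

def fetch_with_backoff(fetch, max_retries: int = 5, base_delay: float = 1.0):
    """Call `fetch()` with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise                             # give up after the last retry
            delay = base_delay * 2 ** attempt     # 1s, 2s, 4s, ...
            time.sleep(delay + random.uniform(0, 0.5))
```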
Database Optimization
PostgreSQL Time-Series Optimization:
```sql
-- Option A: native range partitioning by date
CREATE TABLE price_data_2024_01 PARTITION OF price_data
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Option B: TimescaleDB hypertable (alternative to native partitioning)
SELECT create_hypertable('price_data', 'timestamp');

-- Continuous aggregate for hourly OHLCV (requires the hypertable)
CREATE MATERIALIZED VIEW hourly_ohlcv
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', timestamp) AS hour,
    symbol,
    first(price, timestamp) AS open,
    max(price) AS high,
    min(price) AS low,
    last(price, timestamp) AS close,
    sum(volume) AS volume
FROM price_data
GROUP BY hour, symbol;
```
Redis Caching Strategy:
- Hot data: TTL = 5 minutes
- Warm data: TTL = 1 hour
- Cold data: Evicted under memory pressure (LRU)
- Memory limit: maxmemory 4gb with maxmemory-policy allkeys-lru
Performance Metrics
System Benchmarks
| Metric | Value | Method |
|---|---|---|
| Per-asset scoring | <5 ms | Parallel execution (10 threads) |
| Batch scoring (100 assets) | <500 ms | Async/await pattern |
| WebSocket latency | <50 ms | Direct TCP connection |
| DB query (cached) | <10 ms | Redis in-memory cache |
| DB query (cold) | <100 ms | Indexed PostgreSQL |
| ETL throughput | 10K+ records/s | Bulk inserts, vectorized ops |
Scalability Characteristics
Horizontal Scaling:
- Stateless microservices
- Load balancing: Round-robin
- Database replication: Master-slave
- Cache distribution: Redis Cluster
Vertical Scaling:
- CPU: 16 cores for parallel processing
- RAM: 32GB for in-memory operations
- Storage: SSD for low-latency I/O
Technologies Demonstrated
This project showcases expertise in:
- Quantitative Finance - Statistical modeling, risk metrics, portfolio theory
- Data Engineering - ETL pipelines, time-series databases, real-time streaming
- Backend Development - C# / .NET 8, Python 3.11+, microservices
- Database Design - PostgreSQL optimization, Redis caching, partitioning
- Performance Engineering - Parallel processing, async programming, vectorization
- Financial Mathematics - Stochastic price models, Monte Carlo simulation, tail-risk measures (VaR/CVaR)
License
Private Project - Proprietary
Status: Active Development | Architecture: Microservices | Deployment: Cloud-based