Zeitgaist
Cross-Lingual Social Intelligence
Ask in English. Find Chinese, Russian, and Arabic takes you'd never discover otherwise.
Relevance@10: 89%
Social Platforms: 6
Languages: 20+
p95 Latency: <200ms

The Problem
Decision-makers in finance, marketing, and research need to understand public opinion across Twitter/X, Reddit, Hacker News, Mastodon, Bluesky, and other platforms. Manual monitoring is time-consuming, critical insights in non-English sources go unnoticed, and traditional search lacks temporal context and source attribution. Ask "why did real estate prices surge recently?" against Chinese sources and you get completely different answers than from English ones: geopolitical drivers, local policies, and cultural factors that Western media doesn't cover.
The Solution
Built a unified backend serving two complementary products: an AI chatbot for conversational queries and an analytics dashboard for trend visualization. The production Hybrid RAG system uses two-stage retrieval (dense embedding search followed by cross-encoder re-ranking), a Corrective RAG pattern that improved relevance@10 from 72% to 89% compared with dense-only retrieval. The first version used single-stage retrieval, but accuracy degraded on cross-lingual queries; the two-stage approach with language-specific re-ranking solved this at acceptable latency (~200ms p95).
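The two-stage idea can be sketched in a few lines. This is a minimal, dependency-free illustration and not the production code: `two_stage_search`, `cosine`, and `keyword_overlap` are hypothetical names, the vectors are toy values, and the word-overlap scorer only stands in for a real cross-encoder. In a real system the vectors would presumably come from an embedding model and `rerank_score` would call a cross-encoder.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_search(
    query: str,
    query_vec: Vector,
    docs: list[tuple[str, Vector]],
    rerank_score: Callable[[str, str], float],
    candidates: int = 50,
    top_k: int = 10,
) -> list[str]:
    """Dense shortlist first, then pairwise re-ranking of the shortlist."""
    # Stage 1: cheap vector similarity over every document.
    by_dense = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    shortlist = [text for text, _ in by_dense[:candidates]]
    # Stage 2: the expensive (query, document) scorer runs on the shortlist only.
    reranked = sorted(shortlist, key=lambda t: rerank_score(query, t), reverse=True)
    return reranked[:top_k]


def keyword_overlap(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


docs = [
    ("housing policy in Shanghai", [0.9, 0.1]),
    ("real estate price surge explained", [0.8, 0.3]),
    ("football results", [0.0, 1.0]),
]
print(two_stage_search("real estate price", [1.0, 0.0], docs,
                       keyword_overlap, candidates=2, top_k=1))
# prints ['real estate price surge explained']
```

In the toy run, dense search alone ranks the Shanghai post first; the second stage promotes the better match. The shortlist size (`candidates`) is the main knob for trading latency against recall, since the pairwise scorer cost grows with it.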
Tech Stack
Backend
AI/ML
Frontend
Infrastructure
My Role: Founder & Lead Developer
- Designed and built complete Hybrid RAG architecture with pgvector
- Implemented Corrective RAG pattern: Sentence Transformers + cross-encoder re-ranking
- Built multi-language support for 20+ languages with automatic query translation
- Created real-time WebSocket streaming with MessagePack binary serialization
- Developed conversation memory with context-aware query reformulation
- Built two SvelteKit frontends (Chat + Social dashboard)
- Deployed production infrastructure with Docker Swarm and Traefik
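The conversation-memory bullet above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `ConversationMemory` class, the prompt wording, and the `rewrite` callable (a placeholder for whatever LLM call performs the reformulation) are hypothetical and not taken from the project.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


class ConversationMemory:
    """Keeps only the most recent `max_turns` (user, assistant) exchanges."""

    def __init__(self, max_turns: int = 5) -> None:
        # deque with maxlen silently drops the oldest turn when full.
        self.turns: deque = deque(maxlen=max_turns)

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def build_prompt(self, follow_up: str) -> str:
        """Assemble the text a rewriting model would receive."""
        history = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)
        return (
            "Rewrite the final question as a standalone search query.\n"
            f"{history}\n"
            f"Final question: {follow_up}\n"
            "Standalone query:"
        )


def reformulate(
    memory: ConversationMemory,
    follow_up: str,
    rewrite: Callable[[str], str],
) -> str:
    """Return a standalone query; with no history, the question is used as-is."""
    if not memory.turns:
        return follow_up
    return rewrite(memory.build_prompt(follow_up)).strip()
```

A follow-up such as "and in China?" carries no searchable meaning on its own, so the retrieval step would receive the rewritten, standalone query instead. Bounding the history keeps the rewriting prompt short.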
Key Differentiators
Multi-Platform Aggregation: Unified search across 6 diverse social networks
Hybrid RAG with Corrective Retrieval: Dense embedding search + cross-encoder re-ranking for 89% relevance@10
Cross-Lingual Intelligence: Ask in English, get insights from Chinese, Arabic, Russian sources
Temporal Awareness: LLM-powered understanding of time-based queries
Full Source Attribution: Every AI response includes numbered citations with timestamps
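As an illustration of the source-attribution point above, numbered citations with timestamps could be rendered roughly as follows. The `Source` fields, the output layout, and the example URL are assumptions for this sketch, not the product's actual format.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Source:
    platform: str
    url: str
    posted_at: datetime  # expected to be timezone-aware


def format_citations(sources: list[Source]) -> str:
    """Render sources as '[n] platform, timestamp, url', one per line."""
    lines = []
    for n, src in enumerate(sources, start=1):
        # Normalise to UTC so timestamps from different platforms compare.
        stamp = src.posted_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        lines.append(f"[{n}] {src.platform}, {stamp}, {src.url}")
    return "\n".join(lines)


example = [
    Source("Reddit", "https://example.com/post/1",
           datetime(2024, 1, 2, 3, 4, tzinfo=timezone.utc)),
]
print(format_citations(example))
# prints [1] Reddit, 2024-01-02 03:04 UTC, https://example.com/post/1
```

The answer text can then refer to sources by the same numbers, for example "[1]", so each claim stays traceable to a dated post.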