Live 2023 - Present (Side Project)

Zeitgaist

Cross-Lingual Social Intelligence

Ask in English. Find Chinese, Russian, Arabic takes you'd never discover otherwise.

2023

Post-ChatGPT Launch

Built while mainstream assistants were still knowledge-cutoff-limited

89%

Internal Top-10 Relevance

Dense retrieval plus reranking vs 72% embedding-only baseline

Social Platforms

Twitter/X, Reddit, Hacker News, Mastodon, Bluesky, 4chan

20+

Languages

Cross-lingual search with automatic translation

Visit Zeitgaist

The Problem

Zeitgaist started immediately after ChatGPT's first public launch, when mainstream assistants still answered from training data with a fixed knowledge cutoff rather than from live retrieval. Decision-makers in finance, marketing, and research needed current public-opinion context across Twitter/X, Reddit, Hacker News, Mastodon, Bluesky, and other platforms. Manual monitoring is time-consuming, critical insights in non-English sources are missed, and traditional search lacks temporal context, source attribution, and precise filters. Ask "why did real estate prices surge recently?" against Chinese sources and you get completely different answers than from English ones: geopolitical drivers, local policies, and cultural factors that Western media doesn't cover.

The Solution

Built a unified backend serving two complementary products: an AI chatbot for conversational queries and an analytics dashboard for trend visualization. In product terms, it was a Perplexity-like answer engine with a deeper social-media index and user-controlled time, language, platform, and location filters. The retrieval system uses dense embedding search followed by cross-encoder reranking, which improved internally evaluated top-10 relevance from 72% (embedding-only baseline) to 89%. An initial single-stage design lost accuracy on cross-lingual queries; the two-stage approach improved result quality at acceptable latency (~200ms p95 in the retrieval path).
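As a rough illustration of the two-stage flow described above, the sketch below retrieves candidates by cosine similarity over precomputed vectors, then reorders only that shortlist with a more expensive pairwise scorer. The names (Doc, two_stage_search, word_overlap) and the toy vectors are assumptions for illustration. In the real system the vectors would come from a Sentence Transformers model and the pairwise score from a cross-encoder; neither is reproduced here, and word_overlap is only a stand-in.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    vector: tuple[float, ...]  # precomputed embedding (toy values here)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(
    query_text: str,
    query_vector: Sequence[float],
    docs: Sequence[Doc],
    rerank_score: Callable[[str, str], float],
    candidates: int = 50,
    top_k: int = 10,
) -> list[Doc]:
    # Stage 1: cheap vector similarity over the whole corpus.
    shortlist = sorted(
        docs, key=lambda d: cosine(query_vector, d.vector), reverse=True
    )[:candidates]
    # Stage 2: expensive pairwise scoring, applied to the shortlist only.
    reranked = sorted(
        shortlist, key=lambda d: rerank_score(query_text, d.text), reverse=True
    )
    return reranked[:top_k]


def word_overlap(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query words in text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


docs = [
    Doc("a", "housing prices surge after new local policy", (0.9, 0.1)),
    Doc("b", "real estate prices surge in major cities", (0.8, 0.2)),
    Doc("c", "football results from the weekend", (0.1, 0.9)),
]
results = two_stage_search(
    "real estate prices", (1.0, 0.0), docs, word_overlap, candidates=2, top_k=2
)
print([d.doc_id for d in results])  # ['b', 'a']
```

In this toy run the vector stage ranks document a first, and the reranker promotes document b because it matches the query text more closely. The candidates parameter is the main latency lever: the second stage costs one scorer call per shortlisted document.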

Tech Stack

Backend

Python 3.12, FastAPI, PostgreSQL, pgvector, Redis, Supabase

AI/ML

Sentence Transformers, Cross-Encoder, GPT-4 / Claude, FastText, RAG Pipeline

Frontend

SvelteKit, TypeScript, Tailwind CSS, DaisyUI, Chart.js

Infrastructure

Docker Swarm, Traefik, OpenTelemetry, WebSocket, MessagePack

My Role: Founder & Lead Developer

Designed and built two-stage retrieval architecture with pgvector
Positioned the product as an early real-time RAG answer engine while mainstream assistants were still knowledge-cutoff-limited
Implemented Sentence Transformers retrieval with cross-encoder reranking
Built multi-language support for 20+ languages with automatic query translation
Created real-time WebSocket streaming with MessagePack binary serialization
Developed conversation memory with context-aware query reformulation
Built two SvelteKit frontends (Chat + Social dashboard)
Deployed production infrastructure with Docker Swarm and Traefik
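
The first retrieval stage with pgvector, combined with the time, language, and platform filters described in The Solution, might look roughly like the query builder below. The table and column names (posts, embedding, platform, lang, posted_at) and the function build_candidate_query are hypothetical, not the project's actual schema. The only pgvector-specific piece is the `<=>` operator, which computes cosine distance.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def build_candidate_query(
    query_vector: Sequence[float],
    platforms: Optional[Sequence[str]] = None,
    language: Optional[str] = None,
    since: Optional[datetime] = None,
    limit: int = 50,
) -> tuple[str, list]:
    """Return (sql, params) for a filtered nearest-neighbour search.

    Values are passed as query parameters (psycopg-style placeholders),
    never interpolated into the SQL string.
    """
    # pgvector accepts a '[x,y,z]' text literal cast to the vector type.
    vector_literal = "[" + ",".join(f"{x:.6f}" for x in query_vector) + "]"
    where, params = [], []
    if platforms:
        where.append("platform = ANY(%s)")
        params.append(list(platforms))
    if language:
        where.append("lang = %s")
        params.append(language)
    if since:
        where.append("posted_at >= %s")
        params.append(since)
    sql = "SELECT id, body, platform, lang, posted_at FROM posts"
    if where:
        sql += " WHERE " + " AND ".join(where)
    # Smaller cosine distance means a closer match, so sort ascending.
    sql += " ORDER BY embedding <=> %s::vector LIMIT %s"
    params += [vector_literal, limit]
    return sql, params


sql, params = build_candidate_query(
    [0.1, 0.2],
    platforms=["reddit", "mastodon"],
    language="zh",
    since=datetime(2023, 1, 1, tzinfo=timezone.utc),
)
print(sql)
print(params)
```

The returned pair can be passed to a database cursor's execute call. The rows it yields are the shortlist that a reranker would then reorder.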

Key Differentiators

Early Post-ChatGPT Real-Time RAG: Built live retrieval and source attribution while mainstream assistants were still knowledge-cutoff-limited

Perplexity-Like With Social Depth: Combined answer generation with a social-media index and precise time, language, platform, and location filters

Multi-Platform Aggregation: Unified search across 6 diverse social networks

Dense Retrieval + Cross-Encoder Reranking: Improved internally evaluated top-10 relevance from 72% to 89% over embedding-only search

Cross-Lingual Intelligence: Ask in English, get insights from Chinese, Arabic, Russian sources

Temporal Awareness: LLM-powered understanding of time-based queries

Full Source Attribution: Every AI response includes numbered citations with timestamps
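
To illustrate the source-attribution item above, here is a minimal sketch of how retrieved posts could be turned into a numbered, timestamped source list that the answer text cites as [1], [2], and so on. The Source fields, the format_citations function, and the example URLs are assumptions, not the project's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Source:
    platform: str
    url: str
    posted_at: datetime  # timezone-aware


def format_citations(sources: list[Source]) -> list[str]:
    """Number sources from 1 in the order given, each with a UTC timestamp."""
    lines = []
    for number, src in enumerate(sources, start=1):
        stamp = src.posted_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        lines.append(f"[{number}] {src.platform}, {stamp}, {src.url}")
    return lines


sources = [
    Source(
        "Reddit",
        "https://example.com/post/1",
        datetime(2023, 3, 1, 9, 30, tzinfo=timezone.utc),
    ),
    Source(
        "Mastodon",
        "https://example.com/post/2",
        datetime(2023, 3, 2, 18, 5, tzinfo=timezone.utc),
    ),
]
for line in format_citations(sources):
    print(line)
# [1] Reddit, 2023-03-01 09:30 UTC, https://example.com/post/1
# [2] Mastodon, 2023-03-02 18:05 UTC, https://example.com/post/2
```

Normalizing every timestamp to UTC keeps citations comparable when sources come from platforms and regions in different time zones.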

Want to discuss this experience?

I am open to full-time Senior AI/ML Platform Engineer roles where this kind of production AI, data, and platform work is useful.

View Resume | Contact

Other Projects

Enterprise IoT Platform

Multi-Tenant Workflow Automation Infrastructure

Reduced customer onboarding time by 70% and eliminated 40+ hours/month of manual DevOps, with an architecture designed for 1000+ isolated workflow instances.

Foretale

No-Code Crypto Trading & Real-Time NLP Platform

Platform processed 100K+ daily data points across 7+ sources with ML-powered sentiment analysis, OCR extraction, and visual workflow automation.