RiverScript: how it was built

This is a presentation of a solo SaaS project developed from scratch over 6 months. RiverScript is a workspace for speech-to-text conversion powered by AI. It is designed for people who work with calls, YouTube videos, webinars, podcasts, and long-form content every day.

About this project

An AI SaaS. Built solo in 6 months.

Built independently: no tasks, no meetings. Just deep focus and day-to-day work.

Solo SaaS Project
6 Months Development
Launch: June 2026

Superpowers

What makes RiverScript special.

Heavy files without slowdowns

A desktop client built with Tauri + Rust handles files up to 50 GB. Files are never uploaded to the cloud in full, which saves both upload time and bandwidth.

Desktop Audio Capture

You can record and transcribe audio directly from any application or browser tab. Capture uses WASAPI on Windows and ScreenCaptureKit on macOS. YouTube, meetings, streams: everything is captured without extra steps.

Fully on-device Voice Activity Detection

Silero VAD + ONNX Runtime runs locally. Silence is removed before anything is sent to the server. This significantly reduces costs, bandwidth usage, AI hallucinations and transcription time.
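To illustrate the silence-removal step, here is a minimal sketch of how per-frame speech probabilities could be turned into the segments that are kept. It does not call Silero VAD itself: the probabilities are assumed to come from the model, and the names `speech_segments` and `kept_fraction`, the 32 ms frame length, the 0.5 threshold, and the 0.3 s minimum silence are all illustrative assumptions, not RiverScript's actual values.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds


def speech_segments(probs, frame_s=0.032, threshold=0.5, min_silence_s=0.3):
    """Turn per-frame speech probabilities into speech segments.

    Frames at or above `threshold` count as speech. A silent gap shorter
    than `min_silence_s` is bridged, so brief pauses do not split a sentence.
    All defaults here are assumptions chosen for illustration.
    """
    # Work in whole frames to avoid floating-point comparisons.
    min_silence_frames = max(1, round(min_silence_s / frame_s))
    segments = []
    start = None        # index of the first frame of the open segment
    last_speech = None  # index of the most recent speech frame
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= min_silence_frames:
            # Enough silence has passed: close the segment at the last speech frame.
            segments.append(Segment(start * frame_s, (last_speech + 1) * frame_s))
            start = None
    if start is not None:
        # The recording ended while a segment was still open.
        segments.append(Segment(start * frame_s, (last_speech + 1) * frame_s))
    return segments


def kept_fraction(segments, total_s):
    """Fraction of the recording that would still be sent after trimming."""
    return sum(s.end_s - s.start_s for s in segments) / total_s
```

Only the audio inside the returned segments would need to leave the device, which is where the savings in bandwidth and transcription time would come from.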

Supports basically everything

Works with any audio or video format supported by FFmpeg (MP3, MP4, MKV, MOV, FLAC, OGG, WEBM, and 100+ more). If FFmpeg can read it - RiverScript can transcribe it.
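One common way to get this kind of format coverage is to let FFmpeg decode whatever comes in and normalize it to a single audio format before transcription. The sketch below only builds the argument list for such a call. The helper name `ffmpeg_to_wav_args` and the 16 kHz mono 16-bit PCM target are assumptions for illustration, not necessarily what RiverScript uses.

```python
def ffmpeg_to_wav_args(src, dst, sample_rate=16000):
    """Build an FFmpeg command that extracts audio as mono 16-bit PCM WAV.

    The target format (mono, 16 kHz, pcm_s16le) is an assumed choice
    that is typical for speech models.
    """
    return [
        "ffmpeg",
        "-nostdin",               # never wait for interactive input
        "-y",                     # overwrite the output file if it exists
        "-i", src,                # input: any container/codec FFmpeg can read
        "-vn",                    # drop video streams, keep audio only
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # encode as 16-bit little-endian PCM
        dst,
    ]
```

The returned list can be passed to `subprocess.run(args, check=True)` on a machine where FFmpeg is installed.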

Tech stack: how this was built

Fully cross-platform application

One frontend, three platforms. Where possible, the logic is shared. Where necessary, native capabilities are used.

Web Application

Next.js
React
TypeScript
Tailwind CSS
Zustand
NextAuth

Desktop client · Windows

Tauri
Rust
React
FFmpeg
WASAPI
Silero VAD
ONNX Runtime

Desktop client · macOS

Tauri
Rust
React
FFmpeg
ScreenCaptureKit
Silero VAD
ONNX Runtime

Infrastructure

Self-hosted. Multi-server Architecture.

I didn’t want to depend on third-party cloud services like Vercel and Supabase, so I built my own self-hosted and reliable infrastructure using four Hetzner servers.

1. Web Server

Coolify
Next.js
Redis
Umami
Barman

2. Rust Worker

Audio & video processing
FFmpeg
VAD pipeline
Redis queue

3. DB Primary

PostgreSQL 16
pgBouncer
PostgREST

4. DB Replica

PostgreSQL 16
Streaming replication
Auto-failover
Floating IP

Tools & services

Coolify
Docker
GitHub
Redis
PostgreSQL
pgBouncer
PostgREST
Barman

AI Integrations

Multi-provider AI. Automatic fallback.

If one provider goes down, the system instantly switches to another.

Speech to Text

Self-hosted Whisper V3
Deepgram Nova-3
ElevenLabs Scribe v2

Text Processing · LLM

W&B Qwen3
OpenAI GPT-4o
DeepInfra Qwen3

Audio Processing

Silero VAD
ONNX Runtime
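The automatic fallback between providers can be pictured as an ordered chain that is tried until one call succeeds. This is a minimal sketch under that assumption. The function `transcribe_with_fallback`, the exception `AllProvidersFailed`, and the provider callables are hypothetical stand-ins, not the real provider SDKs or RiverScript's actual code.

```python
from typing import Callable, Sequence, Tuple

# A provider is a (name, callable) pair; the callable takes audio bytes
# and returns the transcript text. Both are hypothetical stand-ins.
Provider = Tuple[str, Callable[[bytes], str]]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def transcribe_with_fallback(audio: bytes, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, transcript).

    Any exception from a provider is recorded and the next one is tried.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(audio)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

A production version would likely add timeouts and health checks so that a slow provider is skipped quickly. Those details are left out here.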

Monitoring & reliability

Full visibility into everything that happens.

I can see everything happening in the system - in real time.

Grafana Cloud
Sentry
PostHog
Umami
UptimeRobot

Third-party services

Integrations that make it run.

Polar
Resend
Cloudflare
Hetzner

The pre-launch demo is live.

The landing page is locked until June 2026 - but the app is already fully deployed, functional, and available on the free tier right now. You can start using it right away.