About Riverline
Riverline is building the bank of 2035: one that uses AI to deliver personalized financial services to millions of credit-starved Indians.
Today at Riverline, our AI agents are deployed across call, WhatsApp and email. They speak with thousands of borrowers every day who are unable to repay, working to understand each borrower's situation and help them repay. These agents are always there to listen: they never get tired, and they follow up persistently.
We’re a team from IIT Madras, CMU and Navi, backed by South Park Commons, GradCapital and DeVC, using AI to supercharge credit wellness in India.
What we’re offering you
| | |
| --- | --- |
| CTC | ₹40L – ₹90L (base + ESOPs); varies based on experience |
| Location | HSR Layout, Bengaluru |
| Nature of work | In-person, 6 days a week (Mon–Sat) |
| Joining | Immediate |
| Benefits | Sponsored healthy lunch and dinner, gym subscription, learning budget, no limits on the tech tools you like |

What you'll own
- Build and maintain the evaluation infrastructure that measures prompt and model quality numerically — tracking feedback loops (goal → monitor → response) and shipping improvements directly to prod
- Set up and own end-to-end tracing across our voice, WhatsApp and email agents — so every LLM output is explainable, auditable, and feeds a dataset for future agent improvement
- Build a changelog system that auto-logs every change to STT, TTS, LLM, voice-prompt and PTP-prompt — with date, change type, performance snapshot, author and remarks — so the team can iterate fast, roll back instantly and measure net impact
- Set up a test sandbox for prompt experiments and model comparisons; instrument red-teaming via promptfoo and explore lightweight SLMs via nanochat
- Define and solve ML problems within agent workflows — improving prompt design and tool-calling through rigorous numerical experiments
- Build toward self-evolving systems that automate the discovery and resolution of ML problems in production
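To make the changelog responsibility above concrete, here is a minimal sketch of the kind of record it describes. All names (`Component`, `ChangelogEntry`, the example metrics) are illustrative assumptions, not an existing Riverline API:

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum

class Component(str, Enum):
    # The components the changelog bullet says must be tracked
    STT = "stt"
    TTS = "tts"
    LLM = "llm"
    VOICE_PROMPT = "voice_prompt"
    PTP_PROMPT = "ptp_prompt"

@dataclass
class ChangelogEntry:
    # One auto-logged change: date, change type, author, remarks,
    # plus a performance snapshot so net impact can be measured later
    date: date
    component: Component
    change_type: str  # e.g. "prompt_edit", "model_swap", "rollback"
    author: str
    remarks: str
    performance: dict[str, float] = field(default_factory=dict)

# Hypothetical example entry with made-up metric names and values
entry = ChangelogEntry(
    date=date(2025, 1, 15),
    component=Component.VOICE_PROMPT,
    change_type="prompt_edit",
    author="asha",
    remarks="Tightened PTP confirmation wording",
    performance={"ptp_rate": 0.31, "containment": 0.84},
)
```

A record like this, written on every deploy, is what makes instant rollback and before/after comparison cheap.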
What we're looking for
- 2–3 years of hands-on experience in classical/statistical ML and running evals in production AI applications
- Familiarity with agentic frameworks, LLMs and speech models (STT/TTS/VAD)
- Experience setting up observability and tracing for LLM systems — you should know what a bad LLM output looks like and be able to instrument the system to catch it
- Comfort with prompt engineering as a rigorous discipline: A/B testing prompts, measuring regressions, shipping and rolling back with confidence
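The A/B-testing expectation in the last bullet can be sketched as a paired comparison of per-example eval scores. This is a simplified illustration with made-up scores and a made-up ship rule; a real setup would run the grading through a harness such as promptfoo:

```python
from statistics import mean

def compare_prompts(scores_a: list[float], scores_b: list[float]) -> dict:
    """Paired comparison of two prompt variants on the same eval set.

    scores_a / scores_b: per-example quality scores (e.g. 0-1 grader
    output) for the current prompt (A) and the candidate (B), aligned
    so index i refers to the same eval example in both lists.
    """
    assert len(scores_a) == len(scores_b), "eval sets must be paired"
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    wins = sum(d > 0 for d in diffs)     # examples where B beat A
    losses = sum(d < 0 for d in diffs)   # examples where B regressed
    return {
        "mean_a": mean(scores_a),
        "mean_b": mean(scores_b),
        "mean_delta": mean(diffs),
        "wins": wins,
        "losses": losses,
        # Toy ship rule: candidate must improve on average and must not
        # regress on more examples than it improves
        "ship": mean(diffs) > 0 and wins >= losses,
    }

# Hypothetical per-example grader scores on a 10-example eval set
current = [0.7, 0.8, 0.6, 0.9, 0.5, 0.7, 0.8, 0.6, 0.7, 0.9]
candidate = [0.8, 0.8, 0.7, 0.9, 0.6, 0.7, 0.9, 0.6, 0.8, 0.9]
result = compare_prompts(current, candidate)
```

The point is the discipline, not the arithmetic: every prompt change gets scored on the same paired eval set before it ships, so regressions are caught numerically rather than anecdotally.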