About Riverline
Riverline is building the bank of 2035: one that uses AI to deliver personalized financial services to millions of credit-starved Indians.
Today at Riverline, our AI agents are deployed across call, WhatsApp and email. They speak with thousands of borrowers every day who are unable to repay, working to understand each borrower's situation and help them repay. These agents are always there to listen: they never get tired, and they follow up persistently.
We’re a team from IIT Madras, CMU and Navi, backed by South Park Commons, GradCapital and DeVC, using AI to supercharge credit wellness in India.
What we’re offering you
| | |
| --- | --- |
| CTC | ₹40L – ₹90L (base + ESOPs); varies based on experience |
| Location | HSR Layout, Bengaluru |
| Nature of work | In-person, 6 days a week (Mon–Sat) |
| Joining | Immediate |
| Benefits | Sponsored healthy lunch and dinner, gym subscription, learning budget, no limits on the tech tools you like |

What you'll own
- Build and maintain the evaluation infrastructure that measures prompt and model quality numerically — tracking feedback loops (goal → monitor → response) and shipping improvements directly to prod
- Set up and own end-to-end tracing across our voice, WhatsApp and email agents — so every LLM output is explainable, auditable, and feeds a dataset for future agent improvement
- Build a changelog system that auto-logs every change to STT, TTS, LLM, voice-prompt and PTP-prompt — with date, change type, performance snapshot, author and remarks — so the team can iterate fast, roll back instantly and measure net impact
- Set up a test sandbox for prompt experiments and model comparisons; instrument red-teaming via promptfoo and explore lightweight SLMs via nanochat
- Define and solve ML problems within agent workflows — improving prompt design and tool-calling through rigorous numerical experiments
- Build toward self-evolving systems that automate the discovery and resolution of ML problems in production
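To make the changelog responsibility above concrete, here is a minimal sketch of the kind of record it describes. All names (`Component`, `ChangelogEntry`, the example metrics) are illustrative assumptions, not an existing Riverline API:

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum

class Component(str, Enum):
    # The components the changelog bullet says must be tracked
    STT = "stt"
    TTS = "tts"
    LLM = "llm"
    VOICE_PROMPT = "voice_prompt"
    PTP_PROMPT = "ptp_prompt"

@dataclass
class ChangelogEntry:
    # One auto-logged change: date, change type, author, remarks,
    # plus a performance snapshot so net impact can be measured later
    date: date
    component: Component
    change_type: str  # e.g. "prompt_edit", "model_swap", "rollback"
    author: str
    remarks: str
    performance: dict[str, float] = field(default_factory=dict)

# Hypothetical example entry with made-up metric names and values
entry = ChangelogEntry(
    date=date(2025, 1, 15),
    component=Component.VOICE_PROMPT,
    change_type="prompt_edit",
    author="asha",
    remarks="Tightened PTP confirmation wording",
    performance={"ptp_rate": 0.31, "containment": 0.84},
)
```

A record like this, written on every deploy, is what makes instant rollback and before/after comparison cheap.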
What we're looking for
- 2–3 years of hands-on experience in classical/statistical ML and running evals in production AI applications
- Familiarity with agentic frameworks, LLMs and speech models (STT/TTS/VAD)
- Experience setting up observability and tracing for LLM systems — you should know what a bad LLM output looks like and be able to instrument the system to catch it
- Comfort with prompt engineering as a rigorous discipline: A/B testing prompts, measuring regressions, shipping and rolling back with confidence
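The A/B-testing expectation in the last bullet can be sketched as a paired comparison of per-example eval scores. This is a simplified illustration with made-up scores and a made-up ship rule; a real setup would run the grading through a harness such as promptfoo:

```python
from statistics import mean

def compare_prompts(scores_a: list[float], scores_b: list[float]) -> dict:
    """Paired comparison of two prompt variants on the same eval set.

    scores_a / scores_b: per-example quality scores (e.g. 0-1 grader
    output) for the current prompt (A) and the candidate (B), aligned
    so index i refers to the same eval example in both lists.
    """
    assert len(scores_a) == len(scores_b), "eval sets must be paired"
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    wins = sum(d > 0 for d in diffs)     # examples where B beat A
    losses = sum(d < 0 for d in diffs)   # examples where B regressed
    return {
        "mean_a": mean(scores_a),
        "mean_b": mean(scores_b),
        "mean_delta": mean(diffs),
        "wins": wins,
        "losses": losses,
        # Toy ship rule: candidate must improve on average and must not
        # regress on more examples than it improves
        "ship": mean(diffs) > 0 and wins >= losses,
    }

# Hypothetical per-example grader scores on a 10-example eval set
current = [0.7, 0.8, 0.6, 0.9, 0.5, 0.7, 0.8, 0.6, 0.7, 0.9]
candidate = [0.8, 0.8, 0.7, 0.9, 0.6, 0.7, 0.9, 0.6, 0.8, 0.9]
result = compare_prompts(current, candidate)
```

The point is the discipline, not the arithmetic: every prompt change gets scored on the same paired eval set before it ships, so regressions are caught numerically rather than anecdotally.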