ML @ Riverline - Technical Assignment

General Guidelines

Keep your code short and concise.
Feel free to use any tools that help you solve this assignment.
The assignment is designed to be hard and lengthy. Starting early will help.
There are no right answers. Approach matters more than correctness.
The submission form is at the end of this document.
Reach out to us if you have any queries ([email protected])
The deadline (mentioned in the email) is strict, with no extensions, so plan accordingly.

Problem statement

Riverline’s AI agent must decide what to do next when a customer’s issue remains open. Building a robust Next-Best-Action (NBA) system requires 3 core skills: (1) turning messy, multi-source text into a reliable daily feed, (2) learning from user behavior patterns, and (3) choosing the smartest follow-up. The assignment below lets you showcase all 3 using only public text datasets.

Datasets

Customer Support on Twitter (CST) – 3 M+ tweets and brand replies, with user IDs, timestamps and language tags. It is large enough to show multi-turn issues and reply patterns. (Dataset: kaggle.com)
(Bonus) Reddit MBTI – 11 773 authors labelled with Myers-Briggs type and 13 M posts; use this only if you want to enrich customer personas. (Dataset: kaggle.com)

You may reference additional open text sources for sentiment, author profiling or persona cues (e.g., Sentiment140 kaggle.com, PAN Author Profiling pan.webis.de, Synthetic-Persona-Chat github.com, Empathetic Dialogues kaggle.com) if they strengthen your logic.

Core Tasks

1. Data Pipeline

Create an ingestion job that pulls new CST records, normalises them into a single interaction table, and guarantees idempotent re-runs. You decide the storage, scheduling, and deduplication approach; just explain the rationale. Best-practice references on ML data pipelines may help but are not prescriptive.
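
For illustration only, here is one minimal way an idempotent ingestion step could look. It is a sketch under assumptions, not a required design: it assumes SQLite as the store, assumes each record arrives as a plain dict (for example from csv.DictReader) with the column names tweet_id, author_id, inbound, created_at, text and in_response_to_tweet_id, and uses tweet_id as the deduplication key. The function names normalise and ingest and the table name interactions are hypothetical.

```python
import sqlite3

# Assumed schema: one row per tweet, keyed by tweet_id so re-runs cannot duplicate rows.
SCHEMA = """
CREATE TABLE IF NOT EXISTS interactions (
    tweet_id                TEXT PRIMARY KEY,
    author_id               TEXT NOT NULL,
    inbound                 INTEGER NOT NULL,
    created_at              TEXT NOT NULL,
    text                    TEXT NOT NULL,
    in_response_to_tweet_id TEXT
)
"""


def normalise(record):
    """Map one raw record (a dict) onto the interaction-table columns."""
    parent = record.get("in_response_to_tweet_id")
    return (
        str(record["tweet_id"]),
        str(record["author_id"]),
        1 if str(record["inbound"]).strip().lower() == "true" else 0,
        record["created_at"],
        " ".join(record["text"].split()),  # collapse runs of whitespace
        str(parent) if parent not in (None, "") else None,
    )


def ingest(conn, records):
    """Insert records, skipping tweet_ids already stored. Returns rows added."""
    conn.execute(SCHEMA)
    count = "SELECT COUNT(*) FROM interactions"
    before = conn.execute(count).fetchone()[0]
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO interactions VALUES (?, ?, ?, ?, ?, ?)",
            [normalise(r) for r in records],
        )
    return conn.execute(count).fetchone()[0] - before
```

Because the primary key rejects repeats and INSERT OR IGNORE skips them silently, running ingest twice on the same batch adds rows only the first time. A different store, key, or scheduling approach is equally acceptable as long as the rationale is explained.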

2. Observe user behavior

Users behave in their own ways, which leads to different “conversation flows” (i.e. how a user responds over the course of a conversation). In this part, study patterns in user behavior to identify the different conversation flows, and see whether you can tag or cohort customers based on attributes such as “nature_of_support_request”, “customer_sentiment” or “conversation_history”.
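
As a hedged starting point, a conversation flow could be summarised as the ordered sequence of speakers in a thread. The sketch below assumes each row is a dict with tweet_id, inbound (a bool, True for customer messages), created_at (already parsed into a sortable value such as an ISO-8601 string) and in_response_to_tweet_id. The helper names thread_root and flow_signatures are hypothetical, and the "C"/"B" encoding (customer/brand) is just one possible representation.

```python
from collections import Counter


def thread_root(tweet_id, parent_of):
    """Follow reply links upward until reaching a tweet with no known parent."""
    seen = set()
    while parent_of.get(tweet_id) is not None and tweet_id not in seen:
        seen.add(tweet_id)  # guards against accidental cycles in the links
        tweet_id = parent_of[tweet_id]
    return tweet_id


def flow_signatures(rows):
    """Group rows into threads and encode each thread as a string of C/B turns."""
    parent_of = {r["tweet_id"]: r["in_response_to_tweet_id"] for r in rows}
    threads = {}
    for r in rows:
        threads.setdefault(thread_root(r["tweet_id"], parent_of), []).append(r)
    signatures = {}
    for root, turns in threads.items():
        turns.sort(key=lambda r: r["created_at"])
        signatures[root] = "".join("C" if r["inbound"] else "B" for r in turns)
    return signatures


def flow_counts(rows):
    """Count how often each flow signature occurs across all threads."""
    return Counter(flow_signatures(rows).values())
```

Frequent signatures (for example a single unanswered "C" versus a long "CBCBCB" exchange) can then serve as one cohorting attribute alongside request type and sentiment.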

Important: Some support requests will already have been resolved. Tag them, log their count, and then exclude them, because we will evaluate your model on how it acts on open customer-support tickets.
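
One possible way to tag, count, and exclude resolved conversations is sketched below. It relies on a deliberately simple keyword heuristic, which is an assumption for illustration and not a prescribed method: a conversation counts as resolved when a customer message sent after at least one brand reply contains a gratitude or confirmation cue. The cue list and the names RESOLVED_CUES, is_resolved and split_open_resolved are hypothetical, and each turn is assumed to be a dict with inbound (bool) and text, in chronological order.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical cue list; a trained classifier could replace this heuristic.
RESOLVED_CUES = re.compile(
    r"\b(thanks|thank you|resolved|fixed|works now|sorted)\b", re.IGNORECASE
)


def is_resolved(turns):
    """True if the last customer message after a brand reply contains a cue."""
    brand_seen = False
    last_customer_after_brand = None
    for turn in turns:
        if not turn["inbound"]:
            brand_seen = True
        elif brand_seen:
            last_customer_after_brand = turn["text"]
    return (
        last_customer_after_brand is not None
        and RESOLVED_CUES.search(last_customer_after_brand) is not None
    )


def split_open_resolved(conversations):
    """Tag each conversation, log the resolved count, and return both groups."""
    open_convs, resolved_convs = {}, {}
    for conv_id, turns in conversations.items():
        target = resolved_convs if is_resolved(turns) else open_convs
        target[conv_id] = turns
    logger.info("Excluded %d resolved conversations", len(resolved_convs))
    return open_convs, resolved_convs
```

Only the open group would then be passed to the downstream model. Note that a keyword heuristic misreads sarcasm (for example "thanks for nothing"), so its error rate is worth reporting alongside the count.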