Lead Data Scientist – Synthetic Systems
Populix
- West Jakarta
- Permanent
- Full-time
- Lead the design and implementation of simulated behavioral responses and demographic patterns using generative models, statistical modeling, and controlled simulations.
- Collaborate with the research and marketing teams to create simulation-driven whitepapers and internal studies, helping communicate the value of synthetic insight across use cases like campaign testing, segmentation, and hypothetical trends.
- Drive automation of research workflows that involve open-ended responses and audio data, including pipelines for transcription, classification, summarization, and sentiment analysis.
- Work with the Head of Data Science to translate high-level product and research strategy into technical roadmaps, experiment plans, and model architecture decisions.
- Help scale our AI insight engine by contributing to Retrieval-Augmented Generation (RAG) workflows and collaborating with LLM engineers on modular pipelines for context-rich output generation.
- Collaborate closely with engineers, designers, and product teams to ship robust ML-powered tools into production across the Populix platform.
- Provide mentorship to other data scientists, sharing knowledge, reviewing modeling work, and helping maintain a culture of experimentation, reproducibility, and ethical AI.
- Master’s degree required, preferably in Computer Science, Statistics, Data Science, or a related quantitative field; PhD is a strong plus.
- 5+ years of experience in data science or applied machine learning, including at least 1 year in a technical leadership role.
- Deep experience in generative modeling (e.g., GANs, VAEs), simulation, or behavioral data modeling, with a strong grounding in statistics and hypothesis testing.
- Hands-on experience with Retrieval-Augmented Generation (RAG) architectures and knowledge integration with LLMs.
- Solid programming skills in Python and experience with tools like LangGraph, LangSmith, scikit-learn, PyTorch, Hugging Face, or equivalent frameworks.
- Familiarity with both structured (e.g., survey data) and unstructured (e.g., audio, text) data workflows, including preprocessing, feature extraction, and integration into insight pipelines.
- Experience turning ideas into working code and effective AI-driven solutions to real-world problems.
- Strong communication skills and the ability to translate complex modeling approaches into product or research value.
- Prior experience in market research, behavioral analytics, or social data modeling.
- Exposure to speech processing, voice-to-text systems, and sentiment detection from audio or conversational data.
- Knowledge of synthetic data generation ethics, validation strategies, and mixed-method evaluation.
- Experience working with cloud-based analytics environments and orchestration tools (e.g., BigQuery, Airflow, Kubeflow, MLflow).
- Experience working as an individual contributor.