Available for new opportunities

Data Scientist & Engineer

Specializing in high-performance data pipelines, LLMs, and scalable ML solutions. "Just gets stuff done."

About Me

I am an accomplished Data Scientist and Engineer with hands-on experience at top-tier companies like Google (via acquisition) and OpenAI (via Turing). I specialize in architecting scalable, end-to-end machine learning solutions and optimizing critical ETL pipelines.

I am currently completing a Master’s in Data Science at SUNY Buffalo and hold a previous Master’s from IIT Madras. I am a V-shaped all-rounder passionate about LLMs, MLOps, and Heavy Engineering.

Professional Experience

Python Developer (RLHF)

Dec 2022 – May 2024

Turing Inc. (Client: OpenAI)

  • Authored and curated a high-quality dataset of 800+ novel programming conversations, directly improving model reasoning.
  • Maintained a top-tier quality score of 6.7/7 (over 95%) for training data.
  • Contributed training data to GPT-4o, which delivered a 5× performance and speed improvement over GPT-4.

Lead Data Scientist

Dec 2022 – Oct 2023

Argonaut AI (Acquired by Google)

  • TeamCare (client): Optimized a critical ETL pipeline, achieving a 24× performance increase (8 mins to 20 secs).
  • Engineered backend services to generate vector embeddings for LLM consumption.
  • Aspirion (client): Architected a self-service "AI Studio" enabling non-technical teams to test LLM chatbots.

Senior Software Engineer

Aug 2019 – Nov 2022

HCLTech Ltd.

Awarded "O’Infinity Achiever" trophy.

  • Rubrik Inc. (client): Slashed regression testing cycles by 67% (3 weeks to 1 week) via full-scale automation.
  • Pure Storage (client): Engineered a scalable, S3-compatible distributed object storage system using Node.js and Docker Swarm.

Selected Projects

ML-Driven Protein Function Prediction

Python, GCN

Developed a pipeline using a custom graph-aware k-NN model that achieves a 3× higher F1-score than baseline GCNs and reduces experimental validation workload by over 90%.

Automated Content Summary Notifier

Gemini API, Docker

Engineered a Python app that monitors Substack blogs, generates AI-powered summaries using the Gemini API, and delivers them via Postmark. Containerized with Docker.

MUBI Movie Data Analysis

SQL, Python

End-to-end analysis of the MUBI dataset. Built a normalized SQLite database from raw CSVs to uncover subscriber behavior insights.

Traffic Signal Control w/ RL

RL, SUMO

Master's thesis. Built an adaptive traffic signal control system using Reinforcement Learning, achieving a 6.5-hour reduction in total vehicle waiting time across 8,000 vehicles.

Technical Skills

Languages

Python, C/C++, SQL, JavaScript, Golang

AI / LLM

LangChain, LlamaIndex, Vertex AI, Gemini

DevOps

AWS, GCP, Docker, Ansible, Jenkins

Frameworks

FastAPI, Django, Node.js

Education

State University of New York (SUNY Buffalo)

Master of Science in Engineering Sciences (Data Science)

Expected May 2026 Buffalo, NY

Indian Institute of Technology Madras (IITM)

Master of Technology in Computer Science and Engineering

2017 – 2019 Chennai, India

SGGSIE&T

Bachelor of Technology in Computer Science and Engineering

2013 – 2017 Nanded, India

Ready to Collaborate?

I am always eager to learn and grow, and I am looking for new challenges in Data Science and Heavy Engineering.

© 2025 Akash Ankush Kamble. All rights reserved.