AI Product Engineer · Kodamai · open to relocation & remote

I build the unglamorous parts of machine learning systems

Production agents, LLM pipelines, and the backend systems that carry them at scale. I write Python, switch to Julia when the GIL gets in the way, and patch the ML libraries I depend on. Most of my work is the layer where AI systems quietly succeed or fail: retrieval quality, evaluation, failover, deployment.

3+ years in ML
#5 · google/flax · last 12 months
20 commits in Flax core
~10TB doc pipeline
01 — about

What I actually do

I build production AI systems: agents, LLM pipelines, and automated workflows that replace manual processes in real business operations. My background spans healthcare, supply chain, enterprise automation, and open-source ML infrastructure.

Right now I'm an AI Product Engineer at Kodamai, productizing Kelvingrove — an enterprise agent platform that validates LLM outputs against typed interfaces before execution. I joined as a Forward Deployed AI Engineer working directly with clients, and moved to the product team in May 2026. Before that I was the sole data scientist on a physician-facing clinical AI platform at Exora AI, validated with in-house physicians and pilot clinics before live patient use.

I also own the systems that carry these models in production: APIs, data pipelines, queues, caches, observability. Most of my work is as much about the systems around the model as the model itself. And I read other people's source code the way some people read novels — if something is broken I have a hard time leaving it alone, which is how most of my open-source patches happened.

02 — work

Experience

AI Product Engineer · Kodamai
May 2026 — Present
Glasgow, UK · Remote

Product transition team, turning Kelvingrove (typed-interface agent platform) into customer-facing products.

  • Moved a GIL-bound Python ingestion path to multithreaded Julia for a ~15x speedup across multi-plant device fleets.
  • Building Python/Julia adapters that connect Kelvingrove's typed composition primitives to client enterprise systems; defining product APIs and launch documentation.
Julia · Python · Kelvingrove · MCP
Forward Deployed AI Engineer · Kodamai
Nov 2025 — May 2026
Glasgow, UK · Remote

Enterprise AI delivery for supply chain and manufacturing clients.

  • Madfo3 — accounts-payable automation on GCP Vertex AI: VLM-based invoice extraction over Arabic and English documents, 3-way PO/GR/invoice matching, discrepancy classification, approval routing. 94%+ extraction accuracy across hundreds of pilot invoices, on track to cut a 15–20 day manual cycle to under 5 days.
  • Improved retrieval accuracy 18% on Arabic and 7% on English content by tuning hybrid dense + sparse search with reranking and benchmarking chunking strategies.
  • Built a multimodal extraction pipeline over a ~10TB Arabic/English corpus (OCR, Docling, VLM fallback) with per-document-type quality checks.
  • Hudoor — multi-plant biometric time-and-attendance ETL (ZKTeco → rules-based cleansing → automated SAP load) for a major KSA manufacturer.
GCP Vertex AI · RAG · Docling · SAP
Senior Data Scientist · Exora AI
Jul 2024 — Nov 2025
Singapore · Remote

Sole data scientist on LoqumAI, a clinical AI platform for triage, consultation, and SOAP documentation. Owned the AI/ML layer end-to-end plus backend services and deployment.

  • Built the triage service (real-time SSE streaming, patient profiling, risk stratification), grounded via hybrid RAG over Singapore clinical sources; validated with physicians and pilot clinics before live patient use.
  • Cut end-to-end latency ~30% with async orchestration, caching, and routing lightweight queries to smaller local models, reserving reasoning models for complex triage.
  • Designed the multi-provider orchestration layer (hosted APIs primary, local-model fallback) serving 7+ clinical task types with circuit-breaker failover.
  • Ran 50 automated, LLM-driven test conversations daily against the triage service, scored by an async DeepEval service over RabbitMQ on relevance, toxicity, and consistency.
  • Medical RAG pipeline: hybrid dense + BM25 retrieval over Qdrant, multimodal image analysis (GPT-4 Vision with LLaVA fallback, CLIP similarity), Speechmatics STT for clinical voice.
  • Owned reliability: Docker Swarm blue/green zero-downtime releases, GitHub Actions CI/CD with secret scanning and approval gates, OpenTelemetry tracing.
RAG · DeepEval · Qdrant · Speechmatics · Docker Swarm
Machine Learning Engineer · DiveDeepAI
Apr 2023 — Jan 2024
Islamabad, PK

Built the data layer of an e-commerce price-intelligence platform.

  • ETL pipelines ingesting 5,000+ product and pricing records daily from Amazon, eBay, and Costco — site-specific extraction, anti-bot handling, validation — feeding the team's downstream pricing models.
  • Owned the serving layer: FastAPI on self-managed AWS EC2, PostgreSQL, Redis, Celery, with CI/CD introduced from scratch.
FastAPI · Celery · AWS · PostgreSQL
Machine Learning Engineer · Upwork (Freelance)
Jul 2022 — Jul 2024
Remote

Top Rated, 100% Job Success. 20+ delivered projects for clients across the US, UK, and Europe.

  • ComfyUI content-generation pipeline on Azure for Ossa.ai; content fine-tuning for Visme.
  • LLM fine-tuning (LoRA), computer vision (YOLOv5, segmentation), model quantization/compression, and classical ML across diverse domains.
LoRA · YOLOv5 · Quantization · NLP
03 — open source

Merged patches in core ML & language infrastructure

All in my own time, alongside full-time work. Every line below links to the real pull request.

google/flax
#5 · last 12 months · ★ 6.4k
20 commits · +1,075 / −506 merged into Flax NNX core
  • Fixed nnx.tabulate crash on empty / None values (#4891)
  • Fixed variable-hook display bugs in the module system (#5008)
Both shipped in v0.12.1 · required working through JAX functional-transform internals

python/cpython
★ 63k · C / Python
  • Fixed a hard-crash assertion in the C text-I/O core, Modules/_io/textio.c (#141331); backported to Python 3.13 and 3.14
  • Documentation fixes across re, asyncio, io, and the object model (#144696)

uber/causalml
★ 5.1k · Python
  • Fixed estimation_sample_size not propagating to individual trees (#850)
  • Fixed ValueError on read-only arrays in BaseSLearner.predict() (#878)
  • Fixed seed TypeError in BaseDRLearner bootstrap CI (#879)

chaoss/augur
Linux Foundation
  • Fixed hardcoded collection intervals (#3346); credited in the official v0.91.0 release notes

↗ full activity: github.com/mohsinm-dev

04 — projects

Things I built on my own time

PyTorch · research

KDA Attention

A from-scratch PyTorch implementation of Kimi Delta Attention — a gated linear recurrence with fast-weight memory for efficient long-context modeling. Multi-head layer with trainable decay/learning-rate gates, separate training (scan) and generation (recurrent) modes, and a test suite verifying mathematical correctness against a naive reference.

github.com/mohsinm-dev/kda-attention →
MCP · LLM tooling

Racing MCP Server

An open-source Model Context Protocol server connecting Claude and other LLMs to live horse-racing data: 35+ tools spanning entity resolution, racecards, results, and statistical and breeding analysis, with TTL caching, rate limiting, and both stdio and HTTP/SSE transports.

github.com/mohsinm-dev/racing-mcp-server →
05 — research interests

What I'm thinking about

/ 01

Reliable agentic systems

Most agents are overconfident about their own success rates. In the Exora clinical work this was the real failure mode: on ambiguous input the system would act autonomously when it should have escalated. Benchmarks did not catch it — only a physician-in-the-loop eval did. I'm interested in explicit confidence gating, escalation policies a reviewer can actually read, and evals built from real failure modes rather than academic test sets. The harder the domain, the more this matters.
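To make "escalation policies a reviewer can actually read" concrete, here is a minimal sketch of a confidence gate. It is an illustration only, not the Exora system: the thresholds, the `ambiguity` signal, and the `EscalationPolicy` and `Decision` names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACT = "act"            # proceed autonomously
    ESCALATE = "escalate"  # hand off to a human reviewer


@dataclass(frozen=True)
class EscalationPolicy:
    # Hypothetical thresholds; real values would have to come from
    # evals built on observed failure modes, not from this sketch.
    min_confidence: float = 0.85
    max_ambiguity: float = 0.30

    def decide(self, confidence: float, ambiguity: float) -> tuple[Decision, str]:
        """Return the decision plus a plain-language reason a reviewer can audit."""
        # Ambiguity is checked first: an ambiguous input escalates
        # even when the model reports high confidence.
        if ambiguity > self.max_ambiguity:
            return Decision.ESCALATE, f"input ambiguity {ambiguity:.2f} > {self.max_ambiguity:.2f}"
        if confidence < self.min_confidence:
            return Decision.ESCALATE, f"confidence {confidence:.2f} < {self.min_confidence:.2f}"
        return Decision.ACT, "confidence and ambiguity within policy"


policy = EscalationPolicy()
print(policy.decide(confidence=0.95, ambiguity=0.10))
print(policy.decide(confidence=0.95, ambiguity=0.60))
```

The point of the design is that the whole policy fits in two named thresholds and every escalation carries its reason, so a reviewer can disagree with a specific rule rather than with the model as a whole.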

/ 02

LLM efficiency & quantization

The gap between a model that works in a demo and one that's affordable under real load is mostly engineering: caching, routing between large and small models, and knowing which requests actually need the expensive path.
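A rough sketch of that routing idea, under stated assumptions: `small_model` and `large_model` are stand-in functions rather than real model clients, and the keyword and length heuristic in `needs_expensive_path` is hypothetical. A production router would more likely use a classifier or observed failure rates.

```python
from functools import lru_cache


def small_model(prompt: str) -> str:
    # Stand-in for a cheap local model call (hypothetical).
    return f"[small] {prompt}"


def large_model(prompt: str) -> str:
    # Stand-in for an expensive reasoning model call (hypothetical).
    return f"[large] {prompt}"


def needs_expensive_path(prompt: str) -> bool:
    """Hypothetical heuristic: long prompts or reasoning keywords go to the large model."""
    keywords = ("why", "compare", "diagnose")
    lowered = prompt.lower()
    return len(prompt.split()) > 40 or any(k in lowered for k in keywords)


@lru_cache(maxsize=1024)
def answer(prompt: str) -> str:
    # Identical prompts are served from the cache and never reach a model.
    model = large_model if needs_expensive_path(prompt) else small_model
    return model(prompt)


print(answer("What are your opening hours?"))
print(answer("Compare these two treatment plans"))
```

Exact-match caching like this only helps with repeated prompts; it is here to show where the cache sits relative to the router, not to suggest a caching strategy.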

/ 03

Continual & reinforcement learning

Active reading thread, not yet production work — how systems keep learning from their own deployment without forgetting what already works.

06 — skills

Stack

LLMs & Agents
LangGraph · LangChain · PydanticAI · MCP servers · RAG · multi-provider routing · failover
Evaluation
DeepEval · LLM-as-judge · automated conversation testing · regression suites
Core ML & Fine-tuning
PyTorch · JAX / Flax (NNX) · XGBoost · scikit-learn · HF Transformers · LoRA · quantization
Backend & Infra
FastAPI · async workers · RabbitMQ · Redis · PostgreSQL · Qdrant · Docker Swarm · GCP Vertex AI · AWS · OpenTelemetry
Languages
Python · Julia · SQL · Bash · C++ (DS & algorithms) · C (CPython core)
07 — education

Education

B.Sc. Computer Science
2019 — 2023
Capital University of Science & Technology · Islamabad
  • Founded the first Google Developer Student Club (GDSC) at CUST.
  • Competed in the ICPC Regional Preliminary round.
  • Volunteered in an inclusive learning program for children with special needs.
→ contact

Working on something that has to stay reliable under load?

I take on a small number of outside projects at a time. Usually reply same-day.