Mohsin Mehmood

Mohsin Mehmood.

AI / ML engineer working across core ML, vision-language models, and production agentic systems.

Based in
Wah, Pakistan
Role
Forward Deployed AI Engineer
Connect
Currently at Kodamai · Forward Deployed AI Engineer
§ 01

About.

I build production AI systems: agents, LLM pipelines, and automated workflows that replace manual processes in real business operations. My background spans healthcare, supply chain, enterprise automation, and open-source ML infrastructure.

Currently working as a Forward Deployed AI Engineer at Kodamai, where I act as the primary technical partner for enterprise clients across multiple simultaneous engagements. Previously built a physician-facing clinical AI platform at Exora AI that passed HIPAA and Singapore PDPA audits and was approved for live patient use.

My foundation is in classical and deep ML: XGBoost pipelines at scale, causal inference, computer vision (YOLOv11, segmentation, SAM 2), and fine-tuning foundation models with LoRA and modern quantization (AWQ, GPTQ, GGUF). I work daily with the current generation of vision-language models (Qwen3-VL, GLM-4.6V, InternVL3.5, MiniCPM-V) on document intelligence, medical imaging, ANPR, and long-context multimodal reasoning that now stretches to 256K interleaved tokens.

I contribute to open-source ML projects in my own time. I am ranked #5 on the official Google/Flax contributor leaderboard over the last twelve months, with patches merged into the Flax NNX core. I have six merged PRs in CPython and three in Uber/CausalML.

§ 02

Work.

Nov 2025 – Present Glasgow · Remote

Forward Deployed AI Engineer at Kodamai

Building production agentic systems across multiple enterprise client engagements. Current projects include Madfo3 (accounts payable automation with LangGraph and SAP/ERP integration), Nazir (ANPR with active learning, YOLOv11 vehicle detection plus Qwen3-VL verification on edge), a 20TB document intelligence pipeline on GCP Vertex AI using Qwen3-VL and layout-aware embeddings served through vLLM, and a warehouse logistics automation system with SAP HANA integration.

Jul 2024 – Nov 2025 Singapore · Remote

Lead Data Scientist at Exora AI

Designed and shipped a multi-agent clinical AI assistant serving live physician workflows. Covered the full stack: LangGraph orchestration, RAG pipelines with hybrid vector search, Whisper-based STT, multimodal embeddings (SigLIP-2 + InternVL3) for clinical imagery, PHI-compliant observability across seven microservices, and evaluation frameworks tracking clinical competency across model versions. Reduced LLM latency by 35% and cost by 25%. Passed HIPAA and Singapore PDPA audits.
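Hybrid search of the kind mentioned above is often implemented with reciprocal rank fusion, which merges a keyword ranking and a vector ranking without needing their scores to be comparable. A minimal sketch (function and document names are illustrative, not the Exora implementation):

```python
def rrf_fuse(keyword_ranking, vector_ranking, k=60):
    """Merge two ranked lists of doc IDs via reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document; documents that
    rank well in both lists accumulate the highest fused score.
    """
    scores = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "d2" ranks high in both lists, so it wins the fused ranking.
fused = rrf_fuse(["d1", "d2", "d3"], ["d2", "d4", "d1"])
```

The constant k dampens the influence of any single top-ranked hit; 60 is a conventional default.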

Jan – May 2024 Toronto · Remote

Senior ML Engineer at The Quell App

Sole engineer at a pre-seed health and wellness startup. Built and shipped the full GenAI product in six weeks: evaluated 8+ LLMs, selected and fine-tuned Mistral-7B with LoRA, and built self-hosted inference with quantized serving, safety guardrails, and an A/B evaluation framework. The demo contributed to closing a $250K pre-seed round.
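The LoRA idea behind that fine-tune is compact: freeze the pretrained weight W and train a low-rank pair (A, B), applying W + (alpha/r)·B·A at inference. A toy numpy illustration (dimensions and init chosen for exposition, not taken from the actual training run):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4    # rank r << d; alpha scales the update

W = rng.normal(size=(d_out, d_in))    # frozen pretrained weight
A = rng.normal(size=(r, d_in))        # trainable down-projection
B = np.zeros((d_out, r))              # trainable up-projection, zero-init

# With B zero-initialized the adapter starts as a no-op: W_eff == W.
W_eff = W + (alpha / r) * (B @ A)

# Only the adapter parameters train, instead of the full d_out * d_in.
trainable = A.size + B.size
```

Here the adapter trains 32 parameters against a 64-parameter base; at realistic model sizes the ratio is far more dramatic, which is what makes single-GPU fine-tuning feasible.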

Apr 2023 – Jan 2024 Islamabad, PK

ML Engineer at DiveDeepAI

Built a production price-intelligence pipeline: XGBoost models at ~12% MAPE, scoring ~5,000 products per week at 90%+ accuracy on AWS EC2 with FastAPI, PostgreSQL, Redis, and Celery. Feature engineering across tabular and scraped data; ~99% uptime.
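MAPE, the headline metric above, is just absolute error expressed relative to the true value, averaged over predictions. A quick reference implementation (not the production code):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes no zero targets,
    since the per-item error divides by the true value."""
    errors = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)

# Each forecast here is off by exactly 10%, so MAPE is 10.0.
score = mape([100, 200, 400], [110, 180, 440])
```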

Jan 2022 – Apr 2023 Top-Rated · Upwork

Freelance ML Engineer on Upwork

Delivered LLM fine-tuning (LoRA, QLoRA), NLP pipelines, computer vision systems (YOLOv5, CNN architectures, segmentation), and classical ML / forecasting for international clients across diverse domains.

§ 03

Open source.

Google / Flax
#5 · last 12 months
20 commits
+1,075 additions
−506 deletions
Top-five contributor on the official leaderboard. Merged PRs in Flax NNX core: PR #4891 fixed an nnx.tabulate crash with empty/None values. PR #5008 fixed variable hook display bugs. Both shipped in v0.12.1.
CPython
6 merged PRs
PR #141331 fixed a TextIOWrapper.tell() assertion failure with a standalone carriage return (backported to 3.13 and 3.14). PR #144696 fixed re.Match.group() documentation that incorrectly claimed a [1..99] group-number limit. Also documented asyncio Task cancellation propagation and corrected inaccurate object-comparison docs.
Uber / CausalML
3 merged PRs
PR #850 fixed estimation_sample_size not propagating to individual trees in UpliftRandomForestClassifier. PR #878 fixed ValueError on read-only arrays in BaseSLearner.predict(). PR #879 fixed seed parameter TypeError in BaseDRLearner bootstrap CI.
§ 04

Research.

i.

Reliable agentic systems

How to build agents that know when to act and when to escalate. Most agents are overconfident about their own success rates. This matters more in regulated domains like healthcare and finance, where a wrong autonomous action has real consequences.
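One concrete version of "know when to escalate" is a confidence gate: the agent acts autonomously only above a threshold tuned to the domain's cost of error. A hypothetical sketch (function names and thresholds are illustrative):

```python
def route(action, confidence, act_threshold=0.9, review_threshold=0.5):
    """Decide whether an agent action executes, escalates, or is refused.

    In regulated domains act_threshold sits high, because the cost of a
    wrong autonomous action dominates the cost of a human review.
    """
    if confidence >= act_threshold:
        return ("execute", action)
    if confidence >= review_threshold:
        return ("escalate_to_human", action)
    return ("refuse", action)

# The same action takes three different paths as confidence drops.
high = route("send_invoice", 0.95)
mid = route("send_invoice", 0.70)
low = route("send_invoice", 0.20)
```

The hard part, of course, is that the confidence value itself must be calibrated; an overconfident agent defeats any threshold.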

ii.

Evaluation for AI systems

Offline benchmarks (MMMU, MathVista, MMStar, DocVQA) often do not predict production quality. I am interested in evaluation methods that catch what actually matters: user outcomes, business metrics, and real-world failure modes that benchmarks systematically miss.

iii.

LLM efficiency & quantization

Quantization (AWQ, GPTQ, GGUF, FP8), context compression, distillation, and SLM routing for deployment in environments with strict latency, cost, or data-residency requirements. Currently exploring how quantization affects reasoning quality in non-English languages, Urdu specifically, where the damage is often invisible to standard automatic metrics.
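The shared mechanic behind these formats is mapping floats onto a small integer grid and back, trading precision for memory. A toy symmetric int8 round-trip (illustrative only; real AWQ/GPTQ add calibration data and per-group scales):

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: x -> round(x / scale)."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.5, 0.004]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# The largest weight survives almost exactly, while the tiniest one
# rounds to zero: small values absorb the most relative error, the
# kind of damage aggregate metrics can hide.
```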

iv.

Vision-language models in applied settings

How well current VLMs (Qwen3-VL, GLM-4.6V, InternVL3.5, Llama 4 multimodal, Gemini 3) actually generalize to messy real-world data: medical imagery, scanned documents, surveillance feeds, long-form video. Particular interest in long-context multimodal reasoning at 256K+ tokens and the gap between MMMU/MathVision scores and production reliability.

v.

Multilingual and low-resource LLMs

Building systems that work reliably for speakers of non-Latin script languages, where current models, tokenizers, and evaluation infrastructure are weakest. Particular interest in how training choices (tokenization, data mixture, instruction tuning) propagate into downstream failure modes for users of under-represented languages.

§ 05

Skills.

Stack
Languages
Python · C++ · Bash · TypeScript (working)
Core ML & DL
PyTorch · JAX / Flax (NNX) · scikit-learn · XGBoost · LightGBM · causal inference (uplift modeling, DR / S / T-learners) · time-series forecasting · Bayesian methods
Vision-Language Models
Qwen3-VL · GLM-4.6V · InternVL3 / 3.5 · Llama 4 multimodal · MiniCPM-V · Molmo · Kimi-VL · Gemma 3 · PaliGemma 2 · DeepSeek-VL2 · Tarsier2 · Gemini 3 Vision · GPT-5 vision (long-context, video, layout-aware)
Vision Encoders & CV
SigLIP-2 · DINOv3 · CLIP · EVA-CLIP · YOLOv8 / v11 · SAM 2 · Grounding DINO · Donut · LayoutLMv3 · Nougat · OCR (PaddleOCR, dots.ocr) · ANPR · segmentation
LLMs & Agents
LangChain · LangGraph · PydanticAI · MCP servers · multi-agent orchestration · RAG · hybrid search · RAGAS · prompt engineering · tool use
Fine-Tuning & Efficiency
LoRA · QLoRA · DoRA · SFT · RLHF · DPO · GRPO · knowledge distillation · quantization (AWQ, GPTQ, GGUF, FP8) · HuggingFace Transformers · Axolotl · Unsloth
Inference & Serving
vLLM · SGLang · TensorRT-LLM · llama.cpp · Ollama · Triton · BentoML · KServe
Multimodal Evals
MMMU · MathVista · MathVision · MMStar · DocVQA · ChartQA · OCRBench · Video-MME · custom production evals
Infrastructure
FastAPI · PostgreSQL · pgvector · Redis · Kafka · Docker · Kubernetes · CI/CD · GCP Vertex AI · AWS · MLflow · W&B
Compliance
HIPAA · Singapore PDPA · PHI / PII redaction · audit trail design
§ 06

Education.

Formal
B.Sc. Computer Science
Capital University of Science & Technology, Islamabad
2019 – 2023
§ 07

Get in touch.

Available for production AI engagements and research collaborations.

mohsinmahmood675@gmail.com