Enhanced RAG
LLMs, NLP

Advanced Retrieval-Augmented Generation system with multi-stage retrieval, re-ranking, and query decomposition for enterprise conversational AI.

Challenge

Enterprise conversational AI systems struggle with hallucination, outdated knowledge, and inability to reason over large proprietary corpora in real time.

Approach

  • Multi-stage RAG-Fusion pipeline with parallel query generation and reciprocal rank fusion (RRF) for robust document retrieval (a minimal retrieval-and-fusion sketch follows this list)
  • Dense retrieval with FAISS indexing over million-scale document collections
  • Contrastive learning with Momentum Contrast (MoCo) for domain-adaptive embedding fine-tuning
  • Domain-Adaptive Pretraining (DAPT) and in-context learning for specialized verticals
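
To make the retrieval core concrete, here is a minimal sketch of dense FAISS search over several generated query variants, fused with RRF. It is a toy under stated assumptions: random vectors stand in for the fine-tuned encoder, and the index type and top-k values are illustrative, not the production configuration.

```python
from collections import defaultdict

import faiss
import numpy as np

d, n_docs = 384, 10_000
doc_vecs = np.random.rand(n_docs, d).astype("float32")
faiss.normalize_L2(doc_vecs)            # cosine similarity via inner product
index = faiss.IndexFlatIP(d)            # exact search; IVF/HNSW at real scale
index.add(doc_vecs)

def rrf(rankings, k=60):
    """Reciprocal rank fusion: documents ranked highly by many query
    variants float to the top. k=60 is the standard smoothing constant."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Pretend the LLM rewrote one user question into three query variants.
query_vecs = np.random.rand(3, d).astype("float32")
faiss.normalize_L2(query_vecs)
_, ids = index.search(query_vecs, 20)   # top-20 candidates per variant
fused = rrf([list(row) for row in ids])
print(fused[:5])                        # candidates handed to the re-ranker
```

In a full pipeline, the fused list would feed the re-ranking stage before generation.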

Results

  • Significant improvement in factual grounding and answer relevance across enterprise benchmarks
  • Sub-second retrieval latency at million-document scale
  • Reduced hallucination rate through multi-hop evidence aggregation

Impact

Core architecture powering the Amazon Q Chat Engine, serving millions of enterprise users with accurate, grounded conversational AI.

LLMs, RAG-Fusion, FAISS, MoCo, DAPT, RRF, In-Context Learning

Agentic Function Calling & Tool Orchestration
LLMs

Multi-step agentic reasoning framework with structured function calling, tree-of-thought planning, and constrained decoding for enterprise tool orchestration.

Challenge

Enterprise LLMs must decompose complex user requests into sequences of API calls, database queries, and tool invocations — maintaining coherence across multi-hop reasoning chains while handling errors gracefully and respecting authorization boundaries.

Approach

  • Structured function-calling framework with JSON schema validation and type-safe tool definitions supporting parallel and sequential execution (a schema-validation sketch follows this list)
  • ReAct-style iterative reasoning with tree-of-thought planning for multi-step task decomposition and dependency graph construction
  • Grammar-constrained decoding (GCD) for guaranteed schema adherence and valid structured outputs
  • Fine-tuned function-calling models using synthetic trajectory generation and DPO on execution feedback signals — learning from whether tool calls succeed, not just human preference
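
To illustrate the validation layer, here is a minimal sketch of schema-checked tool dispatch, assuming the `jsonschema` package; the `search_orders` tool, its fields, and the error format are hypothetical stand-ins, not the production registry.

```python
import json

from jsonschema import ValidationError, validate

# Hypothetical tool registry: one JSON schema per callable tool.
TOOLS = {
    "search_orders": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "limit": {"type": "integer", "minimum": 1, "maximum": 100},
        },
        "required": ["customer_id"],
        "additionalProperties": False,
    }
}

def dispatch(raw_call: str):
    """Validate a model-emitted tool call before executing it.
    Invalid calls come back as structured errors so the agent loop can
    re-plan instead of crashing mid-chain."""
    call = json.loads(raw_call)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    try:
        validate(instance=args, schema=TOOLS[name])
    except ValidationError as e:
        return {"error": f"bad arguments: {e.message}"}
    return {"ok": True, "tool": name, "args": args}

print(dispatch('{"name": "search_orders", "arguments": {"customer_id": "C42"}}'))
print(dispatch('{"name": "search_orders", "arguments": {"limit": 0}}'))
```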

Results

  • Significant improvement in multi-step task completion rate over chain-of-thought baselines
  • Near-perfect schema adherence via constrained generation, eliminating malformed tool calls
  • Robust error recovery and autonomous re-planning when tool calls fail mid-chain

Impact

Core orchestration layer for enterprise agentic AI, enabling complex multi-tool workflows with reliable structured outputs across diverse enterprise knowledge sources and APIs.

ReAct, Tree-of-Thought, Constrained Decoding, DPO, Function Calling, JSON Schema, Trajectory Synthesis

LLM Post-Training Alignment & Safety
LLMs, Generative AI

End-to-end alignment pipeline combining DPO/KTO preference optimization, constitutional AI guardrails, automated red-teaming, and PII-aware decoding for enterprise LLM deployment.

Challenge

Deploying LLMs in enterprise environments demands rigorous alignment — balancing helpfulness with safety, preventing PII leakage, mitigating hallucination, and ensuring compliance with enterprise policies — where a single violation can have serious consequences.

Approach

  • Multi-stage post-training pipeline: supervised fine-tuning → DPO/KTO preference optimization → constitutional AI filtering, eliminating the reward model bottleneck of classical RLHF (the DPO objective is sketched after this list)
  • Automated red-teaming infrastructure with adversarial prompt generation, gradient-based attack simulation, and jailbreak taxonomy coverage analysis
  • PII-aware decoding with named entity recognition guardrails and real-time output sanitization layers
  • Reward model ensembles with uncertainty quantification for calibrated refusal — the model knows when it doesn't know
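
For reference, a minimal sketch of the DPO objective on pre-computed sequence log-probabilities, assuming PyTorch; the batch size and beta value are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Each argument is a batch of summed token log-probs for the chosen /
    rejected response under the policy or the frozen reference model."""
    chosen_ratio = policy_chosen_lp - ref_chosen_lp
    rejected_ratio = policy_rejected_lp - ref_rejected_lp
    # Widen the margin between chosen and rejected responses, scaled by
    # beta; no explicit reward model is trained.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

batch = [torch.randn(8) for _ in range(4)]  # stand-in log-probs, 8 pairs
print(dpo_loss(*batch))
```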

Results

  • Significant reduction in policy violations while maintaining task performance on enterprise benchmarks
  • Near-zero PII leakage rate across enterprise deployment surfaces
  • Scalable red-teaming pipeline generating diverse adversarial test cases at thousands per hour

Impact

Safety and alignment infrastructure for trusted enterprise LLM deployment, ensuring compliant and reliable AI interactions at scale across regulated industries.

DPO, KTO, Constitutional AI, Red-Teaming, PII Detection, Reward Modeling, Safety Classifiers

Efficient LLM Serving & Speculative Decoding
LLMs

Production LLM serving stack combining speculative decoding, KV-cache compression, quantization-aware fine-tuning, and continuous batching for enterprise-scale inference.

Challenge

Serving large language models at enterprise scale demands sub-second latency, high throughput, and cost efficiency — while preserving output quality across diverse workloads with millions of concurrent users.

Approach

  • Speculative decoding with a distilled draft model — small model proposes token sequences, large model verifies in parallel, achieving 2-3x throughput with mathematically identical outputs (the acceptance rule is sketched after this list)
  • KV-cache compression with grouped-query attention (GQA) and sliding window strategies for long-context serving without linear memory growth
  • Quantization-aware fine-tuning (AWQ/GPTQ) with task-specific calibration sets for INT4 deployment with minimal quality degradation
  • Dynamic continuous batching with PagedAttention for GPU memory management, maximizing hardware utilization across heterogeneous request lengths
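
The "mathematically identical outputs" claim rests on the standard speculative-decoding verification rule: accept a draft token with probability min(1, p/q), otherwise resample from the residual distribution. A minimal NumPy sketch over toy distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

def verify(draft_token, p, q):
    """Accept the draft token with prob min(1, p/q); on rejection,
    resample from the residual max(p - q, 0), so the final sample is
    distributed exactly as p, the large model's distribution."""
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token, True
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual), False

vocab = 8
p = rng.dirichlet(np.ones(vocab))   # target model's next-token distribution
q = rng.dirichlet(np.ones(vocab))   # draft model's distribution
draft = rng.choice(vocab, p=q)      # draft model proposes a token
token, accepted = verify(draft, p, q)
print(token, "accepted" if accepted else "resampled")
```

Throughput improves because the large model verifies a whole drafted sequence in one parallel forward pass instead of decoding token by token.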

Results

  • 2.5x inference throughput improvement via speculative decoding at matched output quality
  • 60% memory reduction through KV-cache compression, enabling 4x longer context windows
  • Production INT4 quantization with <1% quality degradation on enterprise benchmarks

Impact

Powers high-throughput, low-latency LLM serving infrastructure for millions of concurrent enterprise users, reducing inference cost while maintaining quality guarantees.

Speculative Decoding, AWQ, GPTQ, PagedAttention, KV-Cache, GQA, Continuous Batching

Semantic Segmentation for AR/VR
Computer Vision, Multimodal

State-of-the-art semantic segmentation for AR/VR eye tracking using SWIN Vision Transformers, achieving 0.96 mIoU with on-device deployment.

Challenge

AR/VR devices require pixel-precise eye region segmentation at real-time speeds on power-constrained hardware, with robustness to extreme lighting and motion.

Approach

  • SWIN Vision Transformer backbone with UperNet decoder for multi-scale feature fusion
  • Knowledge distillation from a large teacher to a compact student model (the objective is sketched after this list)
  • Quantization-aware training (QAT) for INT8 on-device inference
  • Adversarial domain adaptation with diffusion-based augmentation for distribution shift
  • Active learning pipeline with weak supervision for efficient annotation
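
A minimal PyTorch sketch of a standard distillation objective of the kind used to compress a teacher into an on-device student; temperature, loss weighting, and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Per-pixel KD for segmentation: KL divergence between
    temperature-softened teacher and student class distributions,
    blended with ordinary cross-entropy on the ground-truth masks."""
    # logits: (B, C, H, W); labels: (B, H, W) with class indices
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",        # averages over the batch dimension
    ) * (T * T)                       # rescale gradients for temperature
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

B, C, H, W = 2, 4, 64, 64
loss = distill_loss(torch.randn(B, C, H, W), torch.randn(B, C, H, W),
                    torch.randint(0, C, (B, H, W)))
print(loss)
```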

Results

  • 0.96 mIoU on internal AR/VR eye segmentation benchmark
  • 4x inference speedup through distillation + quantization
  • Robust to cross-device and cross-user distribution shifts

Impact

Deployed in Meta Reality Labs AR/VR pipeline, enabling precise gaze tracking and foveated rendering for next-generation headsets.

SWIN Transformer, UperNet, Knowledge Distillation, QAT, Domain Adaptation, Active Learning

MultiModal Knowledge Transfer
Multimodal, CV, NLP

Cross-modal knowledge transfer framework using CLIP and Vision Transformers for zero-shot Visual Question Answering and Image Captioning.

Challenge

Bridging vision and language modalities for VQA and captioning requires massive paired datasets. Enabling zero-shot transfer to new domains without task-specific fine-tuning remains an open problem.

Approach

  • CLIP-based vision-language alignment with cross-attention fusion layers
  • Contrastive learning objectives for joint embedding space optimization (the symmetric loss is sketched after this list)
  • Masked language modeling + pseudo-labeling for self-training on unlabeled data
  • Distributed training across multi-GPU clusters for billion-parameter models
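
A minimal PyTorch sketch of the symmetric CLIP-style contrastive objective; batch size and embedding dimension are illustrative.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Matched image/text pairs sit on the diagonal of the similarity
    matrix; the loss pulls them together and pushes mismatched pairs
    apart, in both the image-to-text and text-to-image directions."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(len(img))   # diagonal entries are positives
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = clip_contrastive_loss(torch.randn(16, 512), torch.randn(16, 512))
print(loss)
```

The shared embedding space this produces is what enables zero-shot transfer: new domains are handled by similarity lookup rather than task-specific heads.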

Results

  • Zero-shot generalization to unseen visual domains and question types
  • Competitive with supervised baselines using 10x less labeled data
  • Scalable to billion-parameter models with linear training efficiency

Impact

Framework adopted across Meta product surfaces for visual understanding tasks, reducing annotation cost and enabling rapid domain expansion.

CLIP, Vision Transformers, Cross-Attention, Contrastive Learning, Pseudo-Labeling, Distributed Training

Low-Resource LLMs
LLMs, NLP

Cross-lingual language understanding using self-supervised transformers for underrepresented languages with minimal labeled data.

Challenge

Most NLP advances concentrate on high-resource languages. Extending LLM capabilities to hundreds of low-resource languages requires novel transfer learning and data augmentation strategies.

Approach

  • Multilingual pre-training with mBERT and XLM-R on cross-lingual corpora
  • Cross-lingual alignment via adversarial training on shared embedding spaces
  • Back-translation augmentation to synthetically expand low-resource training data
  • Masked language modeling (MLM) fine-tuning with language-adaptive layers (the masking scheme is sketched after this list)
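
A minimal PyTorch sketch of BERT-style dynamic masking for MLM training; the mask token ID and vocabulary size follow bert-base conventions but are assumptions here.

```python
import torch

def mask_tokens(input_ids, mask_id=103, vocab_size=30522, p=0.15):
    """Select 15% of positions as MLM targets; of those, 80% become
    [MASK], 10% a random token, 10% stay unchanged. Labels are -100
    everywhere else so the loss ignores unmasked positions."""
    labels = input_ids.clone()
    target = torch.rand(input_ids.shape) < p
    labels[~target] = -100
    input_ids = input_ids.clone()
    replace = target & (torch.rand(input_ids.shape) < 0.8)
    input_ids[replace] = mask_id
    # Half of the remaining 20% of targets get a random token.
    randomize = target & ~replace & (torch.rand(input_ids.shape) < 0.5)
    input_ids[randomize] = torch.randint(vocab_size, input_ids.shape)[randomize]
    return input_ids, labels

ids = torch.randint(1000, 2000, (2, 16))
masked, labels = mask_tokens(ids)
print(masked[0], labels[0])
```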

Results

  • Strong performance gains on NER, classification, and QA tasks for low-resource languages
  • Effective zero-shot cross-lingual transfer from high-resource to unseen languages

Impact

Enables language understanding for underrepresented populations, supporting equitable AI deployment across global markets.

mBERT, XLM-R, Cross-Lingual Transfer, Adversarial Training, Back-Translation, MLM

Cancer Detection
Computer Vision, Healthcare

AI-driven pathology analysis system for tumor detection and localization using hierarchical CNNs and self-supervised learning on whole-slide images.

Challenge

Manual pathology review is slow and error-prone. Whole-slide images are gigapixel-scale, requiring architectures that handle extreme resolution while maintaining fine-grained localization.

Approach

  • Hierarchical CNN with attention mechanisms for multi-scale feature extraction from gigapixel slides (attention pooling over slide tiles is sketched after this list)
  • Self-supervised pre-training on unlabeled pathology images to learn robust histological representations
  • Transfer learning from ImageNet with progressive fine-tuning
  • End-to-end deployment on AWS SageMaker with Lambda-based inference API
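
One common way to turn tile-level features from a gigapixel slide into a slide-level prediction plus a heatmap is attention pooling in the multiple-instance-learning style; a minimal PyTorch sketch, with dimensions as illustrative stand-ins:

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Scores each tile embedding, softmax-normalizes the scores, and
    returns a weighted slide-level embedding; the attention weights
    double as a localization heatmap over the slide."""
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, 1))

    def forward(self, tiles):                       # tiles: (n_tiles, dim)
        w = torch.softmax(self.score(tiles), dim=0)  # (n_tiles, 1)
        return (w * tiles).sum(dim=0), w.squeeze(-1)

pool = AttentionPool()
slide_emb, heat = pool(torch.randn(1000, 512))  # 1000 tiles from one slide
print(slide_emb.shape, heat.topk(3).indices)    # most suspicious tiles
```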

Results

  • High sensitivity and specificity for tumor detection on clinical datasets
  • Precise tumor localization with attention-guided heatmaps
  • Scalable to clinical throughput on cloud infrastructure

Impact

Accelerates pathology workflows and provides decision support for clinicians, reducing diagnostic turnaround time.

Hierarchical CNN, Self-Supervised Learning, Attention Mechanisms, AWS SageMaker, Transfer Learning

Disease Prediction
Healthcare, Generative AI

Adversarial AI and causal inference framework for unbiased disease prediction, combining GANs with Double ML for robust diagnostic models.

Challenge

Clinical prediction models inherit biases from training data — demographic, socioeconomic, and selection biases — leading to disparate outcomes across patient populations.

Approach

  • Adversarial learning to decorrelate predictions from sensitive attributes (a gradient-reversal sketch follows this list)
  • Double ML framework for causal effect estimation under confounding
  • VQ-VAE and GAN-based synthetic data augmentation for minority groups
  • Clustering-based patient stratification for personalized risk scoring
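
A minimal PyTorch sketch of the gradient-reversal pattern behind adversarial decorrelation; the toy encoder, heads, and synthetic batch are illustrative.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) gradients on the
    backward pass, so the encoder is trained to *defeat* the adversary."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

encoder = nn.Linear(32, 16)
predictor = nn.Linear(16, 1)   # disease-risk head
adversary = nn.Linear(16, 1)   # sensitive-attribute head

x = torch.randn(64, 32)
y = (torch.rand(64, 1) > 0.5).float()          # toy disease labels
a = torch.randint(0, 2, (64, 1)).float()       # toy sensitive attribute
z = encoder(x)
bce = nn.BCEWithLogitsLoss()
# Task loss trains the predictor; the adversary's loss flows back through
# the reversal, pushing the encoder toward attribute-invariant features.
loss = bce(predictor(z), y) + bce(adversary(GradReverse.apply(z, 1.0)), a)
loss.backward()
```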

Results

  • Measurable reduction in prediction disparity across demographic groups
  • Maintained clinical accuracy while improving fairness metrics
  • Robust causal estimates under confounding scenarios

Impact

Advances equitable healthcare AI by ensuring diagnostic models perform fairly across all patient populations.

Adversarial Learning, Double ML, VQ-VAE, GANs, Causal Inference, Patient Stratification

Synthetic Data Generation
Generative AI, Healthcare

Privacy-preserving synthetic medical data using diffusion models and convolutional GANs with formal differential privacy guarantees.

Challenge

Healthcare AI research is bottlenecked by data access — patient privacy regulations (HIPAA) prevent sharing real medical records, limiting model development and reproducibility.

Approach

  • Convolutional GAN architecture with differential privacy (DP-SGD) for formal privacy guarantees (one DP-SGD step is sketched after this list)
  • Diffusion model pipelines with controlled noise scheduling for high-fidelity generation
  • Privacy auditing via membership inference attacks to validate protection
  • Statistical fidelity metrics ensuring synthetic data preserves clinical distributions
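
A minimal PyTorch sketch of a single DP-SGD step: clip each example's gradient to a fixed norm, then add Gaussian noise calibrated to that clip before updating. In practice a library such as Opacus manages this and tracks the (ε, δ) budget; the model and all values here are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
xb, yb = torch.randn(8, 10), torch.randn(8, 1)
clip, sigma, lr = 1.0, 1.0, 0.1

summed = [torch.zeros_like(p) for p in model.parameters()]
for x, y in zip(xb, yb):                       # per-example gradients
    model.zero_grad()
    loss = (model(x) - y).pow(2).mean()
    loss.backward()
    norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))
    scale = min(1.0, clip / (norm.item() + 1e-6))  # clip this example's grad
    for s, p in zip(summed, model.parameters()):
        s += p.grad * scale

with torch.no_grad():
    for s, p in zip(summed, model.parameters()):
        # Noise scale sigma * clip follows the Gaussian mechanism.
        noisy = (s + sigma * clip * torch.randn_like(s)) / len(xb)
        p -= lr * noisy                         # noisy average-gradient step
```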

Results

  • Synthetic data passes privacy audits while maintaining downstream model utility
  • Published in Information Sciences (140+ citations)
  • Enables HIPAA-compliant data sharing for multi-site research

Impact

Unlocks healthcare AI research by providing shareable, privacy-safe synthetic datasets — cited 140+ times and adopted by research groups globally.

Diffusion Models, DP-SGD, Convolutional GANs, Privacy Auditing, FAISS, U-Net

Deep Sequence Recommender
Recommender Systems, NLP

Production recommender system using Transformer-XL and meta-learning for temporal-aware personalization at billion-user scale.

Challenge

User preferences evolve over time and new users lack interaction history. Traditional collaborative filtering fails to capture temporal dynamics and suffers from cold-start problems.

Approach

  • Transformer-XL architecture for long-range sequential dependency modeling
  • Model-Agnostic Meta-Learning (MAML) for few-shot cold-start user adaptation (an inner/outer update is sketched after this list)
  • NLP-enriched item representations using BERT and contextual embeddings
  • Production deployment with ONNX/TensorRT optimization and Kubernetes orchestration
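
A minimal PyTorch sketch of a first-order MAML-style inner/outer update for cold-start adaptation; the linear model and synthetic support/query sets are toy stand-ins for the production recommender.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)                        # meta-learned user prior
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 0.1

for step in range(3):                          # meta-training loop
    meta_opt.zero_grad()
    for _ in range(4):                         # a batch of simulated users
        xs, ys = torch.randn(5, 8), torch.randn(5, 1)  # support: few interactions
        xq, yq = torch.randn(5, 8), torch.randn(5, 1)  # query: held-out interactions
        # Inner step: adapt a copy of the weights to this user's support set.
        loss_s = (model(xs) - ys).pow(2).mean()
        grads = torch.autograd.grad(loss_s, list(model.parameters()))
        adapted = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]
        # Outer step (first-order): evaluate the adapted weights on the
        # query set; gradients flow back into the shared initialization.
        pred = xq @ adapted[0].t() + adapted[1]
        (pred - yq).pow(2).mean().backward()
    meta_opt.step()
```

At serving time, a new user's handful of interactions plays the role of the support set, yielding a personalized model from the meta-learned prior in a few gradient steps.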

Results

  • Measurable improvement in recommendation relevance metrics over the production baseline
  • Effective cold-start handling with meta-learned user priors
  • Sub-100ms inference latency at billion-scale with optimized serving

Impact

Serving billions of daily predictions in Meta's recommendation surfaces, directly impacting user engagement and content discovery.

Transformer-XL, MAML, BERT, ONNX, TensorRT, Kubernetes