About Me
I'm Duy Nguyen, an MS Data Science student at Seattle University (GPA 4.0 — College of Science and Engineering Dean's Graduate Student Honor Roll, Winter 2026). My focus is building data systems that researchers can actually use: pipelines that are auditable, databases that reproduce the original analysis exactly, and infrastructure that outlasts the person who built it.
At Seattle University, I hold two research positions. With Dr. Brian Fischer (Mathematics), I designed a 7-table normalized MySQL database for barn owl auditory neuroscience. It consolidates data from two researchers covering 110 neurons, 8 owls, and ~261 experiments, originally stored as 14,000+ files in six proprietary formats with no consistent structure. I identified 8 data quality problems, built a 6-phase ETL pipeline to resolve them, and implemented 4 separate loaders to handle the completely different internal .mat file layouts each researcher used. The fitting methods alone cover two-sided asymmetric Gaussians, rate-level sigmoids with 5 physiological parameters, Akima spline interpolation, and SVD for response separability. The end result: an analysis that previously took a 40-line loop and 228 file loads now runs as a single SQL query.
With Dr. Wenjing Yang, I'm investigating a quieter problem in medical AI: most vision RAG pipelines for clinical imaging never measure whether the retrieval step actually works. I'm trying to quantify that gap on mammography data and understand what it takes to fix it.
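To give a sense of what "a single SQL query" means for the barn owl database described above, here is a minimal sketch. It uses SQLite in place of MySQL so it runs anywhere, and every table name, column name, and value (`owl`, `neuron`, `experiment`, `stimulus_type`, `peak_rate`) is a hypothetical stand-in, not the actual 7-table schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with three hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE owl (
            owl_id INTEGER PRIMARY KEY,
            name   TEXT NOT NULL
        );
        CREATE TABLE neuron (
            neuron_id INTEGER PRIMARY KEY,
            owl_id    INTEGER NOT NULL REFERENCES owl(owl_id)
        );
        CREATE TABLE experiment (
            experiment_id INTEGER PRIMARY KEY,
            neuron_id     INTEGER NOT NULL REFERENCES neuron(neuron_id),
            stimulus_type TEXT NOT NULL,
            peak_rate     REAL NOT NULL
        );
        """
    )
    # Made-up rows, for illustration only.
    conn.executemany("INSERT INTO owl VALUES (?, ?)", [(1, "owl_A"), (2, "owl_B")])
    conn.executemany("INSERT INTO neuron VALUES (?, ?)", [(10, 1), (11, 1), (20, 2)])
    conn.executemany(
        "INSERT INTO experiment VALUES (?, ?, ?, ?)",
        [
            (100, 10, "tone", 40.0),
            (101, 10, "tone", 60.0),
            (102, 11, "noise", 30.0),
            (103, 20, "tone", 80.0),
        ],
    )
    conn.commit()
    return conn


def mean_peak_rate_by_owl(conn, stimulus_type):
    """One joined, parameterized query instead of a loop over per-experiment files."""
    query = """
        SELECT o.name,
               COUNT(DISTINCT n.neuron_id) AS n_neurons,
               AVG(e.peak_rate)            AS mean_peak_rate
        FROM experiment AS e
        JOIN neuron AS n ON n.neuron_id = e.neuron_id
        JOIN owl    AS o ON o.owl_id = n.owl_id
        WHERE e.stimulus_type = ?
        GROUP BY o.owl_id, o.name
        ORDER BY o.name
    """
    return conn.execute(query, (stimulus_type,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in mean_peak_rate_by_owl(conn, "tone"):
        print(row)
```

Because the relationships between owls, neurons, and experiments live in foreign keys, the grouping and filtering that a file-by-file loop would do by hand is expressed once in the `JOIN` and `GROUP BY` clauses.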
I'm looking for Summer 2026 data science or AI/ML engineering internships where the work has clear stakes and the feedback is real.
Featured Projects
MOSAIC - AI Immigration Chatbot
Built an AI-powered chatbot with SFU Blueprint for MOSAIC, serving 660,000+ users across Canadian immigration services. Recognized as Top 4 in the SFU CS Diversity Award.
- Analyzed 660K+ user interactions to build structured query categorization system
- Designed knowledge graph (Neo4j) mapping programs, services, eligibility criteria
- Built validation pipeline achieving 90% accuracy with full lineage tracking
- Partnered with legal/operations teams to codify domain requirements
UC Berkeley ML/AI Capstone - Hospital Resource Optimization
Selected as Capstone Project Exemplar for UC Berkeley's ML/AI certification. Developed predictive models projecting $30.4M annual savings in hospital resource management.
- Built neural network achieving 80% accuracy predicting patient length of stay
- Analyzed 180K+ patient records to identify key predictive factors
- Quantified business impact: $30.4M savings vs baseline approach
- Integrated AI chatbot for stakeholder decision-making guidance
NASA Flight Data Analysis - Aircraft Fuel Optimization
Analyzed 1.88 million NASA flight recorder measurements to identify fuel consumption drivers. Findings challenge industry conventional wisdom about flight optimization.
- Explained 95.9% of fuel consumption variance (R² = 0.959) across 312 flights
- Discovered engine performance explains 64.4% of fuel variance—2.2x more than flight planning
- Applied ANOVA, nested F-tests, variance decomposition, and interaction analysis
- Provided evidence-based recommendations prioritizing engine monitoring
Technical Skills
Languages & Tools
ML & Deep Learning
Specialties
Other Projects
Garbage Classification - Deep Learning
94% accuracy, 100% minority class recall. ResNet34 transfer learning with live demo.
Duy Integral Theorem - ML Theory
Novel mathematical framework for understanding generalization in neural networks.
SFU Faisal Lab - Medical RAG
RAG system translating natural language to JSON for CT/MRI scan retrieval.
AI Agent - ML-Business Alignment
Agent bridging ML teams and business stakeholders for strategic alignment.
Algorithm Learning Tool
Interactive visualization for mastering tree algorithms with AI feedback.
Programmatic Business Card
Print-ready business cards built with HTML/CSS/JS. Dynamic QR codes, professional design.
Contact
Email: dcnguyen060899@gmail.com
LinkedIn: https://www.linkedin.com/in/duwe-ng/