- Conducted undergraduate AI research on MerryQuery with a focus on instructional clarity and reliability.
- Engineered a Python-based evaluation framework that improved AI teaching assistant reliability by ~50% across core computing concepts.
- Designed and implemented a fine-tuning pipeline using Direct Preference Optimization (DPO) with OpenAI’s API to align model responses with expert instructional feedback.
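The DPO fine-tuning in the bullet above optimizes the standard preference loss of Rafailov et al., which rewards the policy for assigning higher relative log-probability to the expert-preferred response than a frozen reference model does. A minimal numeric sketch (the log-probabilities, `beta`, and function name here are illustrative assumptions, not details from the actual pipeline):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are summed token log-probabilities of the chosen and
    rejected responses under the policy and the reference model.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): shrinks as the policy learns the preference.
    return math.log(1.0 + math.exp(-margin))

# Hypothetical log-probs: the policy already leans toward the chosen response.
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0, beta=0.1)
```

The loss decreases monotonically as the margin grows, so gradient descent pushes the policy toward the expert-preferred responses while the reference term keeps it anchored to the original model.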
- Integrated retrieval-augmented generation (RAG), improving response accuracy and robustness by ~50%.
- Co-developed a rubric-based evaluation system spanning 7 instructional dimensions for structured assessment.
- Built an automated inter-rater reliability pipeline, achieving ≥ 0.75 agreement across raters.
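An agreement check like the one in the last bullet is commonly computed with a chance-corrected statistic such as Cohen's kappa; the source does not name the metric, so this is a sketch under that assumption, with hypothetical ratings:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum((counts_a[label] / n) * (counts_b[label] / n)
                for label in set(counts_a) | set(counts_b))
    if p_exp == 1.0:
        return 1.0
    return (p_obs - p_exp) / (1.0 - p_exp)

# Hypothetical binary ratings from two graders on eight responses.
a = [1, 1, 0, 1, 0, 1, 1, 0]
b = [1, 1, 0, 1, 0, 0, 1, 0]
kappa = cohens_kappa(a, b)
print(f"kappa = {kappa:.3f}", "meets threshold" if kappa >= 0.75 else "below threshold")
```

A pipeline would run this per rater pair (or per rubric dimension) and flag any pair falling below the 0.75 threshold for re-calibration.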