Events

DMS Statistics and Data Science Seminar

Time: Nov 12, 2025 (01:00 PM)
Location: 358 Parker Hall

Details:
 
Speaker: Dr. Anh Nguyen (Computer Science Department, Auburn University)
 
Title: How to make Vision Language Models see and explain themselves
 
 
Abstract: Large Language Models, or LLMs, with their massive world knowledge learned from text, have completely changed the game. They’ve introduced a new era: vision-language models (VLMs). In these models, images and text live in the same representation space, and instead of predicting from a fixed set of labels, they draw predictions from an open vocabulary. In this talk, I will walk you through three challenges of integrating vision capabilities into LLMs: 
      First, LLMs are strongly biased; for example, they might (over)prefer the number 7 or certain names like Biden, and that bias comes straight from their training data.
      Second, language bias is usually seen as a blessing that helps models generalize beyond their training data, but it becomes a curse in vision tasks that demand careful, detailed image analysis.
      And third, it turns out that VLMs do not have very good "eyesight" when evaluated on tests similar to human eye exams. Because of this, VLMs can sometimes behave in ways we don’t expect, which calls for an interface that lets humans understand the thought process of VLMs. However, there is not yet a natural way to explain VLM decisions on an image the way chain-of-thought reasoning does in text.
 
I’ll share my proposal for a general Explainable Bottleneck, and our implementation of Part-Based Explainable and Editable Bottleneck (PEEB) networks. In fine-grained image classification, PEEB not only explains its predictions by describing each visual part of an object, but also lets users reprogram the classifier’s logic using natural language, right at test time.