Vision-Language Models | Adobe Media and Data Science Research (MDSR) Laboratory

ZIPP: Zero-shot Image Personalization from Personas

Text-to-image diffusion models are increasingly deployed in creative contexts, yet remain impersonal — optimized for aggregate …

S I Harini, Somesh Singh, Yaman Kumar Singla, David Doermann, Rajiv Ratn Shah

BrandFusion: Aligning Image Generation with Brand Styles

While recent text-to-image models excel at generating realistic content, they struggle to capture the nuanced visual characteristics …

Parul, Varun Khurana, Yaman Kumar Singla, Balaji Krishnamurthy, Abhinav Dhall

SPRO: Improving Image Generation via Self-Play

Recent advances in diffusion models have dramatically improved image fidelity and diversity. However, aligning these models with …

Ritika Jha, Aanisha Bhattacharyya, Yaman Kumar Singla, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy

Evaluating Variance in Visual Question Answering Benchmarks

Multimodal large language models (MLLMs) have emerged as powerful tools for visual question answering (VQA), enabling reasoning and …

HIRE: Lightweight High-Resolution Image Feature Enrichment for Multimodal LLMs

The integration of high-resolution image features in modern multimodal large language models has demonstrated significant improvements …

Nikitha SR, Aradhya Neeraj Mathur, Tarun Ram Menta, Rishabh Jain, Mausoom Sarkar

Measuring and Improving Engagement of Text-to-Image Generation Models

Recent advances in text-to-image generation have achieved impressive aesthetic quality, making these models usable for both personal …

Varun Khurana, Yaman Kumar Singla, Jayakumar Subramanian, Changyou Chen, Rajiv Ratn Shah, Zhiqiang Xu, Balaji Krishnamurthy

DeAR: Debiasing Vision-Language Models with Additive Residuals

Large pre-trained vision-language models (VLMs) reduce the time for developing predictive models for various vision-grounded language …

Ashish Seth, Mayur Hemani, Chirag Agarwal