Conference | Adobe Media and Data Science Research (MDSR) Laboratory

ZIPP: Zero-shot Image Personalization from Personas

Text-to-image diffusion models are increasingly deployed in creative contexts, yet remain impersonal — optimized for aggregate …

S I Harini, Somesh Singh, Yaman Kumar Singla, David Doermann, Rajiv Ratn Shah

Social Agents: Collective Intelligence Improves LLM Predictions

In human society, collective decision making has often outperformed the judgment of individuals. Classic examples range from estimating …

Aanisha Bhattacharyya, Abhilekh Borah, Yaman Kumar Singla, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy

ALPHA: Action-Based Learning for Pluralistic Human Alignment in Large Language Models

Large language models are widely used, but aligning them with societal values remains challenging. Current approaches often rely on …

Aanisha Bhattacharyya, Susmit Aggarwal, Yaman Kumar Singla, Tarun Menta, Nikitha SR, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy

BrandFusion: Aligning Image Generation with Brand Styles

While recent text-to-image models excel at generating realistic content, they struggle to capture the nuanced visual characteristics …

Parul, Varun Khurana, Yaman Kumar Singla, Balaji Krishnamurthy, Abhinav Dhall

Unsupervised Memorability Modeling from Tip-of-the-Tongue Retrieval Queries

Visual content memorability has intrigued the scientific community for decades, with applications ranging widely, from understanding …

Sree Bhattacharyya, Yaman Kumar Singla, Sudhir Yarram, Somesh Singh, S I Harini, James Z. Wang

SPRO: Improving Image Generation via Self-Play

Recent advances in diffusion models have dramatically improved image fidelity and diversity. However, aligning these models with …

Ritika Jha, Aanisha Bhattacharyya, Yaman Kumar Singla, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy

Evaluating Variance in Visual Question Answering Benchmarks

Multimodal large language models (MLLMs) have emerged as powerful tools for visual question answering (VQA), enabling reasoning and …

HIRE: Lightweight High-Resolution Image Feature Enrichment for Multimodal LLMs

The integration of high-resolution image features in modern multimodal large language models has demonstrated significant improvements …

Nikitha SR, Aradhya Neeraj Mathur, Tarun Ram Menta, Rishabh Jain, Mausoom Sarkar

Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale Tuning

LLMs such as GPT-4 have shown a remarkable ability to solve complex questions by generating step-by-step rationales. Prior works have …

Sohan Patnaik, Milan Aggarwal, Sumit Bhatia, Balaji Krishnamurthy

EOPose: Exemplar-based object reposing using Generalized Pose Correspondences

Reposing generic objects without the use of 3D models poses a significant challenge due to the absence of a standardized pose …

Sarthak Mehrotra, Rishabh Jain, Mayur Hemani, Balaji Krishnamurthy, Mausoom Sarkar