Publications

FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha, Yang Zou, Qiuyu Chen, Zhiheng Li, Yusheng Xie, Siqi Deng

Existing text-to-image generative models reflect or even amplify societal biases ingrained in their training data. This is especially concerning for human image generation where models are biased against certain demographic groups. Existing attempts to rectify this issue are hindered by the inherent limitations of the pre-trained models and fail to substantially improve demographic diversity. In this work, we introduce Fair Retrieval Augmented Generation (FairRAG), a novel framework that conditions pre-trained generative models on reference images retrieved from an external image database to improve fairness in human generation. FairRAG enables conditioning through a lightweight linear module that projects reference images into the textual space. To enhance fairness, FairRAG applies simple-yet-effective debiasing strategies, providing images from diverse demographic groups during the generative process. Extensive experiments demonstrate that FairRAG outperforms existing methods in terms of demographic diversity, image-text alignment, and image fidelity while incurring minimal computational overhead during inference.

Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Paper Bibtex

@article{shrestha2024fairrag,
                            title={FairRAG: Fair Human Generation via Fair Retrieval Augmentation},
                            author={Shrestha, Robik and Zou, Yang and Chen, Qiuyu and Li, Zhiheng and Xie, Yusheng and Deng, Siqi},
                            journal={CVPR},
                            year={2024}
                          }

OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses
Robik Shrestha, Kushal Kafle, Christopher Kanan

Dataset bias and spurious correlations can significantly impair generalization in deep neural networks. Many prior efforts have addressed this problem using either alternative loss functions or sampling strategies that focus on rare patterns. We propose a new direction: modifying the network architecture to impose inductive biases that make the network robust to dataset bias. Specifically, we propose OccamNets, which are biased to favor simpler solutions by design. OccamNets have two inductive biases. First, they are biased to use as little network depth as needed for an individual example. Second, they are biased toward using fewer image locations for prediction. While OccamNets are biased toward simpler hypotheses, they can learn more complex hypotheses if necessary. In experiments, OccamNets outperform or rival state-of-the-art methods run on architectures that do not incorporate these inductive biases. Furthermore, we demonstrate that when the state-of-the-art debiasing methods are combined with OccamNets, results further improve.

European Conference on Computer Vision. (ECCV 2022)

Paper Code Bibtex

@article{shrestha2022occamnets,
                        title={OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses},
                        author={Shrestha, Robik and Kafle, Kushal and Kanan, Christopher},
                        booktitle={European Conference on Computer Vision (ECCV)},
                        year={2022}
                        }

An Investigation of Critical Issues in Bias Mitigation Techniques
Robik Shrestha, Kushal Kafle, Christopher Kanan

A critical problem in deep learning is that systems learn inappropriate biases, resulting in their inability to perform well on minority groups. This has led to the creation of multiple algorithms that endeavor to mitigate bias. However, it is not clear how effective these methods are. This is because study protocols differ among papers, systems are tested on datasets that fail to test many forms of bias, and systems have access to hidden knowledge or are tuned specifically to the test set. To address this, we introduce an improved evaluation protocol, sensible metrics, and a new dataset, which enables us to ask and answer critical questions about bias mitigation algorithms. We evaluate seven state-of-the-art algorithms using the same network architecture and hyperparameter selection policy across three benchmark datasets. We introduce a new dataset called Biased MNIST that enables assessment of robustness to multiple bias sources. We use Biased MNIST and a visual question answering (VQA) benchmark to assess robustness to hidden biases. Rather than only tuning to the test set distribution, we study robustness across different tuning distributions, which is critical because for many applications the test distribution may not be known during development. We find that algorithms exploit hidden biases, are unable to scale to multiple forms of bias, and are highly sensitive to the choice of tuning set. Based on our findings, we implore the community to adopt more rigorous assessment of future bias mitigation methods

IEEE/CVF Winter Conference of Applications on Computer Vision (WACV 2022)

Paper Code Bibtex

@article{shrestha2021investigation,
  title={An investigation of critical issues in bias mitigation techniques},
  author={Shrestha, Robik and Kafle, Kushal and Kanan, Christopher},
  journal={Workshop on Applications of Computer Vision},
  year={2021}
}

Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems
Usman Mahmood, Robik Shrestha, David D. B. Bates, Lorenzo Mannelli, Giuseppe Corrias, Yusuf Emre Erdi and Christopher Kanan

Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A critical component to deploying AI in radiology is to gain confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.

Frontiers in Digital Health (2021)

Paper Bibtex

@ARTICLE{10.3389/fdgth.2021.671015,

AUTHOR={Mahmood, Usman and Shrestha, Robik and Bates, David D. B. and Mannelli, Lorenzo and Corrias, Giuseppe and Erdi, Yusuf Emre and Kanan, Christopher},

TITLE={Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems},

JOURNAL={Frontiers in Digital Health},

VOLUME={3},

YEAR={2021},

URL={https://www.frontiersin.org/article/10.3389/fdgth.2021.671015},

DOI={10.3389/fdgth.2021.671015},

ISSN={2673-253X},

ABSTRACT={Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A critical component to deploying AI in radiology is to gain confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.}
}

A negative case analysis of visual grounding methods for VQA
Robik Shrestha, Kushal Kafle, Christopher Kanan

Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons. To address this issue, recent bias mitigation methods for VQA propose to incorporate visual cues (e.g., human attention maps) to better ground the VQA models, showcasing impressive gains. However, we show that the performance improvements are not a result of improved visual grounding, but a regularization effect which prevents over-fitting to linguistic priors. For instance, we find that it is not actually necessary to provide proper, humanbased cues; random, insensible cues also result in similar improvements. Based on this observation, we propose a simpler regularization scheme that does not require any external annotations and yet achieves near state-of-the-art performance on VQA-CPv2.

Association for Computational Linguistics (ACL 2020)

Paper Code Bibtex

@inproceedings{shrestha-etal-2020-negative,
title = "A negative case analysis of visual grounding methods for {VQA}",
author = "Shrestha, Robik  and
  Kafle, Kushal  and
  Kanan, Christopher",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.727",
pages = "8172--8181"
}

On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel

Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices in its current use. First, most published methods rely on explicit knowledge of the construction of the OOD splits. They often rely on 'inverting' the distribution of labels, e.g. answering mostly 'yes' when the common training answer is 'no'. Second, the OOD test set is used for model selection. Third, a model's in-domain performance is assessed after retraining it on in-domain splits (VQA v2) that exhibit a more balanced distribution of labels. These three practices defeat the objective of evaluating generalization, and put into question the value of methods specifically designed for this dataset. We show that embarrassingly-simple methods, including one that generates answers at random, surpass the state of the art on some question types. We provide short- and long-term solutions to avoid these pitfalls and realize the benefits of OOD evaluation.

Neural Information Processing Systems (NeurIPS 2020)

Paper Bibtex


    @article{teney2020value,
  title={On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law},
  author={Teney, Damien and Kafle, Kushal and Shrestha, Robik and Abbasnejad, Ehsan and Kanan, Christopher and Hengel, Anton van den},
  booktitle={Advances in neural information processing systems (NeurIPS)},
  year={2020}
}

REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya, and Christopher Kanan ( denotes equal contribution)

In lifelong machine learning, an agent must be incrementally updated with new knowledge, instead of having distinct train and deployment phases. For incrementally training convolutional neural network models, prior work has enabled replay by storing raw images, but this is memory intensive and not ideal for embedded agents. Here, we propose REMIND, a tensor quantization approach that enables efficient replay with tensors. Unlike other methods, REMIND is trained in a streaming manner, meaning it learns one example at a time rather than in large batches containing multiple classes. Our approach achieves state-of-the-art results for incremental class learning on the ImageNet-1K dataset. We demonstrate REMIND's generality by pioneering multi-modal incremental learning for visual question answering (VQA), which cannot be readily done with comparison models.

European Conference on Computer Vision (ECCV 2020)

Paper Code Bibtex

@article{hayes2019remind,
  title={REMIND Your Neural Network to Prevent Catastrophic Forgetting},
  author={Hayes, Tyler L and Kafle, Kushal and Shrestha, Robik and Acharya, Manoj and Kanan, Christopher},
  journal={arXiv preprint arXiv:1910.02509},
  year={2019}
}

Answering Questions about Data Visualizations using Efficient Bimodal Fusion
Kushal Kafle, Robik Shrestha, Scott Cohen, Brian Price, and Christopher Kanan

Parallel Recurrent Fusion for Chart Question Answering

Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g. bar charts, pie charts, and line graphs. Here, we propose a novel CQA algorithm called parallel recurrent fusion of image and language (PReFIL). PReFIL first learns bimodal embeddings by fusing question and image features and then intelligently aggregates these learned embeddings to answer the given question. Despite its simplicity, PReFIL greatly surpasses state-of-the art systems and human baselines on both the FigureQA and DVQA datasets. Additionally, we demonstrate that PReFIL can be used to reconstruct tables by asking a series of questions about a chart.

IEEE Winter Conference on Applications of Computer Vision (WACV 2020)

Paper Bibtex

@inproceedings{kafle2020answering,
  title={Answering Questions about Data Visualizations using Efficient Bimodal Fusion},
  author={Kafle, Kushal and Shrestha, Robik and Cohen, Scott and Price, Brian and Kanan, Christopher},
  booktitle={The IEEE Winter Conference on Applications of Computer Vision},
  pages={1498--1507},
  year={2020}
}}

Challenges and Prospects in Vision and Language Research
Kushal Kafle, Robik Shrestha, and Christopher Kanan

Language grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, rather than behaving as visual Turing tests, recent studies have demonstrated state-of-the-art systems are achieving good performance through flaws in datasets and evaluation procedures. We review the current state of affairs and outline a path forward.

Frontiers in Artificial Intelligence - Language and Computation (2019)

Paper Bibtex

@article{kafle2019challenges,
  title={Challenges and Prospects in Vision and Language Research},
  author={Kafle, Kushal and Shrestha, Robik and Kanan, Christopher},
  journal={arXiv preprint arXiv:1904.09317},
  year={2019}
}

Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha, Kushal Kafle, and Christopher Kanan

Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning. A good VQA algorithm should be capable of both, but only a few VQA algorithms are tested in this manner. We compare five state-of-the-art VQA algorithms across eight VQA datasets covering both domains. To make the comparison fair, all of the models are standardized as much as possible, E.g., they use the same visual features, answer vocabularies, etc. We find that methods do not generalize across the two domains. To address this problem, we propose a new VQA algorithm that rivals or exceeds the state-of-the-art for both domains.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)

Paper Code Bibtex

@inproceedings{shrestha2019ramen,
title={Answer Them All! Toward Universal Visual Question Answering Models},
  author={Shrestha, Robik and Kafle, Kushal and Kanan, Christopher},
  booktitle={CVPR},
  year={2019}
    }

Robik Shrestha

Applied Scientist at Amazon AGI

Latest News

Dec 2024: Our org launched Amazon Nova Foundation Models! You can read the technical report here.

May 2024: Started as an Applied Scientist at Amazon AGI

Feb 2024: The work done during my Amazon internship: FairRAG has been accepted to CVPR 2024

Feb 2024: Joined University of Rochester for a temporary role on building Generative AI models to optimize inertial confinement fusion

Dec 2023: Successfully defended my Ph.D.

May 2023: Started as an Applied Scientist Intern at Amazon AWS AI Labs. I am a part of the Responsible AI team

July 2022: Our paper "OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses " was accepted at ECCV 2022 (Oral)

Jan 2022:Our paper "An Investigation of Critical Issues in Bias Mitigation Techniques " was accepted at WACV.

June 2021:Our paper "Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems " was accepted at Frontiers in Digital Health.

May - Sep 2021: Worked as a Research Intern at Adobe Inc!

Oct 2020:Our paper "On the Value of OOD Testing: An Example of Goodhart's Law" was accepted at NeurIPS!

April 2020:Our paper "A negative case analysis of visual grounding methods for VQA" was accepted at ACL!

Jul 2020: Our paper "REMIND Your Neural Network to Prevent Catastrophic Forgetting" was accepted at ECCV!

Publications

FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha, Yang Zou, Qiuyu Chen, Zhiheng Li, Yusheng Xie, Siqi Deng

Conference on Computer Vision and Pattern Recognition (CVPR 2024)

OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses
Robik Shrestha, Kushal Kafle, Christopher Kanan

European Conference on Computer Vision. (ECCV 2022)

An Investigation of Critical Issues in Bias Mitigation Techniques
Robik Shrestha, Kushal Kafle, Christopher Kanan

IEEE/CVF Winter Conference of Applications on Computer Vision (WACV 2022)

Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems
Usman Mahmood, Robik Shrestha, David D. B. Bates, Lorenzo Mannelli, Giuseppe Corrias, Yusuf Emre Erdi and Christopher Kanan

Frontiers in Digital Health (2021)

A negative case analysis of visual grounding methods for VQA
Robik Shrestha, Kushal Kafle, Christopher Kanan

Association for Computational Linguistics (ACL 2020)

On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel

Neural Information Processing Systems (NeurIPS 2020)

REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya, and Christopher Kanan ( denotes equal contribution)

European Conference on Computer Vision (ECCV 2020)

Answering Questions about Data Visualizations using Efficient Bimodal Fusion
Kushal Kafle, Robik Shrestha, Scott Cohen, Brian Price, and Christopher Kanan

IEEE Winter Conference on Applications of Computer Vision (WACV 2020)

Challenges and Prospects in Vision and Language Research
Kushal Kafle, Robik Shrestha, and Christopher Kanan

Frontiers in Artificial Intelligence - Language and Computation (2019)

Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha, Kushal Kafle, and Christopher Kanan

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)

Robik Shrestha

Applied Scientist at Amazon AGI

Latest News

Dec 2024: Our org launched Amazon Nova Foundation Models! You can read the technical report here.

May 2024: Started as an Applied Scientist at Amazon AGI

Feb 2024: The work done during my Amazon internship: FairRAG has been accepted to CVPR 2024

Feb 2024: Joined University of Rochester for a temporary role on building Generative AI models to optimize inertial confinement fusion

Dec 2023: Successfully defended my Ph.D.

May 2023: Started as an Applied Scientist Intern at Amazon AWS AI Labs. I am a part of the Responsible AI team

July 2022: Our paper "OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses " was accepted at ECCV 2022 (Oral)

Jan 2022:Our paper "An Investigation of Critical Issues in Bias Mitigation Techniques " was accepted at WACV.

June 2021:Our paper "Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems " was accepted at Frontiers in Digital Health.

May - Sep 2021: Worked as a Research Intern at Adobe Inc!

Oct 2020:Our paper "On the Value of OOD Testing: An Example of Goodhart's Law" was accepted at NeurIPS!

April 2020:Our paper "A negative case analysis of visual grounding methods for VQA" was accepted at ACL!

Jul 2020: Our paper "REMIND Your Neural Network to Prevent Catastrophic Forgetting" was accepted at ECCV!

Publications

FairRAG: Fair Human Generation via Fair Retrieval Augmentation Robik Shrestha, Yang Zou, Qiuyu Chen, Zhiheng Li, Yusheng Xie, Siqi Deng

Conference on Computer Vision and Pattern Recognition (CVPR 2024)

OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses Robik Shrestha, Kushal Kafle, Christopher Kanan

European Conference on Computer Vision. (ECCV 2022)

An Investigation of Critical Issues in Bias Mitigation Techniques Robik Shrestha, Kushal Kafle, Christopher Kanan

IEEE/CVF Winter Conference of Applications on Computer Vision (WACV 2022)

Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems Usman Mahmood, Robik Shrestha, David D. B. Bates, Lorenzo Mannelli, Giuseppe Corrias, Yusuf Emre Erdi and Christopher Kanan

Frontiers in Digital Health (2021)

A negative case analysis of visual grounding methods for VQA Robik Shrestha, Kushal Kafle, Christopher Kanan

Association for Computational Linguistics (ACL 2020)

On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel

Neural Information Processing Systems (NeurIPS 2020)

REMIND Your Neural Network to Prevent Catastrophic Forgetting Tyler L. Hayes*, Kushal Kafle*, Robik Shrestha*, Manoj Acharya, and Christopher Kanan (* denotes equal contribution)

European Conference on Computer Vision (ECCV 2020)

Answering Questions about Data Visualizations using Efficient Bimodal Fusion Kushal Kafle, Robik Shrestha, Scott Cohen, Brian Price, and Christopher Kanan

IEEE Winter Conference on Applications of Computer Vision (WACV 2020)

Challenges and Prospects in Vision and Language Research Kushal Kafle, Robik Shrestha, and Christopher Kanan

Frontiers in Artificial Intelligence - Language and Computation (2019)

Answer Them All! Toward Universal Visual Question Answering Models Robik Shrestha, Kushal Kafle, and Christopher Kanan

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)

FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha, Yang Zou, Qiuyu Chen, Zhiheng Li, Yusheng Xie, Siqi Deng

OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses
Robik Shrestha, Kushal Kafle, Christopher Kanan

An Investigation of Critical Issues in Bias Mitigation Techniques
Robik Shrestha, Kushal Kafle, Christopher Kanan

Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems
Usman Mahmood, Robik Shrestha, David D. B. Bates, Lorenzo Mannelli, Giuseppe Corrias, Yusuf Emre Erdi and Christopher Kanan

A negative case analysis of visual grounding methods for VQA
Robik Shrestha, Kushal Kafle, Christopher Kanan

On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel

REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya, and Christopher Kanan ( denotes equal contribution)

Answering Questions about Data Visualizations using Efficient Bimodal Fusion
Kushal Kafle, Robik Shrestha, Scott Cohen, Brian Price, and Christopher Kanan

Challenges and Prospects in Vision and Language Research
Kushal Kafle, Robik Shrestha, and Christopher Kanan

Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha, Kushal Kafle, and Christopher Kanan