Hi, I'm Priyanka Kargupta!

Let's make both models and humans think critically.

I am a third-year PhD candidate working in natural language processing at the University of Illinois at Urbana-Champaign, advised by Prof. Jiawei Han. I am supported by the NSF Graduate Research Fellowship. Prior to this, I worked on 3D scene representations with Prof. Ren Ng at the University of California, Berkeley, funded by the Intel SRC Fellowship.

My research aims to improve critical and creative reasoning in both models and their human users by exploiting structure in both their knowledge and reasoning.

My work explores this in both the scientific (AI for Research) and educational (LLMs + Education) domains. Specifically, I am interested in exploring how to structure and unstructure the reasoning of both models and humans, as well as structure the knowledge critical to guiding such reasoning.

Email / CV / Scholar / Twitter / GitHub / LinkedIn

Highlights

Just released my latest work on Cognitive Foundations for Reasoning and Their Manifestation in LLMs!
I spent my summer interning at Microsoft Research, working on incentivizing creative ideation in LLMs!
I am giving an oral presentation for Tree-of-Debate and EpiMine at ACL 2025!
Four of my first-authored papers were accepted to the ACL 2025 Main Conference!
I presented TreeInstruct at EMNLP'24.
I received an EMNLP'24 Outstanding Reviewer Award!
I presented my poster at the NeurIPS'24 workshop for Large Foundation Models for Educational Assessment.
TreeInstruct is accepted to EMNLP'24 Findings!
I received the 2024 NSF Graduate Research Fellowship.

Research

My research begins by exploring how explicit structure can empower both models and researchers to reason more effectively about scientific literature. By embedding structured frameworks into the reasoning process, we can leverage LLMs to augment critical tasks, such as analyzing nuanced scientific findings, tracing the evolution of multidimensional research contributions, and distinguishing novel insights from existing work. However, novel scientific research is inherently open-ended and creative, which has led me to investigate the interplay between structure, flexibility, and creativity. I hypothesize that (1) structure enables the systematic navigation of complex knowledge and problems, (2) flexibility enables breaking free from unhelpful structures, and (3) creativity guides the exploration of new, promising approaches that can then be tackled again through structure. Ultimately, my work seeks to uncover how structured, critical reasoning and adaptive creativity can be integrated to drive human–LLM scientific discovery.

	Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis
	Priyanka Kargupta, Ishika Agarwal, Tal August, Jiawei Han
	ACL'25 Oral
	Paper / Code
	Determining significant novelties, incremental findings, and equivalent approaches between works is challenging, especially when the papers are not explicitly connected through citations. To elicit the critical reasoning required for assessing the degree of a paper's contribution, we propose converting the papers to LLM personas that debate one another. In other words, we propose a tree-of-debate (ToD), where we focus more on the personas' comparative reasoning induced by the debate than on its final outcome. ToD can dynamically construct a debate tree to reason about fine-grained arguments discussed in scholarly articles.

	Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News Events
	Priyanka Kargupta, Yunyi Zhang, Yizhu Jiao, Siru Ouyang, Jiawei Han
	ACL 2025 Oral
	Paper / Code
	We introduce a novel task, episode detection, which identifies episodes within a news corpus of key event articles. Detecting episodes poses unique challenges, as they lack explicit temporal or locational markers and cannot be merged using semantic similarity alone. While large language models (LLMs) can aid with these reasoning difficulties, they struggle with the long contexts typical of news corpora. To address these challenges, we introduce EpiMine, an unsupervised framework that identifies a key event's candidate episodes by leveraging natural episodic partitions in articles, estimated through shifts in discriminative term combinations. These candidate episodes are more cohesive and representative of true episodes, synergizing with LLMs to better interpret and refine them into final episodes.

	Beyond True or False: Retrieval-Augmented Hierarchical Analysis of Nuanced Claims
	Priyanka Kargupta, Runchu Tian, Jiawei Han
	ACL 2025 Main Conference
	Paper / Code
	Scientific and political claims are often nuanced and are not strictly “true” or “false” (e.g., "vaccine A is better than vaccine B"). However, such a claim can be dissected into its integral aspects and sub-aspects (e.g., efficacy, safety, distribution), which are individually easier to validate. Thus, we propose ClaimSpect, a retrieval-augmented generation-based framework for automatically constructing a hierarchy of aspects typically considered when addressing a claim and enriching them with corpus-specific perspectives. This structure hierarchically partitions an input corpus to retrieve relevant segments, which assist in discovering new sub-aspects. Moreover, these segments enable the discovery of varying perspectives towards an aspect of the claim (e.g., support, neutral, or oppose) and their respective prevalence (e.g., "how many biomedical papers believe vaccine A is more transportable than B?").

	TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora
	Priyanka Kargupta, Nan Zhang, Yunyi Zhang, Rui Zhang, Prasenjit Mitra, Jiawei Han
	ACL 2025 Main Conference*
	Paper / Code
	TaxoAdapt is a framework that dynamically adapts an LLM-generated taxonomy to a given corpus across multiple dimensions. TaxoAdapt performs iterative hierarchical classification, expanding both the taxonomy width and depth based on the corpus's topical distribution. We demonstrate its state-of-the-art performance across a diverse set of computer science conferences over the years to showcase its ability to structure and capture the evolution of scientific fields. As a multidimensional method, TaxoAdapt generates taxonomies that are 26.51% more granularity-preserving and 50.41% more coherent than the most competitive baselines, as judged by LLMs.

	Instruct, Not Assist: LLM-based Multi-Turn Planning and Hierarchical Questioning for Socratic Code Debugging
	Priyanka Kargupta, Ishika Agarwal, Dilek Hakkani-Tur, Jiawei Han
	EMNLP'24 Findings
	Paper / Code
	TreeInstruct is an Instructor agent, guided by a novel state space-based planning algorithm, that asks probing questions to help students independently identify and resolve errors. It estimates a student's conceptual and syntactical knowledge to dynamically construct a question tree based on their responses and current knowledge state, effectively addressing both independent and dependent mistakes concurrently in a multi-turn interaction setting.

	MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities
	Priyanka Kargupta, Tanay Komarlu, Susik Yoon, Xuan Wang, Jiawei Han
	EMNLP'23 Findings
	Paper / Code
	MEGClass is an extremely weakly supervised text classification method that leverages mutually-enhancing text granularities. It uses coarse- and fine-grained context signals obtained by jointly considering a document's most class-indicative words and sentences. This approach enables the learning of a contextualized document representation that captures the most discriminative class indicators.

	Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data
	Ming Zhong, Siru Ouyang, Yizhu Jiao, Priyanka Kargupta, Leo Luo, Yanzhen Shen, Bobby Zhou, Xianrui Zhong, Xuan Liu, Hongxiang Li, Jinfeng Xiao, Minhao Jiang, Vivian Hu, Xuan Wang, Heng Ji, Martin Burke, Huimin Zhao, Jiawei Han
	EMNLP'23 Demo
	Paper / Code
	Reaction Miner is a system that interacts with raw scientific literature, delivering precise and more informative chemical reactions. Going beyond mere extraction, Reaction Miner integrates a holistic workflow: it accepts PDF files as input, bypassing the need for pre-processing and bolstering user accessibility. Subsequently, a text segmentation module ensures that the refined text encapsulates complete chemical reactions, augmenting the accuracy of extraction. Moreover, Reaction Miner broadens the scope of existing pre-defined reaction roles, including vital attributes previously neglected, thereby offering a more comprehensive depiction of chemical reactions. Evaluations conducted by chemistry domain users highlight the efficacy of each module in our system, demonstrating Reaction Miner as a powerful tool in this field.