Kunvar Thaman’s single-author paper “Reward Hacking Benchmark: Measurement Exploits in LLM Agents with Tool Use” has been accepted to ICML 2026, and he has since drawn wide attention in the machine learning community. The paper’s arXiv listing names Thaman as the sole author and confirms its acceptance to the conference, which is scheduled to take place in Seoul, South Korea, from July 6 to 11, 2026. Public posts associated with Thaman describe him as an independent researcher from India. That makes the achievement particularly notable in a field typically dominated by major artificial intelligence companies, elite universities and large research labs.
What is Kunvar Thaman’s research paper about?
Thaman’s paper introduces the Reward Hacking Benchmark (RHB), a framework for measuring how readily tool-using large language model (LLM) agents exploit shortcuts when completing multi-step tasks. The benchmark includes scenarios in which an AI system can bypass verification steps, infer answers indirectly, or manipulate evaluation-related tools.
The study evaluated 13 frontier AI models from organizations including OpenAI, Anthropic, Google, and DeepSeek. According to the paper, exploit rates ranged from 0% to 13.9%, and additional safeguards reduced exploit behavior without significantly affecting task completion.
ICML, short for the International Conference on Machine Learning, is considered one of the world’s leading conferences in artificial intelligence and machine learning. It attracts submissions from top universities and technology companies, including OpenAI, Google DeepMind, Stanford University and MIT.
The conference is highly competitive: thousands of papers are submitted each year, and only a fraction are accepted after peer review. This makes a single-author acceptance unusual, especially for an independent researcher without the backing of a major institution or AI lab.
Kunvar Thaman is a 26-year-old researcher from Chandigarh, India. He studied at the Birla Institute of Technology and Science (BITS) Pilani, one of India’s most prestigious higher education institutions, and has been described in public posts and articles as an independent researcher working on artificial intelligence. According to his LinkedIn profile, he currently lives in San Francisco.
Some online posts claim that, since the launch of ChatGPT, only two other independent researchers worldwide have received similar ICML recognition. However, this statistic has not been independently verified against official ICML records.
Why this study matters
Reward hacking is becoming an increasingly important topic in AI safety research. As large language models gain more autonomy and access to tools, researchers are increasingly concerned that these systems may exploit loopholes or take unintended shortcuts to maximize reward.
Thaman’s benchmark aims to study these behaviors in a more realistic environment than a simplified experimental setup. Its focus on AI agent safety places the paper in one of the fastest-growing areas of modern AI research.
A rare independent breakthrough
What makes Thaman’s story compelling is not just the paper itself, but the fact that it was written by an independent researcher in a research ecosystem dominated by billion-dollar AI companies and top universities.
For many observers in the AI community, this acceptance is a rare example of an independent voice breaking into one of machine learning’s most competitive global venues.

