NYU Professor Uses AI to Combat AI Cheating with Oral Exams

An NYU business school professor has turned to AI-powered oral examinations to combat the growing problem of AI-assisted cheating in coursework. Panos Ipeirotis, who teaches data science at NYU's Stern School of Business, adopted the approach after noticing that student assignments looked polished and professional, "like a McKinsey memo," yet the students couldn't defend their work when questioned in class.

In a blog post published last week, Ipeirotis detailed his “fight fire with fire” approach, reviving the traditional oral exam format but scaling it using artificial intelligence. The professor built an AI examiner using ElevenLabs’ conversational speech technology, which he said took only minutes to set up. The system administered comprehensive oral exams to 36 students over nine days, with each session lasting approximately 25 minutes.

The exam structure consisted of two parts: first, the AI agent questioned students about their capstone projects, probing their decision-making and reasoning processes. Then it selected a case study discussed in class and challenged students to analyze it in real time. The total compute cost for all 36 students came to just $15, roughly $0.42 per student, compared to the hundreds of dollars that human teaching assistants would charge for equivalent assessment time.
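
For readers curious what sits under the hood, the sketch below shows one way the two-part exam could be expressed as an agent prompt plus a small configuration block. This is a hypothetical illustration: the field names and configuration shape are assumptions, not ElevenLabs' actual agent API, and the prompt is not Ipeirotis' real one.

```python
# Hypothetical sketch: configuring a two-part oral-exam agent.
# Field names and structure are illustrative assumptions, not
# ElevenLabs' actual API; the real setup would happen through
# the platform's dashboard or SDK.

EXAMINER_PROMPT = """\
You are an oral examiner for a graduate data science course.
Conduct a roughly 25-minute exam in two parts.

Part 1 (capstone): Question the student about their capstone project.
Probe their decisions: why this approach, what alternatives they
rejected, and how they validated their results.

Part 2 (case study): Choose one case study discussed in class and ask
the student to analyze it live. Ask follow-up questions rather than
accepting surface-level answers.

Stay neutral: do not reveal scores or judgments during the exam.
"""

exam_agent_config = {
    "name": "capstone-oral-examiner",
    "system_prompt": EXAMINER_PROMPT,
    "first_message": (
        "Hi! Let's start with your capstone project. "
        "Can you walk me through the problem you chose, and why?"
    ),
    "max_duration_minutes": 25,  # matches the session length above
}
```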

Ipeirotis also employed AI for grading, using a "council of LLMs" approach. Three AI models (Claude, Gemini, and ChatGPT) independently assessed each transcript, then reviewed one another's evaluations and revised their scores before producing a final grade, with Claude acting as the "chair" to synthesize decisions. According to Ipeirotis, this multi-model approach graded "more consistently than humans" and "more strictly, but more fairly."
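
The council's round structure is straightforward to reproduce. Below is a minimal, hypothetical sketch of that flow: `ask` is a stand-in for the real Anthropic, Google, and OpenAI clients, the prompts are illustrative rather than Ipeirotis' actual rubric, and only the independent-grade, peer-review, and chair-synthesis structure is taken from the article.

```python
# Hypothetical sketch of a "council of LLMs" grading flow.
# `ask` is a placeholder for the real model SDKs; it returns a canned
# reply here so the round structure can be dry-run end to end.

COUNCIL = ["claude", "gemini", "chatgpt"]
CHAIR = "claude"  # per the article, Claude synthesizes the final grade

def ask(model: str, prompt: str) -> str:
    """Placeholder: route `prompt` to `model` via its real client."""
    return f"[{model}] reply to: {prompt[:50]}..."

def grade_transcript(transcript: str, rubric: str) -> str:
    # Round 1: each model grades the transcript independently.
    initial = {
        m: ask(m, f"Grade this oral-exam transcript against the rubric.\n"
                  f"Rubric:\n{rubric}\n\nTranscript:\n{transcript}")
        for m in COUNCIL
    }

    # Round 2: each model reviews its peers' evaluations and may revise.
    revised = {}
    for m in COUNCIL:
        peers = "\n\n".join(v for k, v in initial.items() if k != m)
        revised[m] = ask(m, f"Your evaluation:\n{initial[m]}\n\n"
                            f"Peer evaluations:\n{peers}\n\n"
                            f"Revise your score and feedback if warranted.")

    # Final step: the chair synthesizes the revised evaluations.
    return ask(CHAIR, "As chair, combine these evaluations into one final "
                      "grade with feedback:\n\n" + "\n\n".join(revised.values()))

if __name__ == "__main__":
    print(grade_transcript("Student: ...", "Clarity, rigor, depth (1-10 each)"))
```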

The professor noted that the AI-generated feedback was superior to what humans typically produce and even revealed gaps in how the material had been taught. However, student reception was mixed. Only a small minority preferred the AI oral exams, with many finding them more stressful than written tests, though they acknowledged the format better measured genuine understanding.

Ipeirotis told Business Insider that students were "mentally outsourcing" their work to AI rather than using it productively, and that the oral exam approach forces them to "use their brains, and leverage AI to be even better." The experiment highlights the broader challenge universities face in the AI era, as traditional assessment methods become increasingly vulnerable to AI-assisted cheating.

Key Quotes

"If you cannot defend your own work live, then the written artifact is not measuring what you think it is measuring."

Professor Panos Ipeirotis explained his rationale for implementing oral exams after observing that students who submitted impressive written work couldn’t explain their reasoning when questioned in class, revealing a fundamental flaw in traditional assessment methods.

"We need assessments that evolve toward formats that reward understanding, decision-making, and real-time reasoning. Oral exams used to be standard until they could not scale. Now, AI is making them scalable again."

Ipeirotis articulated the core philosophy behind his experiment, highlighting how AI technology is enabling a return to assessment formats that better measure genuine comprehension while overcoming the historical scalability limitations that led to their abandonment.

"The feedback was better than any human would produce."

The professor described the quality of AI-generated grading and feedback from his “council of LLMs” approach, suggesting that multi-model AI assessment may not only be more cost-effective but potentially superior to human evaluation in consistency and depth.

"We could not let this happen, we wanted them to use their brains, and leverage AI to be even better."

In comments to Business Insider, Ipeirotis explained his concern about students “mentally outsourcing” their work to AI, emphasizing that the goal is to teach students to use AI as a productivity tool while maintaining their own critical thinking skills.

Our Take

This experiment represents a fascinating paradox in AI education: using the very technology that enables cheating to prevent it. What makes Ipeirotis' approach particularly noteworthy is the multi-model grading system, which leverages the strengths of different AI models while mitigating individual biases, a sophisticated implementation that goes beyond simple automation. The mixed student reception reveals an important truth: effective learning is often uncomfortable, and the stress students reported may actually indicate the system is working as intended by forcing genuine engagement.

The $15 cost for 36 comprehensive assessments is remarkable and could democratize rigorous evaluation methods previously available only to well-funded institutions. However, this raises questions about the future role of human educators and whether we're comfortable with AI making nuanced judgments about student understanding. The broader implication is clear: the AI era demands we optimize for understanding and adaptability rather than artifact production, a shift with profound implications for both education and professional development.

Why This Matters

This development represents a significant shift in how educational institutions are responding to the AI revolution in academia. As generative AI tools become ubiquitous, traditional assessment methods like essays and written assignments are increasingly vulnerable to exploitation, forcing educators to fundamentally rethink how they measure student learning and comprehension.

Ipeirotis’ experiment demonstrates that AI can be part of the solution rather than just the problem. By using AI to scale oral examinations—a format that historically couldn’t be implemented widely due to cost and time constraints—educators now have a viable alternative that tests genuine understanding rather than writing ability or AI prompt engineering skills.

The implications extend beyond academia into the workplace, where the ability to defend ideas, think critically in real time, and demonstrate genuine expertise matters more than producing polished documents. This approach also highlights the growing sophistication of AI systems in educational contexts, from conversational agents that can conduct nuanced interviews to multi-model grading systems that may outperform humans in consistency. As universities worldwide grapple with what one study called the "wicked problem" of AI in assessment, this experiment offers a practical blueprint for adaptation.

Source: https://www.businessinsider.com/nyu-professor-ai-oral-exam-mckinsey-memo-business-school-2026-1