The article discusses how AI companies like OpenAI might face new challenges over their training-data practices as Chinese competitor DeepSeek demonstrates the effectiveness of model distillation. DeepSeek reportedly built a powerful AI model by learning from GPT-4’s outputs rather than from scraped copyrighted material, potentially sidestepping the legal exposure OpenAI faces. This technique, called model distillation, lets a newer model learn from an established one without ever accessing the original training data.

The article suggests this could be “karma” for OpenAI, which faces multiple copyright lawsuits over its own training practices. DeepSeek’s approach shows that companies may not need vast amounts of copyrighted material to create competitive AI models, potentially undermining OpenAI’s defense in those copyright cases.

The article also highlights how this development could shift the industry’s approach to training, with companies potentially moving away from scraping copyrighted content. Experts note that while a distilled model may not fully match the original model’s capabilities, the technique can produce surprisingly effective results. This situation raises questions about the future of AI training methods and the validity of the copyright defenses currently used by major AI companies. The emergence of successful distilled models could influence both the legal and the technical sides of AI development in the coming years.
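To make the idea concrete, here is a minimal, self-contained sketch of the core mechanism behind distillation (an illustration of the general technique, not DeepSeek’s actual method): a “student” model is trained to match a “teacher” model’s full output distribution, typically by minimizing a KL-divergence loss over temperature-softened probabilities. The function names and the temperature value below are illustrative choices, not drawn from the article.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw model logits to probabilities; a higher temperature
    smooths the distribution so the teacher's 'soft labels' carry more
    information about how it ranks the wrong answers."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the
    student's. Minimizing this trains the student to mimic the teacher's
    whole output distribution, not just its top answer."""
    p = softmax(teacher_logits, temperature)  # teacher soft labels
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student whose logits already match the teacher's incurs ~zero loss:
teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, teacher))  # ~0.0

# A mismatched student incurs a positive loss an optimizer can reduce:
student = [0.5, 0.5, 0.5]
print(distillation_loss(student, teacher))  # > 0
```

In practice this loss is computed over a large corpus of teacher outputs and backpropagated through the student; the point relevant to the article is that only the teacher’s *outputs* are needed, never its training data.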