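When the compute budget is low, repeating high-quality data works better; when compute is plentiful, training on a larger volume of data pays off more.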
![Image](https://image.jiqizhixin.com/uploads/editor/d88009bf-22f2-4f71-8b4c-666abb7484ff/640.png)
Paper title: Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic
Paper: https://arxiv.org/pdf/2404.07177.pdf
Code: https://github.com/locuslab/scaling_laws_data_filtering
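The trade-off in the opening summary can be illustrated with a toy calculation. This is a minimal sketch under invented assumptions, not the paper's fitted scaling law: the function names, pool sizes, per-sample utilities, and the `decay` factor are all hypothetical. It assumes each repeated pass over a sample contributes a fixed fraction of the previous pass's value.

```python
def total_signal(pool_size, utility, budget, decay=0.5):
    """Toy amount learned from `budget` samples drawn epoch by epoch from one pool.

    Assumption (illustrative only): a sample is worth `utility` on first sight,
    and each further repetition is worth `decay` times the previous one.
    """
    full_epochs, remainder = divmod(budget, pool_size)
    signal = 0.0
    for epoch in range(full_epochs):
        # Every sample in the pool is seen once more, at a discounted value.
        signal += pool_size * utility * decay ** epoch
    # Partial final epoch: only `remainder` samples get one more pass.
    signal += remainder * utility * decay ** full_epochs
    return signal


def better_pool(budget):
    """Compare a small high-quality pool against a large lower-quality one."""
    # Hypothetical numbers: aggressive filtering keeps 1,000 samples of utility 1.0;
    # the unfiltered pool has 10,000 samples with average utility 0.5.
    filtered = total_signal(pool_size=1_000, utility=1.0, budget=budget)
    unfiltered = total_signal(pool_size=10_000, utility=0.5, budget=budget)
    return "filtered" if filtered > unfiltered else "unfiltered"


if __name__ == "__main__":
    for budget in (1_000, 3_000, 4_000, 10_000):
        print(budget, better_pool(budget))
```

With these made-up numbers, the small filtered pool comes out ahead at a budget of 1,000 samples, while the large unfiltered pool comes out ahead at 10,000, because repeated passes over the small pool add less and less. Where the crossover falls depends entirely on the assumed utilities and decay.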
![Image](https://image.jiqizhixin.com/uploads/editor/6ea18eb9-02ec-4416-bb72-52b8374e1829/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/6ee18c38-1f29-44a3-9322-c0588e105554/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/34dfa453-2150-4298-a208-a704ded37251/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/06601bff-b3c7-4ba3-a6b5-5467bb47ff84/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/cd889f2a-6fad-42d2-845f-77a6234bf078/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/2bfc4689-aa09-4121-95bb-45feae74ca05/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/5ed4d1e1-ade5-45c9-9b28-94b86bbae079/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/fb3f5a6f-3839-477e-bf83-d05754640ea8/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/77a898b5-d4cf-4bcf-90ef-b8c9042d72aa/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/5c000eb5-83f4-4a25-b565-4620d80eaa42/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/19aba7ec-ffc6-4ee7-98d8-cbaf9a75515b/640.png)
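Machine learning is a branch of artificial intelligence and an interdisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and other disciplines. Machine learning theory is mainly concerned with designing and analyzing algorithms that let computers "learn" automatically. Because learning algorithms rely heavily on statistical theory, machine learning is closely tied to inferential statistics and is also called statistical learning theory. On the algorithm-design side, machine learning theory focuses on learning algorithms that are tractable and effective.
A baseline is a simple model or heuristic used as a reference point when comparing model performance. It helps model developers quantify the minimum expected performance on a particular problem.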