Large language models have captured the public's attention: in just five years, Transformer-based models have almost completely reshaped natural language processing. Beyond that, they have begun to revolutionize fields such as computer vision and computational biology.
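At the core of every Transformer in the papers below is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A minimal NumPy sketch (an illustrative implementation for intuition, not code from any of the cited papers):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_query, n_key) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V, weights                   # weighted sum of values + attention map

# Toy example: 3 query positions attending over 4 key/value positions
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)   # out: (3, 8); each row of w sums to 1
```

In a full Transformer, Q, K, and V are learned linear projections of the token embeddings, and this operation is repeated across multiple heads and layers.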
"The Illustrated Transformer" by Jay Alammar
"The Transformer Family" by Lilian Weng
"Transformer models: an introduction and catalog — 2023 Edition" by Xavier Amatriain
The nanoGPT repository by Andrej Karpathy
Paper link: https://arxiv.org/pdf/1409.0473.pdf
Paper 3: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
Paper 4: "Improving Language Understanding by Generative Pre-Training"
If this line of research interests you, the GPT-2 and GPT-3 papers are natural follow-up reading. The InstructGPT method is covered separately later in this article.
Paper 1: "A Survey on Efficient Training of Transformers"
Paper link: https://arxiv.org/abs/2302.01107
Paper 2: "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"
Paper link: https://arxiv.org/abs/2205.14135
Paper 3: "Cramming: Training a Language Model on a Single GPU in One Day"
Paper link: https://arxiv.org/abs/2212.14034
Paper 4: "Training Compute-Optimal Large Language Models"
Paper link: https://arxiv.org/abs/2203.15556
Paper 1: "Training Language Models to Follow Instructions with Human Feedback"
Paper link: https://arxiv.org/abs/2203.02155
Paper 2: "Constitutional AI: Harmlessness from AI Feedback"
Paper link: https://arxiv.org/abs/2212.08073
Paper 1: "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model"
Paper link: https://arxiv.org/abs/2211.05100
Paper 2: "OPT: Open Pre-trained Transformer Language Models"
Paper link: https://arxiv.org/abs/2205.01068
Paper 1: "LaMDA: Language Models for Dialog Applications"
Paper link: https://arxiv.org/abs/2201.08239
Paper 2: "Improving alignment of dialogue agents via targeted human judgements"
Paper link: https://arxiv.org/abs/2209.14375
Paper 3: "BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage"
Paper link: https://arxiv.org/abs/2208.03188
论文1:《 ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Learning 》
论文地址:https://arxiv.org/abs/2007.06225
论文2:《Highly accurate protein structure prediction with AlphaFold》
论文地址:https://www.nature.com/articles/s41586-021-03819-2
论文3:《Large Language Models Generate Functional Protein Sequences Across Diverse Families》
论文地址:https://www.nature.com/articles/s41587-022-01618-2