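In recent years, artificial intelligence has made dramatic breakthroughs, and many students and professionals in technical fields have developed a strong interest in it. For anyone getting started with AI, The Master Algorithm is arguably required reading. On the surface, author Pedro Domingos gives only a broad overview of the main schools of thought in machine learning, yet he touches on nearly every important application, whether it already exists or has yet to appear. The book lets beginners survey machine learning from a high level, and it also plants many pointers that motivated readers can follow to study specific technical problems in depth, which makes it a rare introductory guide. Its witty style also makes it a pleasure to read.
Using this book as our text, the Synced (机器之心) "AI Study Group · Beginner Track" will officially begin soon!
We invite all beginners interested in AI and machine learning to join us and, by reading and discussing The Master Algorithm, gain a broad, high-level understanding of the history and technical principles of AI.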
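- AI Study Group · Beginner Track | Studying Pedro Domingos's The Master Algorithm together
- Study Group · Beginner Track | The Master Algorithm: summary of Chapters 1 and 2, and study of Chapter 3
- Study Group · Beginner Track | The Master Algorithm: summary of Chapter 3, and study of Chapter 4
- Study Group · Beginner Track | Session 4: Evolution is nature's learning algorithm
- Study Group · Beginner Track | Session 5: Entering the hall of Bayes
- Study Group · Beginner Track | Session 6: A first look at Bayesian networks
- Study Group · Beginner Track | Session 7: Unsupervised learning, which needs no teacher
- Study Group · Beginner Track | Session 8: A possible path to the Master Algorithm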
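Chapter 9 Review
Chapter Summary
Chapter 9 is a chapter of logical reasoning. It begins with the principles behind scientific discovery: many different phenomena point to the same principles and can be explained by one unified principle, so a unifier that can define and develop everything would be very valuable. How can the different algorithms be combined? The author offers two approaches. The first is metalearning, for example stacking, bagging, and boosting, but it is not deep enough and is computationally expensive. The second is the master algorithm, a single unifier for the different algorithms. The chapter summarizes the characteristics of each tribe well: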
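Tribe | Representation | Evaluation | Optimization
---|---|---|---
Symbolists | Logic | Accuracy | Inverse deduction
Connectionists | Neural networks | Squared error | Gradient descent
Evolutionaries | Genetic programs | Fitness | Genetic search
Bayesians | Graphical models | Posterior probability | Probabilistic inference
Analogizers | Support vectors | Margin | Constrained optimization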
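Representation is the formal language in which the learner expresses its models. Evaluation is a scoring function that says how good a model is. Optimization is the algorithm that searches for the highest-scoring model and returns it. After this exploration, the unified learner we arrive at uses Markov logic networks (MLNs) as the representation, posterior probability as the evaluation function, and genetic search coupled with gradient descent as the optimizer. If we want, we can replace the posterior with some other accuracy measure, or genetic search with hill climbing.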
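The three components can be illustrated with a deliberately tiny learner. The sketch below is not from the book: the one-feature threshold model, the toy data, and the names `predict`, `accuracy`, and `hill_climb` are all assumptions chosen for illustration. It uses accuracy as the evaluation function and hill climbing as the optimizer, the two substitutions the paragraph above mentions.

```python
from typing import List, Tuple

Example = Tuple[float, int]  # (feature value, class label 0 or 1)


def predict(threshold: float, x: float) -> int:
    """Representation: a model is just one threshold on one feature."""
    return 1 if x >= threshold else 0


def accuracy(threshold: float, data: List[Example]) -> float:
    """Evaluation: fraction of examples the model classifies correctly."""
    return sum(predict(threshold, x) == y for x, y in data) / len(data)


def hill_climb(data: List[Example], start: float, step: float,
               min_step: float = 1e-3) -> Tuple[float, float]:
    """Optimization: move to a strictly better neighbor, else halve the step."""
    best, best_score = start, accuracy(start, data)
    while step >= min_step:
        improved = False
        # Neighbors are computed from the current best before any move.
        for candidate in (best - step, best + step):
            score = accuracy(candidate, data)
            if score > best_score:
                best, best_score = candidate, score
                improved = True
        if not improved:
            step /= 2  # no better neighbor at this scale, so look closer
    return best, best_score


# Hypothetical toy data: small values are class 0, large values are class 1.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
best, score = hill_climb(data, start=0.0, step=4.0)
print(best, score)
```

Swapping `accuracy` for another scoring function, or `hill_climb` for another search procedure, leaves the other two components untouched, which is the point of the three-way split. Note that plain hill climbing can stall on plateaus where all neighbors score the same, which is why the initial step here is chosen fairly large.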
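So far we seem to have an initial version of the Master Algorithm, which this chapter calls Alchemy. Converting between Alchemy and the other algorithms is easy. However, it is still immature and has many shortcomings, such as the difficulty of applying it at large scale. Finally, the author mentions CanceRx, a system that is fed a cancer's genome and proposes a drug to kill it. It is one representative application with a promising future.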
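Week 8 Q&A Collection
What is the difference between stacking, bagging, and boosting?
- a. Stacking replaces the attributes of each original example with the learners' predictions, learns weights on those predictions, and so favors the learners that most often predict the correct class.
- b. Bagging generates random variations of the training set by resampling, applies the same learner to each resampled set, and combines the results by voting.
- c. Boosting: instead of combining different learners, boosting repeatedly applies the same classifier to the data, using each new model to correct the mistakes of the previous ones. It does this by assigning weights to the training examples; the weight of each misclassified example is increased after each round of learning, so later rounds focus more on it.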
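The difference between resampling (bagging) and reweighting (boosting) described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the book's code: the weak learner is a one-feature decision stump, labels are -1/+1, the boosting update follows the usual AdaBoost form, and all function names are hypothetical. Stacking is omitted to keep the sketch short.

```python
import math
import random
from collections import Counter


def fit_stump(xs, ys, weights):
    """Weak learner: the threshold rule with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (sign if x >= t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: sign if x >= t else -sign


def bagging(xs, ys, rounds=11, seed=0):
    """Train the same learner on bootstrap resamples, then take a majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx],
                                [ys[i] for i in idx],
                                [1.0] * n))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


def boosting(xs, ys, rounds=5):
    """Reweight the examples each round so later stumps focus on past mistakes."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(xs, ys, weights)
        err = sum(w for x, y, w in zip(xs, ys, weights) if stump(x) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote strength
        ensemble.append((alpha, stump))
        # Misclassified examples (y * stump(x) == -1) gain weight.
        weights = [w * math.exp(-alpha * y * stump(x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(a * m(x) for a, m in ensemble) >= 0 else -1


# Hypothetical toy data: values below 5 are class -1, the rest are class +1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, -1, -1, 1, 1, 1, 1]
bagged = bagging(xs, ys)
boosted = boosting(xs, ys)
print(bagged(1), bagged(8), boosted(2), boosted(7))
```

Bagging varies the data each model sees and gives every model an equal vote; boosting keeps the data fixed, changes the example weights, and gives more accurate rounds a larger vote.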
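How can logic and probability be combined?
- Markov logic networks (MLNs)
What are the challenges for Alchemy?
- It does not yet scale to truly big data, and someone without a PhD in machine learning will find it hard to use.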
Recommended Reference
- https://pdfs.semanticscholar.org/dc91/ca192ed16901e1225aa57c956e49cd4c892e.pdf
- https://pdfs.semanticscholar.org/fa0f/b9450cc00d45335566f43a04f4d7b94a560e.pdf
- http://insidebigdata.com/2016/01/12/book-review-master-algorithm/
- https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
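Chapter 10 Preview
Chapter Summary
As the author says at the start of the chapter, “this chapter will help you make the most of it in your life and be ready for what comes next.” If the Master Algorithm lets users plug in a score function (what you think the learner's goals are, or more precisely, its owner's) and data (what you think it knows) to build their own model, what model would you want? What data would you feed it? The author discusses how intelligent models work, how people use them, and how society will be affected. He then turns to data and data privacy, explaining what kinds of data users have and how and where they can share it. After that, he explores several controversial topics related to machine learning: How might intelligent machines take my job? What if there is a robot war? Is AI still a threat? How might intelligent machines influence human evolution?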
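Important Sections
Sex, lies, and machine learning
- The author uses online dating as an example to show how narrow the communication channel between users and learners is in today's machine learning applications. It comes down to “the question of how good a model of you a learner can have and what you'd want to do with the model.”
The digital mirror
- This short section discusses the self-improvement that a digital model of “you” can bring. For example, it can read and reply to your email, filter what you want to read next, find you a date, and manage your personal life.
A society of models
- The author points out that if a model can do so much for its user, such as job hunting and interviewing with companies, then the world the user lives in will also become a model.
To share or not to share, and how and where
- The main questions the author explores in this section are: What data should we share? How should we share it? Where can we share it?
A neural network stole my job
- The author first points out that narrowly defined tasks that are easily learned from data are being taken over by machines, while tasks that require a broad combination of skills and knowledge are not. He also argues that, as automation takes over some jobs and creates others, people should make the most of machines rather than compete with them.
War is not for humans
- This section gives two reasons why humans may not yet be ready for robot warfare. First, teaching robots to recognize the relevant concepts and learn ethics from humans may not be the best option, because humans sometimes violate their own ethical principles, which could confuse the robots. Second, robots change the ethics of war, because people no longer need to go to the battlefield or face life-and-death situations.
Google + Master Algorithm = Skynet?
- On the threat AI poses to humans, the author says that as long as humans stay in charge of the evaluation component, AI will remain our best working partner rather than a threat.
Evolution, part 2
- The author argues that nature has already gone through three stages: evolution, the brain, and culture. Machine learning is part of the next stage, which suggests that humans may be able to direct their own evolution in the future.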
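Key Concepts
- Theory of mind: the computer's theory of your mind.
- Turning point: the point where machine intelligence exceeds human intelligence.
Quiz
- What is the author's view on which jobs will be replaced by automation?
- What are the three major worries people have about intelligent machines?
- What is theory of mind?
- What is the turning point?
- What are the four kinds of data a user will have?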
Chapter #9 Review
【Chapter Summary】
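Chapter 9 is a chapter of logical reasoning. It begins with the principles behind scientific discovery. Various phenomena point to the same principle and can be explained by a unified principle, so a unifier that defines and develops everything would be valuable. How can the different algorithms be combined? The author provides two ways. The first is metalearning, such as stacking, bagging, and boosting, but it is not deep enough and is computationally expensive. The second is the master algorithm, which is a kind of unifier for the different algorithms. The features of the various tribes are summarized very well in this chapter:
Tribe | Representation | Evaluation | Optimization
---|---|---|---
Symbolists | Logic | Accuracy | Inverse deduction
Connectionists | Neural networks | Squared error | Gradient descent
Evolutionaries | Genetic programs | Fitness | Genetic search
Bayesians | Graphical models | Posterior probability | Probabilistic inference
Analogizers | Support vectors | Margin | Constrained optimization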
Representation here is the formal language in which the learner expresses its models. The evaluation component is a scoring function that says how good a model is. Optimization is the algorithm that searches for the highest-scoring model and returns it. After the exploration of the virtual city, the unified learner we’ve arrived at uses MLNs as the representation, posterior probability as the evaluation function, and genetic search coupled with gradient descent as the optimizer. If we want, we can easily replace the posterior by some other accuracy measure, or genetic search by hill climbing.
So far it seems we already have an initial version of the Master Algorithm, which this chapter calls Alchemy. Converting between Alchemy and the other algorithms is easy. However, it is still underdeveloped and has many shortcomings, such as the difficulty of applying it at large scale. Finally, the author mentions CanceRx, which can be thought of as a program that is fed a cancer's genome and outputs the drug to kill it with. It is one of the representative applications with a bright future.
Week 8 Q & A Collection
- What is the difference between stacking, bagging, and boosting?
- a. Stacking is to “replace the attributes of each original example by the learners' predictions to learn the weights and then choose the learners that often predict the correct class”.
- b. Bagging “generates random variations of the training set by resampling, applies the same learner to each one, and combines the results by voting”.
- c. Boosting: Instead of combining different learners, boosting repeatedly applies the same classifier to the data, using each new model to correct the previous ones’ mistakes. It does this by assigning weights to the training examples; the weight of each misclassified example is increased after each round of learning, causing later rounds to focus more on it.
- How can logic and probability be combined?
- Markov logic network (MLN)
- What are the challenges for Alchemy?
- “It does not yet scale to truly big data, and someone without a PhD in machine learning will find it hard to use.”
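To make the MLN answer above more concrete, here is the definition as it is usually stated in the Markov logic literature. The notation is an addition, not part of the chapter summary: $w_i$ is the weight attached to the $i$-th logical formula, $n_i(x)$ is the number of true groundings of that formula in the possible world $x$, and $Z$ is the normalizing constant.

```latex
P(X = x) = \frac{1}{Z} \exp\!\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\!\Big( \sum_i w_i \, n_i(x') \Big)
```

Read this way, each formula is a soft constraint: a world that violates it becomes less probable rather than impossible, and the larger the weight, the closer the formula comes to a hard logical rule.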
Chapter #10 Preview
【Chapter Summary】
As the author mentions at the beginning of the chapter, “this chapter will help you make the most of it in your life and be ready for what comes next.” If the Master Algorithm allows users to plug in the score function (what you think the learner’s goals are, or, more precisely, its owner’s) and the data (what you think it knows) to build your own model, what model do you want to have? What data would you feed it? The author opens a discussion about how intelligent models work, how people use them, and how society will be affected. Then the author dives into data and data privacy, explaining what kinds of data users have and how and where they can share it. After that, the author explores various controversial topics related to machine learning: How might intelligent machines take my job? What if there is a robot war? Is AI still a threat? How might intelligent machines influence human evolution?
【Important Sections】
- Sex, lies, and machine learning
- The author uses online dating as an example to show how narrow the communication channel between users and the learner is in today's machine learning applications. It comes down to “the question of how good a model of you a learner can have and what you'd want to do with the model.”
- The digital mirror
- This short section talks about the self-improvement that the digital model of “you” can bring. For example, it can read and reply to your emails for you, filter what you want to read next, find you a date, and manage your personal life.
- A society of models
- The author points out that if the model can do so much for the user, such as job hunting and interviewing with companies, then it becomes a model of the world in which the user lives.
- To share or not to share, and how and where
- The main questions the author explores in this section are: What data should we share? How to share? Where can we share?
- A neural network stole my job
- The author first points out that “narrowly defined tasks are easily learned from data, but tasks that require a broad combination of skills and knowledge aren't,” so it is the narrow tasks that machines take over. The author also thinks that, since automation takes away some jobs but creates others, humans should make the best of machines rather than compete with them.
- War is not for humans
- This section shows that humans might not be ready for robot warfare yet, for two main reasons. First, teaching robots to recognize relevant concepts and learn ethics by observing humans might not be the best option, since humans sometimes violate their own ethical principles and this might confuse the robots. Second, robots change the ethics of war, since people no longer need to go to the battlefield or face life-and-death situations.
- Google + Master Algorithm = Skynet?
- On the threat AI poses to humans, the author says that as long as humans take care of the evaluation part, AI will remain the best working partner we have rather than a threat.
- Evolution, part 2
- The author thinks that nature has already been through its three phases — evolution, the brain, and culture — and “machine learning is the local next stage of this progression”, which reflects that human-directed evolution might be possible in the future.
【Key Concepts】
- Theory of mind: the computer's theory of your mind.
- Turning point: the point where machine intelligence exceeds human intelligence.
【Quiz】
- What is the author's view on which jobs will be replaced by automation?
- What are the three worries people have about intelligent machines?
- What is theory of mind?
- What is the turning point?
- What are the four kinds of data a user will have?