Written by
Thomas Clapper
Category
Book Club
Oct
12

Designing Machine Learning Systems: a book review

A book overview

Chip Huyen has built machine learning tools at Netflix, Nvidia, and Primer, and has even launched her own real-time machine learning platform: Claypot AI. Huyen focuses on creating machine learning systems that fit the problems they are meant to solve. Huyen articulates this point: “Before we develop an ML system, we must understand why the system is needed. For example, if the system is built for a business, it must be driven by business objectives, which will need to be translated into ML objectives to guide the development of ML models” (Huyen, 25).


In this way, ML should be seen primarily as another tool that can be used to solve complex problems. The mistake is believing that ML is the right answer to every problem. Even where ML does fit, not all ML solutions are equal: choosing the right system is critical to success.


Data is dumb

When looking at the overall state of ML, it is critical to understand that this is a rapidly changing field. One example is the current reliance on data as the primary tool for ML models.


Huyen looks to Dr. Judea Pearl for insight:


“In the mind-over-data camp, there’s Dr. Judea Pearl, a Turing Award winner best known for his work on causal inference and Bayesian networks. The introduction to his book ‘The Book of Why’ is called ‘Mind over Data’, in which he emphasizes: ‘Data is profoundly dumb.’ In one of his more controversial posts on Twitter in 2020, he expressed his strong opinion against ML approaches that rely heavily on data and warned that data-centric ML people might be out of a job in three to five years: ‘ML will not be the same in 3-5 years, and ML folks who continue to follow the current data-centric paradigm will find themselves outdated, if not jobless. Take note’” (Huyen, 43).


The point is not that data is useless; clearly, large quantities of data are needed to make connections. Instead, Dr. Pearl argues that the true future of ML lies in causation rather than correlation. The example he has used in other resources is that a machine might correlate fevers with malaria. However, a genuinely exceptional AI model could tell you that malaria causes fevers. Furthermore, in a perfect world, it may be able to tell you why.


This “why” reasoning does not exist within the data itself. Rather, it exists in the minds of those architecting the models. The “why” connectors will become more critical as ML seeps further into daily life.
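To make the malaria example concrete, here is a minimal Python sketch of the difference between observing a correlation and knowing the causal direction. It is not taken from Huyen's or Pearl's books. The function names, the 10% malaria rate, and the fever probabilities are all invented for illustration. The toy world assumes malaria causes fever, then compares two questions: how common malaria is among people who happen to have a fever, and how common it is when a fever is forced on everyone.

```python
import random

def simulate(n, seed, force_fever=None):
    """Toy world in which malaria causes fever (an assumed causal model).

    force_fever=None -> observe the world as it is.
    force_fever=True -> intervene and give everyone a fever.
    All probabilities below are made up for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        malaria = rng.random() < 0.10  # assumed 10% base rate
        # Fever arises from malaria (90%) or from other causes (5%).
        natural_fever = rng.random() < (0.90 if malaria else 0.05)
        fever = natural_fever if force_fever is None else force_fever
        rows.append((malaria, fever))
    return rows

def malaria_rate_among_feverish(rows):
    """Share of people with a fever who also have malaria."""
    feverish = [malaria for malaria, fever in rows if fever]
    return sum(feverish) / len(feverish)

observed = simulate(100_000, seed=0)
intervened = simulate(100_000, seed=0, force_fever=True)

p_obs = malaria_rate_among_feverish(observed)
p_do = malaria_rate_among_feverish(intervened)

print(f"P(malaria | fever observed): {p_obs:.2f}")
print(f"P(malaria | fever forced):   {p_do:.2f}")
```

Under these assumed numbers, the observed rate comes out at roughly two thirds, while the forced rate stays near the 10% base rate. Inducing a fever does not give anyone malaria. Only the observational table would be available to a purely data-driven model; the fact that the arrow runs from malaria to fever had to be written into `simulate` by whoever designed it.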


If the industry moves in this way, data-centric ML practitioners need to take note from Dr. Pearl, or they may just find themselves out of work.


Building better models

Huyen spends the rest of the book focusing on how to build models that are durable, consistent, and solve real problems. Creating meaningful solutions to complex problems requires building not only models but, even more so, the systems that support them, all in line with best practices.