Terraform

Machine Learning: An In-Depth Guide - Overview, Goals, Learning Types, and Algorithms

By Alex Castrounis • Jan 27, 2016

Share the knowledge!

Articles in This Series

Overview, goals, learning types, and algorithms

Data selection, preparation, and modeling

Model evaluation, validation, complexity, and improvement

Model performance and error analysis

Unsupervised learning, related fields, and machine learning in practice

Introduction

Welcome! This is the first article of a five-part series about machine learning.

Machine learning is a very hot topic for many key reasons, and because it provides the ability to automatically obtain deep insights, recognize unknown patterns, and create high performing predictive models from data, all without requiring explicit programming instructions.

InnoArchiTech post image

Despite the popularity of the subject, machine learning’s true purpose and details are not well understood, except by very technical folks and/or data scientists.

This series is intended to be a comprehensive, in-depth guide to machine learning, and should be useful to everyone from business executives to machine learning practitioners. It covers virtually all aspects of machine learning (and many related fields) at a high level, and should serve as a sufficient introduction or reference to the terminology, concepts, tools, considerations, and techniques of the field.

This high level understanding is critical if ever involved in a decision-making process surrounding the usage of machine learning, how it can help achieve business and project goals, which machine learning techniques to use, potential pitfalls, and how to interpret the results.

Note that most of the topics discussed in this series are also directly applicable to fields such as predictive analytics, data mining, statistical learning, artificial intelligence, and so on.

Machine Learning Defined

The oft quoted and widely accepted formal definition of machine learning as stated by field pioneer Tom M. Mitchell is:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E

The following is my less formal way to describe machine learning.

Machine learning is a subfield of computer science, but is often also referred to as predictive analytics, or predictive modeling. Its goal and usage is to build new and/or leverage existing algorithms to learn from data, in order to build generalizable models that give accurate predictions, or to find patterns, particularly with new and unseen similar data.

InnoArchiTech post image

Machine Learning Process Overview

Imagine a dataset as a table, where the rows are each observation (aka measurement, data point, etc), and the columns for each observation represent the features of that observation and their values.

At the outset of a machine learning project, a dataset is usually split into two or three subsets. The minimum subsets are the training and test datasets, and often an optional third validation dataset is created as well.

Once these data subsets are created from the primary dataset, a predictive model or classifier is trained using the training data, and then the model’s predictive accuracy is determined using the test data.

As mentioned, machine learning leverages algorithms to automatically model and find patterns in data, usually with the goal of predicting some target output or response. These algorithms are heavily based on statistics and mathematical optimization.

Optimization is the process of finding the smallest or largest value (minima or maxima) of a function, often referred to as a loss, or cost function in the minimization case. One of the most popular optimization algorithms used in machine learning is called gradient descent, and another is known as the the normal equation.

In a nutshell, machine learning is all about automatically learning a highly accurate predictive or classifier model, or finding unknown patterns in data, by leveraging learning algorithms and optimization techniques.

Types of Learning

The primary categories of machine learning are supervised, unsupervised, and semi-supervised learning. We will focus on the first two in this article.

In supervised learning, the data contains the response variable (label) being modeled, and with the goal being that you would like to predict the value or class of the unseen data. Unsupervised learning involves learning from a dataset that has no label or response variable, and is therefore more about finding patterns than prediction.

As i’m a huge NFL and Chicago Bears fan, my team will help exemplify these types of learning! Suppose you have a ton of Chicago Bears data and stats dating from when the team became a chartered member of the NFL (1920) until the present (2016).

Imagine that each row of the data is essentially a team snapshot (or observation) of relevant statistics for every game since 1920. The columns in this case, and the data contained in each, represent the features (values) of the data, and may include feature data such as game date, game opponent, season wins, season losses, season ending divisional position, post-season berth (Y/N), post-season stats, and perhaps stats specific to the three phases of the game: offense, defense, and special teams.

In the supervised case, your goal may be to use this data to predict if the Bears will win or lose against a certain team during a given game, and at a given field (home or away). Keep in mind that anything can happen in football in terms of pre and game-time injuries, weather conditions, bad referee calls, and so on, so take this simply as an example of an application of supervised learning with a yes or no response (prediction), as opposed to determining the probability or likelihood of ‘Da Bears’ getting the win.

Since you have historic data of wins and losses (the response) against certain teams at certain football fields, you can leverage supervised learning to create a model to make that prediction.

Now suppose that your goal is to find patterns in the historic data and learn something that you don’t already know, or group the team in certain ways throughout history. To do so, you run an unsupervised machine learning algorithm that clusters (groups) the data automatically, and then analyze the clustering results.

With a bit of analysis, one may find that these automatically generated clusters seemingly groups the team into the following example categories over time:

Strong defense, weak running offense, strong passing offense, weak special teams, playoff berth

Strong defense, strong running offense, weak passing offense, average special teams, playoff berth

Weak defense, strong all-around offense, strong special teams, missed the playoffs

and so on

An example of unsupervised cluster analysis would be to find a potential reason why they missed the playoffs in the third cluster above. Perhaps due to the weak defense? Bears have traditionally been a strong defensive team, and some say that defense wins championships. Just saying…

In either case, each of the above classifications may be found to relate to a certain time frame, which one would expect. Perhaps the team was characterized by one of these groupings more than once throughout their history, and for differing periods of time.

To characterize the team in this way without machine learning techniques, one would have to pour through all historic data and stats, manually find the patterns and assign the classifications (clusters) for every year taking all data into account, and compile the information. That would definitely not be a quick and easy task.

Alternatively, you could write an explicitly coded program to pour through the data, and that has to know what team stats to consider, what thresholds to take into account for each stat, and so forth. It would take a substantial amount of time to write the code, and different programs would need to be written for every problem needing an answer.

Or… you can employ a machine learning algorithm to do all of this automatically for you in a few seconds.

InnoArchiTech post image

Machine Learning Goals and Outputs

Machine learning algorithms are used primarily for the following types of output:

Clustering (Unsupervised)

Two-class and multi-class classification (Supervised)

Regression: Univariate, Multivariate, etc. (Supervised)

Anomaly detection (Unsupervised and Supervised)

Recommendation systems (aka recommendation engine)

Specific algorithms that are used for each output type are discussed in the next section, but first, let’s give a general overview of each of the above output, or problem types.

As discussed, clustering is an unsupervised technique for discovering the composition and structure of a given set of data. It is a process of clumping data into clusters to see what groupings emerge, if any. Each cluster is characterized by a contained set of data points, and a cluster centroid. The cluster centroid is basically the mean (average) of all of the data points that the cluster contains, across all features.

Classification problems involve placing a data point (aka observation) into a pre-defined class or category. Sometimes classification problems simply assign a class to an observation, and in other cases the goal is to estimate the probabilities that an observation belongs to each of the given classes.

A great example of a two-class classification is assigning the class of Spam or Ham to an incoming email, where ham just means ‘not spam’. Multi-class classification just means more than two possible classes. So in the spam example, perhaps a third class would be ‘Unknown’.

Regression is just a fancy word for saying that a model will assign a continuous value (response) to a data observation, as opposed to a discrete class. A great example of this would be predicting the closing price of the Dow Jones Industrial Average on any given day. This value could be any number, and would therefore be a perfect candidate for regression.

Note that sometimes the word regression is used in the name of an algorithm that is actually used for classification problems, or to predict a discrete categorical response (e.g., spam or ham). A good example is logistic regression, which predicts probabilities of a given discrete value.

Another problem type is anomaly detection. While we’d love to think that data is well behaved and sensible, unfortunately this is often not the case. Sometimes there are erroneous data points due to malfunctions or errors in measurement, or sometimes due to fraud. Other times it could be that anomalous measurements are indicative of a failing piece of hardware or electronics.

Sometimes anomalies are indicative of a real problem and are not easily explained, such as a manufacturing defect, and in this case, detecting anomalies provides a measure of quality control, as well as insight into whether steps taken to reduce defects have worked or not. In either case, there are times where it is beneficial to find these anomalous values, and certain machine learning algorithms can be used to do just that.

The final type of problem is addressed with a recommendation system, or also called recommendation engine. Recommendation systems are a type of information filtering system, and are intended to make recommendations in many applications, including movies, music, books, restaurants, articles, products, and so on. The two most common approaches are content-based and collaborative filtering.

Two great examples of popular recommendation engines are those offered by Netflix and Amazon. Netflix makes recommendations in order to keep viewers engaged and supplied with plenty of content to watch. In other words, to keep people using Netflix. They do this with their “Because you watched …“, “Top Picks for Alex”, and “Suggestions for you” recommendations.

Amazon does a similar thing in order to increase sales through up-selling, maintain sales through user engagement, and so on. They do this through their “Customers Who Bought This Item Also Bought”, “Recommendations for You, Alex”, “Related to Items You Viewed”, and “More Items to Consider” recommendations.

Machine Learning Algorithms

We’ve now covered the machine learning problem types and desired outputs. Now we will give a high level overview of relevant machine learning algorithms.

Here is a list of algorithms, both supervised and unsupervised, that are very popular and worth knowing about at a high level. Note that some of these algorithms will be discussed in greater depth later in this series.

Supervised Regression

Simple and multiple linear regression

Decision tree or forest regression

Artificial Neural networks

Ordinal regression

Poisson regression

Nearest neighbor methods (e.g., k-NN or k-Nearest Neighbors)

Supervised Two-class & Multi-class Classification

Logistic regression and multinomial regression

Artificial Neural networks

Decision tree, forest, and jungles

SVM (support vector machine)

Perceptron methods

Bayesian classifiers (e.g., Naive Bayes)

Nearest neighbor methods (e.g., k-NN or k-Nearest Neighbors)

One versus all multiclass

Unsupervised

K-means clustering

Hierarchical clustering

Anomaly Detection

Support vector machine (one class)

PCA (Principle component analysis)

Note that a technique that’s often used to improve model performance is to combine the results of multiple models. This approach leverages what’s known as ensemble methods, and random forests are a great example (discussed later).

If nothing else, it’s a good idea to at least familiarize yourself with the names of these popular algorithms, and have a basic idea as to the type of machine learning problem and output that they may be well suited for.

Summary

Machine learning, predictive analytics, and other related topics are very exciting and powerful fields.

While these topics can be very technical, many of the concepts involved are relatively simple to understand at a high level. In many cases, a simple understanding is all that’s required to have discussions based on machine learning problems, projects, techniques, and so on.

Part two of this series will provide an introduction to model performance, cover the machine learning process, and discuss model selection and associated tradeoffs in detail.

$=========================================================

This is a list of qualities possessed by effective machine learning products. I've been honing it for a while to guide my own thinking on the subject but decided others might find it interesting or useful.

First, here are some pedantic definitions to put us on the same page. Let a product to be something that solves a problem and that someone with that problem might pay money for. Let effective mean that someone with that problem would consider the cost (in money and other resources) of that product to be significantly lower than their estimated cost of solving the same problem using other means. In other words, this page is about the qualities of machine learning products that customers would actually buy.

Solve a Domain-Specific Problem

Generality is not a virtue. There are dozens of idioms on this subject (jack of all trades, etc.). The power of machine learning comes fully into play when you can extract the right features from the data, apply the right combination of processes and methods, incorporate the right kinds of user feedback, and display the results in the right form. Only when you are solving a specific problem in a specific domain do you have the information necessary to know what is 'right'. Don't lose sight of the product's value proposition by doing ML for ML's sake.

Incorporate User Feedback

Machines learn better with a teacher. User feedback is not merely an optimization; Some of the information necessary to learn correctly may not be present in the data, at all. In this (common) case, user feedback serves not merely to improve the solution but to fill the gaps left by the data. This missing information can take many forms, from the immediate context in which a problem is being posed, to domain or expert knowledge, to common sense that a table of data would not deign to reproduce.

Build Process

The learning process---especially feature engineering---dominates the quality of the solution. This process typically involves tasks like sampling, soliciting user feedback, deriving new features, validating models and parameters, evaluating end-to-end quality, and managing the artifacts of learning. The algorithm at the heart of a machine learning product often accounts for less than 1% of the total complexity (using an arbitrary scale of my choosing). This overall complexity means that the engineering required to build a machine learning product is usually indistinguishable from building any other piece of software.

Perform Magic

Nurture the user's perception that you are doing something magical (but remember that you aren't). Customers seek out products that leverage machine learning to solve a problem because they don't know how to solve that problem directly. Often, they don't want to know, and it's important to have an awareness of where to place the metaphorical curtain so the user sees that you are solving their problem but doesn't see the confusing, ugly details that they sought to avoid in the first place. Don't hide everything; Fully automatic systems don't feel as smart as ones that ask smart questions.

Explain Yourself

Communicate what was learned. This can be a visualization, but unfamiliar viz requires interpretation as well, and so a few words are often more effective at conveying the point. The explanation should be about why the system reached the conclusion or made the suggestion that it did. This quality is not in conflict with Magic; It is a corrolary. A system that makes a recommendation does not seem as magical as one that explains that recommendation using aspects of the user's data.

Leverage Ensembles

There is no one algorithm to rule them all. Indeed, the famous NFL (No Free Lunch) Theorem proves that all algorithms are equally good when averaged over all problems. Often, however, where one method fails completely, many other methods will find themselves in rough agreement. As such, ensemble methods (where several independent algorithms are used and the results combined using, e.g., voting) have met with great success.

#-------------

ML Notes

The last few years have seen an explosion of interest in machine learning technology and potential applications. As a non-expert, you’ve probably either had to assess ML technology for your product and business or as a potential investment. The jargon around ML technology is vast, confusing and, unfortunately, increasingly being hijacked by overeager sales teams.

This post is not a primer on ML technology; this post won’t pretend to give you an explanation of deep learning or any specific technology, because these concepts change frequently and are largely irrelevant to much of the decision making. Instead, this post will address how to assess the technology and determine if it will yield pragmatic business value.

Understand the task

Ultimately, ML is meant to be used in the context of a given task, a problem with inputs and a way to objectively assess how right or wrong an output is. While you may not understand the technology being used, it’s crucial to understand the task.

Don’t accept vagueness or something poorly defined like “understanding what a sentence means.” If someone can’t explain what their ML actually does independently of technical jargon, it’s a bad sign.

At a high level there are common kinds of tasks frequently seen in ML: classification, regression and ranking. For instance, image recognition, such as in ImageNet, is a classification task where we have an input image and want to predict the primary subject matter of the image (a photo of a dog, car, etc.).

While you may not understand the technology being used, it’s crucial to understand the task.

Regression is about predicting a real numerical value or values from an input, such as predicting the future value of a home or a stock portfolio. Ranking is about predicting an ordering of items which is “best” in a given setting; for instance, in search ranking, we want to order results that are most relevant for a given query and user profile and history.

So when you’re hearing about an ML pitch of some kind, it’s important to take a step back and get an explanation.

Understand the evaluation metric

Once you understand the task, it’s important to understand how the ML system is being evaluated on that task. Typically, people will define a system evaluation metric that gives a quantitative measure of how well the system does on the task. As an example, in image recognition you can report what percent of the time you predict the right category for an image (e.g. I correctly guessed this was an image of a dog). The common ML tasks (classification, regression and ranking) all have standard evaluation metrics with which it would be worth familiarizing yourself.

Not having a metric is a very bad sign.

It’s unfortunately quite common for people to develop very complex algorithms and technology for problems, but not actually develop an objective evaluation metric. Not having a metric is a very bad sign. There’s no objective way to actually know whether their “super deep learning” actually yields any tangible benefits. When it comes to building ML, or any technology really, for business value, you want to work with people who focus and drive by metrics.

A common and frustrating reality is that more complex ML technology does not necessarily mean improvements on evaluation metrics; especially in environments with limited data, simple techniques frequently outperform more complex ones.

The corollary of this is if you’re building ML, always develop and try simpler methods first. I’ve personally consulted on many projects where people have heavily invested in ML only to find out something vastly simpler (in more than one case just Naive Bayes) performed at least as well, with an order of magnitude more speed and less development time.

Understand how ML improvements impact business metrics

The last and trickiest aspect of assessing ML technology is understanding how improvements on the ML task will impact which business metrics and by how much. Sometimes there’s a very direct relationship. For instance, for ad placement in search results, the ML metric is typically predicting the probability of ad click-through (possibly weighted by expected CPC).

The rate and revenue-generated ad click-through is either a core business metric or closely related to one. In this setting, it makes a lot of sense to invest heavily in ML, because gains will likely improve business metrics.

In other settings, the relationship is less clear. For instance, at Netflix, improving movie recommendation quality by 0.5 percent, while difficult, does not necessarily mean that month-over-month subscriber retention will necessary budge (although something like engagement might).

As a product owner or investor, it’s important that you understand which business metric you want to actually move and whether or not ML improvements might actually yield those changes.

Unsurprisingly, this might be part of why Google invests so heavily in ML, because improvements are strongly correlated with key business and financial metrics. On the flip side, for Apple, a 1 percent improvement to Siri has a much weaker and tenuous relationship with how many iPhones are sold.

If you want to work on ML in products or invest in the area, it’s crucial to understand whether this really is an area where ML can “move” the needle.

Terraform

Pages

Sunday, December 9, 2018

ML

Terraform Advantages

Errors

Saturday, December 8, 2018

Initialization

ML