# Statistical Learning


# 1️⃣ The Supervised Learning Problem

Outcome measurement $Y$ (also called dependent variable, response, target).
Vector of $p$ predictor measurements $X$ (also called inputs, regressors, covariates, features, independent variables).
In the *regression problem*, $Y$ is quantitative. In the *classification problem*, $Y$ takes values in a finite, unordered set (survived/died, digit 0-9, cancer class of tissue sample).
We have training data $(x_1, y_1), \ldots, (x_N, y_N)$. These are observations (examples, instances) of these measurements.
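As a rough illustration of this setup, the sketch below stores a tiny training set and predicts the outcome for a new input. The data values are made up, and the nearest-neighbor rule is only a stand-in chosen for brevity, not a method these notes prescribe. The same inputs are paired once with a quantitative outcome (regression) and once with a categorical one (classification).

```python
import math

# Hypothetical toy training data: each x_i is a vector of p = 2 predictors.
X_train = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]

# Regression: the outcome Y is quantitative.
y_regression = [1.5, 1.7, 8.4, 8.6]

# Classification: the outcome Y lies in a finite, unordered set.
y_classification = ["survived", "survived", "died", "died"]


def nearest_neighbor_predict(X, y, x_new):
    """Return the outcome of the training point closest to x_new."""
    distances = [math.dist(x_i, x_new) for x_i in X]
    closest = distances.index(min(distances))
    return y[closest]


x_new = (8.2, 9.1)  # an unseen input (illustrative value)
pred_regression = nearest_neighbor_predict(X_train, y_regression, x_new)
pred_class = nearest_neighbor_predict(X_train, y_classification, x_new)
```

Here the nearest training point to `x_new` is `(8.0, 9.0)`, so the regression prediction is that point's numeric outcome and the classification prediction is its label.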

On the basis of the training data we would like to:

It is important to understand the ideas behind the various techniques, in order to know how and when to use them.
One has to understand the simpler methods first, in order to grasp the more sophisticated ones.
It is important to accurately assess the performance of a method, to know how well or how badly it is working.
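One common way to assess performance, sketched here under assumptions, is to fit on some observations and measure the error on others that were held out. The data, the train/test split, and the two candidate models (a constant mean predictor and a least-squares line) are all hypothetical choices for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted outcomes."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data, split by hand into training and held-out test sets.
x_train = [0.0, 1.0, 3.0, 4.0, 6.0, 7.0]
y_train = [1.1, 2.9, 7.2, 8.8, 13.1, 14.9]
x_test = [2.0, 5.0]
y_test = [5.1, 10.9]

# Model 1: always predict the training mean.
train_mean = sum(y_train) / len(y_train)
baseline_test_mse = mse(y_test, [train_mean for _ in x_test])

# Model 2: a straight line fitted on the training data only.
intercept, slope = fit_line(x_train, y_train)
line_test_mse = mse(y_test, [intercept + slope * x for x in x_test])
```

Comparing `baseline_test_mse` with `line_test_mse` on data neither model saw during fitting gives a direct, if small-scale, answer to how well each one is working.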

TIP

Simpler methods often perform as well as fancier ones!

This is an exciting research area, having important applications in science, industry and finance.
Statistical learning is a fundamental ingredient in the training of a modern *data scientist*.

In unsupervised learning there is no outcome variable, just a set of predictors (features) measured on a set of samples.
The objective is fuzzier: find groups of samples that behave similarly, find features that behave similarly, find linear combinations of features with the most variation.
It is difficult to know how well you are doing.
It is different from supervised learning, but can be useful as a pre-processing step for supervised learning.
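To make "linear combinations of features with the most variation" concrete, here is a minimal sketch that finds the direction of greatest variance for a handful of two-feature samples. The sample values are invented, and power iteration on the sample covariance matrix is just one simple way to get that direction, used here because it needs no external libraries.

```python
import math

# Hypothetical samples with two features each and no outcome variable.
samples = [(2.0, 1.9), (4.0, 4.2), (6.0, 5.8), (8.0, 8.1), (10.0, 10.0)]


def covariance_matrix(data):
    """Sample covariance matrix of the feature columns."""
    n = len(data)
    p = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    return [
        [sum(r[j] * r[k] for r in centered) / (n - 1) for k in range(p)]
        for j in range(p)
    ]


def leading_direction(cov, iterations=100):
    """Approximate the top eigenvector of cov by power iteration."""
    p = len(cov)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


cov = covariance_matrix(samples)
direction = leading_direction(cov)  # unit vector of feature weights
```

Because the two invented features rise together, the resulting weights are nearly equal, meaning the combination with the most variation is close to their average.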

The Netflix Prize competition started in October 2006. The training data is ratings for 18000 movies by 400000 Netflix customers, each rating between 1 and 5.
The training data is very sparse: about 98% missing.
The objective is to predict the rating for a set of 1 million customer-movie pairs that are missing in the training data.
Netflix's original algorithm achieved a root MSE of 0.953. The first team to achieve a 10% improvement wins one million dollars.
Is this a supervised or unsupervised learning problem?
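The scoring can be sketched as follows. The root MSE function is the standard definition; the sample ratings are made up, and reading "10% improvement" as a 10% reduction in root MSE relative to 0.953 is an assumption of this sketch.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Figure quoted in the text for Netflix's original algorithm.
baseline_rmse = 0.953

# Assumed reading of the prize condition: a 10% lower root MSE.
target_rmse = baseline_rmse * (1 - 0.10)  # roughly 0.8577

# Hypothetical ratings on the 1-5 scale, for illustration only.
actual_ratings = [4, 3, 5, 1]
predicted_ratings = [3.5, 3.0, 4.0, 2.0]
example_rmse = rmse(actual_ratings, predicted_ratings)
```

Under that reading, a winning entry would need a root MSE of about 0.858 or lower on the held-out customer-movie pairs.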

Machine learning arose as a subfield of Artificial Intelligence.
Statistical learning arose as a subfield of Statistics.
There is much overlap, since both fields focus on supervised and unsupervised problems:
- Machine learning has a greater emphasis on *large scale* applications and *prediction accuracy*.
- Statistical learning emphasizes *models* and their interpretability, and *precision* and *uncertainty*.
But the distinction has become more and more blurred, and there is a great deal of "cross-fertilization".
Machine learning has the upper hand in Marketing!