Machine Learning Notes: Briefing
Basic Machine Learning Problems
● Supervised Learning: You have labelled data for the computer to learn from
○ Regression
○ Classification
● Unsupervised Learning: You don’t have labelled data, but you want to find
patterns in the data
○ Clustering / Dimensionality Reduction
Linear Regression
In the simplest case, we can assume that the relationship between the features and the target is linear:
y = a + bX
In the equation above, y is the target, X is the feature, a is the intercept, and b is the weight of the feature
Using the ordinary least squares (OLS) method, we can estimate a and b from the data
More generally, with several features the model becomes:
y = b0 + b1x1 + b2x2 + … + bnxn
This is still a linear regression model, sometimes called multiple linear regression
b0 is called the bias term, while b1 to bn are the weights of the features
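As a concrete illustration, here is a minimal sketch of fitting such a model with scikit-learn's LinearRegression (an ordinary least squares fit). The synthetic data and the coefficient values 3.0, 2.0 and -1.0 are assumptions made only for this example.

```python
# Minimal sketch: multiple linear regression by ordinary least squares.
# The data below is synthetic and used only for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                                   # two features x1, x2
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression()                                      # fits the bias term b0 by default
model.fit(X, y)
print("b0 (intercept):", model.intercept_)                      # close to 3.0
print("b1..bn (weights):", model.coef_)                         # close to [2.0, -1.0]
```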
Classification
● In classification, we are interested in putting each input sample into one of two (or more) pre-defined classes
● In other words, the target variable y is discrete
● Some common algorithms for classification:
○ Logistic regression
○ Support vector machines
○ Decision Trees
○ k-nearest neighbours (kNN)
● Some regression tasks can be simplified to classification tasks, e.g. by binning a continuous target into discrete ranges
Logistic Regression
We can apply a transformation to the output of linear regression: the logistic function, or **sigmoid function**
σ(z) = 1 / (1 + e^(−z)), where z = b0 + b1x1 + … + bnxn is the linear score
Its value tends to 1 as z tends to +∞, and tends to 0 as z tends to −∞
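Below is a minimal sketch of the sigmoid transform and of fitting a logistic regression with scikit-learn. The toy one-feature dataset and its labelling rule are assumptions for illustration only.

```python
# Minimal sketch: the sigmoid function and a logistic regression fit on toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Maps z to (0, 1); tends to 1 as z -> +inf and to 0 as z -> -inf."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))    # approximately [0.0000, 0.5, 1.0000]

# Toy binary classification: the label is 1 when the single feature is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[2.0]]))               # probability of class 0 and class 1
```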
Decision Trees
● Decision trees are constructed by finding conditions to split the dataset into smaller subsets
● Decision trees can also be used to perform regression (thus the term CART:
Classification And Regression Trees)
● Decision trees are usually vulnerable to overfitting (more on this later), so we usually have to control the depth of the tree (see the sketch below)
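One way to see the effect of limiting tree depth is the sketch below: an unrestricted DecisionTreeClassifier versus one with max_depth=3. The iris dataset and the depth value are assumptions chosen only for this example.

```python
# Minimal sketch: limiting tree depth as a way to curb overfitting.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (None, 3):                          # unrestricted tree vs. depth-limited tree
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train acc={tree.score(X_train, y_train):.2f}, "
          f"test acc={tree.score(X_test, y_test):.2f}")
```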
Choosing ML Algorithms
Reference: Choosing ML Algorithms.
Model Complexity
● A complex model captures complex relationships between X and y, but it is also more likely to pick up noise → overfitting
● A simple model is easy to interpret, but may not be able to capture the true relationship between X and y → underfitting (illustrated in the sketch below)
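A minimal sketch of this trade-off, assuming a noisy sine-shaped dataset and polynomial feature expansion: a degree-1 model underfits, while a degree-15 model fits the training data almost perfectly but scores worse on held-out data.

```python
# Minimal sketch: under- vs. overfitting by varying model complexity (polynomial degree).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(60, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(scale=0.2, size=60)   # noisy sine data (assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):                        # too simple, reasonable, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}: train R^2={model.score(X_train, y_train):.2f}, "
          f"test R^2={model.score(X_test, y_test):.2f}")
```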
Splitting Your Dataset
● It is usually advised that we have three splits of the dataset:
○ training set: for training your model(s)
○ validation/development set: for tuning your model’s hyperparameters
○ test/holdout set: for measuring the final performance of your model
● For an imbalanced dataset, use stratified sampling so that each split preserves the class proportions
● When data is limited, K-fold cross-validation lets each sample serve for both training and validation (see the sketch below)
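Here is a minimal sketch of such a split and of K-fold cross-validation with scikit-learn. The 60/20/20 proportions, the iris dataset and the choice of LogisticRegression are assumptions made for the example.

```python
# Minimal sketch: stratified train/validation/test split plus K-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# 60% train, 20% validation, 20% test; stratify=y preserves class proportions per split.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

# 5-fold cross-validation on the training portion as an alternative when data is scarce.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
print("fold accuracies:", scores)
```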
Evaluation
● When evaluating the performance of a model, we need to have:
○ ground truths: the correct answers / the true labels of the inputs
○ metric: a measure of how good the predictions are compared to the ground truths
Metrics for Regression - MAE (Mean Absolute Error): the mean of |y_true − y_pred|
Metrics for Regression - RMSE (Root Mean Squared Error): the square root of the mean of (y_true − y_pred)²
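A minimal sketch of computing both metrics with sklearn.metrics; the ground truths and predictions below are made-up values for illustration.

```python
# Minimal sketch: MAE and RMSE on assumed example predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])            # ground truths (assumed values)
y_pred = np.array([2.5,  0.0, 2.0, 8.0])            # model predictions (assumed values)

mae = mean_absolute_error(y_true, y_pred)            # mean of |y_true - y_pred|
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # square root of the mean squared error
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```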
Metrics for Classification
○ Accuracy
○ True/False positives/negatives
○ Precision and Recall
○ Area Under the ROC Curve
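A minimal sketch of these classification metrics with sklearn.metrics; the labels and predicted scores below are made-up values for illustration.

```python
# Minimal sketch: accuracy, TP/FP/TN/FN, precision, recall and ROC AUC on assumed data.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, roc_auc_score)

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                           # ground-truth labels (assumed)
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]                           # hard predictions (assumed)
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]           # predicted probabilities (assumed)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP/FP/TN/FN:", tp, fp, tn, fn)
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))          # TP / (TP + FP)
print("recall:", recall_score(y_true, y_pred))                # TP / (TP + FN)
print("ROC AUC:", roc_auc_score(y_true, y_score))             # uses scores, not hard labels
```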
Top 20 Python libraries for data science