Machine Learning Specialization: Arjunan K

Detailed notes on the Machine Learning Specialization by Andrew Ng, offered on Coursera by DeepLearning.AI in collaboration with Stanford Online. Made by Arjunan K.

Supervised Machine Learning: Regression and Classification - Course 1

Intro to Machine Learning

Machine learning is the ability of computers to learn without being explicitly programmed. There are different types of machine learning algorithms:

  1. Supervised Learning
  1. Unsupervised Learning
  1. Recommender Systems
  1. Reinforcement Learning

Supervised Learning

Machines are trained using well "labelled" training data, and on the basis of that data, they predict the output. Labelled data means that some input data is already tagged with the correct output. Supervised learning finds a mapping function that maps the input variable (x) to the output variable (y).

Types of Supervised Learning

  1. Regression
  1. Classification

Unsupervised Learning

Models are not supervised using a labelled training dataset. Instead, the model itself finds the hidden patterns and insights in the given data. It learns from unlabelled data to predict the output.

Types of Unsupervised Learning

  1. Clustering (Group similar data like DNA, Customer, Disease Features)
  1. Anomaly Detection (Finds unusual data points)
  1. Dimensionality Reduction (Compress data to fewer numbers)

REGRESSION

It’s a method for predictive modelling in machine learning in which an algorithm is used to predict continuous outcomes. A commonly used regression algorithm is

Linear Regression

  1. Simple Linear Regression - (one dependent and one independent variable)
  1. Multiple Linear Regression - (one dependent and multiple independent variables)

CLASSIFICATION

It predicts categories: the program learns from the given dataset or observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can also be called targets, labels, or categories. A commonly used classification algorithm is

Logistic Regression

In this note we will be focusing on the math behind the Linear and Logistic Regression Models.

SIMPLE LINEAR REGRESSION

What is Cost Function?

A cost function measures how well a machine learning model performs on a given dataset. It compares the predicted values with the actual (expected) values and summarizes the error as a single real number. It is the average of the loss function (the error between the predicted and actual value for one example) over the training set.

Our aim is to minimize the cost function, which is achieved using Gradient Descent.

Types of cost function.

  1. Mean Squared Error (MSE) for Linear Regression
  1. Log Loss for Logistic Regression

Cost Function for Linear Regression - MSE (Convex)
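As a minimal sketch of the MSE cost for simple linear regression, assuming the model f(x) = w*x + b and the convention of dividing by 2m (the name `compute_cost` and the toy data are illustrative assumptions, not taken from the notes):

```python
def compute_cost(x, y, w, b):
    """MSE cost J(w, b) = (1 / 2m) * sum((w*x_i + b - y_i)^2)."""
    m = len(x)
    total = 0.0
    for x_i, y_i in zip(x, y):
        prediction = w * x_i + b           # model output for one example
        total += (prediction - y_i) ** 2   # squared error (the loss)
    return total / (2 * m)                 # average, halved by convention


# Toy data lying exactly on y = 2x + 1 (made up for illustration)
x_train = [1.0, 2.0, 3.0]
y_train = [3.0, 5.0, 7.0]

cost_perfect = compute_cost(x_train, y_train, w=2.0, b=1.0)  # perfect fit
cost_poor = compute_cost(x_train, y_train, w=0.0, b=0.0)     # poor fit
```

A perfect fit gives a cost of zero, and the cost grows as the predictions move away from the targets.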

Gradient Descent

Gradient descent is an optimization algorithm commonly used to train machine learning models and neural networks. The cost function acts as a barometer, gauging the model's accuracy after each parameter update. The model keeps adjusting its parameters in the direction that reduces the cost until the cost stops decreasing, that is, until it converges to a minimum (which is not necessarily zero). We start with
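The update loop can be sketched as follows for simple linear regression with the MSE cost. This is an illustrative sketch, not the course's code: the function name, the learning rate `alpha`, the iteration count and the toy data are all assumptions.

```python
def gradient_descent(x, y, alpha=0.05, iterations=5000):
    """Batch gradient descent for f(x) = w*x + b using the MSE cost."""
    w, b = 0.0, 0.0   # arbitrary initial guess
    m = len(x)
    for _ in range(iterations):
        # Gradients of the cost with respect to w and b over all examples
        dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
        dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
        # Simultaneous update: both gradients use the old w and b
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b


# Toy data lying exactly on y = 2x + 1 (made up for illustration)
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.0, 5.0, 7.0, 9.0]
w_fit, b_fit = gradient_descent(x_train, y_train)
```

On this noiseless data the parameters approach w = 2 and b = 1.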

Normal Equation (Alternative for Gradient Descent)

Learning Rate

The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated.

1. Batch gradient descent (Used in this course)

Each step of gradient descent uses all the training data. One such full pass over the training set is referred to as a training epoch.

2. Stochastic gradient descent

Each step of gradient descent uses a single training example. Within each epoch it goes through the examples in the dataset and updates the parameters after each one.

3. Mini-batch gradient descent

Mini-batch gradient descent combines concepts from both batch gradient descent and stochastic gradient descent. It splits the training dataset into small batch sizes and performs updates on each of those batches. This approach strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent.
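The three variants differ only in how many examples feed each update. A hedged sketch, assuming the same simple linear model f(x) = w*x + b (the function name, hyperparameters and toy data are illustrative assumptions):

```python
import random


def minibatch_gradient_descent(x, y, batch_size, alpha=0.05, epochs=3000, seed=0):
    """batch_size = len(x): batch GD; batch_size = 1: stochastic GD."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(x)))
    for _ in range(epochs):
        rng.shuffle(indices)   # visit the examples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients estimated from the current batch only
            dj_dw = sum((w * x[i] + b - y[i]) * x[i] for i in batch) / len(batch)
            dj_db = sum((w * x[i] + b - y[i]) for i in batch) / len(batch)
            w -= alpha * dj_dw
            b -= alpha * dj_db
    return w, b


# Toy data lying exactly on y = 2x + 1 (made up for illustration)
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.0, 5.0, 7.0, 9.0]

w_batch, b_batch = minibatch_gradient_descent(x_train, y_train, batch_size=4)
w_mini, b_mini = minibatch_gradient_descent(x_train, y_train, batch_size=2)
w_sgd, b_sgd = minibatch_gradient_descent(x_train, y_train, batch_size=1)
```

Smaller batches make more (but noisier) updates per epoch. On this noiseless toy data all three settings reach roughly the same parameters.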

Multiple Linear Regression

SIMPLE v/s MULTIPLE

WHAT IS VECTORIZATION?

Vectorization speeds up code by replacing explicit loops with operations on whole arrays. Operations such as the dot product of two vectors (also known as the scalar product) are computed in a single library call, which greatly reduces running time.

It works because the underlying hardware performs the element-wise operations in parallel, which also scales well to large datasets.
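As an illustration, the prediction f = w · x + b can be computed with an explicit loop or with one vectorized NumPy call. The values below are made up for the example.

```python
import numpy as np


def dot_loop(w, x):
    """Dot product with an explicit Python loop, one multiply-add at a time."""
    total = 0.0
    for j in range(len(w)):
        total += w[j] * x[j]
    return total


w = np.array([1.0, 2.5, -3.3])
x = np.array([10.0, 20.0, 30.0])
b = 4.0

f_loop = dot_loop(w, x) + b   # loop version
f_vec = np.dot(w, x) + b      # vectorized version, same result
```

Both versions give the same number. The vectorized call pushes the loop into optimized library code, which matters once the vectors have thousands of entries.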

Feature Scaling

Feature scaling is a technique to bring the independent features in the data into a comparable, fixed range.

For example, if a dataset contains a person's weight with values in the range 15 kg to 100 kg, feature scaling can transform all the values to the range 0 to 1, where 0 represents the lowest weight and 1 the highest, instead of representing the weights in kilograms.

Types of Feature Scaling:

1. Standardization

2. Normalization

Standardization (Standard Scaler)

Standardization is a scaling technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation. Also known as Z Score Normalization.
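A minimal sketch of z-score standardization using only the standard library. The helper name and the sample weights are assumptions chosen to match the 15 kg to 100 kg example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)   # population standard deviation
    return [(v - mu) / sigma for v in values]


weights_kg = [15.0, 40.0, 55.0, 70.0, 100.0]
z_scores = standardize(weights_kg)
```

After the transformation the values have mean 0 and standard deviation 1.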

Normalization

The goal of normalization is to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the ranges of values or losing information.

1. Min Max Scaling

The minimum value of the feature is transformed into 0, the maximum value into 1, and every other value into a decimal between 0 and 1.

2. Max Absolute Scaling

Each feature is divided by its maximum absolute value, so the maximal absolute value of each feature in the training set becomes 1. It does not shift/center the data, and thus does not destroy any sparsity.

3. Mean Normalization

It is very similar to Min Max Scaling, except that the mean is subtracted from the data before dividing by the range (max minus min).

4. Robust Scaling

This scaler removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).
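The four normalization variants above can be sketched in plain Python. The function names and sample data are illustrative assumptions. The quartiles use the "inclusive" (linear interpolation) method, which is one of several reasonable definitions.

```python
from statistics import mean, median, quantiles


def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def max_abs_scale(values):
    biggest = max(abs(v) for v in values)
    return [v / biggest for v in values]


def mean_normalize(values):
    mu, lo, hi = mean(values), min(values), max(values)
    return [(v - mu) / (hi - lo) for v in values]


def robust_scale(values):
    q1, _, q3 = quantiles(values, n=4, method="inclusive")   # quartiles
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]           # divide by the IQR


weights_kg = [15.0, 40.0, 55.0, 70.0, 100.0]
scaled_min_max = min_max_scale(weights_kg)
scaled_max_abs = max_abs_scale(weights_kg)
scaled_mean = mean_normalize(weights_kg)
scaled_robust = robust_scale(weights_kg)
```

Robust scaling depends only on the median and quartiles, so a single extreme value changes the result far less than it would with min-max scaling.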

The Big Question – Normalize or Standardize?

What is Feature Engineering?

Feature Engineering is the process of extracting and organizing the important features from raw data in such a way that it fits the purpose of the machine learning model. It can be thought of as the art of selecting the important features and transforming them into refined and meaningful features that suit the needs of the model.

Eg: Creating a feature Area from length and breadth features in data.

Why Polynomial Regression?

If the data is non-linear, linear regression cannot draw a good best-fit line and fails to capture the relationship: its predictions do not come close to reality. Polynomial regression overcomes this problem by identifying the curvilinear relationship between the independent and dependent variables.

LOGISTIC REGRESSION

Here we replace linear function with Logistic/Sigmoid Function

Decision Boundary – Logistic Regression

What is Log Loss?

Log-loss is indicative of how close the prediction probability is to the corresponding actual/true value (0 or 1 in case of binary classification).

Equation of Log Loss Cost Function
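For binary classification, the log loss averages -[y log(p) + (1 - y) log(1 - p)] over the examples, where p is the predicted probability produced by the sigmoid. A minimal sketch (function names and sample probabilities are assumptions for illustration):

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true, y_prob):
    """Average binary cross-entropy over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 0, 1]
loss_good = log_loss(y_true, [0.9, 0.1, 0.8])   # confident and correct
loss_bad = log_loss(y_true, [0.1, 0.9, 0.2])    # confident and wrong
```

Confident but wrong predictions are penalized much more heavily than confident correct ones.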

Learning curves, vectorization, and feature scaling all work the same way for Logistic Regression as for Linear Regression.

Overfitting and Underfitting in Machine Learning

Overfitting and Underfitting are the two main problems that occur in machine learning and degrade the performance of the machine learning models.

Before looking at overfitting and underfitting, let's review some basic terms that help in understanding this topic:

Overfitting

An overfitted model may look efficient, but in reality it is not. The goal of a regression model is to find the best-fit line, but an overfitted model does not give a best fit, so it generates prediction errors on new data.

How to avoid the Overfitting:

Both overfitting and underfitting degrade the performance of a machine learning model. Overfitting is the more common cause, so there are several ways to reduce its occurrence in our model.

Underfitting

How to avoid Underfitting:

Goodness of Fit

A model with a good fit lies between the underfitted and the overfitted model. Ideally it would make predictions with zero error, but in practice this is difficult to achieve.

As we train the model, the errors on the training data go down, and at first the same happens with the test data. If we train for too long, however, performance may degrade because of overfitting, as the model also learns the noise present in the dataset. The errors on the test dataset then start increasing, so the point just before the errors begin to rise is a good point to stop and obtain a good model.

REGULARIZATION

Regularization shrinks (regularizes) the coefficients of the features toward zero. In simple words, in the regularization technique we reduce the magnitude of the coefficients while keeping the same number of features.

Types of Regularization Techniques

There are two main types of regularization techniques: L1 (Lasso) and L2 (Ridge) regularization.

1) Lasso Regularization (L1 Regularization)

In L1, you add a penalty term to the cost function: the sum of the absolute values of the parameter vector (θ), multiplied by the regularization parameter (λ), which can be any positive number, and divided by the size of the data (m). Here (n) is the number of features.

2) Ridge Regularization (L2 Regularization)

In L2, you add a penalty term to the cost function: the sum of the squared values of the parameter vector (θ), multiplied by the regularization parameter (λ), which can be any positive number, and divided by the size of the data (m). Here (n) is the number of features.
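The two penalty terms can be sketched as below. The λ / (2m) scaling is an assumption chosen to match the 1 / (2m) convention often used for the MSE cost; other sources scale by λ / m or by λ alone, which only changes the effective value of λ. Function names and numbers are illustrative.

```python
def l1_penalty(theta, lam, m):
    """Lasso penalty: (lambda / 2m) * sum of |theta_j|."""
    return (lam / (2 * m)) * sum(abs(t) for t in theta)


def l2_penalty(theta, lam, m):
    """Ridge penalty: (lambda / 2m) * sum of theta_j squared."""
    return (lam / (2 * m)) * sum(t ** 2 for t in theta)


theta = [0.5, -2.0, 0.0]   # hypothetical feature coefficients (no bias term)
lasso_term = l1_penalty(theta, lam=1.0, m=10)
ridge_term = l2_penalty(theta, lam=1.0, m=10)
```

Either term is added to the unregularized cost, so larger coefficients make the total cost larger and gradient descent is pushed toward smaller values.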

Advanced Learning Algorithms - Course 2

NEURAL NETWORKS

Activations (a)

Demand Prediction of a Shirt?

Image Recognition

Neural Network Layer

Complex Neural Network

Inference/Make Prediction (Handwritten Digit Recognition)

Forward Propagation

Calculation of first layer vector

Calculation of second layer vector

Calculation of last layer vector

Tensorflow Implementation

Data in Tensorflow

Activation of Vector

Building a Neural Network in Tensorflow

Digit Classification Model

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Three dense layers: 25 units, 15 units, and 1 output unit
layer_1 = Dense(units=25, activation="sigmoid")
layer_2 = Dense(units=15, activation="sigmoid")
layer_3 = Dense(units=1, activation="sigmoid")

model = Sequential([layer_1, layer_2, layer_3])

# Pixel values are elided in the original notes
x = np.array([[0….., 245, ….., 17],
              [0….., 200, ….., 184]])
y = np.array([1, 0])

model.compile(………………)
model.fit(x, y)
model.predict(new_x)

Forward Prop in Single Layer (Major Parts)

import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential(
    [
        tf.keras.Input(shape=(2,)),
        Dense(3, activation='sigmoid', name = 'layer1'),
        Dense(1, activation='sigmoid', name = 'layer2')
     ]
)

model.compile(
    loss = tf.keras.losses.BinaryCrossentropy(),
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.01),
)

model.fit(
    Xt,Yt,            
    epochs=10,
)

General Implementation of Forward Prop in Single Layer

import numpy as np

def sigmoid(z):
    """Logistic function, used as the activation g below."""
    return 1 / (1 + np.exp(-z))

def my_dense(a_in, W, b, g):
    """
    Computes dense layer
    Args:
      a_in (ndarray (n, )) : Data, 1 example
      W    (ndarray (n,j)) : Weight matrix, n features per unit, j units
      b    (ndarray (j, )) : bias vector, j units
      g    activation function (e.g. sigmoid, relu..)
    Returns
      a_out (ndarray (j,))  : j units
    """
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):
        w = W[:, j]                    # weights of unit j
        z = np.dot(w, a_in) + b[j]     # linear part for unit j
        a_out[j] = g(z)                # apply the activation
    return a_out

def my_sequential(x, W1, b1, W2, b2):
    # Forward propagation through two dense layers
    a1 = my_dense(x, W1, b1, sigmoid)
    a2 = my_dense(a1, W2, b2, sigmoid)
    return a2

Artificial General Intelligence - AGI

Vectorization

Dot Product to Matrix Multiplication using Transpose

Matrix Multiplication in Neural Networks

In code for numpy array and vectors

Dense Layer Vectorized

Tensorflow Training

import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential(
    [
        tf.keras.Input(shape=(2,)),
        Dense(3, activation='sigmoid', name = 'layer1'),
        Dense(1, activation='sigmoid', name = 'layer2')
     ]
)

model.compile(
    loss = tf.keras.losses.BinaryCrossentropy(),
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.01),
)

model.fit(
    Xt,Yt,            
    epochs=10,
)

Activation Function Alternatives

Choose Activation Function

For Output Layer

For Hidden Layer

Why do we need activation functions?

Multiclass Classification

Cost Function of Softmax

Don’t use the above code for implementation. A more efficient and numerically accurate alternative is available.

Improved Implementation of Softmax

x1 = 2.0 / 10000
print(f"{x1:.18f}")   # 0.000200000000000000

x2 = (1 + 1/10000) - (1 - 1/10000)
print(f"{x2:.18f}")   # 0.000199999999999978

# Floating-point numbers have finite precision, so intermediate results are rounded.
# To reduce this round-off error in softmax, we can use a different implementation.
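One common way to reduce round-off and overflow is to subtract the largest logit before exponentiating and to compute the loss directly from the logits instead of from already-rounded probabilities. The sketch below shows that idea in plain Python. It is an illustration of the technique, not the course's TensorFlow code, and the function names are assumptions.

```python
import math


def softmax_stable(z):
    """Softmax with the maximum logit subtracted first to avoid overflow."""
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(z, target):
    """-log(softmax(z)[target]) computed without forming the probabilities."""
    z_max = max(z)
    log_sum_exp = z_max + math.log(sum(math.exp(v - z_max) for v in z))
    return log_sum_exp - z[target]


big_logits = [1000.0, 1001.0, 1002.0]   # math.exp(1000.0) alone would overflow
probs = softmax_stable(big_logits)
loss = cross_entropy_from_logits(big_logits, target=2)
```

Shifting all logits by the same constant does not change the softmax output, so the shifted computation gives the same answer while keeping every exponent at or below zero.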

Multi Label Classification

Adam - Adaptive Moment Estimation

Convolutional Neural Network - CNN

Debugging a learning algorithm

Evaluating your Model

Bias and Variance - Model Complexity or High Degree Polynomial

Regularization - Bias and Variance

Baseline

Learning Curve

If we have high bias, increasing the training data is not going to help. It won't decrease the error.

If we have high variance, increasing the training data is going to help. It will decrease the error.

Deciding what to try next revisited

BIAS

VARIANCE

Overfitting - High Variance and Low Bias

Underfitting - High/Low Variance and High Bias

High Variance

High Bias

Bias and Variance Neural Networks

Neural Networks

Iterative loop of ML development

Error Analysis

Error analysis is the iterative process of observing, isolating, and diagnosing erroneous machine learning (ML) predictions.

In error analysis, ML engineers must deal with the challenges of conducting thorough performance evaluation and testing of ML models in order to improve model ability and performance.

Error Analysis works by

The Deepchecks model error analysis check helps identify errors and diagnose their distribution across certain features and values so that you can resolve them.

from deepchecks.tabular.datasets.classification import adult
from deepchecks.tabular.checks import ModelErrorAnalysis

train_ds, test_ds = adult.load_data(data_format='Dataset', as_train_test=True)
model = adult.load_fitted_model()

# We create the check with a slightly lower r squared threshold to ensure that
# the check can run on the example dataset.

check = ModelErrorAnalysis(min_error_model_score=0.3)
result = check.run(train_ds, test_ds, model)
result

# If you want to only have a look at model performance at pre-defined
# segments, you can use the segment performance check.
# (The original snippet passed an undefined validation_ds; test_ds is used here.)

from deepchecks.tabular.checks import SegmentPerformance
SegmentPerformance(feature_1='workclass',
                   feature_2='hours-per-week').run(test_ds, model)

Adding Data

Data Augmentation

A kind of feature engineering where we create more data from existing examples.

For image text recognition, we can make our own data by taking screenshots of text in different fonts at different color grades.

What is Transfer Learning ?

Transfer learning makes use of the knowledge gained while solving one problem and applies it to a different but related problem with the same type of input (for example, an image model for images and an audio model for audio).

For example, knowledge gained while learning to recognize cars can be used to some extent to recognize trucks.

Pre Training

When we train the network on a large dataset (for example, ImageNet), we train all the parameters of the neural network, and the model is learned. It may take hours on your GPU.

Fine Tuning

We can use the new dataset to fine-tune the pre-trained CNN. Suppose the new dataset is very similar to the original dataset used for pre-training. Since the datasets are similar, the same weights can be used to extract features from the new dataset.

  1. If the new dataset is very small, it’s better to train only the final layers of the network to avoid overfitting, keeping all other layers fixed. So remove the final layers of the pre-trained network, add new layers, and retrain only the new layers.
  1. If the new dataset is very large, retrain the whole network with initial weights from the pre-trained model.

How to fine tune if the new dataset is very different from the original dataset?

The earlier layers of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors), but later layers become progressively more specific to the details of the classes contained in the original dataset.

The earlier layers can therefore still help extract features from the new data. If you have only a small amount of data, it is a good idea to fix the earlier layers and retrain the rest of the layers.

If you have a large amount of data, you can retrain the whole network with weights initialized from the pre-trained network.

Full cycle of a machine learning project

Fairness, Bias, and Ethics

Bias

Guidelines

Mitigation Plan - Reduces loss of life and property by minimizing the impact of disasters.

Error Metrics

Confusion Matrix in Machine Learning

Confusion Matrix helps us to display the performance of a model or how a model has made its prediction in Machine Learning.

Confusion Matrix helps us to visualize the point where our model gets confused in discriminating two classes. It can be understood well through a 2×2 matrix where the row represents the actual truth labels, and the column represents the predicted labels.

Accuracy

Accuracy is the simplest metric of all. It is the ratio of the number of correct predictions to the total number of predictions.

Accuracy = ( TP + TN ) / ( TP + TN + FP + FN )

Precision

Precision is the ratio between the True Positives and all the predicted Positives. For our problem statement, that would be the fraction of patients who actually have a heart/rare disease out of all the patients we predicted as having it.

Eg : Suppose I predicted that 10 people in a class have heart disease. Precision asks how many of those predictions were right.

Precision = TP / ( TP + FP )

Recall

Recall is the measure of our model correctly identifying True Positives. Thus, of all the patients who actually have heart disease, recall tells us how many we correctly identified as having it.

Eg : Out of all the people in a class who have heart disease, how many did I predict correctly?

Recall = TP / ( TP + FN )

Trading off Precision and Recall

F1 Score

For some other models, like classifying whether a bank customer is a loan defaulter or not, it is desirable to have a high precision since the bank wouldn’t want to lose customers who were denied a loan based on the model’s prediction that they would be defaulters.

There are also a lot of situations where both precision and recall are equally important. For example, for our model, if the doctor informs us that the patients who were incorrectly classified as suffering from heart disease are equally important since they could be indicative of some other ailment, then we would aim for not only a high recall but a high precision as well.

In such cases, we use the F1-score, which is the harmonic mean of Precision and Recall.

Precision, Recall and F1 Score should be close to one; if they are close to zero, the model is not working well. (General case)
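The four metrics follow directly from the confusion-matrix counts. A small sketch with made-up counts (the function name and the numbers are illustrative assumptions):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # of the predicted positives, how many are right
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return accuracy, precision, recall, f1


# Hypothetical results for 100 patients
accuracy, precision, recall, f1 = classification_metrics(tp=8, fp=2, fn=4, tn=86)
```

Accuracy here is high because most patients are healthy, while recall shows that a third of the actual cases were missed. This is why precision and recall matter for rare classes.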

Decision Tree Model

Decision Tree Learning Process

When to stop splitting?

Measuring Purity

Choosing a split: Information Gain

Reduction of Entropy is Information Gain

Some Initial Steps

This gives the reduction in entropy, known as Information Gain. We then pick the split with the largest Information Gain for the best output in the decision tree.
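Entropy and information gain for a binary split can be sketched as follows. Labels are 1 for the positive class and 0 otherwise. The example labels and function names are assumptions for illustration.

```python
import math


def entropy(p1):
    """Entropy (in bits) of a node where a fraction p1 of labels is positive."""
    if p1 == 0 or p1 == 1:
        return 0.0   # a pure node has zero entropy
    return -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1)


def fraction_positive(labels):
    return sum(labels) / len(labels) if labels else 0.0


def information_gain(root, left, right):
    """Entropy at the root minus the weighted entropy of the two branches."""
    w_left = len(left) / len(root)
    w_right = len(right) / len(root)
    weighted = (w_left * entropy(fraction_positive(left))
                + w_right * entropy(fraction_positive(right)))
    return entropy(fraction_positive(root)) - weighted


root = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
gain_mixed = information_gain(root, left=[1, 1, 1, 1, 0], right=[1, 0, 0, 0, 0])
gain_perfect = information_gain(root, left=[1, 1, 1, 1, 1], right=[0, 0, 0, 0, 0])
```

The split that separates the classes perfectly has the largest possible gain (1 bit here), so it would be chosen over the mixed split.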

Putting it together

Decision Tree uses Recursive Algorithm

Using one-hot encoding of categorical features

Decision Tree for Continuous valued features

Regression Trees

Using Multiple Decision Trees

Sampling with replacement

Random Forest Algorithm

Suppose that in our example we had N features available. Rather than picking from all N features at each node, we pick a random subset of K < N features and allow the algorithm to choose only from that subset. In other words, you pick K features as the allowed features and then, out of those K, choose the one with the highest information gain as the feature to split on. When N is large, say dozens or even hundreds, a typical choice is K = √N.
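The two sources of randomness in a random forest, sampling the training set with replacement and restricting each split to K of the N features, can be sketched like this. The helper names and the sizes are illustrative assumptions.

```python
import math
import random


def bootstrap_sample(examples, rng):
    """Sampling with replacement: draw len(examples) items, repeats allowed."""
    return [rng.choice(examples) for _ in range(len(examples))]


def random_feature_subset(n_features, rng):
    """Pick K = round(sqrt(N)) distinct feature indices allowed at one split."""
    k = max(1, round(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)                 # fixed seed only for reproducibility
training_examples = list(range(10))    # stand-ins for 10 training examples
sample = bootstrap_sample(training_examples, rng)
allowed_features = random_feature_subset(16, rng)   # 4 of 16 features
```

Each tree in the ensemble would be trained on its own bootstrap sample, drawing a fresh feature subset at every split.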

XGBoost

# Classification
from xgboost import XGBClassifier
model = XGBClassifier()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# Regression
from xgboost import XGBRegressor
model = XGBRegressor()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

When to use Decision Trees

| Decision Tree and Tree Ensembles | Neural Networks |
| --- | --- |
| Works well on tabular data | Works well on tabular (structured) and unstructured data |
| Not recommended for images, audio and text | Recommended for images, audio, and text |
| Fast | Slower than DT |
| Small decision tree may be human interpretable | Works with transfer learning |
| We can train one decision tree at a time. | When building a system of multiple models working together, multiple NNs can be strung together easily. We can train them all together using gradient descent. |

Unsupervised Learning, Recommenders, Reinforcement Learning - Course 3

What is Clustering?

K- Means Intuition

K-means Algorithm

K - Means can be also helpful for data that are not that much separated

Optimization objective of K-Means - Distortion Function

Each step of K-means either reassigns points to their closest centroid or moves each centroid to the mean of its assigned points, and neither step can increase the cost. The distortion (cost) function therefore goes down or stays the same on every iteration and is guaranteed to converge. Once the change in distortion is smaller than a small threshold, K-means has converged and there is no need to keep running it.
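A compact sketch of the K-means loop with this stopping rule, in plain Python. The data points, the initial centroids and the threshold `tol` are assumptions for illustration.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iters=100, tol=1e-9):
    centroids = list(centroids)
    history = []        # distortion after each iteration
    assignments = []
    for _ in range(max_iters):
        # Step 1: assign every point to its closest centroid
        assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Step 2: move every centroid to the mean of its assigned points
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:   # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
        # Distortion: average squared distance from each point to its centroid
        distortion = sum(
            squared_distance(p, centroids[a]) for p, a in zip(points, assignments)
        ) / len(points)
        history.append(distortion)
        # Stop once the distortion has (almost) stopped decreasing
        if len(history) > 1 and history[-2] - history[-1] < tol:
            break
    return centroids, assignments, history


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
final_centroids, labels, distortions = kmeans(points, centroids=[points[0], points[1]])
```

The recorded distortions never increase from one iteration to the next, which is the property the convergence check relies on.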

Initializing K-means

Choosing the Number of Clusters

Types of method to choose value of K

Elbow Method

  1. We plot the Cost Function for different K
  1. It will look like Elbow

Anomaly Detection

Fraud Detection and in Manufacturing Checking

Gaussian (Normal) Distribution

Anomaly Detection Algorithm

Developing and evaluating an anomaly detection system

Anomaly detection vs. supervised learning

Anomaly detection is more flexible than supervised learning (SL): SL learns only from the kinds of examples available in the data, so if an entirely new kind of problem appears, SL cannot catch it, but anomaly detection can.

Choosing What Features to Use

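import numpy as np
import matplotlib.pyplot as plt

# x is assumed to be a 1-D NumPy array of non-negative feature values.
# Feature transformations to make the distribution more Gaussian
plt.hist(np.log(x + 1), bins=50)
plt.hist(x**0.5, bins=50)
plt.hist(x**0.25, bins=50)
plt.hist(x**2, bins=50)

# The number of bins is increased to make the histogram look less blocky:
# the bars get narrower as the number of bins increases.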

Making Recommendations

Using per-item features

Collaborative filtering algorithm

The name collaborative filtering comes from the fact that multiple users have collaboratively rated the same movie, which gives a sense of what the movie may be like. That lets you guess appropriate features for the movie, which in turn lets you predict how users who have not yet rated it may rate it in the future.

Binary labels: favs, likes and clicks

So far, our problem formulation has used movie ratings from 1-5 stars or from 0-5 stars. A very common use case of recommender systems is when you have binary labels, such as whether the user favorited, liked, or interacted with an item. The model seen so far can be generalized to binary labels.

Mean Normalization

TensorFlow Implementation of Collaborative Filtering

import tensorflow as tf
from tensorflow import keras

# Instantiate an optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

iterations = 200
for iter in range(iterations):
    # Use TensorFlow's GradientTape
    # to record the operations used to compute the cost
    with tf.GradientTape() as tape:
        # Compute the cost (forward pass is included in cost).
        # cofiCostFuncV and its inputs are defined elsewhere; "lambda" is a
        # reserved word in Python, so the parameter is named lambda_ here.
        cost_value = cofiCostFuncV(X, W, b, Ynorm, R, num_users, num_movies, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient(cost_value, [X, W, b])
    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, [X, W, b]))

Finding Related Items

Limitations of Collaborative Filtering

  1. Cold start problem. How to:
    • rank new items that few users have rated?
  1. Use side information about items or users:
    • Item: genre, movie stars, studio, …
    • User: demographics (age, gender, location), expressed preferences, …

Collaborative filtering vs Content-based filtering

Examples of user and item features

The features User and Movie can be clubbed together to form a vector.

User Features

Movie Feature

To predict the movie rating we can use a linear-regression-style model in which w depends on the user feature vector and x depends on the movie feature vector.

For this we need vectors for User and Movies

Deep learning for content-based filtering

Recommending from a large catalogue

There is always a large set of items when it comes to movies, songs, ads, or products, so running a neural network inference millions of times whenever a user logs in is impractical.

So there are two steps

Retrieval

  1. Generate a large list of plausible item candidates, for example:
    1. For each of the last 10 movies watched by the user, find the 10 most similar movies
    1. For the most viewed genres, find the top 10 movies
    1. Top 20 movies in the country
  1. Combine the retrieved items into a list, removing duplicates and items already watched, purchased, etc.

Ranking

  1. Take list retrieved and rank them based on learned model
  1. Display ranked items to user

Retrieving more items results in better performance, but slower recommendations

To analyze this trade-off, run experiments offline and check whether retrieving additional items results in more relevant recommendations, that is, a higher p(y) for the items shown to the user.

Ethical use of recommender systems

What is the goal of the recommender system?

Illegal Things

Other problematic cases:

TensorFlow implementation of content-based filtering

First create the 2 NN of user and movie/item

import tensorflow as tf

# num_user_features and num_item_features are assumed to be defined elsewhere.
user_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])
item_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])

# create the user input and point to the base network
input_user = tf.keras.layers.Input(shape=(num_user_features,))
vu = user_NN(input_user)
vu = tf.linalg.l2_normalize(vu, axis=1)

# create the item input and point to the base network
input_item = tf.keras.layers.Input(shape=(num_item_features,))
vm = item_NN(input_item)
vm = tf.linalg.l2_normalize(vm, axis=1)

# measure the similarity of the two vector outputs
output = tf.keras.layers.Dot(axes=1)([vu, vm])

# specify the inputs and output of the model
model = tf.keras.Model([input_user, input_item], output)

# Specify the cost function
cost_fn = tf.keras.losses.MeanSquaredError()

What is Reinforcement Learning?

Applications

Mars Rover example

Consider a rover on Mars trying to find water, rock, or surface features. The position of the rover is the state. The rewards we assign to states determine the target for the rover. From each state it can go left or right. The final state, after which nothing more happens, is called the terminal state. The rover learns from its mistakes.

Each step is described by 4 values:

  1. The current state it is having
  1. The action it took
  1. The reward it got for action
  1. The new state

The Return in Reinforcement Learning

Making Decisions: Policies in Reinforcement Learning

The goal of reinforcement learning is to find a policy π that tells you what action a = π(s) to take in every state s so as to maximize the return.

Review of Key Concepts

A Markov Decision Process (MDP) is a process in which the future depends only on where you are now, not on how you got here.

State-Action Value Function - Q Function

The Final Reward Value, Discount Factor (gamma) are the one depend upon the

Optimal Policy and Q -Function

Bellman Equations

Total Return has two parts

  • The reward you get right away
  • The reward you get from next state due to our action
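These two parts give the Bellman equation Q(s, a) = R(s) + γ · max over a' of Q(s', a'). The sketch below applies it repeatedly to a six-state rover world. The rewards (100 at the left end, 40 at the right end, 0 elsewhere) and the discount factor 0.5 are assumptions chosen for illustration, as are the function and variable names.

```python
def compute_q_values(rewards, terminal, gamma, sweeps=100):
    """Repeatedly apply Q(s, a) = R(s) + gamma * max_a' Q(s', a')."""
    n = len(rewards)
    q = [[0.0, 0.0] for _ in range(n)]   # q[s] = [Q(s, left), Q(s, right)]
    for _ in range(sweeps):
        for s in range(n):
            if s in terminal:
                # Nothing happens after a terminal state: return is just R(s)
                q[s] = [rewards[s], rewards[s]]
                continue
            for a, step in enumerate((-1, +1)):   # 0 = left, 1 = right
                s_next = s + step
                # Reward right away + discounted best return from the next state
                q[s][a] = rewards[s] + gamma * max(q[s_next])
    return q


rewards = [100.0, 0.0, 0.0, 0.0, 0.0, 40.0]
q_values = compute_q_values(rewards, terminal={0, 5}, gamma=0.5)
policy = ["left" if left >= right else "right" for left, right in q_values]
```

Picking the action with the larger Q value in each state gives the policy. Under these assumed rewards, only the state next to the right end prefers going right.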

Random (Stochastic) Environment

Example of continuous state space applications

Lunar Lander

It mainly has 4 actions to do

We can represent the state as a vector of the position X and Y and the tilt θ, along with the binary values l and r, which indicate whether the left or right leg is on the ground.

Learning the State-Value Function

Here we apply an approach similar to linear regression, where we predict y as a function of an input x.

Learning Algorithm

This algorithm is called DQN (Deep Q-Network): we use deep learning and a neural network to learn the Q-function.

Algorithm refinement: Improved neural network architecture

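The current algorithm is inefficient because we have to run inference 4 times, once per action, from each state.

If we change the neural network so that it takes only the 8 state inputs and outputs the Q values of all 4 actions at once, it becomes much more efficient.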

Algorithm refinement: ϵ-greedy policy

Start with a high ϵ to take more random actions (exploration), then slowly decrease it so that the agent increasingly takes the actions it has learned are best. Reinforcement learning can take much longer to learn when parameters such as ϵ are not set well.
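An ϵ-greedy action choice with a decaying ϵ can be sketched as follows. The decay rate, the minimum value and the function names are illustrative assumptions.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon explore, otherwise take the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit


def decay_epsilon(epsilon, decay=0.995, minimum=0.01):
    """Shrink epsilon a little after each episode, but never below minimum."""
    return max(minimum, epsilon * decay)


q_for_state = [0.1, 0.9, 0.3, 0.2]   # hypothetical Q values for 4 actions
greedy_action = epsilon_greedy_action(q_for_state, epsilon=0.0)
next_epsilon = decay_epsilon(1.0)
```

With ϵ = 0 the choice is purely greedy. With ϵ = 1 every action is random, which is where training typically starts.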

Algorithm Refinement: Mini-Batch and Soft Updates

Mini-batching works for both supervised learning and reinforcement learning.

If 10,000 training examples are available, we use only a subset of 1,000 on each iteration, which makes each iteration much faster but a little noisy.

With a soft update, we do not replace the old Q-network parameters outright with the newly trained ones. Instead, we move them only a small step toward the new values, so the parameters change gradually.
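Both refinements fit in a few lines. In this sketch the mixing factor `tau = 0.01`, the buffer contents and the function names are assumptions for illustration.

```python
import random


def sample_minibatch(replay_buffer, batch_size, rng):
    """Pick a random subset of stored experiences for one training step."""
    return rng.sample(replay_buffer, batch_size)


def soft_update(old_weights, new_weights, tau=0.01):
    """Move each weight a small fraction tau of the way toward its new value."""
    return [tau * w_new + (1 - tau) * w_old
            for w_old, w_new in zip(old_weights, new_weights)]


rng = random.Random(0)
replay_buffer = list(range(10000))   # stand-ins for 10,000 stored experiences
minibatch = sample_minibatch(replay_buffer, 1000, rng)

updated = soft_update([1.0, 2.0], [2.0, 4.0], tau=0.5)
```

A small `tau` means a single noisy training step can only nudge the parameters, which tends to make learning more stable.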

The State of Reinforcement Learning

Limitation

Overview

Courses

Thanks for reading this far…..

Made By Arjunan K