If you want to truly understand something, build it from scratch. This post is for every software engineer who is already anticipating Skynet and still perceives AI as some kind of sorcery.
Neural Network from scratch
When designing a neural network, machine learning engineers usually reach for a high-level library like TensorFlow (my personal favorite) or a wrapper on top of TensorFlow like Keras. And yes, old-school ML engineers are still using Theano. With all these tools at your disposal, it is rather tempting to view neural networks as a black box and not spend too much time thinking about low-level implementation details like backpropagation, the chain rule, weight initialization, etc. However, understanding these concepts can be crucial when fine-tuning a neural network. Choosing the optimal number of hidden layers, the optimal size of each layer, the right activation function, the learning rate, the regularization weights, the dropout rate, etc. is not an easy task. Knowing how deep learning works under the hood will certainly help you debug and optimize your network. Otherwise, you will be left shooting in the dark, guessing at the optimal configuration.
In this post I will discuss my solution to the first assignment of the excellent Udacity program “Deep Learning Nanodegree Foundation”. Before reading any further, I strongly recommend watching the video below. It is a very well structured Stanford lecture on neural networks, which discusses backpropagation and the chain rule:
At 59:07 you can see a very small implementation of a neural network. It is as simple as it gets: just an input layer, an output layer, and a single hidden layer between them. Without restricting ourselves to 11 lines of Python code, let's implement a neural network from scratch and run it.
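To give you an idea of what such a minimal network looks like, here is a sketch in the same spirit (not the exact snippet from the lecture): a tiny two-layer network trained on toy data with plain NumPy, with variable names of my own choosing.

```python
import numpy as np

# Toy data: 4 examples with 3 binary inputs each, and one binary target
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T

w0 = 2 * np.random.random((3, 4)) - 1   # input -> hidden weights
w1 = 2 * np.random.random((4, 1)) - 1   # hidden -> output weights

for _ in range(10000):
    l1 = 1 / (1 + np.exp(-X.dot(w0)))               # hidden layer (sigmoid)
    l2 = 1 / (1 + np.exp(-l1.dot(w1)))              # output layer (sigmoid)
    l2_delta = (y - l2) * l2 * (1 - l2)             # output error * sigmoid derivative
    l1_delta = l2_delta.dot(w1.T) * l1 * (1 - l1)   # error backpropagated via the chain rule
    w1 += l1.T.dot(l2_delta)                        # weight updates
    w0 += X.T.dot(l1_delta)
```

Our network below will follow the same structure, just with a bit more bookkeeping around it.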
The Data
First, we need some data. Start by downloading the Bike Sharing Dataset. Let’s take a look at the data:
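Assuming the archive has been extracted next to the notebook, loading the hourly file (hour.csv) with pandas might look like this (the path is mine, adjust as needed):

```python
import pandas as pd

# Path to the hourly data file from the Bike Sharing Dataset archive
data_path = 'Bike-Sharing-Dataset/hour.csv'
rides = pd.read_csv(data_path)

# Inspect the first few rows
rides.head()
```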
This dataset has the number of riders for each hour of each day, from January 1, 2011 to December 31, 2012. The number of riders is split between casual and registered, and summed up in the cnt column. You can see the first few rows of the data in the image above.
Let’s plot the number of bike riders over the first 20 days in the data set.
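A plotting sketch with matplotlib, assuming the rides DataFrame from above; the first 20 days are simply the first 24 * 20 hourly rows:

```python
import matplotlib.pyplot as plt

# The first 20 days correspond to the first 24 * 20 hourly records
rides[:24 * 20].plot(x='dteday', y='cnt', figsize=(12, 4), legend=False)
plt.ylabel('Hourly bike rentals (cnt)')
plt.show()
```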
You can see the hourly rentals here. This data is pretty complicated! The weekends have lower overall ridership, and there are spikes when people are biking to and from work during the week. Looking at the data above, we also have information about temperature, humidity, and wind speed, all of which likely affect the number of riders. The neural network will be trying to capture all of this.
Let's convert the categorical variables into binary dummy variables, so that they can be used as inputs to the neural network:
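One way to do this is with pandas get_dummies; the field lists below are my assumptions about which columns are categorical and which ones we drop afterwards:

```python
# Categorical fields; each gets expanded into one-hot dummy columns
dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']
for field in dummy_fields:
    dummies = pd.get_dummies(rides[field], prefix=field, dtype=float)
    rides = pd.concat([rides, dummies], axis=1)

# Drop the original categorical columns plus fields we will not use
fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',
                  'weekday', 'atemp', 'mnth', 'workingday', 'hr']
data = rides.drop(fields_to_drop, axis=1)
```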
To make the neural network converge faster, let's standardize each of the continuous variables. That is, we'll shift and scale the variables so that they have zero mean and a standard deviation of 1. This is also known as standard scaling.
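A sketch of the scaling step; the means and standard deviations are kept in a dict (here called scaled_features, a name of my own choosing) so that predictions can be converted back to real rider counts later:

```python
# Continuous variables to standardize to zero mean, unit variance
quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']

scaled_features = {}
for each in quant_features:
    mean, std = data[each].mean(), data[each].std()
    scaled_features[each] = [mean, std]
    data.loc[:, each] = (data[each] - mean) / std
```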
Let’s save the last 21 days of the data to use as a test set after the network is trained. The test set is going to be used to make predictions and compare them with the actual number of riders.
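A sketch of the split, assuming the data DataFrame from the previous steps; cnt, casual and registered are the target columns:

```python
# Hold out the last 21 days (hourly records) as a test set
test_data = data[-21 * 24:]
data = data[:-21 * 24]

# Separate the features from the targets we want to predict
target_fields = ['cnt', 'casual', 'registered']
features, targets = data.drop(target_fields, axis=1), data[target_fields]
test_features = test_data.drop(target_fields, axis=1)
test_targets = test_data[target_fields]
```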
The Neural Network (pure magic)
Usually, you would make one more split and create a validation set in order to observe and control the bias and variance of the neural network. Although this is EXTREMELY important, I will skip this step here, as I want to focus on the network implementation. If you want to learn more about identifying and controlling bias and variance, take a look at Andrew Ng's lecture Machine learning W6 4 Diagnosing Bias vs Variance.
So let’s implement the neural network:
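Below is a sketch of such a network: a single hidden layer with a sigmoid activation and a linear output unit, since we are predicting a count rather than a class. The class and its method names are my own, and only NumPy is used:

```python
import numpy as np

class NeuralNetwork(object):
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        self.lr = learning_rate
        # Initialize weights from a normal distribution scaled by layer size
        self.weights_input_to_hidden = np.random.normal(
            0.0, input_nodes ** -0.5, (input_nodes, hidden_nodes))
        self.weights_hidden_to_output = np.random.normal(
            0.0, hidden_nodes ** -0.5, (hidden_nodes, output_nodes))
        # Sigmoid activation for the hidden layer
        self.activation_function = lambda x: 1 / (1 + np.exp(-x))

    def train(self, features, targets):
        n_records = features.shape[0]
        delta_weights_i_h = np.zeros(self.weights_input_to_hidden.shape)
        delta_weights_h_o = np.zeros(self.weights_hidden_to_output.shape)

        for X, y in zip(features, targets):
            ### Forward pass ###
            hidden_inputs = np.dot(X, self.weights_input_to_hidden)
            hidden_outputs = self.activation_function(hidden_inputs)
            # Linear output unit -- this is a regression problem
            final_outputs = np.dot(hidden_outputs, self.weights_hidden_to_output)

            ### Backward pass (backpropagation via the chain rule) ###
            error = y - final_outputs
            # The output is linear, so the output delta is just the error
            output_error_term = error
            hidden_error = np.dot(self.weights_hidden_to_output, output_error_term)
            # Multiply by the derivative of the sigmoid
            hidden_error_term = hidden_error * hidden_outputs * (1 - hidden_outputs)

            delta_weights_i_h += hidden_error_term * X[:, None]
            delta_weights_h_o += output_error_term * hidden_outputs[:, None]

        # Gradient descent step, averaged over the batch
        self.weights_input_to_hidden += self.lr * delta_weights_i_h / n_records
        self.weights_hidden_to_output += self.lr * delta_weights_h_o / n_records

    def run(self, features):
        # Forward pass only, for making predictions
        hidden_outputs = self.activation_function(
            np.dot(features, self.weights_input_to_hidden))
        return np.dot(hidden_outputs, self.weights_hidden_to_output)
```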
Let’s train the network. It takes a couple of minutes…
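A training loop might look like the sketch below; the hyperparameter values are illustrative, not tuned:

```python
import numpy as np

# Hyperparameters -- illustrative values, found by trial and error
iterations = 3000
learning_rate = 0.5
hidden_nodes = 10
output_nodes = 1

network = NeuralNetwork(features.shape[1], hidden_nodes,
                        output_nodes, learning_rate)

for ii in range(iterations):
    # Train on a random mini-batch of 128 records
    batch = np.random.choice(features.index, size=128)
    X = features.loc[batch].values.astype(float)
    y = targets.loc[batch]['cnt'].values
    network.train(X, y)

    if ii % 100 == 0:
        # Mean squared error over the whole training set
        preds = network.run(features.values.astype(float)).T
        train_loss = np.mean((preds - targets['cnt'].values) ** 2)
        print('Iteration {:4d}, training loss: {:.4f}'.format(ii, train_loss))
```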
Let’s check how well the data is being predicted:
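To compare predictions against the actual rider counts, we undo the standard scaling of cnt and plot both series over the test period (a sketch, assuming the variables from the previous steps):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 4))

# Undo the standard scaling so predictions are in actual rider counts
mean, std = scaled_features['cnt']
predictions = network.run(test_features.values.astype(float)).T * std + mean

ax.plot(predictions[0], label='Prediction')
ax.plot((test_targets['cnt'] * std + mean).values, label='Data')
ax.set_ylabel('Number of riders')
ax.legend()
plt.show()
```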
As you can see, you can easily implement a neural network from scratch that gets you pretty decent predictions for some real-world problems. Nevertheless, there is still a long way to go before Skynet becomes operational.