Deep learning, with its immense potential, has really fired my imagination. So much so that I signed up for a Udacity Deep Learning course. I am going to document my journey as I learn deeply about this topic (sorry, couldn't resist the pun). Now the journey itself is going fairly slowly, given that I work for one of the hottest technology companies in Silicon Valley - talk about a first world problem.
The first project is predicting the number of bikes that a bike sharing company needs on the road to meet demand, based on past rental data. The data itself comes from the UCI Machine Learning Repository. Bike sharing companies have come into vogue around the world, and the beauty is that individual rides are recorded, leading to a virtual sensor network for sensing the mobility of a city (I didn't come up with this astute observation - UCI did).
This neural network (NN) was built using only general-purpose Python libraries like NumPy, Pandas and Matplotlib; no machine learning or deep learning framework was used. I will save the details of building the network itself for a subsequent blog and focus on a few interesting observations.
Observation 1: There is a ton of data, and a ton of data overwhelms
There were 17,000 rows of data capturing the ride rentals by the hour. There were 59 features such as windspeed, temperature and humidity to bring color to this data. The sheer amount of data was overwhelming.
Once I plotted the data (thanks Udacity - they provided most of the code and I just spent time building the NN) there was some good news. Seemed like there was a pattern here.
Observation 2: Jupyter notebook and Python rock
I am quite taken by the Python data libraries and specifically Jupyter notebook. Now if only I could start getting a handle on these libraries, that would be wonderful. That said, the Python documentation is quite terse and, without a cookbook or sample code, it is hard to see what's going on. As a Java guy, I wonder why Java can't be this simple :-).
Observation 3: The magic is in the hidden layer of the NN
The input layer is where the data comes in (all 59 columns of it and iterated over all 17k rows) and the output is where the prediction comes out - simple enough.
The interesting bit happens in the "perceptron", that is, each node in the hidden layer.
Each input is multiplied by a weight that starts out random - cue in crazy matrix math - and the weighted sum is fed into a sigmoid function and pushed to the output layer. The sigmoid function squashes that sum into a value between 0 and 1, and this non-linearity is what lets the network pick up patterns more complicated than a straight line.
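Here is a minimal sketch of that forward pass in NumPy. It is not the course's code: the 8 hidden nodes, the random starting weights and the made-up input row are all assumptions for illustration, and I am assuming the output node is left linear (no sigmoid) since the network predicts a count.

```python
import numpy as np


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def forward_pass(features, weights_input_hidden, weights_hidden_output):
    """Run one forward pass through a single-hidden-layer network."""
    # Weighted sum of the inputs arriving at each hidden node.
    hidden_inputs = features @ weights_input_hidden
    # The sigmoid turns each weighted sum into a value between 0 and 1.
    hidden_outputs = sigmoid(hidden_inputs)
    # Assumption: the output node is linear, so it is just a weighted sum.
    prediction = hidden_outputs @ weights_hidden_output
    return hidden_outputs, prediction


rng = np.random.default_rng(0)
n_features, n_hidden = 59, 8  # 59 matches the post; 8 is a hypothetical choice

# Random starting weights, scaled down so the sigmoid does not saturate.
w_ih = rng.normal(0.0, n_features ** -0.5, size=(n_features, n_hidden))
w_ho = rng.normal(0.0, n_hidden ** -0.5, size=(n_hidden, 1))

# One made-up row standing in for a single hour of rental data.
one_hour = rng.normal(size=(1, n_features))

hidden, prediction = forward_pass(one_hour, w_ih, w_ho)
print(hidden.shape, prediction.shape)
```

The "crazy matrix math" is really just the two `@` operations: one row of 59 inputs becomes 8 hidden values, and those 8 become a single prediction.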
Observation 4: The real magic is in the back propagation and gradient descent
What's really magical is that the NN is fed the data on the forward pass, the prediction is compared to the actual data and an error is computed. This error is pushed back through the network into the hidden layer and, at each stage, the weights are adjusted such that the prediction starts getting closer to the actual data - wow! A learning rate controls how big a step gradient descent takes on each update, so that the weights change by a small amount each time and the error settles toward a minimum (ideally the global one, though that isn't guaranteed).
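To make that concrete, here is a rough sketch of a single update step: forward pass, error, push the error back, nudge the weights. The function name `train_step`, the toy data and the learning rate of 0.1 are my own assumptions, not the course's code, and I again assume a sigmoid hidden layer feeding a linear output.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_step(x, y, w_ih, w_ho, learning_rate):
    """One round of forward pass, backpropagation and gradient descent.

    Returns the updated weights and the mean squared error measured
    before the update.
    """
    n_rows = x.shape[0]

    # Forward pass: sigmoid hidden layer, linear output (an assumption).
    hidden = sigmoid(x @ w_ih)
    prediction = hidden @ w_ho

    # How far off was the prediction?
    error = y - prediction

    # Backward pass: share the error out to the hidden nodes in
    # proportion to the weights, then scale by the sigmoid's slope.
    hidden_error = error @ w_ho.T
    hidden_grad = hidden_error * hidden * (1.0 - hidden)

    # Gradient descent: the learning rate sets the size of the nudge.
    new_w_ho = w_ho + learning_rate * (hidden.T @ error) / n_rows
    new_w_ih = w_ih + learning_rate * (x.T @ hidden_grad) / n_rows

    return new_w_ih, new_w_ho, float(np.mean(error ** 2))


# Toy data standing in for the bike rentals (hypothetical, 3 features).
rng = np.random.default_rng(1)
x = rng.normal(size=(32, 3))
y = np.sin(x[:, :1]) + 0.5 * x[:, 1:2]

w_ih = rng.normal(0.0, 3 ** -0.5, size=(3, 4))
w_ho = rng.normal(0.0, 4 ** -0.5, size=(4, 1))

losses = []
for _ in range(200):
    w_ih, w_ho, loss = train_step(x, y, w_ih, w_ho, learning_rate=0.1)
    losses.append(loss)

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

A bigger learning rate takes bigger steps and can overshoot; a smaller one is safer but needs more passes to get anywhere.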
We do the whole process for a number of "epochs," or iterations, that run in the hundreds. This is called training the network. Eventually the trained NN is fed a separate dataset, called the testing data, that it has never seen, to check how well the network performs.
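The loop below is a small sketch of that idea: hold some rows back as testing data, run the update for a few hundred epochs on the rest, then measure the error on the held-back rows. Everything specific here is an assumption for illustration (the synthetic data, the 400/100 split, 6 hidden nodes, 500 epochs, a learning rate of 0.2); it is not the course's setup.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def mean_squared_error(x, y, w_ih, w_ho):
    """Error of the network's predictions on a given set of rows."""
    prediction = sigmoid(x @ w_ih) @ w_ho
    return float(np.mean((y - prediction) ** 2))


# Synthetic stand-in for the rental data (hypothetical, 3 features).
rng = np.random.default_rng(42)
x = rng.normal(size=(500, 3))
y = np.sin(x[:, :1]) + 0.5 * x[:, 1:2]

# Hold back the last 100 rows; the network never trains on these.
train_x, test_x = x[:400], x[400:]
train_y, test_y = y[:400], y[400:]

w_ih = rng.normal(0.0, 3 ** -0.5, size=(3, 6))
w_ho = rng.normal(0.0, 6 ** -0.5, size=(6, 1))
learning_rate, epochs = 0.2, 500

start_test_loss = mean_squared_error(test_x, test_y, w_ih, w_ho)

for epoch in range(epochs):
    # Forward pass on the training rows only.
    hidden = sigmoid(train_x @ w_ih)
    error = train_y - hidden @ w_ho
    # Backward pass, computed before either weight matrix changes.
    hidden_grad = (error @ w_ho.T) * hidden * (1.0 - hidden)
    # Gradient descent update.
    w_ho += learning_rate * (hidden.T @ error) / len(train_x)
    w_ih += learning_rate * (train_x.T @ hidden_grad) / len(train_x)

end_test_loss = mean_squared_error(test_x, test_y, w_ih, w_ho)
print(f"test loss before: {start_test_loss:.4f}, after: {end_test_loss:.4f}")
```

The number that matters is the error on the testing rows: a network can look great on the data it trained on and still do badly on data it has not seen.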
Observation 5: The joy of predicting right
Close to 15+ hours of pure assignment time - whew!
You can see that my NN is pretty spot on for most days except Christmas and the end of the year. The reason for the failure is that the data set covers only two years, so the NN saw just two Christmases. That is not enough data for it to make a reliable prediction.
Other than that, it was pure joy getting to this point - loved it. Forget the misery of the matrix math and the fact that I don't quite recall where gradient descent came in :-) - it was three weeks ago when I wrote this, so I deserve a break.