Predicting admissions into UCLA with a Neural Network

This blog documents my journey as I learn Deep Learning through Udacity. 

My personal challenges

I have been a Java guy for the last 18 years of my career, with only a cursory understanding of Python. To top it off, I haven’t actively coded in the last 7-9 years :-(. I have found that taking on Python with its data libraries (NumPy, Pandas, Matplotlib), the theory behind Deep Learning, matrix math and frameworks such as Keras all at once is turning out to be a significant cognitive load.

That said, the problem space is deeply fascinating, and my personal neural network predicts that my blogs are going to be more about furthering my understanding of how to break a problem down than about the actual coding ;-(. In this particular case, Udacity provided the code to introduce students to Keras.

The problem

The problem is quite simple: “A student is about to apply to UCLA. Given past students’ GRE test scores, GPA scores and class ranks (between 1 and 4), along with whether they were admitted, she wants to know whether she can make it into UCLA.” The data is from http://www.ats.ucla.edu/. It looks like the following, where “admit” = 1 means that the student was admitted to the school.

As you can see, the raw data set is fairly hard to understand, so the first step was to plot the data. The image shows that admitted and rejected students cannot be cleanly separated by a straight line, so a simple linear or logistic regression will struggle. This is where Neural Networks shine.

 

Steps to build a Neural Network

NNs either classify data on a yes/no axis or predict a value on a continuous scale. An NN can be thought of as layers of nodes, where every node computes a weighted sum of its inputs (much like a linear regression) and passes the result through a non-linear activation function. Combining many of these nodes results in an extremely sophisticated model that can classify data a single regression could not.
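To make the “every node is a regression” idea concrete, here is a minimal sketch of a single node in plain Python. It assumes a sigmoid activation, and the weights, bias and input values are made up purely for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """One node: a weighted sum of the inputs (the regression-like part),
    squashed by a sigmoid activation into the range (0, 1)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights for three already-scaled inputs (e.g. gre, gpa, rank_1)
output = neuron([0.8, 0.9, 1.0], [0.5, 1.0, 0.7], -1.0)
print(output)  # about 0.731 (the weighted sum is 1.0, and sigmoid(1.0) is about 0.731)
```

Without the activation, stacking such nodes would collapse into one big linear regression. The non-linearity is what lets a network bend its decision boundary around data like this.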

So here are the steps to build a NN…

 

Understand the data

In this example, we drill down into the third category, “rank”, to see if it tells us anything special.

It does: rank seems to have a significant bearing. The higher the rank (1 being higher than 4), the greater the student’s chances of being admitted.
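The drill-down above comes down to computing the admission rate for each rank. Below is a minimal plain-Python sketch. The rows are made-up illustration values, not the real UCLA data. With Pandas, assuming the data is loaded into a DataFrame named `data`, the same summary would be `data.groupby('rank')['admit'].mean()`.

```python
from collections import defaultdict

def admit_rate_by_rank(rows):
    """rows: iterable of (rank, admit) pairs; returns {rank: fraction admitted}."""
    totals = defaultdict(int)
    admitted = defaultdict(int)
    for rank, admit in rows:
        totals[rank] += 1
        admitted[rank] += admit
    return {r: admitted[r] / totals[r] for r in sorted(totals)}

# Made-up rows for illustration only, not the real UCLA data
rows = [(1, 1), (1, 1), (1, 0), (2, 1), (2, 0), (3, 0), (3, 1), (4, 0), (4, 0)]
rates = admit_rate_by_rank(rows)
print(rates)  # {1: 0.6666666666666666, 2: 0.5, 3: 0.5, 4: 0.0}
```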

Pre-process the data

One Hot Encode Data

We will one-hot encode the rank column. One-hot encoding splits a categorical column into several binary columns, one per category. This keeps the network from treating rank as a continuous quantity in which rank 4 is “four times” rank 1. In this case, rank is converted into four columns rank_1, rank_2, rank_3 and rank_4, and a student with rank 1 is represented as rank_1 = 1, rank_2 = 0, rank_3 = 0 and rank_4 = 0.
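Here is a small plain-Python sketch of the encoding described above. The helper name `one_hot_rank` is hypothetical. In Pandas, the equivalent for a whole column is `pd.get_dummies(data['rank'], prefix='rank')`, assuming a DataFrame named `data`.

```python
def one_hot_rank(rank, num_ranks=4):
    """Turn a rank in 1..num_ranks into binary columns rank_1..rank_N."""
    if not 1 <= rank <= num_ranks:
        raise ValueError(f"rank must be between 1 and {num_ranks}, got {rank}")
    # Exactly one column is 1: the one matching the student's rank
    return {f"rank_{i}": int(i == rank) for i in range(1, num_ranks + 1)}

encoded = one_hot_rank(1)
print(encoded)  # {'rank_1': 1, 'rank_2': 0, 'rank_3': 0, 'rank_4': 0}
```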

Scale Data

The next step is to scale the data. Here GRE scores go up to 800 while GPA goes up to 4.0, so without scaling the GRE values would dwarf the GPA values. For easier processing, both are converted to a 0-1 scale.
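One simple way to get both scores onto a 0-1 scale is to divide each by its maximum possible value. The sketch below assumes that approach, with maximums of 800 for GRE and 4.0 for GPA. Min-max scaling or standardization would be other reasonable choices.

```python
GRE_MAX = 800.0
GPA_MAX = 4.0

def scale_student(gre, gpa):
    """Divide each score by its maximum so both land in the 0-1 range."""
    return gre / GRE_MAX, gpa / GPA_MAX

scaled = scale_student(640, 3.6)
print(scaled)  # (0.8, 0.9)
```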

Here you can see the data both one hot encoded and scaled.

Train the model

The next step is to split the data into a training set and a testing set. You train the model on the training set and then evaluate it on the testing set, which it has never seen, to check that it works as you expect.
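Here is a minimal sketch of that split in plain Python. It assumes a random shuffle followed by holding out 10% for testing, which matches the share used later in this post. The helper `split_train_test` and its fixed `seed` are hypothetical. This is not scikit-learn’s `train_test_split`.

```python
import random

def split_train_test(rows, test_fraction=0.1, seed=42):
    """Shuffle a copy of the rows, then hold out test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_train_test(range(400))
print(len(train), len(test))  # 360 40
```

Shuffling first matters if the file happens to be sorted (for example, by admit or rank). Otherwise the test set would not be representative.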

Once you have split the data, you need to split each set again into features and targets. Features (the x-axis) are what the model uses to predict the target (the answer, or y-axis). In our example gre, gpa and rank_1 to rank_4 are the features, while admit is the target.
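Here is a sketch of the features/targets split, assuming each student is a dict with the column names used in this post. Because the Keras model below ends in a 2-node softmax with `categorical_crossentropy` loss, the sketch also one-hot encodes `admit` as [not admitted, admitted]. In Keras, `keras.utils.to_categorical` does that step. The helper name `split_features_targets` is hypothetical.

```python
FEATURES = ["gre", "gpa", "rank_1", "rank_2", "rank_3", "rank_4"]

def split_features_targets(records):
    """records: list of dicts with the FEATURES keys plus 'admit'.
    Returns (X, y): X holds the 6 feature values per student, and y holds
    admit one-hot encoded as [not admitted, admitted] for a 2-node softmax."""
    X = [[rec[name] for name in FEATURES] for rec in records]
    y = [[1 - rec["admit"], rec["admit"]] for rec in records]
    return X, y

record = {"gre": 0.8, "gpa": 0.9, "rank_1": 1, "rank_2": 0,
          "rank_3": 0, "rank_4": 0, "admit": 1}
X, y = split_features_targets([record])
print(X)  # [[0.8, 0.9, 1, 0, 0, 0]]
print(y)  # [[0, 1]]
```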

Defining the Neural Network

An NN has numerous components. Let’s break the next picture down. On the left are the features that are fed into the NN, and on the right is the answer. In between are a number of layers that the feature data travels through and gets processed by. The processing units are shown as rounded boxes (these are actually called perceptrons, or neurons). Having many such layers is what turns an NN into a Deep Learning NN. In this case we have a three-stage NN. The activation function on the final layer turns the network’s output into a probability for each possible classification of the features.
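The final-layer activation that produces those probabilities is typically softmax, which is what the Keras model later in this post uses. Below is a minimal sketch. The two raw scores are hypothetical network outputs.

```python
import math

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for [not admitted, admitted]
probs = softmax([0.2, 1.0])
print(probs)  # roughly [0.31, 0.69]
```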

When you start off and feed in the features, the NN’s weights are seeded with random values. Each stage then computes weighted sums of its inputs (the regression-like step), applies its activation function, and feeds the result into the subsequent layer. The output of the last layer is the final answer.
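Here is a plain-Python sketch of that forward pass. It uses the same layer sizes as the Keras model later in the post (6 inputs, then 128 and 64 ReLU nodes, then 2 softmax outputs). The uniform(-0.1, 0.1) starting weights are an assumption for illustration; Keras uses its own default initializers. Dropout is left out because it only applies during training.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random starting weights (assumed small uniform values) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(inputs, layer):
    """Each output node: weighted sum of all inputs plus a bias."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def forward(x, layers):
    """Pass the features through every layer: ReLU on hidden layers, softmax at the end."""
    h = x
    for layer in layers[:-1]:
        h = relu(dense(h, layer))
    return softmax(dense(h, layers[-1]))

rng = random.Random(0)
layers = [init_layer(6, 128, rng), init_layer(128, 64, rng), init_layer(64, 2, rng)]
prediction = forward([0.8, 0.9, 1, 0, 0, 0], layers)
print(prediction)  # two probabilities; near 50/50 because the weights are still random
```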

Now, you don’t get the right answer on the first shot, and this is where back-propagation comes in. The error between the predicted and the expected output is fed backward through the network, and each weight is adjusted slightly in the direction that reduces that error. Repeat this a few hundred to a few hundred thousand times and, voila, the predictions start getting close to the answer.
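The “adjust each weight to reduce the error” loop is easiest to see on a single sigmoid node trained with gradient descent, which is essentially logistic regression. Below is a minimal sketch under that simplification. A full network applies the same idea layer by layer using the chain rule. The learning rate, epoch count and tiny data set are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_neuron(X, y, epochs=1000, lr=0.5):
    """Gradient descent on one sigmoid node with cross-entropy loss.
    For this loss, the gradient for each weight is (prediction - target) * input."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = predict(w, b, xi) - yi  # how far off this prediction was
            # Nudge every weight (and the bias) against the error
            w = [wj - lr * error * xj for wj, xj in zip(w, xi)]
            b -= lr * error
    return w, b

# Tiny made-up data set: admitted (1) when the first feature is high
X = [[0.9, 0.2], [0.8, 0.4], [0.2, 0.5], [0.1, 0.9]]
y = [1, 1, 0, 0]
w, b = train_neuron(X, y)
preds = [round(predict(w, b, xi)) for xi in X]
print(preds)
```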

Keras makes it easy

In my earlier blog, Udacity had us build the NN using matrix math. Let’s just say doing the matrix math for each of the perceptron layers was the least fun aspect of the whole process. Keras makes it look easy, though I am not yet conversant with its APIs. Here is the simple definition of the NN (code credit Udacity):

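from keras.models import Sequential
from keras.layers import Dense, Dropout

# This is the model: a stack of layers applied in order
model = Sequential()
# First hidden layer: 128 nodes fed by the 6 features (gre, gpa, rank_1..rank_4)
model.add(Dense(128, activation='relu', input_shape=(6,)))
# During training, randomly drop 20% of this layer's outputs to reduce overfitting
model.add(Dropout(.2))
# Second hidden layer: 64 nodes
model.add(Dense(64, activation='relu'))
model.add(Dropout(.1))
# Output layer: 2 nodes (not admitted / admitted); softmax turns them into probabilities
model.add(Dense(2, activation='softmax'))

# Compiling the model: the loss to minimize, the optimizer that updates the weights,
# and the metric to report
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])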
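The `Dropout(.2)` and `Dropout(.1)` layers above are easier to understand with a toy version. Below is a plain-Python sketch of inverted dropout, the variant Keras applies during training. Each value is zeroed with probability `rate`, and the survivors are scaled up so their expected total stays the same. At prediction time, values pass through unchanged. The function is an illustration, not Keras’s implementation.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: during training, zero each value with probability `rate`
    and scale the survivors by 1 / (1 - rate); at prediction time, do nothing."""
    if not training or rate == 0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(1)
during_training = dropout([1.0] * 10, 0.2, rng)
at_prediction = dropout([1.0] * 10, 0.2, rng, training=False)
print(during_training)  # each entry is either 0.0 or 1.25
print(at_prediction)    # unchanged: ten 1.0s
```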

 

Test the model

Once the model is trained, feed the test data to the model to see if it can predict the answer. In our case, we had set aside about 10% of the data as test data.

score = model.evaluate(features_test, targets_test)
# With metrics=['accuracy'], evaluate returns [loss, accuracy]
print("Accuracy:", score[1])

The final accuracy we get from the model is 70%. Thus, we are able to predict whether a student will be admitted with 70% accuracy.
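Accuracy is simply the fraction of test students whose predicted label matches the real one. A useful sanity check is to compare it with the accuracy of always guessing the most common label. For example, if about two-thirds of applicants in a data set were rejected, always predicting “rejected” would already score about 67%. Here is a minimal sketch of both calculations. The label lists are made up for illustration and are not this model’s output.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def majority_baseline(labels):
    """Accuracy of always guessing the most common label."""
    most_common = max(set(labels), key=labels.count)
    return labels.count(most_common) / len(labels)

# Made-up labels for illustration: 7 of 10 applicants rejected (0)
actual = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predicted = [0, 0, 0, 1, 0, 0, 0, 1, 1, 0]
print(accuracy(predicted, actual))  # 0.8
print(majority_baseline(actual))    # 0.7
```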

Summary

So what did I learn from the exercise:

  1. You don’t need hundreds of thousands of data points. One of the key arguments I hear against using NNs is that they need tons of data. Here we had 400 data points and could predict the answer with 70% accuracy. That’s pretty huge.
  2. Thanks to Keras, I don’t need to build NNs by doing the matrix math myself. The framework looks encouraging, but I was quite taken aback by the number of in-the-know parameters such as loss='categorical_crossentropy' and optimizer='adam'. This is not going to be easy.
  3. NNs may be easier than I thought. I am encouraged to go through the next exercise and write a blog about it.