Popular Posts

Monday, July 6, 2015

Extensive evaluation of different classifiers

TL;DR: Random forests are the top performer that both references agree on.

From abstract of first reference "Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?"

"We  evaluate 179  classifiers arising from 17  families (discriminant  analysis,  Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods). We use 121 data sets, which represent the whole UCI database (excluding the large-scale problems) and other own real problems, in order to achieve significant  conclusions  about  the  classifier  behavior,  not  dependent  on  the  data  set  collection. The classifiers most likely to be the bests are the random forest(RF) versions."

From second reference:
"With excellent performance on all eight metrics, calibrated boosted trees were the best learning algorithm overall. Random forests are close second, followed by uncalibrated bagged trees, calibrated SVMs, and uncalibrated neural nets. The models that performed poorest were naive bayes, logistic regression, decision trees, and boosted stumps. Although some methods clearly perform better or worse than other methods on average, there is significant variability across the problems and metrics. Even the best models sometimes perform poorly, and models with poor average performance occasionally perform exceptionally well."

The two references perform an extensive evaluation of different classifiers across datasets and across performance metrics. 

Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
An Empirical Comparison of Supervised Learning Algorithms
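The random-forest recipe both references single out can be sketched from scratch in a few lines: bootstrap sampling, a random feature per tree, and vote averaging. This toy version (depth-1 "stumps" on one random feature, with all names being illustrative assumptions) only shows the mechanism, not the papers' actual evaluation setup:

```python
import numpy as np

def fit_stump(X, y, feature):
    # Best threshold on one feature for 0/1 labels, chosen by training error.
    best = (0.5, np.inf, 0)  # (threshold, error, polarity)
    for t in np.unique(X[:, feature]):
        pred = (X[:, feature] > t).astype(int)
        for pol in (0, 1):
            err = np.mean((pred ^ pol) != y)
            if err < best[1]:
                best = (t, err, pol)
    return feature, best[0], best[2]

def fit_forest(X, y, n_trees=25, seed=0):
    rng = np.random.RandomState(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.randint(len(X), size=len(X))  # bootstrap sample
        feat = rng.randint(X.shape[1])          # random feature subset (size 1 here)
        forest.append(fit_stump(X[idx], y[idx], feat))
    return forest

def predict(forest, X):
    # Majority vote over the stumps.
    votes = [(X[:, f] > t).astype(int) ^ pol for f, t, pol in forest]
    return (np.mean(votes, axis=0) > 0.5).astype(int)
```

Real random forests grow full decision trees and sample a feature subset at every split; libraries handle that, but the bagging-plus-randomization idea is the same.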

Wednesday, June 24, 2015

Idea of modeling non-linearity as learning a distribution over function spaces in feedforward neural networks

I was thinking about the relationship between graphical models and feedforward neural networks. On one hand, a feedforward neural network is a graph of deterministic functions; on the other hand, a graphical model is a graph of dependencies among uncertain random variables. Then I wondered: what if the deterministic non-linearities could be replaced with a random process that generates functions, and the shared non-linearity could be inferred?

An interesting idea would be to learn a distribution over a function space (which will be used as the non-linearity in a feedforward neural network) jointly with backpropagation, in an EM-like fashion.

To summarize, we want to replace the deterministic non-linearity with a learned function by modeling a distribution over function space and inferring the shared non-linearity of the network.
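A minimal deterministic relaxation of this idea can be sketched: write the shared non-linearity as a weighted sum of fixed basis functions, so its coefficients receive gradients through backpropagation like ordinary weights. The basis choice and all names below are illustrative assumptions, not from the post:

```python
import numpy as np

def basis(z):
    # Fixed basis: tanh, identity, and the logistic sigmoid.
    return np.stack([np.tanh(z), z, 1.0 / (1.0 + np.exp(-z))], axis=-1)

class LearnedActivation:
    """A shared non-linearity phi(z) = sum_k coef_k * basis_k(z)."""

    def __init__(self):
        # Initialize as a plain tanh by putting all weight on the first basis.
        self.coef = np.array([1.0, 0.0, 0.0])

    def forward(self, z):
        return basis(z) @ self.coef

    def grad_coef(self, z, upstream):
        # dL/dcoef, accumulated over every unit that shares this non-linearity.
        return (basis(z) * upstream[..., None]).reshape(-1, 3).sum(axis=0)
```

Learning a full distribution over functions (rather than a point estimate of `coef`) would be the probabilistic version of this sketch.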

Sunday, June 7, 2015

Squeezing space with LaTeX

I was trying to find ways to reduce the large vertical spaces between paragraphs in a LaTeX document. After searching the internet a bit, I found the following tricks. I wanted to share them here, and keep them for my own future reference.
  • Remove the spacing between paragraphs and have a small paragraph indentation.
  • Make your text block as big as possible. The simplest way to do that is using the geometry package.
  • Use a compact font such as Times Roman.
  • Remove space around section headings.
  • Beware of enumerated and itemized lists; replace them with compact lists.
  • If you are allowed, switching to double column can save heaps of space.
  • If the rules say 12pt, you can usually get away with 11.5pt without anyone noticing.
  • When you get desperate, you can squeeze the inter-line spacing.
  • There is also a savetrees package which does a lot of squeezing, but the results don't always look nice, so it is better to try one or more of the above tricks instead.
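The concrete commands for these tricks did not survive the copy, so the snippet below collects typical choices for each item. Every package and setting here is an illustrative assumption, not necessarily the post's exact code:

```latex
% Illustrative commands for the tricks above.
\usepackage[margin=2cm]{geometry}    % bigger text block
\usepackage{mathptmx}                % Times Roman for text and math
\setlength{\parskip}{0pt}            % no space between paragraphs
\setlength{\parindent}{1em}          % small paragraph indentation
\usepackage[compact]{titlesec}       % less space around section headings
\usepackage{paralist}                % provides compactitem / compactenum lists
\linespread{0.95}                    % squeeze the inter-line spacing
% \usepackage{savetrees}             % aggressive squeezing; use with care
```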

Tuesday, May 26, 2015

DL reading list for new students in LISA LAB




Tuesday, March 31, 2015

Setting up Theano on Ubuntu 14.04

After two days of non-stop struggle, I was able to make Theano work with my GPU, an NVIDIA GeForce GTX 860M.


The settings that worked are as follows.
Do NOT install bumblebee.

Run the following command to see whether your NVIDIA GPU is detected. If it is not, this blog post won't help. Sorry.
lspci | grep -i NVIDIA

Install the following:
Driver: NVIDIA 340.76
CUDA: 6.5 toolkit

  • Switch to the NVIDIA card. (If you don't have this command, try to get it without changing the graphics drivers; conflicting drivers can be blacklisted, see the next point.)
prime-switch nvidia
  • Blacklist other drivers which can create conflicts:
Create the file /etc/modprobe.d/blacklist-file-drivers.conf listing the blacklisted drivers, for example:
blacklist nvidia-349
blacklist nvidia-346
blacklist xserver-xorg-video-nouveau
To list all installed graphics drivers (useful while blacklisting):
ubuntu-drivers devices
Note: Do NOT blacklist nvidia-340.
  • Make sure the following command works without error.

  • Find the /usr/local/cuda-6.5/samples/1_Utilities/deviceQuery folder on your system and use make to build the executable:
cd /usr/local/cuda-6.5/samples/1_Utilities/deviceQuery/
sudo make
Then run /usr/local/cuda-6.5/samples/1_Utilities/deviceQuery/deviceQuery
You should get results like the following:
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = GeForce GTX 860M
Result = PASS
  • Install Theano from Git (first install the dependencies from the Theano website):
sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
sudo pip uninstall theano 
sudo pip install git+git://github.com/Theano/Theano.git

According to /usr/local/cuda-6.5/targets/x86_64-linux/include/host_config.h, cuda-6.5 supports gcc-4.8/g++-4.8, so one needs to install these and symlink them as gcc and g++ respectively.
Example: sudo ln -s /usr/bin/gcc-4.8 /usr/local/cuda/bin/gcc
  • Create a ~/.theanorc file with the following content:
[global]
floatX = float32
device = gpu

[nvcc]
fastmath = True

  • Make a Python file to test the GPU, say test.py:
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'

Once you run the above python file:
sudo python test.py
You might get an error:
Failed to compile cuda_ndarray.cu: libcublas.so.6.5: cannot open shared object file: No such file or directory
So you need to locate and configure that path:
locate libcublas.so.6.5
Run ldconfig for the CUDA library directories, and add the exports below to ~/.bashrc (the second path contains libcublas.so.6.5):
sudo ldconfig /usr/local/cuda-6.5/lib64/
sudo ldconfig /usr/local/cuda-6.5/targets/x86_64-linux/lib/
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/targets/x86_64-linux/lib:$LD_LIBRARY_PATH

export PATH=/usr/local/cuda-6.5/bin:$PATH
Now run the gpu code.
sudo python test.py
I get roughly 10x speedup with GPU.
I screwed up many times with different versions of drivers. If you do the same, you might not get a login screen (black screen). In that case, use Ctrl + Alt + F1 to go to the console and remove xorg.conf:
sudo rm /etc/X11/xorg.conf

sudo service lightdm stop
sudo service lightdm start

You should now have your login screen back. You might instead be trapped in a login loop, where the login screen asks for a password, accepts it, and then comes straight back. I hope you don't hit that situation.
(I added:
prime-switch intel
to the end of ~/.bashrc, because I had login-loop issues that were resolved by this. I am not sure why, but it works.)

Although I wrote this mainly for my own records, I hope it helps someone. :)

Wednesday, February 4, 2015

Scale of weight initialization

I was listening to a Talking Machines interview with Ilya Sutskever, and he made an important point about the initialization scale for deep neural networks.
It seems that too-small weights will significantly decay the signal, while too-large weights make it unstable. This also raises the broader point of the stability of neural nets and its connections to eigenvalue problems and random matrix theory, as pointed out by Ryan Adams.
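The decay/instability effect is easy to see numerically: push a random signal through a deep tanh network with small, moderate, and large weight scales and compare the activation norms. The depths, widths, and scales below are arbitrary illustrative choices:

```python
import numpy as np

def forward_norm(scale, depth=50, width=100, seed=0):
    """Norm of the activations after `depth` tanh layers with the given weight scale."""
    rng = np.random.RandomState(seed)
    h = rng.randn(width)
    for _ in range(depth):
        # Entries scaled by 1/sqrt(width) so `scale` directly controls the gain.
        W = scale * rng.randn(width, width) / np.sqrt(width)
        h = np.tanh(W @ h)
    return np.linalg.norm(h)

small = forward_norm(0.1)  # signal decays geometrically toward zero
ok = forward_norm(1.0)     # roughly preserved
large = forward_norm(4.0)  # units saturate near +/-1 (unstable gradients)
```

With `scale=0.1` the norm shrinks by roughly a factor of ten per layer; with `scale=4.0` every unit saturates, which is where the eigenvalue/stability view becomes relevant.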

Saturday, January 31, 2015

Adaptive Learning Rates for NN

I was going through Hinton's lectures and found something interesting that I wanted to share.
It is very common for the magnitude of the gradient to differ across layers. The fan-in of a unit determines the size of the "overshoot" effect caused by simultaneously changing many of the unit's incoming weights to correct the same error. So we can use local adaptive gains $g_{ij}$ on the gradients.

So update rule becomes:
\[\Delta w_{ij} = - \epsilon g_{ij} \frac{\partial E}{\partial w_{ij}}\]

We adjust the gains by additive increment and multiplicative decrement:
if $( \frac{\partial E}{\partial w_{ij}}(t-1) * \frac{\partial E}{\partial w_{ij}}(t) ) > 0 $
then $g_{ij}(t) = g_{ij}(t-1) + 0.05$
else $g_{ij}(t) = g_{ij}(t-1) * 0.95$
Other things to note are:
  • $g_{ij}$ should be within some bounds, like $[0.1, 10]$ or $[0.01, 100]$
  • Use full batches or large mini-batches (so nothing crazy happens because of sampling error)
  • Use agreement in sign between the current gradient and the current velocity for that weight (adaptive learning rates combined with momentum).
Updates for the momentum method:
The weight change is the current velocity:
$$ \Delta w_{ij}(t) = v(t) =  \alpha v(t-1) - \epsilon \frac{\partial E}{\partial w_{ij}}(t) = \alpha \Delta w_{ij}(t-1) - \epsilon \frac{\partial E}{\partial w_{ij}}(t)$$
velocity $v(t) = \alpha v(t-1) - \epsilon \frac{\partial E}{\partial w_{ij}}(t)$, here $\alpha$ is slightly less than 1.
The momentum method builds up speed in directions with a gentle but consistent gradient. Use a small initial momentum, $\alpha = 0.5$, and later increase it to $\alpha = 0.9$.
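The update rules above can be put together in a short sketch combining per-weight adaptive gains with momentum. The 0.05/0.95 adjustments and the $[0.1, 10]$ bounds come from the post; the sign test against the previous gradient (rather than the velocity) and the toy quadratic loss are my simplifying assumptions:

```python
import numpy as np

def step(w, v, g, grad, prev_grad, eps=0.01, alpha=0.9):
    # Additive increment when the gradient sign agrees with the previous
    # step, multiplicative decrement otherwise.
    agree = grad * prev_grad > 0
    g = np.where(agree, g + 0.05, g * 0.95)
    g = np.clip(g, 0.1, 10.0)       # keep gains within sensible bounds
    v = alpha * v - eps * g * grad  # momentum-smoothed, gain-scaled velocity
    return w + v, v, g

# Toy usage: minimize 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([2.0, -3.0])
v = np.zeros_like(w)
g = np.ones_like(w)
prev = np.zeros_like(w)
for _ in range(200):
    grad = w
    w, v, g = step(w, v, g, grad, prev)
    prev = grad
```

On this toy problem the gains grow along the consistent descent directions and shrink whenever momentum makes the trajectory overshoot.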