TECH LOGIC

Tuesday, February 20, 2018

How to compute probabilities from a list of boosted trees in xgboost

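If number of trees == number of boosting rounds (binary classification):
score = sum of predicted leaf values over all trees + base margin
(the base margin comes from base_score, 0.5 in this setup; for objective binary:logistic xgboost converts base_score with the logit, so 0.5 contributes logit(0.5) = 0 rather than 0.5)
prob = exp(score)/(1+exp(score)), or equivalently 1/(1+exp(-score))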
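A minimal sketch of the binary case, assuming you already have one predicted leaf value per tree for a single row (for example, read off a model dump). The names binary_probability, leaf_values and base_margin are hypothetical; base_margin stands for whatever constant your model adds to the summed leaf values.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, using both forms from the post."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_probability(leaf_values, base_margin):
    """Probability of the positive class for one row.

    leaf_values: one predicted leaf value per tree (one tree per boosting round).
    base_margin: constant added to the summed leaf values (an assumption of
                 this sketch; check what your model actually uses).
    """
    score = sum(leaf_values) + base_margin
    return sigmoid(score)
```

For example, binary_probability([0.2, -0.1, 0.4], 0.0) sums the three leaf values to a score of about 0.5 and passes it through the sigmoid.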

If number of trees == number of boosting rounds * number of classes (multi-class):
score_i = sum of predicted leaf values over the trees belonging to class i + 0.5, for each class i
prob_i = exp(score_i) / sum_j exp(score_j)
(the constant 0.5 is the same for every class, so it cancels in the softmax)
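A minimal sketch of the multi-class case. It assumes the trees are stored round by round with one tree per class in each round, so that tree t contributes to class t % num_class; verify this ordering against your own model dump. The names multiclass_probabilities, leaf_values and base_margin are hypothetical.

```python
import math


def multiclass_probabilities(leaf_values, num_class, base_margin=0.5):
    """Softmax probabilities for one row.

    leaf_values: one predicted leaf value per tree, in tree order.
    num_class:   number of classes.
    base_margin: constant added to every class score (0.5 as in the post).
    Assumption: tree t belongs to class t % num_class.
    """
    if len(leaf_values) % num_class != 0:
        raise ValueError("number of trees must be a multiple of num_class")

    scores = [base_margin] * num_class
    for t, value in enumerate(leaf_values):
        scores[t % num_class] += value

    # Subtract the maximum score before exponentiating to avoid overflow.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With two rounds and three classes, multiclass_probabilities([1.0, 0.0, 0.0, 1.0, 0.0, 0.0], 3) gives class 0 the score 2.5 and the other two classes 0.5 each before the softmax.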

Monday, July 6, 2015

Extensive evaluation of different classifiers

TL;DR: Random forests rank at or near the top in both references.

From the abstract of the first reference, "Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?":

"We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods). We use 121 data sets, which represent the whole UCI database (excluding the large-scale problems) and other own real problems, in order to achieve significant conclusions about the classifier behavior, not dependent on the data set collection. The classifiers most likely to be the bests are the random forest(RF) versions."

From the second reference:
"With excellent performance on all eight metrics, calibrated boosted trees were the best learning algorithm overall. Random forests are close second, followed by uncalibrated bagged trees, calibrated SVMs, and uncalibrated neural nets. The models that performed poorest were naive bayes, logistic regression, decision trees, and boosted stumps. Although some methods clearly perform better or worse than other methods on average, there is significant variability across the problems and metrics. Even the best models sometimes perform poorly, and models with poor average performance occasionally perform exceptionally well."

The two references perform an extensive evaluation of different classifiers across datasets and across performance metrics.

Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
An Empirical Comparison of Supervised Learning Algorithms

Wednesday, June 24, 2015

Idea of modeling non-linearity as learning a distribution over function spaces in feedforward neural networks

I was thinking about the relationship between graphical models and feedforward neural networks. On one hand, a feedforward neural network is a graph of deterministic functions; on the other hand, a graphical model is a graph of dependencies among random variables, which are uncertain. Then I thought: what if the deterministic non-linearities were replaced with a random process that generates functions, so that a shared non-linearity could be inferred?

An interesting idea would be to learn a distribution over a function space (whose members are used as the non-linearity in a feedforward neural network) jointly with backpropagation, in an EM-like fashion.

To summarize, we want to replace the fixed deterministic non-linearity with a learned one by modeling a distribution over function space and inferring the shared non-linearity of the network.

Sunday, June 7, 2015

Squeezing space with LaTeX

I was trying to find ways to reduce the large vertical spaces between paragraphs. After searching a bit on the internet, I found the following commands along with other options. I am sharing them here for others and for my own future reference.
Remove the spacing between paragraphs and use a small paragraph indentation:

\setlength{\parskip}{0cm}
\setlength{\parindent}{1em}

Source:
http://robjhyndman.com/hyndsight/squeezing-space-with-latex/
http://www.terminally-incoherent.com/blog/2007/09/19/latex-squeezing-the-vertical-white-space/
http://www-h.eng.cam.ac.uk/help/tpl/textprocessing/squeeze.html
https://ravirao.wordpress.com/2005/11/19/latex-tips-to-meet-publication-page-limits/

Make your text block as big as possible. The simplest way to do that is using the geometry package:

\usepackage[text={16cm,24cm}]{geometry}

Use a compact font such as Times Roman:

\usepackage{mathptmx}

Remove space around section headings.

\usepackage[compact]{titlesec}
\titlespacing{\section}{0pt}{2ex}{1ex}
\titlespacing{\subsection}{0pt}{1ex}{0ex}
\titlespacing{\subsubsection}{0pt}{0.5ex}{0ex}

Beware of enumerated and itemized lists. Instead, replace them with compact lists.

\usepackage{paralist}

\begin{compactitem}
\item ...
\end{compactitem}
\begin{compactenum}
\item ...
\end{compactenum}

If you are allowed, switching to double column can save heaps of space.

\usepackage{multicol}

\begin{multicols}{2}
...
\end{multicols}

If the rules say 12pt, you can usually get away with 11.5pt without anyone noticing:

\begin{document}\fontsize{11.5}{14}\selectfont

When you get desperate, you can squeeze the inter-line spacing using

\linespread{0.9}

There is also a savetrees package which does a lot of squeezing, but the results don’t always look nice, so it is better to try one or more of the above tricks instead.

Tuesday, May 26, 2015

DL reading list for new students in LISA LAB

https://docs.google.com/document/d/1IXF3h0RU5zz4ukmTrVKVotPQypChscNGf5k6E25HGvA/edit#heading=h.5r7p5dbrilt4

http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/WebHome

http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/ReadingOnDeepNetworks

Tuesday, March 31, 2015

Setting up Theano on Ubuntu 14.04

After 2 days of non-stop struggle, I was able to make Theano work with my NVIDIA GeForce GTX 860M GPU.

TRY AT YOUR OWN RISK!!

The settings that worked are as follows.
DO NOT install bumblebee.

Run the following command to see if your NVIDIA GPU is being detected. If not, this blog won't help. Sorry.

lspci | grep -i NVIDIA

Install the following:
Driver: NVIDIA 340.76
CUDA: 6.5 toolkit

Switch to the NVIDIA card (if you don't have this command, try to get it without changing the graphics drivers; conflicting drivers can be blacklisted, see the next point).

prime-switch nvidia

Blacklist other drivers that can create conflicts:

Create the file /etc/modprobe.d/blacklist-file-drivers.conf listing the drivers to blacklist. Use the command ubuntu-drivers devices to get a list of NVIDIA drivers:

blacklist nvidia-349
blacklist nvidia-346
blacklist xserver-xorg-video-nouveau

To list all installed graphics drivers (useful while blacklisting drivers):

ubuntu-drivers devices

Note: DO NOT blacklist nvidia-340.

Make sure the following commands work without error.

nvidia-modprobe
nvidia-settings
nvidia-smi

Find the /usr/local/cuda-6.5/samples/1_Utilities/deviceQuery folder on your system.

Use the make command to create the executable.

cd /usr/local/cuda-6.5/samples/1_Utilities/deviceQuery/
sudo make

Run /usr/local/cuda-6.5/samples/1_Utilities/deviceQuery/deviceQuery

You should get the following results:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = GeForce GTX 860M

Result = PASS

Install Theano from Git (first install the dependencies listed on the Theano website):

sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git

sudo pip uninstall theano
sudo pip install git+https://github.com/Theano/Theano.git

According to /usr/local/cuda-6.5/targets/x86_64-linux/include/host_config.h, CUDA 6.5 supports gcc-4.8 and g++-4.8, so install these and create links named gcc and g++ that point to them.
Example: sudo ln -s /usr/bin/gcc-4.8 /usr/local/cuda/bin/gcc

Create a ~/.theanorc file with the following content:

[global]
floatX = float32
device = gpu

[nvcc]
fastmath = True

[cuda]
root=/usr/local/cuda-6.5/
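
If you prefer to generate or double-check this file from a script, here is a small sketch using only the Python 3 standard library. The helper names render_theanorc and parse_theanorc are hypothetical; the settings are the ones listed above, and nothing here talks to Theano itself.

```python
import configparser
import io

# Settings copied from the ~/.theanorc shown above.
THEANORC = {
    "global": {"floatX": "float32", "device": "gpu"},
    "nvcc": {"fastmath": "True"},
    "cuda": {"root": "/usr/local/cuda-6.5/"},
}


def render_theanorc(settings):
    """Return the INI text for a dict of {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the case of keys such as floatX
    parser.read_dict(settings)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def parse_theanorc(text):
    """Parse INI text back into a dict of {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}
```

Writing render_theanorc(THEANORC) to ~/.theanorc and reading it back with parse_theanorc is a quick way to catch typos in section or key names.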

Make a Python file to test the GPU, say test.py:

from __future__ import print_function  # same print syntax on Python 2 and 3
import time

import numpy
import theano.tensor as T
from theano import function, config, shared

vlen = 10 * 30 * 768  # 10 x #cores x #threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print('Looping %d times took %s seconds' % (iters, t1 - t0))
print('Result is %s' % (r,))
# An Elemwise op left in the compiled graph means exp() ran on the CPU.
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

Once you run the above python file:

sudo python test.py

You might get an error:

Failed to compile cuda_ndarray.cu: libcublas.so.6.5: cannot open shared object file: No such file or directory

So you need to locate and configure that path:

locate libcublas.so.6.5

Register the library directories with ldconfig (the second one contains libcublas.so.6.5) and add the export lines to ~/.bashrc:

sudo ldconfig /usr/local/cuda-6.5/lib64/
sudo ldconfig /usr/local/cuda-6.5/targets/x86_64-linux/lib/

export LD_LIBRARY_PATH=/usr/local/cuda-6.5/targets/x86_64-linux/lib:$LD_LIBRARY_PATH

export PATH=/usr/local/cuda-6.5/bin:$PATH
export PATH=/usr/local/cuda-6.5/targets/x86_64-linux/lib:$PATH
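
To check that the library path really took effect in a new shell, a small sketch like the following can help. It only uses the Python standard library; dirs_containing is a hypothetical helper name, and it simply looks for the file in each directory of a colon-separated path.

```python
import os


def dirs_containing(filename, search_path):
    """Return the directories of a PATH-style string that contain `filename`."""
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by leading or trailing separators.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Prints the LD_LIBRARY_PATH entries that hold libcublas.so.6.5, if any.
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print(dirs_containing("libcublas.so.6.5", path))
```

An empty list means none of the directories in LD_LIBRARY_PATH contains libcublas.so.6.5, so the export line is either missing or points to the wrong place.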

Now run the gpu code.

sudo python test.py
I get roughly a 10x speedup with the GPU.

I screwed up many times with different driver versions. If you do the same, you might not get a login screen (just a black screen). Then use Ctrl + Alt + F1 to go to command mode.
Remove xorg.conf:
sudo rm /etc/X11/xorg.conf

sudo service lightdm stop
sudo service lightdm start

This will probably bring back your login screen. You might also get trapped in a login loop, where the login screen asks for your password, starts to log in, and then asks for the password again. I hope you don't end up in this situation.
(I added:
nvidia-modprobe
prime-switch intel
to the end of ~/.bashrc because this resolved the login loop issues I had. I am not sure why, but it works.)

Although I have written this for my own record, I hope it helps someone. :)

Saturday, February 14, 2015

On convex Neural Networks

Breaking the Curse of Dimensionality with Convex Neural Networks - Francis Bach slides

Convex Neural Networks - Bengio et al
