A few days ago, Tahmid and I decided to learn machine learning and get some work done with it along the way. We started by exploring some of the common tools available for machine learning experimentation. Within the first few days, we realized that it was not going to be an easy task: almost all the common machine learning tools had a very steep learning curve. We chose scikit-learn, a machine learning library for Python, for our first experiments.
My previous experience with Python had not been good, especially on Windows. Since my daily computer runs Windows, switching to Linux for the experiments was not really an option. The first hitch came when we tried to install scikit-learn on Windows. One of the requirements for scikit-learn is the SciPy library, and installing SciPy with pip was not possible. The official SciPy website had this to say:
I tried many ways to install SciPy, but nothing worked. I also tried WinPython and other Python distributions that bundle the library, still without success. By then my patience was completely gone, and I started trying other tools for the experiments.
Fortunately, Tahmid found a way to install all the libraries properly. I followed his steps and it worked like magic, so I decided to write this blog post to summarize them for anyone interested in learning scikit-learn. Since these exact steps worked for me, I recommend following them exactly.
- Download and install the latest version of Python
- Remember to check the “Add Python to PATH” option during installation. If you skip this step you are in trouble: pip will not work for you.
- Download all the necessary library “wheel” builds from http://www.lfd.uci.edu/~gohlke/pythonlibs. In this case, download:
- Numpy: http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
- SciPy: http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
- Matplotlib: http://www.lfd.uci.edu/~gohlke/pythonlibs/#matplotlib [scikit-learn will probably run but the plotting functionality won’t work without this library]
- Pandas: http://www.lfd.uci.edu/~gohlke/pythonlibs/#pandas [Although not absolutely mandatory for scikit-learn, pandas is necessary for data manipulation]
As there are many files available to download, you have to be careful about which ones you pick. For example, numpy-1.11.3+mkl-cp27-cp27m-win32.whl is numpy version 1.11.3 built for 32-bit Python 2.7. The downloads have to match the Python version you have installed. Among the matching files, download the latest version of each library.
- Now copy the wheel files to the folder in which CMD starts. In my case it was C:\Users\User. When you open CMD, the prompt shows the folder path.
- Now install all the libraries using the following commands
pip install filename.whl
for example, to install numpy use
pip install numpy-1.12.1+mkl-cp27-cp27m-win32.whl
- Now download and install scikit-learn from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scikit-learn (in the same way as the previous libraries). Alternatively, you can install scikit-learn using the command pip install -U scikit-learn, which ran without any problem in this case.
- Voila, you are ready to run scikit-learn. To be sure, you can run a few examples from the scikit-learn website: http://scikit-learn.org/stable/auto_examples/index.html#general-examples
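If you want a quick sanity check on top of the steps above, a small script along these lines might help. It is only a sketch and not part of pip or scikit-learn: the function names (interpreter_tags, wheel_tags, wheel_matches, installed_versions) are my own, and the filename parsing assumes the simple cpXY and win32 / win_amd64 naming seen in the wheel files mentioned above. It compares a wheel filename against the Python you are actually running, and then reports which of the libraries can be imported.

```python
import importlib
import struct
import sys


def interpreter_tags():
    """Return (python_tag, platform_tag) for the running CPython on Windows."""
    python_tag = "cp{}{}".format(sys.version_info[0], sys.version_info[1])
    # Pointer size tells us whether this Python is a 32-bit or 64-bit build.
    bits = struct.calcsize("P") * 8
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


def wheel_tags(filename):
    """Extract (python_tag, platform_tag) from a wheel filename."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # A wheel name ends with: python tag, ABI tag, platform tag.
    return parts[-3], parts[-1]


def wheel_matches(filename, tags=None):
    """True if the wheel's tags equal the given (or the running) interpreter tags."""
    expected = tags if tags is not None else interpreter_tags()
    return wheel_tags(filename) == expected


def installed_versions(names=("numpy", "scipy", "matplotlib", "pandas", "sklearn")):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("This Python expects wheels tagged:", interpreter_tags())
    for name, version in installed_versions().items():
        print(name, version if version else "NOT INSTALLED")
```

For instance, calling wheel_matches("numpy-1.12.1+mkl-cp27-cp27m-win32.whl") before running pip install tells you whether that file was built for your interpreter. Note that scikit-learn is imported under the name sklearn.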