Why Python Is the Preferred Language for Machine Learning

For developers in the field of machine learning, Python is one of the most popular programming languages. Python is neither the fastest language (C and C++ easily outpace it) nor necessarily the easiest to learn (R and MATLAB can have gentler learning curves). So why is Python used by 57% of data scientists and machine learning developers, and why does it rank first on the PYPL Index of programming language popularity? As a programmer myself, I think it comes down to two things: simplicity of programming and the vast number of libraries that Python offers.

Simple. Easy. Convenient.

Python lets developers go from idea to product quickly. Code that might take developers in languages such as C++ hours or even days to plan, perfect, and write can often be written in Python in minutes. Take the following example, which reads a simple CSV file in C++; I adapted it from this solution to output to a matrix.


#include <iterator>
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <string>
class CSVRow
{
    public:
        std::string const& operator[](std::size_t index) const
        {
            return m_data[index];
        }
        std::size_t size() const
        {
            return m_data.size();
        }
        void readNextRow(std::istream& str)
        {
            std::string         line;
            std::getline(str, line);
            std::stringstream   lineStream(line);
            std::string         cell;
            m_data.clear();
            while(std::getline(lineStream, cell, ','))
            {
                m_data.push_back(cell);
            }
            if (!lineStream && cell.empty())
            {
                m_data.push_back("");
            }
        }
    private:
        std::vector<std::string>    m_data;
};
std::istream& operator>>(std::istream& str, CSVRow& data)
{
    data.readNextRow(str);
    return str;
}   
int main()
{
    std::ifstream       file("test.csv");
    std::vector< CSVRow > mat;
    CSVRow   row;
    while(file >> row)
        mat.push_back(row);
}

It works as it should, but the code is very bulky. Even if we ignore the fact that this is a 49-line solution, it is hard to read and understand. Now let’s look at my Python version of the same code.


import csv
mat = list(csv.reader(open('test.csv', newline=''), delimiter=','))

That’s it. Two lines. One import statement and one line of actual code are all it takes for Python to read the CSV file. Best of all, the code is easy to read and understand, so it can easily be passed on to another developer.
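One caveat: the two-line version never explicitly closes the file. A slightly longer sketch, still far shorter than the C++ version, wraps the read in a `with` block and returns the rows as a list of lists. The helper name `read_csv_matrix` and the sample data are my own illustrations, not part of any library.

```python
import csv
import os
import tempfile


def read_csv_matrix(path, delimiter=","):
    """Read a CSV file into a list of rows; each row is a list of strings."""
    # newline="" lets the csv module handle line endings itself,
    # as the csv documentation recommends.
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter=delimiter))


# Demo on a throwaway file so the sketch runs without an existing test.csv.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "test.csv")
    with open(sample_path, "w", newline="") as f:
        f.write("name,score\nada,90\ngrace,95\n")
    mat = read_csv_matrix(sample_path)

print(mat[0])    # header row
print(len(mat))  # number of rows, header included
```

Every cell comes back as a string, just as in the C++ version, so numeric columns still need an explicit conversion before any math is done on them.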

Libraries!

Being simple to read, understand, and write is great, but how does that ease of use connect with machine learning? Libraries are the key here. Want to create n-dimensional arrays and bring matrix algebra into your program? Use NumPy. Want to perform high-level mathematical functions and analyses on your NumPy arrays? Use SciPy. Want to visualize your data graphically? Matplotlib is the answer. Maybe you want to build basic machine learning models on top of NumPy and SciPy; enter scikit-learn. Looking for something more specific? Maybe the NLTK library for natural language processing or the OpenCV library for image analysis is a better fit. Perhaps you want to go beyond classical machine learning and leverage deep learning? Libraries like TensorFlow, Theano, and PyTorch can do the heavy lifting, while libraries like Keras can make the code even easier to write. The point is that there are a lot of resources that will allow you to quickly create your machine learning model and application.
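To give a feel for how little code these libraries require, here is a minimal sketch that fits a straight line to a handful of points by least squares, using only NumPy (assumed to be installed). The function name `fit_line` and the toy data are made up for illustration; a real project would more likely reach for scikit-learn.

```python
import numpy as np


def fit_line(x, y):
    """Fit y ≈ slope * x + intercept by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: one column for x, one column of ones for the intercept.
    A = np.column_stack([x, np.ones_like(x)])
    # lstsq returns (solution, residuals, rank, singular values).
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope, intercept


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The matrix setup and the solve take two lines, which is the kind of heavy lifting the libraries above do throughout a machine learning workflow.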

Closing Thoughts

Two key leadership principles that we hold at Anant are Bias For Action and Deliver Results. It is important for developers and business leaders alike to prioritize making decisions and getting things done rather than investing large amounts of time in finding the ‘perfect’ solution. Python is not the only programming language that can be used for machine learning, nor is it always the optimal one. In the long run, it may be necessary to use a language other than Python to perfect your machine learning process. However, Python is unique in that it lets you quickly create a functional prototype machine learning model that is collaborator-friendly and leverages resources from a huge community of developers.

There is a lot of hype and interest surrounding machine learning nowadays, yet most people who have heard of it never actually try it out because they believe they lack the technical skills or knowledge to turn their idea into reality. Python is the bridge between conception and product. With the simplicity of the language, the high-level libraries, and, most importantly, a community of developers, programmers of any skill level can create their own machine learning application.

Want to learn how you can implement your machine learning solution? Contact the Anant team and let us know!

Photo by Sergey Zolkin on Unsplash