Build an image classifier to sort images


Learn to organize and differentiate images using computer vision algorithms

A picture is worth a thousand words, which may be one reason the number of images stored on our devices has grown so much. Most of us recognize the trend of replying with an emoji instead of a long text message, and image-centric services such as Snapchat, Instagram, and TikTok are among the most popular internet platforms.

Organizing these images on our devices can be tedious, which is why cloud-based storage platforms from Google and Apple offer automatic image sorting. These are driven by advances in computer vision algorithms, particularly developments in machine learning (ML) methods. In this tutorial, we will try to create a simple image classifier and run it on a local computer with Python installed. Just to make it more interesting, we’ll try to distinguish jalebis from samosas!

Data extraction

The starting point of any ML model development is well-organized data. The data for this tutorial can be downloaded from the link here. Once the archive is on your computer, extract it so that the folder structure is preserved. You should see the following content in the folder.

Browse the contents of the data folder and you should see two more folders named after the foods we are trying to classify. The folder also contains the Python code as an interactive Python notebook (ipynb), which the reader can modify for future projects. Inside the data folder are images of jalebis and samosas that were crawled from the web and released under a Creative Commons license, which allows us to work with them.

Make the dataset

Now that the data has been downloaded, we can start writing the Python program that builds the image classifier. The recommended environment for this project is JupyterLab, but more advanced users can use any other integrated development environment. Once in your preferred Python environment, load the libraries needed to read and view images, and navigate to the folder where the contents of the downloaded archive are located.

Note that JPG images, once read, are stored as numerical arrays in which each number represents the intensity of a pixel. Usually an intensity of 0 corresponds to black and 255 to white, with the values in between covering the intermediate shades of gray, for 256 levels in total. Color images have three channels corresponding to their red, green, and blue (RGB) content.
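To make the array representation concrete, here is a minimal sketch using NumPy. The tiny 4x4 image is synthetic and purely illustrative; it stands in for an image read from the downloaded archive.

```python
import numpy as np

# A tiny synthetic 4x4 RGB image standing in for a decoded JPG (illustrative data only).
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[..., 0] = 255        # red channel at full intensity everywhere
image[:2, :, 1] = 128      # green channel at half intensity in the top two rows

print(image.shape)               # (height, width, channels)
print(image.dtype)               # 8-bit unsigned integers, so values lie in 0..255
print(image.min(), image.max())  # darkest and brightest values present

# Each channel is a 2-D array of intensities.
red, green, blue = image[..., 0], image[..., 1], image[..., 2]
```

A grayscale image would have only the first two dimensions, with a single intensity per pixel.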

Using the image reading and viewing functions of the Matplotlib library, we can visualize the different channels of the images.
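One possible way to view the channels is sketched below. The helper name `show_channels` and the synthetic gradient image are assumptions made for illustration; they are not taken from the tutorial's notebook.

```python
import numpy as np
import matplotlib.pyplot as plt


def show_channels(image):
    """Plot the R, G and B channels of an (H, W, 3) array side by side."""
    fig, axes = plt.subplots(1, 3, figsize=(9, 3))
    for idx, (ax, name) in enumerate(zip(axes, ["Red", "Green", "Blue"])):
        # Show each channel as a grayscale intensity map on a fixed 0..255 scale.
        ax.imshow(image[..., idx], cmap="gray", vmin=0, vmax=255)
        ax.set_title(name)
        ax.axis("off")
    return fig


# Synthetic test image (hypothetical data): red ramps left to right,
# green ramps top to bottom, blue is constant.
height, width = 32, 48
image = np.zeros((height, width, 3), dtype=np.uint8)
image[..., 0] = np.linspace(0, 255, width).astype(np.uint8)
image[..., 1] = np.linspace(0, 255, height).astype(np.uint8)[:, None]
image[..., 2] = 60

fig = show_channels(image)
```

For a real file, `plt.imread` with the path to one of the downloaded images should return an array that can be passed to the same helper. Reading JPG files through Matplotlib relies on the Pillow package being installed.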

We need to standardize these images so that they can be described to the algorithms in a uniform way, and for this we will use the amount of RGB content in each image. The dataset we create will resize all images to a fixed size and calculate the average R, G, and B values of each one. Thus, each image will be represented using only three features.
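A minimal sketch of this feature extraction is shown below, assuming the Pillow library is used for resizing. The function name `mean_rgb`, the 64x64 target size, and the solid-color demo image are illustrative choices rather than the tutorial's actual values.

```python
import numpy as np
from PIL import Image


def mean_rgb(img, size=(64, 64)):
    """Resize a PIL image to a fixed size and return its mean R, G, B values."""
    resized = img.convert("RGB").resize(size)
    pixels = np.asarray(resized, dtype=np.float64)
    # Average over height and width, leaving one value per channel.
    return pixels.mean(axis=(0, 1))


# Synthetic solid-color image standing in for a photo (illustrative data only).
demo = Image.new("RGB", (120, 80), color=(230, 140, 20))
features = mean_rgb(demo)
print(features)  # three numbers: mean red, mean green, mean blue
```

With real data, each file could be opened with `Image.open` and passed to the same function, collecting one three-element row per image.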

Since computers have no notion of what jalebis and samosas are, we will use numbers to represent their labels. For example, a jalebi can be labeled 0 and a samosa 1.

The final step in creating the dataset is to stack the two classes of data into a single array. This is a requirement for the ML framework that we will use in the next step.
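The labeling and stacking steps can be sketched as follows. The feature arrays here are random placeholders with made-up sizes; in practice they would hold the mean RGB values computed from the images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder mean-RGB features (hypothetical sizes): one row per image, three columns.
jalebi_features = rng.uniform(0, 255, size=(5, 3))
samosa_features = rng.uniform(0, 255, size=(7, 3))

# Numeric labels: 0 for jalebi, 1 for samosa.
jalebi_labels = np.zeros(len(jalebi_features), dtype=int)
samosa_labels = np.ones(len(samosa_features), dtype=int)

# Stack both classes into one feature matrix X and one label vector y.
X = np.vstack([jalebi_features, samosa_features])
y = np.concatenate([jalebi_labels, samosa_labels])

print(X.shape, y.shape)
```

Row `i` of `X` and entry `i` of `y` describe the same image, which is the layout Scikit-Learn estimators expect.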

We can now visualize all the jalebis and samosas in the average RGB feature space. Each marker in this plot captures the amount of R, G, and B content in these images. (see below)
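A plot of this kind could be produced roughly as follows. The two clusters are synthetic, with arbitrarily chosen centers, so the resulting picture only illustrates the layout and not the real data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic mean-RGB features for two classes (cluster centers are made up).
X = np.vstack([
    rng.normal([200, 120, 40], 15, size=(30, 3)),   # stand-in for jalebis
    rng.normal([150, 130, 100], 15, size=(30, 3)),  # stand-in for samosas
])
y = np.repeat([0, 1], 30)

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(projection="3d")
for label, name, marker in [(0, "jalebi", "o"), (1, "samosa", "^")]:
    pts = X[y == label]
    # One marker per image, positioned by its mean R, G and B values.
    ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], marker=marker, label=name)

ax.set_xlabel("mean R")
ax.set_ylabel("mean G")
ax.set_zlabel("mean B")
ax.legend()
```

If the two classes form visibly separate clouds in this space, a simple classifier has a reasonable chance of telling them apart.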

Training an image classifier

Now that our dataset is ready, we can train a simple image classifier to automate the detection of jalebis and samosas. We will be using the Scikit-Learn Python package, which has a comprehensive set of tools and algorithms that can be used for ML.

The first step in training an ML algorithm is to split the dataset into training and test sets. As the names suggest, the training set is used to train the ML algorithm and the test set is used to test its performance. Performance on the test set is more representative of how the algorithm will perform on new real-world data. We use 75% of the data for training and the remaining 25% for testing.
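A 75/25 split can be sketched with Scikit-Learn's `train_test_split`. The data below is synthetic, and the `random_state` and `stratify` arguments are assumptions added here for reproducibility and balanced classes; the tutorial does not say whether it uses them.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic dataset: 40 images, 3 features each, 20 per class (illustrative only).
X = rng.uniform(0, 255, size=(40, 3))
y = np.repeat([0, 1], 20)

# Hold out 25% of the samples for testing; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
```

Changing `random_state` produces a different shuffle, which is one reason reported accuracies can change from run to run.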

The next step is to load the Scikit-Learn Python modules that will be used to initialize our simple classifier. Here we will use one of the simplest classifiers, based on logistic regression. It is a linear classifier, which can be restrictive. However, linear models can also be very useful, and in this tutorial we will limit ourselves to them. We will initialize the logistic regression classifier, train it on the training set, and test it on the test set.
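The initialize, train, and test sequence might look like the sketch below. It runs on synthetic, well-separated clusters rather than the tutorial's images, so the accuracies it prints say nothing about the real dataset. The `max_iter` value is an assumption chosen to give the solver room to converge on unscaled features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic mean-RGB features for two classes (cluster centers are made up).
X = np.vstack([
    rng.normal([200, 120, 40], 15, size=(100, 3)),   # label 0
    rng.normal([150, 130, 100], 15, size=(100, 3)),  # label 1
])
y = np.repeat([0, 1], 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Initialize the linear classifier and fit it on the training set only.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# score() returns the fraction of correctly classified samples.
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```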

For this particular train-test split, we get about 70% training accuracy and 77% test accuracy. Note that these figures can vary considerably because of the random shuffling of the data and the initial parameters of the model.

Once the classifier is trained, we can use it to make predictions on new data. For each image, the logistic regression classifier outputs a score for how likely it considers the image to belong to each class. This score can be interpreted as a probability.
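As a hedged illustration, Scikit-Learn exposes these scores through `predict_proba`. The training data and the "new image" feature vector below are synthetic stand-ins, not values from the tutorial.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training data: label 0 stands in for jalebi, label 1 for samosa.
X = np.vstack([
    rng.normal([200, 120, 40], 15, size=(100, 3)),
    rng.normal([150, 130, 100], 15, size=(100, 3)),
])
y = np.repeat([0, 1], 100)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Mean RGB of a hypothetical new image, shaped (1 sample, 3 features).
new_image_features = np.array([[195.0, 125.0, 45.0]])

proba = clf.predict_proba(new_image_features)  # one column per class, in clf.classes_ order
predicted = clf.predict(new_image_features)    # the class with the highest probability

print(clf.classes_, proba, predicted)
```

Each row of `proba` sums to 1, so for two classes the second column is one minus the first.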

What next?

So far, we have explored the procedure for building a simple image classifier that can distinguish jalebis from samosas. Using this model as a base, one can build more sophisticated algorithms. Simple extensions to this project could be to include additional food categories or to try more complex algorithms. For the latter, Scikit-Learn has well-documented algorithms with examples, which could be a starting point.

Raghavendra Selvan is Assistant Professor in the Department of Computer Science at the University of Copenhagen