
A complete guide to understanding convolutional neural networks
The monumental growth of artificial intelligence (AI) has significantly narrowed the gap between machine and human capabilities. Researchers and enthusiasts now work on many aspects of AI, expecting to produce remarkable results.
You might have come across the term ‘convolutional neural network’ in this context. Also referred to as a ConvNet or CNN, a convolutional neural network is a category of deep neural network used in deep learning. Most commonly, CNNs are applied to the analysis of visual imagery.
Computer vision is one of the most actively researched disciplines in AI. Through computer vision, machines can perceive the world in a way similar to humans and apply that knowledge and intelligence to a multiplicity of tasks. These include:
- Image and video recognition
- Image classification and analysis
- Recommendation systems
- Natural language processing
- Media recreation
The convolutional neural network is one particular algorithm that has helped computer vision evolve through deep learning.
What is a convolutional neural network?
A CNN can be defined as a deep learning algorithm designed to take an input image, assign importance (learnable weights) to the various objects and aspects in the image, and differentiate them from one another. A ConvNet requires far less pre-processing than other popular algorithms: where primitive methods rely on filters engineered by hand, a CNN with adequate training can learn these filters and characteristics on its own.
The CNN architecture is comparable to the connectivity pattern of neurons in the human brain. An individual neuron responds to stimuli only within a restricted region of the visual field, referred to as its receptive field. Multiple receptive fields overlap one another to cover the entire visual area.
How do convolutional neural networks work?
At the outset, it should be noted that convolutional networks do not view images the way humans do. Researchers therefore have to think differently about what a particular image represents as a convolutional network processes it.
Convolutional networks perceive images as volumes, i.e. 3D objects, rather than as flat canvases measured only by width and height. The reason is that digital colour images use an RGB (red, green, blue) encoding.
Blending these three primary colours in different proportions produces the spectrum of colours humans can perceive. A convolutional network ingests such images as three distinct strata of shades stacked one above the other.
A normal colour image therefore appears to a convolutional network as a rectangular box whose width and height are measured in pixels, and whose depth consists of three layers, one for each of the RGB colours. These layers are known as channels.
As images move through a convolutional network, they are described in terms of input and output volumes, expressed mathematically as multi-dimensional matrices. It is necessary to track the exact measure of each dimension of the image volume, as these form the basis of the linear algebra operations used in image processing.
The intensity of red, green or blue at each pixel in an image is expressed as a number. These numbers are stacked in 2D matrices, which together constitute the image volume.
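As a minimal sketch of the volume described above (using NumPy and a made-up 4×4 image), the image is simply a 3D array whose third axis holds the three colour channels:

```python
import numpy as np

# A hypothetical 4x4 RGB image: height x width x 3 channels (depth).
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]  # top-left pixel set to pure red

# Each channel is a 2D matrix of intensities; stacked together,
# the three matrices form the image volume the network consumes.
red_channel = image[:, :, 0]
print(image.shape)        # (4, 4, 3)
print(red_channel.shape)  # (4, 4)
```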
These numbers are the raw, initial sensory features that the convolutional network feeds on. The purpose of the CNN is to detect which of these numbers carry significant signals that help it group images together with greater accuracy.
A convolutional network does not deal with a single pixel at a time. It consumes square patches of pixels, which are passed through a filter, also referred to as a kernel: a small square matrix, much smaller than the image itself and the same size as the patch. The filter's job is to find patterns in the pixels.
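The patch-and-filter process described above can be sketched in plain NumPy. `convolve2d` here is a hypothetical helper written for illustration (SciPy ships a real `scipy.signal.convolve2d`); strictly speaking it computes cross-correlation, which is the operation deep learning libraries implement under the name "convolution":

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over the image and, at each position,
    sum the elementwise products with the pixel patch beneath it
    (valid padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]  # square pixel patch
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_kernel = np.array([[1.0, -1.0]])  # horizontal difference filter
# For this smoothly increasing image, every output entry is -1.0.
print(convolve2d(image, edge_kernel))
```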
Benefits of convolutional neural networks
The primary motivation behind the inception of CNNs in the context of deep learning is their ability to address the constraints of traditional neural networks. When used for image classification, traditional networks cannot scale properly: their number of connections grows disproportionately, and they eventually fail to work in the desired way. CNNs introduce some new ideas that are effective in enhancing the power of deep learning networks.
Here are some of the benefits of convolutional neural networks:
- Sparse representations
Image classification problems with large pictures involve millions of pixels. A traditional neural network models the necessary knowledge through matrix multiplication operations involving every input and parameter. As a result, billions of computations are required.
In a CNN, however, the kernel is much smaller than the input. As a result, the number of computations needed while the model is trained to make predictions is greatly reduced.
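To put rough, purely illustrative numbers on this, compare a fully connected layer over a large image with a single small kernel:

```python
# Illustrative numbers only: a fully connected layer mapping a
# 1000x1000 grayscale image to a same-sized hidden layer needs
# one weight per input-output pair, while a convolution reuses
# one small kernel at every position.
pixels = 1000 * 1000
dense_weights = pixels * pixels   # 10^12 weights
kernel_weights = 3 * 3            # a single 3x3 kernel

print(dense_weights)   # 1000000000000
print(kernel_weights)  # 9
```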
- Parameter sharing
Parameter sharing is one of the optimisation techniques used in CNNs. It refers to the fact that the same parameters are reused as the network performs various functions: the same weights are applied at different positions of the input, so the model learns a single weight set rather than a separate set for each position. Parameter sharing therefore lets CNNs save memory significantly compared with traditional models.
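A small sketch of the consequence: the parameter count of a convolutional layer depends only on the kernel and channel sizes, never on the input's width or height. `conv2d_param_count` is a hypothetical helper applying the standard formula:

```python
def conv2d_param_count(in_channels, out_channels, kernel_size):
    # Each output channel learns one kernel of shape
    # (in_channels, kernel_size, kernel_size) plus one bias;
    # that same kernel is shared across every spatial position.
    return out_channels * (in_channels * kernel_size ** 2 + 1)

# Whether the input image is 32x32 or 1024x1024, this layer
# has the same number of learnable parameters.
print(conv2d_param_count(3, 16, 3))  # 448
```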
- Equivariance
This is a consequence of parameter sharing. A function is equivariant if a change to the input results in the same change to the output. Convolutions are equivariant to translation of the data, so one can predict how certain changes in the input will affect the output.
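A one-dimensional NumPy sketch of translation equivariance: shifting the input signal shifts the convolution output by the same amount. Zero-valued edges are assumed here so that `np.roll`'s wrap-around does not distort the comparison:

```python
import numpy as np

kernel = np.array([1.0, -1.0])
signal = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0])
shifted = np.roll(signal, 1)  # shift the input one step right

out1 = np.convolve(signal, kernel, mode="full")
out2 = np.convolve(shifted, kernel, mode="full")

# Shifting the input then convolving equals convolving then
# shifting the output: the hallmark of translation equivariance.
print(np.allclose(np.roll(out1, 1), out2))  # True
```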
CNNs do, however, involve a high computational cost and need large amounts of training data. These two disadvantages aside, they have contributed significantly to deep learning.
Ankita Sinha, Co- Founder & CTO, Gravitas AI