Want to join CVI Group? Get in touch with us.
The Computer Vision and Intelligence Group (CVIG) of IIT Madras was started in September 2009 with a vision of building a team of students with deep expertise in Computer Vision. The idea for the club was seeded in 2008, when the IIT Madras team represented India at the International Aerial Robotics Competition (IARC). At IARC 2009, CVIG competed against elite teams from other top universities and was acknowledged as the best vision team among all participants.
CVIG is the only club out of which a start-up has grown. It has extraordinary mentorship and motivated, committed members, who have completed industrial projects for ITC, the Indian Railways, VDime and Eye hospital CHECK, as well as multiple machine learning projects.
You can reach us by emailing cvigroup.cfi@gmail.com or here. You can also reach us on Facebook.
To know more about us, visit our blog here.

Our Activities:

As a club, we undertake projects on topics ranging from computer vision and deep learning to reinforcement learning and natural language processing. We are actively involved in projects and conduct sessions for our college community in these areas, with the hope of impacting society with bleeding-edge technology.

  • Introductory sessions: In line with our vision of nurturing a team of students with expertise in CV and AI, we conducted sessions aimed at introducing beginners to the world of Computer Vision and Artificial Intelligence. A wide range of real-world applications of AI were discussed. The basics of Python and concepts in computer vision, image processing and deep learning were explored using OpenCV and TensorFlow. Various aspects of feature extraction and image formation were also discussed. We also conducted a similar set of sessions on Machine Learning, where students trained their own neural nets and implemented some interesting aspects of Machine Learning.
  • Summer School: We conduct a summer school for students during the month of July. We plan to conduct sessions on a few interesting concepts in Computer Vision and Machine Learning in association with the Analytics club.

Competitions:

  • Hack2Innovate 2017: A team from the group was awarded first place in Hack2Innovate, a deep learning contest organised by T-Hub in collaboration with Samsung, Nvidia and Microsoft India. The winning teams were invited to GES – Hyderabad 2018.
  • HULT Singapore, 2018: A team from the club was selected for the Singapore round of HULT 2018 for its work towards the 12th United Nations Sustainable Development Goal. Under the theme “using energy to impact 10 million people”, the team aimed to solve the issue of waste management in India by achieving complete grassroots waste segregation using an automatic waste segregator.
  • Defense Expo 2018: A team from CVI participated in the recently conducted Defense Expo, where it proposed using Vertical SLAM (Simultaneous Localisation and Mapping) to compensate for the lack of GPS availability.
Resources to get started in Computer Vision and Machine Learning
the club's Google forum








As a club, we have completed and are currently involved in a wide range of projects. Here are a few notable ones:

Completed Projects

  • Sub-pixel Super resolution – This project aimed to implement a CSI (TV serial) style resolution enhancer for images. A deep learning super resolution approach was adopted, and the transposed convolutions generally used for upsampling in deep learning were replaced with an efficient sub-pixel upsampling method based on the phase shift approach.
  • Cell Boundary Identification – This project was part of the Government of India’s innovation challenge. It aims at automated identification of cell boundaries from pathological slides, which holds tremendous potential for cancer diagnosis. It involves decomposing a video of the sample into frames and identifying cell boundaries.
    Figures: input image and segmented image.
  • Fun with Faces – This project was a small-scale attempt at Snapchat-style filters that add overlays to your face. The final version used a library called dlib, which provides facial feature points; combined with a few machine learning algorithms, it achieved real-time results.
  • Rubik’s cube solver – Using images of the sides of a Rubik’s cube, this project solves the cube within seconds and provides a 3D simulation along with the steps required to solve it.
  • Expression Morphing – This project could morph expressions onto a person’s face, making them smile or frown. It used a few libraries to obtain the facial keypoints and then produced a final image with an expression chosen by the user.
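
The phase shift upsampling mentioned under Sub-pixel Super resolution can be illustrated with a small sketch. It is not the project's code: it assumes the phase shift step is the depth-to-space rearrangement commonly called pixel shuffle, and the function name pixel_shuffle and the nested-list layout (channels, rows, columns) are illustrative choices. The idea is that a network outputs r × r low-resolution channels per output channel, and this step interleaves them into one image that is r times larger in each dimension.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into a (C, H*r, W*r) nested list.

    Assumption: this is the usual depth-to-space convention, where input
    channel c*r*r + i*r + j supplies the pixel at offset (i, j) inside
    each r x r output block of output channel c.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("number of input channels must be divisible by r*r")
    height, width = len(x[0]), len(x[0][0])
    channels_out = channels_in // (r * r)

    # Allocate the upscaled output, filled with zeros.
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]

    for c in range(channels_out):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for row in range(height):
                    for col in range(width):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out
```

In practice this rearrangement would be done on tensors by a deep learning framework; the explicit loops here only make the index mapping visible.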

Current Projects

  • Integrated detection and tracking of multiple objects – Object detection and localisation in images have long challenged computer vision scientists, and artificial neural networks have provided an efficient solution for these tasks. In this project, we use state-of-the-art Convolutional Neural Networks to detect and localise objects in certain frames of a video, and the conventional CamShift and MeanShift methods from OpenCV to track them in the remaining frames. This combination aims to maintain good accuracy while achieving real-time speed and good computational efficiency.
  • Face Liveness detection – Face recognition systems are widely used commercially for secure access. Though these systems have become quite good at the recognition task, they are prone to deliberate attacks. We are attempting to build a face liveness system that distinguishes between a real face and a copy of a face.
  • Automatic Waste segregation – This project aims to design an intelligent dustbin that segregates waste into different categories using deep learning, with the help of a camera and other sensors.
  • Hand Gesture recognition for swarm bots – With the ever-growing use of drones by consumers and businesses, the deployment of fleets or swarms of drones is inevitable in the very near future. Swarm technology is an approach to coordinating bots within a multi-bot system. It was inspired by studying the behaviour of ants and other tiny organisms and applying it to make multiple robots coordinate on a task. The team is trying to use image processing (OpenCV) to recognise the drones indoors. They are also working on various communication protocols for coordination among the drones.
  • Face Recognition – This project is a part of CFI Jarvis, which aims at making the new CFI building smart. It focuses on the security features of the new building. It will continuously scan the environment and grant access to those registered in the database of users. Future features will include face search through the institute database and alerts in the case of an attempt to breach security protocols.
  • Voice App Control – This project uses one’s voice to control mobile phone apps. Using Natural Language Processing techniques, the app will be able to seamlessly control your phone’s interface.
  • AutoCaption – This project, as the name suggests, automatically provides captions for an input image using state-of-the-art neural networks. It also provides hashtags.
  • Video Stabilization for UAVs – The primary objective of the project is to ensure proper clarity of a video taken from a moving camera source. This will help retrieve useful features that would otherwise be rendered useless by camera shake.
  • Evaluation of OMR – Using computer vision and machine learning techniques, one could make the process of evaluating the OMR easier and more accurate compared to the traditional methods that currently exist.
  • CNN SLAM – Simultaneous Localisation and Mapping (SLAM) is a useful addition to most robotic systems, wherein the vision module takes in a video stream and attempts to map the entire field of view. This allows for all sorts of “smart” bots, such as those constrained to perform a given task. A really good read on what SLAM is can be found here. SLAM is usually described as a chicken-and-egg problem, since we need a precise map to localise bots and precise localisation to create a map, and it is usually classified as a hard problem in the field of computer vision. This project explores fusing key components of CNN imaging and geometric SLAM, where deep vision based monocular depth predictions are used in combination with geometry based SLAM predictions. The objective is to see whether deep vision can impact robotic SLAM, which has otherwise been largely disjoint from developments in the former field. Our main inspiration comes from the work demonstrated at CVPR 2017 by Tateno, Tombari et al. The original paper may be found here: link. Since then there have been quite a few developments in each component of the pipeline. We are experimenting extensively with these and reporting our findings.
  • Crowd Analysis in Banking environments – This is a part of the SBI Stockroom problem set. The SBI Stockroom project is mainly aimed at developing an automatic customer identification system for banks, using facial recognition and handwriting matching. As a part of the facial recognition module, we aim to develop an algorithm to recognise faces in a crowded banking environment. This should help make the entire process smoother and more consumer friendly.
  • Deep Fake (Deep Learning + Fake) – A human image synthesis technique using artificial intelligence methods. It has often been used for face swaps, especially with celebrities. This project aims to develop a robust and fast face morphing system for images and video, using Autoencoders and GANs (Generative Adversarial Networks). The main application of the project is personalised marketing software for clothes and fashion items in online retail stores. We aim to extend the previous work done by the subreddit of the same name. In addition to swapping two given faces, we improve on the previous work to make it more usable for deployment in the following ways:
    • Computing a metric for the semblance of two faces. We have decided to adopt a FaceNet-type anchor when defining the corresponding loss metrics.
    • Swapping for multiple people in a given frame.
  • Forest Surveillance Project – A major part of India’s land area is forest. A lot of resources and effort go into monitoring these forests and the wildlife in them, and into preventing forest fires, poaching, etc. This project aims to apply the principles of deep learning to forest surveillance and animal migration analysis in order to monitor forests intelligently. We envision building an integrated module that can detect forest fires, animals, etc., so as to make the process of monitoring forests more efficient and reduce the burden on the people involved in it.
  • Lip Reading and AVR – This project aims to use dual channel convnets to perform Audio-Video recognition as well as synchronisation. Synchronisation ensures that there is no lag between the audio and video parts. In one particular application of this project, we are also handling lip reading.
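
The detect-then-track scheme described under Integrated detection and tracking of multiple objects can be sketched as a simple schedule: run the expensive detector on every N-th frame and a cheap tracker on the frames in between. This is a minimal sketch, not the project's implementation. The names detect_and_track, detect, track and detect_every are hypothetical; in a real pipeline, detect would presumably wrap a CNN detector and track would wrap a CamShift or MeanShift update.

```python
def detect_and_track(frames, detect, track, detect_every=10):
    """Alternate between detection and tracking over a sequence of frames.

    detect(frame) -> list of boxes          (hypothetical: a CNN detector)
    track(frame, box) -> updated box        (hypothetical: CamShift/MeanShift)
    Returns one (source, boxes) pair per frame, where source is
    "detect" or "track".
    """
    if detect_every < 1:
        raise ValueError("detect_every must be at least 1")

    results = []
    boxes = []
    for index, frame in enumerate(frames):
        if index % detect_every == 0:
            # Expensive step: re-detect to correct any tracker drift.
            boxes = detect(frame)
            source = "detect"
        else:
            # Cheap step: update each known box from its previous position.
            boxes = [track(frame, box) for box in boxes]
            source = "track"
        results.append((source, list(boxes)))
    return results
```

A smaller detect_every favours accuracy, since drift is corrected more often, while a larger value favours speed.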