How We Started!
This project was carried out as part of COP 315 (An Embedded Design Course) at IIT Delhi, under Prof. M. Balakrishnan. The purpose of the course is to develop innovative solutions to intriguing real-life problems. We are a group of four individuals who, though not from Computer Science, are highly enthusiastic about technology and especially interested in learning more about ML and CV algorithms. Our mentor, Anupam Sobti Sir, offered us this project, and we were excited to work on it from the very first glance because of its wide applicability across many practical domains. Of course, the project turned out to be no walk in the park; it was full of hurdles and challenges. Had it not been for Anupam Sir, who helped us a great deal with innovative ideas whenever we felt lost, we would not have got this far.
How was the model worked out?
When we started the project, we hardly knew anything about Caffe or the Raspberry Pi, let alone the Movidius stick!
The model works in a specific flow:
- First, we use the Raspberry Pi to power our camera and the Intel Movidius stick.
- We wrote a program that takes camera frames as input, feeds them into the Movidius stick, and collects the output.
- To use the stick for our own purposes, we need a graph file built from whichever Caffe model is to be run. We used a Caffe MobileNet SSD model trained to detect 20 object classes in an image, which also returns a bounding box and a confidence for each detection.
- This Caffe model is converted into a suitable graph file using the Intel Movidius SDK.
- After getting the detected classes and their bounding boxes, we select the class with the highest confidence.
- For this class, we wrote code that works out where the object lies in the frame: left, right, or center!
- Once we have the class, we use eSpeak from Python to generate audio feedback!
- This process is continuous, so to keep the audio feedback from getting mixed up, we pause the program for 0.7 seconds after each audio output before it continues!
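The last few steps of the flow can be sketched roughly as below. This is only an illustrative sketch, not our exact code: the function names, the dictionary layout of a detection, the pixel-coordinate box format `(x_min, y_min, x_max, y_max)`, and the choice to split the frame into three equal vertical bands are all assumptions made for the example. The `announce` helper also assumes the `espeak` command-line tool is installed on the Pi.

```python
import subprocess
import time

HALT_SECONDS = 0.7  # pause after each spoken message, as described above


def best_detection(detections):
    """Return the detection with the highest confidence, or None if there are none.

    Each detection is assumed (for this sketch) to be a dict with the keys
    "label", "confidence" and "box".
    """
    if not detections:
        return None
    return max(detections, key=lambda d: d["confidence"])


def horizontal_position(box, frame_width):
    """Classify a box as "left", "center" or "right" from its centre x-coordinate.

    Assumption: the frame is split into three equal vertical bands.
    """
    x_min, _, x_max, _ = box
    centre_x = (x_min + x_max) / 2.0
    if centre_x < frame_width / 3.0:
        return "left"
    if centre_x < 2.0 * frame_width / 3.0:
        return "center"
    return "right"


def build_message(detections, frame_width):
    """Build the text to speak for one frame, or None if nothing was detected."""
    best = best_detection(detections)
    if best is None:
        return None
    position = horizontal_position(best["box"], frame_width)
    return "{} {}".format(best["label"], position)


def announce(message):
    """Speak the message with the espeak command-line tool, then halt briefly."""
    subprocess.run(["espeak", message], check=False)
    time.sleep(HALT_SECONDS)
```

In a real loop, each frame's detections would come from the Movidius stick; `build_message` would be called on them, and any non-empty message passed to `announce` before the next frame is processed.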