VOID, or Visual Object Identification Demonstrator, is a simple concept: real-time visual data generation and analysis that can be tied to transaction data at the point of sale, producing a rich data set from which to derive insights and test hypotheses, all with the eventual goal of improving business outcomes.
In its simplest form, VOID can recognise facial expressions, gender, age and clothing, and can track head movements, providing data on the viewing behaviour of customers. Essentially, the programme enables us to identify where customers are looking, what they are looking at, and for how long.
Couple that location data with an indicator of how happy or unhappy a person is and which demographic they belong to, and suddenly retailers have a wealth of insightful data at their fingertips.
From decisions on advert placement, real-time personalised promotions and store layouts, to assessments of customer service standards and regional purchasing behaviour, these are insights retailers can’t afford to ignore in the age of the connected experience.
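For the attribute side of that picture (expression, age, gender), the heavy lifting is typically done by pre-trained face-analysis models. The snippet below is a hedged sketch of that step using the open-source deepface library rather than VOID's own models; the frame file name is hypothetical and the exact output structure varies between library versions.

```python
# Illustrative only: VOID runs on Capgemini's own models; the open-source
# deepface library (pip install deepface) gives comparable per-face estimates.
from deepface import DeepFace

analysis = DeepFace.analyze(
    img_path="shopper_frame.jpg",          # hypothetical frame grabbed from the live feed
    actions=["age", "gender", "emotion"])  # demographic and expression estimates

# Prints per-face results containing age, gender and emotion scores;
# the exact keys depend on the deepface version installed.
print(analysis)
```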
At NRF, we explored one of these ideas in practice: tracking head movements to change product selections.
The demo camera captures a livestream of the person looking at the screen in front of them. The algorithm analyses the images, detects the face and establishes whether the individual is looking to their left or to their right. Based on that real-time output, the programme customises the pattern on the virtual fabric, with the design changing on every turn of the head. Even with the simplest of set-ups (a single camera and a laptop), we were able to see good results.
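To make that loop concrete, here is a minimal sketch of the same pipeline: a single camera in, a left/right head cue out, and the pattern selection updated accordingly. It is illustrative only: the NRF demo ran on trained deep-learning models, whereas this stand-in uses OpenCV's stock Haar cascades, and the pattern file names are hypothetical.

```python
# Illustrative sketch only: the NRF demo used trained deep-learning models;
# here OpenCV's stock Haar cascades stand in for the head-orientation step.
import cv2

profile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_profileface.xml")

def head_direction(gray):
    """Return 'left', 'right' or None, using profile-face detections as a crude yaw cue."""
    # The stock profile cascade only fires on one facing; flipping the frame
    # catches the other side. Which label maps to which side depends on camera
    # mirroring, so treat the labels as placeholders to calibrate on site.
    if len(profile_cascade.detectMultiScale(gray, 1.1, 5)) > 0:
        return "left"
    if len(profile_cascade.detectMultiScale(cv2.flip(gray, 1), 1.1, 5)) > 0:
        return "right"
    return None

patterns = ["stripes.png", "floral.png"]   # hypothetical fabric textures
current = 0

cap = cv2.VideoCapture(0)                  # single webcam, as in the demo set-up
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    direction = head_direction(gray)
    if direction == "left":
        current = 0
    elif direction == "right":
        current = 1
    # The real demo re-renders the chosen pattern on a virtual garment;
    # this sketch simply reports the current selection on screen.
    cv2.putText(frame, "pattern: " + patterns[current], (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("VOID head-tracking sketch", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```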
Note: an in-store deployment would require two cameras for depth perception, plus a third at the point of sale to capture transactional data alongside customer data (attire, and facial metadata such as head orientation and emotion). And whilst the set-up might look basic, it sits on top of many months of work with heavy-duty machine learning models running on live video in our London office. This is deep learning at its core.