After a few years as a structural engineer, and embarking on a master's in data science, the opportunity to participate in a big data competition with Hilti sounded perfect. Hilti is known for their innovative inventions and for making the best concrete bolts and drills in the world.
The biggest hurdle in making a submission was dedicating a substantial amount of time between Christmas and New Year's Eve to the cause. Hilti’s brief asked for a recommendation engine to be constructed using open-source tools (such as R and Python) and a large data set that they provided. The data consisted of details on approximately one million customer purchases, each including product codes, prices, and characteristics of the customer. Our job was to use this information to provide a tailor-made list of products that might appeal to each customer.
I used a Random Forest-based approach implemented in R to come up with my recommendations, which were tweaked and combined to cater for the large number of product categories available to recommend. One of the four items in the tailor-made list was a completely random product, which adds an element of ‘risk’ or ‘exploration’ to the system. The intention is for this ‘risk’ element to be utilised or explored in another multi-armed bandit algorithm (see how Google performs multi-armed bandit experiments on websites here) that will run ‘live’. Multi-armed bandits learn which of several options is best by running trials: most trials exploit what has already been inferred about users’ preferences, while a fraction are dedicated to exploring riskier options and possibly capitalising on them.
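To make the exploit-versus-explore idea concrete, here is a minimal sketch of an epsilon-greedy bandit, one of the simplest multi-armed bandit strategies. It is written in Python purely for illustration and is not the code from my submission: the class and function names, the product names, the purchase rates, and the exploration rate of 0.25 (loosely echoing one random slot out of four) are all assumptions.

```python
import random


class EpsilonGreedyBandit:
    """Illustrative epsilon-greedy bandit over a fixed set of arms (products)."""

    def __init__(self, arms, epsilon=0.25, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon              # fraction of trials spent exploring
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in self.arms}
        self.mean_reward = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Explore: with probability epsilon, pick an arm uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # Exploit: otherwise pick the arm with the best observed mean reward
        # (ties go to the arm listed first).
        return max(self.arms, key=lambda arm: self.mean_reward[arm])

    def update(self, arm, reward):
        # Incremental update of the running mean reward for this arm.
        self.pulls[arm] += 1
        n = self.pulls[arm]
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / n


def simulate(true_rates, rounds=5000, epsilon=0.25, seed=0):
    """Run the bandit against made-up purchase probabilities per product."""
    bandit = EpsilonGreedyBandit(true_rates, epsilon=epsilon, seed=seed)
    outcome_rng = random.Random(seed + 1)
    for _ in range(rounds):
        arm = bandit.select()
        # Reward is 1 if the simulated customer "buys" the product, else 0.
        reward = 1.0 if outcome_rng.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    # Hypothetical products and purchase rates, invented for this example.
    result = simulate({"anchor": 0.05, "bolt": 0.10, "drill": 0.60})
    for arm in result.arms:
        print(arm, result.pulls[arm], round(result.mean_reward[arm], 3))
```

In a run like this, most trials should end up going to the product with the highest purchase rate, while the random fraction keeps testing the others in case preferences turn out to be different from what was first inferred.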
So far the competition has been rewarding and we’ve just made it to the semi-finals. A Hilti drill is up for grabs for the second and third place finalists, and the first prize is a trip to one of Hilti’s strategic IT locations, so wish me luck!