Being concise is often a virtue and sometimes it is also a necessity. It’s especially interesting to explain something as complex as data science in less than seven minutes. This was the idea behind the Pecha-Kucha talks at the NetSci2015 conference of the Network Science Society in Zaragoza, Spain. The Pecha-Kucha presentation style calls for 20 slides shown for 20 seconds each (6 minutes and 40 seconds in total), forcing the speaker to give a concise, fast-paced, and entertaining talk. That’s how it was when our data scientists Daniel Villatoro and Dario Patanè presented “Reference Generation: A Method for Venue Recommendation”.
Dani and Dario gave their talk at NetSci Backstage, the Satellite Symposium breakout sessions of the conference, where they raced through a high-level view of our then-pilot product, Latlr_. Latlr_ was a venue recommendation system that could take into account contextual parameters such as the user’s location, the time of the request, or the user’s past history. While there are other well-known systems in the market, such as Foursquare, Gowalla, or Google Places, we didn’t have to build a user history from scratch, thus avoiding what is known as the “cold start” or “new user” problem: we have extensive purchase histories in a dataset derived from BBVA bank card records. The data is anonymized so no specific person can be identified, but it still has enough detail to create consumer preference categories. From this data we built networks of preferences based on co-visitation: if two venues are visited by the same person, and enough other people exhibit the same behavior, then once a critical mass is reached preferences can be inferred for other, similar clients. Businesses are the nodes and clients in common are the links in the network, with the number of common clients strengthening the link, until D&A can create a graph something like this.
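To make the co-visitation idea concrete, here is a minimal sketch in Python. It is not the Latlr_ implementation: the function name `build_covisit_graph`, the `min_common_clients` threshold standing in for the “critical mass”, and the toy visit records are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_covisit_graph(visits, min_common_clients=2):
    """Build a weighted co-visitation graph.

    visits: iterable of (client_id, venue) pairs (hypothetical, anonymized).
    Returns a dict mapping (venue_a, venue_b) to the number of clients
    who visited both venues, keeping only links that reach the threshold.
    """
    # Collect the set of venues each client has visited.
    venues_by_client = defaultdict(set)
    for client, venue in visits:
        venues_by_client[client].add(venue)

    # Every pair of venues sharing a client gains one unit of link weight.
    weights = defaultdict(int)
    for venues in venues_by_client.values():
        for a, b in combinations(sorted(venues), 2):
            weights[(a, b)] += 1

    # Drop weak links: an assumed stand-in for the "critical mass".
    return {edge: w for edge, w in weights.items() if w >= min_common_clients}


# Toy data, invented for this example.
visits = [
    ("c1", "cafe"), ("c1", "bookshop"),
    ("c2", "cafe"), ("c2", "bookshop"), ("c2", "gym"),
    ("c3", "cafe"), ("c3", "gym"),
]
graph = build_covisit_graph(visits)
```

With this toy data, only the cafe-bookshop and cafe-gym links are shared by two clients, so only those two survive the threshold.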
With preferences clustered, the next step is to find things that are interesting to the client personally, since a general recommendation, such as just listing the most popular venue, isn’t good enough for most people. For this we used what’s known as the Tanimoto or Jaccard coefficient (one minus this value is the Jaccard distance). Simply put, that’s the number of things one user has in common with another user or user cluster, divided by the total number of distinct things either of them likes, shared or not. This was quickly explained to the audience with this intuitive diagram:
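For readers who prefer code to diagrams, the same calculation can be sketched in a few lines of Python. The function names and the example sets are assumptions for illustration, and returning 0.0 when both sets are empty is simply a convention chosen here.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets share nothing to compare.
        return 0.0
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Jaccard distance is one minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Invented example: venues liked by one user and by a cluster of users.
user = {"cafe", "bookshop", "gym"}
cluster = {"cafe", "gym", "cinema", "bakery"}

similarity = jaccard_similarity(user, cluster)  # 2 shared out of 5 distinct
distance = jaccard_distance(user, cluster)
```

Here the user and the cluster share two venues out of five distinct ones, so the similarity is 2/5 and the distance is 3/5.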
At this point all the application needed was some context. Being a mobile app, it knew the user’s location, or the user could override that and enter a location where they intended to be later. Then the user would tell it what type of business they were looking for. They could even override their default preferences and tell the app they wanted a business similar to one in a different location. The app would then look for venues with similar preferences near the chosen location and list the choices in the appropriate order.
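The contextual step can be pictured as a filter followed by a sort. The sketch below is only an illustration under stated assumptions: the `recommend` function, the venue dictionaries, the `scores` mapping (for example, a Jaccard similarity per venue) and the 1 km radius are all hypothetical, and the distance is the standard haversine great-circle formula.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km: mean Earth radius


def recommend(venues, scores, location, category, radius_km=1.0):
    """Keep venues of the requested category near the location,
    then order them by preference score, highest first."""
    lat, lon = location
    nearby = [
        v for v in venues
        if v["category"] == category
        and haversine_km(lat, lon, v["lat"], v["lon"]) <= radius_km
    ]
    return sorted(nearby, key=lambda v: scores.get(v["name"], 0.0),
                  reverse=True)


# Invented venues and scores around an arbitrary point in Zaragoza.
venues = [
    {"name": "A", "category": "restaurant", "lat": 41.651, "lon": -0.881},
    {"name": "B", "category": "restaurant", "lat": 41.652, "lon": -0.879},
    {"name": "C", "category": "restaurant", "lat": 41.700, "lon": -0.880},
    {"name": "D", "category": "bar", "lat": 41.650, "lon": -0.880},
]
scores = {"A": 0.2, "B": 0.7, "C": 0.9, "D": 0.8}
ranked = [v["name"] for v in recommend(venues, scores, (41.65, -0.88), "restaurant")]
```

Venue C has the highest score but is several kilometres away, and D is the wrong category, so only B and A are listed, in that order. Overriding the location or the reference preferences amounts to passing a different `location` or `scores` argument.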
Of course, recommending venues to consumers is not the only application of this type of algorithm. It can also be used for Business Intelligence, for example by businesses looking to understand their local competition.