Scaling Simple Solutions to Complex Data Science Problems

Jose Antonio Rodriguez Serrano AI Factory Talks

Admittedly, I always find it difficult to narrowly define the term data scientist in a few words. For instance, the expertise of our data science team is the opposite of narrow, with competencies in machine learning, software development, data visualization, design, statistics, economics, and social science. What ties us all together is our capacity to solve problems. We must also find efficient solutions within constraints of time and resources, sometimes keeping the philosophy that “a done something is better than a perfect nothing”.

In a previous life as a corporate researcher in computer vision, I dedicated my time to conducting state-of-the-art research and embedding its outcomes into industrial products. During that period, I had to prioritize machine learning methods with a good trade-off between practicality and quality. A good “design pattern” I found was to use methods that can be dubbed data proximity algorithms. Pedro Domingos, machine learning professor and author of “The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World”, calls them analogy-based algorithms and exemplifies them as follows:

“Faced with a patient to diagnose, analogy-based algorithms find a patient in their files with the most similar symptoms and assume the same diagnosis. This may seem naive, but analogisers have a mathematical proof that it can learn anything given enough data.”
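The idea Domingos describes is essentially nearest-neighbor classification: store the labeled examples, and answer a new case by copying the label of the most similar stored one. A minimal sketch (the symptom vectors and diagnoses below are made up purely for illustration):

```python
# Minimal 1-nearest-neighbor classifier: a new case gets the label
# of the most similar stored example (Euclidean distance here).

def nearest_neighbor(records, query):
    """Return the label of the stored record closest to `query`.

    `records` is a list of (feature_vector, label) pairs.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    closest = min(records, key=lambda rec: dist(rec[0], query))
    return closest[1]

# Hypothetical "patient files": (symptom vector, diagnosis).
patients = [
    ((1.0, 0.0, 1.0), "flu"),
    ((0.0, 1.0, 0.0), "allergy"),
    ((1.0, 1.0, 1.0), "cold"),
]

# A new patient whose symptoms most resemble the first record:
print(nearest_neighbor(patients, (0.9, 0.1, 1.0)))  # prints "flu"
```

No training step is needed, which is exactly why the time-to-validation is so short: as soon as the data and a similarity measure exist, the method can be tried.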

With sufficient data (and nowadays we are surrounded by a lot of it), data scientists are witnessing how these methods can provide solutions that are generic, fast, and conceptually simple to implement. They tend to have a short time-to-validation (ideal for fans of the “fail fast” philosophy) while achieving performance near or above the state of the art in some applications. These methods are particularly compelling for data scientists who must quickly find solutions to a diverse set of problems.

At a recent Databeers event in Barcelona I argued for the emerging need for data scientists to consider methods that scale to many problems. My talk is six minutes short and more inspirational than technical, but it does contain some technical references for the specialized audience. Enjoy!

Thanks a lot to the organizers for the invitation and to the public for their interest.