One of the most stimulating aspects of our work is being able to witness, in real time, the development of new technologies and approaches, and the continuous -and vertiginous- improvement of Artificial Intelligence systems. Whether it is learning about new solutions or being involved in improving one of them, work in Artificial Intelligence is constantly changing.

To keep up to date with all these developments, we can attend the multitude of Data Science conferences held every year around the world. Whether generalist or more specialised, they are the ideal place to discover state-of-the-art solutions and share our learnings. For this reason, at BBVA AI Factory we have a dedicated budget to attend any event or conference relevant to our work.

After this last year and a half of hiatus, many conferences are already planning their return to on-site events.

Domain-specific conferences

RecSys 2021 was held last September in a hybrid format (online and in-person)


ACM FAccT 2022

Where: Seoul, South Korea
When: June 21-25, 2022
Format: online and in-person
⎋ More information

The FAccT acronym stands for Fairness, Accountability, and Transparency of socio-technical systems, and the conference gathers researchers and practitioners interested in the social implications of algorithmic systems. Systems based on AI and fueled by big data have applications both in the public sector and in multiple industries such as healthcare, marketing, media, human resources, entertainment or finance, to name a few. In order to understand and evaluate the social implications of these systems, it is useful to study them from different perspectives, which is why this conference stands out for its multidisciplinarity. It gathers participants from the fields of computer science, social sciences, law and humanities, from both industry and academia, to reflect on topics such as algorithmic fairness, whether these systems contain inherent risks or potential biases, and how to build awareness of their social impacts.

Data + AI Summit 2022

Where: San Francisco, United States
When: June 27-30, 2022
Format: in-person (although some kind of hybrid participation is announced too)
⎋ More information

Formerly known as Spark Summit, the conference -organized by Databricks- offers a broad panorama of recent developments, use cases and experiences around Apache Spark and other related technologies. The variety of topics can be interesting for many roles in the Data Science and Machine Learning arenas (e.g., Data and ML Engineers, Data Scientists, Researchers, or Key Decision Makers, to name a few), and it is always oriented towards big data and scalable ML pipelines. The presentations typically have a practical orientation, and the programme also includes hands-on training workshops and sessions with the original creators of open-source technologies such as Apache Spark, Delta Lake, MLflow, and Koalas. The contents from the previous edition are available on demand for free here. After two editions in a digital format, this year the Summit will be held in San Francisco, although some kind of hybrid participation is announced on the webpage.


RecSys 2022

Where: Seattle, United States
When: Sept. 18-23, 2022
Format: in-person
⎋ More information

RecSys is the premier international conference portraying recent developments, trends and challenges in the broad field of recommender systems. It also features tutorials covering the state of the art in this domain, workshops, special sessions for industrial partners from sectors such as travel, gaming and fashion, and a doctoral symposium. RecSys started with 117 people in Minnesota in 2007 and reaches its sixteenth edition this year (2022). One of the key aspects of this conference so far has been its good mix of academic and industry participants and contributions.

General conferences on ML/AI

A moment during KDD 2019

AAAI Conference on Artificial Intelligence

Where: Vancouver, Canada
When: February 22 – March 1, 2022
Format: in-person
⎋ More information

The Association for the Advancement of Artificial Intelligence (AAAI) is a prestigious society devoted to advancing the understanding of the mechanisms underlying intelligent behavior and their embodiment in machines. Its conference promotes the exchange of knowledge between AI practitioners, scientists, and engineers. It explores advances in core AI and also hosts 39 workshops on a wide range of AI applications, such as financial services, cybersecurity, health, and fairness. Check out the 2021 conference and also a review of the 2020 edition.

Applied Machine Learning Days (AMLD)

Where: EPFL. Lausanne, Switzerland
When: March 26 – 30, 2022
Format: in-person
⎋ More information

Each year the conference consists of several tracks on different topics. It is oriented to the application of Machine Learning, so the topics vary widely from year to year. It stands out for its good balance between academia and industry, its keynotes and, above all, its workshops: sessions held prior to the conference itself where you learn in a fully hands-on way. Check out the one held in January 2020.

The Society for AI and Statistics

Where: Valencia, Spain
When: March 28 – 30, 2022
Format: in-person (still under discussion)
⎋ More information

Web description: Since its inception in 1985, AISTATS has been an interdisciplinary gathering of researchers at the intersection of artificial intelligence, machine learning, statistics, and related areas. And it is true. It is a mainly statistical conference with applications in the field of Machine Learning. It requires a good knowledge of statistics to understand the concepts discussed there and to get the most out of it. The invited speakers are of a high level (many from the Gaussian side) and the organizers take great care in the choice of venue ;). Check out the latest proceedings.

KDD 2022

Where: Washington, United States
When: August 14-18, 2022
Format: in-person
⎋ More information

KDD is a research conference that has its origins in data mining, but its reach extends to applied machine learning, and nowadays it defines itself as “the premier data science conference”. More than other conferences on ML or AI, which are aimed at the academic research community, this one is especially appealing to people with “Data Scientist” as their job title. Its main differentiators are an applied data science track, a track of invited Data Science speakers, and hands-on tutorials. The bar is still very high technically, but a large fraction of the research comes from the real world, and corporate research carries significant weight.

Within KDD, at BBVA AI Factory we have been actively involved in both the organization and the program committee of the Machine Learning in Finance workshop for the last two years, after participating in the KDD Workshop on Anomaly Detection in Finance in 2019. Read our report on the 2019 conference to get a better idea of this event!

“Two heads are better than one”. This traditional aphorism reminds us that solutions assessed by several people are better than those based on a single opinion. A critical phase of academic research is built on this same idea: the so-called peer review process, in which experts in the field evaluate the suitability of a manuscript for publication. This process serves to ensure standards of quality, thereby improving performance and providing credibility.

The application of Data Science and Artificial Intelligence techniques in industry has many similarities with academia in that part of the work is based on experimentation. At BBVA AI Factory, we are more than 40 professionals, including data scientists, engineers, data architects, and business experts, working on a diverse set of projects relating to Natural Language Processing, predictive engines, alert optimization or fraud detection systems, among others. In order to improve quality, and thus make systems more robust and enable internal audits, we have developed a methodology inspired by peer review within the academic community.

This methodology also fulfils a second purpose: to favour knowledge transfer and more active participation in different projects, without becoming part of their day-to-day development. The design of this methodology has been carried out as a transversal and collaborative project, following several stages.

In a first phase, we formed a working group in which people from different teams and levels of experience defined the initial points of a review methodology that would cover the basic requirements of our projects. We didn’t start from scratch; we were inspired by large technology companies in other areas that have worked on similar proposals -for instance, AI-based system auditing1 2-, data scientists who shared their experiences in different posts3, and non-tech opinion pieces4 5. The combination of these materials and our experience in data science projects at BBVA resulted in a first version of the peer review methodology.

In a second phase, the process was open to contributions from all AI Factory data scientists, thus collecting valuable feedback that allowed us to further refine the methodology to our needs, and develop the methodology in a collaborative way so that we all felt it as our own.

With this endeavor, we obtained a flexible and modifiable system that will be applied to all AI Factory projects over the next few months. This process divides every Data Science project into five phases:

1. Initial idea: the objective, scope and feasibility of the project are analyzed.
2. Data: focuses on the input and output information needed to build the solution.
3. Analytical solution: covers each iteration of the model development process, based on the features obtained in the previous phase.
4. Validation: deals with ratifying the solution through the business KPIs and the metrics used for the training/validation/test sets.
5. Monitoring: determines what we want to monitor once the solution is in production, from different perspectives, i.e. those of stakeholders and data scientists.

Figure 1. The five phases of the proposed peer review methodology.

Once the phases were established, we had to define how the peer review process would be carried out. At the start of a project, a reviewer team is appointed, led by a Senior Data Scientist and consisting of a minimum of two people. This team meets regularly with the reviewed team, who provide the necessary documentation to ensure that all decisions made throughout the five phases are understood.

These periodic checkpoints are mandatory at the end of phases one, two, three and four, and at the beginning of phase five, to ensure that everything is in order before going to production. In addition, to keep the process flexible according to the complexity of each project, additional on-demand reviews can be organized during the course of the different phases. At these meetings, the reviewer team asks questions and provides constructive feedback to the reviewed team. If the evaluation is positive, the phase is considered complete and a one-pager document is written up reflecting the content and conclusions of the sessions.

Figure 2. The One Pager document reflects the basic information and conclusions of each phase of the project. Download the document by clicking on the image.

At the end of the process, a final checkpoint is carried out to verify that the documentation is correct and self-explanatory, in order to decide the next steps of the project. At the same time, there is a review of the whole process.

The methodology is being introduced at BBVA AI Factory through three pilot projects that cover the wide-ranging use cases present in our company: from projects aiming at building improved embeddings of card transactions, to using graphs in financial pattern analysis scenarios. We used these pilot projects to peer review the methodology itself, collecting feedback from both reviewed and reviewer teams while also identifying improvements for future versions of the methodology.

Furthermore, we are planning to include in future versions of the methodology a set of guidelines that help us improve our ability to detect bias and therefore avoid unfairness. This way members of the reviewed and reviewer team can question the fairness of the projects and set out improvements that guarantee the impartiality and inclusiveness of our algorithms. We want our solutions to be collaborative, robust and fair by design in order to prepare ourselves for the future of banking.

In 2018, coinciding with the Football World Cup, a company ventured to forecast each team's probability of becoming champion -the original report is no longer available, but you can still read some posts in the media that covered the story-. Germany topped the list, with a 24% probability. As soon as Germany was eliminated in the group stage, the initial forecast was viewed as mistaken, and the anecdote circulated on social networks.

The problem wasn’t the model itself, of which no details were revealed beyond the fact that it was based on a simulation methodology; robust sports forecasting models are well known -incidentally, for the occasion of the World Cup, BBVA AI Factory also created a visualization of player and team data-. Nor was the problem the report itself, because it never drew the conclusion that only Germany could win.

The main problem was the interpretation of the result by certain media and the wider public, who assumed that ‘Germany wins’ would prove to be right even though the numbers indicated otherwise: the probability was so fragmented that a 24% for Germany meant there was a 76% chance that some other team would win, right?
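That arithmetic is easy to check with a quick simulation. In the sketch below, all probabilities except Germany's 24% are invented for illustration; the point is simply that the single most likely champion still fails to win most simulated tournaments:

```python
import random

random.seed(42)

# Hypothetical pre-tournament win probabilities. Only Germany's 24% comes
# from the story; the rest are invented so that the numbers sum to 1.
win_probs = {
    "Germany": 0.24, "Brazil": 0.20, "France": 0.16,
    "Spain": 0.12, "Argentina": 0.10, "Others": 0.18,
}

# Simulate many tournaments by sampling one champion per tournament.
n = 100_000
teams = list(win_probs)
weights = list(win_probs.values())
champions = random.choices(teams, weights=weights, k=n)

germany_share = champions.count("Germany") / n
print(f"Germany, the single most likely champion, wins only "
      f"~{germany_share:.0%} of simulated tournaments")
```

In roughly three out of four simulated tournaments some other team lifts the trophy, which is exactly what the 76% complement says.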

The human tendency to simplify: the “wet bias”

The fact that humans are not good at evaluating probability-based scenarios is well known to meteorologists. In 2002, a phenomenon called the “wet bias” was unveiled: meteorological services in some American media were found to deliberately inflate the probability of rain well above what had actually been calculated. In his well-known book “The Signal and the Noise”, the statistician and data disseminator Nate Silver delves into this phenomenon and goes so far as to attribute it to meteorologists' belief that the population, whenever it sees a probability of rain that is too small -say 5%-, will interpret it directly as “it’s not going to rain” -and consequently will be disappointed 5% of the time-.

This suggests that humans tend to simplify information for decision making. The fact is that a 5% chance of rain, or a 24% chance that Germany would win the World Cup, should not be transformed into a black-and-white decision, but taken as information for analysing scenarios. Nate Silver, in his post “The media has a probability problem” and in his talk at Spark Summit 2020, analyzes this limitation in building scenarios from given probabilities, illustrating it with examples from hurricane forecasting and the 2016 US elections. As Kiko Llaneras argues in his article “En defensa de la estadística” (“In defense of statistics”, in Spanish), every prediction has to fall on the improbable side sometime or other.

Designing algorithms correctly from scratch

Those of us who work with Machine Learning in the design of customer-oriented products believe that we should not reproduce that same error of taking the results of forecasts as absolute. It is up to us to properly understand what level of confidence a Machine Learning system has in the result it offers, and to accurately transmit it to the receivers of the information.

For example, if we want to design an algorithm to forecast the expenses that a customer will have and to inform them through the BBVA app, we are interested in being able to analyze how confident the algorithm is in each forecast, and perhaps discard the cases where we do not have high confidence.

Surprisingly, many forecasting algorithms are designed in such a way that they can induce a misinterpretation similar to the one we described in the case of the World Cup. This is because the estimate provided by a forecast model (for example, next month’s expenditure), which takes information observed in the past (expenditure from previous months), comes in the form of a single value. And we’ve already discussed what can happen when we reduce everything to the most likely value. It would be more interesting if the system were able to provide a range -the expenditure will be between 100 and 200 euros-, narrowing the range when it is very certain -for example, for a recurrent fixed expenditure- and widening it, case by case, when it is more uncertain -for example, for a holiday, where our expenditure is less predictable-.
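One common way to obtain such ranges (a generic sketch, not the specific method developed at BBVA AI Factory) is quantile regression: fitting one model for a low quantile and another for a high one. The example below uses scikit-learn's gradient boosting with a quantile loss on synthetic expenditure data whose noise grows with the input, mimicking the fixed-expense vs. holiday situation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic "monthly expenditure": a recurring base amount plus noise
# whose size depends on the input (heteroscedastic, like holiday months).
X = rng.uniform(0, 10, size=(500, 1))
y = 150 + 10 * X[:, 0] + rng.normal(0, 5 + 3 * X[:, 0])

# Two quantile models give the lower and upper bounds of an 80% interval.
lower = GradientBoostingRegressor(loss="quantile", alpha=0.1).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X, y)

x_new = np.array([[2.0], [9.0]])
lo, hi = lower.predict(x_new), upper.predict(x_new)
for x, l, h in zip(x_new[:, 0], lo, hi):
    print(f"x={x:.0f}: expenditure between {l:.0f} and {h:.0f} euros")
```

The interval comes out narrow where the data are predictable and wide where they are noisy, which is precisely the behavior described above.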

At BBVA AI Factory we have worked on this research area, together with the University of Barcelona, to develop this type of algorithm using neural network forecasting techniques. This research has already been discussed in other posts and has resulted in publications, including one at the prestigious NeurIPS 2019 conference1.

Thanks to this research, we now have algorithms capable of providing forecasts that result in a range of uncertainty, or a mathematical distribution function, rather than a single value, which offers us more complete information.

Can we trust the black boxes? (Spoiler: Yes, with some tricks)

However, we have to face one more obstacle: oftentimes, data science teams use models that they did not create themselves: models from others, from external code libraries or APIs, or from software packages. Suppose we have a forecasting system that is already in place -for example, next month’s expense estimate, or a balance estimate for the following few days- and, for some good reason, it cannot be replaced. The question arises: can we design another algorithm that estimates how confident the first algorithm is, without having to replace or even modify it?

The answer is yes, and it is described in our recent article, “Building Uncertainty Models on Top of Black-Box Predictive APIs“, published in IEEE Access and signed by Axel Brando, Damià Torres, José A. Rodríguez Serrano and Jordi Vitrià, from BBVA AI Factory and the University of Barcelona. We describe a neural network algorithm that transforms the prediction given by any existing system into a range of uncertainty. We distinguish two cases: one where we know the details of the system we want to improve, and one where that system is a black box, i.e., a system that we use to generate forecasts but which cannot be modified, and whose construction is unknown to us. This is a common real-life scenario, for example when using software from a supplier.

This opens up the possibility of taking any available forecasting system that works by giving point estimates and, without having to modify it, “augmenting” it with the ability to provide a range of uncertainty, as schematically shown in the figure above. We have verified the system in bank forecasting cases and in electricity consumption prediction. We provide the link to the article so that other researchers, data scientists or interested readers can consult the details.
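To make the idea concrete, here is a heavily simplified sketch: the "black box" is a stand-in function we pretend we cannot modify, and a second model learns to predict the size of its errors from the same inputs. The paper itself uses neural networks and a richer uncertainty model; gradient boosting is used here only to keep the example short:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

# A frozen "black box" point forecaster we cannot modify or inspect
# (a stand-in for a vendor system or external API).
def black_box_forecast(X):
    return 100 + 12 * X[:, 0]

# Historical data with input-dependent noise.
X = rng.uniform(0, 10, size=(1000, 1))
y = 100 + 12 * X[:, 0] + rng.normal(0, 2 + 4 * X[:, 0])

# Train a second model to predict the black box's absolute error.
residuals = np.abs(y - black_box_forecast(X))
error_model = GradientBoostingRegressor().fit(X, residuals)

# Augment the black box: point estimate plus a learned uncertainty band.
x_new = np.array([[1.0], [9.0]])
point = black_box_forecast(x_new)
band = error_model.predict(x_new)
for x, p, b in zip(x_new[:, 0], point, band):
    print(f"x={x:.0f}: forecast {p:.0f} ± {b:.0f}")
```

The error model never touches the black box's internals; it only needs the inputs and the black box's past predictions, which is what makes the approach applicable to supplier software.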

The challenge: translating reliability into a human language

With this work, we have met the challenge of designing a forecasting system that provides extra information. However, the key question we raised at the beginning remains unanswered: if we build products based on Machine Learning, how do we transfer this information to the end user in a way that they understand that it is a useful estimate, but one that might present errors?

This is still an open issue. Recently, a presentation by Apple on product design with Machine Learning shed some light on this aspect, suggesting that uncertain information be communicated in terms of some quantity that appeals to the user: better to say “if you wait to book, you could save 100 euros” than “the probability of the price going down is 35%”. The latter formula -the most commonly used- could give rise to the same interpretation problems that occurred with the case of Germany in the World Cup. If humans are not statistically minded animals, perhaps the challenge lies in the language.
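As a toy illustration of that last point, a product layer could translate a forecast interval into plain wording, choosing the phrasing according to how narrow the interval is. The thresholds and messages below are hypothetical, not an actual BBVA product rule:

```python
def describe_forecast(lower: float, upper: float) -> str:
    """Turn a forecast interval into a user-facing message.

    Thresholds and wording are purely illustrative.
    """
    width = upper - lower
    midpoint = (lower + upper) / 2
    if width < 0.2 * midpoint:
        # Narrow interval: the model is fairly certain, so commit to a figure.
        return f"Next month you will spend around {midpoint:.0f} euros."
    # Wide interval: surface the range instead of a falsely precise number.
    return (f"Next month you could spend between {lower:.0f} "
            f"and {upper:.0f} euros.")

print(describe_forecast(145, 155))
print(describe_forecast(100, 220))
```

The point is not the exact thresholds but the principle: the interface adapts its language to the model's confidence instead of always presenting a single number.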