This is probably the longest and most ambitious data project I’ve ever worked on. With the Covid-19 pandemic, governments across the world increased their spending on medical equipment and services to fight the spread of the virus. Portugal was no exception. But how much was spent? And which companies profited from the pandemic? No one seemed to know.
Using R, I scraped and analyzed all public tenders published since the beginning of February. Public money was being spent without the usual public tender procedures, which are important but time-consuming, so we wanted to answer the most pressing questions: who spent the most? Which companies sold the most? And, perhaps even more importantly, what did the Portuguese government spend its money on?
I found that, by the end of October, Portugal had spent 478 million euros fighting the pandemic. The biggest “winner” of these deals was one of the biggest private health groups in the country, followed by a Chinese ventilator manufacturer and a company that used to sell merchandise but shifted its core business to imported masks.
We felt that this information should not be hidden in complex public tender portals and Excel files, so we decided to build a news application that lets everyone explore the data and get a better view of what public health institutions and national and local governments were spending our money on, and where.
This investigation was carried out in partnership with OCCRP and journalists from 37 other countries. This kind of collaboration helped us compare Portugal’s spending with that of other European countries and also led me to other investigations related to fake mask certifications.
But, as the Portuguese partner, I had to tackle a welcome but difficult problem: Portugal was publishing more data than any other country. What started as an Excel file with 300 contracts quickly became a dataset of more than 16,000 public tenders as I refined the search on our public tenders website. All the partners agreed that it would be useful to classify all the contracts into nine big categories. But how could I do it with so much data?
I ended up developing a machine learning model to do this classification, trained on the 5,000 contracts that had already been classified by hand. I relied on natural language processing techniques to build the model. It wasn’t perfect: even after a lot of tweaks, it performed better on contracts that were easy to guess than on others, but it saved us a lot of time.
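The write-up does not say which model or packages were used. As a minimal sketch of the general idea (learning categories from hand-labelled contract descriptions), here is a tiny bag-of-words naive Bayes classifier in base R. The choice of naive Bayes, the example descriptions, the category names and the function names are all assumptions made for illustration, not the project’s actual code or data.

```r
# Illustrative sketch only: multinomial naive Bayes on contract descriptions.
# The texts and category names below are invented, not taken from the dataset.

# Lowercase a description and split it into words longer than two letters
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^[:alpha:]]+")[[1]]
  words[nchar(words) > 2]
}

# Learn class priors and per-class word likelihoods from labelled examples
train_nb <- function(texts, labels) {
  tokens <- lapply(texts, tokenize)
  vocab <- sort(unique(unlist(tokens)))
  classes <- sort(unique(labels))
  # Word counts per class, with add-one (Laplace) smoothing
  counts <- sapply(classes, function(cl) {
    words <- unlist(tokens[labels == cl])
    as.numeric(table(factor(words, levels = vocab))) + 1
  })
  rownames(counts) <- vocab
  class_sizes <- as.numeric(table(factor(labels, levels = classes)))
  list(
    vocab = vocab,
    classes = classes,
    log_prior = log(class_sizes / length(labels)),
    log_lik = log(sweep(counts, 2, colSums(counts), "/"))
  )
}

# Pick the class with the highest log-probability; unknown words are ignored
predict_nb <- function(model, text) {
  words <- tokenize(text)
  words <- words[words %in% model$vocab]
  scores <- model$log_prior +
    colSums(model$log_lik[words, , drop = FALSE])
  model$classes[which.max(scores)]
}

# Hypothetical hand-labelled examples
texts <- c(
  "surgical masks and gloves",
  "FFP2 masks for hospital staff",
  "ventilators for intensive care unit",
  "purchase of invasive ventilators",
  "covid tests and reagents",
  "laboratory reagents for PCR tests"
)
labels <- c(
  "protective equipment", "protective equipment",
  "medical devices", "medical devices",
  "testing", "testing"
)

model <- train_nb(texts, labels)
print(predict_nb(model, "disposable masks for nursing homes"))
```

With real data, a sensible check would be to hold out part of the hand-labelled contracts and compare predictions against them, category by category, to see where the model struggles.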
This investigation, which resulted in three articles (published both online and in the print edition of the newspaper) and a news app, had a strong impact because it shed light on the business side of the pandemic.
Publishing a news app was also very important: it let everyone look up their local hospital, local government, school or any other public institution, so people could scrutinize the institutions closest to them.
For the news app, I used Vue.js to build the front end and the plumber package to develop an API in R.
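To give a rough idea of what a plumber endpoint looks like, here is a minimal sketch of an API file. The `/spending` route, the `spending_by_category` helper, the column names and the sample rows are all hypothetical. The real API and its data structure are not described in this text.

```r
# api.R: illustrative sketch of a plumber endpoint.
# Rows, column names and the route are invented placeholders.

contracts <- data.frame(
  buyer = c("Hospital A", "Hospital A", "Municipality B"),
  supplier = c("Supplier X", "Supplier Y", "Supplier X"),
  category = c("protective equipment", "testing", "protective equipment"),
  amount_eur = c(12000, 5500, 8000),
  stringsAsFactors = FALSE
)

# Total spending per category, optionally restricted to one buyer
spending_by_category <- function(buyer = "") {
  rows <- contracts
  if (nzchar(buyer)) {
    rows <- rows[rows$buyer == buyer, , drop = FALSE]
  }
  if (nrow(rows) == 0) {
    return(data.frame(category = character(0), amount_eur = numeric(0)))
  }
  aggregate(amount_eur ~ category, data = rows, FUN = sum)
}

# plumber reads the special "#*" comments below to register the route
#* Total spending per category, optionally filtered by buyer
#* @param buyer Name of the public institution (optional)
#* @get /spending
function(buyer = "") {
  spending_by_category(buyer)
}
```

Assuming plumber is installed, the file could be served with `plumber::pr_run(plumber::pr("api.R"), port = 8000)`, and a front end such as the Vue.js app could then request `/spending?buyer=Hospital%20A` and receive the totals as JSON.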