This article also appeared in German.
There are people in Italy who want to adopt a child. There are women in rural France who have lost their husbands. There are households in Germany that have to make ends meet on 1,000 euros a month. They all have something in common: data brokers know about them. Their personal characteristics help the advertising industry make money.
We’ve learned this and more by analyzing an internal document from the advertising industry. It consists of hundreds of thousands of entries and provides an unprecedented picture of how closely data brokers in Europe are closing in on us.
The advertising industry divides humanity into segments like: widows in the countryside. If a company wants to place ads, it can request: I only want people in this segment to see my ads. And also: people in this segment should not see my advertisement. Other segments are called „purchasing power for alcoholic beverages: high“. Or: „low earners without orientation“.
The Xandr File
In order for data brokers to create such segments in the first place, they need to get information about us. This is where cookie banners come into play, trying to get our permission to collect data. Many apps and websites examine our behavior and draw conclusions from it. They don’t collect names, but numbers they use to identify us such as IP addresses or advertising IDs. The manufacturers of operating systems give our cell phones such an advertising ID. They exist so that advertisers can track us across websites and apps. Often, it’s this number that decides what personalized ads we get served.
One of the major players in this industry is data marketplace Xandr. Companies can pick and choose which audience they want to distribute their ads to on the company’s platforms. To do so, they select segments that appeal to them and pay Xandr for them. Last year, Xandr became part of Microsoft; the name „Xandr“ is to disappear in the future. Before the takeover, Xandr had publicly posted a huge document with the names of about 650,000 segments on the web – „inadvertently“, according to the company. The document thus allows an unprecedented look behind the scenes of an industry that otherwise remains hidden.
We published the first in-depth analysis of the Xandr file in June. It describes how Xandr and its data brokers may be violating data privacy and how German companies are involved in the business. For this investigation we teamed up with The Markup, which published a US-related article at the same time.
1,900 questionable segments discovered
Now, for the first time, we present a detailed data analysis for the entire European Union. The research shows how closely data brokers are tracking people in Europe. Over several weeks, we further analyzed hundreds of thousands of segments. We found tens of thousands of segments on the characteristics and behavior of EU citizens and narrowed them down to particularly sensitive cases. What remained were 1,900 segments that we consider to be extremely questionable.
They can all be assigned with a high degree of probability to one of 15 EU countries on the basis of their name, and they all relate to personal characteristics such as health, religion and political views. They deal, for example, with whether someone is easy to manipulate and has a lot or little money. The research makes it clear for the first time how thoroughly the advertising industry screens people on this continent.
Experts in the fields of data protection and civil rights describe the findings as „very explosive.“ Vienna-based researcher and activist Wolfie Christl writes: „It’s a scandal that this uncontrolled trade in digital profiles about personal characteristics and behaviors is still happening in Europe.“ He had discovered the Xandr document on the net and brought it to our attention.
He and other experts are calling for tougher action. After German data protection authorities announced in June that they would investigate the file and the companies involved, several data protection authorities from other EU countries now also want to examine the evidence.
We have confronted Xandr and Microsoft with the results of the research. While we did not receive an answer in June, spokespeople for Microsoft now tell us that the company takes „these matters seriously“ and complies with all laws. Companies such as The Adex and Adsquare told us that collected data would be deleted after about 90 days. Oracle – with around 232,000 segments the largest data broker in the Xandr table – wrote that they „do not have any comment to make here“.
We present the results of the data research on six interactive maps of Europe. If you move your cursor over a country, you can see examples of questionable advertising segments from there. The maps make visible the business of an industry that likes to remain hidden itself, while scrutinizing us very closely.
Health: hearing aids and fertility
Thousands of segments in the Xandr table are about health. In our EU analysis, we assigned 54 segments of concern from eight EU countries to this category. We found segments on high alcohol consumption from Austria, Germany, Denmark, Sweden, Finland and Spain. Do the users concerned know that they are traded on the advertising market as heavy drinkers?
We also found several segments on the desire to lose weight, for example from Germany, France, Italy and Spain. Do the users in question know that their personal desire to lose weight is worth money to the advertising industry? In Italy, we found a segment about people who „intend“ to adopt a child. Does the advertising industry know about their wish before their own family does?
A segment attributed to Germany referred to visiting the hearing aid manufacturer Amplifon. Its press office did not answer our inquiry.
Anyone who wants to use personal data for advertising needs the informed consent of those affected. This is required by the General Data Protection Regulation (GDPR), and supervisory authorities and courts leave no doubt about this. The requirements for consent are particularly high when it comes to sensitive data – according to Article 9 of the GDPR, this includes health data. However, this does not automatically mean that the segments we discovered fall under this category; some data vendors dispute this when asked. Data protection experts have a different opinion; in case of doubt, the courts would have to decide.
The company Weborama, a data broker from France, claims: Their segments on the subject of health are not sensitive. „We wish to confirm unequivocally that we do not engage in the processing of sensitive data“, the company says in response to our inquiry. As an example, Weborama mentions the „Maternity“ segment. Users in this segment had only visited pages with terms such as „maternity leave, infants, midwife, nursery“. But the segment does not reveal whether a person has had a baby. The Weather Company writes that its segments on asthma, migraines and arthritis are based „solely on postal codes“.
Both cases illustrate the balancing act the advertising industry performs: When it comes to data protection, many data brokers emphasize that their data is not that accurate. When it comes to customer acquisition, they emphasize that their data is certainly accurate – accurate enough to reach the desired target group with personalized advertising. After all, the industry earns billions with it.
Financial strength: „elite“ and „low earners“
A person’s value is not defined by their money. But money defines thousands of segments on the advertising market. Hardly anything seems to interest the advertising industry more than the question of how much customers have in their pockets. This is the focus of almost half of the segments we discovered in our EU analysis: 940 segments from nine EU countries.
The names of the segments are sometimes ruthless. In Germany, for example, there are the segments „Problem areas: Social housing and ordinary apartment houses“ and „low earners without orientation.“ For rich people, there are segments such as „Wealthy Elite“ or interest in „luxury cars“. In some cases, income is broken down in detail: 1,000 to 1,250 euros a month; 1,250 to 1,750 euros, and so on.
Viennese lawyer Marco Blocher works for the organization noyb („none of your business“), which campaigns for data protection in the EU. „Data on supposed financial strength is very explosive,“ he says, referring to the research findings. „Maybe I don’t even get ads for cheap products displayed because – for a reason I don’t understand – I’m considered a financially strong customer.“
Viennese researcher and privacy activist Wolfie Christl says, „the widespread use of data about people in economically difficult circumstances for online advertising is a catastrophe.“ Advertisers could target people with manipulative or even fraudulent offers. „I would be in favor of giving special protection to data about a person’s socioeconomic situation,“ Christl says. Unlike health data, for example, information on financial strength does not fall under the special protection of Article 9 of the GDPR.
Personal weaknesses: Easily influenced
It’s not just a lack of money that can make people particularly vulnerable. In our analysis of segments with a clear reference to EU countries, we found eleven questionable segments with references to personal weaknesses. In total, the table contains hundreds of such segments, many without country attribution or with reference to the USA.
We discovered the „divorced“ segment in four EU countries. Apparently, it can be used to target people whose marriages have fallen apart. After a divorce, some people feel their entire life has been turned upside down – and apparently the crisis can even be echoed in online advertising.
Another segment describes a visit to the French sex store „Sexy Center,“ with vibrators and anal spreaders on offer. This shouldn’t have to be uncomfortable for anyone – but in some circles it is.
Even before the Internet came along, more or less honest businesspeople were looking for people who could easily be taken for a ride. For the grandchild trick, for example, criminals search the phone book for first names pointing to past decades. Today, the advertising industry offers segments that may promise better targeting. Their names imply that these people spend money particularly freely: „Uncertainty, unnecessary expenses,“ „often influenced by advertising,“ and „unqualified credit card customers“. However, the Xandr table does not reveal what the segments are actually used for.
Marco Blocher criticizes classifying people as particularly impressionable. „This ultimately plays on their intellect and specifically tries to entice easily influenced people to buy.“
Religion: A deepest belief
Data on religious beliefs is also particularly protected by the GDPR. In six EU countries, we found 17 segments related to religion, such as „Greek Orthodox“ or „Christianity“ in Portugal.
German data broker The Adex, a subsidiary of ProSiebenSat1, owns a segment called „religion and spirituality.“ When asked, the company sows doubt as to whether religious people are really behind the segment. A press spokeswoman tells us that segments exclusively reflect the type of website visited. For example, someone reading a SPIEGEL report on Ferraris could be assigned to the sports car segment – without having a sports car themselves.
This once again shows the balancing act of the industry: It may well be that atheists also end up in the „religion and spirituality“ segment. On the other hand, this raises the question: If a segment supposedly says nothing about people, then why is it traded on a marketplace of the advertising industry?
Children: How old and how many
Data brokers are obsessed with families with children. With 800 segments, this is the second most common category in our EU research; in total, there are thousands of segments on this topic in the Xandr table. Some people might be reluctant to tell a stranger whether they have children, exactly how many, and whether they are still small. The advertising industry wants to know just that.
Mom and dad don’t live together? The advertising industry is capitalizing on this, too. Some segments are about single parents. For Germany, we found segments on single parents with adult children, single parents with teenagers and young single parents.
Most of the segments we examined do not give the impression that minors themselves are behind them; mostly it’s their caregivers that are targeted. This is in line with the fact that the GDPR provides special protection for minors’ data: As a rule, people can only consent to the processing of their data themselves from the age of 16; for younger people, their legal guardians must consent.
Segments from six EU countries that refer to „13- to 18-year-olds“ are out of line. They refer to the question-and-answer site ask.fm, which is also popular with young people. We reported more on this here.
Political views: From conservative to environmentally conscious
Advertising can influence voting behavior, especially if it targets groups with certain political opinions. The Facebook scandal in 2018 was the latest display of how dangerous this can be. At that time, it was about the British data analysis company Cambridge Analytica, allegedly supporting then US presidential candidate Donald Trump with manipulative advertising.
In the Xandr data, we discovered 84 segments from seven EU member states that we believe are related to political opinions. For example, „conservative values,“ „rural traditional,“ „liberal intellectual milieu,“ „environmentally conscious.“
We have also assigned a segment to Weborama – the company that processes „no sensitive“ data. It’s called „social and environmental sustainability.“ The company states: This segment is not a „political opinion.“ People behind this segment would only have read words like „carbon footprint“ on the web. In general, Weborama segments would only describe whether people were exposed to „word clouds“ when reading articles.
On the one hand, it’s common for market researchers to develop categories to describe different groups in society. They are mainly interested in what people buy, not who they vote for. On the other hand, it doesn’t take too much imagination to conclude: Perhaps people with „conservative values“ choose not only conservative products, but also conservative parties.
Do people in a segment like „environmentalism“ only get shown ads for oat milk – or sometimes right-wing propaganda that casts doubt on the electability of green parties?
The French data protection authority (CNIL) criticizes personalized political advertising. It could „unduly influence individuals when it comes to political discourse and democratic electoral processes“, writes a spokesperson in response to our inquiry.
Sebastian Becker works for the organization EDRi (European Digital Rights), which campaigns for fundamental digital rights. He warns of the social consequences of advertising that targets personal attitudes, among other things: it „reinforces discriminatory stereotypes, polarizes political opinions, and affects the political participation of hundreds of millions of people.“
From offended to cordial: This is how data brokers react
Data brokers have long since ceased to be based exclusively in the USA; German, French, Spanish and Italian companies also offered their data for use on Xandr. Behind the EU segments we examined in this research are dozens of data brokers. We sent around 20 of them a list of the segments in which their company’s name appeared. Among other things, we wanted to know how they protect personal data, and whether they think data subjects know that this kind of sensitive data about them is being processed.
Some responded sensitively. Weborama’s privacy team wrote that it was „difficult for us as individuals to read this kind of accusation.“ They are in compliance with all regulations. A spokeswoman for The Adex tried a counterattack. She suggested that an author of this text is dishonest and untrustworthy because he himself has a free website at WordPress.com. There are tracking advertisements on such sites.
Nordic Data Resources responded with a charm offensive. „We greatly appreciate the work of data journalists like you,“ data protection officer Ulrik Larsen flattered us. He confidently shared the link to his company’s advertising segments, including a PDF for the German market. The segments revolve around such topics as attitude toward the church, income, or personality – for example, „self-centered and passive.“ Larsen writes, „We believe people know they are being profiled when they consent to cookies and data-sharing.“
Nordic Data Resources claims a clean slate: „we do not track, store, or own any cookies or online IDs,“ Larsen writes. His company merely targets groups of at least 15 households, he says, sorted by zip code.
There’s a catch, though. Cookies and identification numbers are still used to reach people with advertising. However, as Larsen explains, these are handled by other companies, including Eyeota. In plain language, this means that if anyone gets their fingers dirty, it’s others. There can be problems with this, as Larsen admits. His company stopped working with one of the providers from the Xandr list, Oracle, in 2019 because Oracle could not guarantee the consent of users under the GDPR.
Wolfie Christl warns: „The most important trick in data trading is the claim by all those involved that they are only acting on behalf of others and that these others are responsible“. This is how any responsibility „disappears into nirvana“.
These are the limits of data research
This research can only shine a spotlight. Even the data basis – the table of the data marketplace Xandr with more than 650,000 segments – is just a slice of the global data trade, albeit the largest so far. The list is dated May 2021, so it represents the recent past. To get a picture of the EU, we narrowed the data set down considerably.
First, we searched for country names and abbreviations of EU member states in the approximately 650,000 segments. What remained were about 44,000 segments. In this reduced data set, we used keywords to search for segments that potentially describe personal characteristics of people. To do this, we created a list of around 300 terms, among them „LGBTQ“ or „religion.“ We manually checked the hits using the four-eyes principle. We sorted out less questionable segments and divided the others into groups. We documented our procedure and all results on GitHub.
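The two automated filtering steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code published on GitHub: the sample segment names, the country terms, the keyword list and all function names are hypothetical assumptions, and none of the sample names are taken from the Xandr file.

```python
import re

# Hypothetical sample data; none of these names come from the Xandr file.
SEGMENTS = [
    "ExampleBroker > DE > Interest > Religion and Spirituality",
    "ExampleBroker > FR > Household > Single parents",
    "ExampleBroker > US > Auto > Luxury cars",
    "ExampleBroker > Germany > Gardening",
    "ExampleBroker > Interest > Religion",
]

# Assumed, heavily shortened term lists (the real keyword list had ~300 terms).
COUNTRY_TERMS = ["DE", "Germany", "FR", "France"]
KEYWORDS = ["religion", "single parent", "LGBTQ"]


def build_pattern(terms, whole_word, ignore_case):
    """Compile one regex that matches any of the given terms."""
    body = "|".join(re.escape(term) for term in terms)
    # Terms must start at a word boundary; optionally they must also end at one.
    tail = r"(?!\w)" if whole_word else ""
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(r"(?<!\w)(?:" + body + ")" + tail, flags)


# Country codes: case-sensitive whole words, so "DE" does not match "de".
COUNTRY_RE = build_pattern(COUNTRY_TERMS, whole_word=True, ignore_case=False)
# Keywords: case-insensitive prefixes, so "single parent" matches "Single parents".
KEYWORD_RE = build_pattern(KEYWORDS, whole_word=False, ignore_case=True)


def filter_segments(names):
    """Step 1: keep names with a country term. Step 2: keep keyword hits."""
    eu_segments = [name for name in names if COUNTRY_RE.search(name)]
    candidates = [name for name in eu_segments if KEYWORD_RE.search(name)]
    return eu_segments, candidates


eu_segments, candidates = filter_segments(SEGMENTS)
for name in candidates:
    print(name)
```

In this sketch, the segment without a country term is dropped in the first step even though it mentions religion, which mirrors the blind spot described below. The remaining candidates are only hits for review: the manual check by two people cannot be replaced by pattern matching.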
In order to achieve the most valid results possible, we accepted that we will miss some things. It is therefore likely that the data set hides further segments that have a connection to the EU and to personal characteristics. For example, a large proportion of the segments do not provide any information about specific countries. It is therefore not possible to tell whether there are other data from EU users behind them. Other segments do not give any clues to their content through their names. The goal of the research is not a complete survey, but a minimum of transparency.
Not all of the companies behind the 1,900 segments use unique user ID numbers. Some segments of our research could also refer to contextual advertising. This is when advertisers do not look at who is visiting a page, but instead focus on the page itself. On a parenting blog, for example, it’s easy to advertise children’s products. That’s much less invasive.
So the name of a segment only provides clues about its potentially sensitive content, but no certainty. There’s also no way to tell if there are about 100 people behind a segment or a million. Even after boiling the data down to 1,900 segments, a blur remains.
This vagueness is probably no coincidence: It is in the interest of the corporations, which prefer not to show their cards. It protects them from uncomfortable questions.
What former employees say
We spoke with two insiders about the research. Both worked as data scientists for a company that appears in our data. They don’t want to talk about their former employer publicly. We therefore give them pseudonyms: Biscuit and Scone.
Biscuit is not very impressed by the research. They say that the users concerned have agreed to companies processing their data. For example, if you visit the website of a bicycle store and click on „accept cookies,“ you could end up in a segment for people interested in bicycles. There’s nothing „shady“ about that. The industry has „moved a lot“ in order to adapt to the wishes of users.
Scone is more critical: „I would definitely question whether users can give informed consent“. Realistically, they explain, no one could verify what kind of processing one is consenting to. „So many websites have just annoying lists of company names that don’t tell you much,“ Scone says.
Scone describes how little insight even data merchants have into their businesses. They say that merchants have segments compiled by other companies. Whether there are really nicotine fans behind a segment about above-average tobacco consumption is a question of „trust,“ as Scone explains. Statistically, they say, it’s hard to verify.
„Personally, I can tell you that some people who work with data providers do question what are ethical usages of it and what are not. But then there are people who seem to be completely unconcerned about it.“ Scone’s impression is that people outside the advertising industry either don’t talk about it at all – or they are blowing the topic out of proportion.
„Intrusive“ advertising industry: what experts say
We also shared the results of our research with experts and asked for their assessment. „The research is very explosive,“ says data protection lawyer Marco Blocher in an interview with netzpolitik.org. „It shows how broken the system is. You have to put several weeks of research into it to halfway understand what’s happening. As a normal consumer, you have no idea about it.“
The GDPR wants users to give informed consent to the processing of their data. „The main problem is that data is spreading explosively,“ Blocher says. A website might give data to hundreds of advertisers, and they might give the data back to hundreds of companies – and so on. „Informational self-determination becomes impossible.“
Viennese researcher and privacy activist Wolfie Christl agrees: „From my point of view, there can be no informed consent to sell data about our everyday behavior to thousands of companies,“ he writes. „No one knows exactly which paths this data takes and what is done with it. I suspect not even the data trading companies themselves know for sure.“ Insider Scone sounded quite similar when they described how the data brokers’ business is based on trust.
Sebastian Becker of EDRi sharply criticizes the opacity of the advertising industry. „The results confirm the intrusive nature and lack of respect for fundamental rights within the online advertising industry.“ Not only the right to data protection and privacy are „blatantly violated,“ but also the right to freedom from discrimination.
„Question of resources“: How data protection authorities react
All three experts see the state as having a duty. „Data protection authorities should draw consequences from the research and take the trouble to work through the whole thing,“ demands Blocher. „They should check whether data brokers are violating laws and, if so, ban them from processing.“
Wolfie Christl wants authorities to take a similarly strict approach as with „tax evasion, money laundering or fraud.“ Data protection lawyer Blocher dampens expectations: „We see that authorities are very reluctant. Partly they lack capacity, partly they lack interest.“
The data protection authorities from 14 EU countries that we contacted also reacted rather cautiously. We presented them with the segments that could be attributed to their country. For example, the authority in Sweden writes that it has not yet decided whether to launch an investigation into the issue. „It is amongst other things a question of resources.“ Authorities from four EU countries have not yet responded to our inquiry.
The authority in Austria points out that none of the companies from the research has a registered office in Austria. It will therefore use the research as an opportunity to contact the supervisory authorities in the relevant EU countries. The authority in France writes that it is already investigating complaints in connection with Xandr. The authorities in Belgium, Romania, and Greece say they are looking into the information. In Germany, four data protection authorities already announced after our research in June that they would investigate cases.
Even if supervisory authorities sometimes muddle along for years without any discernible effect – sometimes things do change. For example, in Norway, a country that is subject to the GDPR even though it is not an EU member, the data protection regulator recently banned personalized advertising on Facebook and Instagram, initially for three months.
So an Internet without personalized advertising is possible, and there are alternatives. It is not necessary to research the personal and intimate characteristics of millions and millions of people to find suitable ads. It’s enough if ads simply match the content of a website. A test conducted by the Dutch Broadcasting Corporation shows that this contextual advertising can also earn good money. But such cases are an exception – a turnaround in Internet advertising is unlikely to happen on its own.
Many extreme segments without country reference
Outside the EU, where less stringent data protection laws apply, data trading is even more invasive, as the Xandr data show. Our selective search of segments with a U.S. connection reveals a wealth of segments, some of which are highly questionable: about chronic diseases such as diabetes and rheumatism, about political attitudes such as „hardcore“ Republicans or „persuadable“ Democrats. Targeted advertising can also be used to address people with „high“ debts, Asian Americans or Native Americans, Muslims and Jews.
At the end of the research, a large dark field remains: For numerous segments, we did not find a clear reference to a country. Among them are thousands of other potentially questionable segments that describe personal characteristics. The segments deal, for example, with leukemia, skin cancer, infertility, ADHD, depression and drug addiction. Others concern homosexuals, LGBT activists, trade union members, climate deniers and right-wing extremists. Whose identification numbers are in these segments probably remains a secret of the advertising industry.