Everything we do on the Internet is being recorded and analyzed in order to achieve one goal: to show us targeted advertising. This is a reality to which many people have become accustomed in exchange for free services. However, very few people understand exactly where our data ends up when we visit websites, use apps or make digital payments. Targeted advertising moves in mysterious ways. That’s another fact we’ve become accustomed to.
An investigation by netzpolitik.org is set to change this fundamental imbalance between the adtech industry and internet users. In June, we published a series of articles shining a light on the collection, trade and use of personal data in the global adtech industry. We analyzed an inventory file from a US-based data marketplace called Xandr. The file contains more than 650,000 so-called audience segments. These are used by advertising companies to categorize and target billions of people.
The scope and detail of this data collection is staggering. There is hardly a human characteristic that advertisers do not want to exploit for their purposes. Want to reach people in Denmark who have bought a Toyota? No problem. Italians with financial problems? No problem. Minors in Austria? Hardcore Christians in Portugal? Pregnant women in Poland? Fragile seniors in France? Queers in Spain? No problem.
“It’s the largest piece of evidence I’ve ever seen about what I call today’s distributed surveillance economy”, says Wolfie Christl, a privacy researcher at Cracked Labs. He discovered the file and shared it with netzpolitik.org and the US-based non-profit news website The Markup. “The reporting confirms that the global surveillance ads industry is unacceptably intrusive and poses a threat to democracy”, comments Jan Penfrat of the European Digital Rights organization EDRi.
The file, which dates back to May 2021, shows metadata for a total of 651,463 audience segments. It includes a name and number for each segment, as well as the company that provided it to Xandr and an ID for that data provider. It looks like this (extract):
- Lotame | 422 | 4073004 | International_EU – France Alcoholic Beverages
- Lotame | 422 | 4073353 | International_EU – France Automobile Brands – Land Rover
- Lotame | 422 | 4073669 | International_EU – France Browser Language – Arabic
- Lotame | 422 | 4073677 | International_EU – France Credit Level – Poor
- Lotame | 422 | 4073768 | International_EU – France Dads
- Lotame | 422 | 4072781 | International_EU – France Forum Readers
- Lotame | 422 | 4072930 | International_EU – France Military
- Lotame | 422 | 4073729 | International_EU – France Relationship Status – Divorced
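Anyone who wants to examine the archived file can start from the record layout visible in the extract: data provider, provider ID, segment ID and segment name, separated by "|". The following Python sketch parses one such line. It assumes that the full file can be reduced to lines in exactly this four-field layout, which we have only seen in the extract; `Segment` and `parse_segment_line` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    provider: str     # company that supplied the segment to Xandr
    provider_id: int  # ID of that data provider
    segment_id: int   # ID of the segment itself
    name: str         # human-readable segment name


def parse_segment_line(line: str) -> Segment:
    """Parse one 'provider | provider ID | segment ID | name' record.

    Assumes the four-field, pipe-separated layout of the extract above;
    the layout of the full archived file may differ.
    """
    # Drop an optional list bullet, then split into at most four fields so
    # that a '|' inside the segment name would stay part of the name.
    raw = line.strip().removeprefix("- ")
    fields = [part.strip() for part in raw.split("|", 3)]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    provider, provider_id, segment_id, name = fields
    # Stripping each field also tolerates a missing space after a '|'.
    return Segment(provider, int(provider_id), int(segment_id), name)
```

For example, `parse_segment_line("- Lotame | 422 | 4073768 | International_EU – France Dads")` yields a `Segment` with provider ID 422 and segment ID 4073768. Splitting into at most four fields keeps the segment name intact even if it contains further separators.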
Audience segments work like giant containers for groups of people who are likely to share a common characteristic, such as demographics, interests, consumer behavior or personality traits. Segments can also contain information about which apps and websites we use, where we go, what we believe and what illnesses we have. Adtech companies collect and trade these segments like commodities, which means that people’s data often passes through the hands of dozens or hundreds of companies.
40,000 segments on EU countries
Targeted advertising is an industry worth more than 550 billion US dollars. For those who don’t want to depend on the walled advertising gardens of Google, Meta or Amazon, Xandr is one of the most important pieces of infrastructure in this ecosystem. In 2022, Microsoft acquired Xandr from the US telecommunications provider AT&T. Neither Xandr nor Microsoft responded to multiple press inquiries from netzpolitik.org and The Markup.
The file in question, containing over 650,000 segments, provides a rare insight into the global advertising surveillance economy, not only in the US but also in Europe. It was hidden on a documentation page for advertising clients but was accessible to anyone on the open web. It was taken down shortly after our initial email to Microsoft and Xandr. Archived versions of the website and the file [23 MB] can still be found at the Internet Archive.
Our analysis is far from complete. We encourage everyone to make their own assessment of the file. Researchers or journalists wishing to discuss ideas or findings can contact Ingo Dachwitz at ingo.dachwitz ett netzpolitik.org.
While the vast majority of segments do not refer to a specific country, tens of thousands do. Some include a country code in their name, such as “ES” for Spain. The segments in the file cover most regions of the world, showing that adtech surveillance is global.
The largest group of segments with an explicit geographic reference concerns the European Union. According to our analysis, about 40,000 segments mention EU countries or nationalities, while about 30,000 segments explicitly mention the United States. Thousands of segments refer to Australia and South America; some even mention China or African countries such as Nigeria.
However, as Xandr is a US-based data marketplace, it can be assumed that most segments without an explicit country reference are aimed at people in the United States.
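At its core, counting country references like this means matching keywords against segment names. The following Python sketch shows the basic idea; it is not the method used by netzpolitik.org or The Markup, and the keyword lists are hypothetical and far from complete. A real count would need terms in many languages and manual checks for false positives. Each segment is counted at most once per country.

```python
import re
from collections import Counter

# Hypothetical, deliberately incomplete keyword lists (country names,
# nationalities and two-letter codes); not the terms used in our analysis.
COUNTRY_TERMS = {
    "France": ["France", "French", "FR"],
    "Spain": ["Spain", "Spanish", "ES"],
    "Germany": ["Germany", "German", "DE"],
}


def _term_pattern(terms: list[str]) -> re.Pattern[str]:
    # Match a term only if it is not glued to other letters, so "ES" matches
    # in "ES_Sports" but not inside "SERIES". Matching is case-sensitive.
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"(?<![A-Za-z])(?:{alternatives})(?![A-Za-z])")


COUNTRY_PATTERNS = {c: _term_pattern(t) for c, t in COUNTRY_TERMS.items()}


def count_country_mentions(segment_names) -> Counter:
    """Count segment names mentioning each country (at most once per name)."""
    counts: Counter = Counter()
    for name in segment_names:
        for country, pattern in COUNTRY_PATTERNS.items():
            if pattern.search(name):
                counts[country] += 1
    return counts
```

Short codes such as “ES” or “DE” are the main source of errors in this approach, which is why the sketch refuses to match them inside longer words; even so, any real count would need to be checked by hand.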
Data on almost every European citizen
Our analysis of the file shows that within the European Union, the countries mentioned most often are France and Spain, each with around 9,000 segments. They are followed by Germany (about 6,000), Portugal (about 4,500), Italy (about 3,500), the Netherlands (about 3,000), Sweden (about 1,500) and Denmark (about 1,000).
The file does not say how many IDs each segment contains. However, it is known that a single segment can contain hundreds of thousands or even millions of different IDs. Oracle alone, the largest provider in the Xandr list with more than 200,000 segments, claims to have data on more than five billion people. It is therefore reasonable to assume that the adtech industry holds data on most citizens of the European Union.
It is important to note that segments typically do not contain people’s names, but individual IDs linked to people’s devices. These could be mobile ad IDs, IP addresses, browser fingerprints or cookie IDs. Adtech companies stress that this means they are working with pseudonymized data, sometimes even falsely claiming that their data is “anonymized”. Nevertheless, the IDs allow ad companies to recognize the devices of people with certain characteristics anywhere in the online advertising ecosystem.
Advertisers can use the segments through so-called demand-side platforms to target the audiences they want, but they don’t usually get access to the raw data. That’s why many companies in the industry reject the term “data broker” to describe their business. They prefer to call themselves technology platforms, advertising infrastructure service providers or location intelligence platforms. But these companies take data from disparate sources, reorganize and repackage it, help track and reach people across different devices, and offer it to other companies in exchange for money or other economic benefits. That’s why we chose to call these companies data brokers.
Time and again, it also comes to light that some of these companies do sell raw data after all, for example to the FBI or to US Immigration and Customs Enforcement (ICE).
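Why pseudonymous IDs still single people out can be illustrated with a toy model. In the Python sketch below, a hypothetical data broker stores, for each segment ID, a set of device IDs and no names at all. A buyer who never sees the list can still ask whether a given device belongs to a segment, and the same device ID gets the same answer wherever it appears. `ToySegmentStore` and the device ID format are invented for illustration; this is not how Xandr or any real platform is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToySegmentStore:
    """Hypothetical broker store: segment ID -> pseudonymous device IDs.

    The device IDs stand in for mobile ad IDs, cookie IDs and the like.
    No names are stored, yet a device is recognized wherever its ID appears.
    """
    _members: dict[int, set[str]] = field(default_factory=dict)

    def add(self, segment_id: int, device_id: str) -> None:
        # Put a device into a segment (creating the segment if needed).
        self._members.setdefault(segment_id, set()).add(device_id)

    def matches(self, device_id: str, segment_id: int) -> bool:
        # In this toy model, a buyer only gets a yes/no answer, never the raw list.
        return device_id in self._members.get(segment_id, set())
```

For instance, after `store.add(16237498, "ad-id-1234")`, every later check of `"ad-id-1234"` against that segment ID returns `True`, regardless of which website or app the device is currently using. The buyer never handles raw data, yet the person behind the device is still targeted on the basis of a sensitive characteristic.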
Mercedes, mothers and military
It is difficult to understand the file in its entirety. This is not only due to the size of the segment collection, but also because the category names are structured very differently from one data vendor to another. The list also contains some segments created specifically for individual advertisers. Over 50,000 segments are labelled “custom”. According to the Xandr documentation, these are segments that are not available to all advertisers; instead, the provider unlocks them only for specific clients.
Despite this complexity, The Markup has performed a data analysis that shows at least a rough frequency distribution of some higher-level categories. According to this analysis, segments related to the automotive sector form the largest group. Advertisers can use Xandr’s data to target, for example, fans or owners of a particular make of car, or people whose household has more than two cars and who drive more than 32,000 kilometers a year. More than 1,000 segments can be found for the keyword “Mercedes” alone.
The second largest group is demographics. Advertisers can select not only by gender or age, but also, for example, parents of teenagers, single mothers with small children or people who are about to get divorced. Lifestyle information is often included, such as “conservative retirees”, “urban elites” or even “multicultural families”. Mothers seem to be a particularly interesting group; there are segments for “soccer moms”, “big city moms”, “busy moms” or even “moms who shop like crazy”.
According to the rough analysis, the third largest group of segments is based on people’s profession or industry. Segments then have names like “beauty centre owner”, “lawyer” or “politician”. This category can also refer to employees of specific companies, such as “Aldi competitor” or “Volvo SUV competitor”. Members of the military and police, journalists, lawmakers and politicians can also be targeted.
Cancer, depression and eating disorders
Hundreds of segment labels point to highly sensitive data such as health information. Advertisers can choose from categories such as breast cancer, bladder cancer and depression. Many segments also refer to reproductive health, period tracking, menopause or heavy buyers of pregnancy test kits. Some segment names even refer to visitors to individual clinics. Here are some examples from the US supplier LiveRamp:
- LiveRamp Data Store | 8082 | 16237485 | HealthRankings > BPD
- LiveRamp Data Store | 8082 | 16237395 | HealthRankings > BPH
- LiveRamp Data Store | 8082 | 16237478 | HealthRankings > Breast Cancer
- LiveRamp Data Store | 8082 | 24900788 | HealthRankings > Breast Cancer Caregivers
- LiveRamp Data Store | 8082 | 16237416 | HealthRankings > Cholesterol
- LiveRamp Data Store | 8082 | 16237450 | HealthRankings > Cough/Cold
- LiveRamp Data Store | 8082 | 16237432 | HealthRankings > Diabetes
- LiveRamp Data Store | 8082 | 16237508 | HealthRankings > Diabetes Type II
- LiveRamp Data Store | 8082 | 16237498 | HealthRankings > Eating Disorder
In addition to health-related segments, there are many segments that refer to religion, such as “Muslim” or “Jewish”, as well as segments that refer to people’s sexual orientation or their origin and ethnicity. The list also includes political issues: Who is for and who is against Donald Trump? Who is for or against Black Lives Matter, and who is against abortion rights?
According to tracking expert Wolfie Christl, targeting not only influences how we perceive the world and ourselves; it is also used to exploit people’s vulnerabilities, as he demonstrated in a 2022 study on the targeting of gambling addicts. In the Xandr file, we found many segments referring to gambling, as well as segments targeting people who are “always getting a raw deal out of life”, who are considered “fragile seniors”, who fall under the label “opiate addiction” or who want to consume less tobacco, fast food or alcohol.
LGBT in Spain, multicultural families in Sweden
Some of the sensitive segments have a clear link to the US. Even people in the vicinity of military bases or visitors to certain election campaign events appear to be targeted there. However, many critical segments do not have a clear country of origin. We asked Xandr and Microsoft if they could guarantee that the IDs of EU citizens would not be included. We did not receive an answer.
There are some sensitive segments with a clear reference to EU countries in their names. These include segments with information about casino visits, sports betting habits or even gambling addiction. Also, many segments related to low income, poverty, pregnancy, or interest in loss-making or speculative financial products have an explicit reference to EU countries. We found several EU segments referring to minors under the age of 16.
For Germany, we found several segments referring to health issues such as sleep disorders. We also found segments referring to strong believers in Christianity in Portugal, “multicultural families” in Sweden and “LGBT” (lesbian, gay, bisexual and transgender) people in Spain. Some of the companies we confronted with the findings answered that certain segments were no longer offered.
93 data providers with hundreds of sources
There’s another aspect tying the data to the European Union: according to the file, European companies are part of the network of data brokers that buy, refine and distribute segment data.
In total, 93 companies are listed as data providers, meaning that they have apparently offered their audience data for targeted advertising via Xandr. The segment names often include information about where these 93 data brokers obtained their data. Sources range from advice websites, weather apps and credit card companies such as Mastercard to other data brokers and market research companies, amounting to hundreds of data sources.
Most of the 93 data providers offering their data on Xandr are based in the US, but we were also able to identify several European companies. The largest of these is the formerly Dutch and now London-based market research giant Nielsen. Its adtech division, Nielsen Marketing Cloud, is listed with more than 65,000 segments in the file, making it the third largest data provider after the US companies Oracle and LiveRamp.
Data brokers from Germany, France, Italy, Spain, Denmark and the Netherlands
Adsquare, based in Berlin, is another large data provider. It is listed with more than 15,000 segments in the file. Six other German data brokers are listed in the file: DataXTrade; Emetriq, a company owned by the German telecom giant Deutsche Telekom; Roq.ad; Semasio; Zeotap; and The ADEX, which is owned by the media company ProSiebenSat.1. Together they offer more than 5,000 segments, not only on German citizens but also on people in other European countries and the US.
There are at least four major data brokers from France in the file, together offering more than 4,500 segments. There is also a data provider called Orange Private Data Marketplace with 2,215 segments, which seems to be linked to the French telecom giant Orange.
The list also includes two data brokers based in the Netherlands, GroupM NL and Greenhouse Group B.V. According to the file, together they had more than 2,100 segments on Xandr. The Italian company Audiens S.R.L. is listed with more than 1,300 segments and the Spanish company DatMean with more than 600.
A Danish company, Digiseg, is also listed in the file with about 400 segments. Audienzz is a Swiss data broker owned by the Neue Zürcher Zeitung newspaper and is listed with 29 segments.
The above list represents the status quo in May 2021, the date of the file. We cannot say anything about the current situation.
How the adtech system works
We asked several civil society experts to comment on our findings.
“The investigation is another strong signal confirming the problematic nature of the current online advertising ecosystem,” says Dorota Glowacka of the Polish digital rights NGO Fundacja Panoptykon. “In our opinion, such a model can easily lead to the exploitation of users’ vulnerability for advertising purposes. This may not only lead to excessive shopping, but also, as we already know, influence our political choices or contribute to mental health problems.” In 2021, Panoptykon published a study showing how targeting based on health information can fuel serious anxiety disorders.
Jan Penfrat of European Digital Rights agrees: “The industry is sorting us all into data categories to sell our attention and screens to the highest bidder. Worse, the surveillance ad industry, including EU-based companies, provides a system that allows all kinds of actors to target and manipulate people, and to discriminate against marginalized people.” Penfrat points out that surveillance ads are suspected of having influenced the Brexit vote as well as numerous democratic elections over the years. A detailed report by EDRi explains the damage surveillance ads can do to people every day.
“The personal data are so intimate, and they are shared so widely, and with so little care, the harm is potentially enormous,” says Johnny Ryan of the Irish Council for Civil Liberties (ICCL). “By exposing everyone in Europe to continuous profiling by virtually any company, the industry is putting Europe’s security, political stability, and economy at risk.”
We asked Ryan, himself a former adtech executive, what role European companies play in global ad surveillance. His answer: “As far as I can see, EU companies are fully integrated into the industry.” However, Ryan adds that the industry’s irresponsible rules have been set in the United States. “Europeans are standard takers rather than standard makers. The result is that the industry has no respect for European values.” Ryan, who has sued major players in the industry, points to a lawsuit the ICCL is currently pursuing in Hamburg against the industry organization IAB Tech Lab.
For our German reporting, several data protection experts told us that data collection of this enormous scale and complexity can hardly comply with the European General Data Protection Regulation (GDPR). Among others, the head of Berlin’s data protection authority, Meike Kamp, said that it is almost impossible for people to understand the implications of giving their consent to this kind of data processing, making it unlikely to meet the GDPR requirements of informed and freely given consent.
To improve the situation, Dorota Glowacka, Jan Penfrat and Johnny Ryan agree that political action is needed. In the words of Johnny Ryan: “First, enforcement has failed at the national level. The enforcement failure in Ireland is particularly dangerous, because Ireland is responsible for supervising Google, Meta, Microsoft, and others. Second, the European Commission has not put pressure on European Member States to correct this.”
Jan Penfrat from EDRi adds: “After years of data protection enforcement, we know pretty well that obtaining valid consent for surveillance ads is near impossible. Rather than requiring civil society and data protection agencies to sue and fine every single infringing data broker, the next EU Commission should propose a ban of surveillance ads in Europe.”
With the cooperation of Johannes Gille.