Chandhiramowuli tells me of one case where a data annotator in India had to differentiate between images of soda bottles and pick out ones that looked like Dr. Pepper. But Dr. Pepper is not a product that is sold in India, and the onus was on the data annotator to figure it out.
The expectation is that annotators figure out the values that are important to the company, says Chandhiramowuli. “They’re not just learning these distant faraway things that are absolutely meaningless to them—they’re also figuring out not only what those other contexts are, but what the priorities of the system they’re building are,” she says.
In fact, we are all data laborers for big technology companies, whether we are aware of it or not, argue researchers at the University of California, Berkeley, the University of California, Davis, the University of Minnesota, and Northwestern University in a new paper presented at FAccT.
Text and image AI models are trained on huge data sets scraped from the internet. These data sets include our personal data and copyrighted works by artists, and the data we have created is now forever part of an AI model built to make a company money. We unwittingly contribute our labor for free by uploading our photos to public sites, upvoting comments on Reddit, labeling images on reCAPTCHA, or performing online searches.
At the moment, the balance of power is heavily skewed in favor of some of the biggest technology companies in the world.
To change that, we need nothing short of a data revolution and regulation. The researchers argue that one way people can take back control of their online existence is by advocating for transparency about how data is used and by coming up with ways to give people the right to offer feedback and share in the revenues from the use of their data.
Even though this data labor forms the backbone of modern AI, data work remains chronically underappreciated and invisible around the world, and wages remain low for annotators.
“There is absolutely no recognition of what the contribution of data work is,” says Chandhiramowuli.