This startup enables rural India to fight poverty via AI-based work for Microsoft, Google and others
Swarnalata is a resident of Raghurajpur, a heritage crafts village in Odisha’s Puri district. While most of the 500 master Pattachitra painters, or chitrakars, in the village make less than $12 a month, Swarnalata has found an opportunity to earn $60 a week! But how?
Swarnalata works with Karya, a not-for-profit enterprise, which enlists rural Indians to create quality data sets for Microsoft, Google and the who’s who of the Generative AI world to train Large Language Models (LLMs)—the human-like brains behind chatbots like ChatGPT.
For AI data workers, like Swarnalata, who are spread across 100 districts in 22 states and perform “simple dataset generation tasks”, Karya offers $5 per hour—20 times more than India’s minimum wage.
Since its inception in 2021, Karya claims to have facilitated payouts to over 30,000 rural Indians for completing 40 million paid digital tasks—capturing, labelling and annotating data for AI training across speech, text, images and videos in 12 Indian languages, including English.
What’s more? Karya, which calls itself a “data cooperative”, enables its workers to earn royalties every time their data is sold, providing a source of supplemental or passive income for rural India.
A fairly simple proposition? It wasn’t so until an Indian engineering graduate from Stanford University—Manu Chopra—decided to flip the global AI data generation model on its head by setting up Karya. The inspiration behind the idea has roots in Chopra’s brush with poverty.
The Inception Story
“I grew up in a basti in Delhi. I saw poverty first-hand. The issue is so close to my heart that it is almost existential. That’s the only thing I feel is worth for me to work on,” remarked the 27-year-old Manu Chopra in an interview with Nasscom’s president Debjani Ghosh.
During his time as a computer science engineering student at Stanford, the idea of ‘tech-for-good’ inspired Chopra. He returned to India in 2017 and joined Microsoft Research as a fellow to explore ways to tackle extreme poverty by giving the poor access to “dignified, digital work”.
Chopra travelled the length and breadth of the country and realised that “millions of rural Indians have access to a smartphone and a bank account. Yet rural India remains among the world’s poorest areas”.
“India is a graveyard of skilling non-profits. So many people have tried, but it is so hard to skill people,” Chopra said in the interview with Nasscom.
“The idea in my head was, what if we could bypass skilling? Can we give people a livelihood and money for skills they already have? What is the skill that rural India already has? Their language,” he explained.
At the time Karya started working, the value of Indian languages was on the rise. Each year Microsoft, Google and others were spending billions of dollars collecting training data for their artificial intelligence models.
“This could include, ‘Oh, I want Siri to be better with Indian accents, so Siri needs to understand and receive millions of hours of Indian English data,’” explained Chopra.
Enterprises wanted to train their models in local Indian languages so that they could better serve the world’s largest consumer market with more than 1.4 billion people.
Chopra found the product-market fit and decided to use Indian languages to unlock great economic value for rural India. He joined hands with his manager at Microsoft Research, Vivek Sheshadri, who became the Chief Technology Officer. Safiya Husain, who had created several United Nations programs, among other work, joined in early 2022 as the third co-founder and Chief Impact Officer of Karya.
Karya: The not-for-profit enterprise
“The reason we decided to operate as a non-profit entity is partially because the market rates in the sector are horrible. Even though we are paying $5 an hour, the highest wage at for-profit companies would be ₹60 an hour,” said Chopra.
“We are solving for a market failure. The horrible wages data workers make in India is a market failure,” he added.
Karya’s website explains that currently most dataset generation work goes to urban communities or is outsourced to Kenya or the Philippines. In both cases, workers are often exploited, overworked and offered sub-minimum wages. Median wages are estimated at $0.1-0.5 per hour, while datasets sell for over 200 times this price.
Karya shares most of the proceeds from the sale of datasets with its rural workers and covers additional costs through philanthropic grants from Microsoft, the Bill & Melinda Gates Foundation and others.
Karya wants to enable each of its rural participants to earn $1,500 within a year through flexible, digital work opportunities. In India today, it can take an average low-income Indian seven generations to reach this benchmark of the lower-middle class, going by data from the World Economic Forum (2020).
By 2030, Karya wants to move 100 million rural Indians out of poverty.