Presented at: 11th Language and Development Conference, New Delhi, India, November 18–20, 2015
Abstract
The consequence of linguistic digital exclusion is the inability of billions of people to access vital knowledge and economic resources that contribute to prosperity in an era of globalization. However, rectifying linguistic inequity is mostly absent from development discourse and the agendas of governments and agencies that undertake development activities. Most efforts to produce content for excluded languages depend on the haphazard occurrence of a commercial, academic, or programmatic purpose for an activity in a given language at a particular moment. The Kamusi Project seeks to address the digital linguistic divide by engaging communities in the systematic collection of codified data for any language – linguistic information that can be used in many kinds of advanced knowledge and technology resources. This paper explores assumptions about participants’ motivations and behaviors that underlie the project’s methods, including participation in online games and interactive mobile apps intended to elicit speakers’ knowledge of their own languages in ways that can be shared by others. While the Kamusi system aims to welcome all, disparities may continue to exclude those without substantial time, network access, equipment, digital experience, or literacy, leaving international members of a diasporic language group as its most active contributors. Further, smaller and more remote languages have, by definition, fewer potential participants and less access for participation, thus perpetuating their inability to jump the digital divide. Without external support for the time and effort necessary to gather linguistic knowledge, even the most carefully constructed tools will fail for thousands of languages spoken by millions of people, including many languages near extinction. 
This paper raises, without definitively resolving, the social challenges of a multilingual digital infrastructure platform that has the technical capacity to document every word in every language, but can only approach accomplishing this objective through the involvement of those who have the least access to taking part.
These are the languages for which we have datasets that we are actively working toward putting online. Languages that are Active for you to search are marked with "A" in the list below.
• A = Active language, aligned and searchable
• c = Data 🔢 elicited through the Comparative African Word List
• d = Data from independent sources that Kamusi participants align playing 🐥📊 DUCKS
• e = Data from the 🎮 games you can play on 😂🌎🤖 EmojiWorldBot
• P = Pending language, data in queue for alignment
• w = Data from 🔠🕸 WordNet teams
We are actively creating new software for you to make use of and contribute to the 🎓 knowledge we are bringing together. Learn about software that is ready for you to download or in development, and the unique data systems we are putting in place for advanced language learning and technology.
Our biggest struggle is keeping Kamusi online and keeping it free. We cannot charge money for our services because that would block access for the very people we most want to benefit: the students and speakers of languages around the world that are almost always excluded from information technology. So we ask, request, beseech, beg you to please support our work by donating as generously as you can to help build and maintain this unique public resource.
Answers to general questions you might have about Kamusi services.
We are building this page around real questions from members of the Kamusi community. Send us a question that you think will help other visitors to the site, and we will often post the answer here.