For most languages, digitized linguistic data of the caliber needed for learning or technology does not exist. This is not because such data cannot be collected, but because nobody has taken the initiative to do so. Forty years ago, none of this data existed in usable form for English either, nor had its uses even been imagined. People simply put in the effort to assemble resources for the language with the biggest immediate payoff.
Until twenty years ago, Bogota, Colombia, was snarled in traffic, like many cities in the world. Then the city made a plan, built a network of special bus routes, and greatly reduced its transport problems. Nairobi never developed a twenty-year plan and lives in perpetual gridlock. Crafting the infrastructure for an excluded language needs long-term planning, just as crafting a fine whisky needs decades of patience.
As with effective mass transit, or starting a high-end distillery, developing high-quality linguistic resources for a language within the Kamusi system is more a matter of commitment than technology. We know the tools and we have the techniques. For example, rather than reinventing speech recognition technology in order for computers to recognize the spoken words of Fula, all that is needed is to collect the data about the language's terms and sounds, and then train existing technology using that data. Seen in a purely technical light, any language can join technology at the cutting edge, as long as good data can be collected in a compatible format. Creating infrastructure for one language or 7000 is a matter of planning and of commitment to producing the data over the course of years. To help develop or fund the project with an eye to the long term, please contact us!
These are the languages for which we have datasets that we are actively working toward putting online. Languages that are Active for you to search are marked with "A" in the list below.
•A = Active language, aligned and searchable
•c = Data elicited through the Comparative African Word List
•d = Data from independent sources that Kamusi participants align by playing DUCKS
•e = Data from the games you can play on EmojiWorldBot
•P = Pending language, data in queue for alignment
•w = Data from WordNet teams
We are actively creating new software that lets you use and contribute to the knowledge we are bringing together. Learn about software that is ready to download or in development, and about the unique data systems we are putting in place for advanced language learning and technology:
Our biggest struggle is keeping Kamusi online and keeping it free. We cannot charge money for our services because that would block access for the very people we most want to benefit: the students and speakers of languages around the world that are almost always excluded from information technology. So we ask, request, beseech, and beg you to please support our work by donating as generously as you can to help build and maintain this unique public resource.
Answers to general questions you might have about Kamusi services.
We are building this page around real questions from members of the Kamusi community. Send us a question that you think will help other visitors to the site, and we will often post the answer here.