Spoken languages begin with 🔊 sound. Hearing children learn language first by listening and speaking; ✍ writing is a later addition for individuals and societies. For interaction between humans and computers, machines must be able to decode utterances as 📃 text, and convert text to 🔊 auditory signals that make sense to the 👪 people hearing them. Much of the groundwork has been laid for English and a few other languages, but acoustic information for most people has yet to be captured in ways that can be used by technology. Computers at Bell Labs could recognize the spoken English words 1 through 10 in 1952. In the subsequent 64 years, that basic trick has been repeated for only about ten languages in all of Africa.
Our model provides space to record natural 🗣🔊 speech sounds and match them to shapes (wind [twist] / whined / wined) and places: not just "big", but "bigger" and "biggest"; not just Parisian French, but Vaudoise and Quebecoise and Ivorian. By collecting this 🔢 data in a dictionary linked by meaning, we can envision the day when a person speaking, say, a local variety of Swiss German could have their words recognized, translated, and output as comprehensible speech in a regional dialect of Cantonese. The processes necessary to do the acoustic modeling for any language are well established, but the digitized data for most languages is nonexistent. Within the Kamusi data framework, working with partners in the Human Languages Project, and where we can find sponsorship, we can gather and deploy the sounds needed for advanced voice technologies for any language.
These are the languages for which we have datasets that we are actively working toward putting online. Languages that are Active for you to search are marked with "A" in the list below.
• A = Active language, aligned and searchable
• c = Data 🔢 elicited through the Comparative African Word List
• d = Data from independent sources that Kamusi participants align playing 🐥📊 DUCKS
• e = Data from the 🎮 games you can play on 😂🌎🤖 EmojiWorldBot
• P = Pending language, data in queue for alignment
• w = Data from 🔠🕸 WordNet teams
We are actively creating new software for you to make use of and contribute to the 🎓 knowledge we are bringing together. Learn about software that is ready for you to download or in development, and the unique data systems we are putting in place for advanced language learning and technology:
Our biggest struggle is keeping Kamusi online and keeping it free. We cannot charge money for our services because that would block access to the very people we most want to benefit: the students and speakers of languages around the world that are almost always excluded from information technology. So we ask, request, beseech, and beg you to please support our work by donating as generously as you can to help build and maintain this unique public resource.
Answers to general questions you might have about Kamusi services.
We are building this page around real questions from members of the Kamusi community. Send us a question that you think will help other visitors to the site, and we will often post the answer here.