Words have "canonical" forms (lemmas), the ones usually shown in dictionaries. Other forms can be easy to include in an entry, such as plurals, gender variations, or the five shapes an English verb can take (see, sees, saw, seen, seeing). Even simple shapes can be complicated (spelled or spelt?), but our flexible architecture makes it easy to catalogue variations and show where they are used. Multiple alphabets, or spellings with and without tones, are charted as 🔢 data elements within an entry.
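As a rough illustration of the idea above, an entry could hold a lemma plus a list of catalogued forms, each tagged with where it is used. This is only a minimal sketch under stated assumptions: the names `Form`, `Entry`, and `build_index`, the field layout, and the regional labels are hypothetical and are not taken from the actual Kamusi data model.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """One written shape of a word, with a note on its role and usage."""
    spelling: str
    role: str                      # e.g. "past tense", "plural"
    regions: tuple[str, ...] = ()  # empty tuple means no regional restriction


@dataclass
class Entry:
    """A dictionary entry: the lemma plus every catalogued form."""
    lemma: str
    forms: list[Form] = field(default_factory=list)

    def spellings(self) -> set[str]:
        """All spellings that should lead a reader to this entry."""
        return {self.lemma} | {f.spelling for f in self.forms}


def build_index(entries: list[Entry]) -> dict[str, list[Entry]]:
    """Map every spelling to the entries it can belong to."""
    index: dict[str, list[Entry]] = {}
    for entry in entries:
        for spelling in entry.spellings():
            index.setdefault(spelling, []).append(entry)
    return index


see = Entry("see", [
    Form("sees", "third person singular"),
    Form("saw", "past tense"),
    Form("seen", "past participle"),
    Form("seeing", "present participle"),
])
# Regional labels below are illustrative assumptions, not curated data.
spell = Entry("spell", [
    Form("spelled", "past tense", regions=("US", "UK")),
    Form("spelt", "past tense", regions=("UK",)),
])

index = build_index([see, spell])
print([e.lemma for e in index["spelt"]])  # ['spell']
```

Keeping each form as its own record, instead of a bare string, leaves room for usage notes, alternate scripts, or tone-marked spellings to be attached to the form they describe.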
More difficult are variations caused by "agglutination", where prefixes, infixes, suffixes, or entire words can be glued together, often causing further changes internally. We have a parser for Swahili that filters words through all 300-odd grammatical rules, revealing the underlying terms. We hope to develop similar routines for other languages as student projects at partner universities, and expect to apply existing tools to parse German when we have dedicated resources. While agglutinative terms are complex, they necessarily adhere to a finite set of rules that speakers 👪🔊, and therefore computers, can apply in each new construction.
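The idea that a finite set of rules can unpack an agglutinated word can be sketched with a toy example. The code below is not the Kamusi Swahili parser and covers only a handful of prefixes, nowhere near the 300-odd rules mentioned above. The tables and the function name `parse_verb` are hypothetical, chosen purely to show affixes being peeled off until a known root remains.

```python
# Toy rule tables: a tiny illustrative subset, not a real rule set.
SUBJECT_PREFIXES = {"ni": "I", "u": "you", "a": "he/she", "tu": "we", "wa": "they"}
TENSE_MARKERS = {"na": "present", "li": "past", "ta": "future"}
VERB_ROOTS = {"soma": "read", "penda": "like", "lala": "sleep"}


def parse_verb(word):
    """Return every subject + tense + root analysis the toy rules allow."""
    analyses = []
    for subject, subject_gloss in SUBJECT_PREFIXES.items():
        if not word.startswith(subject):
            continue
        rest = word[len(subject):]
        for tense, tense_gloss in TENSE_MARKERS.items():
            if not rest.startswith(tense):
                continue
            root = rest[len(tense):]
            # Accept the analysis only if what remains is a known root.
            if root in VERB_ROOTS:
                analyses.append({
                    "subject": subject_gloss,
                    "tense": tense_gloss,
                    "root": root,
                    "gloss": VERB_ROOTS[root],
                })
    return analyses


print(parse_verb("ninasoma"))
print(parse_verb("walipenda"))
```

Returning a list of analyses, not a single answer, reflects that a surface form can in principle match more than one rule path; a fuller parser would also need object markers, negation, verb extensions, and the internal changes that gluing pieces together can cause.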
The final objective is to figure out the components of any 📃 text that a user searches for, whether it is stored in the database as a form of a single word, a known party term, or an on-the-fly compound that would never appear in a standard catalogue of terms.
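The three cases in this objective suggest a lookup that tries each possibility in turn. The sketch below is one hedged way to picture that order, not a description of how Kamusi search is implemented. The tables, the function name `resolve`, and the two-part compound splitter are all hypothetical, and a party term is assumed here to be stored as a whole multiword string.

```python
# Hypothetical miniature tables standing in for a real database.
STORED_FORMS = {"saw": "see", "seen": "see", "house": "house", "boat": "boat"}
PARTY_TERMS = {"kick the bucket"}


def resolve(query):
    """Classify a query as a stored form, a party term, or a compound."""
    # Normalise case and collapse repeated whitespace.
    q = " ".join(query.lower().split())

    # 1. A form of a single word already stored in the database.
    if q in STORED_FORMS:
        return ("form", [STORED_FORMS[q]])

    # 2. A known party term, stored as a whole.
    if q in PARTY_TERMS:
        return ("party term", [q])

    # 3. An on-the-fly compound: try every split into two known words.
    for i in range(1, len(q)):
        left, right = q[:i], q[i:]
        if left in STORED_FORMS and right in STORED_FORMS:
            return ("compound", [STORED_FORMS[left], STORED_FORMS[right]])

    return ("unknown", [])


print(resolve("Seen"))             # ('form', ['see'])
print(resolve("kick the bucket"))  # ('party term', ['kick the bucket'])
print(resolve("houseboat"))        # ('compound', ['house', 'boat'])
```

Checking stored forms and party terms before attempting a split keeps established terms from being broken apart by accident; a real system would need language-specific rules for the third step.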
These are the languages for which we have datasets that we are actively working to put online. Languages that are already active and searchable are marked with "A" in the list below.
•A = Active language, aligned and searchable
•c = Data 🔢 elicited through the Comparative African Word List
•d = Data from independent sources that Kamusi participants align playing 🐥📊 DUCKS
•e = Data from the 🎮 games you can play on 😂🌎🤖 EmojiWorldBot
•P = Pending language, data in queue for alignment
•w = Data from 🔠🕸 WordNet teams
We are actively creating new software for you to make use of and contribute to the 🎓 knowledge we are bringing together. Learn about software that is ready for you to download or in development, and the unique data systems we are putting in place for advanced language learning and technology:
Our biggest struggle is keeping Kamusi online and keeping it free. We cannot charge money for our services because that would block access to the very people we most want to benefit: the students and speakers of languages around the world that are almost always excluded from information technology. So we ask, request, beseech, and beg you to please support our work by donating as generously as you can to help build and maintain this unique public resource.
Answers to general questions you might have about Kamusi services.
We are building this page around real questions from members of the Kamusi community. Send us a question that you think will help other visitors to the site, and we will often post the answer here.