Universal Concept Identifier

Data science is built on the ability to identify items precisely, using numbers. Books, for example, carry an ISBN so that particular editions can be found in bookstores and libraries worldwide. A great challenge for informatics is ascertaining when things are the same across systems - whether the goods leaving the supplier are the same as the goods arriving at the warehouse. Sometimes pieces of information can link databases effectively; "July 19, 2010" will always refer to the same day in history, even if systems render it variously as "19/7/10" or "19 juillet 2010". Words, however, are clouds that do not have meanings fixed in a standard system. A 🐕 dog could be defined as "a mammal, Canis lupus familiaris, that has been domesticated for thousands of years", or as "a domestic mammal, related to wolves and foxes, that is often kept as a pet", making it impossible to automatically tie together information in different systems that try to reference the same concept.

The problem is further compounded when expressing the same basic 💭 concept in different languages, since the shapes of both the terms and the words used to define them are inherently different computationally, even if the idea is identical.
For example, 👂 is written in English as "ear", but the only fact the computer knows is the binary code for e-a-r, "01100101 01100001 01110010". In Romanian, 👂 is written as "ureche", but the computer sees "01110101 01110010 01100101 01100011 01101000 01100101". For the computer to know that two terms are equivalent, that 👂 in one language = 👂 in another, we need a set of digits that is the same for any term that expresses the same thought.
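The byte-level view described above can be reproduced in a few lines. This is a minimal sketch that assumes the terms are stored as UTF-8; the helper name to_bits is invented for illustration.

```python
def to_bits(term: str) -> str:
    """Return the UTF-8 bytes of a term as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in term.encode("utf-8"))


english = to_bits("ear")
romanian = to_bits("ureche")

print(english)   # 01100101 01100001 01110010
print(romanian)  # 01110101 01110010 01100101 01100011 01101000 01100101

# The bit patterns differ, so comparing the stored bytes cannot reveal
# that both terms name the same concept.
print(english == romanian)  # False
```

Nothing in either bit string points to the other, which is why a shared identifier has to be attached to the terms from outside.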

Kamusi is implementing a Universal Concept Identifier (UCI), a single number that can be assigned to a given idea. Any term that matches that idea - 🐘, éléphant in French, ndovu in Swahili, ゾウ in Japanese - is joined to that ID. Using our differentiator, we can give similar things different numbers where appropriate, yet still show their close ontological relationships - a freeway and a thruway get different numbers, but both are linked through our graph architecture to the idea of limited-access auto routes, and to the German translation "Autobahn". We will use the json "synset" numbers WordNet has established as a starting point, and integrate with the IDs created in the limited expanded vocabulary set proposed for CILI, the Collaborative Interlingual Index.
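One way to picture the idea is a lookup table from terms to concept numbers, plus a small graph of links between related concepts. The sketch below is illustrative only and is not Kamusi's actual data model: the ID values, the dictionary names, and the choice to attach the German term to the broader concept are all assumptions made for the example.

```python
# Hypothetical concept IDs, invented for illustration.
# They are not real UCI, WordNet, or CILI values.
ELEPHANT = 1001
LIMITED_ACCESS_ROAD = 2000
FREEWAY = 2001
THRUWAY = 2002

# (language code, term) -> concept ID
term_to_concept = {
    ("en", "elephant"): ELEPHANT,
    ("fr", "éléphant"): ELEPHANT,
    ("sw", "ndovu"): ELEPHANT,
    ("ja", "ゾウ"): ELEPHANT,
    ("en", "freeway"): FREEWAY,
    ("en", "thruway"): THRUWAY,
    # Assumption: the German term is attached to the broader concept.
    ("de", "Autobahn"): LIMITED_ACCESS_ROAD,
}

# Graph edges from a concept to its broader concepts.
broader = {
    FREEWAY: {LIMITED_ACCESS_ROAD},
    THRUWAY: {LIMITED_ACCESS_ROAD},
}


def same_concept(a, b):
    """True when both terms are known and carry the same concept ID."""
    id_a = term_to_concept.get(a)
    return id_a is not None and id_a == term_to_concept.get(b)


def closely_related(a, b):
    """True when the terms share a concept or a broader concept."""
    id_a, id_b = term_to_concept.get(a), term_to_concept.get(b)
    if id_a is None or id_b is None:
        return False
    near_a = {id_a} | broader.get(id_a, set())
    near_b = {id_b} | broader.get(id_b, set())
    return bool(near_a & near_b)


print(same_concept(("fr", "éléphant"), ("sw", "ndovu")))      # True
print(same_concept(("en", "freeway"), ("en", "thruway")))     # False
print(closely_related(("en", "freeway"), ("en", "thruway")))  # True
```

Matching happens on the number rather than on spelling, so terms in any language or script can be joined to the same idea, while near-synonyms keep separate numbers and stay connected through the graph.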

UCIs will be extended through Kamusi to cover millions of additional concepts that fall outside the 100,000 concepts identified by WordNet. These include words other than nouns, verbs, adverbs, and adjectives; untreated forms such as gerunds; 650,000 additional concepts gleaned from Wiktionary; millions of named entities from the Joint Research Centre; 1.6 million species from the Catalogue of Life; 8 million domain-specific terms in 25 languages from IATE, as well as term sets from additional sources; and items indigenous to languages other than English. The UCI will thus be available to codify open data across numerous languages, projects, and data systems, with the intent that the world's linguistic data can play together in ways that are not currently possible when links can only be inferred by guesswork based on spelling.

/info/ucid

Kamusi GOLD

These are the languages for which we have datasets that we are actively working toward putting online. Languages that are Active for you to search are marked with "A" in the list below.

Key

•A = Active language, aligned and searchable
•c = Data 🔢 elicited through the Comparative African Word List
•d = Data from independent sources that Kamusi participants align playing 🐥📊 DUCKS
•e = Data from the 🎮 games you can play on 😂🌎🤖 EmojiWorldBot
•P = Pending language, data in queue for alignment
•w = Data from 🔠🕸 WordNet teams

Software and Systems

We are actively creating new software for you to make use of and contribute to the 🎓 knowledge we are bringing together. Learn about software that is ready for you to download or in development, and the unique data systems we are putting in place for advanced language learning and technology:

Articles and Information

Kamusi has many elements. With these articles, you can read the details that interest you:

Videos and Slideshows

Some of what you need to know about Kamusi can best be understood visually. Our 📽 videos are not professional, but we hope you find them useful:

Partners

Our partners - past, present, and future - include:

Hack Kamusi

Here are some of the work elements on our task list that you can help do or fund:

Theory of Kamusi

Select a link below to learn about the principles that guide the project's unique approach to lexicography and public service.

Contact Us

We welcome your comments and questions, and will try to respond quickly. To get in touch, please visit our contact page. You must use a real email address if you want to get a real reply!

kamusigold.org/info/contact

Copyright

The Kamusi Project dictionaries and the Kamusi Project databases are intellectual property protected by international copyright law, ©2007 through ©2016, under the joint ownership of Kamusi Project International and Kamusi Project USA. Further explanation may be found on our © Copyright page.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Commentary

Discussion items about language, technology, and society, from the Kamusi editor and others. This box is growing. To help develop or fund the project, please contact us!

Our biggest struggle is keeping Kamusi online and keeping it free. We cannot charge money for our services because that would block access to the very people we most want to benefit, the students and speakers of languages around the world that are almost always excluded from information technology. So, we ask, request, beseech, beg you, to please support our work by donating as generously as you can to help build and maintain this unique public resource.

/info/donate

Frequently Asked Questions

Answers to general questions you might have about Kamusi services.

We are building this page around real questions from members of the Kamusi community. Send us a question that you think will help other visitors to the site, and frequently we will place the answer here.

Try it: Ask a "FAQ"!

Press Coverage

Kamusi in the news: Reports by journalists and bloggers about our work in newspapers, television, radio, and online.

Sponsor Search: Who Do You Know?

To keep Kamusi growing as a "free" knowledge resource for the world's languages, we need major contributions from philanthropists and organizations. Do you have any connections with a generous person, corporation, foundation, or family office that might wish to make a long-term impact on educational outcomes and economic opportunity for speakers of excluded languages around the world? If you can help us reach out to a potential 💛😇 GOLD Angel, please contact us!