The Tip of the Iceberg

The dictionaries we provide through the search bar on KamusiGOLD are only a small part of the project. However, due to our infinitesimal budget, we cannot currently afford to open our data to the public in the ways we would like.

Take, for example, the word "infinitesimal" from the previous sentence. We offer you the opportunity to search for that word in English, and to find its equivalents in all the languages for which we have corresponding data. You can enter any word in the search bar, and we'll happily share much of what we know about it. What we cannot open up is a permanent address for that term, which is the starting point for you to link to it, dig into any rich information we have about it, improve and expand it, or use it as data in other applications.

The problem is that when terms such as "infinitesimal" are all given fixed URLs, they transform from data points into Big Data. Each entry becomes not just a web page, but as many web pages as the word has senses; while "light" is one page on Merriam-Webster.com, concept-specificity makes it 48 different pages on Kamusi. Each web page requires all of the ornamentation that makes it beautiful. Each entry has extended information that we either need to present on its web page, or provide links to within the code. Every term links to translations, ancestors, or other entries (e.g. infinity, finite, finish), with code that you don't see but your device does. The code should also contain a lengthy RDF description so the data can be pinpointed by other projects. While each word takes fairly infinitesimal resources when we serve it from our database, its associated web pages can be 🐘 elephantine. With millions of words, offering a web page for each demands a lot more resources than offering a simple query to our database.

Moreover, each static page is a target for the search engines that constantly crawl the web to index what's out there: not only Google and Bing, but also Baidu, Sogou, Yandex, and several others you've never heard of. This is really what overloaded our system and kept us offline for a year before we turned to today's query-only light-access solution. Most websites have a few dozen pages, or maybe a few hundred. A dictionary with a healthy 100,000 terms would have 100,000 web pages, which is already a lot of 🐘🐘 for the crawlers. As a dictionary of many languages that is attempting to provide you with every word ever known to be spoken, our data contains millions of 🐘🐘🐘🐘, with links from one to the next that are irresistible for the crawlers to follow. It takes a fraction of a second to send each 🐘 back to you, and when a dozen search engines hit at the same time, our server has to give them the same attention. Unlike human users, though, the robots do not pause. As soon as we give them one 🐘, they ask for the next. And the next. And the next, following the chains through millions of links. At one second per entry, 10 million entries would take a single search engine nearly 4 months to crawl, by which time it would need to start all over again. With each 🐘 exposed, we spent all our time telling the robots what we had, with no power left over to serve the data to actual people.
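The crawl arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the one-second-per-entry cost and the dozen simultaneous crawlers are the round figures from the paragraph above, not measurements from our servers.

```python
SECONDS_PER_DAY = 24 * 60 * 60
AVERAGE_DAYS_PER_MONTH = 30.44  # 365.25 days / 12 months


def crawl_days(entries: int, seconds_per_entry: float = 1.0) -> float:
    """Days one crawler needs to fetch every entry back to back, without pausing."""
    return entries * seconds_per_entry / SECONDS_PER_DAY


def total_server_seconds(entries: int, crawlers: int,
                         seconds_per_entry: float = 1.0) -> float:
    """Server time spent if several crawlers each fetch every entry once."""
    return entries * crawlers * seconds_per_entry


if __name__ == "__main__":
    days = crawl_days(10_000_000)
    print(f"{days:.1f} days, about {days / AVERAGE_DAYS_PER_MONTH:.1f} months")
    # prints: 115.7 days, about 3.8 months
    print(total_server_seconds(10_000_000, crawlers=12))
    # prints: 120000000.0
```

Ten million seconds is a little under 116 days for one crawler. A dozen crawlers doing the same thing would ask for twelve times as much server time, all of it competing with requests from people.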

We do have ways to solve the problem, but they involve money to pay our developers. Basically, we need to create a very limited index page for each entry that the robots can read quickly, and require authenticated login to get at the real pages with the rich data. Preferably, we will be able to afford a multi-server solution, with robots crawling on one machine while confirmed humans enjoy full data access on hardware that is not inundated by automatons. When we do this, we will be able to open up many more services that take advantage of our precise concept-based data, but are contingent on fixed URLs.
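As a rough illustration of the first idea, the sketch below shows an entry that renders as a very limited index page for anyone who is not logged in (robots included) and as the full page only for an authenticated visitor. It is not our implementation: the `Entry` fields, the function names, and the sample data are all hypothetical, and a real system would also need login handling and the separate servers described above.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical fields, chosen only for this illustration.
    term: str
    language: str
    rich_data: dict = field(default_factory=dict)


def render_stub(entry: Entry) -> str:
    """Very limited index page: cheap to build, quick for a robot to read."""
    return f"{entry.term} ({entry.language})"


def render_full(entry: Entry) -> str:
    """Full page with the rich data, meant for logged-in visitors only."""
    lines = [render_stub(entry)]
    for key, values in sorted(entry.rich_data.items()):
        lines.append(f"{key}: {', '.join(values)}")
    return "\n".join(lines)


def render(entry: Entry, authenticated: bool) -> str:
    """Anonymous requests get the stub; authenticated ones get everything."""
    return render_full(entry) if authenticated else render_stub(entry)


if __name__ == "__main__":
    # Sample data is invented for the example.
    entry = Entry("infinitesimal", "English",
                  {"related": ["infinity", "finite", "finish"]})
    print(render(entry, authenticated=False))
    print(render(entry, authenticated=True))
```

The point of the design is that the expensive work (links, extended information, RDF descriptions) is only done when a confirmed human asks for it.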

Giving you everything we have won't take a miracle, but it does demand adequate sponsorship to implement known solutions. Meanwhile, our query-only search is not exposed to search engines, so we are happy to offer it to you.

/info/iceberg

Kamusi GOLD

These are the languages for which we have datasets that we are actively working toward putting online. Languages that are Active for you to search are marked with "A" in the list below.

Key

•A = Active language, aligned and searchable
•c = Data 🔢 elicited through the Comparative African Word List
•d = Data from independent sources that Kamusi participants align playing 🐥📊 DUCKS
•e = Data from the 🎮 games you can play on 😂🌎🤖 EmojiWorldBot
•P = Pending language, data in queue for alignment
•w = Data from 🔠🕸 WordNet teams

Software and Systems

We are actively creating new software for you to make use of and contribute to the 🎓 knowledge we are bringing together. Learn about software that is ready for you to download or in development, and the unique data systems we are putting in place for advanced language learning and technology:

Articles and Information

Kamusi has many elements. With these articles, you can read the details that interest you:

Videos and Slideshows

Some of what you need to know about Kamusi can best be understood visually. Our 📽 videos are not professional, but we hope you find them useful:

Partners

Our partners - past, present, and future - include:

Hack Kamusi

Here are some of the work elements on our task list that you can help do or fund:

Theory of Kamusi

Select a link below to learn about the principles that guide the project's unique approach to lexicography and public service.

Contact Us

We welcome your comments and questions, and will try to respond quickly. To get in touch, please visit our contact page. You must use a real email address if you want to get a real reply!

kamusigold.org/info/contact

© Copyright ©

The Kamusi Project dictionaries and the Kamusi Project databases are intellectual property protected by international copyright law, ©2007 through ©2016, under the joint ownership of Kamusi Project International and Kamusi Project USA. Further explanation may be found on our © Copyright page.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

Commentary

Discussion items about language, technology, and society, from the Kamusi editor and others. This box is growing. To help develop or fund the project, please contact us!

Our biggest struggle is keeping Kamusi online and keeping it free. We cannot charge money for our services because that would block access to the very people we most want to benefit, the students and speakers of languages around the world that are almost always excluded from information technology. So, we ask, request, beseech, beg you, to please support our work by donating as generously as you can to help build and maintain this unique public resource.

/info/donate

Frequently Asked Questions

Answers to general questions you might have about Kamusi services.

We are building this page around real questions from members of the Kamusi community. Send us a question that you think will help other visitors to the site, and frequently we will place the answer here.

Try it: Ask a "FAQ"!

Press Coverage

Kamusi in the news: Reports by journalists and bloggers about our work in newspapers, television, radio, and online.

Sponsor Search:
Who Do You Know?



To keep Kamusi growing as a "free" knowledge resource for the world's languages, we need major contributions from philanthropists and organizations. Do you have any connections with a generous person, corporation, foundation, or family office that might wish to make a long term impact on educational outcomes and economic opportunity for speakers of excluded languages around the world? If you can help us reach out to a potential 💛😇 GOLD Angel, please contact us!