A leading team of machine learning data scientists and software engineers from Voyager Labs took on the world of unstructured data in a 24-hour hackathon at the National Library. The challenge was to organize and reveal the wealth of national treasures buried in the library (millions of books, manuscripts, photographs, posters, songs and recordings, movies, ancient maps, works of art and more) in a way that makes these rare collections accessible to the public. Our team went beyond the challenge: they not only found a way to organize and categorize this unstructured data, they also linked it to external sources of information on the web, such as Wikipedia, to create a single connected source of information that everyone can now use and enjoy.
A wealth of hidden treasures
The National Library holds an immense collection reflecting thousands of years of cultural creation, much of it in unstructured form. This includes over 150,000 images (pictures, posters, newspaper clippings and so on) of deep cultural importance that were virtually inaccessible to the public. Sorting and making sense of this collection by hand is practically impossible, so the National Library called on leading minds in the world of technology to help create a system that would classify, tag, sort and bring meaning to these treasures.
Harnessing deep learning algorithms to automate discovery
Our team of data scientists and software engineers rose to the challenge. In 24 short hours (fueled by nothing but pizza) they planned and built a system for classification and deeper insight, and applied it to this large, rare collection. Using machine learning and deep learning algorithms, they were able to automate classification and bring meaning to these previously isolated pieces of history.
Extending the search for information beyond the sphere of the Library
Our team was then able to link each treasure to a wealth of information from other relevant online sources, for example Wikipedia, the Library of Congress and even YouTube. They did this by using the concepts they had found to be connected to each picture or item (for example location, attendees, dates and much more). This enabled them to create a platform that ties together all of the relevant information available online about a specific topic.
Ron Pick, Dan Ostrosky, Eyal Hochman and Ofir Olivenbaum were our pioneering representatives at the hackathon, and we’re very proud of their accomplishment. They reached the final round and took part in making these national treasures accessible to the public. We hope the system they built will continue to be of use to students and researchers in their quest for information and knowledge.