LigerCat (Literature and Genomics Resource Catalogue) is a Web application that provides a big-picture view of the National Library of Medicine’s Medline database. It allows researchers to browse the metadata of hundreds of thousands, even millions, of biomedical journal articles simultaneously. It also has the coolest name of any scientific application that has been published in a peer-reviewed venue.
It is a Ruby on Rails application that I wrote when I was at the Marine Biological Laboratory. I designed and implemented the entire application stack from the ground up. The data is split between a MySQL database and a Redis cluster; the Redis cluster, which stores hundreds of millions of values, was the largest in the world at the time it was built. To compute queries on demand, it has a large, horizontally scalable processing cluster whose workers pull tasks from an AMQP work queue and process them in parallel. Using this architecture, I processed each of the 1.9 million species in the Encyclopedia of Life through LigerCat, analyzing the metadata of tens of millions of scholarly articles, in a matter of days.
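The worker pattern described above can be sketched in a few lines. This is only an illustration, not LigerCat's actual code: it uses Python's in-process `queue.Queue` and threads as a stand-in for an AMQP broker and a cluster of machines, and the names `run_workers` and `count_words` are hypothetical placeholders for the real task handlers.

```python
import queue
import threading


def run_workers(tasks, handler, num_workers=4):
    """Process tasks in parallel with workers that pull from a shared queue."""
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker keeps pulling tasks until the queue is empty.
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return
            value = handler(task)
            with lock:  # guard the shared results dict
                results[task] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def count_words(species_name):
    # Placeholder for real work, such as fetching and analyzing articles.
    return len(species_name.split())


species = ["Homo sapiens", "Loligo pealeii", "Mnemiopsis leidyi"]
results = run_workers(species, count_words, num_workers=2)
```

Because workers only share the queue, adding capacity means starting more workers; with a real broker those workers can run on separate machines.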
The basic idea behind LigerCat is that tagging is nothing new. The Web 2.0 folks got the idea from librarians, who have been tagging literature for many years. Librarians tag things using a “controlled vocabulary,” which is a set of tags that are curated and maintained by some authoritative body. For instance, scientific articles indexed by the National Library of Medicine are tagged with a controlled vocabulary called Medical Subject Headings (MeSH), which contains over 20,000 tags.
The Articles search, which is selected by default, lets you query the PubMed article database. LigerCat will download all the results and build a MeSH tag cloud from the articles returned by your search. You can search for a topic, a person, or an organism, and LigerCat will build you a MeSH cloud based on the results.
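As a rough sketch of how a MeSH cloud could be computed from search results, the snippet below counts how many articles carry each MeSH term and scales the counts linearly to font sizes. It is not LigerCat's implementation: the function name `build_mesh_cloud`, the article dictionaries, the PMIDs, and the size range are all assumptions made for illustration.

```python
from collections import Counter


def build_mesh_cloud(articles, min_size=10, max_size=40):
    """Map each MeSH term to its article count and a display font size."""
    counts = Counter(
        term
        for article in articles
        for term in set(article["mesh_terms"])  # count a term once per article
    )
    if not counts:
        return {}

    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    cloud = {}
    for term, n in counts.items():
        # Linear scaling; if every count is equal, all terms get max_size.
        weight = (n - lowest) / span if span else 1.0
        size = round(min_size + weight * (max_size - min_size))
        cloud[term] = {"count": n, "size": size}
    return cloud


# Made-up records standing in for articles returned by a PubMed search.
articles = [
    {"pmid": "1", "mesh_terms": ["Animals", "Seawater", "Genomics"]},
    {"pmid": "2", "mesh_terms": ["Animals", "Genomics"]},
    {"pmid": "3", "mesh_terms": ["Animals"]},
]
cloud = build_mesh_cloud(articles)
```

Terms that tag more of the returned articles get larger sizes, so the dominant subjects of a result set stand out at a glance.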
LigerCat can be cited as:
LigerCat: using “MeSH Clouds” from journal, article, or gene citations to facilitate the identification of relevant biomedical literature. Sarkar IN, Schenk R, Miller H, Norton CN. AMIA Annu Symp Proc. 2009 Nov 14;2009:563-7.