Google on Monday released the latest in a string of text datasets designed to make it easier for people outside its hallowed walls to build applications that can make sense of all the words surrounding them.
As explained in a blog post, the company analyzed the New York Times Annotated Corpus (a collection of millions of articles spanning 20 years, tagged for properties such as people, places and things mentioned) and created a dataset that ranks the salience, or relative importance, of every name mentioned in each of those articles.
"A computer certainly may not reason as well as a scientist but the little it can, logically and objectively, may contribute greatly when applied to our entire body of knowledge." (Dr. Olivier Lichtarge, Baylor University)
Essentially, the goal with the dataset is to give researchers a base understanding of which entities are important…