What we read about deep learning is just the tip of the iceberg


The artificial intelligence technique known as deep learning is white hot right now, as we have noted numerous times before. It’s powering many of the advances in computer vision, voice recognition and text analysis at companies including Google, Facebook, Microsoft and Baidu, and has been the technological foundation of many startups (some of which were acquired before even releasing a product). As far as machine learning goes, these public successes receive a lot of media attention.

But they’re only the public face of a field that appears to be growing like mad beneath the surface. So much research is happening at places that are not large web companies, and even most of the large web companies’ work goes unreported. Big breakthroughs and ImageNet records get the attention, but there’s progress being made all the time.

Just recently, for example, Google’s DeepMind team reported on initial efforts to build algorithm-creating systems…



The tricky business of acting on live data before it’s too late


For all the talk about big data and how it can help us track down needles in haystacks, there’s still a lot of work to do when it comes to issues like public health. When successful intervention might require timelines of minutes or hours rather than days, it takes a mighty keen eye to monitor lots of needles in lots of haystacks and, more importantly, to spot new and important ones as they pop up.

We’ve been following news out of the Global Database of Events, Language, and Tone (GDELT) project for the past several months, and it’s very impressive as a tool for historical analysis of the world’s happenings. It ingests and indexes real-time streams from news sources around the world, and now includes hundreds of millions of data points spanning the past 35 years. It has been used for all sorts of analyses so far, ranging from tracking the spread of terrorist…


Researchers are cracking text analysis one dataset at a time


Google on Monday released the latest in a string of text datasets designed to make it easier for people outside its hallowed walls to build applications that can make sense of all the words surrounding them.

As explained in a blog post, the company analyzed the New York Times Annotated Corpus — a collection of millions of articles spanning 20 years, tagged for properties such as people, places and things mentioned — and created a dataset that ranks the salience (or relative importance) of every name mentioned in each one of those articles.

As Dr. Olivier Lichtarge of Baylor University put it: “A computer certainly may not reason as well as a scientist, but the little it can, logically and objectively, may contribute greatly when applied to our entire body of knowledge.”

Essentially, the goal with the dataset is to give researchers a base understanding of which entities are important…
