Algorithms - the dark beasts that can make AI turn on humankind

by Maan Barazy

The term algorithm has been wildly misused in recent years, so it's important to clarify what it means. An algorithm is a series of steps taken to solve a specific, discrete problem. Google Search is powered by an algorithm, but only in the way that a Matryoshka doll is a doll: Google Search is actually powered by an algorithm on top of dozens of algorithms on top of dozens of other algorithms, each solving a specific subproblem of the larger problem and all working together in a single program to return the results of a query. The oldest algorithms ever recorded appear on ancient Babylonian tablets dating to about 1,800 BCE, spelling out the specific procedures for computing values such as square roots. And we still use one of the Greek mathematician Euclid's most famous algorithms, his method for finding the greatest common divisor, first formulated around 300 BCE, in programming today because of its elegant simplicity.
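Euclid's method is short enough to show in full. Here is a minimal sketch in Python; the same idea underlies library routines such as Python's math.gcd.

```python
# A minimal sketch of Euclid's greatest-common-divisor algorithm.
def gcd(a, b):
    while b != 0:
        a, b = b, a % b     # replace the pair with (divisor, remainder)
    return a

print(gcd(1071, 462))       # 21
```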

The entire field of theoretical computer science is about trying to find the most efficient algorithm for a given problem, but sometimes we just don't know the math. There's a whole catalog of problems we don't yet have the math to solve efficiently (we will get into that more toward the end of the series), but these problems act as choke points that create inefficiencies in business, scientific research, and administration, and a lot of people are waiting for them to finally be solved.

The reason classical computers can't solve these problems efficiently either is built into the circuits of the machine: every instruction must be completed before the next one can begin. Multicore systems and multiprocessor machines can speed things up, but even if you threw every processor we have at an O(n!) problem where n is something like 500, the number of years you would have to wait for a result has more than a thousand digits.
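A quick back-of-the-envelope calculation makes the point; the figure of 10^18 operations per second is a generous, assumed estimate for all of today's hardware combined.

```python
import math

# Rough scale of an O(n!) problem at n = 500, assuming a (very generous)
# 10^18 operations per second for all of today's hardware combined.
log10_ops = math.lgamma(501) / math.log(10)          # log10(500!) ≈ 1134
log10_seconds = log10_ops - 18                       # ≈ 10^1116 seconds of work
log10_years = log10_seconds - math.log10(3.15e7)     # ≈ 10^1109 years
print(f"500! ≈ 10^{log10_ops:.0f} operations, ≈ 10^{log10_years:.0f} years of work")
```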

Computer engineers have seen this coming for decades, but now we are finally coming to the end of the line for what we can squeeze out of a classical computer. You just can't make these operations go any faster, and by nature, they can only perform one operation at a time. If you have an algorithm that requires quintillions of operations to complete, a classical computer has to carry out all of those operations in order. At that point, the time it takes to get a result simply becomes a matter of math and physics. If we’re going to solve these problems that have exponential time complexity or greater, then something else is needed.

It wasn't until the age of computers, however, that algorithms really began to take a mathematical approach to seemingly non-mathematical problems, and these modern algorithms are some of the most important solutions currently powering the world's most widely used systems.

I would tend to categorize them as disruptors, with AI as the playfield of algorithms. In the private, all-electronic marketplaces known as "dark pools," traders soon began to spawn their own special trading algorithms, triggering a Darwinian struggle for the survival of the fittest.

As humans, we also have algorithms that we use in our daily decision-making. For example, if I find a seat on a crowded bus and the next person to board is pregnant, using a cane, or accompanied by a small child, I use an algorithm to decide whether or not to give up my seat. Or, if a woman arrives at a medical center reporting a painful condition, caregivers decide whether to believe her based on a formula they internalized during their training or through previous life experience. These decisions can influence health. And, much like computer algorithms, we can update our own algorithms as we gather new data and experiences.

Algorithms are everywhere. Companies use algorithms to make recommendations and to improve customer experience. Sometimes these algorithms suggest good books, and other times they suggest good divorce lawyers. Algorithms are also constantly trying to make our lives easier, including completing our sentences as we draft emails. But exactly how these algorithms are affecting our health remains unknown.

By the early 2000s, the trading algorithms were no longer just mindless lists of instructions such as "if the price of commodity X rises, then buy company Z." The fitter, smarter algorithms had been equipped with artificial-intelligence code, which allowed them to spot the actions of less sophisticated algorithms and learn to outwit them.

Not even the designers of the algorithms can predict what their progeny will do in every circumstance. The methods used to create the algorithms produce code that is often beyond analysis. Facebook's Newsfeed, Instagram, YouTube, Call of Duty: they are all built from algorithms solving problems for each other to produce the functionality that we desire.

Worse, as the algorithms evolve they can acquire traits never intended by their creators. Years on from the 2010 Flash Crash, it is far from clear that we have really got to grips with the digital beasts named after a long-dead Islamic genius, the ninth-century mathematician al-Khwarizmi.

Back to basics, though: let's say you wanted to find the sum of all the numbers between m and n. You could create a number called result, set it equal to 0, and simply add m to it, then m+1, then m+2, and so on until you add n itself, leaving the sum of all the numbers in result. That series of steps is the algorithm. Not a very efficient one, but we'll save efficiency for a later article.
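Here is that procedure as a minimal Python sketch, assuming m <= n; the closed-form shortcut in the final comment is the kind of efficiency gain a later article would discuss.

```python
# A minimal sketch of the summation procedure just described, assuming m <= n.
def sum_range(m, n):
    result = 0
    for value in range(m, n + 1):    # add m, then m+1, ..., up to n itself
        result += value
    return result

print(sum_range(3, 7))               # 3 + 4 + 5 + 6 + 7 = 25
# The closed form (n - m + 1) * (m + n) // 2 gives the same answer in one step.
```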

What are the seven algorithms that are running the world?

PageRank

Google's PageRank algorithm is a great place to start, since it helped turn Google into the internet giant it is today. PageRank was the first algorithm Larry Page and Sergey Brin developed to index and rank web pages in the late 1990s, and they eventually used it to power their new Google search engine. The essential feature of PageRank is that it determines a score for how authoritative a page is based on the authority scores of the pages that link to it. More authoritative pages linking to a page confer a greater measure of authority on that page than less authoritative ones do, so the people who write content and link to other pages effectively tell Google which pages carry more weight than others.
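To make the scoring idea concrete, here is a minimal power-iteration sketch. The toy link graph, the 0.85 damping factor, and the fixed iteration count are illustrative assumptions; Google's production ranking system is, of course, far more elaborate.

```python
# Minimal power-iteration sketch of the PageRank idea: a page's score is
# built from the scores of the pages linking to it.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            incoming = sum(rank[q] / len(links[q]) for q in pages if page in links[q])
            new_rank[page] = (1 - damping) / len(pages) + damping * incoming
        rank = new_rank
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(links))   # C ends up with the highest score: both A and B link to it
```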

PageRank was revolutionary when it was introduced and quickly blew other search engines out of the market. PageRank is so important that an entire industry developed around the algorithm itself: Search Engine Optimization. The PageRank algorithm so thoroughly established Google's dominance as the only search engine that mattered that the word Google officially became a verb less than eight years after the company was founded. Even though PageRank is now only one of about 200 measures Google uses to rank a web page for a given query, the algorithm is still an essential driving force behind its search engine.

Key Exchange Encryption

How do you secure information that is effectively read out over a loudspeaker on a street corner for everyone to hear? That is the challenge of protecting network traffic transmitted over public communication lines: anyone can intercept these communications en route and read the data. Code ciphers, which convert each byte of data into a different byte based on some programmatic formula, are the obvious answer. But they won't work when one party doesn't know which cipher the other is using, and most secure communications happen between parties who have had no prior contact and so have no means of agreeing on a cipher beforehand.

The Key Exchange Encryption algorithm does the seemingly impossible: it establishes a single, shared mathematical secret between two parties who don't even know each other, a secret used both to encrypt and to decrypt the data, all over a public network and without anyone else being able to figure out what that secret is.

*We agree on a public number a (the public key), and we each pick a private number that we never share: mine is m, yours is n.

*I broadcast the value of a raised to my private number over the public channel, and you do the same with yours.

*I apply my private number as the exponent to the result you broadcast over the public channel and obtain a value, and you do the same.

*That value will be the same for both of us, and we use it to encrypt our communications.

Because neither of us ever publicly discloses our own private key, it is practically impossible for anyone who sees this information being passed to work out what value we are using to encrypt our communications. The process that produces the shared secret relies on two basic ideas. First, (a^m)^n and (a^n)^m give exactly the same answer, where the private keys are m and n and the public key is a; this will always work. Second, while computing these powers is easy, working backward from the broadcast results to recover the private keys is not, as we'll see next.
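As a toy sketch of the exchange (Diffie-Hellman style): real key exchanges do this arithmetic modulo a very large prime, a practical detail the description above leaves out and the very thing that makes the reverse computation hard. The tiny prime, base, and private numbers below are illustrative assumptions only.

```python
p = 23                         # public prime modulus (the detail the prose omits)
a = 5                          # public base, broadcast for everyone to hear

m = 6                          # my private key, never shared
n = 15                         # your private key, never shared

A = pow(a, m, p)               # I broadcast a^m mod p
B = pow(a, n, p)               # you broadcast a^n mod p

shared_mine  = pow(B, m, p)    # (a^n)^m mod p
shared_yours = pow(A, n, p)    # (a^m)^n mod p
assert shared_mine == shared_yours
print(shared_mine)             # the shared secret (2 here), never transmitted
```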

But what if you're watching all of this as a third party, trying to intercept the messages being passed? The only unencrypted information being passed is the public key a and the two results a^m and a^n, except the results don't look that way to you; you just see two very large, seemingly random numbers that you know are somehow mathematically tied to the public key a. Without knowing m or n, which are never shared on the public channel, the only way to work out the private keys that produce the cipher is to invert the exponentiation, that is, to compute the discrete logarithm of a^m or a^n, and there is no currently known way for a classical computer to do this before the Sun dies and takes us all with it in a few billion years.

Why it's so hard is the subject of another article, but it genuinely is that difficult, which makes it perfect for public encryption. While rarely used on its own anymore, the public-private key structure of the Key Exchange algorithm is an essential ingredient of more advanced public-key encryption schemes such as RSA.

Backpropagation

Backpropagation through a neural network is one of the most important algorithms invented in the last 50 years.

Neural networks operate by feeding input data into a network of nodes, each of which has connections to the next layer of nodes, with different weights associated with those connections that determine how strongly the information received is passed along to the next layer. When the information has passed through the various so-called "hidden" layers of the network and arrives at the output layer, the network is left with different choices about what it believes the input was. If it was fed an image of a dog, the options might be dog, cat, mouse, and human infant. It will have a probability for each of these, and the option with the highest probability is chosen as the answer.

This is where backpropagation comes in. Backpropagation is the propagation of the error back through the neural network, over the very connections that produced the incorrect answer. As it goes, it adjusts those connections, decreasing the weights that contributed to the wrong answer. Over time, a neural network learns what something is partly by learning what it is not, converging on the correct answer.
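Here is a minimal sketch of that weight-adjustment loop for a tiny two-input network with one hidden layer, written in plain Python. The network shape, sigmoid activation, squared-error loss, OR-style training data, and learning rate are all illustrative assumptions; real deep-learning frameworks automate exactly this gradient computation at vastly larger scale.

```python
import math, random

random.seed(1)
sig = lambda x: 1.0 / (1.0 + math.exp(-x))

# weights: 2 inputs + bias -> 2 hidden units, then 2 hidden units + bias -> 1 output
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]   # learn logical OR
lr = 0.5

def forward(x):
    xb = x + [1.0]                                            # inputs plus bias term
    h = [sig(sum(w * xi for w, xi in zip(row, xb))) for row in w_h]
    hb = h + [1.0]                                            # hidden outputs plus bias term
    out = sig(sum(w * hi for w, hi in zip(w_o, hb)))
    return xb, hb, out

for _ in range(3000):
    for x, target in data:
        xb, hb, out = forward(x)
        # backward pass: push the error back over the connections that produced it
        d_out = (out - target) * out * (1 - out)
        d_h = [d_out * w_o[j] * hb[j] * (1 - hb[j]) for j in range(2)]
        for j in range(3):
            w_o[j] -= lr * d_out * hb[j]
        for j in range(2):
            for i in range(3):
                w_h[j][i] -= lr * d_h[j] * xb[i]

print([round(forward(x)[2], 2) for x, _ in data])             # approaches [0, 1, 1, 1]
```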

In this way, neural networks can be trained to recognize what a face looks like, what a voice sounds like, and what movies you might enjoy based on the movie you last watched. Without backpropagation, deep-learning neural networks wouldn't work, and without those networks, we wouldn't have the rapid advances in artificial intelligence that we've seen in the last decade.

Compression

Source: stoimen.com / Wikimedia Commons

If you wanted to compress a file to make it smaller and easier to manage over a network or to save disk space, and you looked at the bytes of data in front of you, where would you even begin? How do you make bytes take up less space while still being able to decompress them afterward to recover what you started with?

Several variations on compression exist, but they almost all rely on a similar trick; they use references and offsets instead of the actual data itself to represent the data using less space.

Let's say you had a string of characters you wanted to compress, ABBCABBCABACABACABACDDDBDB, which is 26 characters long. Another way to write this is ABBC2ABAC3D2DB2, where the numbers after a string of characters tell you how many times that string needs to be printed. The compressed string is now only 15 characters long.

That may not seem like much, but we've just reduced the amount of memory this string needs by just over 40 percent. When you have files that are gigabytes in size, that 40 percent is huge. Now, not all data can be compressed like this, and the efficiency of the compression varies, but compressing as much data as we can, as often as we can, keeps communication networks and hard disks from being clogged up with repetitive bloat. This basic idea behind file compression empowers streaming movies, streaming music, online video games, and just about everything else, honestly. Compression is everywhere, and it is essential to the efficient transmission and storage of information.
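To illustrate the references-and-offsets trick mentioned above, here is a toy LZ77-style compressor and decompressor. The tuple format and function names are illustrative assumptions, not the layout used by any real format such as zip or gzip.

```python
def lz77_compress(data, window=255):
    """Toy LZ77-style compressor: emits (offset, length, next_char) tuples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # search the recent window for the longest match with the upcoming text
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        next_char = data[i + best_len] if i + best_len < len(data) else ""
        out.append((best_off, best_len, next_char))
        i += best_len + 1
    return out

def lz77_decompress(tokens):
    """Rebuild the original string from (offset, length, next_char) tuples."""
    out = []
    for off, length, ch in tokens:
        for _ in range(length):
            out.append(out[-off])        # copy from `off` characters back
        if ch:
            out.append(ch)
    return "".join(out)

text = "ABBCABBCABACABACABACDDDBDB"
tokens = lz77_compress(text)
assert lz77_decompress(tokens) == text
print(tokens)
```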

Searching and Sorting Algorithms


Searching and sorting are a special class of algorithms in that there are many very different techniques for sorting a data set or searching for a specific value within one, and no single technique is better than another all of the time. The quicksort algorithm might beat mergesort when memory is a factor, but when memory is not an issue, mergesort can sometimes be faster; and almost anything is better than bubblesort.
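For reference, here is a minimal (not in-place) quicksort sketch; production library sorts, such as the Timsort variant behind Python's built-in sorted(), are far more tuned hybrids.

```python
# Minimal quicksort sketch: partition around a pivot, then sort each side.
def quicksort(items):
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    left   = [x for x in items if x < pivot]
    middle = [x for x in items if x == pivot]
    right  = [x for x in items if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([5, 2, 9, 2, 7, 1]))   # [1, 2, 2, 5, 7, 9]
```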


The same applies when you have to search through a data set for a specific value. On a perfectly sorted list, like a dictionary, a binary search is the fastest way to find what you want. But if you wanted the longest word in the dictionary, or the largest value in an unsorted random stream of words read in from a million articles downloaded from the Internet, then a heap doubles as your search structure, since the highest value (or the lowest, if that is what you are looking for) in the data set is always at the top of the heap.
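A minimal binary search sketch over a sorted list, returning the index of the target or -1 if it is absent:

```python
# Binary search: repeatedly halve the range that could still contain the target.
def binary_search(sorted_items, target):
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1          # target can only be in the upper half
        else:
            hi = mid - 1          # target can only be in the lower half
    return -1

print(binary_search(["ant", "bee", "cat", "dog", "eel"], "dog"))   # 3
```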

The type of search needed will always depend on the data structure you're searching through (lists, trees, graphs, and so on), but if a program does anything useful with data, it is guaranteed to use a search and a sort algorithm somewhere in its code. They all matter, programmers use all of them all the time, and they form the foundation on which data structures and more advanced algorithms are built.

Dijkstra's Shortest Path

Source: Shiyu Ji / Wikimedia Commons

Dijkstra's Shortest Path algorithm is a search algorithm for graphs, but it bears special mention, because it isn't like other search algorithms.

According to Dijkstra himself, the computer scientist Edsger Dijkstra was sitting at a café in Amsterdam with his fiancée in 1956, drinking coffee, when he worked out an algorithm that could show the power of the computer system he was working on to a general, non-computing audience in a way they could understand; he published it in 1959.

He plotted out 64 cities on a graph, with each city represented by a node, and drew various paths, technically known as edges, between them. He labeled one node Rotterdam and another Groningen, and designed an algorithm that found the shortest path between the two. It works by starting at a source node and finding the shortest path from that node to every other node in the graph, stopping once it reaches the destination node.
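Here is a minimal sketch of that idea using a priority queue; the Dutch city names and distances below are illustrative stand-ins, not Dijkstra's original 64-city map.

```python
import heapq

# Dijkstra's shortest path: always expand the closest unvisited node next.
def dijkstra(graph, source, destination):
    queue, visited = [(0, source, [source])], set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == destination:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return float("inf"), []

graph = {
    "Rotterdam": {"Utrecht": 60, "Eindhoven": 110},
    "Utrecht": {"Zwolle": 90, "Eindhoven": 90},
    "Eindhoven": {"Zwolle": 140},
    "Zwolle": {"Groningen": 100},
}
print(dijkstra(graph, "Rotterdam", "Groningen"))
# (250, ['Rotterdam', 'Utrecht', 'Zwolle', 'Groningen'])
```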

He almost certainly didn't think he was creating what would become one of the most widely used algorithms in the world, but in those 20 minutes in 1956, Dijkstra enabled everything from GPS routing on our phones, to signal routing through telecommunication networks, to any number of time-sensitive logistics challenges such as shipping a package across the country. As a search algorithm, Dijkstra's Shortest Path stands out from the others simply because of the sheer scale of the technology that relies on it.

TCP/IP Routing Protocol Algorithms

Source: The Opte Project, via PRI.org

In case you've never seen it, that is the Internet. At least that's how it sees itself, anyway.

When the Internet began, the standards for the Transmission Control Protocol/Internet Protocol (TCP/IP) were basically brand new, and while mathematically sound, the algorithms at the heart of the standard Internet protocol weren't built with the unfathomable amount of traffic they would have to manage in mind. One inefficient algorithm could have kneecapped the Internet before it really got going.

Fortunately for us, as the Internet expanded into every area of our lives, the initial design decisions that make up TCP/IP turned out to be vital to the successful operation of the entire network as traffic exploded beyond anyone's wildest expectations.

One of the most critical of these decisions was which algorithm to use to route data packets, the actual information flowing through the Internet that we send and receive. The two most widely used, the Distance-Vector Routing Protocol Algorithm (DVRPA) and the Link-State Routing Protocol Algorithm (LSRPA), are among the most essential algorithms we rely on every day, efficiently routing data traffic between the billions of connected networks that make up the Internet.

DVRPA works by finding the shortest distance between the source and destination networks. It can use any number of metrics to calculate this, but it will usually use something very simple, like the number of router and server "hops" it must perform along the way. Simplicity is what matters in DVRPA.

Routers using this algorithm keep a record of all known networks in a table, along with the distance to each one. Whenever such a router forms a new connection to another network, usually called a neighbor or peer, it passes along this table, which the peer uses to update its own table before passing its updated table on to any network it is already connected to, and so on. This way, changes quickly propagate throughout these connections so that every network knows how far it is from any other network on the Internet. While this doesn't guarantee the fastest connection, it is very quick and not very complicated to work out, so overall it has operated pretty well, with modifications along the way to improve its efficiency.
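As a toy sketch of that table-merging step: the router names, networks, and hop counts below are invented for illustration, and real distance-vector protocols in this family, such as RIP, add timers, split horizon, and other safeguards.

```python
# Merge a neighbor's advertised routing table into ours, distance-vector style.
def merge_neighbor_table(my_table, neighbor_table):
    """Each entry maps a network name to a hop count."""
    updated = dict(my_table)
    for network, hops in neighbor_table.items():
        via_neighbor = hops + 1          # one extra hop to reach the neighbor
        if network not in updated or via_neighbor < updated[network]:
            updated[network] = via_neighbor
    return updated

router_a = {"net-1": 0, "net-2": 1}
router_b = {"net-2": 0, "net-3": 2}
print(merge_neighbor_table(router_a, router_b))
# {'net-1': 0, 'net-2': 1, 'net-3': 3}
```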

LSRPA, meanwhile, operates in essentially the same way, but routers running LSRPA keep a map of the entire part of the Internet they can connect to, routinely testing different connections and analyzing them to determine the more realistic cost of each connection in terms of computation, time, and so on. Like DVRPA, whenever it establishes a connection, it passes along its map to the network it connects to, so that changes to the network propagate throughout, giving routers using the algorithm a much more realistic picture of the various connections.

While it is more likely to find the most efficient route, it is more computationally heavy and isn't as well-established as DVRPA. As computer hardware improves, however, and new equipment replaces older network nodes, more of the Internet will be able to manage running LSRPA, improving the efficiency of the entire Internet in the process.

The issue of efficiency doesn't just relate to the hardware, however. The efficiency of the various algorithms can make or break a system. Fortunately, we know how to measure the efficiency of algorithms with mathematical precision, allowing us to find the right algorithm for the right problem.
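As a tiny illustration of that precision, compare the worst-case number of comparisons a linear scan and a binary search need on a sorted list of one million items; the counts fall straight out of the big-O analysis.

```python
import math

# Worst-case comparison counts on a sorted list of one million items:
# a linear scan may check every item, while binary search halves the
# search space at every step.
n = 1_000_000
linear_comparisons = n
binary_comparisons = math.ceil(math.log2(n))
print(linear_comparisons, binary_comparisons)   # 1000000 vs 20
```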
