A brief summary of Deep Neural Networks and Machine Learning

In response to the limitations of symbolic AI systems, connectionism, the approach behind what we now call neural networks, gained prominence in the 1980s. Inspired by the brain, the idea is that knowledge stored in networks of weighted connections between units would enable systems to learn on their own, and that such systems would be better suited than other approaches to tasks that had seemed reserved for humans.

Multilayered networks consist of an input layer, at least one layer of hidden (non-output) units, and a layer of output units. Each output unit corresponds to one of the possible results. Each input has a weighted connection to the units of the first hidden layer, and each hidden unit has a weighted connection to the units of the next layer, whether those are hidden units or output units. The weights are usually set to random values before training begins. This is meant to resemble the brain, in which some neurons directly control outputs such as muscle movements, but most neurons simply communicate with other neurons. Networks that have more than one layer of hidden units are called deep networks.
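To make the layered structure concrete, here is a minimal sketch in plain Python. The function name init_network, the layer sizes, and the weight range of -1 to 1 are my own assumptions for illustration, not something taken from Mitchell's chapter.

```python
import random

def init_network(layer_sizes, seed=0):
    """Return one weight matrix per pair of adjacent layers.

    weights[k][j][i] is the weight on the connection from unit i
    in layer k to unit j in layer k + 1.
    """
    rng = random.Random(seed)
    weights = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # Every unit in the next layer gets one random weight per incoming unit.
        weights.append(
            [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        )
    return weights

# Hypothetical sizes: 4 inputs, two hidden layers of 3 units each, 2 outputs.
network = init_network([4, 3, 3, 2])

# Three sets of connections means two hidden layers, so this toy
# network counts as "deep" in the sense described above.
hidden_layers = len(network) - 1
```

Nothing has been learned at this point. The random weights are only a starting position which training will later adjust.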

Each unit multiplies each of its inputs by the weight on that input’s connection and then sums the results. From this sum the unit computes its “activation”, a value between 0 and 1 (close to 0 if the sum is low, close to 1 if the sum is high). The network performs its computations layer by layer: each hidden unit computes its activation value, these values become the inputs for the next layer, and at the end the output units compute their own activations. The output unit with the highest activation is the system’s answer, and that value can be read as the network’s confidence in the answer. What this amounts to is a system which breaks its inputs down step by step, looks for patterns, and tries to match what it finds to one of the output types.
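The computation described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the sigmoid is one common choice of activation function that behaves as described, and the weights and inputs are made-up numbers.

```python
import math

def sigmoid(total):
    # Squashes any sum into the range 0..1: low sums land near 0, high sums near 1.
    return 1.0 / (1.0 + math.exp(-total))

def unit_activation(inputs, weights):
    # Multiply each input by the weight on its connection, sum, then squash.
    total = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(total)

def forward(inputs, layers):
    # Work through the network layer by layer; the activations of one
    # layer become the inputs of the next.
    activations = inputs
    for layer in layers:
        activations = [unit_activation(activations, unit_weights) for unit_weights in layer]
    return activations

# Made-up weights: 2 inputs -> 2 hidden units -> 2 output units.
hidden = [[0.5, -0.4], [-0.3, 0.8]]
output = [[1.2, -0.7], [-1.0, 0.9]]

scores = forward([1.0, 0.0], [hidden, output])
answer = scores.index(max(scores))  # index of the output unit with the highest activation
```

With these particular numbers the first output unit wins, but the point is the mechanism rather than the result: nothing happens here except multiplying, summing and squashing.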

The so-called cost function is part of how a system is taught to make fewer errors. Basically, the function measures how well the neural network is performing by comparing its actual output with the desired output (result and goal) over the training examples. The system is then run iteratively, with the weights changed a little each time, with the goal of finding the setup which minimises the cost function.
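As a rough illustration, a cost function can be as simple as the average squared difference between what the network produced and what it should have produced. The squared error is my choice for this sketch (other cost functions exist), and the two "networks" below are stand-in functions invented for the example.

```python
def squared_error(outputs, targets):
    # Distance between result and goal for a single example.
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

def cost(examples, predict):
    # Average error over all training examples: lower means better performance.
    return sum(squared_error(predict(x), t) for x, t in examples) / len(examples)

# Toy training set of (input, desired output) pairs.
examples = [([0.0], [0.0]), ([1.0], [1.0])]

# Two hypothetical stand-ins for networks with different weight settings.
def careless(x):
    return [0.5]          # always gives the same answer

def better(x):
    return [0.9 * x[0]]   # close to the goal, but not perfect

cost_careless = cost(examples, careless)
cost_better = cost(examples, better)
```

Training then amounts to searching for the weight settings that push this single number as low as possible.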

The feedback or correctional process of backpropagation, which trains the network, is especially interesting but often hard to get a grip on. Basically, the backpropagation algorithm takes the errors observed in the output units and works backward from the last layer through the hidden units (since neural networks are set up sequentially), assigning each weight its share of the blame for the error. Each weight is then changed slightly in the direction that reduces the cost function. Learning in neural networks hence consists in gradually modifying the weights on connections. The trouble is that neural networks can have thousands of units in several layers, so no weight can be judged in isolation. This is why partial derivatives are used: they state how much the cost changes when a single weight changes while all the others are held fixed.
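Since this step is the hardest to picture, here is a deliberately tiny sketch: two inputs, two hidden units, one output unit, and a single training example. Everything specific in it is my own assumption for illustration (sigmoid activations, squared error as the cost, a learning rate of 0.5, no bias terms), and real implementations are considerably more elaborate.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def backprop_step(x, target, w_hidden, w_out, rate=0.5):
    hidden, output = forward(x, w_hidden, w_out)
    # Start at the output: how does the cost (output - target)^2 change
    # when the output unit's weighted sum changes?
    delta_out = 2.0 * (output - target) * output * (1.0 - output)
    # Pass the blame backward through the output weights to each hidden unit.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    # Partial derivative for each weight = blame at its target unit times the
    # signal it carried. Nudge every weight a little against that derivative.
    new_w_out = [w - rate * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w - rate * d * xi for w, xi in zip(row, x)]
        for row, d in zip(w_hidden, delta_hidden)
    ]
    return new_w_hidden, new_w_out

rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w_out = [rng.uniform(-1, 1) for _ in range(2)]
x, target = [1.0, 0.0], 1.0  # one made-up training example

cost_before = (forward(x, w_hidden, w_out)[1] - target) ** 2
for _ in range(200):
    w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
cost_after = (forward(x, w_hidden, w_out)[1] - target) ** 2
```

After the 200 small corrections the cost on this example is lower than at the start, which is all that "learning" means here: the weights have drifted toward a setting that produces less error.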

Symbolic systems are engineered by humans who to some extent control the contents of the system. The procedures of subsymbolic systems are harder to get a grip on. As Mitchell hints, this has to do with the enormous number of nodes and connections which would need to be analyzed and visualized. But it also seems to be because the system works largely autonomously in looking for patterns: the programmer merely supplies inputs and indicates whether errors were made, and no other parameters are added or changed transparently. Furthermore, like many other systems, it is a general machine which contains no prior settings with regard to the problem it is working on.

There are some philosophical questions which come to mind.

The neural network is supposed to structure intelligent processes in a way similar to the brain. Due to its complexity, it’s hard to understand exactly how a neural network got to some result, since it is autonomously looking for patterns and matching them to some output. This has some moral implications for sure. We would like a system to be able to explain how it got to certain results. But how different is this from human experience really? Sure, if asked what motivates our actions or thoughts we are able to conjure up plausible explanations, but with all that we know about our unconscious motivations, how is our black box different from an AI black box? Beyond the technical setup of neural networks, how does this approach differ philosophically from the systems we have encountered thus far? At first it seems impressive what a bit of software is able to do. But when we think about the critiques of AI, from the frame problem to the issues with Turing machines or the Chinese room, do neural networks solve any of these issues? Or are the issues merely relocated into a new system?

Mitchell, Melanie (2019). Artificial Intelligence: A Guide for Thinking Humans. Farrar, Straus and Giroux. Chapter 2: “Neural Networks and the Ascent of Machine Learning”.

