Neural Networks, Artificial Intelligence, and
Optimization
William Langhoff
Langhoff@uwm.edu
September 30, 2016

Abstract
Artificial intelligence research is a rapidly growing field. Here we provide a
brief survey of artificial neural networks, an essential component of AI re-
search. We briefly introduce the way neural networks work, along with several
types of neural networks currently used. Then, we explore an evolutionary
method of creating neural networks. Several of these are implemented
in silico, and results are analyzed.

1 Introduction
Since the early days of computing, people have been attempting to develop
artificial intelligence (AI). One could say that active research in AI began
when Alan Turing [7] sought to answer the question, “Can machines think?”
Neuroscience had revealed that the brain was composed of neurons which
send electrical signals, and Turing had shown that any type of computation
could be represented digitally. Therefore, people began to believe they could
create an electronic brain.
In 1997, a machine called Deep Blue, created by IBM, beat the reigning
world champion in the game of chess. This was a major milestone in com-
puting and a fantastic technical achievement, but those in the know saw that
Deep Blue was in no way intelligent. Deep Blue used brute force to compute
all the possible moves and countermoves and find the best one. This strategy
is very different from the ’intuitive’ way that humans approach the game of
chess.
However, another type of computer program was being studied, which
behaved more like the theoretical electronic brain. These were neural net-
works, which were inspired by the structure of neurons in the brain. Neural
networks have been studied extensively and there are many variants, but
they represent an approach which could one day lead to strong AI.
One of the next milestones in AI was another machine by IBM, called
Watson. In 2011, Watson beat two of the most successful champions of the
American game show ‘Jeopardy!’ using machine learning techniques. Watson
was designed to process natural language and use knowledge which it had
learned from internet sources to answer Jeopardy questions.
More recently, in 2016, a project called AlphaGo by Google DeepMind beat
Lee Sedol, one of the world’s top professional players of Go, an ancient
board game which originated in China. AlphaGo used neural networks and a
technique called ‘reinforcement learning’, which allowed it to learn from
its mistakes. This is closely analogous to how humans learn to perform a
task, since we practice repeatedly until we are able to perform the task.
Neural networks are a type of biologically inspired model, and it makes
sense to use biology as inspiration when trying to create AI. The only type
of intelligent agents we know of were produced by evolution through natural
selection, so there is significant motivation to create AI through the same
ways nature has created the human mind.
This paper will serve as a brief survey of neural networks and AI research.
While it will be impossible to be comprehensive, we hope to provide an
overview of the essential ideas in the field. We begin by exploring a few ideas
from biology, which motivate our models. Then, we will consider several
types of neural network models and the tasks they can be used for. Finally,
we will consider a novel method of creating neural networks, which attempts
to model biological evolution.
Throughout this paper, we will often describe the problem we are consid-
ering as an optimization problem. This context will allow us to use mathe-
matical methods to make progress.
2 A brief detour into Biology
Here we will briefly explore a few concepts from biology, which will motivate
our methods in AI.
2.1 Evolution
Evolution by natural selection is the process by which populations develop
new features over time. Originally explained by Charles Darwin, this has
become the ’grand unifying theory’ of modern biology. It is based on the
following observations:
• More individuals are born each generation than an environment can
  support.
• Populations exhibit variation between individuals.
• Some variations increase the probability of an individual reproducing.
Modern science understands these ideas in terms of genetics, where the
genomes of individuals determine their characteristics, or phenotype. Each
individual’s DNA encodes their genome as a long string in a four letter al-
phabet, the letters being ’A’, ’C’, ’T’, and ’G’.
2.1.1 Evolution as function optimization?
In a certain sense, we could view evolution as nature optimizing a function.
When an organism grows, its phenotype is determined as a function of its
genotype. Further, small differences in the genotype generally lead to small
changes in the expressed phenotype. While there are certainly exceptions to
this statement, we can consider biology as a roughly ’continuous’ function
from the space of all possible genomes to the space of all possible organisms.
Evolution generally acts to maximize the fitness of organisms. Therefore,
we can view evolution as a process which finds the genome which maximizes
fitness in the environment.
While this may seem like nonsense, and to a certain degree most certainly
is, it will inform our ideas in later discussion.
2.2 Neuroscience
Neural network models are inspired by the central nervous system. The
fundamental unit of the nervous system is a neuron, or nerve cell, and the
central nervous system consists of a network of these neurons.
A neuron, viewed from a simple point of view, receives electrical impulses
and then sends out electrical impulses. It receives signals from other
neurons via its dendrites, and then sends an electrical signal down its
axon to other neurons.
Figure 1: A simple illustration of a neuron (public domain)
The brain is a highly complex, modular network of neurons. The brain
is far more complex than anything we will consider here, but this is the
motivation for neural networks.
We can consider the problem of general intelligence as a type of
optimization as well. Consider a human being: they need water, food,
shelter, sex, etc. A human interacts with their environment to optimize how
well these needs are met. However, it is important to notice how this
differs from simpler mathematical optimization problems. An intelligent agent
like a person needs to optimize several functions, rather than just one, and
needs to keep them optimal relative to each other. If a person simply wanted
to maximize the amount of water they had, they might try to drink a lake,
which would not end well for them. However, while the objective function
is certainly difficult to define and likely changes over time, the actions of an
intelligent agent can be seen as an optimization algorithm.
3 Neural Networks
One of the most common models in AI is an artificial neural network, inspired
by the physiology of our own nervous system. However, once we construct a
network and begin ’learning’ from a dataset, we have reduced the problem
to one of mathematical optimization. The network’s behavior is determined
by a set of numerical variables, called weights, and we seek to optimize the
performance of that network as a function of those weights.
Rather than begin immediately by defining a neural network, we will
instead motivate their study by first considering a simpler model which serves
as a building block for neural networks. We also provide, as an example,
results from our implementation of the perceptron. The Python files used
for these experiments are available upon request.
3.1 The Perceptron
The perceptron is one of the earliest examples of a machine learning algo-
rithm. Its structure is motivated by that of a biological neuron, and we will
later build up neural networks from many of these.
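Definition 1. A perceptron is a function $f : \mathbb{R}^n \to (0, 1)$.
The perceptron consists of a vector of weights $\mathbf{w} \in \mathbb{R}^n$,
and a bias $b \in \mathbb{R}$. Given an input $\mathbf{x} \in \mathbb{R}^n$,
the model gives
\[ f(\mathbf{x}) = \sigma(\mathbf{w}^T \mathbf{x} + b) \tag{1} \]
where $\sigma$ is some non-linear sigmoid (S-shaped) function, such as the
logistic function $\sigma(t) = \frac{1}{1 + e^{-t}}$, or $\sigma(t) = \tanh(t)$
(in which case the range of $f$ is $(-1, 1)$ rather than $(0, 1)$).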
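As a minimal sketch of Definition 1, assuming the logistic choice of $\sigma$, the perceptron can be written in a few lines of Python. The function names and the example weights are our own illustrative choices, not taken from the implementation used for the experiments.

```python
import math

def sigmoid(t):
    """Logistic function, mapping any real t into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def perceptron(w, b, x):
    """Evaluate f(x) = sigma(w^T x + b) for a weight list w, bias b and input list x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical weights and input, chosen only for illustration.
# Here w^T x + b = 3.0 - 2.0 + 0.5 = 1.5, so this prints sigma(1.5).
print(perceptron([1.0, -2.0], 0.5, [3.0, 1.0]))
```

Thresholding the output at 0.5 turns this into a two-class classifier.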
The purpose of a perceptron is to classify data into two categories. Let
us assume we have some labeled dataset
$(\mathbf{x}^{(1)}, y^{(1)}), (\mathbf{x}^{(2)}, y^{(2)}), \ldots, (\mathbf{x}^{(m)}, y^{(m)})$,
where $\mathbf{x}^{(i)}$ are our inputs, and $y^{(i)}$ the corresponding
desired outputs. Let us also assume that our data points can be classified
correctly using this method. Then, our goal becomes to find
$\mathbf{w} \in \mathbb{R}^n$, $b \in \mathbb{R}$ which minimize the
error function:
\[ e(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( y^{(i)} - f(\mathbf{x}^{(i)}) \right)^2 \tag{2} \]
We have now taken our problem of classifying a dataset, and reduced it
to an optimization problem. This is one of the fundamental techniques in
machine learning. When we talk about learning in the context of machine
learning algorithms, we usually refer to updating the parameters of a model,
and (hopefully) converging towards some optimal set of parameters.
Since this particular optimization problem is quite simple, we will use a
simple method to solve it, namely gradient descent. For simplicity, we will
work with a single training example at a time. Since the derivative of the
error for the whole dataset is simply the average of the derivatives of the
error for individual data points, this changes little.
3.1.1 Optimizing the Perceptron
We will begin by randomly initializing $\mathbf{w}$, $b$ with values near
zero. It is possible to instead begin with all zeros, but experiments show
that random initialization generally speeds up convergence, possibly due to
‘symmetry breaking’ in the parameters.
For a data point $(\mathbf{x}, y)$, we have:
\[ e'(\mathbf{w}) = -f(\mathbf{x})\,(y - f(\mathbf{x}))\,(1 - f(\mathbf{x}))\,\mathbf{x}^T \]
\[ e'(b) = -f(\mathbf{x})\,(y - f(\mathbf{x}))\,(1 - f(\mathbf{x})) \]
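These expressions follow from the chain rule. The short derivation below is a sketch under two assumptions: that $\sigma$ is the logistic function, so that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, and that the error for a single data point is $e = \frac{1}{2}(y - f(\mathbf{x}))^2$. The intermediate symbol $z$ is introduced here only for convenience.

```latex
% Per-example error and pre-activation (z is an auxiliary symbol)
e = \tfrac{1}{2}\,(y - f(\mathbf{x}))^2, \qquad
f(\mathbf{x}) = \sigma(z), \qquad
z = \mathbf{w}^T \mathbf{x} + b

% Chain rule with respect to the weights
e'(\mathbf{w})
  = \frac{\partial e}{\partial f}\,
    \frac{\partial f}{\partial z}\,
    \frac{\partial z}{\partial \mathbf{w}}
  = -(y - f(\mathbf{x}))\; f(\mathbf{x})(1 - f(\mathbf{x}))\; \mathbf{x}^T

% Chain rule with respect to the bias, using dz/db = 1
e'(b)
  = \frac{\partial e}{\partial f}\,
    \frac{\partial f}{\partial z}\,
    \frac{\partial z}{\partial b}
  = -(y - f(\mathbf{x}))\; f(\mathbf{x})(1 - f(\mathbf{x}))
```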
We call a single loop through the training set an epoch. We choose an
arbitrary $\alpha \in (0, 1]$ as a learning rate, and step through our
training set, updating our weights and bias according to the following rule:
\[ \mathbf{w} \mapsto \mathbf{w} - \alpha\, e'(\mathbf{w})^T = \mathbf{w} - \alpha\, e'(b)\, \mathbf{x} \]
\[ b \mapsto b - \alpha\, e'(b) \]
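The update rule above can be sketched in Python as follows. This is an illustrative sketch, not the code used for our experiments: it assumes the logistic sigmoid, and the function names, the learning rate, the number of epochs and the tiny toy dataset are all made up for the example.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(data, alpha=0.5, epochs=200, seed=0):
    """data is a list of (x, y) pairs, x a list of floats and y in {0, 1}."""
    rng = random.Random(seed)
    n = len(data[0][0])
    # Random initialization near zero.
    w = [rng.uniform(-0.01, 0.01) for _ in range(n)]
    b = rng.uniform(-0.01, 0.01)
    for _ in range(epochs):              # one epoch = one pass over the data
        for x, y in data:
            f = predict(w, b, x)
            g = -f * (y - f) * (1.0 - f)     # scalar factor, equal to e'(b)
            w = [wi - alpha * g * xi for wi, xi in zip(w, x)]
            b = b - alpha * g
    return w, b

# Toy, linearly separable data (made up; this is not the Iris subset).
data = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([0.9, 1.0], 1), ([1.0, 0.8], 1)]
w, b = train(data)
labels = [round(predict(w, b, x)) for x, _ in data]
print(labels)
```

Each inner-loop iteration is one gradient step on a single training example, so an epoch here performs as many updates as there are data points.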
For the sake of demonstration, we implemented this algorithm in Python.
For our data, we used a subset of the well-known Iris data set [4]. Based
on the sepal length and width, we classify iris flowers into two species,
Iris setosa and Iris versicolour.
Figure 2:
On the left we see how the error decreases over time, and on the right we
see a scatter-plot of our dataset, colored according to species. The line in the
middle is the decision boundary; our perceptron classifies based on which side
of the line a data point lies on. This demonstrates what our model is actually
doing: it is finding a line which separates our data into two classes. More
generally, when dealing with higher-dimensional data, the perceptron
finds a hyperplane which separates the two classes, if one exists.
This algorithm, while interesting, is rather limited. We can only work
with data which is linearly separable, meaning it is separated by a hyperplane.
We could not, for example, teach the perceptron to classify points based
on whether they were inside some circle. However, we can construct more
complicated models with perceptrons as the building blocks.
3.2 Multilayer Perceptrons
While a single perceptron, as described above, can only classify data which
is linearly separable, it has been shown [3] that a feed-forward network of
perceptrons can do much more.
Figure 3: By Cburnett [GFDL], via Wikimedia Commons[2]
The basic idea is the same, but now instead of a single perceptron, we have
multiple layers of perceptrons, where the inputs to one layer are the outputs
of the previous layer. Each perceptron behaves like a biological neuron,
receiving multiple inputs and sending a single output (but sending that
output to more than one place).
It has been shown that a multilayer perceptron can, in fact, give
approximations of arbitrary continuous functions. More precisely, let
$f : \mathbb{R}^n \to \mathbb{R}^m$ be continuous on a compact subset of
$\mathbb{R}^n$. For any $\epsilon > 0$, there exists some network with three
or fewer layers which will approximate $f$ on that subset with error bounded
by $\epsilon$ [3]. We will not consider the proof of this statement here,
because while it is interesting, it is also irrelevant for our purposes. The
proof is non-constructive: it gives us no information about how many
perceptrons to include per layer, or what the weights should be.
3.2.1 Training a multi-layer Perceptron
An algorithm which is analogous to the one we presented for the single per-
ceptron exists for the multilayer perceptron as well. We will not explain it
fully, but the algorithm is called backpropagation. The main idea is that
for each weight and bias in the network, their contribution to the error is
measured, and that is used to update the weights. The algorithm is called
backpropagation because it begins by computing the error for the final layer,
and works backwards.
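To make the idea concrete, the following is a minimal sketch of backpropagation for a network with one hidden layer and a single output. It assumes logistic activations and the squared error $e = \frac{1}{2}(y - f)^2$ for one data point. The names `forward`, `backprop` and `sgd_step`, the layer sizes and the random initial weights are our own illustrative choices; this is not the code behind the figures in this paper.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def forward(params, x):
    """Return the hidden activations h and the scalar output f."""
    W1, b1, w2, b2 = params
    h = [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
         for row, bi in zip(W1, b1)]
    f = sigmoid(sum(wi * hi for wi, hi in zip(w2, h)) + b2)
    return h, f

def backprop(params, x, y):
    """Gradients of e = (y - f)^2 / 2 with respect to every weight and bias."""
    W1, b1, w2, b2 = params
    h, f = forward(params, x)
    # Error signal at the output layer is computed first ...
    delta_out = -(y - f) * f * (1.0 - f)
    grad_w2 = [delta_out * hi for hi in h]
    grad_b2 = delta_out
    # ... and then propagated backwards to the hidden layer.
    delta_hid = [delta_out * wi * hi * (1.0 - hi) for wi, hi in zip(w2, h)]
    grad_W1 = [[d * xj for xj in x] for d in delta_hid]
    grad_b1 = delta_hid
    return grad_W1, grad_b1, grad_w2, grad_b2

def sgd_step(params, x, y, alpha=0.1):
    """One gradient-descent update on a single training example."""
    W1, b1, w2, b2 = params
    gW1, gb1, gw2, gb2 = backprop(params, x, y)
    W1 = [[w - alpha * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    b1 = [b - alpha * g for b, g in zip(b1, gb1)]
    w2 = [w - alpha * g for w, g in zip(w2, gw2)]
    return W1, b1, w2, b2 - alpha * gb2

# Hypothetical network: 2 inputs, 3 hidden units, 1 output.
rng = random.Random(0)
params = ([[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)],
          [0.0] * 3,
          [rng.uniform(-1, 1) for _ in range(3)],
          0.0)
```

Repeating `sgd_step` over a training set for many epochs gives the multilayer analogue of the perceptron training loop.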
In Figure 4 we see the decision boundaries of some simple neural
networks. These were implemented in Python, with a single hidden layer and
varying numbers of neurons in that layer.
Figure 4: Decision boundaries of Neural Networks
Backpropagation has been very successful in supervised learning. Multi-
layer perceptrons can classify data with arbitrary decision boundaries, and
there is a great deal of literature on the subject. However, backpropagation
relies heavily on two principles: knowing the correct output for at least some
of the possible inputs, and having the correct output be well defined. This is
great for problems of classifying data, but can we use a multilayer perceptron
for other applications in artificial intelligence?
3.3 Modern Neural Networks
The multilayer perceptron is an example of a feed-forward neural network.
However, many types of networks exist, and we will detail a few of them
here. Later, we will be exploring methods which do not fit neatly into any
one of these boxes, so it is helpful to have a picture of some different types
of networks. Each of these is best visualized as a directed graph, where each
node behaves somewhat like a perceptron.
3.3.1 Deep Networks
Deep neural networks, and the field of deep learning, have recently become
a much more active area of research. A deep neural network is simply
one which has many layers, rather than a few. Initially, it was difficult to
train such a network due to the vanishing gradient problem, where the earlier
layers do not have their weights affected much by the algorithm [5]. Because
of this, research into deep networks was sparse for many years, until it was
shown that by using greedy algorithms for layer-wise training, deep networks
could be extremely effective [1]. The central idea is to build up the network
one layer at a time, training each layer to reduce the dimensionality of the
input.
Figure 5: Deep Feed Forward Networks [5]
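The vanishing gradient problem can be illustrated numerically. The sketch below assumes a deliberately simplified setting: a chain of single logistic units, one per layer, all sharing the same weight. Since the derivative of the logistic function is at most 0.25, each additional layer multiplies the gradient by a small factor. The function name and parameter values are illustrative assumptions.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def chain_gradient(depth, weight=1.0, x=0.0):
    """d(output)/d(input) for `depth` logistic units composed in a chain."""
    grad, a = 1.0, x
    for _ in range(depth):
        a = sigmoid(weight * a)
        # Chain rule: d sigmoid(weight * a_prev) / d a_prev = weight * a * (1 - a)
        grad *= weight * a * (1.0 - a)
    return grad

for depth in (1, 5, 10):
    print(depth, chain_gradient(depth))
```

With weights of magnitude at most 1, the gradient reaching the first layer shrinks at least geometrically with depth, which is why the earliest layers learn slowly under plain gradient descent.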
3.3.2 Convolutional Networks
Convolutional neural networks represent the state of the art in image classifi-
cation, used for image search engines like Google, as well as facial recognition
by Facebook and the NSA. Computers store images as an array of pixels, with
a numerical value associated to the color of that pixel. Convolutional net-
works are inspired by the idea that nearby pixels are highly correlated, so
the network is connected in such a way to take advantage of that correlation.
Figure 6: A Convolutional Network [8]
While we will not spend much time on these here, it is worth noting that
due to their success in processing image data, this technology could be used
to handle visual input, such as that for a robot.
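As a rough sketch of how a convolutional layer exploits the correlation between nearby pixels, the function below slides a small kernel over a grayscale image stored as nested lists. The function name, the toy image and the edge-detecting kernel are illustrative assumptions. As in most neural network libraries, the operation shown is strictly a cross-correlation (the kernel is not flipped).

```python
def convolve2d(image, kernel):
    """'Valid' 2D cross-correlation of an image with a kernel (nested lists)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# Toy image: dark on the left, bright on the right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge = [[-1, 1]]   # responds where brightness increases from left to right
print(convolve2d(image, edge))   # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

Each output value depends only on a small neighbourhood of pixels, and the same kernel weights are reused at every position.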
3.3.3 Recurrent Networks
Recurrent neural networks differ from feed-forward networks by possessing
connections from the later nodes to the earlier nodes. These networks are
often used for time-series data, for example predicting the next word in a
sentence.
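A minimal sketch of the recurrent idea, assuming a single scalar unit with a tanh activation: the state computed at one time step is fed back in at the next, so the output depends on earlier inputs as well as the current one. The names and weight values are illustrative assumptions.

```python
import math

def rnn_step(h_prev, x, w_h, w_x, b):
    """New state of a scalar recurrent unit from the previous state and the input."""
    return math.tanh(w_h * h_prev + w_x * x + b)

def run(sequence, w_h=0.5, w_x=1.0, b=0.0):
    """Feed a sequence through the unit and return the state after each step."""
    h, states = 0.0, []
    for x in sequence:
        h = rnn_step(h, x, w_h, w_x, b)
        states.append(h)
    return states

# The second input is zero, yet the second state is non-zero:
# the unit still 'remembers' the first input through its state.
print(run([1.0, 0.0]))
```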
4 NEAT: NeuroEvolution of Augmenting Topologies
The types of networks we have considered so far seek to optimize relatively
simple functions. However, we would like to consider the use of neural net-
works for more general problems in AI. Our approach will be to use evo-
lutionary methods instead of gradient descent. We will start with simple
networks, and build up more complex ones using an evolutionary algorithm
called NEAT. The NEAT algorithm was originally described in [6], and has
been modified and expanded since then. We will present a basic explanation
for the version of NEAT which we were able to implement.
The main idea of NEAT is to evolve both the structure and weights of a
neural network. Contrast this to the backpropagation paradigm, which fixes
a network structure and then learns the best weights for that structure. The
overall idea when using NEAT is the following:
1. Initialize a random population of simple networks.
2. Evolve the population, having networks compete with each other.
3. Protect new innovations through speciation.
4. Gradually increase complexity to minimize the size of the final network.
We will first see how NEAT encodes the structure of a network into
a genome, and how pairs of individuals are able to reproduce. Then, we
will see how speciation is implemented, and discuss why it is useful for the
algorithm. Finally, we will briefly discuss competitive co-evolution and the
importance of growing complexity.
4.1 NEAT’s genetic encoding
In order to efficiently explore the space of possible neural networks, NEAT
uses a clever genetic encoding.
There are two types of genes, node genes and connection genes. Node
genes encode what type of neuron they are, such as input, output, or hidden
neurons, and serve as indexes for the connections. Connection genes encode
the weight associated to them, which nodes they are connected to, a boolean
variable determining if the connection is active, and an innovation number.
Every time a new connection is created by a mutation, a global counter
of innovation numbers is incremented and the new value of the counter is
assigned to be the innovation number of that connection. This simple trick
is actually what makes NEAT computationally feasible. This innovation
number allows NEAT to efficiently compare genomes of different networks.
There are two types of structural mutations possible with NEAT, which
are detailed below. New nodes can be constructed by splitting an existing
connection, and new connections can be made between two existing nodes.
In addition, mutations can occur where weights are perturbed or replaced.
Connections can also be turned on and off by mutation.
Figure 7: An example of a simple network in NEAT. [6]
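The encoding and the two structural mutations can be sketched as follows. This is a simplified illustration, not the NEAT-python implementation: the class and function names are our own, and the sketch ignores the refinement in [6] whereby identical structural mutations arising in the same generation share an innovation number. When a connection is split, we follow the convention described in [6] of giving the incoming connection weight 1 and the outgoing connection the old weight.

```python
import random
from dataclasses import dataclass, field
from itertools import count

@dataclass
class ConnectionGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool
    innovation: int

@dataclass
class Genome:
    nodes: dict = field(default_factory=dict)        # node id -> "input" | "output" | "hidden"
    connections: list = field(default_factory=list)  # list of ConnectionGene

innovation_counter = count(1)   # global counter, shared by the whole population

def add_connection(genome, in_node, out_node, weight):
    """Structural mutation: connect two existing nodes."""
    gene = ConnectionGene(in_node, out_node, weight, True, next(innovation_counter))
    genome.connections.append(gene)
    return gene

def add_node(genome, rng):
    """Structural mutation: split an enabled connection with a new hidden node."""
    old = rng.choice([c for c in genome.connections if c.enabled])
    old.enabled = False                      # the old gene is kept but switched off
    new_id = max(genome.nodes) + 1
    genome.nodes[new_id] = "hidden"
    add_connection(genome, old.in_node, new_id, 1.0)
    add_connection(genome, new_id, old.out_node, old.weight)
    return new_id

g = Genome(nodes={0: "input", 1: "output"})
add_connection(g, 0, 1, 0.7)
add_node(g, random.Random(0))
print([(c.in_node, c.out_node, c.enabled, c.innovation) for c in g.connections])
# [(0, 1, False, 1), (0, 2, True, 2), (2, 1, True, 3)]
```

Because disabled genes stay in the genome, their innovation numbers remain available for lining up genomes later.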
When two genomes are selected as parents for a new offspring, their cor-
responding genes are matched up using the innovation number. This saves
the algorithm from needing to analyze the structure of the network each gen-
eration. Corresponding genes from both parents are carried over to the child,
along with all extra genes from the more fit parent.
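A sketch of this matching step, under simplifying assumptions: each genome is reduced to a dictionary from innovation number to connection weight, and for genes present in both parents the child picks one parent's weight at random (a common convention in descriptions of NEAT, assumed here). The function name and example genomes are hypothetical.

```python
import random

def crossover(fit_parent, other_parent, rng):
    """Combine two genomes given as {innovation number: weight} dictionaries.

    fit_parent is the parent with the higher fitness.
    """
    child = {}
    for innovation, weight in fit_parent.items():
        if innovation in other_parent:
            # Gene present in both parents: inherit from either at random.
            child[innovation] = rng.choice([weight, other_parent[innovation]])
        else:
            # Extra gene: carried over from the fitter parent only.
            child[innovation] = weight
    return child

fitter = {1: 0.1, 2: 0.2, 4: 0.4}
other = {1: -0.1, 3: 0.3}
print(crossover(fitter, other, random.Random(0)))
```

No graph comparison is needed: the dictionary lookup on innovation numbers is the whole alignment step.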
4.2 Speciation
NEAT uses speciation to protect new innovations. When a new connection or
node is added to the network, it is likely that fitness will initially decrease. If
this is not compensated for, there is potential that the algorithm will converge
towards a sub-optimal solution, because it is unable to further explore the
search space. Speciation allows new innovations time to optimize before they
are removed.
In order to sort the population into species, NEAT computes a ’distance’
between pairs of genomes. This distance is given by the formula:
\[ \delta = \frac{c_1 E}{N} + \frac{c_2 D}{N} + c_3 W \]
where $E$ and $D$ are the number of excess and disjoint connection genes
in the pair, respectively, $N$ is the number of genes in the larger of the
two genomes, and $W$ is a normalized sum of weight differences on
those connections which are shared by the pair. The values $c_1, c_2, c_3$
are tunable parameters which we can adjust based on the needs of a specific
problem.
Figure 8: Structural mutations in NEAT [6]
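The distance can be computed directly from the innovation numbers. The sketch below makes several assumptions: genomes are dictionaries from innovation number to weight and are non-empty, excess genes are the unmatched genes whose innovation number exceeds the largest innovation number of the other genome, $N$ is the size of the larger genome, and $W$ is the mean absolute weight difference over shared genes. The default coefficient values are placeholders, not the values used in our experiments.

```python
def compatibility_distance(genome_a, genome_b, c1=1.0, c2=1.0, c3=0.4):
    """Distance between two genomes given as {innovation number: weight}."""
    keys_a, keys_b = set(genome_a), set(genome_b)
    matching = keys_a & keys_b
    unmatched = keys_a ^ keys_b
    cutoff = min(max(keys_a), max(keys_b))
    excess = sum(1 for k in unmatched if k > cutoff)      # E
    disjoint = len(unmatched) - excess                    # D
    n = max(len(keys_a), len(keys_b))                     # N
    if matching:
        w = sum(abs(genome_a[k] - genome_b[k]) for k in matching) / len(matching)
    else:
        w = 0.0                                           # W
    return c1 * excess / n + c2 * disjoint / n + c3 * w

a = {1: 0.5, 2: -0.5, 3: 1.0, 5: 0.2}
b = {1: 0.7, 2: -0.5, 4: 0.3}
# E = 1 (gene 5), D = 2 (genes 3 and 4), N = 4, W = 0.1
print(compatibility_distance(a, b))   # approximately 0.79
```

Two genomes are placed in the same species when this distance falls below a compatibility threshold.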
4.3 Competitive Co-Evolution and Complexification
Since NEAT is inspired by evolutionary processes, it makes sense to apply
NEAT to problems which are analogous to life processes. In biology, organ-
isms compete with each other for limited resources. Therefore, we will try to
implement the same type of system.
We will use NEAT for a problem where the fitness function would be
difficult to optimize by traditional methods. For example, our network could
be used to play a game such as tic-tac-toe. Its inputs would be the current
state of the board, and its outputs would be decisions about which move to
make. We can evaluate the fitness of an individual by having it play against
other networks in the population, scoring it based on the number of games
it wins.
As we remarked earlier in our brief detour into biology, populations tend
towards higher complexity over time when they evolve. Therefore, we will
begin with very simple networks, and our algorithm will gradually increase
their complexity until a sufficiently optimal solution has been reached.
Having given a brief overview of the NEAT algorithm, we will now present
our implementation. This experiment was used to demonstrate the strengths
and weaknesses of NEAT. We used a customized version of the NEAT-python
library for our implementation. NEAT-python is available as a pre-alpha on
GitHub, and our adaptations are available on request.
Figure 9: Reproduction in NEAT [6]
5 Case study 1: NEAT plays Tic-Tac-Toe
Our first case study will attempt to use NEAT to play tic-tac-toe, also known
as noughts and crosses. This example will use competitive co-evolution: the
population will compete amongst itself to determine fitness.
Tic-Tac-Toe is a simple game, and an optimal strategy exists and is easy
to write down. With optimal play by both players, the game always ties.
If the player who goes first plays optimally, but the second player does not,
then the first player can often force a win. We would like to ask whether
NEAT can find the optimal strategy.
5.1 The Experiment
The NEAT algorithm involves quite a few hyper-parameters. Here we de-
scribe our choices for those along with the reason for them.
The input and output layers of our network both have nine nodes, cor-
responding to the nine boxes in Tic-Tac-Toe. The inputs will correspond
to the board state, and the outputs will determine which move the network
chooses.
Given the simplicity of the game, we use a relatively small population of
50 individuals. The algorithm attempts to maintain this number, but a given
generation may have slightly more or fewer than 50. We also set a target of 8
species. The compatibility threshold will increase when we have more than 8
species, and decrease when we have fewer than 8. This adaptive threshold
means we don’t need to determine the best compatibility threshold for our
networks in advance.
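The adaptive threshold can be sketched in a few lines. The step size and the lower bound below are hypothetical values chosen for illustration; the text above only specifies the direction of the adjustment and the target of 8 species.

```python
def adjust_threshold(threshold, num_species, target=8, step=0.1, minimum=0.1):
    """Nudge the compatibility threshold towards the target species count."""
    if num_species > target:
        threshold += step    # larger threshold: genomes merge into fewer species
    elif num_species < target:
        threshold -= step    # smaller threshold: the population splits further
    return max(threshold, minimum)

print(adjust_threshold(3.0, num_species=10))
print(adjust_threshold(3.0, num_species=5))
```

Calling this once per generation, after the population has been sorted into species, keeps the species count hovering around the target.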
Our initial networks are random, consisting only of input and output
nodes. Each individual has one-fourth of the 81 possible connections between
input and output, chosen randomly and with random weight.
In a given generation, mutations occur with the following probabilities:
Mutation            Probability
Add node            0.2
Add connection      0.1
Delete node         0.02
Delete connection   0.01
Mutate weight       0.6
Determining the fitness of each individual is achieved by the following
method:
1. Each individual plays against all others
   • Player 1 gets 10 points for a win, 1 point for a tie
   • Player 2 gets 10 points for a win, 5 points for a tie
2. Each individual plays against an optimal player
3. Fitness is the sum of all scores
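Step 1 of the scoring scheme above can be sketched as follows. The sketch assumes that every ordered pair of individuals plays once, so that each individual moves first and second against every other. The `play` callback is hypothetical and stands in for a full game between two networks. Step 2 is omitted because its scoring details are not spelled out above.

```python
WIN, TIE_FIRST, TIE_SECOND = 10, 1, 5   # point values from the list above

def round_robin_fitness(population, play):
    """Return one score per individual after a full round robin.

    play(first, second) is a hypothetical callback returning 1 if the first
    mover wins, 2 if the second mover wins, and 0 for a tie.
    """
    scores = [0] * len(population)
    for i, first in enumerate(population):
        for j, second in enumerate(population):
            if i == j:
                continue
            result = play(first, second)
            if result == 1:
                scores[i] += WIN
            elif result == 2:
                scores[j] += WIN
            else:
                scores[i] += TIE_FIRST
                scores[j] += TIE_SECOND
    return scores

# Stand-in 'game' for demonstration only: the larger number wins.
def toy_play(a, b):
    return 1 if a > b else 2 if b > a else 0

print(round_robin_fitness([3, 1, 2], toy_play))   # [40, 0, 20]
```

The larger reward for a tie as Player 2 reflects the fact that the second player is at a disadvantage in Tic-Tac-Toe.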
In doing so, the population competes with itself to determine its fitness,
and the networks which play best against that population are most likely
to reproduce. Playing against optimal players also gives a fitness boost for
networks which play well against an optimal solution.
5.2 Results
Unfortunately, we were unable to evolve a network which played optimally,
but we believe this could be achieved with further computing resources and
better tuning of the hyperparameters. The algorithm seems to get stuck in a
sub-optimal state, and continues to add complexity without increased fitness.
Figure 10: Species sizes in a Tic-Tac-Toe experiment
The species which originate early never die out, and new species don’t
seem to evolve later in the simulation. Reworking the parameters of the
algorithm could change this, possibly improving performance.
Over time, the average fitness of the population does not seem to increase
significantly. This may be because the whole population is getting better at
playing the game, and therefore the number of wins and losses does not
change by much.
However, with greater computational resources, we hypothesize that NEAT
would be able to find an optimal, or nearly optimal solution. It may be nec-
essary to tune the hyperparameters differently before this would be achieved.
Figure 11: Best and average fitness in a Tic-Tac-Toe experiment
6 Case study 2: NEAT plays 2048
Here we attempt to implement NEAT to play the popular puzzle game 2048.
6.1 The Game
In 2048, players try to combine like numbers until they reach the number
2048. The 4x4 board starts empty, except for two random squares which
contain either a 2 or a 4. Then, the player chooses to move up, down, left,
or right.
When the player chooses a direction, all the tiles slide as far in that
direction as possible. If two tiles with the same value slide into one another,
their values are added and they become one tile. The player’s score increases
by the new value. For example, in the figure shown, if the player chose down,
the ’2’ tiles in the second column would combine to become a ’4’, and the
player’s score would increase by 4.
Whenever the player moves, one random empty tile becomes a 2 or a 4.
Play continues until the player cannot make any further moves, in which case
they lose, or until they win by combining two tiles with value 1024 to get
the value 2048.
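The sliding rule can be sketched for a single row moving left; the other directions follow by reversing or transposing the board. The sketch assumes the usual convention that a tile produced by a merge does not merge again within the same move. The function name is our own.

```python
def slide_row_left(row):
    """Slide and merge one row to the left; returns (new_row, score_gained)."""
    tiles = [v for v in row if v != 0]       # drop empty cells
    merged, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(2 * tiles[i])      # two equal tiles become one
            score += 2 * tiles[i]            # score increases by the new value
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), score

print(slide_row_left([2, 0, 2, 4]))   # ([4, 4, 0, 0], 4)
print(slide_row_left([2, 2, 2, 2]))   # ([4, 4, 0, 0], 8)
```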
While there are strategies for this game which will maximize a player’s
chances of winning, the game involves a fair amount of chance. Therefore,
when we apply NEAT to this problem, we will average the scores over several
games.
Figure 12: 2048: A game in progress
6.2 The Experiment
The inputs to our network will be the numerical value of the 16 tiles, with
the convention that an empty tile has value zero. Our network will have four
outputs, corresponding to the four directions the player can move. We will
use a softmax function to normalize our outputs so that they sum to 1, and
interpret each value as the probability that moving in that direction is
the correct move.
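A minimal sketch of the softmax normalization: the ordering of the four outputs and the example output values are assumptions made for illustration.

```python
import math

def softmax(outputs):
    """Map raw outputs to positive values which sum to 1."""
    m = max(outputs)                          # subtract the max for numerical stability
    exps = [math.exp(o - m) for o in outputs]
    total = sum(exps)
    return [e / total for e in exps]

MOVES = ("up", "down", "left", "right")       # assumed ordering of the outputs
probs = softmax([1.0, 2.0, 0.5, -1.0])        # hypothetical raw network outputs
print(MOVES[probs.index(max(probs))])         # down
```

Softmax preserves the ordering of the raw outputs, so the most probable move is always the one with the largest raw output.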
We work with a population of 150 networks, all randomly initialized.
The initial networks have no hidden nodes, and contain some random set
of connections from the inputs to the outputs.
Speciation was modified to use an adaptive parameter. Our target number
of species is 10: when we have fewer species than that we decrease the
compatibility threshold, and when we have more species we increase the
threshold. This allows us to keep a reasonable number of species which are
not too small.
Figure 13: An example of an initial network. Green links have positive
weight, red have negative weight.
We also modified the algorithm so that if the mean fitness of a species
has not improved in ten generations, the species becomes extinct. The
individuals with maximal fitness are still saved in case NEAT does not find a
better solution. This was done to stop the algorithm from stagnating for too
long.
Mutations occurred with the following probabilities:
Mutation            Probability
Add node            0.1
Add connection      0.05
Delete node         0.001
Delete connection   0.001
Mutate weight       0.8
6.3 Results
Because of the stochastic nature of 2048, it is much more difficult to find a
perfectly optimal solution. However, several networks were able to achieve
relatively high scores in some games.
Figure 14: Best and average fitness in a 2048 experiment
Figure 15: Speciation in a 2048 experiment
Speciation stabilized fairly early in the experiment, similarly to the Tic-
Tac-Toe case.
7 Discussion
Here we have considered several types of neural networks, along with a genetic
algorithm for evolving new ones. While the NEAT algorithm has had success
in several problem domains, we were unable to achieve complete success in
either of our attempts. In contrast, other types of neural networks are much
more often successful. Of course, we should now ask why NEAT seems to
have sub-par performance, and whether we should abandon methods like
NEAT because of this.
Conventional neural networks perform well on very specific tasks, such as
image classification. However, they rely on a well-understood fitness function.
In contrast to this, NEAT can use competitive co-evolution to increase fitness
relative to some environment. This is why, despite its sub-par performance,
we believe evolutionary methods will be essential in the creation of general AI.
The actions of an intelligent agent can still be understood as an optimization
problem, but the fitness function is incredibly complex.
In addition, NEAT has a large number of hyper-parameters relative to
other algorithms. These parameters reflect the many facets of the algorithm,
while conventional optimization via gradient descent is relatively simple. We
hypothesize that with more computational resources, these parameters could
be better tuned to the problem domain.
Further, it may simply be necessary to run NEAT for a larger number
of generations. Biological evolution takes place over a period of millions of
years, so an evolutionary algorithm may likewise need far more generations
than we were able to run.
These are the reasons we hypothesize that evolutionary methods will be
essential in the quest to create strong AI. Of course, we still don’t know if
this is necessarily a good idea, but we will leave that question for another
discussion.
References
[1] Yoshua Bengio. “Learning Deep Architectures for AI”. In: Foundations
    and Trends in Machine Learning 2.1 (2009), pp. 1–127. issn: 1935-8237.
    doi: 10.1561/2200000006. url: http://dx.doi.org/10.1561/2200000006.
[2] en:User:Cburnett. Artificial Neural Network. [Online; accessed April 10,
    2016], GNU Free Documentation License. 2011. url:
    https://upload.wikimedia.org/wikipedia/commons/e/e4/Artificial_neural_network.svg.
[3] Kurt Hornik. “Approximation capabilities of multilayer feedforward
    networks”. In: Neural Networks (1991).
[4] M. Lichman. UCI Machine Learning Repository. 2013. url:
    http://archive.ics.uci.edu/ml.
[5] Michael A. Nielsen. Neural Networks and Deep Learning. Determination
    Press, 2015.
[6] Kenneth O. Stanley. “Efficient Evolution of Neural Networks Through
    Complexification”. In: (2004). url: http://nn.cs.utexas.edu/?stanley:phd2004.
[7] A. M. Turing. “Computers & Thought”. In: ed. by Edward A.
    Feigenbaum and Julian Feldman. Cambridge, MA, USA: MIT Press,
    1995. Chap. Computing Machinery and Intelligence, pp. 11–35. isbn:
    0-262-56092-5. url: http://dl.acm.org/citation.cfm?id=216408.216410.
[8] Wikipedia. Typical CNN. [Online; accessed May 5, 2016]. 2015. url:
    https://commons.wikimedia.org/wiki/File:Typical_cnn.png.
