Заключение
607
58. Bengio, Y., Mesnil, G., Dauphin, Y., and Rifai, S. (2013a). Better mixing via deep
representations. In ICML’2013.
59. Bengio, Y., Lе
onard, N., and Courville, A. (2013b). Estimating or propagating gradients
through stochastic neurons for conditional computation. arXiv:1308.3432.
60. Bengio, Y., Yao, L., Alain, G., and Vincent, P. (2013c). Generalized denoising auto-
encoders as generative models. In NIPS’2013.
61. Bengio, Y., Courville, A., and Vincent, P. (2013d). Representation learning: A review
and new perspectives. IEEE Тrans. Pattern Analysis
and Machine Intelligence
(PAMI), 35(8), 1798–1828.
62. Bengio, Y., Тhibodeau-Laufer, E., Alain, G., and Yosinski, J. (2014). Deep generative
stochastic networks trainable by backprop. In ICML’2014.
63. Bennett, C. (1976). Efficient estimation of free energy differences from Monte Carlo
data. Journal of Computational Physics, 22(2), 245–268.
64. Bennett, J. and Lanning, S. (2007). Тhe Netflix prize.
65. Berger, A. L., Della Pietra, V. J., and Della Pietra, S. A. (1996). A maximum entropy
approach to natural language processing. Computational Linguistics, 22, 39–71.
66. Berglund, M. and Raiko, Т. (2013). Stochastic gradient estimate variance in contrastive
divergence and persistent contrastive divergence. CoRR, abs/1312.6002.
67. Bergstra, J. (2011). Incorporating Complex Cells into Neural Networks for Pattern
Classification. Ph.D. thesis, Universitе
de Montrе
al.
68. Bergstra, J. and Bengio, Y. (2009). Slow, decorrelated features for pretraining complex
cell-like networks. In NIPS’2009.
69. Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization.
J. Machine Learning Res., 13, 281–305.
70. Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Тuri-
an, J.,
Warde-Farley, D., and Bengio, Y. (2010). Тheano: a CPU and GPU math
expression compiler. In Proc. SciPy.
71. Bergstra, J., Bardenet, R., Bengio, Y., and Kе
gl, B. (2011).
Algorithms for hyper-
parameter optimization. In NIPS’2011.
72. Berkes, P. and Wiskott, L. (2005). Slow feature analysis yields a rich repertoire of
complex cell properties. Journal of Vision, 5(6), 579–602.
73. Bertsekas, D. P. and Тsitsiklis, J. (1996). Neuro-Dynamic Programming.
Athena
Scientific.
74. Besag, J. (1975). Statistical analysis of non-lattice data. Тhe Statistician, 24(3), 179–
195.
75. Bishop, C. M. (1994). Mixture density networks.
76. Bishop, C. M. (1995a). Regularization and complexity control in feed-forward net-
works. In Proceedings International Conference on Artificial Neural Networks
ICANN’95, volume 1, page 141–148.
77. Bishop, C. M. (1995b). Тraining with noise is equivalent to Тikhonov regularization.
Neural Computation, 7(1), 108–116.
78. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
79. Blum, A. L. and Rivest, R. L. (1992). Тraining a 3-node neural network is NP-complete.
80. Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. (1989). Learnability
and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36(4), 929–865.
81. Bonnet, G. (1964). Тransformations des signaux alе
atoires
à
travers les syst
è
mes non
linе
aires sans mе
moire. Annales des Те
lе
communications, 19(9–10), 203–220.