the Twenty-fifth International Conference on Machine Learning (ICML’08), volume 25, pages 872–879. ACM.
640. Salakhutdinov, R., Mnih, A., and Hinton, G. (2007). Restricted Boltzmann machines
for collaborative filtering. In ICML.
641. Sanger, T. D. (1994). Neural network learning control of robot manipulators using
gradually increasing task difficulty. IEEE Transactions on Robotics and Automation,
10(3).
642. Saul, L. K. and Jordan, M. I. (1996). Exploiting tractable substructures in intractable
networks. In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural
Information Processing Systems 8 (NIPS’95). MIT Press, Cambridge, MA.
643. Saul, L. K., Jaakkola, T., and Jordan, M. I. (1996). Mean field theory for sigmoid belief
networks. Journal of Artificial Intelligence Research, 4, 61–76.
644. Savich, A. W., Moussa, M., and Areibi, S. (2007). The impact of arithmetic representation on implementing MLP-BP on FPGAs: A study. IEEE Transactions on Neural Networks, 18(1), 240–252.
645. Saxe, A. M., Koh, P. W., Chen, Z., Bhand, M., Suresh, B., and Ng, A. (2011). On random weights and unsupervised feature learning. In Proc. ICML’2011. ACM.
646. Saxe, A. M., McClelland, J. L., and Ganguli, S. (2013). Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In ICLR.
647. Schaul, T., Antonoglou, I., and Silver, D. (2014). Unit tests for stochastic optimization. In International Conference on Learning Representations.
648. Schmidhuber, J. (1992). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2), 234–242.
649. Schmidhuber, J. (1996). Sequential neural text compression. IEEE Transactions on
Neural Networks, 7(1), 142–146.
650. Schmidhuber, J. (2012). Self-delimiting neural networks. arXiv preprint arXiv:1210.0118.
651. Schölkopf, B. and Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT Press.
652. Schölkopf, B., Smola, A., and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299–1319.
653. Schölkopf, B., Burges, C. J. C., and Smola, A. J. (1999). Advances in Kernel Methods – Support Vector Learning. MIT Press, Cambridge, MA.
654. Schölkopf, B., Janzing, D., Peters, J., Sgouritsa, E., Zhang, K., and Mooij, J. (2012). On causal and anticausal learning. In ICML’2012, pages 1255–1262.
655. Schuster, M. (1999). On supervised learning from sequential data with applications for speech recognition.
656. Schuster, M. and Paliwal, K. (1997). Bidirectional recurrent neural networks. IEEE
Transactions on Signal Processing, 45(11), 2673–2681.
657. Schwenk, H. (2007). Continuous space language models. Computer Speech and Language, 21, 492–518.
658. Schwenk, H. (2010). Continuous space language models for statistical machine
translation. The Prague Bulletin of Mathematical Linguistics, 93, 137–146.
659. Schwenk, H. (2014). Cleaned subset of WMT ’14 dataset.
660. Schwenk, H. and Bengio, Y. (1998). Training methods for adaptive boosting of neural networks. In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural Information Processing Systems 10 (NIPS’97), pages 647–653. MIT Press.