50 Years of Deep Learning and Beyond: an Interview with Jürgen Schmidhuber

INNS Big Data Conference website:
http://innsbigdata.org/

Over the last decades, Jürgen Schmidhuber has been one of the leading protagonists in the advancement of machine learning and neural networks. Alone or with his research groups at TU Munich and the Swiss AI Lab IDSIA in Lugano, he pioneered practical recurrent neural networks, universal learning algorithms, the formal theory of creativity and many other topics. Recently he published an extensive overview of the history and theory of deep neural networks (Schmidhuber, 2015).

Prof. Schmidhuber will give a plenary talk and a tutorial and co-chair a workshop on deep learning at our conference. We interviewed him on the past and future of machine learning, on the never-ending quest for intelligence, and on the opportunities of the current big data era.

INNS BigData: Today, machine learning and neural networks are generating tremendous interest. You have spoken about a “second Neural Network Renaissance”. What are its characteristics?

Jürgen Schmidhuber: It is a bit like the last neural network (NN) resurgence in the 1980s and early 1990s, but with million-times-faster computers. NNs of the 1950s still used learning methods similar to linear regression from the early 1800s. The first training methods for deep nonlinear NNs appeared in the 1960s (Ivakhnenko and others). Two decades later (1st NN renaissance) computers were 10,000 times faster per dollar than those of the 1960s. Another two decades later (2nd NN renaissance), they were again 10,000 times faster. Apparently, we will soon have the raw computational power of a human brain in a desktop machine. That is more than enough to solve many essential pattern recognition problems through (today’s extensions of) the previous millennium’s NN learning algorithms. The 2nd reNNaissance may be the final one. Since physics dictates that efficient future 3D hardware will be brain-like with many processors connected through many short and few long wires, I do not see how non-NN-like systems could cause another NN winter.
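
To make the arithmetic in this answer concrete, the short sketch below works out what a 10,000-fold gain per dollar every two decades implies. The constant exponential growth rate is an assumption made here for illustration; the interview only quotes the two 20-year factors.

```python
import math

# Figures quoted in the answer above: a 10,000-fold gain per dollar
# in each of two successive 20-year periods.
factor_per_period = 10_000
years_per_period = 20

# Assumption (not from the interview): growth is a constant exponential
# within each period, so the annual factor is 10_000 ** (1 / 20).
annual_factor = factor_per_period ** (1 / years_per_period)

# Implied doubling time in years under the same assumption.
doubling_time_years = years_per_period / math.log2(factor_per_period)

# Compound gain across both periods (1960s to the 2nd renaissance).
total_gain = factor_per_period ** 2

print(f"annual growth factor: {annual_factor:.3f}")
print(f"doubling time: {doubling_time_years:.2f} years")
print(f"gain over two periods: {total_gain:,}x")
```

Under that assumption, price-performance grows by roughly 58% per year, doubles about every year and a half, and compounds to a factor of 100 million over the four decades.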

INNS BigData: From your emphasis on “second renaissance”, and from your latest review on deep learning, one gets a sense of the importance of machine learning history today. Do you have the impression that sometimes our scientific society pushes in the opposite direction? Is it damaging the community?


Jürgen Schmidhuber: Machine learning is about using past observations to optimize future performance. It is about credit assignment. The machine learning community itself is a learning system, too, and should assign credit to the inventors of relevant methods. Recent “tabloid science” articles about “Deep Learning” didn’t do this, sometimes hyping minuscule improvements over previous work as revolutionary breakthroughs (this actually partially motivated my recent review with 888 references, most from the previous century). Science is a self-correcting business though, and there are encouraging signs that our field is rapidly improving its credit assignment processes, like in more mature fields such as mathematics.

INNS BigData: Let us go back to the present. At the last NIPS conference, Dr. Sutskever stated that “all supervised vector-to-vector problems are now solved thanks to deep feed-forward neural networks”, and “all supervised sequence-to-sequence problems are now solved thanks to deep LSTM networks”. As a pioneer in both methods, do you agree with this view?

Jürgen Schmidhuber: I think it is very generous of Ilya to say that. Take it with a ton of salt though: there are many pattern recognition problems and sequence-to-sequence problems that are NP-hard, without fast solution through any known computational method. Try to quickly distinguish prime numbers from others, or compress data sequences down to their shortest description. Local-minima-ridden gradient-based NNs are not yet perceived as a threat here. I think what Ilya meant is: lots of important real world pattern recognition problems (visual object detection, speech processing, basic text translation, event detection in videos, etc.) that used to be easy for humans but hard for computers are now becoming easy for computers too.


INNS BigData: In this sense, what do you expect the next breakthroughs in the field to be? What are the next things to become easy for the computers?

Jürgen Schmidhuber: We will go beyond mere pattern recognition towards the grand goal of AI, which is more or less: efficient reinforcement learning (RL) in complex, realistic, partially observable environments. A few years ago, our team with Jan Koutnik and Faustino Gomez showed for the first time that it is possible to learn complex video game control from scratch through large RL recurrent NNs with raw high-dimensional video input, without any teacher or unsupervised pre-training. I believe it will be possible to greatly scale up such approaches, and build RL robots that really deserve the name.

INNS BigData: Now that machine learning is moving towards implementing a higher level of intelligence, some claim that the new generation of neural networks will help us shed light on the inner workings of the brain and on intelligence itself. Where do you stand on this? What is the role of neuroscience in the machine learning field?

Jürgen Schmidhuber: Artificial NNs (ANNs) can help to better understand biological NNs (BNNs) in at least two ways. One is to use ANNs as tools for analyzing BNN data. For example, given electron microscopy images of stacks of thin slices of animal brains, an important goal of neuroscientists is to build a detailed 3D model of the brain’s neurons and dendrites. However, human experts need many weeks to annotate the images: Which parts depict neuronal membranes? Which parts are irrelevant background? This needs to be automated (e.g., Turaga et al., 2010). Our team with Dan Ciresan and Alessandro Giusti used ensembles of deep GPU-based max-pooling (MP) convolutional networks (CNNs) to solve this task through experience with many training images, and won the ISBI 2012 brain image segmentation contest.
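
For readers unfamiliar with the max-pooling (MP) layers mentioned in this answer, the following is a minimal pure-Python sketch of the operation on a single 2D feature map. It is an illustration only, not the team's GPU implementation; the function name `max_pool_2x2` and the example values are hypothetical.

```python
def max_pool_2x2(image):
    """Downsample a 2D list by keeping the max of each non-overlapping 2x2 block.

    A trailing odd row or column is dropped (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled.append([
            max(image[r][c], image[r][c + 1],
                image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ])
    return pooled


# Hypothetical 4x4 feature map, e.g. the output of one convolutional filter.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 7]]
```

Each output value summarizes a small neighborhood by its strongest response, which halves the resolution in both directions and makes the following layers less sensitive to small shifts in the input.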

Jürgen Schmidhuber on deep learning

Another way of using ANNs to better understand BNNs is the following. The feature detectors learned by single-layer visual ANNs are similar to those found in early visual processing stages of BNNs. Likewise, the feature detectors learned in deep layers of visual ANNs should be highly predictive of what neuroscientists will find in deep layers of BNNs. While the visual cortex of BNNs may use quite different learning algorithms, its objective function to be minimized may be rather similar to the one of visual ANNs. In fact, results obtained with relatively deep artificial NNs (Lee et al., 2008; Yamins et al., 2013) seem compatible with insights about the visual pathway in the primate cerebral cortex, which has been studied for many decades.

INNS BigData: What about the converse? Can machine learning benefit from advances in neuroscience?

Jürgen Schmidhuber: While many early ANNs were inspired by BNNs, I think future progress in ANNs will not be driven much by new insights into BNNs. Since our mathematically optimal universal AIs and problem solvers (developed at the Swiss AI Lab IDSIA in the early 2000s) consist of just a few formulas, I believe that it will be much easier to synthesize intelligence from first principles, rather than analyzing how the brain does it. The brain exhibits many details which hide rather than elucidate the nature of its intelligence.


Since the 1990s I have tried to convince my students of this as follows (see also the 2011 preface of our upcoming RNN book): some neuroscientists focus on wetware details such as individual neurons and synapses, akin to electrical engineers focusing on hardware details such as characteristic curves of transistors, although the transistor’s main raison d’être is its value as a simple binary switch.

Others study large-scale phenomena such as brain region activity during thought, akin to physicists monitoring the time-varying heat distribution of a microprocessor, without realizing the simple nature of a quicksort program running on it. The appropriate language to discuss intelligence is not the one of neurophysiology, electrical engineering, or physics, but the abstract language of mathematics and algorithms, in particular, machine learning and program search.

INNS BigData: Turning to other topics, one of the driving forces behind the current hype is big data. What is your definition of big data? What are the opportunities and challenges of the current big data deluge for the machine learning field?

Jürgen Schmidhuber: At any given moment, “big data” is more data than most people can conveniently store. As for the second question, already existing NN methods will be successfully applied to a plethora of big data problems. There will be lots of little challenges in terms of scaling for example, but perhaps no fundamental ones.

INNS BigData: In a recent interview, Prof. Jordan stated that with an unrestricted $1 billion grant, he would work on natural language processing. What would you do with a $1 billion grant?

Jürgen Schmidhuber: Spend a few million on building the prototype of an RL RNN-based, rather general problem solver (which I think is becoming possible now). See my previous answers. Use the rest to pay off my debts.

INNS BigData: Over the last decades, you have introduced the “Formal theory of creativity, fun, and intrinsic motivation”. Creativity is also one of the “founding topics” of artificial intelligence. Do you believe it has been a neglected aspect up to now?

Juergen Schmidhuber's talk: "When creative machines overtake man."

Jürgen Schmidhuber: I guess it will attract lots of attention in good time. Most current commercial interest is in plain pattern recognition, while this theory is about the next step, namely, making patterns (related to one of my previous answers). Which experiments should a robot’s RL controller, C, conduct to generate data that quickly improves its adaptive, predictive world model, M, which in turn can help to plan ahead?

The theory says: use the learning progress of M (typically compression progress and speed-ups) as the intrinsic reward or fun for C. This motivates C to create action sequences (experiments) such that M can quickly discover new, previously unknown regularities. For example, a video of 20 falling apples can be greatly compressed after the discovery of the law of gravity, through predictive coding. This discovery is fun, and motivates the history-shaping C. I think this principle will be essential for future artificial scientists and artists.
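
The principle described in this answer can be sketched in a few lines. The toy below is an illustration under strong simplifying assumptions, not the formal theory itself: the world model M is one learned scalar per action, "learning progress" is measured as the drop in squared prediction error on the latest observation rather than as true compression progress, and the controller C is a greedy rule. All names (`WorldModel`, `intrinsic_reward`, `ENVIRONMENT`, `run_curious_controller`) and the two-action environment are hypothetical.

```python
class WorldModel:
    """Toy predictive model M: one learned scalar prediction per action."""

    def __init__(self, actions, learning_rate=0.5):
        self.prediction = {action: 0.0 for action in actions}
        self.learning_rate = learning_rate

    def error(self, action, observation):
        return (observation - self.prediction[action]) ** 2

    def update(self, action, observation):
        delta = observation - self.prediction[action]
        self.prediction[action] += self.learning_rate * delta


def intrinsic_reward(model, action, observation):
    """Learning progress of M: prediction error before minus after one update."""
    before = model.error(action, observation)
    model.update(action, observation)
    after = model.error(action, observation)
    return before - after


# Hypothetical deterministic environment: each action yields one observation.
# "drop_apple" hides a regularity M has not learned yet; "stare_at_wall"
# yields what M already predicts, so there is nothing left to learn.
ENVIRONMENT = {"drop_apple": 9.8, "stare_at_wall": 0.0}


def run_curious_controller(steps=6):
    """Greedy controller C: repeat the action whose last try gave the most fun."""
    model = WorldModel(ENVIRONMENT)
    # Optimistic initial estimates so that every action is tried once.
    expected_fun = {action: float("inf") for action in ENVIRONMENT}
    history = []
    for _ in range(steps):
        action = max(expected_fun, key=expected_fun.get)
        reward = intrinsic_reward(model, action, ENVIRONMENT[action])
        expected_fun[action] = reward
        history.append((action, reward))
    return history


history = run_curious_controller()
for action, reward in history:
    print(f"{action:14s} fun = {reward:.4f}")
```

In this sketch C tries both experiments once, finds that the wall yields no learning progress, and then keeps dropping the apple. The reward shrinks with every repetition as M masters the regularity, so an already understood experiment becomes boring as well.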

INNS BigData: Let us close with an easy question. What would you say to a student starting a PhD in machine learning today?

Jürgen Schmidhuber: Read our overview web sites, and then our papers :-). Seriously, study RNNs, and RL in partially observable environments, and artificial curiosity, and optimal program search.

Further reading

Home page of Jürgen Schmidhuber
Videos from Jürgen Schmidhuber

About the Author

Simone Scardapane is a PhD student at La Sapienza. He is enthusiastic about machine learning technologies and is serving as a publicity chair for the conference.

References

Lee, H., Ekanadham, C., & Ng, A. Y. (2008). Sparse deep belief net model for visual area V2. In Advances in Neural Information Processing Systems (pp. 873-880).

Schmidhuber, J. (2015). Deep Learning in Neural Networks: An Overview. Neural Networks, 61, 85-117.

Turaga, S. C., Murray, J. F., Jain, V., Roth, F., Helmstaedter, M., Briggman, K., & Seung, H. S. (2010). Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Computation, 22(2), 511-538.

Yamins, D. L., Hong, H., Cadieu, C., & DiCarlo, J. J. (2013). Hierarchical modular optimization of convolutional networks achieves representations similar to macaque IT and human ventral stream. In Advances in Neural Information Processing Systems (pp. 3093-3101).