50 Years of Deep Learning and Beyond: an Interview with Jürgen Schmidhuber

INNS Big Data Conference website

Over the last few decades, Jürgen Schmidhuber has been one of the leading protagonists in the advancement of machine learning and neural networks. Alone or with his research groups at TU Munich and the Swiss AI Lab IDSIA in Lugano, he pioneered practical recurrent neural networks, universal learning algorithms, the formal theory of creativity, and many other topics. Recently he published an extensive overview of the history and theory of deep neural networks (Schmidhuber, 2015).

Prof. Schmidhuber will give a plenary talk and a tutorial and co-chair a workshop on deep learning at our conference. We interviewed him on the past and future of machine learning, on the never-ending quest for intelligence, and on the opportunities of the current big data era.

INNS BigData: Today, machine learning and neural networks are generating a tremendous interest. You have spoken about a “second Neural Network Renaissance”. What are its characteristics?

Jürgen Schmidhuber: It is a bit like the last neural network (NN) resurgence in the 1980s and early 1990s, but with million-times-faster computers. NNs of the 1950s still used learning methods similar to linear regression from the early 1800s. The first training methods for deep nonlinear NNs appeared in the 1960s (Ivakhnenko and others). Two decades later (1st NN renaissance) computers were 10,000 times faster per dollar than those of the 1960s. Another two decades later (2nd NN renaissance), they were again 10,000 times faster. Apparently, we will soon have the raw computational power of a human brain in a desktop machine. That is more than enough to solve many essential pattern recognition problems through (today’s extensions of) the previous millennium’s NN learning algorithms. The 2nd reNNaissance may be the final one. Since physics dictates that efficient future 3D hardware will be brain-like with many processors connected through many short and few long wires, I do not see how non-NN-like systems could cause another NN winter.
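As a back-of-the-envelope check (my arithmetic, not a figure from the interview beyond the 10,000x-per-two-decades claim): a 10,000-fold speedup per dollar every twenty years corresponds to a doubling time of roughly eighteen months, consistent with the usual reading of Moore's law.

```python
import math

# How many doublings does a 10,000x speedup contain, and how often
# must performance double to fit them into two decades?
factor, years = 10_000, 20
doublings = math.log2(factor)        # number of doublings in a 10,000x gain
doubling_time = years / doublings    # years per doubling

print(round(doublings, 1), round(doubling_time, 2))  # → 13.3 1.51
```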

INNS BigData: From your emphasis on “second renaissance”, and from your latest review on deep learning, one gets a sense of the importance of machine learning history today. Do you have the impression that sometimes our scientific community pushes in the opposite direction? Is this damaging the field?

The 2nd neural networks reNNaissance may be the final one

Jürgen Schmidhuber: Machine learning is about using past observations to optimize future performance. It is about credit assignment. The machine learning community itself is a learning system, too, and should assign credit to the inventors of relevant methods. Recent “tabloid science” articles about “Deep Learning” didn’t do this, sometimes hyping minuscule improvements over previous work as revolutionary breakthroughs (this actually partially motivated my recent review with 888 references, most from the previous century). Science is a self-correcting business though, and there are encouraging signs that our field is rapidly improving its credit assignment processes, like in more mature fields such as mathematics.

INNS BigData: Let us go back to the present. At the last NIPS conference, Dr. Sutskever stated that “all supervised vector-to-vector problems are now solved thanks to deep feed-forward neural networks”, and “all supervised sequence-to-sequence problems are now solved thanks to deep LSTM networks”. As a pioneer in both methods, do you agree with this view?

Jürgen Schmidhuber: I think it is very generous of Ilya to say that. Take it with a ton of salt though: there are many pattern recognition problems and sequence-to-sequence problems that are NP-hard, without fast solution through any known computational method. Try to quickly distinguish prime numbers from others, or compress data sequences down to their shortest description. Local-minima-ridden gradient-based NNs are not yet perceived as a threat here. I think what Ilya meant is: lots of important real world pattern recognition problems (visual object detection, speech processing, basic text translation, event detection in videos, etc.) that used to be easy for humans but hard for computers are now becoming easy for computers too.

Lots of important real world pattern recognition problems are becoming easy for computers too

INNS BigData: In this sense, what do you expect the next breakthroughs in the field to be? What are the next things to become easy for the computers?

Jürgen Schmidhuber: We will go beyond mere pattern recognition towards the grand goal of AI, which is more or less: efficient reinforcement learning (RL) in complex, realistic, partially observable environments. A few years ago, our team with Jan Koutnik and Faustino Gomez showed for the first time that it is possible to learn complex video game control from scratch through large RL recurrent NNs with raw high-dimensional video input, without any teacher or unsupervised pre-training. I believe it will be possible to greatly scale up such approaches, and build RL robots that really deserve the name.

INNS BigData: Now that machine learning is moving towards implementing a higher level of intelligence, some claim that the new generation of neural network will help us shed light on the inner workings of the brain and on intelligence itself. How do you stand on this aspect? What is the role of neuroscience in the machine learning field?

Jürgen Schmidhuber: Artificial NNs (ANNs) can help to better understand biological NNs (BNNs) in at least two ways. One is to use ANNs as tools for analyzing BNN data. For example, given electron microscopy images of stacks of thin slices of animal brains, an important goal of neuroscientists is to build a detailed 3D model of the brain’s neurons and dendrites. However, human experts need many weeks to annotate the images: Which parts depict neuronal membranes? Which parts are irrelevant background? This needs to be automated (e.g., Turaga et al., 2010). Our team with Dan Ciresan and Alessandro Giusti used ensembles of deep GPU-based max-pooling (MP) convolutional networks (CNNs) to solve this task through experience with many training images, and won the ISBI 2012 brain image segmentation contest.

Jürgen Schmidhuber on deep learning

Another way of using ANNs to better understand BNNs is the following. The feature detectors learned by single-layer visual ANNs are similar to those found in early visual processing stages of BNNs. Likewise, the feature detectors learned in deep layers of visual ANNs should be highly predictive of what neuroscientists will find in deep layers of BNNs. While the visual cortex of BNNs may use quite different learning algorithms, its objective function to be minimized may be rather similar to the one of visual ANNs. In fact, results obtained with relatively deep artificial NNs (Lee et al., 2008, Yamins et al., 2013) seem compatible with insights about the visual pathway in the primate cerebral cortex, which has been studied for many decades.

INNS BigData: What about the converse? Can machine learning benefit from advances in neuroscience?

Jürgen Schmidhuber: While many early ANNs were inspired by BNNs, I think future progress in ANNs will not be driven much by new insights into BNNs. Since our mathematically optimal universal AIs and problem solvers (developed at the Swiss AI Lab IDSIA in the early 2000s) consist of just a few formulas, I believe that it will be much easier to synthesize intelligence from first principles, rather than analyzing how the brain does it. The brain exhibits many details which hide rather than elucidate the nature of its intelligence.

I believe that it will be much easier to synthesize intelligence from first principles, rather than analyzing how the brain does it

Since the 1990s I have tried to convince my students of this as follows (see also the 2011 preface of our upcoming RNN book): some neuroscientists focus on wetware details such as individual neurons and synapses, akin to electrical engineers focusing on hardware details such as characteristic curves of transistors, although the transistor’s main raison d’être is its value as a simple binary switch.

Others study large-scale phenomena such as brain region activity during thought, akin to physicists monitoring the time-varying heat distribution of a microprocessor, without realizing the simple nature of a quicksort program running on it. The appropriate language to discuss intelligence is not that of neurophysiology, electrical engineering, or physics, but the abstract language of mathematics and algorithms, in particular, machine learning and program search.

INNS BigData: Turning to other topics, one of the driving forces behind the current hype is big data. What is your definition of big data? What are the opportunities and challenges of the current big data deluge for the machine learning field?

Jürgen Schmidhuber: At any given moment, “big data” is more data than most people can conveniently store. As for the second question, already existing NN methods will be successfully applied to a plethora of big data problems. There will be lots of little challenges, for example in terms of scaling, but perhaps no fundamental ones.

INNS BigData: In a recent interview, Prof. Jordan stated that with an unrestricted $1 billion grant, he would work on natural language processing. What would you do with a $1 billion grant?

Jürgen Schmidhuber: Spend a few million on building the prototype of an RL RNN-based, rather general problem solver (which I think is becoming possible now). See my previous answers. Use the rest to pay off my debts.

INNS BigData: Over the last decades, you have introduced the “Formal theory of creativity, fun, and intrinsic motivation”. Creativity is also one of the “founding topics” of artificial intelligence. Do you believe it has been a neglected aspect up to now?

Juergen Schmidhuber’s talk: “When creative machines overtake man.”

Jürgen Schmidhuber: I guess it will attract lots of attention in good time. Most current commercial interest is in plain pattern recognition, while this theory is about the next step, namely, making patterns (related to one of my previous answers). Which experiments should a robot’s RL controller, C, conduct to generate data that quickly improves its adaptive, predictive world model, M, which in turn can help to plan ahead?

The theory says: use the learning progress of M (typically compression progress and speed-ups) as the intrinsic reward or fun for C. This motivates C to create action sequences (experiments) such that M can quickly discover new, previously unknown regularities. For example, a video of 20 falling apples can be greatly compressed after the discovery of the law of gravity, through predictive coding. This discovery is fun, and motivates the history-shaping C. I think this principle will be essential for future artificial scientists and artists.
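As a rough illustration (entirely my own toy sketch, not Schmidhuber's implementation), the principle can be reduced to: the intrinsic reward is the improvement of the world model M's prediction error, so the controller C is rewarded while M is still learning and gets "bored" once the regularity is mastered.

```python
def model_error(M, obs):
    # Squared prediction error of a (toy) constant-predictor world model M.
    return float((obs - M) ** 2)

def curiosity_reward(err_before, err_after):
    # Intrinsic reward for the controller C: the *improvement* of the world
    # model M on recent data -- a crude stand-in for compression progress.
    return err_before - err_after

# Toy world: the hidden regularity is that observations are always 1.0.
# M starts ignorant at 0.0 and learns by a running average. Early updates
# improve M a lot (high "fun"); later ones barely help ("boredom").
obs_stream = [1.0] * 10
M, lr = 0.0, 0.5
rewards = []
for obs in obs_stream:
    err_before = model_error(M, obs)
    M += lr * (obs - M)                  # M learns to predict better
    err_after = model_error(M, obs)
    rewards.append(curiosity_reward(err_before, err_after))

print(rewards[0], rewards[-1])           # fun decays as the regularity is mastered
```

In the full theory, C is a reinforcement-learning policy that actively chooses experiments maximizing this intrinsic reward; the decaying-reward signature above is the core of the mechanism.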

INNS BigData: Let us close with an easy question. What would you say to a student starting a PhD in machine learning today?

Jürgen Schmidhuber: Read our overview websites, and then our papers :-). Seriously, study RNNs, RL in partially observable environments, artificial curiosity, and optimal program search.

Further readings

Home page of Jürgen Schmidhuber
Videos from Jürgen Schmidhuber

About the Author

Simone Scardapane is a PhD student at La Sapienza. He is enthusiastic about machine learning technologies and is helping as a publicity chair for the conference.


References

Lee, H., Ekanadham, C., & Ng, A. Y. (2008). Sparse deep belief net model for visual area V2. In Advances in Neural Information Processing Systems (pp. 873-880).

Schmidhuber, J. (2015). Deep Learning in Neural Networks: An Overview. Neural Networks, 61, 85-117.

Turaga, S. C., Murray, J. F., Jain, V., Roth, F., Helmstaedter, M., Briggman, K., & Seung, H. S. (2010). Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Computation, 22(2), 511-538.

Yamins, D. L., Hong, H., Cadieu, C., & DiCarlo, J. J. (2013). Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream. In Advances in Neural Information Processing Systems (pp. 3093-3101).

9 thoughts on “50 Years of Deep Learning and Beyond: an Interview with Jürgen Schmidhuber”

  1. Bill Howell 16 February 2015 / 3:26

    Does this response lead into the questions of at least a simple form of machine consciousness, and beyond that self definition? For example, if the system goes beyond simply presenting interesting patterns (or conclusions) as part of a specifically assigned or designed task, can it lead into the next step of [prioritizing “hits”, notifying “pertinent” people, recommending actions, helping to form teams for discussion]? In an ill-defined, open system like social media, this might involve having the system generalize its initial “marching orders” and perhaps defining new roles and targets? False positives could be a big problem, but could Deep Learning itself help to reduce “false positive advice” at a more abstract level? Measuring what is there is one thing, publishing a note and seeking reactions and actions is a process where some primitive level of consciousness may be necessary? I’m thinking of John Taylor’s model of concept – sort of going beyond consciousness to a point where a system begins to understand the implications of its actions and the effect on the external environment.


  2. Bill Howell 16 February 2015 / 3:28

    Oops – I was referring to the comment “… Most current commercial interest is in plain pattern recognition, while this theory is about the next step, namely, making patterns (related to one of my previous answers). Which experiments should a robot’s RL controller, C, conduct to generate data that quickly improves its adaptive, predictive world model, M, which in turn can help to plan ahead? …”


  3. Andrew Morris 17 February 2015 / 12:24

    One of the reasons that brain research has not advanced faster is the practical difficulty of analysing the *complete* working of even relatively small areas of the brain. That is partly because no one knows at what level of scale it is necessary to observe the brain in order to understand its many functions, which may or may not rely crucially on chemical, genetic or quantum effects, but also largely because its working depends on large-scale and rapidly changing neural interconnections, which are hard to trace accurately.

    Our understanding of the brain advances as new tools become available for observing it. If recent advances in ML help in the detailed mapping and analysis of brain function, as Jürgen Schmidhuber suggests, then more of the principles by which brains operate are likely to emerge. Given the problem solving power of even insect brains, it is clear that the most important advances in AI are going to result from deeper understanding of the brain. Schmidhuber has too much confidence in the ability of theoretical models and ever faster computers.

    Jürgen will know better than most that the majority of theoretical objectives in RL (reinforcement learning) are mathematically intractable and are only practical to use when gross oversimplifications are made. While RL is a very important line of research, it is no more capable of delivering human level AI than genetic algorithms are likely to evolve AI, even running on the next generation of quantum computers, or whatever, which may run several orders of magnitude faster than present computers.

    On this basis the answer to “Can machine learning benefit from advances in neuroscience?” should therefore be a resounding “yes”. The main role of ML in achieving AI is most likely to be as a tool for the modelling and observation of biological brains. ANNs will also benefit research in BNNs by providing BNN models to be experimented with in place of tortured animals.

    Mathematics and fundamental physics will never be complete, and it is not unlikely that the new physics required to unify relativity with quantum mechanics will depend on advances in physics which both result from, and allow, deeper understanding of the workings of the brain. If there is one problem whose solution could lead towards this unification it is the problem of artificial natural language understanding (NLU). To get over the rut which theoretical physics has now got itself into we need to advance our understanding of meaning itself. From that point of view, as well as the enormous practical utility of any major advances in artificial general intelligence which would come with NLU, Jordan is right that natural language processing should be prioritised at least as highly as theoretical ML when it comes to funding.


  4. Wolfgang Lorenz 19 February 2015 / 6:38

    Implementing subnet timeshifting with ANN savestates doesn’t help understanding BNN loops.


