Towards an Era of Cognitive Computing: 10 Questions to Brenda Dietrich

INNS Big Data Conference website:
http://innsbigdata.org/

Brenda Dietrich is an IBM Fellow and Vice President. She joined IBM in 1984 as a researcher at the Thomas J. Watson Research Center and currently leads the emerging technologies team for IBM Watson, extending and applying IBM’s cognitive computing technology. She has served as president of INFORMS, has served on the Board of Trustees of SIAM, and is a member of several university advisory boards.

In view of Dr. Dietrich’s plenary talk at our conference, we interviewed her on the Watson platform, her long history with IBM, and her vision about the future of cognitive technology.

INNS BigData: I would like to start from the Watson technology, which is undoubtedly one of IBM’s most advanced projects. Can you describe it? What makes it unique?

Watson represents a new era of computing in which apps and systems interact with human users naturally

Brenda Dietrich: Watson is the first commercially available cognitive computing capability. The system is delivered through the cloud; it analyzes high volumes of diverse data, understands complex questions posed in natural language, and proposes evidence-based answers. IBM Watson represents a new era of computing in which apps and systems interact with human users naturally, augment our knowledge with Big Data insights and learn to improve how they assist us – the systems understand the world the way that humans do: through senses, learning and experience.

Fueled by innovation, with a mission to transform industries and professions, Watson is uniquely positioned at the forefront of the new era of computing, one that promises to transform the world.

INNS BigData: The public came to know Watson after its victory on the quiz show Jeopardy!. Since then, it has been applied to an enormous range of projects, from cooking to the development of cognitive apps. What is IBM’s vision for Watson?

Brenda Dietrich: Hundreds of clients and partners across six continents, 25 countries and more than 20 industries have active projects underway with Watson. The range of professions and industries includes doctors, pharmaceutical companies, banks and financial institutions, retailers, lawyers, students and academic institutions, and the customers they serve.

A screenshot of the Jeopardy! episode with Watson [source]
In terms of our vision, we seek to continue transforming industries and professions by making substantial progress in the development and commercialization of cloud-delivered Watson solutions worldwide. With the Watson Headquarters at 51 Astor Place, we are seeking to help shape New York City’s Silicon Alley by incorporating the power of cognitive computing into one of the most vibrant, exciting areas for technology innovation in the world today. This includes fostering an ecosystem of startups and businesses by placing the power of cognitive computing into the hands of developers and tech enthusiasts via the Watson Developer Cloud to fuel the creation of next-generation apps powered by Watson.

But Watson’s boundaries continue to expand far beyond NYC. We recently announced a major collaboration with telco and business giant SoftBank in Japan where we will help bring a new class of apps to the region, which will be enabled by Watson’s ability to speak Japanese.  We envision this expansion continuing as Watson learns to understand the nuances of language and culture, and this also includes expanding our physical presence worldwide.

INNS BigData: Future applications of Watson clearly include big data visualization, analytics, and data-driven inference. What are the challenging areas in big data research from Watson’s point of view? Do you think they will have long-lasting consequences?

Brenda Dietrich: Some of the challenging areas we see moving forward are also opportunities for Watson, which means continued research and development by our engineers in order to bring long-lasting positive benefits. These include:

  • Evidence creation – for example, computing a shortest path to answer a question about how long it takes to get from one location to another.
  • Evidence evaluation – simulating multiple scenarios to assert, with confidence, that a computed answer holds (for example, simulating traffic scenarios to conclude that it takes less than an hour to get from Grand Central Terminal in New York to LaGuardia Airport); a toy sketch of this pairing follows the list.
  • Natural language interface to Operations Research or Analytic tools – to incorporate context into answers, for example to the question “when should I leave for the airport?”
  • Model creation – supporting the role of the expert modeler in extracting decision variables, objectives, constraints and other relationships from text, and domain-specific modeling languages.
  • Locating and navigating evidence – finding statistical support for hypotheses, much as text search is used to find “factoid” or “passage” support for an answer to a question.
  • Support tools – capturing, recording, mining and incorporating the modifications the user makes to the output of the tool in future versions of the tool.
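
The first two items pair a deterministic computation with a stochastic evaluation of it. As a rough, purely illustrative sketch of that pattern (the toy road network, travel times, and traffic model below are invented, and no Watson service is involved):

```python
# Toy sketch (no Watson API involved): pair a deterministic shortest-path
# answer with a Monte Carlo evaluation of how confident we can be in it.
# The graph, travel times, and traffic model are invented for illustration.
import heapq
import random

# Nominal travel times in minutes between hypothetical waypoints.
EDGES = {
    ("Grand Central", "Midtown Tunnel"): 12,
    ("Midtown Tunnel", "LaGuardia"): 25,
    ("Grand Central", "Queensboro Bridge"): 15,
    ("Queensboro Bridge", "LaGuardia"): 20,
}

def shortest_time(start, goal, edge_time):
    """Dijkstra over the toy graph; edge_time maps each edge to minutes."""
    queue, seen = [(0.0, start)], set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if node in seen:
            continue
        seen.add(node)
        for (a, b) in EDGES:
            if a == node and b not in seen:
                heapq.heappush(queue, (cost + edge_time[(a, b)], b))
    return float("inf")

# Evidence creation: the nominal answer.
nominal = shortest_time("Grand Central", "LaGuardia", EDGES)

# Evidence evaluation: simulate traffic by inflating each edge at random and
# measure how often the trip still takes under an hour.
random.seed(0)
trials = 10_000
under_an_hour = sum(
    shortest_time(
        "Grand Central", "LaGuardia",
        {e: t * random.uniform(1.0, 2.0) for e, t in EDGES.items()},
    ) < 60
    for _ in range(trials)
)
print(f"nominal time: {nominal} min")
print(f"P(trip < 60 min) ≈ {under_an_hour / trials:.2f}")
```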

INNS BigData: Watson is working primarily with text data. Is there a plan to move it to other unstructured domains, e.g. images or videos?

Brenda Dietrich while working at the IBM Thomas J. Watson Center [source]

Brenda Dietrich: An initial visual recognition service that scores images against pre-defined categories is in beta test now and available in the cognitive zone on IBM Bluemix. A visual recognition training service, in which positive and negative examples of a category are provided to train Watson to recognize the category and score new images against it, is also available. In addition, the acquisition of AlchemyAPI brings additional image recognition capability to Watson.
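
For readers curious what consuming such a cloud-delivered scoring service typically looks like, here is a rough sketch of an HTTP client for a hypothetical image-classification endpoint. The URL, credential handling, request fields, and response schema are placeholders, not the actual Watson Developer Cloud or AlchemyAPI interface:

```python
# Illustrative only: the endpoint URL, credentials, and response schema below
# are placeholders, not the actual Watson Developer Cloud API.
import requests

SERVICE_URL = "https://example.com/visual-recognition/api/v1/classify"  # placeholder
API_KEY = "YOUR_API_KEY"  # placeholder credential

def classify_image(path):
    """Send an image to a hypothetical scoring endpoint and return its labels."""
    with open(path, "rb") as image_file:
        response = requests.post(
            SERVICE_URL,
            params={"api_key": API_KEY},
            files={"images_file": image_file},
            timeout=30,
        )
    response.raise_for_status()
    # Assume the service returns a list of {"label": ..., "score": ...} dicts.
    return response.json()

if __name__ == "__main__":
    for result in classify_image("test.jpg"):
        print(result["label"], result["score"])
```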

INNS BigData: You come from an operations research (OR) background, which has been one of the focuses of your research activities. What is the role of OR in the big data field?

Brenda Dietrich: I see significant potential for the use of cognitive computing and the methods of OR to be combined to create more robust, continually learning decision support applications. Much additional research is required in this area, research that will likely advance both fields. This is a topic that I will discuss in some detail during my talk at the conference.

INNS BigData: More generally, you have now been with IBM for more than 30 years, starting as a researcher at the IBM Thomas J. Watson Research Center. Of all the things that you have done, which ones are you most fond of?

Brenda Dietrich: Virtually all the work I have done at IBM has had a common theme: supporting the decision-making process through access to data, computation, and mathematical rigor. It’s hard to pick a favorite project, but two stand out. One was a fairly simple tool, first deployed in the late 1980s, that helped IBM effectively allocate scarce memory modules to PCs. It turned a contentious multi-person, multi-day process into a fast evaluation and reporting process that could be executed by a single person whenever needed. The same methods, extended and reimplemented to take advantage of better hardware and software packages, are still in use at IBM today, buried deep inside a wide variety of resource allocation processes. The second was a piece of work that demonstrated how planning functions, typically run on a pre-defined schedule, can be used “continually” to adjust and update plans whenever the data used to generate them changes. This approach, which we called “continual optimization”, was used in a number of applications, ranging from workforce scheduling to vehicle routing. An interesting aspect of the work was the need to include multiple “solvers”, each with different performance characteristics, to simultaneously address the need to maintain feasibility (requiring a very fast update) and the desire to achieve optimality (requiring more significant computational time).
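
The “continual optimization” pattern she describes can be caricatured as a loop that reacts to every data change with a fast feasibility-restoring heuristic, while a slower solver improves the plan when time allows. The allocation problem and both solvers in the sketch below are invented for illustration; they are not IBM’s tools:

```python
# Rough sketch of "continual optimization": whenever the input data changes,
# a fast repair step restores feasibility immediately, while a slower solver
# searches for a better plan. The allocation problem and both solvers here
# are invented for illustration.
import itertools
import random

def fast_repair(allocation, demands, supply):
    """Greedy repair: trim allocations until total use fits the new supply."""
    alloc = dict(allocation)
    excess = sum(alloc.values()) - supply
    for order in sorted(alloc, key=lambda o: demands[o]["priority"]):
        if excess <= 0:
            break
        cut = min(alloc[order], excess)
        alloc[order] -= cut
        excess -= cut
    return alloc

def slow_optimize(demands, supply):
    """Slower solver: exhaustively tries allocation levels (fine for toy sizes)."""
    orders = list(demands)
    best, best_value = None, -1.0
    levels = [range(demands[o]["quantity"] + 1) for o in orders]
    for combo in itertools.product(*levels):
        if sum(combo) > supply:
            continue
        value = sum(q * demands[o]["priority"] for o, q in zip(orders, combo))
        if value > best_value:
            best, best_value = dict(zip(orders, combo)), value
    return best

demands = {
    "order_a": {"quantity": 4, "priority": 3.0},
    "order_b": {"quantity": 3, "priority": 1.0},
    "order_c": {"quantity": 5, "priority": 2.0},
}
allocation = {o: d["quantity"] for o, d in demands.items()}

random.seed(1)
for step in range(3):                                  # pretend supply changes over time
    supply = random.randint(5, 10)
    quick = fast_repair(allocation, demands, supply)   # feasible immediately
    allocation = slow_optimize(demands, supply)        # better plan, more compute
    print(f"supply={supply}  quick repair={quick}  optimized={allocation}")
```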

INNS BigData: Paul Horn said of you that “[Brenda] has a rare ability to travel between two very different worlds”, referring to your double life as a mathematician and a business consultant. Is it hard to live in two worlds? Do you think mathematicians should “go out” more often?

For me, the math is a means to an end, rather than the end itself

Brenda Dietrich: The field of mathematics (and of computer science, operations research, etc.) is vast. While I love the beauty of the theoretical work in my field, I get the most personal satisfaction from creating things that others can use. To find out what is useful, one has to understand how things (e.g., decision processes) are currently done and where there are opportunities for improvement. To understand what is possible, one has to understand both the state of the technology and the organizational and behavioral issues present in the decision process. I’ve always used my experience in the “real world” with clients and partners to inform and inspire my research and the work of whatever research teams I am able to influence.

And I’m very curious – I want to know why things are done the way they are done, and whether there is any better way to do them.  In particular, I’ve always been interested in the effective use of resources, and in eliminating waste whenever possible.  For me, the math is a means to an end, rather than the end itself.

INNS BigData: All these years of building and advancing technology put you in a very particular position. How do you imagine the next twenty years? Is it going to be a very different world?

Brenda Dietrich: Talking about the next twenty years from a technology standpoint is essentially talking about a lifetime. Looking back on the past twenty years, some of the most pivotal tech innovations have happened in the past 5-10 years, and that window of time will continue to shrink as these innovations continue at a rapid pace.

IBM Watson’s Brenda Dietrich at MediFuture 2024

The world will be different in the sense that what we’re seeing now – human and computer collaboration – will only become more seamless, and its benefits more broadly understood. What Watson is achieving today through Chef Watson and our work in the healthcare space, as two examples, showcases the power of combining cognitive systems that transform decision-making with the unique human experience, and as we look into the future the options are endless.

INNS BigData: Let us go back to the present: another futuristic project of IBM is the TrueNorth neuromorphic chip. Recently, Yann LeCun stated that “IBM built the wrong thing”, and more specifically that “it could have been useful if it had not tried to stick too close to biology”. How would you reply to this? What is your idea of the interplay between neuroscience and the quest for intelligence?

Brenda Dietrich: I think neuroscience is one of many areas that should be explored in the quest for building more intelligent and useful machines. I am hopeful that as scientists explore neuromorphic computing, they will learn more about both computation and neuroscience.

INNS BigData: In our previous interview, we asked Jürgen Schmidhuber what he would do with an unrestricted $1 billion grant, which is more or less what IBM invested in the Watson division. Let us close by “upping the ante”, as poker players say: what would you do with a $10 billion grant? What is the next technological “grand challenge”?

Brenda Dietrich: I think the ideal usage of the $10b grant would really be to bring our vision to life more quickly – expand into other markets, round out the capabilities, learn more languages and place the power of cognitive computing in the hands of people around the globe sooner rather than later. It’s going to happen, but this would allow it to happen faster, and would give more people across industries and professions the ability to utilize cognitive computing to solve whatever the next ‘technological grand challenge’ may be.

Further Readings

Watson homepage
Watson Developer Cloud
Brenda Dietrich page on IBM and INFORMS

About the Author

Simone Scardapane is a PhD student at Sapienza University of Rome. He is enthusiastic about machine learning technologies and is serving as a publicity chair for the conference.

50 Years of Deep Learning and Beyond: an Interview with Jürgen Schmidhuber

INNS Big Data Conference website:
http://innsbigdata.org/

Over the last decades, Jürgen Schmidhuber has been one of the leading protagonists in the advancement of machine learning and neural networks. Alone or with his research groups at TU Munich and the Swiss AI Lab IDSIA in Lugano, he pioneered practical recurrent neural networks, universal learning algorithms, the formal theory of creativity, and many other topics. Recently he published an extensive overview of the history and theory of deep neural networks (Schmidhuber, 2015).

Prof. Schmidhuber will give a plenary talk and a tutorial and co-chair a workshop on deep learning at our conference. We interviewed him on the past and future of machine learning, on the never-ending quest for intelligence, and on the opportunities of the current big data era.

INNS BigData: Today, machine learning and neural networks are generating a tremendous interest. You have spoken about a “second Neural Network Renaissance”. What are its characteristics?

Jürgen Schmidhuber: It is a bit like the last neural network (NN) resurgence in the 1980s and early 1990s, but with million-times-faster computers. NNs of the 1950s still used learning methods similar to linear regression from the early 1800s. The first training methods for deep nonlinear NNs appeared in the 1960s (Ivakhnenko and others). Two decades later (1st NN renaissance) computers were 10,000 times faster per dollar than those of the 1960s. Another two decades later (2nd NN renaissance), they were again 10,000 times faster. Apparently, we will soon have the raw computational power of a human brain in a desktop machine. That is more than enough to solve many essential pattern recognition problems through (today’s extensions of) the previous millennium’s NN learning algorithms. The 2nd reNNaissance may be the final one. Since physics dictates that efficient future 3D hardware will be brain-like with many processors connected through many short and few long wires, I do not see how non-NN-like systems could cause another NN winter.
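
As a back-of-the-envelope check on that arithmetic: a factor of 10,000 per two decades implies roughly 58% growth per year, i.e. a doubling time of about 1.5 years. The brain and desktop figures in the snippet below are rough order-of-magnitude assumptions on my part, not numbers from the interview:

```python
# Back-of-the-envelope arithmetic for the "10,000x per two decades" claim.
# The brain and desktop figures are rough order-of-magnitude assumptions,
# not numbers taken from the interview.
import math

factor_per_two_decades = 10_000
annual_growth = factor_per_two_decades ** (1 / 20)           # ≈ 1.58x per year
doubling_time_years = math.log(2) / math.log(annual_growth)  # ≈ 1.5 years

desktop_ops_per_sec = 1e13   # assumption: ~10 TFLOP/s desktop (circa 2015)
brain_ops_per_sec = 1e16     # assumption: a common rough estimate for a brain
years_to_parity = math.log(brain_ops_per_sec / desktop_ops_per_sec, annual_growth)

print(f"annual growth factor ≈ {annual_growth:.2f}")
print(f"doubling time ≈ {doubling_time_years:.1f} years")
print(f"years to brain-scale desktop compute ≈ {years_to_parity:.0f}")
```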

INNS BigData: From your emphasis on “second renaissance”, and from your latest review on deep learning, one gets a sense of the importance of machine learning history today. Do you have the impression that sometimes our scientific society pushes in the opposite direction? Is it damaging the community?

The 2nd neural networks reNNaissance may be the final one

Jürgen Schmidhuber: Machine learning is about using past observations to optimize future performance. It is about credit assignment. The machine learning community itself is a learning system, too, and should assign credit to the inventors of relevant methods. Recent “tabloid science” articles about “Deep Learning” didn’t do this, sometimes hyping minuscule improvements over previous work as revolutionary breakthroughs (this actually partially motivated my recent review with 888 references, most from the previous century). Science is a self-correcting business though, and there are encouraging signs that our field is rapidly improving its credit assignment processes, like in more mature fields such as mathematics.

INNS BigData: Let us come back to the present. At the last NIPS conference, Dr. Sutskever stated that “all supervised vector-to-vector problems are now solved thanks to deep feed-forward neural networks” and that “all supervised sequence-to-sequence problems are now solved thanks to deep LSTM networks”. As a pioneer of both methods, do you agree with this view?

Jürgen Schmidhuber: I think it is very generous of Ilya to say that. Take it with a ton of salt though: there are many pattern recognition problems and sequence-to-sequence problems that are NP-hard, without fast solution through any known computational method. Try to quickly distinguish prime numbers from others, or compress data sequences down to their shortest description. Local-minima-ridden gradient-based NNs are not yet perceived as a threat here. I think what Ilya meant is: lots of important real world pattern recognition problems (visual object detection, speech processing, basic text translation, event detection in videos, etc.) that used to be easy for humans but hard for computers are now becoming easy for computers too.

Lots of important real world pattern recognition problems are becoming easy for computers too

INNS BigData: In this sense, what do you expect the next breakthroughs in the field to be? What are the next things to become easy for the computers?

Jürgen Schmidhuber: We will go beyond mere pattern recognition towards the grand goal of AI, which is more or less: efficient reinforcement learning (RL) in complex, realistic, partially observable environments. A few years ago, our team with Jan Koutnik and Faustino Gomez showed for the first time that it is possible to learn complex video game control from scratch through large RL recurrent NNs with raw high-dimensional video input, without any teacher or unsupervised pre-training. I believe it will be possible to greatly scale up such approaches, and build RL robots that really deserve the name.
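
The work he refers to (Koutník, Gomez, and Schmidhuber evolving recurrent-network game controllers from raw video) can be caricatured as direct reward search over the weights of a small recurrent policy, with no gradients and no labels. The toy partially observable task and the simple (1+λ) evolution strategy below are stand-ins of mine, not the original compressed-network-search setup:

```python
# Toy sketch: evolve a small recurrent policy by direct reward search.
# This is a stand-in for the compressed network search used in the original
# work, not a reproduction of it; environment and hyperparameters are invented.
import numpy as np

OBS, HID, ACT = 3, 8, 2
N_PARAMS = HID * OBS + HID * HID + ACT * HID

def unpack(theta):
    """Split a flat parameter vector into the RNN weight matrices."""
    i = 0
    def take(shape):
        nonlocal i
        size = int(np.prod(shape))
        w = theta[i:i + size].reshape(shape)
        i += size
        return w
    return take((HID, OBS)), take((HID, HID)), take((ACT, HID))

def episode(theta, rng):
    """Partially observable memory task: a cue at t=0 must drive the action at t=5."""
    W_in, W_rec, W_out = unpack(theta)
    cue = rng.integers(2)
    h = np.zeros(HID)
    reward = 0.0
    for t in range(6):
        obs = np.zeros(OBS)
        if t == 0:
            obs[cue] = 1.0                   # cue is only visible at the first step
        obs[2] = 1.0 if t == 5 else 0.0      # "answer now" flag
        h = np.tanh(W_in @ obs + W_rec @ h)
        action = int(np.argmax(W_out @ h))
        if t == 5 and action == cue:
            reward = 1.0
    return reward

def fitness(theta, rng, episodes=30):
    return np.mean([episode(theta, rng) for _ in range(episodes)])

rng = np.random.default_rng(0)
best = rng.normal(scale=0.5, size=N_PARAMS)
best_fit = fitness(best, rng)
for gen in range(200):                       # simple (1 + 16) evolution strategy
    for _ in range(16):
        cand = best + rng.normal(scale=0.1, size=N_PARAMS)
        f = fitness(cand, rng)
        if f > best_fit:
            best, best_fit = cand, f
    if gen % 50 == 0:
        print(f"generation {gen}: average reward {best_fit:.2f}")
print(f"final average reward {best_fit:.2f}")
```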

INNS BigData: Now that machine learning is moving towards implementing a higher level of intelligence, some claim that the new generation of neural network will help us shed light on the inner workings of the brain and on intelligence itself. How do you stand on this aspect? What is the role of neuroscience in the machine learning field?

Jürgen Schmidhuber: Artificial NNs (ANNs) can help to better understand biological NNs (BNNs) in at least two ways. One is to use ANNs as tools for analyzing BNN data. For example, given electron microscopy images of stacks of thin slices of animal brains, an important goal of neuroscientists is to build a detailed 3D model of the brain’s neurons and dendrites. However, human experts need many weeks to annotate the images: Which parts depict neuronal membranes? Which parts are irrelevant background? This needs to be automated (e.g., Turaga et al., 2010). Our team with Dan Ciresan and Alessandro Giusti used ensembles of deep GPU-based max-pooling (MP) convolutional networks (CNNs) to solve this task through experience with many training images, and won the ISBI 2012 brain image segmentation contest.
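
The segmentation approach amounts to classifying each pixel as membrane or background from a small image window centered on it. A much-simplified sketch of such a max-pooling CNN classifier is below (PyTorch is used here for convenience; the window size, architecture, and random stand-in data are placeholders, nothing like the contest-winning ensemble):

```python
# Rough sketch of per-pixel membrane classification with a small max-pooling CNN.
# Window size, architecture, and random data are simplified placeholders; the
# contest-winning system was a much larger ensemble of GPU-trained networks.
import torch
import torch.nn as nn

WINDOW = 32  # classify the center pixel of a 32x32 grayscale patch

class MembraneNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6 * 6, 64), nn.ReLU(),
            nn.Linear(64, 2),  # membrane vs. background
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Training loop on random stand-in data (real training would use labeled
# patches extracted around every pixel of the EM slices).
model = MembraneNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

patches = torch.randn(64, 1, WINDOW, WINDOW)   # fake EM patches
labels = torch.randint(0, 2, (64,))            # fake membrane labels
for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(patches), labels)
    loss.backward()
    optimizer.step()
print(f"final training loss: {loss.item():.3f}")
```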

Jürgen Schmidhuber on deep learning

Another way of using ANNs to better understand BNNs is the following. The feature detectors learned by single-layer visual ANNs are similar to those found in early visual processing stages of BNNs. Likewise, the feature detectors learned in deep layers of visual ANNs should be highly predictive of what neuroscientists will find in deep layers of BNNs. While the visual cortex of BNNs may use quite different learning algorithms, its objective function to be minimized may be rather similar to the one of visual ANNs. In fact, results obtained with relatively deep artificial NNs (Lee et al., 2008, Yamins et al., 2013) seem compatible with insights about the visual pathway in the primate cerebral cortex, which has been studied for many decades.

INNS BigData: What about the converse? Can machine learning benefit from advances in neuroscience?

Jürgen Schmidhuber: While many early ANNs were inspired by BNNs, I think future progress in ANNs will not be driven much by new insights into BNNs. Since our mathematically optimal universal AIs and problem solvers (developed at the Swiss AI Lab IDSIA in the early 2000s) consist of just a few formulas, I believe that it will be much easier to synthesize intelligence from first principles, rather than analyzing how the brain does it. The brain exhibits many details which hide rather than elucidate the nature of its intelligence.

I believe that it will be much easier to synthesize intelligence from first principles, rather than analyzing how the brain does it

Since the 1990s I have tried to convince my students of this as follows (see also the 2011 preface of our upcoming RNN book): some neuroscientists focus on wetware details such as individual neurons and synapses, akin to electrical engineers focusing on hardware details such as characteristic curves of transistors, although the transistor’s main raison d’être is its value as a simple binary switch.

Others study large-scale phenomena such as brain region activity during thought, akin to physicists monitoring the time-varying heat distribution of a microprocessor, without realizing the simple nature of a quicksort program running on it. The appropriate  language to discuss intelligence is not the one of neurophysiology, electrical engineering, or physics, but the abstract language of mathematics and algorithms, in particular, machine learning and program search.

INNS BigData: Turning to other topics, one of the driving forces behind the current hype is big data. What is your definition of big data? What are the opportunities and challenges of the current big data deluge for the machine learning field?

Jürgen Schmidhuber: At any given moment, “big data” is more data than most people can conveniently store. As for the second question, already existing NN methods will be successfully applied to a plethora of big data problems. There will be lots of little challenges in terms of scaling for example, but perhaps no fundamental ones.

INNS BigData: In a recent interview, Prof. Jordan stated that with an unrestricted $1 billion grant, he would work on natural language processing. What would you do with a $1 billion grant?

Jürgen Schmidhuber: Spend a few million on building the prototype of an RL RNN-based, rather general problem solver (which I think is becoming possible now). See my previous answers. Use the rest to pay off my debts.

INNS BigData: Over the last decades, you have introduced the “Formal theory of creativity, fun, and intrinsic motivation”. Creativity is also one of the “founding topics” of artificial intelligence. Do you believe it has been a neglected aspect up to now?

Jürgen Schmidhuber’s talk: “When creative machines overtake man.”

Jürgen Schmidhuber: I guess it will attract lots of attention in good time. Most current commercial interest is in plain pattern recognition, while this theory is about the next step, namely, making patterns (related to one of my previous answers). Which experiments should a robot’s RL controller, C, conduct to generate data that quickly improves its adaptive, predictive world model, M, which in turn can help to plan ahead?

The theory says: use the learning progress of M (typically compression progress and speed-ups) as the intrinsic reward or fun for C. This motivates C to create action sequences (experiments) such that M can quickly discover new, previously unknown regularities. For example, a video of 20 falling apples can be greatly compressed after the discovery of the law of gravity, through predictive coding. This discovery is fun, and motivates the history-shaping C. I think this principle will be essential for future artificial scientists and artists.
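
A minimal numerical caricature of that principle (mine, not Schmidhuber’s formulation): the controller chooses between an unlearnable and a learnable data source, the world model is a simple online predictor, and the intrinsic reward is the drop in the model’s prediction error on held-out probes, so pure noise and already-mastered patterns both stop being rewarding:

```python
# Minimal caricature of intrinsic reward as learning progress: the controller C
# chooses which "experiment" to run, the world model M is an online linear
# predictor, and C's reward is how much M improves after seeing the new data.
# The sources, model, and bandit controller are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def observe(source):
    """Source 0: unlearnable noise. Source 1: a learnable linear pattern."""
    x = rng.normal(size=3)
    y = rng.normal() if source == 0 else x @ np.array([1.0, -2.0, 0.5])
    return x, y

weights = [np.zeros(3), np.zeros(3)]   # world model M, one predictor per source

def model_error(source, x, y):
    return (x @ weights[source] - y) ** 2

def model_update(source, x, y, lr=0.05):
    weights[source] -= lr * 2 * (x @ weights[source] - y) * x

# Controller C: epsilon-greedy bandit rewarded by M's learning progress.
value = np.zeros(2)
choices = np.zeros(2, dtype=int)
for step in range(3000):
    source = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(value))
    x, y = observe(source)                   # run the chosen experiment
    probe_x, probe_y = observe(source)       # held-out probe from the same source
    before = model_error(source, probe_x, probe_y)
    model_update(source, x, y)               # M learns from the experiment
    progress = before - model_error(source, probe_x, probe_y)
    value[source] += 0.05 * (progress - value[source])   # C's intrinsic "fun"
    choices[source] += 1

print("experiments run per source:", choices)
print("current learning progress estimate:", np.round(value, 4))
# The learnable source dominates while M is still improving on it; once it is
# mastered, its progress (its "fun") decays toward zero, just like the noise.
```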

INNS BigData: Let us close with an easy question. What would you say to a student starting his or her PhD in machine learning today?

Jürgen Schmidhuber: Read our overview web sites, and then our papers :-).  Seriously, study RNNs, and RL in partially observable environments, and artificial curiosity, and optimal program search.

Further readings

Home page of Jürgen Schmidhuber
Videos from Jürgen Schmidhuber

About the Author

Simone Scardapane is a PhD student at Sapienza University of Rome. He is enthusiastic about machine learning technologies and is serving as a publicity chair for the conference.

References

Lee, H., Ekanadham, C., & Ng, A. Y. (2008). Sparse deep belief net model for visual area V2. In Advances in neural information processing systems (pp. 873-880).

Schmidhuber, J. (2015). Deep Learning in Neural Networks: An Overview. Neural Networks, 61, 85-117.

Turaga, S. C., Murray, J. F., Jain, V., Roth, F., Helmstaedter, M., Briggman, K., & Seung, H. S. (2010). Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Computation, 22(2), 511-538.

Yamins, D. L., Hong, H., Cadieu, C., & DiCarlo, J. J. (2013). Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream. In Advances in Neural Information Processing Systems (pp. 3093-3101).