The Generative and Discriminative Learning Interface


In conjunction with the NIPS conference.

Overview

Generative and discriminative learning are two of the major paradigms for solving prediction problems in machine learning, each offering distinct advantages. They have often been studied in different sub-communities, but over the past decade there has been increasing interest in understanding and leveraging the advantages of both approaches. The goal of this workshop is to map out our current understanding of the empirical and theoretical advantages of each approach, as well as of their combinations, and to identify open research directions.

Background and Objectives

In generative approaches to prediction tasks, one models a joint distribution over inputs and outputs, and parameters are typically estimated using a likelihood-based criterion. In discriminative approaches, one directly models the mapping from inputs to outputs (either as a conditional distribution or simply as a prediction function), and parameters are estimated by optimizing objectives related to various loss functions. Discriminative approaches tend to perform better given enough data, as they are tailored to the prediction task and appear more robust to model misspecification. Despite the strong empirical success of discriminative methods in a wide range of applications, when the structures to be learned are too complex to be determined by the available training data alone (e.g., in machine translation, scene understanding, or biological process discovery), some other source of information must be used to constrain the space of candidate models (e.g., unlabeled examples, related data sources, or human prior knowledge). Generative modeling is a principled way of encoding this additional information, e.g., through probabilistic graphical models or stochastic grammar rules. Moreover, generative models provide a natural way to use unlabeled data and are sometimes more computationally efficient.
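
As a concrete illustration of the contrast, the minimal sketch below fits a generative classifier (Gaussian naive Bayes, which estimates p(y) and p(x|y) and predicts via Bayes' rule) and a discriminative classifier (logistic regression, which fits p(y|x) directly) on the same data. The use of scikit-learn and the synthetic dataset are illustrative choices, not part of the workshop material.

```python
# Minimal sketch: a generative and a discriminative classifier trained
# on the same data (scikit-learn and the synthetic dataset are
# illustrative choices, not prescribed by the workshop).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Generative: fit class priors p(y) and class-conditional Gaussians
# p(x|y); prediction applies Bayes' rule to the joint model.
gen = GaussianNB().fit(X_tr, y_tr)

# Discriminative: fit the conditional p(y|x) directly by maximizing
# conditional likelihood.
disc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("generative (naive Bayes):      %.3f" % gen.score(X_te, y_te))
print("discriminative (logistic reg): %.3f" % disc.score(X_te, y_te))
```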

Theoretical analysis of generative versus discriminative learning has a long history in statistics, where the focus was on asymptotic analyses (e.g., [Efron 75]). Ng and Jordan provided an initial comparison of generative and discriminative learning in the non-asymptotic regime in what is now the most cited machine learning paper on the topic [Ng 02]. For a few years this was one of the only machine learning papers providing a theoretical comparison, and it is responsible for the conventional wisdom: "use generative learning for small amounts of data and discriminative learning for large amounts". Recently, there have been new advances in our theoretical understanding of the two approaches [Liang 08, Xue 08] and of their combination [Bouchard 07, Xue 09].
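
This conventional wisdom can be probed with a small experiment in the spirit of [Ng 02]: train both kinds of classifier on growing subsets of the data and compare held-out accuracy. The sketch below is only loosely modeled on that paper; the synthetic dataset and subset sizes are arbitrary choices.

```python
# Loose sketch in the spirit of [Ng 02]: compare a generative and a
# discriminative classifier as the training set grows (dataset and
# subset sizes are arbitrary illustrative choices).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_te, y_te = X[2000:], y[2000:]          # held-out test set

for n in (20, 50, 200, 1000, 2000):      # growing training subsets
    nb = GaussianNB().fit(X[:n], y[:n])
    lr = LogisticRegression(max_iter=1000).fit(X[:n], y[:n])
    print("n=%4d  naive Bayes=%.3f  logistic reg=%.3f"
          % (n, nb.score(X_te, y_te), lr.score(X_te, y_te)))
```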

On the empirical side, combinations of discriminative and generative methodologies have been explored by several authors [Raina 04, Bouchard 04, McCallum 06, Bishop 07, Schmah 09] in many fields such as natural language processing, speech recognition, and computer vision. In particular, the recent "deep learning" revolution in neural networks relies heavily on a hybrid generative-discriminative approach: an unsupervised generative learning phase ("pre-training") is followed by discriminative fine-tuning. Given these recent trends, a workshop on the interplay of generative and discriminative learning seems especially relevant.
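
A minimal single-layer sketch of this pre-train/fine-tune recipe is given below, using scikit-learn's BernoulliRBM as the unsupervised generative stage; this is a drastic simplification of the deep architectures in question, and the dataset and hyperparameters are illustrative choices.

```python
# Single-layer sketch of generative pre-training followed by
# discriminative training, using scikit-learn's BernoulliRBM
# (a drastic simplification of deep pre-training).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X = X / 16.0                  # scale pixel values into [0, 1] for the RBM

model = Pipeline([
    # Generative phase: the RBM learns features by modeling p(x) alone,
    # without ever seeing the labels.
    ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05,
                         n_iter=20, random_state=0)),
    # Discriminative phase: a classifier is trained on the learned features.
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X[:1500], y[:1500])
print("test accuracy: %.3f" % model.score(X[1500:], y[1500:]))
```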

Hybrid generative-discriminative techniques also face computational challenges. For some models, training such hybrids amounts to discriminative training of a generative model, a notoriously hard problem ([Bottou 91] for discriminatively trained HMMs; [Jebara 04, Salojarvi 05] for EM-like algorithms), though for other models learning can in fact be simple [Raina 04, Wettig 03]. Alternatively, the use of generative models in predictive settings has been explored, e.g., through Fisher kernels [Jaakkola 98] or other probabilistic kernels. One of the goals of the workshop will be to highlight the connections between these approaches.
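
To make the Fisher kernel idea concrete, the sketch below computes Fisher scores (gradients of a fitted generative model's log-likelihood with respect to its parameters) and trains a linear SVM on them. The independent-Bernoulli model and the binarized digits data are our illustrative choices; [Jaakkola 98] develops the idea for much richer models such as HMMs.

```python
# Sketch of the Fisher kernel idea [Jaakkola 98]: map each input to the
# gradient of a generative model's log-likelihood, then train a
# discriminative classifier on these Fisher scores. The independent-
# Bernoulli model is an illustrative choice, far simpler than the
# models used in the original paper.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X = (X > 8).astype(float)                 # binarize pixel values

# Generative model: one Bernoulli mean per pixel, fit without labels.
theta = np.clip(X.mean(axis=0), 0.01, 0.99)

# Fisher score: d/d theta_d log p(x|theta) = x_d/theta_d - (1-x_d)/(1-theta_d)
def fisher_scores(X, theta):
    return X / theta - (1.0 - X) / (1.0 - theta)

Phi = fisher_scores(X, theta)
clf = LinearSVC(max_iter=20000).fit(Phi[:1500], y[:1500])
print("test accuracy: %.3f" % clf.score(Phi[1500:], y[1500:]))
```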

The aim of this workshop is to provide a platform for both theoretical and applied researchers from different communities to discuss the status of our understanding of the interplay between generative and discriminative learning, and to identify forward-looking open problems of interest to the NIPS community. Examples of topics of interest to the workshop include:

  • Theoretical analysis of generative vs. discriminative learning
  • Techniques for combining generative and discriminative approaches
  • Successful applications of hybrids
  • Empirical comparison of generative vs. discriminative learning
  • Inclusion of prior knowledge in discriminative methods (semi-supervised approaches, generalized expectation criteria, posterior regularization, etc.)
  • Insights into the role of generative/discriminative interface for deep learning
  • Computational issues in discriminatively trained generative models/hybrid models
  • Map of possible generative/discriminative approaches and combinations
  • Bayesian approaches optimized for predictive performance
  • Comparison of model-free and model-based approaches in statistics or reinforcement learning

References

[Bishop 07] C. M. Bishop and J. Lasserre, Generative or Discriminative? Getting the Best of Both Worlds. In Bayesian Statistics 8, Bernardo, J. M. et al. (Eds), Oxford University Press, 3–23, 2007.

[Bottou 91] L. Bottou, Une approche théorique de l'apprentissage connexionniste: Applications à la reconnaissance de la parole (A theoretical approach to connectionist learning: applications to speech recognition). Doctoral dissertation, Université de Paris XI, 1991.

[Bouchard 04] G. Bouchard and B. Triggs, The tradeoff between generative and discriminative classifiers. In J. Antoch, editor, Proc. of COMPSTAT'04, 16th Symposium of IASC, volume 16. Physica-Verlag, 2004.

[Bouchard 07] G. Bouchard, Bias-variance tradeoff in hybrid generative-discriminative models. In Proceedings of the Sixth International Conference on Machine Learning and Applications (ICMLA 07), Cincinnati, Ohio, USA, 2007.

[Efron 75] B. Efron, The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis. Journal of the American Statistical Association, 70(352), 892–898, 1975.

[Greiner 02] R. Greiner and W. Zhou. Structural extension to logistic regression: Discriminant parameter learning of belief net classifiers. In Proceedings of the Eighteenth Annual National Conference on Artificial Intelligence (AAAI-02), 167–173, 2002.

[Jaakkola 98] T. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In Advances in Neural Information Processing Systems 11, 1998.

[Jaakkola 99] T. Jaakkola, M. Meila, and T. Jebara. Maximum entropy discrimination. In Advances in Neural Information Processing Systems 12. MIT Press, 1999.

[Jebara 04] T. Jebara, Machine Learning - Discriminative and Generative. International Series in Engineering and Computer Science, Springer, Vol. 755, 2004.

[Liang 08] P. Liang and M. I. Jordan, An asymptotic analysis of generative, discriminative, and pseudo-likelihood estimators. In Proceedings of the 25th International Conference on Machine Learning (ICML), 2008.

[McCallum 06] A. McCallum, C. Pal, G. Druck and X. Wang, Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification. AAAI, 2006.

[Ng 02] A. Y. Ng and M. I. Jordan, On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes. In Advances in Neural Information Processing Systems 14, 2002.

[Raina 04] R. Raina, Y. Shen, A. Y. Ng and A. McCallum, Classification with hybrid generative/discriminative models. In Advances in Neural Information Processing Systems 16, 2004.

[Salojarvi 05] J. Salojärvi, K. Puolamäki and S. Kaski, Expectation maximization algorithms for conditional likelihoods. In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005.

[Schmah 09] T. Schmah, G. E. Hinton, R. Zemel, S. L. Small and S. Strother, Generative versus discriminative training of RBMs for classification of fMRI images. In Advances in Neural Information Processing Systems 21, 2009.

[Wettig 03] H. Wettig, P. Grünwald, T. Roos, P. Myllymäki and H. Tirri, When discriminative learning of Bayesian network parameters is easy. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI 2003), 491–496, 2003.

[Xue 08] J.-H. Xue and D. M. Titterington, Comment on "Discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes". Neural Processing Letters, 28(3), 169–187, 2008.

[Xue 09] J.-H. Xue and D. M. Titterington, Interpretation of hybrid generative/discriminative algorithms. Neurocomputing, 72(7–9), 1648–1655, 2009.


The workshop is sponsored by the PASCAL-2 European Network of Excellence and Xerox.

