CS480/680 Lecture 6: Sum-product networks (Pranav Subramani)
The lecture discusses sum-product networks (SPNs), networks composed of sum and product nodes used for tractable probabilistic modelling: inference avoids exponential runtimes, marginal densities are easy to compute, and the models remain interpretable. The video also mentions SPNs' excellent performance with convolutional neural networks, their potential for building better generative models when combined with models such as GANs and variational autoencoders, and untapped research areas for SPNs including adversarial robustness, reinforcement learning scenarios, and modelling expected utilities in games. The theoretical guarantees for interpreting the model and the opportunity for academics to make significant contributions to machine learning are also highlighted.
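As a rough illustration of why SPN inference is tractable, here is a minimal sketch (not from the lecture) of a toy SPN over two binary variables: the joint and any marginal are both computed in one bottom-up pass, and marginalizing a variable amounts to setting its leaves to 1.

```python
# Toy sum-product network over two binary variables X1, X2 (hypothetical example):
# leaves are Bernoulli distributions, product nodes combine disjoint variables,
# and a sum node mixes the two product nodes.
def leaf(p, value):
    """Bernoulli leaf: P(X = value); value=None marginalizes the variable out."""
    if value is None:
        return 1.0          # sum over both states of the leaf distribution
    return p if value == 1 else 1.0 - p

def spn(x1, x2):
    """Sum node over two product nodes, each a product of independent leaves."""
    comp_a = leaf(0.8, x1) * leaf(0.3, x2)   # component A
    comp_b = leaf(0.2, x1) * leaf(0.7, x2)   # component B
    return 0.6 * comp_a + 0.4 * comp_b       # mixture weights sum to 1

joint = spn(1, 0)        # P(X1=1, X2=0), one bottom-up pass
marginal = spn(1, None)  # P(X1=1): marginalizing X2 is just setting its leaves to 1
print(joint, marginal)
```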
CS480/680 Lecture 6: EM and mixture models (Guojun Zhang)
In CS480/680 Lecture 6, Guojun Zhang discusses the basics of unsupervised learning and clustering, focusing on mixture models and their use in clustering data. The lecture centers on the Expectation-Maximization (EM) algorithm, its E-step and M-step, and gradient descent as an alternative optimization method. The proposed project involves studying how EM and gradient descent behave when learning mixture models, with the ultimate goal of proposing a better algorithm that avoids poor local minima. A mathematical background is noted as necessary for the project.
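For concreteness, below is a minimal sketch of the E-step and M-step for a one-dimensional Gaussian mixture, assuming two components with fixed unit variance; the initialization and iteration count are illustrative choices, not taken from the lecture.

```python
import numpy as np

def em_gmm(x, n_iter=50):
    pi, mu = np.array([0.5, 0.5]), np.array([x.min(), x.max()])  # crude init
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] = P(component k | x_n)
        dens = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) / np.sqrt(2 * np.pi)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights and means from the responsibilities
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
    return pi, mu

x = np.concatenate([np.random.normal(-2, 1, 200), np.random.normal(3, 1, 300)])
print(em_gmm(x))
```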
CS480/680 Lecture 6: Model compression for NLP (Ashutosh Adhikari)
In this video, the presenter discusses model compression for NLP and the growing processing-time and memory requirements as deep neural networks become more numerous and deeper. Model compression techniques are categorized, and the oldest method, parameter pruning and sharing, is introduced. The speaker then elaborates on the student-teacher setup for model compression in NLP, where an objective function is used to compress a larger teacher model into a smaller student model while retaining accuracy. Finally, the importance of compressing models in the context of recent work on large-scale NLP models is highlighted.
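As a rough sketch of the student-teacher objective described above, the following hypothetical PyTorch loss blends a temperature-softened KL term against the teacher's outputs with ordinary cross-entropy against the true labels; the temperature and mixing weight are illustrative, not taken from the lecture.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-softened distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```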
CS480/680 Lecture 7: Mixture of Gaussians
In this lecture about mixtures of Gaussians, the speaker explains how the model can be used for classification by constructing a prior distribution for each class, which enables a probabilistic model based on Bayes' theorem that estimates the probability of each class for a given data point. The lecture also covers how the likelihood of a data point under each class is computed and used to make the class prediction. The lecture explores the relationship between the softmax function and the argmax, and how the shape and boundaries of each Gaussian are determined by the covariance matrix. Finally, the lecture details maximum likelihood learning and how it is used to estimate the mean and covariance matrix of each Gaussian in the mixture.
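The classification step can be sketched as follows, assuming SciPy is available and using illustrative class means, a shared covariance, and uniform priors: the posterior over classes follows directly from Bayes' theorem applied to the Gaussian class conditionals.

```python
import numpy as np
from scipy.stats import multivariate_normal

means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
cov = np.eye(2)                    # shared covariance matrix
priors = np.array([0.5, 0.5])

def posterior(x):
    lik = np.array([multivariate_normal.pdf(x, mean=m, cov=cov) for m in means])
    joint = priors * lik           # P(class) * P(x | class)
    return joint / joint.sum()     # Bayes' theorem: P(class | x)

x = np.array([1.2, 0.8])
print(posterior(x), posterior(x).argmax())   # class probabilities and prediction
```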
CS480/680 Lecture 8: Logistic regression and generalized linear models
This first part of the lecture on "CS480/680: Logistic Regression and Generalized Linear Models" introduces the idea of the exponential family of distributions and its relation to logistic regression, a powerful technique used for classification problems. The lecture explains that logistic regression aims to fit the best logistic function that models the posterior for a given dataset, and for problems with a few dimensions and weights, Newton's method can be used to find the minimum of the objective function, which is a convex function. The instructor also highlights the importance of logistic regression in recommender systems and ad placement, where the simplicity and efficiency of the technique make it ideal for making personalized recommendations based on user characteristics and behaviors.
The second part of the lecture continues with logistic regression and generalized linear models. The instructor discusses the limitations of Newton's method for logistic regression, such as overfitting caused by arbitrarily large weights and singularity problems in the Hessian matrix, and suggests regularization to prevent overfitting. The instructor then introduces generalized linear models (GLMs), which handle non-linear separators efficiently: the inputs are mapped to a new space where linear regression and classification correspond to non-linear regression and classification in the original space, as long as the mapping is non-linear. The lecture also covers basis functions and the types of basis functions that can be used to perform non-linear regression and classification.
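A minimal sketch of these ideas, combining a fixed polynomial basis expansion, L2 regularization, and Newton updates, might look as follows; the basis, regularization strength, and toy data are illustrative choices rather than anything prescribed in the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def basis(X):
    # Simple polynomial basis: [1, x1, x2, x1^2, x2^2] turns a linear classifier
    # in feature space into a non-linear one in the original input space.
    return np.column_stack([np.ones(len(X)), X, X ** 2])

def newton_logistic(X, y, lambda_=0.1, n_iter=20):
    Phi = basis(X)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Phi @ w)
        grad = Phi.T @ (p - y) + lambda_ * w            # regularized gradient
        R = np.diag(p * (1 - p))
        H = Phi.T @ R @ Phi + lambda_ * np.eye(len(w))  # regularized Hessian
        w -= np.linalg.solve(H, grad)                   # Newton update
    return w

X = np.random.default_rng(0).normal(size=(100, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(float)     # non-linear target
print(newton_logistic(X, y))
```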
CS480/680 Lecture 9: Perceptrons and single layer neural nets
This lecture introduces neural networks, focusing on the most elementary type, the perceptron, which produces a linear separator for classification. The lecture explores how weights are used to compute a linear combination of inputs that passes through an activation function to produce the output, and how different weights can approximate logic gates such as AND, OR, and NOT. The lecturer discusses feedforward neural networks, how the perceptron learning algorithm is used for binary classification, and how gradient descent can optimize the weights. The limitations of separating data with a single line are discussed, and the logistic sigmoid activation function is introduced as a possible solution, with a focus on how the weights can be trained when this activation function is used.
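A minimal sketch of the perceptron learning rule on the AND gate mentioned above (labels in {-1, +1}; the epoch count is an arbitrary choice):

```python
import numpy as np

def perceptron(X, y, epochs=20):
    w = np.zeros(X.shape[1] + 1)            # weights plus bias term
    Xb = np.column_stack([X, np.ones(len(X))])
    for _ in range(epochs):
        for x_i, y_i in zip(Xb, y):
            if y_i * (w @ x_i) <= 0:        # misclassified point
                w += y_i * x_i              # nudge the separator toward it
    return w

# AND gate as a linearly separable toy problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
w = perceptron(X, y)
print(np.sign(np.column_stack([X, np.ones(4)]) @ w))   # recovers the AND outputs
```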
This part of the lecture on perceptrons and single-layer neural nets covers the use of the logistic sigmoid activation function to minimize squared error and introduces the learning rate as a crucial parameter in sequential gradient descent. The lecturer also demonstrates how networks with multiple layers of thresholding units can be composed to approximate any function arbitrarily closely, and how backpropagation can be used to train a network to learn arbitrary functions. The instructor emphasizes the versatility and efficiency of neural networks, citing their widespread use in problems such as speech recognition, computer vision, machine translation, and word embeddings.
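The sequential gradient-descent update for a single logistic sigmoid unit minimizing squared error can be sketched as below; the learning rate eta, epoch count, and AND-gate toy data are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_unit(X, y, eta=0.5, epochs=500):
    Xb = np.column_stack([X, np.ones(len(X))])   # append a bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(Xb, y):
            out = sigmoid(w @ x_i)
            grad = (out - y_i) * out * (1 - out) * x_i   # d(squared error)/dw
            w -= eta * grad                              # one sequential update
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                       # AND with {0,1} targets
print(sigmoid(np.column_stack([X, np.ones(4)]) @ train_sigmoid_unit(X, y)))
```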
CS480/680 Lecture 10: Multi-layer neural networks and backpropagation
This lecture on multi-layer neural networks and backpropagation explains the limitations of linear models and the need for non-linear models such as multi-layer neural networks. The lecturer discusses the different activation functions that can be used in neural networks and how they allow for non-linear basis functions. The lecture goes on to explain how the backpropagation algorithm is used to compute the gradient of the error with respect to every weight in a neural network. Automatic differentiation tools are also discussed as a way to efficiently compute the deltas and gradients in a neural network. Overall, the lecture emphasizes the flexibility and power of neural networks in approximating a wide range of functions.
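As an illustration, here is a sketch of one forward and backward pass for a two-layer network with a tanh hidden layer and squared-error loss, showing how the deltas propagate the error derivative back through the layers; the layer sizes and inputs are arbitrary.

```python
import numpy as np

def forward_backward(x, y, W1, W2):
    # Forward pass
    h = np.tanh(W1 @ x)          # hidden activations (non-linear basis functions)
    yhat = W2 @ h                # linear output layer
    # Backward pass: deltas carry the error derivative layer by layer
    delta2 = yhat - y                        # dE/d(output pre-activation)
    delta1 = (W2.T @ delta2) * (1 - h ** 2)  # chain rule through tanh
    grad_W2 = np.outer(delta2, h)            # dE/dW2
    grad_W1 = np.outer(delta1, x)            # dE/dW1
    return grad_W1, grad_W2

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
g1, g2 = forward_backward(np.array([0.5, -1.0]), np.array([1.0]), W1, W2)
print(g1.shape, g2.shape)
```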
The lecturer in this video discusses issues in optimizing neural networks, such as slow convergence, local optima, non-convex optimization, and overfitting; regularization and dropout are mentioned as techniques for combating overfitting. The speaker then examines the behavior of gradient descent, highlighting the need to adapt the step size to make it efficient. The AdaGrad algorithm is presented as one solution, adjusting the learning rate of each dimension separately. The speaker also introduces RMSprop, which keeps a weighted moving average of previous squared gradients, and finally Adam, which additionally keeps a weighted moving average of the gradient itself, and shows that it outperforms techniques such as SGD with Nesterov momentum.
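The per-dimension update rules mentioned above can be sketched as follows, written for a single parameter vector and using the usual default hyperparameters, which are not prescribed by the lecture.

```python
import numpy as np

def adagrad_step(w, g, cache, lr=0.01, eps=1e-8):
    cache += g ** 2                                  # accumulate squared gradients
    return w - lr * g / (np.sqrt(cache) + eps), cache

def rmsprop_step(w, g, avg, lr=0.001, rho=0.9, eps=1e-8):
    avg = rho * avg + (1 - rho) * g ** 2             # moving average of squared grads
    return w - lr * g / (np.sqrt(avg) + eps), avg

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g                        # moving average of the gradient
    v = b2 * v + (1 - b2) * g ** 2                   # moving average of squared grads
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)   # bias correction
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```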
CS480/680 Lecture 11: Kernel Methods
In this lecture, kernel methods are introduced as a way to scale up generalized linear models by mapping data into a new space using a non-linear function. The dual trick, or kernel trick, is explained as a technique that enables working in high-dimensional spaces without paying the corresponding computational cost, by using a kernel function that computes the dot product of pairs of points in the new space. Various ways of constructing kernels are discussed, including the polynomial and Gaussian kernels, which measure similarity between data points and are useful in classification tasks. Rules for composing kernels are also introduced, making it possible to construct new kernels while controlling their complexity. The lecture emphasizes the importance of choosing functions that correspond to an inner product Phi(x) transpose Phi(x') in some feature space, since the Gram matrix must be positive semi-definite, i.e. have eigenvalues greater than or equal to zero.
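A small sketch, with illustrative data and parameters, of building Gram matrices for the polynomial and Gaussian kernels and checking that their eigenvalues are non-negative:

```python
import numpy as np

def poly_kernel(X, Y, degree=2, c=1.0):
    return (X @ Y.T + c) ** degree

def gaussian_kernel(X, Y, sigma=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(5, 3))
for K in (poly_kernel(X, X), gaussian_kernel(X, X)):
    eig = np.linalg.eigvalsh(K)            # eigenvalues of the Gram matrix
    print(eig.min() >= -1e-10)             # positive semi-definite (up to round-off)
```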
In this second part of the lecture on kernel methods, the speaker characterizes kernels as positive semi-definite functions whose Gram matrix can be decomposed into a matrix times its transpose. Various kernels, such as the polynomial and Gaussian kernels, and their applications are discussed for comparing different types of data such as strings, sets, and graphs. The speaker also explains how substring kernels can efficiently compute the similarity between words by increasing the length of the substrings considered and using dynamic programming. Finally, support vector machines are shown to be effective at document classification using news articles from Reuters.
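As a simplified stand-in for the gap-weighted substring kernel computed with dynamic programming in the lecture, the following sketch counts shared contiguous substrings of length k (a spectrum kernel); the example strings are arbitrary.

```python
from collections import Counter

def spectrum(s, k):
    """Count all contiguous substrings of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def substring_kernel(s, t, k=3):
    cs, ct = spectrum(s, k), spectrum(t, k)
    # Dot product of the two substring-count feature vectors
    return sum(cs[sub] * ct[sub] for sub in cs if sub in ct)

print(substring_kernel("statistics", "computation"))   # counts shared 3-grams
```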
CS480/680 Lecture 13: Support vector machines
This lecture introduces support vector machines (SVMs) as a type of kernel method used for classification. SVMs remain popular for problems with little data and are considered sparse because they work with a subset of the data, the support vectors, and ignore the rest. The speaker explains that the support vectors are the data points closest to the decision boundary and gives a visual example of an SVM finding a linear separator that divides the classes while maximizing the margin. The differences between SVMs and perceptrons are discussed: SVMs find a unique max-margin linear separator and are less prone to overfitting. The optimization problem for SVMs can be rewritten using the Lagrangian, yielding an equivalent problem without constraints; substituting the solution back gives an expression involving only the kernel function, which leads to the dual optimization problem. The benefits of working in the dual space with a kernel function that computes the similarity between pairs of data points are also explained: an SVM computes the degree of similarity between a query point and the support vectors to classify it, and the discussion also covers how the number of support vectors affects the classification of points.
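The dual-form prediction described above can be sketched as a kernel-weighted vote over the support vectors; the alphas, labels, and bias below are made-up values rather than the result of actually solving the dual problem.

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def decision(x, support_vectors, alphas, labels, b):
    # f(x) = sum_i alpha_i * y_i * k(x_i, x) + b ; the sign gives the class
    return sum(a * y * rbf(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

support_vectors = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])
alphas, labels, b = np.array([0.7, 0.7, 1.4]), np.array([1, 1, -1]), 0.1
print(np.sign(decision(np.array([0.5, 0.5]), support_vectors, alphas, labels, b)))
```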
This video discusses the concept of support vector machines (SVMs) in text categorization, where documents are represented as vectors of word counts. SVMs are effective in minimizing the worst-case loss, making the classifier suitable for any possible sample, even for different datasets. Researchers used SVMs with dual representation and kernel mapping to map data into an even higher dimensional space, without losing accuracy or sacrificing scalability. The lecture also covers the use of SVMs in retrieving relevant documents from a dataset and balancing precision and recall. The video concludes with a discussion on SVMs' ability to provide linear or nonlinear separators for data and the challenges associated with multi-class classification and non-linearly separable data.
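A toy sketch of SVM text categorization on word-count vectors, assuming scikit-learn is available; the documents and labels are invented, not the Reuters data referred to above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["oil prices rise on supply fears", "central bank cuts interest rates",
        "crude output falls in the gulf", "stocks rally after rate decision"]
labels = ["energy", "finance", "energy", "finance"]

model = make_pipeline(CountVectorizer(), LinearSVC())   # word counts -> linear SVM
model.fit(docs, labels)
print(model.predict(["bank raises rates again"]))
```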
CS480/680 Lecture 14: Support vector machines (continued)
This part of the lecture focuses on handling non-linearly separable data and overlapping classes with support vector machines (SVMs) by introducing slack variables and a soft margin. The speaker explains how slack variables allow points to lie inside the margin without being counted as classification errors, and a penalty term added to the optimization problem regulates their use: the weight C adjusts the trade-off between error minimization and model complexity. The speaker also discusses approaches to multi-class classification with SVMs, including one-against-all, pairwise comparison, and continuous ranking, with the latter described as the de facto approach for SVMs with multiple classes. Finally, the concept of a multi-class margin is introduced: a buffer around the linear separators, defined by the difference of the weight vectors for each pair of classes.
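The soft-margin trade-off can be sketched as minimizing 1/2 ||w||^2 plus C times the hinge loss, where the hinge terms play the role of the slack variables; the data, C, and step size below are illustrative, and the training loop uses plain subgradient descent rather than the dual formulation from the lecture.

```python
import numpy as np

def train_soft_margin(X, y, C=1.0, lr=0.01, epochs=200):
    Xb = np.column_stack([X, np.ones(len(X))])   # append a bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        margins = y * (Xb @ w)
        viol = margins < 1                        # points inside the margin (slack > 0)
        # Subgradient of 1/2*||w||^2 + C * sum of hinge losses
        grad = w - C * (y[viol, None] * Xb[viol]).sum(axis=0)
        w -= lr * grad
    return w

X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -1.0], [0.5, 0.5]])
y = np.array([1, 1, -1, -1, 1])                   # last point sits close to the boundary
w = train_soft_margin(X, y)
print(np.sign(np.column_stack([X, np.ones(5)]) @ w))   # predicted labels
```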