Lecture 24. Linear Programming and Two-Person Games
This YouTube video covers linear programming and two-person games. Linear programming is the problem of optimizing a linear cost function subject to linear constraints, and it is used in fields such as economics and engineering. The video explains the main algorithms, the simplex method and interior-point methods, and the concept of duality, where the primal problem and its dual are closely connected and can both be solved with the simplex method. The video also shows how duality appears in max flow-min cut, where a cut gives an upper bound on the maximum flow in a network, and how linear programming applies to a two-person game given by a payoff matrix. Finally, the video briefly notes that these techniques do not extend easily to games with three or more players and mentions that the next lecture will cover stochastic gradient descent.
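To make the game connection concrete, here is a minimal sketch, not taken from the lecture, of solving a zero-sum two-person game as a linear program with SciPy's linprog: the row player maximizes the game value v subject to the constraint that every column of the payoff matrix pays at least v. The 2x2 payoff matrix is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative payoff matrix for the row player (zero-sum game);
# the matrix itself is an assumption, not one from the lecture.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])          # "matching pennies"
m, n = A.shape

# Variables: x (row player's mixed strategy, length m) and v (game value).
# Maximize v  <=>  minimize -v, subject to A^T x >= v*1, sum(x) = 1, x >= 0.
c = np.concatenate([np.zeros(m), [-1.0]])
A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - (A^T x)_j <= 0 for each column j
b_ub = np.zeros(n)
A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]   # probabilities >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
x, v = res.x[:m], res.x[m]
print("optimal mixed strategy:", x, "game value:", v)   # [0.5, 0.5], value 0
```

The dual of this program gives the column player's optimal mixed strategy, which is the duality connection the lecture emphasizes.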
Lecture 25. Stochastic Gradient Descent
In this video, stochastic gradient descent (SGD) is introduced as an optimization method for large-scale machine learning problems, which are often posed as finite-sum problems. The speaker explains how SGD picks random data points to compute a cheap gradient estimate at each step, and how it behaves differently from full-batch gradient descent near the optimum, where the iterates fluctuate rather than settle. The key property is that the stochastic gradient is an unbiased estimate of the true gradient in expectation, and its variance must be controlled to reduce the noise. Mini-batches are discussed as a source of cheap parallelism in GPU training for deep learning, though choosing the mini-batch size remains an open question that can affect how well the solution generalizes to unseen data. Challenges include picking the mini-batch size and computing the stochastic gradients, and researchers are trying to understand why SGD works so well for neural networks by developing a theory of generalization.
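As a concrete sketch of the method (the synthetic least-squares problem, step size, and mini-batch size below are my own illustrative choices, not values from the lecture), mini-batch SGD in NumPy looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimize (1/N) * sum_i (a_i^T x - b_i)^2.
# All data here is made up purely for illustration.
N, d = 1000, 5
A = rng.normal(size=(N, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=N)

def sgd(A, b, batch_size=32, lr=0.1, epochs=50):
    x = np.zeros(A.shape[1])
    N = len(b)
    for _ in range(epochs):
        perm = rng.permutation(N)              # sample without replacement each epoch
        for start in range(0, N, batch_size):
            idx = perm[start:start + batch_size]
            # Unbiased estimate of the full gradient, built from one mini-batch.
            grad = 2 * A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
            x -= lr * grad
        lr *= 0.95                              # decaying step size tames the fluctuations
    return x

x_hat = sgd(A, b)
print("error:", np.linalg.norm(x_hat - x_true))
```

The decaying step size is one simple way to control the variance near the optimum that the summary mentions.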
Lecture 26. Structure of Neural Nets for Deep Learning
This video discusses the structure of neural networks for deep learning. The goal is binary classification: starting from feature vectors with m features, the network constructs a learning function that assigns each sample to one of two categories. Non-linearity is essential, since linear classifiers cannot separate data that is not linearly separable. The video also discusses how the number of weights and layers affects the network, and points to resources such as the TensorFlow playground for experimenting with these functions. Finally, the video presents the recursion behind the formula for the number of flat pieces obtained by cutting a cake with straight cuts, and relates this count to the piecewise-linear learning functions that deep learning optimizes when minimizing total loss.
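The flat-pieces count can be checked numerically. The sketch below is my own illustration, not code from the lecture; it compares the closed-form sum of binomial coefficients with the recursion r(N, m) = r(N-1, m) + r(N-1, m-1) for N cuts in m dimensions.

```python
from math import comb
from functools import lru_cache

def flat_pieces_formula(N, m):
    """Number of flat pieces produced by N hyperplane 'cuts' in R^m:
    r(N, m) = C(N,0) + C(N,1) + ... + C(N,m)."""
    return sum(comb(N, k) for k in range(m + 1))

@lru_cache(maxsize=None)
def flat_pieces_recursive(N, m):
    """Same count via the recursion r(N, m) = r(N-1, m) + r(N-1, m-1):
    the N-th cut adds one new piece for each region it crosses, and those
    crossings are counted by the same problem in one lower dimension."""
    if N == 0 or m == 0:
        return 1
    return flat_pieces_recursive(N - 1, m) + flat_pieces_recursive(N - 1, m - 1)

# The two agree; e.g. 3 straight cuts of a plane (m = 2) give 7 pieces.
assert flat_pieces_formula(3, 2) == flat_pieces_recursive(3, 2) == 7
```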
Lecture 27. Backpropagation: Find Partial Derivatives
This video covers several topics related to backpropagation and finding partial derivatives. The speaker demonstrates the use of the chain rule for partial derivatives and emphasizes the importance of the order of calculations in matrix multiplication. Backpropagation is highlighted as an efficient algorithm for computing gradients, and various examples are given to demonstrate its effectiveness. The convergence of stochastic gradient descent is briefly discussed, along with a project idea related to the use of a random order of loss function samples in stochastic gradient descent. Overall, the video provides a comprehensive overview of backpropagation and its applications.
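The chain-rule bookkeeping the summary describes can be written out by hand for a tiny network. The sketch below is an illustration with arbitrary sizes and data (one hidden layer, ReLU, squared loss), followed by a finite-difference check on a single weight; it is not code from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny one-hidden-layer net, squared loss on a single sample:
#   z = W1 @ x + b1,  a = relu(z),  y_hat = W2 @ a + b2,  L = 0.5 * ||y_hat - y||^2
x = rng.normal(size=3)
y = rng.normal(size=2)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def forward(W1, b1, W2, b2, x):
    z = W1 @ x + b1
    a = np.maximum(z, 0.0)
    y_hat = W2 @ a + b2
    return z, a, y_hat

# Forward pass, then backward pass: the chain rule applied right-to-left,
# reusing each intermediate quantity once -- the point of backpropagation.
z, a, y_hat = forward(W1, b1, W2, b2, x)
dL_dyhat = y_hat - y                      # dL/dy_hat
dW2 = np.outer(dL_dyhat, a)               # dL/dW2
db2 = dL_dyhat
dL_da = W2.T @ dL_dyhat                   # back through the second linear layer
dL_dz = dL_da * (z > 0)                   # back through the ReLU
dW1 = np.outer(dL_dz, x)
db1 = dL_dz

# Finite-difference check on one entry of W1.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
_, _, y_hat_p = forward(W1p, b1, W2, b2, x)
numeric = (0.5 * np.sum((y_hat_p - y) ** 2) - 0.5 * np.sum((y_hat - y) ** 2)) / eps
print("backprop:", dW1[0, 0], " finite difference:", numeric)
```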
Lecture 30. Completing a Rank-One Matrix, Circulants!
In Lecture 30, the lecturer discusses completing a rank-one matrix and circulant matrices. They begin with a 2x2 determinant and use it to narrow down which entries of a matrix can be filled in to make it rank one. The lecturer then moves on to a combinatorial question for a 4x4 matrix and introduces circulant matrices, whose cyclic pattern means a 4x4 circulant is determined by just four numbers, each row being a cyclic shift of the one above. The lecture also covers cyclic convolution and the eigenvalues and eigenvectors of circulant matrices, which are important in signal processing.
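To see the cyclic pattern and the convolution connection in code, here is a small sketch (the four numbers are arbitrary) using SciPy's circulant helper: multiplying by a 4x4 circulant is the same as cyclic convolution with its first column, which the FFT computes.

```python
import numpy as np
from scipy.linalg import circulant

# Four given numbers determine the whole circulant matrix.
c = np.array([1.0, 2.0, 3.0, 4.0])        # illustrative values
C = circulant(c)                           # each column is a cyclic shift of the first

x = np.array([5.0, 6.0, 7.0, 8.0])

# Multiplying by a circulant is cyclic convolution with its first column,
# which the FFT diagonalizes: C x = ifft(fft(c) * fft(x)).
via_matrix = C @ x
via_fft = np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))
print(np.allclose(via_matrix, via_fft))    # True
```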
Lecture 31. Eigenvectors of Circulant Matrices: Fourier Matrix
In this video on eigenvectors of circulant matrices, the speaker discusses how circulant matrices relate to image processing and machine learning, as well as their connection to the Fourier matrix. The speaker emphasizes the importance of understanding convolution and circulant matrices in relation to the discrete Fourier transform (DFT). The eigenvectors of an 8x8 circulant matrix are the columns of the 8x8 Fourier matrix, all built from the same eight numbers, the eighth roots of unity, which are also eigenvalues in their own right. The speaker also covers properties of the Fourier matrix, including that its columns are orthogonal but not orthonormal, and that the entries of each eigenvector other than the all-ones vector add up to zero, which makes them orthogonal to the all-ones eigenvector. Finally, the speaker works through examples of particular eigenvectors of the Fourier matrix itself.
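The eigenvector claim can be checked directly. The sketch below is my own construction (n = 8 and an arbitrary first column): it builds the Fourier matrix from the eighth roots of unity, confirms that its columns are eigenvectors of a circulant with eigenvalues given by the DFT of the first column, and confirms that the columns are orthogonal but not orthonormal (F^H F = 8I).

```python
import numpy as np
from scipy.linalg import circulant

n = 8
w = np.exp(2j * np.pi / n)                 # primitive n-th root of unity
j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = w ** (j * k)                           # Fourier matrix, column k = (1, w^k, w^{2k}, ...)

c = np.arange(1.0, n + 1)                  # illustrative first column of the circulant
C = circulant(c)

# Every column of F is an eigenvector of every n x n circulant,
# and the matching eigenvalue is the k-th DFT coefficient of c.
lam = np.fft.fft(c)
for col in range(n):
    assert np.allclose(C @ F[:, col], lam[col] * F[:, col])

# Columns are orthogonal but not orthonormal: F^H F = n I.
assert np.allclose(F.conj().T @ F, n * np.eye(n))
```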
Lecture 32. ImageNet is a Convolutional Neural Network (CNN), The Convolution Rule
In Lecture 32 of a deep learning course, the power of convolutional neural networks (CNNs) in image classification is discussed, with the example of the ImageNet competition being won by a large deep CNN built from convolution layers, normalization layers, and max-pooling layers. The lecture then focuses on the convolution rule, which connects multiplication and convolution, with examples of two-dimensional convolutions, the use of the Kronecker product for the two-dimensional Fourier transform and in signal processing, and the difference between the periodic (cyclic) and non-periodic cases of convolution. The lecturer also discusses the eigenvectors and eigenvalues of a circulant matrix and the Kronecker sum operation.
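Here is a brief numerical illustration of two of the points named above, under my own choices of small vectors and a 4x4 "image": the convolution rule (the DFT turns cyclic convolution into elementwise multiplication) and the Kronecker-product form of the two-dimensional transform.

```python
import numpy as np
from scipy.linalg import dft

# Convolution rule: the DFT turns cyclic convolution into elementwise multiplication.
c = np.array([1.0, 2.0, 3.0, 4.0])
d = np.array([5.0, 6.0, 7.0, 8.0])
via_fft = np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(d)))
direct = np.array([sum(c[m] * d[(i - m) % 4] for m in range(4)) for i in range(4)])
assert np.allclose(via_fft, direct)

# Two-dimensional transform as a Kronecker product: transforming the rows and
# columns of an n x n image X is the same as (F kron F) acting on the flattened X.
n = 4
F = dft(n)                                        # 1-D Fourier matrix, omega = exp(-2*pi*i/n)
X = np.arange(n * n, dtype=float).reshape(n, n)   # illustrative image
via_kron = (np.kron(F, F) @ X.flatten()).reshape(n, n)
assert np.allclose(via_kron, np.fft.fft2(X))
```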
Lecture 33. Neural Nets and the Learning Function
In this video, the speaker discusses the construction of the learning function f for neural nets, which is applied to the training data and optimized by gradient descent or stochastic gradient descent to minimize the loss. He uses a hand-drawn picture to illustrate the structure of the net and the learning function, and surveys loss functions used in machine learning, including square loss and cross-entropy loss. The speaker also introduces the problem of finding the positions of points given only their pairwise distances, a classic problem with applications such as determining the shapes of molecules from nuclear magnetic resonance data. He concludes by discussing the construction of the position matrix X, the final step of that distance problem, and mentions a call for volunteers to discuss a project on Friday.
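For the loss functions mentioned, here is a minimal, numerically stable cross-entropy alongside square loss; the logits and labels are made up for illustration and are not taken from the lecture.

```python
import numpy as np

def cross_entropy(logits, label):
    """Cross-entropy loss for one sample: -log of the softmax probability
    assigned to the true class, computed in a numerically stable way."""
    shifted = logits - np.max(logits)            # avoid overflow in exp
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

def squared_loss(y_hat, y):
    """Square loss, the other standard choice."""
    return 0.5 * np.sum((y_hat - y) ** 2)

# Illustrative values only.
logits = np.array([2.0, -1.0, 0.5])
print(cross_entropy(logits, label=0))            # small: class 0 has the largest logit
print(squared_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0])))
```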
Lecture 34. Distance Matrices, Procrustes Problem
The speaker discusses the Procrustes problem: finding the best orthogonal transformation that takes one set of vectors as close as possible to another set. They give several equivalent expressions for the Frobenius norm of a matrix and explain its connection to the Procrustes problem, introduce the trace of a matrix, and use these tools to find the correct Q. They also address the question of whether deep learning actually works. The solution to the Procrustes problem is then presented: compute the SVD of the product of the two matrices and combine its orthogonal factors, taking Q = UV^T.
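The SVD recipe fits in a few lines. The sketch below is my own example, with random points rotated by a known rotation and the variable names Y, X, Q chosen by me: it computes the SVD of Y^T X, returns Q = UV^T, and checks that Q undoes the rotation.

```python
import numpy as np

rng = np.random.default_rng(2)

def procrustes(Y, X):
    """Best orthogonal Q minimizing ||Y Q - X||_F:
    take the SVD of Y^T X = U S V^T and return Q = U V^T."""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

# Illustrative check: rotate some points, then recover the rotation.
X = rng.normal(size=(10, 3))                 # 10 points in R^3
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
Y = X @ R.T                                  # the rotated point set
Q = procrustes(Y, X)
print(np.allclose(Y @ Q, X))                 # True: Q undoes the rotation
```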
Lecture 35. Finding Clusters in Graphs
This video discusses clustering in graphs and how to find clusters using algorithms such as k-means and spectral clustering. Spectral clustering uses the graph Laplacian matrix, whose eigenvectors carry information about the clusters. The Fiedler eigenvector, the eigenvector for the smallest positive eigenvalue of the Laplacian, is the key tool for splitting a graph into clusters, and the speaker emphasizes that the orthogonality of the eigenvectors helps in separating different clusters. There is a brief preview of the next lecture, which will cover backpropagation using the Julia language and linear algebra, and students are encouraged to submit their projects online or outside the instructor's office.
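A small end-to-end illustration of the Fiedler-vector idea, on a toy graph of my own choosing (two triangles joined by one edge): build the Laplacian L = D - A, take the eigenvector for the second-smallest eigenvalue, and split the nodes by its sign.

```python
import numpy as np

# Small illustrative graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (2, 0),        # first triangle
         (3, 4), (4, 5), (5, 3),        # second triangle
         (2, 3)]                        # one bridge between them
n = 6

# Graph Laplacian L = D - A (degree matrix minus adjacency matrix).
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Eigenvectors of the symmetric Laplacian, in ascending order of eigenvalue:
# the first eigenvalue is 0 (all-ones eigenvector); the next belongs to the
# Fiedler vector.
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]

# Splitting nodes by the sign of the Fiedler vector recovers the two triangles.
print("nodes on one side:", np.where(fiedler < 0)[0])
print("nodes on the other:", np.where(fiedler >= 0)[0])
```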