Numerics of ML 9 -- Monte Carlo -- Philipp Hennig
In this video on the topic of Monte Carlo, Philipp Hennig explains how integration is a fundamental problem in machine learning when it comes to Bayesian inference using Bayes' Theorem. He introduces the Monte Carlo algorithm as a specific way of doing integration and provides a brief history of the method. He also discusses the properties of Monte Carlo algorithms, such as unbiasedness and a variance that shrinks as the number of samples grows. Additionally, Hennig delves into the Metropolis-Hastings algorithm, Markov Chain Monte Carlo, and Hamiltonian Monte Carlo, providing an overview of each algorithm's properties and how they work when sampling from a probability distribution. Ultimately, Hennig notes the importance of understanding why algorithms are used, rather than blindly applying them, to achieve optimal and efficient results.
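To make the Metropolis-Hastings idea concrete, here is a minimal sketch (not code from the lecture) of a random-walk sampler in Python, targeting a standard normal distribution; only the target's log-density up to a constant is needed:

```python
import math
import random

def metropolis_hastings(log_p, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D target density."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(x)).
        if math.log(rng.random() + 1e-300) < log_p(proposal) - log_p(x):
            x = proposal
        samples.append(x)
    return samples

# Target: standard normal, log p(x) = -x^2 / 2 up to a constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
burned = samples[2000:]          # discard burn-in
mean = sum(burned) / len(burned)
```

The chain's samples can then be averaged to estimate expectations, with an error that shrinks like O(1/sqrt(N)) regardless of dimension, which is exactly the unbiased-but-slow behavior the lecture discusses.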
In the second part of the video, Philipp Hennig discusses Monte Carlo methods for high-dimensional distributions, in particular the No U-turn Sampler (NUTS), which overcomes the problem that naively stopping a Hamiltonian trajectory at a U-turn would break detailed balance. Hennig emphasizes that while these algorithms are complex and tricky to implement, understanding them is crucial for using them effectively. He also questions the knee-jerk habit of computing expected values with Monte Carlo methods and suggests there may be ways to approximate them without randomness. Hennig discusses the concept and limitations of randomness and the slow O(1/sqrt(N)) convergence of Monte Carlo methods, and proposes considering other methods for machine learning rather than relying on pseudo-random numbers produced by deterministic machines.
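The Hamiltonian Monte Carlo scheme underlying NUTS can be sketched in a few lines: simulate Hamiltonian dynamics with a leapfrog integrator, then apply a Metropolis correction for the integration error. This is an illustrative 1-D sketch on a standard normal, not the lecture's code:

```python
import math
import random

def hmc_sample(grad_neg_log_p, neg_log_p, x0, n_samples,
               eps=0.1, n_leapfrog=20, seed=1):
    """Hamiltonian Monte Carlo with leapfrog integration (1-D sketch)."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)                 # resample momentum
        x_new, p_new = x, p
        # Leapfrog: half step in momentum, alternating full steps.
        p_new -= 0.5 * eps * grad_neg_log_p(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += eps * p_new
            p_new -= eps * grad_neg_log_p(x_new)
        x_new += eps * p_new
        p_new -= 0.5 * eps * grad_neg_log_p(x_new)
        # Metropolis correction on the change in total energy.
        h_old = neg_log_p(x) + 0.5 * p * p
        h_new = neg_log_p(x_new) + 0.5 * p_new * p_new
        if math.log(rng.random() + 1e-300) < h_old - h_new:
            x = x_new
        samples.append(x)
    return samples

# Standard normal: -log p(x) = x^2 / 2, gradient x.
samples = hmc_sample(lambda x: x, lambda x: 0.5 * x * x, x0=0.0, n_samples=5000)
```

The fixed trajectory length (`n_leapfrog`) here is exactly the tuning problem that NUTS removes by detecting U-turns automatically.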
Numerics of ML 10 -- Bayesian Quadrature -- Philipp Hennig
In this video, Philipp Hennig discusses Bayesian Quadrature as an efficient method for the computational problem of integration in machine learning. He explains that the integrand is a latent, real-valued function: it is uniquely identified by the problem, yet questions about it, such as the value of its integral, cannot be answered directly. Bayesian Quadrature treats the problem of finding an integral as an inference problem by putting a prior over the unknown object and the quantities that can be computed, then performing Bayesian inference. Hennig also compares this approach to Monte Carlo methods such as rejection and importance sampling, showing how Bayesian Quadrature can outperform classical quadrature rules. The lecture covers the Kalman filter algorithm for Bayesian Quadrature and its connection to classic integration algorithms, with a discussion on using uncertainty estimates in numerical methods. Finally, Hennig explores how the social structure of numerical computation affects algorithm design, discusses a method for designing computational methods for specific problems, and how probabilistic machine learning can estimate the error in real-time.
In the second part of the video, Philipp Hennig discusses Bayesian quadrature, which involves putting prior distributions over the quantities we care about, such as integrals and algorithm values, to compute something in a Bayesian fashion. The method assigns both a posterior estimate and an uncertainty estimate around the estimates, which can be identified with classic methods. Hennig explains how the algorithm adapts to the observed function and uses an active learning procedure to determine where to evaluate next. This algorithm can work in higher dimensions and achieves non-trivial convergence rates. He also discusses limitations of classic algorithms and quadrature rules and proposes a workaround through adaptive reasoning.
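The core computation can be illustrated with a small sketch (under simplifying assumptions, not the lecture's algorithm): place a Gaussian-process prior with an RBF kernel on the integrand over [0, 1]; the posterior mean of the integral is then z^T K^{-1} y, where K is the kernel Gram matrix at the nodes and z holds the integrated kernel, which has a closed form via the error function:

```python
import math

def rbf(x1, x2, ell=0.3):
    return math.exp(-((x1 - x2) ** 2) / (2 * ell ** 2))

def kernel_mean(x, ell=0.3):
    # Closed form of the integral of rbf(t, x) for t in [0, 1].
    c = ell * math.sqrt(2.0)
    return ell * math.sqrt(math.pi / 2.0) * (math.erf((1 - x) / c) + math.erf(x / c))

def solve(A, b):
    """Gaussian elimination with partial pivoting (fine for tiny systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def bayesian_quadrature(f, nodes, ell=0.3, jitter=1e-9):
    n = len(nodes)
    y = [f(x) for x in nodes]
    K = [[rbf(nodes[i], nodes[j], ell) + (jitter if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    z = [kernel_mean(x, ell) for x in nodes]
    # Posterior mean of the integral: z^T K^{-1} y.
    alpha = solve(K, y)
    return sum(zi * ai for zi, ai in zip(z, alpha))

# Estimate the integral of sin(pi x) over [0, 1] (true value 2/pi) from 5 evaluations.
nodes = [0.1, 0.3, 0.5, 0.7, 0.9]
estimate = bayesian_quadrature(lambda x: math.sin(math.pi * x), nodes)
```

With only five function evaluations this already lands close to the true value, illustrating why probabilistic quadrature can beat Monte Carlo's slow rate on smooth integrands.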
Numerics of ML 11 -- Optimization for Deep Learning -- Frank Schneider
Frank Schneider discusses the challenges of optimization for deep learning, emphasizing the complexity of training neural networks and the importance of selecting the right optimization methods and algorithms. He notes the overwhelming number of available methods and the difficulty of comparing and benchmarking them. Schneider gives real-world examples of training large language models, where non-default learning-rate schedules and mid-flight changes were needed to get the models to train at all. He highlights the importance of giving users more insight into how to use these methods and how hyperparameters affect the training process, as well as of benchmarking exercises that help practitioners select the best method for their specific use case. He also discusses newer methods like Alpha and how they can be leveraged to steer the training process of a neural network.
In the second part of the video on the numerics of optimization for deep learning, Frank Schneider introduces the "Deep Debugger" tool Cockpit, which provides additional instruments to detect and fix issues in the training process, such as data bugs and model blocks. He explains the importance of normalizing data for optimal hyperparameters, the relationship between learning rates and test accuracy, and the challenges of training neural networks with stochasticity. Schneider encourages students to work towards improving the training of neural networks by considering the gradient as a distribution and developing better autonomous methods in the long run.
Numerics of ML 12 -- Second-Order Optimization for Deep Learning -- Lukas Tatzel
In this video, Lukas Tatzel explains second-order optimization methods for deep learning and their potential benefits. He compares the trajectories and convergence rates of three optimization methods - SGD, Adam, and LBFGS - using the example of the Rosenbrock function in 2D. Tatzel notes that the jumpy behavior of SGD leads to slower convergence compared to the well-informed steps of LBFGS. He introduces the Newton step as a faster method for optimization and discusses its limitations, such as the dependence on the condition number. Tatzel also explains the concept of the Generalized Gauss-Newton matrix (GGN) as an approximation to the Hessian for dealing with ill-conditioned problems. Additionally, he discusses the trust region problem, how to deal with non-convex objective functions, and the Hessian-free approach that uses CG for minimizing quadratic functions.
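The contrast Tatzel draws can be reproduced in a few lines: first-order gradient descent versus exact Newton steps on the 2-D Rosenbrock function (an illustrative sketch, not the lecture's code):

```python
def rosenbrock(x, y):
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def grad(x, y):
    return (-2 * (1 - x) - 400 * x * (y - x ** 2), 200 * (y - x ** 2))

def hessian(x, y):
    return ((2 - 400 * (y - x ** 2) + 800 * x ** 2, -400 * x),
            (-400 * x, 200))

def gd(x, y, lr=1e-3, steps=5000):
    """Plain gradient descent."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

def newton(x, y, steps=10):
    """Newton's method: solve H @ delta = -grad via the 2x2 inverse."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        (a, b), (c, d) = hessian(x, y)
        det = a * d - b * c
        x, y = x - (d * gx - b * gy) / det, y - (a * gy - c * gx) / det
    return x, y

x_gd, y_gd = gd(-1.0, 1.0)
x_nt, y_nt = newton(-1.0, 1.0)
```

Starting from (-1, 1), Newton reaches the minimum at (1, 1) in a handful of steps, while gradient descent is still crawling along the curved valley after thousands of iterations, a direct consequence of the problem's poor conditioning.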
This second part of the video explores second-order optimization techniques for deep learning, including BFGS and LBFGS, Hessian-free optimization, and KFAC (Kronecker-Factored Approximate Curvature). The speaker explains that the Hessian-free approach linearizes the model using Jacobian-vector products, while KFAC approximates the curvature via the Fisher information matrix. However, stochasticity and biases can affect these methods, and damping is recommended to address these issues. The speaker proposes the use of specialized algorithms that can use richer quantities, such as distributions, to make updates, and notes that the fundamental problem of stochasticity remains unsolved. Overall, second-order optimization methods offer a partial solution to the challenges of deep learning.
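The Hessian-free idea, computing a Newton-like step with conjugate gradients while touching the curvature matrix only through matrix-vector products, can be sketched as follows. This is a toy illustration; the `hvp` closure here stands in for a GGN-vector product computed by automatic differentiation in practice:

```python
def cg(hvp, b, n_iters=50, tol=1e-12):
    """Conjugate gradients: solve H x = b using only H-vector products."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                     # residual b - H x, with x = 0 initially
    p = r[:]
    rs = sum(v * v for v in r)
    for _ in range(n_iters):
        hp = hvp(p)
        alpha = rs / sum(pi * hi for pi, hi in zip(p, hp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rs_new = sum(v * v for v in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Toy curvature H = [[3, 1], [1, 2]] and gradient g = [1, 1]:
hvp = lambda v: [3 * v[0] + 1 * v[1], 1 * v[0] + 2 * v[1]]
step = cg(hvp, [-1.0, -1.0])     # Newton-like step solves H d = -g
```

Because CG never materializes H, the same loop scales to networks where the full curvature matrix could never be stored.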
Numerics of ML 13 -- Uncertainty in Deep Learning -- Agustinus Kristiadi
The video discusses uncertainty in deep learning, particularly in the weights of neural networks, and the importance of incorporating uncertainty due to the problem of asymptotic overconfidence, where neural networks give high-confidence predictions for out-of-distribution examples that should not be classified with certainty. It shows how to use second-order quantities, specifically curvature estimates, to obtain uncertainty in deep neural networks, using a Gaussian distribution to approximate the last layer's weights and the Hessian matrix to estimate the curvature of the loss. The video also discusses the Bayesian formalism and Laplace approximations for selecting models and parameters of neural networks.
In the second part of the lecture, Agustinus Kristiadi discusses various ways to introduce uncertainty in deep learning models. One technique involves using linearized Laplace approximations to turn a neural network into a Gaussian model. Another approach is out-of-distribution training, where uncertainty is added in regions that are not covered by the original training set. Kristiadi emphasizes the importance of adding uncertainty to prevent overconfidence in the model and suggests using probabilistic measures to avoid the cost of finding the ideal posterior. These techniques will be explored further in an upcoming course on probabilistic machine learning.
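A minimal sketch of the Laplace idea on a one-parameter logistic-regression model (illustrative assumptions: a single weight, a Gaussian prior, and the standard probit approximation for the predictive; not the lecture's code):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny 1-D logistic regression: p(y=1|x) = sigmoid(w * x), prior w ~ N(0, 10).
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
prior_var = 10.0

def neg_log_post_grad_curv(w):
    g, h = w / prior_var, 1.0 / prior_var
    for x, y in data:
        p = sigmoid(w * x)
        g += (p - y) * x           # gradient of the negative log-posterior
        h += p * (1 - p) * x * x   # curvature (positive for this model)
    return g, h

# Newton's method for the MAP estimate.
w = 0.0
for _ in range(50):
    g, h = neg_log_post_grad_curv(w)
    w -= g / h
w_map = w
_, h_map = neg_log_post_grad_curv(w_map)
post_var = 1.0 / h_map             # Laplace: posterior is approx. N(w_map, 1/h)

def predict(x):
    # Probit approximation to the Gaussian-averaged sigmoid.
    kappa = 1.0 / math.sqrt(1.0 + math.pi / 8.0 * post_var * x * x)
    return sigmoid(kappa * w_map * x)
```

The curvature at the MAP estimate becomes the inverse posterior variance, and the averaged predictive is pulled toward 0.5 relative to the point estimate, which is precisely how the Laplace approximation counteracts overconfidence.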
Numerics of ML 14 -- Conclusion -- Philipp Hennig
Philipp Hennig gives a summary of the "Numerics of Machine Learning" course, emphasizing the importance of solving mathematical problems in machine learning related to numerical analysis, such as integration, optimization, differential equations, and linear algebra. He discusses the complexity of performing linear algebra on a data set and how it relates to the processing unit and disk. Hennig also covers topics such as handling data sets of non-trivial sizes, algorithms for solving linear systems, solving partial differential equations, and estimating integrals. He concludes by acknowledging the difficulty in training deep neural networks and the need for solutions to overcome the stochasticity problem.
In the conclusion of his lecture series, Philipp Hennig emphasizes the importance of going beyond just training machine learning models and knowing how much the model knows and what it doesn't know. He talks about estimating the curvature of the loss function to construct uncertainty estimates for deep neural networks and the importance of being probabilistic but not necessarily applying Bayes' theorem in every case due to computational complexity. Hennig also emphasizes the importance of numerical computation in machine learning and the need to develop new data-centric ways of computation. Finally, he invites feedback about the course and discusses the upcoming exam.
Support Vector Machine (SVM) in 7 minutes - Fun Machine Learning
The video explains Support Vector Machines (SVM), a classification algorithm used for data sets with two classes that draws a decision boundary, or hyperplane, based on the extremes of the data set. It also discusses how SVM can be used for non-linearly separable data sets by transforming them into higher dimensional feature spaces using a kernel trick. The video identifies the advantages of SVM such as effectiveness in high-dimensional spaces, memory efficiency, and the ability to use different kernels for custom functions. However, the video also identifies the algorithm's disadvantages, such as poor performance when the number of features is greater than the number of samples and the lack of direct probability estimates, which require expensive cross-validation.
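The kernel trick the video describes can be illustrated with a kernel perceptron, a simpler relative of the SVM that also works purely through kernel evaluations. With an RBF kernel it separates the XOR data set, which no linear boundary in the original 2-D space can (an illustrative sketch, not the video's code):

```python
import math

def rbf(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def train_kernel_perceptron(X, y, epochs=100):
    """Perceptron in RBF feature space; alpha[i] counts updates on point i."""
    alpha = [0.0] * len(X)
    for _ in range(epochs):
        errors = 0
        for i, xi in enumerate(X):
            score = sum(a * yj * rbf(xj, xi) for a, yj, xj in zip(alpha, y, X))
            if y[i] * score <= 0:          # misclassified (or undecided)
                alpha[i] += 1.0
                errors += 1
        if errors == 0:                    # converged: all points separated
            break
    return alpha

def predict(alpha, X, y, x):
    score = sum(a * yj * rbf(xj, x) for a, yj, xj in zip(alpha, y, X))
    return 1 if score > 0 else -1

# XOR: not linearly separable in 2-D, but separable in RBF feature space.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [-1, 1, 1, -1]
alpha = train_kernel_perceptron(X, y)
```

As in an SVM, the decision function depends on the data only through kernel values, so the high-dimensional feature space is never constructed explicitly.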
'The Deep Learning Revolution' - Geoffrey Hinton - RSE President's Lecture 2019
Geoffrey Hinton, known as the "Godfather of Deep Learning," discusses the history and evolution of deep learning and neural networks, the challenges and exciting possibilities of using deep learning to create machines that can learn in the same way as human brains, and the tricks and techniques that have made backpropagation more effective. He also describes the success of neural networks in speech recognition and computer vision, the evolution of neural networks for computer vision and unsupervised pre-training, and their effectiveness in language modeling and machine translation. He finishes by highlighting the value of reasoning by analogy and discusses his theory of "capsules" and wiring knowledge into a model that predicts parts from the whole.
Geoffrey Hinton, a pioneer in deep learning, delivers a lecture advocating for the integration of associative memories, fast-weight memories, and multiple timescales into neural networks to allow for long-term knowledge and temporary storage, which is necessary for real reasoning. Additionally, he discusses the balancing act between prior beliefs and data, the potential of unsupervised learning, the efficiency of convolutional nets in recognizing objects with the incorporation of viewpoint knowledge and translational equivariance, and the need to combine symbolic reasoning with connectionist networks, like transformer networks. He also addresses the issue of unconscious biases in machine learning and believes that they can be fixed more easily than human bias by identifying and correcting for biases. Lastly, he stresses the need for more funding and support for young researchers in the field of AI.
How ChatGPT actually works
ChatGPT is a machine learning model whose training incorporates human feedback, which helps it identify and refuse harmful content in chat conversations. The video outlines its architecture and shortcomings, and provides recommended readings.
Machine Learning From Scratch Full course
Implementing machine learning models yourself is one of the best ways to master them. Despite seeming like a challenging task, it's often easier than you might imagine for most algorithms. Over the next 10 days, we'll be using Python and occasionally NumPy for specific calculations to implement one machine learning algorithm each day.
You can find the code in our GitHub repository: https://github.com/AssemblyAI-Examples/Machine-Learning-From-Scratch
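As a taste of the from-scratch style, here is a minimal k-nearest-neighbors classifier in plain Python (an illustrative example, not taken from the linked repository):

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    neighbors = sorted(zip(X_train, y_train),
                       key=lambda p: euclidean(p[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Two well-separated toy clusters.
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_train = ["a", "a", "a", "b", "b", "b"]
label = knn_predict(X_train, y_train, (1.1, 1.0))   # → "a"
```

There is no training step at all: the "model" is the stored data, which is why KNN is usually the first algorithm in from-scratch courses like this one.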