Machine Learning and Neural Networks

 

Conference: Jensen Huang (NVIDIA) and Ilya Sutskever (OpenAI). AI Today and Vision of the Future




NVIDIA CEO Jensen Huang and OpenAI co-founder Ilya Sutskever discuss the origins and advances of artificial intelligence (AI) at a conference. Sutskever explains how the promise of deep learning became clear to him, how unsupervised learning through compression led to the discovery of a neuron that corresponded to sentiment, and how pre-training a neural network led to instructing and refining it through human and AI collaboration. They also discuss the advances and limitations of GPT-4 and multi-modality learning, as well as the role of synthetic data generation and of improving the reliability of AI systems. Even though today's networks build on the same concepts as those of 20 years ago, both marvel at the progress made in AI research.

  • 00:00:00 In this section, Jensen Huang, the CEO of NVIDIA, praises Ilya Sutskever, the co-founder of OpenAI, for his achievements in the field of artificial intelligence. He asks Ilya about his intuition around deep learning and how he knew it was going to work. Ilya explains that he was drawn to artificial intelligence out of curiosity about consciousness and its impact, and it seemed that progress in AI would genuinely help with that. He adds that in 2002-2003, computers could not learn anything, and it wasn't even clear whether it was possible in theory, but finding Geoffrey Hinton, who was working on neural networks, gave him hope.

  • 00:05:00 In this section, Sutskever discusses the origins of AlexNet and how the idea of using supervised learning to build a deep and large neural network became clear to him. He explains that the machine learning field was not looking at neural networks at the time and was instead using methods that were theoretically elegant but could not represent a good solution. Sutskever also mentions a breakthrough optimization method by another grad student which proved that large neural networks could be trained. It was then clear that if a large convolutional neural network were trained on the ImageNet data set, it must succeed. Sutskever also talks about the appearance of GPUs in the lab and how Alex Krizhevsky was able to program fast convolutional kernels and train the neural network on the data set, which led to breaking the computer vision record by a wide margin. The significance of this breakthrough was that the data set was so obviously hard and so clearly outside the reach of classical techniques.

  • 00:10:00 In this section, Jensen Huang and Ilya Sutskever discuss the early days of OpenAI and their initial ideas about how to approach intelligence. Back in 2015-2016, the field was still in its infancy, with far fewer researchers and much less understood. OpenAI's first big idea was the concept of unsupervised learning through compression, which was an unsolved problem in machine learning at the time. Sutskever believed that really good compression of data would lead to unsupervised learning, allowing the extraction of all the hidden secrets that exist within it. This led to several works at OpenAI, including the sentiment neuron, in which a neuron was discovered inside an LSTM that corresponded to the sentiment of the text.

  • 00:15:00 In this section, Ilya Sutskever discusses the concept of unsupervised learning and the importance of predicting the next token in a sequence as a worthwhile goal for learning a representation. He mentions that the hard part of unsupervised learning was not where to get the data but rather understanding why training neural nets to predict the next token was worth doing at all. Scaling for performance improvement was also an important factor in their work, and reinforcement learning was another crucial area of focus, particularly in training a reinforcement learning agent to play the real-time strategy game Dota 2 against the best players in the world.

  • 00:20:00 In this section, Ilya Sutskever explains the process of pre-training a large neural network to predict the next word in different texts from the internet, which leads to the learning of a compressed abstract usable representation of the world. However, pre-training doesn't specify the desired behavior that we expect from the neural network, and this is where the second stage of fine-tuning and reinforcement learning from human and AI collaboration comes in. The second stage is essential because it's where we communicate with the neural network and instruct it on what to do and not do.

  • 00:25:00 In this section, the speakers discuss advances in AI technology such as GPT-4, which became the fastest-growing application in the history of humanity just a few months after its launch. GPT-4 is an improvement upon ChatGPT, with better accuracy in predicting the next word in text, which leads to greater understanding of the text. Through continued research into fidelity, the AI has become more reliable and precise in following intended instructions, and conversation can be used to resolve ambiguity until the AI understands the user's intent. Furthermore, GPT-4's improved performance in many areas, such as SAT scores, GRE scores, and bar exams, is remarkable and noteworthy.

  • 00:30:00 In this section, the speakers discuss the current limitations and potential for improvement in the reasoning capabilities of neural networks, specifically GPT-4. While neural networks demonstrate some reasoning skills, reliability remains a major obstacle to their usefulness. The speakers suggest that asking the neural network to think out loud, along with ambitious research plans, could improve reliability and accuracy. Currently, GPT-4 does not have a built-in retrieval capability, but it excels as a next-word predictor and can consume images.

  • 00:35:00 In this section, Jensen Huang and Ilya Sutskever discuss multi-modality learning and its importance. They explain that multi-modality learning, which involves learning from both text and images, helps neural networks better understand the world, since humans are visual animals. Multi-modality learning also lets neural networks learn more about the world by providing additional sources of information. They argue that although some things, such as color, are easier to learn by seeing, text-only neural networks can still pick up such information through exposure to trillions of words; it is simply harder to learn from text alone.

  • 00:40:00 In this section, Sutskever and Huang discuss the importance of different data sources in AI learning, including visuals and audio. They touch on the idea of multi-modality and how combining different data sources can be extremely helpful in learning about the world and communicating visually. Sutskever also mentions a paper that suggests the world will eventually run out of tokens to train on and how AI generating its own data could be a possible solution to that problem.

  • 00:45:00 In this section, the speakers discuss the role of synthetic data generation in AI training and self-teaching. While the availability of existing data is not to be underestimated, the possibility of AI generating its own data for learning and problem-solving is a future possibility. The focus in the near future will be on improving the reliability of AI systems, so that they can be trusted for important decision-making. The potential of AI models, such as GPT-4, to reliably solve math problems and produce creative content is exciting, but there is still work to be done to improve their accuracy and clarity in understanding and responding to user intent.

  • 00:50:00 In this section, Jensen Huang and Ilya Sutskever discuss the surprising success of neural networks in AI today. Although the underlying concept is the same neural network as 20 years ago, it has become more serious and intense, trained on larger data sets in different ways with the same fundamental training algorithm. Sutskever's seminal works on AlexNet and GPT at OpenAI are remarkable achievements, and Huang admires his ability to break down the problem and describe the state of the art of large language models. The two catch up and marvel at the progress made in the field of AI.
  • 2023.03.23
  • www.youtube.com
 

It’s Time to Pay Attention to A.I. (ChatGPT and Beyond)




The video discusses the development of artificial intelligence (AI), and how it is changing the way we work and live. Some people are excited about the potential of AI, while others are worried about its potential implications. The speaker also provides a brief summary of a recent podcast episode.

  • 00:00:00 ChatGPT is an AI program released in 2022 that generates text by predicting what the next word in a sentence will be, based on what it has seen in its massive internet data set. ChatGPT is an improved version of GPT-3, which OpenAI calls GPT-3.5. The main difference is that human feedback was added during the training process, in what the video calls supervised reinforcement learning. In essence, during training, humans ranked multiple versions of the AI's responses by quality from best to worst, and the AI is digitally rewarded when it improves the model. Budding entrepreneurs are watching ChatGPT to gauge what the next big thing might be, and the OpenAI CEO offers some interesting insights into the future of the industry.

  • 00:05:00 This section looks at services built on ChatGPT that are designed to make it easier for customers to file complaints, cancel subscriptions, and more. Additionally, ChatGPT can form opinions on very specific topics, something that no search engine can do. ChatGPT is also said to be good at coding, a skill not commonly thought of as one that AI can take on. While ChatGPT has many useful applications, it is still in its early stages and has a long way to go before it can be considered a truly revolutionary technology. Nevertheless, its potential implications are worth considering, and it is likely to only become more important in the future.

  • 00:10:00 ChatGPT is a chatbot capable of "speaking" in a human-like way, and it has been used to probe the ethical boundaries set by OpenAI. It is noted that ChatGPT can be unpredictable and unstable, making it difficult to control, and that it has the potential to reduce the number of workers needed in multiple fields.

  • 00:15:00 The author discusses the potential impacts of automation on the workforce, and how to prepare. He also discusses how AI is rapidly progressing, with some near-future predictions that should be kept in mind by entrepreneurs.

  • 00:20:00 ChatGPT represents a new technological platform on which models of the future will be built, such as models for medicine or computing. A new set of startups will use the platform to tune existing large models into models specific to an industry or use case.

  • 00:25:00 The closing section recaps the video's themes: the development of AI, how it is changing the way we work and live, and the mix of excitement and worry it provokes. The speaker also provides a brief summary of a recent podcast episode.
  • 2022.12.15
  • www.youtube.com
Imagine being able to have a language conversation about anything with a computer. This is now possible and available to many people for the first time with ...
 

The Inside Story of ChatGPT’s Astonishing Potential | Greg Brockman | TED




In one section of the talk, Greg Brockman discusses the role of AI in improving education. He argues that traditional education methods are often inefficient and ineffective, with students struggling to retain knowledge and teachers struggling to teach in a way that engages every student. Brockman suggests that AI could help solve these problems by providing personalized learning experiences for each student. With AI tools, it is possible to monitor student progress in real time, adjusting the curriculum to students' needs and preferences. This could lead to more engaging and efficient learning, allowing students to retain more knowledge and teachers to focus on more important tasks. Brockman also emphasizes the importance of designing AI tools with privacy in mind, ensuring that student data is protected and used only for educational purposes.

  • 00:00:00 In this section, Greg Brockman, a co-founder of OpenAI, demonstrates the capabilities of DALL·E, an image-generation AI tool, working together with ChatGPT. Using the two through a unified language interface, users can generate images and text that achieve their intent, abstract away small details, and check the results by incorporating them into other applications. This new way of thinking about a user interface expands what AI can do on the user's behalf and takes the technology to new heights.

  • 00:05:00 In this section, Greg Brockman explains how the AI is trained to use tools and produce the desired outcome through feedback. The process has two steps: first, an unsupervised learning step in which the AI is shown, in effect, the whole world and asked to predict what comes next in text it has never seen before; second, a human feedback step in which the AI is taught what to do with those skills by trying multiple things while humans reinforce the whole process used to produce the answer. This feedback allows it to generalize and apply the learning to new situations. The AI is also used to fact-check: it can issue search queries and write out its whole chain of thought, making it more efficient to verify any piece of the chain of reasoning.

  • 00:10:00 In this section of the video, Greg Brockman discusses the potential for collaboration between humans and AI in solving complex problems. He shows an example of a fact-checking tool that requires human input to produce useful data for another AI, demonstrating how humans can provide management, oversight, and feedback while machines operate in a trustworthy and inspectable manner. Brockman believes this will lead to solving previously impossible problems, including rethinking how we interact with computers. He demonstrates how ChatGPT, a powerful AI language model, can be used to analyze a spreadsheet of 167,000 AI papers and provide insights through exploratory graphs, showing the potential for AI to assist with data analysis and decision-making.

  • 00:15:00 In this section, Greg Brockman discusses the potential of AI, stating that getting it right will require the participation of everyone in setting the rules and guidelines for its integration into our daily lives. He believes that achieving the OpenAI mission of ensuring that artificial general intelligence benefits all of humanity is possible through literacy and the willingness to rethink the way we do things. Brockman acknowledges that while the technology is amazing, it is also scary, as it requires rethinking everything we currently do. The success of OpenAI's ChatGPT model is due in part to their deliberate choices, confronting reality, and encouraging collaboration among diverse teams. Brockman also attributes the emergence of new possibilities to the growth of language models and the principle of emergence, where many simple components can lead to complex emergent behaviors.

  • 00:20:00 In this section of the video, Greg Brockman discusses the astonishing potential of ChatGPT's ability to learn and predict, even in areas that were not explicitly taught to the machine. However, he notes that while the machine can handle adding 40-digit numbers, it will often get an addition problem wrong when presented with a 40-digit number and a 35-digit number. Brockman also emphasizes the importance of engineering quality with machine learning, rebuilding the entire stack to ensure every piece is properly engineered before doing predictions. He acknowledges that scaling up such technology could lead to unpredictable outcomes, but believes in deploying incremental change to properly supervise and align the machine's intent with ours. Ultimately, Brockman believes that with proper feedback and integration with humans, the journey to truth and wisdom with AI is possible.

  • 00:25:00 In this section, Greg Brockman addresses concerns about the responsibility and safety implications of releasing artificial intelligence (AI) like GPT without proper guardrails. He explains that the default plan of building in secret and then hoping safety is properly executed is terrifying and doesn't feel right. Instead, he argues that the alternative approach is to release the AI and allow people to give input before they become too powerful. Brockman shares a story of contemplating whether he would want the technology to be 5 years or 500 years away, concluding that it's better to approach this right with collective responsibility and provide guardrails for the AI to be wise rather than reckless.
  • 2023.04.20
  • www.youtube.com
In a talk from the cutting edge of technology, OpenAI cofounder Greg Brockman explores the underlying design principles of ChatGPT and demos some mind-blowin...
 

MIT Deep Learning in Life Sciences - Spring 2021




The "Deep Learning in Life Sciences" course applies machine learning to various life sciences tasks and is taught by a researcher in machine learning and genomics with a teaching staff of PhD students and undergraduates from MIT. The course covers machine learning foundations, gene regulatory circuitry, variation in disease, protein interactions and folding, and imaging using TensorFlow through Python in a Google Cloud platform. The course will consist of four problem sets, a quiz, and a team project, with mentoring sessions interspersed to aid students in designing their own projects. The instructor emphasizes the importance of building a team with complementary skills and interests and provides various milestones and deliverables throughout the term. The course aims to provide real-world experience, including grant and fellowship proposal writing, peer review, yearly reports, and developing communication and collaboration skills. The speaker discusses the differences between traditional AI and deep learning, which builds an internal representation of a scene based on observable stimuli, and emphasizes the importance of deep learning in the life sciences due to the convergence of training data, compute power, and new algorithms.

The video is an introductory lecture on deep learning in life sciences, explaining the importance of machine learning and deep learning in the exploration of the complexity of the world. The talk focuses on the concept of Bayesian inference and how it plays a crucial role in classical and deep machine learning along with the differences between generative and discriminative approaches to learning. The lecture also highlights the power of support vector machines, classification performance, and linear algebra for understanding networks across biological systems. The speaker notes that the course will cover various topics in deep learning, including regularization, avoiding overfitting, and training sets. The lecture concludes by addressing questions related to the interpretability of artificial neurons and deep networks for future lectures.

  • 00:00:00 In this section, the speaker introduces the course, Deep Learning in Life Sciences, and explains its focus on applying machine learning to tasks in the life sciences, including gene regulation, disease, therapeutic design, medical imaging, and computational biology. The course meets twice a week with optional mentoring sessions on Fridays and is taught by the speaker, who is a researcher in machine learning and genomics, and a teaching staff consisting of PhD students and undergraduates from MIT. The speaker also provides links to last year's coursework pages with recordings of all the lectures.

  • 00:05:00 In this section of the transcript, the instructor introduces the foundations that the course will build upon such as calculus, linear algebra, probability and statistics, and programming. The course will also have an introductory biology foundation that students will be able to build on. The instructor then details the grading breakdown for the course, which includes problem sets, a quiz, a final project, and participation. The section concludes with an explanation of why deep learning is important in the life sciences due to the convergence of inexpensive large data sets, foundational advances in machine learning methods, and high performance computing, which has completely transformed the scientific field.

  • 00:10:00 In this section, the speaker discusses the importance and benefits of computational biology. Students provide answers to the question of why computational biology is important, including the handling of large amounts of data, the ability to speed up discovery, creating mathematical models for complex processes, understanding patterns from biological data, and the use of visualization to extract meaningful patterns. The speaker emphasizes the existence of underlying patterns and principles in biology that can be understood through computation and encourages students to explore the different courses offered in the department and across departments.

  • 00:15:00 In this section, the speaker discusses how computational methods can help not only in applied research but also in generating new foundational understanding in basic biological research. They emphasize that while the computational methods used may not always give perfect results, they can provide important approximations that may be even more interesting. Additionally, the speaker shows how computational biology allows for the integration of various research areas into a more comprehensive understanding of complex diseases that affect multiple organs. Finally, they mention the use of computational tools to simulate long-term temporal processes like disease transmission and disease progression.

  • 00:20:00 In this section of the video, the speaker discusses the role of computation in the life sciences, specifically how it can simulate the progression of processes over time, shortening the discovery and development time for drugs and treatments. The use of deep learning is also becoming more prevalent for designing drugs and creating synthetic test data. The speaker also highlights the importance of studying genetic diversity across demographics for true equity in genetic data sets. Life itself is digital and the challenge in understanding biology is to extract signals from noise and recognize meaningful patterns in data sets.

  • 00:25:00 In this section, the course instructor outlines the main tasks and challenges that will be covered in the course, including machine learning foundations, gene regulatory circuitry, variation in disease, protein interactions and folding, and imaging. The course will utilize problem sets to introduce students to each of these frontiers, and students will be using TensorFlow through Python in a programming environment within the Google Cloud platform. The first problem set will focus on character recognition, followed by using these techniques to analyze genomic data and recognize sequence patterns associated with gene regulatory events.

  • 00:30:00 In this section, the instructor discusses the structure and goals of the course, which will consist of four problem sets, a quiz, and a team project throughout the duration of the class. The instructors emphasize that the course will be interactive and encourage students to sign up to be scribes for lectures of their interest, allowing them to invest in that particular field. Students will also have the opportunity to interact with guest lecturers who are active in the field of deep learning in life sciences, and team projects will be built on discussions for research project directions, giving students the opportunity to apply their new skills to solve practical problems. Furthermore, the instructors mention how the field of deep learning in life sciences is only ten years old, and guest lecturers will introduce key papers in the field, making the course quite exciting and interactive for students.

  • 00:35:00 In this section, the course instructor discusses how the course will have mentoring sessions interspersed with the modules to aid students in designing their own projects, coming up with ideas and balancing them with their partners and mentors. These mentoring sessions will feature staff members or researchers who are active in the relevant areas, allowing students to bounce ideas off them and prepare to become active researchers in computational biology. The instructor also emphasizes the intangible aspects of the education that the course will help with, including crafting a research proposal, working in complementary skill sets, receiving peer feedback, and identifying potential flaws in peers' proposals. The course will have a term project mirroring these intangible tasks in real life. Students are also encouraged to meet their peers, form teams early with complementary expertise, and submit a profile and a video introduction.

  • 00:40:00 In this section, the instructor discusses the various milestones established for the course to ensure sufficient planning, feedback, and finding of projects that match students' skills and interests. He mentions the importance of building a team with complementary skills and interests, providing links to last year's projects and recent papers for inspiration, and establishing periodic mentoring sessions with senior students, postdocs, and course staff. The course will also include group discussions on various topics and aspects of peer review to encourage critical thinking about proposals and provide feedback and suggestions. The instructor emphasizes the real-world experience that will be gained through this course, including grant and fellowship proposal writing, peer review, yearly reports, and developing communication and collaboration skills. The instructor invites students to meet with each other during various breakout sessions throughout the course and provides an overview of the milestones and deliverables that will be due throughout the term.

  • 00:45:00 In this section, on the structure of the course and the projects, the instructor provides an overview of the different modules and papers available for each topic. In addition, the timeline for the course is outlined, including the due dates for project proposals and end-to-end pipeline demos. The instructor emphasizes the importance of having data and tools early in the course to avoid issues later on. Mid-course reports and a lecture on presenting are also mentioned, as well as the due dates for final projects and presentations. Guest lecturers who have authored some of the papers may also be invited.

  • 00:50:00 In this section, the speaker introduces the resources and support available for the course, including mentoring and feedback labs. They also share the results of an introductory survey revealing the diverse backgrounds of the students taking the course, with a majority from MIT's Course 6 (EECS) and Course 20 (Biological Engineering). The speaker spends around 10 minutes introducing some of the machine learning topics and biology that will be covered in the course, highlighting the importance of deep learning and its various applications. They also explain the difference between artificial intelligence, machine learning, and deep learning.

  • 00:55:00 In this section, the lecturer discusses the differences between traditional artificial intelligence (AI) approaches and deep learning. While traditional AI relies on human experts to code rules and scoring functions, deep learning aims to learn intuition and rules on its own, without explicit human guidance. The lecturer uses the example of chess to illustrate these differences and notes that deep learning has revolutionized AI by enabling machines to navigate complex environments such as natural scenes and real-world situations. The lecturer identifies convergence of training data, compute power, and new algorithms as the three key pillars of deep learning, and explains that machines build an internal representation of a scene based on observable stimuli.

  • 01:00:00 In this section, the speaker explains that machine learning and deep learning involve building representations of the complexity of the world by analyzing observations and data. Traditional machine learning uses simple representations, while deep learning uses hierarchical representations. Generative models let one express the forward probability of an observation given the hidden state of the world, while Bayes' rule lets one estimate the posterior probability that it is, say, a particular season given the observation. This means going from the probability of the data given a hypothesis to the probability of a hypothesis given the data, through the product of the likelihood and the prior probability; the marginal probability of the data, obtained by summing over all hypotheses, normalizes the result (see the formula below).
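
For reference, Bayes' rule as described here, written in standard notation (with h a hypothesis such as the season and D the observation):

```latex
P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)},
\qquad\text{where}\qquad
P(D) = \sum_{h'} P(D \mid h')\,P(h').
```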

  • 01:05:00 In this section, the speaker explains the concept of Bayesian inference and its role in classical and deep machine learning. Bayesian inference involves having a generative model for the world and then inferring something about that model, which is especially helpful in supervised learning where labels for some points exist, and classification of objects based on features can be achieved. In traditional machine learning, a major task was feature engineering, or selecting the right features from a dataset, whereas in deep learning, features are learned automatically. Clustering is a form of unsupervised learning where data sets can be learned and represented, and Bayesian inference can be used to iteratively estimate the parameters of a generative model for the data set to improve the features of data.

  • 01:10:00 In this section of the video, the instructor discusses the differences between generative and discriminative approaches to learning, highlighting how discriminative learning is focused on learning the best separator between data elements instead of trying to capture the whole distribution of data. The lecture also touches on the power of support vector machines, classification performance, and linear algebra for understanding networks across biological systems. The instructor notes that the class will focus on deep learning, specifically the building of simple and more abstract features through layers to classify various objects and concepts about the world. Finally, the lecture emphasizes that not all learning is deep and reviews the historical approaches to artificial intelligence and machine learning.

  • 01:15:00 In this section, the speaker discusses how the human brain processes images and recognizes objects, using layers of neurons that learn abstract layers of inferences. He compares this process to the architecture of neural networks used in deep learning and AI, which have been ported from the biological space to the computational space. The course will cover various topics in deep learning, including regularization, avoiding overfitting, training sets, and testing sets. The speaker also mentions autoencoders for clamping down representations to simpler ones and supervised algorithms functioning as unsupervised methods. Additionally, he welcomes attendees to the course and highlights the importance of the biological aspects of the course.

  • 01:20:00 In this section, the speaker addresses several questions related to the interpretability of artificial neurons and deep networks, which will be covered in detail in a future lecture. They also remind the students to fill out their profiles and upload their video introductions.
 

Machine Learning Foundations - Deep Learning in Life Sciences Lecture 02 (Spring 2021)




This lecture covers the foundations of machine learning, introducing concepts such as the training and test sets, types of models such as discriminative and generative, evaluating loss functions, regularization and overfitting, and neural networks. The lecturer goes on to explain the importance of hyperparameters, evaluating accuracy in life sciences, correlation testing, and probability calculations for model testing. Finally, the basics of deep neural networks and the structure of a neuron are discussed, highlighting the role of non-linearity in learning complex functions.

In the second part of the lecture, activation functions in deep learning are explained, along with the learning process of adjusting weights to match the output function using partial derivatives to tune weight updates and minimize errors, which is the foundation of gradient-based learning. Backpropagation is introduced as a method for propagating derivatives through a neural network in order to adjust the weights. Various methods for optimizing weights across multiple layers are discussed, including stochastic gradient descent, along with the concepts of model capacity and the VC dimension. The lecture also examines a model's effective capacity on a graph, the bias-variance trade-off, and regularization techniques such as early stopping and weight decay. The importance of finding the right balance of complexity is emphasized, and students are encouraged to introduce themselves to their classmates.

  • 00:00:00 In this section, the lecturer introduces the foundations of machine learning and its definition. Machine learning is the process of converting experience into expertise or knowledge, and it uses computational methods to accurately predict future outcomes using the uncovered patterns in data. The goal of machine learning is to develop methods that can automatically detect patterns in data and use them to make good predictions of the output. The lecturer also explains the concept of the training set, which is used to fit the model parameters and architecture, and the test set, which evaluates the performance and generalization power of the model. Finally, the lecturer touches on the importance of regularization in controlling the parameters and model complexity to avoid overfitting.

  • 00:05:00 In this section of the lecture, the instructor introduces the different types of objects used in machine learning, such as scalars, vectors, matrices, and tensors. The input space is defined as individual examples of these objects, where a particular data set is used with specific indices and features. The label space is also introduced, with the predicted label denoted as y hat. The goal of machine learning is to evaluate features extracted from input data and compute an output result using a function that translates the input into the output. The instructor also explains the difference between training and test sets and how the function takes in input parameters and computes an output using weight vectors and biases.

  • 00:10:00 In this section, the speaker explains how weights and a bias term shape the output of a linear function, with the bias allowing the line to shift away from the origin. The transformation function can be seen as a model of the world that makes inferences and classifications. There are two types of models: discriminative models, which differentiate between two classes, and generative models, which try to model the joint distribution of multiple classes. Linear regression is just one type of machine learning, with regression being a common task besides classification. A minimal numeric sketch of such a linear model follows below.
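
A minimal numeric sketch of the linear model described above; the weights, bias, and input values are made up for illustration:

```python
import numpy as np

# A linear model computes y_hat = w . x + b: a weighted sum of the
# input features plus a bias term that shifts the line off the origin.
w = np.array([0.5, -1.2, 2.0])   # one weight per input feature
b = 0.1                          # bias term

def predict(x):
    """Weighted sum of the features plus the bias."""
    return np.dot(w, x) + b

x = np.array([1.0, 0.0, 3.0])
print(predict(x))  # 0.5*1.0 + (-1.2)*0.0 + 2.0*3.0 + 0.1 = 6.6
```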

  • 00:15:00 In this section, the lecturer discusses the different types of machine learning, including supervised, semi-supervised, unsupervised, and reinforcement learning. The focus is on supervised learning and the various types of outputs, such as multivariate regression, binary and multi-class classification, and multi-label classification. The lecturer also talks about objective functions, which are used to optimize machine learning models during training and can take the form of loss, cost, or error functions. Different types of loss functions are presented, including zero-one loss, cross-entropy loss, and hinge loss, and the section concludes with a discussion of mean squared error and mean absolute error for regression; illustrative implementations of these losses are sketched below.
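
A minimal sketch of the loss functions named above, in plain NumPy; the function names and the choice of {-1, +1} labels for the hinge loss are illustrative assumptions, not the lecture's code:

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    # 1 for every misclassified example, 0 otherwise, averaged
    return np.mean(y_true != y_pred)

def hinge_loss(y_true, score):
    # y_true in {-1, +1}; penalizes scores on the wrong side of the margin
    return np.mean(np.maximum(0.0, 1.0 - y_true * score))

def mse(y_true, y_pred):
    # mean squared error: quadratic penalty for deviations
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # mean absolute error: linear penalty, less sensitive to far outliers
    return np.mean(np.abs(y_true - y_pred))
```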

  • 00:20:00 In this section, the lecturer introduces the concepts of L1 and L2 regularization, which impose linear and quadratic penalties, respectively, for deviating from a predicted value. They discuss how these can be used to penalize far outliers and to avoid overfitting by placing constraints on the parameters. The lecturer then explores loss functions for classification tasks, such as binary cross-entropy loss, which weighs everything by the probability of a value's occurrence, and categorical cross-entropy loss, which takes an information-based approach. Additionally, they touch on the softmax function for mapping scores to a zero-to-one range. These concepts all factor into the maximum likelihood estimator and posterior probabilities in a Bayesian setting.
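
A sketch of the softmax, the categorical cross-entropy, and L1/L2 penalties described here, under illustrative names (a hedged NumPy version, not the course's code):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; maps scores into (0, 1)
    e = np.exp(z - np.max(z))
    return e / e.sum()

def categorical_cross_entropy(y_onehot, z):
    # information-based loss: -log of the probability assigned to the true class
    p = softmax(z)
    return -np.sum(y_onehot * np.log(p + 1e-12))

def regularized_loss(loss, w, l1=0.0, l2=0.0):
    # L1 adds a linear penalty |w|, L2 a quadratic penalty w^2,
    # constraining the parameters to help avoid overfitting
    return loss + l1 * np.sum(np.abs(w)) + l2 * np.sum(w ** 2)
```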

  • 00:25:00 In this section, the lecture explains the indicator formula used throughout the class, which is one if an example belongs to a specified class and zero otherwise. The lecture also discusses the structure of the problem, including the input data, the weights, and a bias term. The optimizer is built around the discrepancies measured by a loss function, such as mean squared error or mean absolute error, and the weights are trained to reduce that loss. The lecture also introduces the idea of risk, which accounts for the cost associated with particular predictions, and explains how to use risk to optimize the objective function. It then describes how to update weights based on the loss function and how to use training and testing sets to evaluate the model.

  • 00:30:00 In this section, the instructor explains the concept of overfitting and underfitting in machine learning. He describes how, as the training set improves, the model becomes better at predicting data in the validation set as well. However, after a certain point, the model starts to overfit the training set, and the error on the validation set begins to increase. Therefore, the instructor emphasizes the importance of splitting the data into training, validation, and test sets, such that the validation set is used to tune the hyperparameters and the test set to evaluate the fully trained model's performance.
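
A minimal sketch of the three-way split described above; the fractions and the helper name are illustrative assumptions:

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle and split: the training set fits the parameters, the
    validation set tunes hyperparameters, and the test set is used
    once to evaluate the fully trained model."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```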

  • 00:35:00 In this section, the speaker discusses how to evaluate the accuracy of machine learning models in the context of the life sciences. They explain the basic confusion-matrix counts of true positives, true negatives, false positives, and false negatives, and the evaluation metrics built from them, such as precision, specificity, recall, and accuracy, stressing the importance of considering the balance of the dataset. They then introduce the receiver operating characteristic (ROC) curve and how it captures the sensitivity-specificity trade-off of a classifier. The precision-recall curve is mentioned as a better option for very unbalanced datasets. The two curves are complementary and capture different aspects of a model's performance.
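
The metrics named above can all be computed from the four confusion-matrix counts; a small sketch (the function name is illustrative):

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts."""
    precision   = tp / (tp + fp)            # of predicted positives, how many are real
    recall      = tp / (tp + fn)            # sensitivity / true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, specificity, accuracy
```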

  • 00:40:00 In this section, the speaker discusses the concept of correlation and how it can be used to evaluate regression predictors. They explain that correlation measures the relationship between the values being predicted and the actual values, and that there are different types of correlation tests, such as Pearson correlation and Spearman rank correlation. The speaker also mentions the significance of correlation tests and how they can be used to evaluate the accuracy of the predictor. They explain the use of statistical tests such as the Student's t-distribution and binomial tests to determine the probability of getting a certain correlation value and whether it deviates significantly from the expected value.
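
A short sketch of how the two correlation tests mentioned above might be run with SciPy; the data values are made up for illustration:

```python
from scipy.stats import pearsonr, spearmanr

# Pearson measures linear correlation; Spearman rank correlation captures
# any monotonic relationship. Both return (statistic, p-value).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.3]

r, p_r = pearsonr(y_true, y_pred)
rho, p_rho = spearmanr(y_true, y_pred)
print(f"Pearson r={r:.3f} (p={p_r:.3g}), Spearman rho={rho:.3f} (p={p_rho:.3g})")
```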

  • 00:45:00 In this section, the speaker discusses the probability of a classifier making the correct choice at random, calculating the probability of k observations being classified correctly just by chance using the hypergeometric distribution. He also emphasizes that when testing multiple hypotheses, you need to adjust your significance threshold, using either a strict Bonferroni correction or the less stringent Benjamini-Hochberg correction. The speaker warns of the danger of finding correlations almost anywhere with enough data and underscores that a lack of correlation does not imply a lack of relationship. The section ends with a stretch break before the speaker moves on to neural networks.
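
A sketch of these calculations with SciPy and NumPy; all counts and p-values below are made-up illustrations:

```python
import numpy as np
from scipy.stats import hypergeom

# P(k or more correct classifications by chance): population of M examples,
# n of which are "successes", with N drawn; sf(k-1) gives P(X >= k).
M, n, N, k = 1000, 100, 50, 12
p_enrich = hypergeom.sf(k - 1, M, n, N)

# Multiple-testing adjustments for a vector of p-values.
pvals = np.array([0.001, 0.009, 0.02, 0.04, 0.30])
bonferroni = np.minimum(pvals * len(pvals), 1.0)   # strict family-wise control

# Benjamini-Hochberg (FDR): adjusted p-values, returned in ascending order.
order = np.argsort(pvals)
m, ranks = len(pvals), np.arange(1, len(pvals) + 1)
bh_sorted = np.minimum.accumulate((pvals[order] * m / ranks)[::-1])[::-1]
bh_sorted = np.minimum(bh_sorted, 1.0)
```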

  • 00:50:00 In this section of the lecture, the instructor introduces the concept of deep neural networks and their roots in the hierarchy of abstraction in learning. The instructor describes the layers of the network, starting with the input layer and progressing through several hidden layers that learn increasingly complex features. The concept of convolutional filters is briefly mentioned but will be covered more in-depth in a later lecture. The instructor also notes that these networks are inspired by the biological structure of neurons in the human brain.

  • 00:55:00 In this section, the lecturer explains the basics of a deep learning neural network. He describes the structure of a neuron as a computational construct that receives weighted inputs, crosses a threshold, and then sends identical outputs to its descendants. The learning in a neural network is embedded in these weights, and the function being computed is a transformed probability based on the inputs received. The lecturer emphasizes that neural networks became powerful when they moved beyond linear functions and introduced a non-linearity, which lets them learn almost any function. The original non-linearity was the sigmoid unit, representing a neuron that either fires at one or remains at zero until a threshold is crossed. Beyond that, the softplus unit was introduced to approximate more complex functions.

  • 01:00:00 In this section of the lecture, the speaker explains the concept of activation functions in deep learning and how they let neurons fire in response to inputs. He introduces various activation functions, such as the softplus, the sigmoid, and the rectified linear unit (ReLU), among others. The speaker also discusses the learning process of adjusting the weights to match the output function, and the role of partial derivatives in tuning weight updates to minimize errors. This, he explains, is the foundation of gradient-based learning.
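
The three activations named above, in NumPy; without such non-linearities, stacked layers would collapse into a single linear transformation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # smooth 0-to-1 "firing" threshold

def softplus(x):
    return np.log1p(np.exp(x))        # smooth approximation of the ReLU

def relu(x):
    return np.maximum(0.0, x)         # rectified linear unit
```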

  • 01:05:00 In this section of the lecture, the concept of backpropagation is introduced as a method for propagating derivatives through a neural network in order to adjust weights. The chain rule is used to compute the derivative of each layer as a function of the previous layer, allowing for adjustments to be made at each level. Additional bells and whistles can be added to this process, such as a learning rate to scale the gradient, weight decay to prevent large weights, and consideration of the delta at the previous time step to determine the direction and amount of change needed.
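
A sketch of one weight update combining the "bells and whistles" mentioned above; in stochastic gradient descent (discussed next) the gradient would come from a randomly sampled mini-batch. The function name and default values are illustrative:

```python
def sgd_update(w, grad, velocity, lr=0.01, weight_decay=1e-4, momentum=0.9):
    """One gradient-based update: the learning rate scales the gradient,
    weight decay shrinks large weights, and the previous step's delta
    (momentum) influences the direction and size of the change."""
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    w = w + velocity
    return w, velocity
```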

  • 01:10:00 In this section, the speaker explains the different methods for optimizing weights in multiple layers of deep learning models. These methods include using the chain rule to compute the derivatives of the output with respect to each weight, as well as stochastic gradient descent which randomly samples a subset of training data to update the weights. Additionally, the speaker discusses the concept of model capacity and the VC dimension which describes the overall modeling ability of a deep learning model based on both its parameters and the types of functions it can compute. The capacity of a non-parametric model is defined by the size of the training set.

  • 01:15:00 In this section, the concept of k-nearest neighbor and its generalizability are introduced. While k-nearest neighbor is a good baseline method, it may have poor generalization power because it does not learn the function that separates data sets, making it difficult to perform well on previously unseen inputs. The effectiveness of a model's capacity on a graph is also discussed, where the x-axis displays the effective number of parameters or dimensions, and increasing this number may lead to better matches to the data, but with a higher generalization error. The bias or how well one matches given data, and variance or how well one can match future data sets, are also introduced. Finally, models can be regularized by trading off parameter regularization and model complexity regularization, which can be demonstrated by comparing data sets with different levels of neuron complexity.

  • 01:20:00 In this section of the lecture, the instructor discusses various techniques to add regularization to neural networks, such as early stopping, weight decay, adding noise as a regularizer, and Bayesian priors. The concept of capacity is also discussed, which depends on the activation functions and the number of weights. The instructor emphasizes that the trade-off between more layers, wider layers, and more connections is an art rather than a theory, and it is essential to strike the right balance of complexity. The instructor encourages students to introduce themselves positively to their classmates and take the time to meet and learn about their profiles and videos.
  • 2021.02.23
  • www.youtube.com
6.874/6.802/20.390/20.490/HST.506 Spring 2021 Prof. Manolis KellisDeep Learning in the Life Sciences / Computational Systems BiologyPlaylist: https://youtube...
 

CNNs Convolutional Neural Networks - Deep Learning in Life Sciences - Lecture 03 (Spring 2021)




This video lecture covers the topic of convolutional neural networks (CNNs) in deep learning for life sciences. The speaker discusses the principles of the visual cortex and how they relate to CNNs, including the building blocks of the human and animal visual systems, such as the basic building blocks of summing and weighing and the bias activation threshold of a neuron. They explain that CNNs use specialized neurons for low-level detection operations and layers of hidden units for abstract concept learning. The lecture also covers the role of convolution and pooling layers, the use of multiple filters for extracting multiple features, and the concept of transfer learning. Finally, non-linearities and the use of padding to address edge cases in convolution are also discussed. Overall, the lecture highlights the power and potential of CNNs in a variety of life science applications.

The second part of the lecture covers various concepts related to convolutional neural networks (CNNs). In the lecture, the speaker talks about the importance of maintaining input size in CNNs, data augmentation as a means of achieving invariance to transformations, and different CNN architectures and their applications. The lecture also covers challenges associated with learning in deep CNNs, hyperparameters and their impact on overall performance, and approaches to hyperparameter tuning. The speaker emphasizes the importance of understanding the fundamental principles behind CNNs and highlights their versatility as a technique applicable in multiple settings.

  • 00:00:00 In this section, the speaker introduces the topic of convolutional neural networks (CNNs) and highlights their significance in deep learning across various domains. The speaker credits MIT's 6.S191 course and Tess Fernandez's Coursera notes as great resources for studying CNNs. The speaker explains how CNNs were inspired by the human brain's own neural networks and by neuroscience studies of the animal visual cortex in the 1950s and 1960s. The speaker goes on to explain some of the key principles those foundational studies discovered, including the concept of limited receptive fields and of cells responding to edges at particular angles. These concepts form the basis of the convolutional filters and CNNs used today.

  • 00:05:00 In this section, the speaker discusses the principles of the visual cortex and how they relate to convolutional neural networks (CNNs). The visual cortex contains simple primitive operations like edge detection, which are constructed from individual neurons detecting light and dark in different places and thresholding that signal. There are higher order neurons that are invariant to the position of the detected edge or object, which led to the concept of positional invariance in the pooling layers of CNNs. The speaker also discusses the building blocks of the human and animal visual systems, which contain similar principles found in neural networks, such as the basic building blocks of summing and weighing and the bias activation threshold of a neuron.

  • 00:10:00 In this section of the lecture, the speaker discusses activation functions in neurons, which determine whether a neuron fires or not based on input above a certain threshold. The non-linearity of this process allows for more complex functions to be learned, since linear transformations of linear information are still linear transformations. Neurons are connected into networks which have emergent properties and allow for learning and memory. The human brain is extremely powerful, containing 86 billion neurons and quadrillions of connections that are organized into simple, large, and deep networks that allow for abstraction and recognition of complex concepts like edges and lines. An example is given of how an edge detector can be created at a lower level of neurons based on positive and negative signaling in response to light and dark areas.

  • 00:15:00 In this section, the speaker explains how the neural connections in the brain detect very basic linear and circular primitives, such as edges and bars, and use them to sense more complex features like color, curvature, and orientation. The higher layers of the brain's visual cortex correspond to abstraction layers in deep learning, which build complex concepts from simpler parts. The brain's malleability also allows it to utilize different parts of the brain to sense corresponding signals, and experiments in animals have shown that circuits in the brain are interchangeable and can be rewired in injury. Additionally, the speaker notes the tremendous size difference between the brains of humans and mice, and how the expansion of the neocortex in mammals, particularly in primates, gave rise to higher levels of abstraction and social intelligence.

  • 00:20:00 In this section, the lecturer explains how neural networks can learn an immense range of functions that are well suited to the physical world we inhabit, despite not being able to learn every mathematical function. The lecture also explores how visual illusions can reveal the primitives and building blocks of the computations going on inside the brain, which can be exploited by deep learning to create experiences like seeing a person turn into a monstrous combination of animals. The lecture then moves on to discuss the key ingredients of convolutional neural networks, such as locality and the computation of convolutional filters, which are computed locally rather than in a fully connected network.

  • 00:25:00 In this section of the lecture on CNNs and deep learning in life sciences, the speaker discusses several key features of convolutional neural networks. These include the use of specialized neurons that carry out low-level detection operations, layers of hidden units where abstract concepts are learned from simpler parts, activation functions that introduce non-linearities, pooling layers for position invariance and reduced computation time, multiple filters that capture different aspects of the original image, and ways of limiting the weight of individual hidden units for regularization. These features are all important for building effective CNNs that can learn and recognize patterns in complex images or genomic data.

  • 00:30:00 In this section, the lecturer explains that the human brain also uses various mechanisms to strengthen useful connections while limiting the over-reliance on any single connection for a particular task. He mentions the examples of reducing firing of neurons over time and using reinforcement learning to improve motor tasks. He also draws parallels between these primitive learning mechanisms in the human brain and the backpropagation algorithm used in convolutional neural networks. The lecturer encourages students to think beyond current architectures and consider new computational architectures that could be derived from individual primitives. Finally, he addresses a question from the chat about how to think about applications that do or do not need locality within a fully connected network.

  • 00:35:00 In this section, the speaker discusses the two parts of deep neural networks: representation learning and classification. By having hierarchical layers of learning, combinations of pixels turn into feature extraction, and detection of features follows. This enables the network to learn a complex non-linear function through the coupling of the two tasks of backpropagation and feature extraction. The speaker mentions that this paradigm is very powerful and generalizable across different application domains. The field is still in its infancy, and there is a lot of room for creativity and exploration, particularly in genomics, biology, neuroscience, imaging, and electronic health records. Therefore, these application domains can drive the development of new architectures that might have broad applicability to data science across different fields.

  • 00:40:00 In this section, the speaker explains the concept of convolutional neural networks and the role of convolutions in exploiting spatial structure, carrying out local computation, and sharing parameters across the entire image. By applying a filter or kernel to every single patch of an image, convolution is used to compute a feature map that tells us how much a feature was present in every patch of the image, effectively doing feature extraction. The speaker emphasizes the use of multiple filters to extract multiple features, such as edges and whiskers, and spatially sharing the parameters of each filter to learn from fewer parameters.
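
A minimal NumPy sketch of the convolution just described: a filter slides over every patch of the image and produces a feature map. The kernel shown is a classic vertical-edge detector; the names are illustrative:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small filter over every patch of the image and record how
    strongly each patch matches the filter (the feature map)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector: positive weights next to negative ones
edge_filter = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])
```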

  • 00:45:00 In this section on CNNs, the speaker explains that the parameters for each feature extraction process, such as edge detection, are shared and applied to the entire image at once. Each neuron in a hidden layer takes input from a patch, computes a weighted sum, and applies a bias in order to activate with a non-linear function. The convolutional filters are used to extract features from the image and learn representations, which can be learned through task-specific filters. Different species have evolved convolutional filters hard-coded from birth, which can be reused for the most helpful tasks.

  • 00:50:00 In this section, the lecturer talks about the process of learning filters through convolutional neural networks, which extract common features from images and identify specific features for different tasks. While certain filters are hard-coded, such as those specific to a particular species, others, like edge and face detection, are helpful for various applications. The concept of transfer learning is discussed, where previous convolutional filters can be applied to new data, to pre-learn intermediate and high-level representations before retraining for new features. The hierarchy of features from low-level to high-level is tuned to the classification task at hand. The lecturer also explains that convolution refers to the effect of twisting one thing into another, after which detection comes into play with the use of non-linearities.

  • 00:55:00 In this section, the speaker discusses the concept of non-linearities and how they allow for detection by introducing silence until a specific feature is observed. They also discuss the use of pooling layers, which find the maximum value within a certain section and reduce the size of the representation, making some detected features more robust. The fully connected layer then learns much more complex functions and captures combinations of the features extracted from the network, ultimately allowing for classification. The speaker also touches on the edge cases in convolution and how padding the images with zeros before convolving solves this issue.
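
A minimal NumPy sketch of the three ingredients just described, assuming single-channel arrays: a ReLU non-linearity that stays silent until a feature appears, a max-pooling step that shrinks the representation, and zero padding for the edge cases.

```python
import numpy as np

def relu(x):
    # Non-linearity: stays silent (zero) until the feature response is positive.
    return np.maximum(0, x)

def max_pool(feature_map, size=2):
    """Keep the maximum in each size x size window, shrinking the
    representation and making detected features more robust to small shifts."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size          # drop ragged edges for simplicity
    fm = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return fm.max(axis=(1, 3))

def zero_pad(image, pad=1):
    # Pad the borders with zeros so convolution can handle edge pixels.
    return np.pad(image, pad, mode="constant")

fm = np.random.randn(6, 6)
pooled = max_pool(relu(fm))
print(pooled.shape)          # (3, 3)
print(zero_pad(fm).shape)    # (8, 8)
```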

  • 01:00:00 In this section, the speaker discusses the importance of maintaining the input size in convolutional neural networks and the different ways to achieve this, including zero padding and dilated convolution. The concept of data augmentation is introduced as a way to achieve invariance to transformations in the real world, such as changes in orientation or shape. By transforming the images in the first place, the network can learn to recognize objects regardless of their location or orientation. The speaker emphasizes the importance of learning millions of features from the bottom up and transforming images to achieve invariance.
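
A short Keras sketch of the data-augmentation idea, using standard tf.keras preprocessing layers; the particular transformations and their magnitudes are illustrative choices, not the lecture's.

```python
import tensorflow as tf

# Random transformations applied at training time, so the network sees
# many orientations and positions of each object and learns invariance.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),          # up to +/-10% of a full turn
    tf.keras.layers.RandomTranslation(0.1, 0.1),  # shift height/width by up to 10%
    tf.keras.layers.RandomZoom(0.1),
])

images = tf.random.uniform((4, 32, 32, 3))   # a dummy batch of RGB images
augmented = augment(images, training=True)   # transforms only while training
print(augmented.shape)  # (4, 32, 32, 3)
```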

  • 01:05:00 In this section, the speaker summarizes the concepts discussed in the previous sections of the lecture, including locality, filters and features, activation functions, pooling, and multi-modality. He then shows an example of a deep convolutional neural network: an input volume from an RGB image, followed by 20 filters applied with a stride of 2, producing an output volume whose depth matches the number of filters and whose spatial size shrinks to 10. The speaker emphasizes that the number of filters computed at each layer determines the depth of that layer's output volume, which changes at every layer of the network. He also demonstrates how to implement these concepts in TensorFlow using the Keras engine for deep learning, including different filter sizes, activation functions, pooling, and stride sizes; a sketch in the same spirit follows below.
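
The following is a minimal Keras sketch in the spirit of the demonstration described above; the input shape, second-layer width, and optimizer are assumptions, though the first layer does use 20 filters with a stride of 2 as in the example.

```python
import tensorflow as tf
from tensorflow.keras import layers

# A minimal deep CNN echoing the ingredients above:
# convolution, activation, pooling, stride, then classification.
model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 3)),              # RGB input volume
    layers.Conv2D(20, kernel_size=3, strides=2,   # 20 filters, stride 2 -> depth-20 volume
                  padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),             # shrink the spatial size
    layers.Conv2D(40, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),       # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()  # shows how the volume's depth and spatial size change per layer
```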

  • 01:10:00 In this section, the speaker discusses different architectures of convolutional neural networks and their applications, starting with LeNet-5 for document recognition which helped establish the series of convolutional filters, sub-sampling, and fully connected layers that make up CNNs today. The speaker explains that the training of CNNs is an art, as it requires significantly more training data due to the higher number of parameters and layers. The importance of normalization in training is also emphasized, as asymmetric data can impact the performance of the model. Overall, the speaker highlights the natural and effective way that CNNs are able to accomplish classification tasks.

  • 01:15:00 In this section, the lecturer discusses several challenges associated with learning in deep convolutional neural networks. One of the challenges is the vanishing or exploding gradients, which can be mitigated by choosing initial values carefully and normalizing the data. The lecturer also explains how to choose the batch size, where you can train on the entire dataset or use mini-batches, and talks about different techniques for training, such as RMS prop and simulated annealing. The lecture also covers hyperparameters, which are the architecture and training parameters, and their impact on overall performance. Finally, the lecturer introduces two approaches to hyperparameter tuning, grid search, and random search, and discusses their benefits and drawbacks.
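
A toy sketch contrasting the two hyperparameter-tuning approaches; train_and_evaluate is a hypothetical stand-in for an actual training run, and the ranges are made up.

```python
import itertools
import random

def train_and_evaluate(learning_rate, batch_size):
    # Hypothetical stand-in: replace with a real training run that
    # returns a validation score for this hyperparameter setting.
    return random.random()

lrs = [1e-4, 1e-3, 1e-2]
batch_sizes = [16, 64, 256]

# Grid search: every combination on a fixed lattice of values.
grid_results = {(lr, bs): train_and_evaluate(lr, bs)
                for lr, bs in itertools.product(lrs, batch_sizes)}

# Random search: the same trial budget, but sampled from continuous
# ranges, which often explores important hyperparameters more efficiently.
random_results = {}
for _ in range(9):
    lr = 10 ** random.uniform(-4, -2)            # log-uniform learning rate
    bs = random.choice([16, 32, 64, 128, 256])
    random_results[(lr, bs)] = train_and_evaluate(lr, bs)

print(max(grid_results, key=grid_results.get))   # best grid setting found
```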

  • 01:20:00 In this section, the speaker emphasizes the importance of the fundamental principles behind convolutional neural networks rather than focusing on logistics and traditional approaches. The lecture covers the key features of CNNs, including convolutions, learning representations, detection, non-linearities, and pooling layers. The speaker also highlights the practical issues of making training invariant to small perturbations and addressing different types of architectures. Furthermore, the class will discuss the art of training models in future sessions. Overall, the lecture presents CNNs as an extremely versatile technique applicable in multiple settings.
CNNs Convolutional Neural Networks - Deep Learning in Life Sciences - Lecture 03 (Spring 2021)
  • 2021.03.02
  • www.youtube.com
6.874/6.802/20.390/20.490/HST.506 Spring 2021, Prof. Manolis Kellis. Deep Learning in the Life Sciences / Computational Systems Biology. Playlist: https://youtube...
 

Recurrent Neural Networks RNNs, Graph Neural Networks GNNs, Long Short Term Memory LSTMs - Lecture 04 (Spring 2021)

This video covers a range of topics starting with recurrent neural networks (RNNs) and their ability to encode temporal context, which is critical for sequence learning. The speaker introduces the concept of hidden Markov models and their limitations, which leads to the discussion of long short-term memory (LSTM) modules as a powerful approach to deal with long sequences. The video also discusses the transformer module, which learns temporal relationships without unrolling or using RNNs. Graph neural networks are introduced, along with their potential applications in solving classic network problems and in computational biology. The talk concludes with a discussion of research frontiers in graph neural networks, such as their application in generative graph models and latent graph inference.

This second part of the video discusses Recurrent Neural Networks (RNNs), Graph Neural Networks (GNNs), and Long Short Term Memory (LSTM) modules. It explains how traditional feedforward neural networks have limitations when dealing with graph-based data, but GNNs can handle a wide range of invariances and propagate information across the graph. The speakers also discuss Graph Convolutional Networks (GCNs) and their advantages and challenges. Additionally, the video describes the importance of attention functions in making GNNs more powerful and flexible.

  • 00:00:00 In this section, the lecturer introduces the topics that will be covered in the class, including recurrent neural networks and long short-term memory modules. The lecturer discusses how machines can understand context and attention and encode temporal context using hidden Markov models and recurrent neural networks. The lecture also covers how to avoid vanishing gradients by using memory modules and introduces the Transformer module that can learn temporal relations without unrolling the sequence. The lecture also touches on graph neural networks and how they use graph connectivity patterns to guide training. The lecturer then discusses the human brain's ability to read and understand context and introduces examples of phonemic restoration and filling in missing words based on context.

  • 00:05:00 In this section of the video, the speaker discusses how the brain processes language and sound through predicting what comes next, which is at the root of understanding. Recurrent neural networks are used to encode temporal context when applying machine learning to sequences in order to turn an input sequence into an output sequence that lives in a different domain, such as turning a sequence of sound pressures into a sequence of word identities. The speaker also gives examples of cognitive effects related to auditory and visual context information, like the McGurk effect and delayed auditory feedback, and explains how they work.

  • 00:10:00 In this section of the video, the speaker discusses the power of using a sequence prediction model to learn about the world. By predicting the next term in a sequence, the unsupervised learning process can be turned into a supervised learning process. This allows for the use of methods designed for supervised learning without the need for annotation. The speaker explains that a single common function can be learned and applied to the entire sequence, allowing for the prediction of future events. By incorporating hidden nodes and internal dynamics, more complex models can be built, and information can be stored for a long time. The speaker describes how probability distributions over hidden state vectors can be inferred, and how the input can be used to drive either the hidden nodes directly or indirectly by giving information to the current hidden node.

  • 00:15:00 In this section, the speaker discusses hidden Markov models (HMMs) and their limitations. HMMs have two types of parameters: an emission matrix that gives the probability of observing each output given the hidden state, and a transition matrix that gives the probability of transitioning to another hidden state given the current one. However, at every time step only one hidden state can be active, so with n states the model can remember only log n bits of information; encoding more of the past would require an enormous number of states. This limitation is addressed by recurrent neural networks (RNNs), which allow for explicit encoding of information; a toy HMM sketch follows below.
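
A toy NumPy sketch of the two HMM parameter matrices and of the single-active-state limitation; the probabilities are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Transition matrix: P(next hidden state | current hidden state).
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
# Emission matrix: P(observation | hidden state).
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])

state = 0
observations = []
for _ in range(10):
    # Only ONE discrete state is active at a time: with n states the
    # model carries just log2(n) bits of memory about the past.
    observations.append(rng.choice(2, p=B[state]))
    state = rng.choice(2, p=A[state])
print(observations)
```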

  • 00:20:00 In this section, we learn about the properties of recurrent neural networks (RNNs), which can store a lot of information efficiently in a distributed hidden state that is updated in more complicated ways through non-linear dynamics. While the posterior probability distribution over hidden states in a linear dynamical system or hidden Markov model is stochastic, the hidden state of an RNN is deterministic. Unlike HMMs or linear dynamical systems, which are stochastic by nature, RNNs can exhibit all kinds of behaviors, such as oscillating or behaving chaotically, which can make their decisions hard to predict. When unrolled in time, an RNN is equivalent to a feedforward network with one layer per time step, where the same shared weights are used at every step.
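
A minimal NumPy sketch of the deterministic hidden-state update, with the same weights reused at every unrolled time step; sizes and initialization are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_in = 4, 3
W_xh = rng.standard_normal((n_hidden, n_in)) * 0.1      # input -> hidden
W_hh = rng.standard_normal((n_hidden, n_hidden)) * 0.1  # hidden -> hidden (shared)
b = np.zeros(n_hidden)

def rnn_step(x, h):
    # Deterministic, non-linear update of a distributed hidden state.
    return np.tanh(W_xh @ x + W_hh @ h + b)

h = np.zeros(n_hidden)
for t in range(5):            # unrolling: the SAME weights at every time step
    x_t = rng.standard_normal(n_in)
    h = rnn_step(x_t, h)
print(h)
```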

  • 00:25:00 In this section, the speaker explains the specific architectures that can be used for recurrent neural networks (RNNs) and how they can be trained using backpropagation. One way is to have inputs feeding into hidden units and then have a single output after the entire sequence. Another way is to have an output prediction at every time step, allowing for information to flow between hidden units and enabling the prediction of output variables. The same backpropagation algorithm can be used to update the weights of these architectures. The speaker emphasizes that the weights are shared across different levels of the network, which allows for more efficient learning.

  • 00:30:00 In this section, the speaker discusses the concept of backpropagation through time in recurrent neural networks (RNNs) and how it allows memory from previous time steps to be encoded. They explain that this can be accomplished by feeding the output from the previous time step into the current hidden unit, or by feeding the correct output label for the previous step to the current model during training. The training process involves taking the derivative of the loss function with respect to every weight and using it to update the weights, subject to the linear constraint that shared weights remain equal. The speaker notes that while RNNs may seem complex, they can be trained with the same procedures as other neural networks.

  • 00:35:00 In this section, the speaker discusses the concept of modeling sequences with machine learning tools and how to deal with long sequences. He explains that in cases like translating sentences or transcribing spoken words, input sequences need to be turned into output sequences. However, when there is no separate target sequence, a teaching signal can be obtained by trying to predict the next term of the input sequence. The challenge of this approach arises when dealing with very long sequences where the influence of a particular word decays over time. To deal with this, the speaker explains various methods, such as echo state networks and the utilization of momentum, but highlights long short-term memory modules as the most powerful approach. These modules use logistic and linear units with multiplicative interactions to design a memory cell that can remember values for hundreds of time steps.

  • 00:40:00 In this section, the speaker explains the concept of an analog memory cell in long short-term memory (LSTM) neural networks. The memory cell is a linear unit with a self-link that has a weight of one, ensuring that information remains unchanged and undiluted by any kind of weight decay. The cell is controlled by gates that maintain an echo chamber where the information is repeated constantly until it's needed, and the network decides when to remember or forget a piece of information. The activation of the read and keep gates allows for the retrieval and maintenance of the information, respectively. The network is given capabilities of remembering, forgetting, storing, and retrieving a memory, and it decides when it's helpful to remember or forget a particular piece of information. The implementation of these gates allows for the preservation of information for a long time in recurrent neural networks.
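
A compact NumPy sketch of one LSTM step under the gating story above, with the keep (forget), write (input), and read (output) gates controlling a linear memory cell; the packed weight layout is a common convention, not necessarily the lecture's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W maps [h, x] to the four internal signals;
    the cell state c is the linear 'memory cell' with a self-link."""
    z = W @ np.concatenate([h, x]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # keep, write, read gates
    g = np.tanh(g)                                # candidate content to store
    c = f * c + i * g        # keep old memory and/or write new information
    h = o * np.tanh(c)       # read out the memory when the gate opens
    return h, c

rng = np.random.default_rng(0)
n_h, n_x = 4, 3
W = rng.standard_normal((4 * n_h, n_h + n_x)) * 0.1
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(n_x), h, c, W, b)
print(h)
```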

  • 00:45:00 In this section, the speaker discusses the application of recurrent neural networks (RNNs) in reading cursive handwriting. RNNs with long short-term memory modules were found to be the best system for this task in 2009. A sequence of small images was used as a substitute for pen coordinates to train the network. The speaker shows a demo of online handwriting recognition, where the characters are inferred from the handwriting over time and the posterior probabilities for each of those characters are accessed. The state of the system is maintained, and different points receive different weights. The speaker explains how characters are learned and which parts of the system are important. The speaker also discusses the initialization of the hidden and output units of RNNs and how their initial states can be treated as learned parameters instead of being explicitly encoded.

  • 00:50:00 In this section, the video describes a newer development in neural networks called the transformer module, which learns temporal relationships without unrolling and without using recurrent neural networks. The transformer uses an input with a positional encoding to indicate where the network is in the sequence, removing the need to unroll the network over time. The decoder shifts the output embedding by one relative to the input so as to predict the next item in the sentence, while the attention modules determine which points in the sentence matter most. Each attention module uses a query representation of one word in the sequence, together with key and value representations of all the words in the sequence, to capture the temporal relationships; a minimal sketch of this attention computation follows below.
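
A minimal NumPy sketch of the ingredients named above: a sinusoidal positional encoding plus single-head scaled dot-product attention over queries, keys, and values; the dimensions and random projections are illustrative.

```python
import numpy as np

def positional_encoding(seq_len, d):
    """Sinusoidal encoding that tells the network where it is in the
    sequence, with no unrolling over time."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def attention(Q, K, V):
    # Each query scores all keys; the softmax weights decide which
    # positions in the sequence matter most for this output.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

seq_len, d = 6, 8
x = np.random.randn(seq_len, d) + positional_encoding(seq_len, d)
Wq, Wk, Wv = (np.random.randn(d, d) * 0.1 for _ in range(3))
out = attention(x @ Wq, x @ Wk, x @ Wv)   # queries, keys, values
print(out.shape)  # (6, 8)
```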

  • 00:55:00 In this section, the speaker discusses the Transformer, a type of neural network that is useful for sequence translation or any kind of sequential task. It encodes the context information of the entire sentence each time in producing each word, and the relationships between consecutive words are encoded in this input-output relationship, which is shifted by one. The speaker also introduces graph neural networks and describes how they can be used to solve classic network problems, as well as the potential application in computational biology. The talk concludes with a discussion of research frontiers of graph neural networks, such as their application in generative graph models and latent graph inference.

  • 01:00:00 In this section, the speaker talks about the advantages of using Convolutional Neural Networks (CNNs) on grid-structured data like images and the potential of using Graph Neural Networks (GNNs) on non-grid data like social networks, brain connectivity maps, and chemical molecules. The speaker also discusses the three different types of features that may be present in GNNs: node features, edge features, and graph-level features. Additionally, the speaker highlights the issues of using a fully connected network for graph predictions, including the number of parameters scaling with the number of nodes, making it impractical for many situations.

  • 01:05:00 In this section, the speaker discusses some of the limitations of using traditional feedforward neural networks for graph-based data, including the issue of graph size and the lack of invariance to node ordering. They then introduce graph neural networks (GNNs), which can handle a wide class of invariances and propagate information across a graph to compute node features and make downstream predictions. The basic formula for GNNs involves sampling information from the node's surrounding neighborhood and updating the node's representation based on this information. The speaker notes that this process is similar to the process used in convolutional neural networks (CNNs) for image data.

  • 01:10:00 In this section, the speaker discusses the concept of two-layer graph neural networks and how they are updated for different nodes in a graph. They explain that graph neural networks are different from other types of networks because they allow for more information overall, instead of just higher order interactions between different parts of the input space. The speaker also talks about the graph convolutional networks (GCNs) and how they consider undirected graphs, with an update rule that applies a weight matrix to each hidden representation from a node's neighbors. The scalability of graph neural networks is also discussed, with the suggestion of subsampling the number of contributors to node updates to prevent the network from blowing up.
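
A minimal NumPy sketch of a two-layer graph convolution in the style suggested above, where each update applies a shared weight matrix to the normalized sum of a node's neighbors and itself; the toy graph and layer sizes are made up.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution update: each node aggregates its neighbors'
    (and its own) features, then applies a shared weight matrix --
    the same W for every node, like a CNN's shared filter."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # symmetric normalization
    return np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy undirected graph of 4 nodes with 3-dimensional node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = np.random.randn(4, 3)
W1 = np.random.randn(3, 8) * 0.1
W2 = np.random.randn(8, 2) * 0.1
H = gcn_layer(A, H, W1)   # layer 1: one hop of neighborhood information
H = gcn_layer(A, H, W2)   # layer 2: two hops of neighborhood information
print(H.shape)  # (4, 2)
```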

  • 01:15:00 In this section, the speaker explains graph neural networks (GNNs), which are a type of neural network used for graph data. GNNs are less weight-dependent than fully connected networks and are invariant to permutations, allowing for classification on large graphs. GNNs have indirect support for edge features, and one adaptation is to use edge embeddings to fully pass messages through the network. The speaker uses citation networks as an example and explains how the update mechanism works in GNNs.

  • 01:20:00 In this section of the video, the speaker explains how a graph neural network performs the edge-to-vertex update, and how the attention function plays a vital role in making the network flexible and powerful. The goal of the GNN's edge-to-vertex update is to get the state of one of the edges, which can be achieved by aggregating the representations of the incident nodes and applying a nonlinear function specific to the edge updates. Similarly, the vertex updates draw on information from the incident edges of a node. However, edge-based activations become huge, making large graphs intractable to handle. The attention function provides an explicit vector representation without including all the edges' information, reducing the model's computational requirements while retaining its flexibility and power. The speaker describes how attention scores can show how much each neighbor contributes to the central node's update, making it possible to infer relationships or contribution properties between nodes.

  • 01:25:00 In this section, the speakers discuss Graph Convolutional Networks (GCNs) and their advantages and challenges. GCNs allow for multiple layers to be applied throughout the graph, and each update has the same form. They are useful for node classification, graph classification, and link prediction. However, there are still optimization issues due to the parallel updates throughout the graph, and normalization constants may need to be fixed to avoid destabilization. Additionally, GCNs can suffer from expressivity issues compared to other methods like Graph Attention Networks (GATs). Nonetheless, GCNs are still faster than methods that require edge embeddings or neural message passing.

  • 01:30:00 In this section, the speaker discusses graph neural networks (GNNs) applied to link prediction: taking the dot product between the representations of any two nodes in the graph, applying a non-linear function such as a sigmoid, and producing a probability that the edge exists. This enables predictive modeling in areas such as gene interaction in biology; a minimal sketch follows below. The speaker concludes by summarizing the various types of networks discussed, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory modules, and Transformer modules.
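
A tiny sketch of the dot-product link predictor just described; the embeddings here are random stand-ins for ones produced by a trained GNN.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def edge_probability(z_u, z_v):
    # Dot product of two node embeddings, squashed to a probability
    # that the edge (u, v) exists -- e.g. a candidate gene interaction.
    return sigmoid(z_u @ z_v)

Z = np.random.randn(5, 8)           # node embeddings (dummy stand-ins)
print(edge_probability(Z[0], Z[3]))
```
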
Recurrent Neural Networks RNNs, Graph Neural Networks GNNs, Long Short Term Memory LSTMs
  • 2021.03.02
  • www.youtube.com
Deep Learning in Life Sciences - Lecture 04 - RNNs, LSTMs, Transformers, GNNs (Spring 2021). 6.874/6.802/20.390/20.490/HST.506 Spring 2021, Prof. Manolis Kellis...
 

Interpretable Deep Learning - Deep Learning in Life Sciences - Lecture 05 (Spring 2021)

This video discusses the importance of interpretability in deep learning models, especially in the field of life sciences where decisions can have dire consequences. The speaker explains two types of interpretability: building it into the design of the model from the outset and developing post hoc interpretability methods for already-built models. They go on to explore different techniques for interpreting models, including weight visualization, surrogate model building, and activation maximization, and discuss the importance of understanding the internal representations of the model. The lecturer also explains several methods for interpreting individual decisions, such as example-based and attribution methods. Additionally, the speaker discusses the challenge of interpreting complex concepts and the limitations of neural network model interpretations, as well as exploring hypotheses related to the discontinuity of gradients in deep learning neural networks.

In the second part of the lecture, the speaker addressed the challenges of discontinuous gradients and saturated functions in deep learning models in the life sciences field. They proposed methods such as averaging small perturbations of input over multiple samples to obtain a smoother gradient, using random noise to highlight the salient features in image classification, and backpropagation techniques such as deconvolutional neural networks and guided backpropagation to interpret gene regulatory models. The speaker also discussed the quantitative evaluation of attribution methods, including the pixel flipping procedure and the remove and replace score approach. Finally, they emphasized the need for interpretability in deep learning models and the various techniques for achieving it.

  • 00:00:00 In this section, the presenter discusses the importance of interpretability in deep learning and the different methods for achieving it. They explain that while deep learning models can outperform humans, it is important to understand how they are making decisions and whether these decisions can be trusted. Interpretability can help with debugging, making discoveries, and providing explanations for decisions. The presenter goes on to discuss ante-hoc and post hoc methods for interpretation, as well as interpreting models versus decisions. They then delve into specific methods for interpreting models, such as weight visualization, building surrogate models, activation maximization, and example-based models. Finally, the presenter discusses attribution methods and evaluating the effectiveness of these methods through qualitative and quantitative measures.

  • 00:05:00 In this section, the importance of interpretability in machine learning is emphasized, especially in the life sciences field where wrong decisions can have costly consequences. The traditional approach of building a giant model without understanding how and why it works is no longer sufficient, and instead, interpretable information must be extracted from black box models. Interpretable machine learning provides verified predictions optimized not just for the generalization error, but also for the human experience. It is important to understand the physical, biological, and chemical mechanisms of disease to train doctors better and gain insight into how the human brain functions. Additionally, the right to explanation is crucial in fighting biases that may be inherent in training data sets due to centuries of discrimination.

  • 00:10:00 In this section of the video, the speaker discusses two types of interpretability in deep learning: building interpretability into the design of the model, and building post hoc interpretability by developing special techniques for interpreting complex models after they have been built. They explain that deep learning has millions of parameters, making it impossible to build interpretable models to start with. Therefore, the focus is on developing techniques for post hoc interpretability based on their degree of locality. The speaker also discusses ways of building interpretable neural networks at both the model and decision levels.

  • 00:15:00 In this section, the speaker discusses the two types of interpretable models for deep learning: those that interpret the model itself and those that interpret its decisions. The decisions can be interpreted with either attribution methods or example-based methods. The speaker also talks about analyzing the representations themselves and generating data from the model. They introduce four types of approaches to analyzing representations: weight visualization, surrogate model building, activation maximization (finding the inputs that maximize the activation of units), and example-based interpretation. Finally, the speaker highlights the importance of understanding the internal representations of the model, specifically the hierarchical features extracted from the left half of the model, which can provide insight into how deep learning models make inferences.

  • 00:20:00 In this section, the lecturer discusses the idea of interpreting deep learning by looking at the internal workings of the neural network. He explains that just like how scientists studied the visual cortex in cats and monkeys to understand how individual neurons fired at different orientations, we can look at the neurons firing in a neural network to understand the primitives or features that the network has learned to recognize. However, with millions of parameters and thousands of internal nodes, it's not feasible to visualize every one of them. Therefore, the lecturer introduces the idea of surrogate models or approximation models that are simpler and more interpretable. The lecture also covers activation maximization, where the goal is to generate data that maximizes the activation of a particular neuron.

  • 00:25:00 In this section, the speaker discusses an optimization problem that involves maximizing the class posterior probability for a given input while also using a regularization term to ensure that the output is human interpretable. They explain that simply maximizing based on class probability can result in images that don't make much sense, so the additional regularization term is necessary to constrain the output to be interpretable. They also touch on the concept of latent variables and parameters that can help parameterize noisy vectors and improve the quality of interpretations. The goal is to generate data that matches the training data more closely so that the output resembles the class-related patterns and is easier to interpret for humans.

  • 00:30:00 The goal is to maximize or minimize certain features, and then use those instances to understand how the model is making its decisions. This can be done through activation maximization within the space of possible inputs, where the input is constrained to come from a human-like distribution of data. Alternatively, a generative model can be used to sample from the probability density function of that distribution. By forcing the presentation to be within the code space, the resulting images are more interpretable and can be used to build more interpretable models. Other techniques for building interpretable models include weight visualization, building surrogate models that approximate the output, and example-based interpretation where instances that either maximize or minimize certain features are used to understand the model's decision-making process.

  • 00:35:00 In this section, the speaker discusses four different ways of interpreting decisions made by the model, specifically in terms of practical applications. The first method is example-based, which involves selecting examples that are misclassified and close to the particular input, to teach the model how to improve. The second method examines attribution directly, looking at why a particular gradient is noisy. The third method is gradient-based attribution with SmoothGrad or interior gradients, and the final method is backprop-based attribution with deconvolution and guided backpropagation. The limitations of model-level interpretation are also noted, particularly when it comes to determining the best image to interpret the classification.

  • 00:40:00 In this section, the speaker discusses the challenge of interpreting deep learning models in terms of finding a prototype or a typical image that represents a complex concept, such as a motorcycle or a sunflower. The example-based method is introduced as a way of identifying which training instance influences a decision the most, without specifically highlighting the important features of those images. The method aims to determine the nearest training images based on their influence on the classification of a particular image, rather than pixel proximity. The speaker also talks about the fragility of neural network model interpretations and the use of influence functions in understanding the underlying learning process.

  • 00:45:00 In this section, the speaker introduces two methods for interpreting deep learning models. The first is example-based interpretation, which looks at individual examples in the training set to understand the neural network's decision-making process. The second is attribution methods, which assign an attribution value to each pixel in an image to determine how much it contributes to the classification decision. The goal of both methods is to make machine learning interpretable and understandable by humans, and to identify the features that are most salient in an image. By visualizing the attribution values as heat maps, researchers can develop a better understanding of how deep neural networks make decisions and which pixels in an image are most responsible for that decision.

  • 00:50:00 In this section, the speaker explains how to calculate the saliency of an image using the same machinery as backpropagation during training: instead of taking derivatives with respect to the weights, they take derivatives with respect to the pixels. The saliency map is then produced by visually attributing these derivatives back onto the image. However, these saliency maps tend to be noisy and imprecise. The speaker details two hypotheses to explain why this is the case: either the scattered pixels really are important to the neural network's decision-making process, or the gradients are discontinuous. The speaker then explains how these hypotheses guided the development of methods to address the noisy saliency maps; a minimal gradient-saliency sketch follows below.
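
A minimal TensorFlow sketch of the gradient-based saliency computation described above; the model here is a dummy built only to make the sketch runnable.

```python
import tensorflow as tf

def saliency_map(model, image, class_index):
    """Gradient of the class score w.r.t. the input pixels -- the same
    backpropagation machinery as training, but aimed at the image."""
    image = tf.convert_to_tensor(image[None, ...])  # add batch dimension
    with tf.GradientTape() as tape:
        tape.watch(image)
        score = model(image)[0, class_index]
    grads = tape.gradient(score, image)[0]
    return tf.reduce_max(tf.abs(grads), axis=-1)    # per-pixel importance

# Dummy model and image just to make the sketch runnable.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
image = tf.random.uniform((32, 32, 3))
print(saliency_map(model, image, class_index=3).shape)  # (32, 32)
```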

  • 00:55:00 In this section of the lecture, the speaker discusses three hypotheses related to the discontinuity of gradients in deep learning neural networks. The first hypothesis is that the function being learned is not smooth, and as more layers are added, the firing becomes extremely discontinuous, leading to misclassifications. The second is that gradients are discontinuous due to the number of layers and non-derivative functions, causing noise and allowing for trickery in classification functions. The third hypothesis suggests that the function saturates, preventing the ability to learn anything smoother. To improve upon these partial derivatives with respect to input, one possibility discussed is to add noise to perturb the input and use the gradient on the perturbed input or take the average over multiple perturbations to smooth out the noisy gradient.

  • 01:00:00 In this section, the speaker discussed solutions for deep learning challenges caused by discontinuous gradients or saturated functions. These included methods for changing the gradients or backpropagation and using multiple images with added noise. The speaker also discussed various attribution methods, such as layer-wise relevance propagation and deep lift, for interpreting gene regulatory models. To address gradients' discontinuity, they suggested defining a smooth gradient function by averaging small perturbations of the input over many samples, effectively smoothing the gradient function to operate like a shallow network rather than a deep network. Moreover, the speaker explained how adding random noise to images could help demonstrate the robustness of the model and highlight the salient features recurrently used in image classification.
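
A sketch of the perturb-and-average idea (in the spirit of SmoothGrad), assuming some grad_fn that returns d(class score)/d(input), such as the saliency computation sketched earlier; the dummy gradient function here is only a placeholder.

```python
import numpy as np

def smooth_grad(grad_fn, image, n_samples=50, noise_std=0.1):
    """Average the gradient over many small Gaussian perturbations of
    the input, smoothing out the noisy raw saliency map."""
    total = np.zeros_like(image)
    for _ in range(n_samples):
        noisy = image + np.random.normal(0.0, noise_std, size=image.shape)
        total += grad_fn(noisy)
    return total / n_samples

# grad_fn stands in for d(class score)/d(input); replace with a real one.
grad_fn = lambda x: np.sign(x)          # dummy gradient just for the demo
image = np.random.rand(32, 32, 3)
smoothed = smooth_grad(grad_fn, image)
print(smoothed.shape)
```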

  • 01:05:00 In this section, the lecturer works through the hypotheses for interpreting deep learning models in the life sciences field. The second hypothesis holds that the gradients are discontinuous at any one particular image, but that by sampling multiple images surrounding it, a smoother gradient can be obtained. The third hypothesis holds that the function saturates, leading to extreme activations. To address this, the lecturer proposes scaling the images back toward a baseline to move the network out of its saturated regime; this is done through interior gradients, which rescale the pixels of the image. Backprop-based methods are also explored, such as deconvolutional neural networks and guided backpropagation, which address the challenge of zeroed-out values in the rectified linear unit.

  • 01:10:00 In this section, the speaker discusses the challenges of backpropagation in deep learning and how they can be addressed using deconvolutional neural networks. By mapping feature patterns to the input space and reconstructing the image, deconvolutional neural networks can obtain a valid feature reconstruction and remove noise by removing negative gradients. The speaker also explains how guided backpropagation can be used to combine information from the forward and backward passes to generate images that are representative of the original image. Additionally, the speaker discusses methods for evaluating these attribution methods, including qualitative and quantitative approaches based on coherence and class sensitivity. Finally, the speaker explores different attribution methods, such as deep lift, saliency maps, and smooth grad, and their effectiveness in capturing specific pixels responsible for a particular classification.

  • 01:15:00 In this section, the speaker discusses the quantitative evaluation of attribution methods in deep learning. The goal is to evaluate whether these methods are properly capturing the intended object of interest and distinguishing different object classes. The speaker introduces the pixel flipping procedure to remove individual features with high attribution values and evaluates the classification function to measure the sensitivity of the method. The accuracy of saliency attributions and classification attributions can be measured using a curve, and the speaker suggests removing and retraining to achieve better accuracy. Overall, the section discusses quantitative ways to evaluate the effectiveness of deep learning attribution methods.

  • 01:20:00 In this section, the speaker explains how the performance of a classifier can be measured by removing specific features based on the attribution method. The "remove and replace score" approach involves replacing a certain percentage of the most or least important pixels and retraining the deep neural network to measure the change in accuracy. This provides a quantitative metric for evaluating the accuracy of interpreting decisions. The speaker also recaps the importance of interpretability and different techniques for interpreting deep learning models using attribution methods and activation maximization, as well as the challenges of post hoc methods.

  • 01:25:00 In this section, the lecturer discusses how deep learning models can be constrained and the most salient features found using backpropagation, deconvolution, and guided backpropagation. Various methods of scoring these attribution methods were also highlighted, including coherence, class sensitivity, and quantitative metrics that remove features with high attribution. The lecturer then introduced remove-and-retrain methods, where individual pixels can be removed, the model retrained, and the drop in accuracy measured. The lecture concluded with a review of the covered topics, and upcoming lectures were announced.
Interpretable Deep Learning - Deep Learning in Life Sciences - Lecture 05 (Spring 2021)
  • 2021.03.03
  • www.youtube.com
Deep Learning in Life Sciences - Lecture 05 - Interpretable Deep Learning (Spring 2021). 6.874/6.802/20.390/20.490/HST.506 Spring 2021, Prof. Manolis Kellis. Deep...
 

Generative Models, Adversarial Networks GANs, Variational Autoencoders VAEs, Representation Learning - Lecture 06 (Spring 2021)

This video discusses the concept of representation learning in machine learning, emphasizing its importance in classification tasks and potential for innovation in developing new architectures. Self-supervised and pretext tasks are introduced as ways to learn representations without requiring labeled data, through techniques such as autoencoders and variational autoencoders (VAEs). The speaker also discusses generative models, such as VAEs and generative adversarial networks (GANs), which can generate new data by manipulating the latent space representation. The pros and cons of each method are discussed, highlighting their effectiveness but also their limitations. Overall, the video provides a comprehensive overview of different approaches to representation learning and generative models in machine learning.

The video explores the concepts of Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and representation learning in generative models. GANs involve the generator and discriminator having opposing objectives, and the training process is slow for fake samples, but improvements in resolution and objective function can lead to realistic-looking images. The speaker demonstrates how GANs can generate architecturally plausible rooms and transfer one room to another. VAEs explicitly model density functions and capture the diversity of real-world images through meaningful latent space parameters. The speaker encourages creativity and experimentation with open architectures and models, and the application of generative models and representation learning in various domains is a rapidly growing field with limitless possibilities.

  • 00:00:00 In this section, the speaker discusses the concept of representation learning in machine learning and how it has been used in convolutional neural networks (CNNs) to learn about the world. They emphasize that the true advance of deep learning came from CNN's ability to learn non-linearities and representations about the world through feature extraction. The speaker argues that classification tasks are driving feature extraction and that this is where all of the knowledge representation of the world comes from. They also suggest that there is potential for innovation in developing new architectures for representation learning in various domains that go beyond existing architectures. Finally, they assert that the most exciting part of generative models is the latent space representation rather than the labels and that such models can be used to learn a model of the world without relying on labels.

  • 00:05:00 In this section, the speaker discusses representation learning and the use of self-supervised learning for this purpose. Self-supervised learning involves using part of the data to train on another part of the data, tricking the data into being its own supervisor. This allows for the learning of rich representations that can be used to generate views of the world. Generative models work by running the model backwards, going from the compressed representation of the world to more examples of it. Another approach to representation learning is pretext tasks, where the task at hand is merely an excuse to learn representations. The example given is predicting the input itself, which is what autoencoders are all about. Passing the input through a compressed bottleneck representation and re-expanding it into the image itself is a meaningful enough task that a representation underlying the world can be learned. Variational autoencoders go further and explicitly model the variances and the distributions.

  • 00:10:00 In this section, the speaker discusses the concept of pretext tasks, which refers to processing input signals through a network to learn representations of the network and using the input signal to create a training signal that is a task that one doesn't really care about. Examples of pretext tasks include predicting before and after images, predicting the remaining pixels of an image after removing a patch, and colorizing black and white images. The goal of pretext tasks is to force oneself to learn representations of the world, leading to effective supervised learning tasks. The importance of understanding this concept is crucial as it leads to the subsequent topics of discussion, such as autoencoders, variational autoencoders, and generative adversarial networks.

  • 00:15:00 In this section, the concept of self-supervised learning is introduced as a way to learn good representations by constructing pretext tasks that enable learning without requiring labeled data. Pretext tasks include inferring the structure of an image, transforming images, or using multiple images, among others. One example of a pretext task is inferring the relative orientation of image patches, while another is a jigsaw puzzle task where the original image must be reconstructed. The pros and cons of each self-supervised method are discussed, highlighting their effectiveness but also their limitations, such as assuming photographic canonical orientations in training images or a limited output space.

  • 00:20:00 In this section, the speaker builds on the concept of pretext tasks, applying it to further examples in order to learn a representation of the world that makes seemingly complex tasks solvable by learning something genuinely interesting about the world. One example is creating an encoder-decoder architecture to learn lower-dimensional feature representations from unlabeled data, turning an unsupervised learning problem into a supervised one. The goal is to force meaningful representations of the data's variations and to use the features to construct the decoded version of the encoded original image, with the loss function being the difference between the original and the prediction.

  • 00:25:00 In this section, the speaker explains how autoencoders can be used to build representations of the world and generate images through a generator function. The z vector in autoencoders can provide meaningful information about the relative presence of different features in the world, which can be used to generate additional images. The encoder and decoder can be used separately for different tasks, such as using the decoder as a generative model and the encoder as a feature extractor for representation learning. The speaker then introduces the concept of variational autoencoders (VAEs), a probabilistic spin on autoencoders that lets us sample from the model to generate additional data. VAEs learn a multi-dimensional latent representation consisting of a set of scalars together with an associated variance for each one. By sampling from the prior over the latent space vector, we can generate images based on various attributes of the image.

  • 00:30:00 In this section, the speaker discusses generative models and their goal of capturing the world through tuning various vectors in the autoencoder. These vectors end up being meaningful representations of the world, allowing for the sampling of different images by varying the parameters. The strategy for training the generative models is to maximize the likelihood of the training data by learning the model parameters. The speaker then introduces variational autoencoders, which generate models probabilistically by explicitly modeling the mean and variance of the data. By having the encoder provide both a mean for z and a variance for z, the speaker is able to sample from the resulting normal distributions and recognize different variations of objects, such as boats.

  • 00:35:00 In this section, the speaker explains the concept of variational autoencoders (VAEs) and how they work. VAEs consist of an encoder network that maps input data to a distribution over a latent space and a decoder network that generates output data from samples of that latent space, with the decoder doubling as the generation network for producing new images. The VAE loss function is a variational lower bound that trades off reconstructing the input data against keeping the encoder's distribution close to the prior distribution over the latent space. The exact likelihood is intractable, but this lower bound can be optimized through gradient descent; a minimal sketch of the loss follows below.
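
A minimal TensorFlow sketch of the variational lower bound described above, assuming a Gaussian encoder with a standard normal prior, plus the reparameterization trick used to sample z differentiably; the squared-error reconstruction term is one common choice among several.

```python
import tensorflow as tf

def vae_loss(x, x_recon, z_mean, z_log_var):
    """Variational lower bound: reconstruction term plus the KL divergence
    between the encoder's Gaussian q(z|x) and the prior N(0, I)."""
    recon = tf.reduce_sum(tf.square(x - x_recon), axis=-1)
    kl = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
    return tf.reduce_mean(recon + kl)

def sample_z(z_mean, z_log_var):
    # Reparameterization trick: sample via a differentiable path so
    # gradients can flow through the stochastic layer.
    eps = tf.random.normal(tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * eps

# Dummy tensors just to exercise the two functions.
z_mean, z_log_var = tf.zeros((4, 2)), tf.zeros((4, 2))
z = sample_z(z_mean, z_log_var)
x = x_recon = tf.zeros((4, 10))
print(z.shape, vae_loss(x, x_recon, z_mean, z_log_var).numpy())
```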

  • 00:40:00 In this section, the speaker explains how generative models, such as variational autoencoders (VAEs), can be used to construct a representation of the world with meaningful features. By encoding images using only two dimensions, the VAE can capture the space of all possible characters and generate any kind of character that can be represented using just a two-dimensional coordinate. By diagonalizing the prior on z, the network is learning independent latent variables and the different dimensions of z encode interpretable factors of variation in a good feature representation. This encoder network allows users to generate data and decode the latent space through the prior distribution of z, making VAEs a useful tool for representation learning.

  • 00:45:00 In this section, the video discusses the use of variational autoencoders (VAEs) as a principled approach to generative models that allows for the inference of the latent space given x, which can be a useful representation for other tasks. However, VAEs have some cons such as maximizing the lower bound of the likelihood, which is not as good as explicitly evaluating the likelihood. The generated samples from VAEs are also blurrier and lower quality compared to those from generative adversarial networks (GANs). There is ongoing research on improving the quality of samples from VAEs such as using more flexible approximations for richer posterior distributions and incorporating structure in the latent variables. The video also summarizes the key takeaways from the previous sections on generation, unsupervised learning, and latent space parameters.

  • 00:50:00 In this section, the concept of generative adversarial networks (GANs) is discussed. GANs are designed to generate complex high-dimensional images by sampling from a simple distribution, such as random noise, and learning transformations to create images from a training set. The system consists of a generator network to create fake images, and a discriminator network to distinguish between real and fake images. The aim is to train the generator to create more realistic images by fooling the discriminator, which becomes an adversary in the process. The system is self-supervised, meaning no manual labeling is necessary, and replaces the need for human evaluators.

  • 00:55:00 In this section, the speaker explains the concept of generative adversarial networks (GANs) that use a mini-max game approach to train a generator and a discriminator network. The discriminator is trained to determine whether the generated images are real or fake, while the generator is trained to create images that can fool the discriminator into believing they are real. Through this joint likelihood function, the weights and parameters of both networks are trained simultaneously, with the objective of having the discriminator output a score of 1 for real images and 0 for fake images. The generator, on the other hand, aims to minimize that score by generating images that are indistinguishable from real ones.
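
A minimal TensorFlow sketch of the alternating minimax updates described above (using the non-saturating generator objective mentioned in the next bullet); the dummy generator and discriminator exist only to make the sketch runnable.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def train_step(generator, discriminator, g_opt, d_opt, real, noise_dim=64):
    """One alternating GAN update: the discriminator learns to score real
    images as 1 and fakes as 0; the generator learns to make the
    discriminator call its fakes real (the non-saturating trick)."""
    noise = tf.random.normal([tf.shape(real)[0], noise_dim])
    with tf.GradientTape() as d_tape, tf.GradientTape() as g_tape:
        fake = generator(noise, training=True)
        real_logits = discriminator(real, training=True)
        fake_logits = discriminator(fake, training=True)
        d_loss = (bce(tf.ones_like(real_logits), real_logits) +
                  bce(tf.zeros_like(fake_logits), fake_logits))
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)  # fool D
    d_opt.apply_gradients(zip(
        d_tape.gradient(d_loss, discriminator.trainable_variables),
        discriminator.trainable_variables))
    g_opt.apply_gradients(zip(
        g_tape.gradient(g_loss, generator.trainable_variables),
        generator.trainable_variables))
    return d_loss, g_loss

# Toy networks and data, purely to demonstrate one training step.
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(28 * 28, activation="tanh"),
    tf.keras.layers.Reshape((28, 28, 1))])
discriminator = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1)])
real = tf.random.uniform((8, 28, 28, 1))
print(train_step(generator, discriminator,
                 tf.keras.optimizers.Adam(1e-4),
                 tf.keras.optimizers.Adam(1e-4), real))
```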

  • 01:00:00 In this section, the concept of Generative Adversarial Networks (GANs) is explained, where a generator and discriminator have opposing objectives in a game-like scenario. The generator tries to produce fake data that will fool the discriminator, which has learned to classify real data correctly. However, the training is slow when the sample is fake, so a trick is used where the objective of the generator is changed to maximize the likelihood of the discriminator being wrong for the fake data. Jointly training the two networks can be challenging, but progressively increasing the resolution of the images can improve stability. The GAN training algorithm involves alternating between updating the discriminator by ascending the stochastic gradient and updating the generator using the improved objective function.

  • 01:05:00 In this section, the video discusses the concept of Generative Adversarial Networks (GANs) and the training process involved in building a generator network to create realistic images. The video explains how the discriminator network is trained to distinguish between the generated images and actual images, and how the generator network is trained to improve the quality of the generated images to the extent that they surpass human performance. The video further explains how to build deep convolutional architectures with fractionally strided convolutions and use ReLU and leaky ReLU activation functions to obtain realistic-looking images. The video demonstrates the potential of using GANs to generate architecturally plausible rooms and shows how to transfer one room to another by interpolating between latent space coordinates.

  • 01:10:00 In this section, the speaker discusses generative models such as GANs, Variational Autoencoders (VAEs), and representation learning. The aim of these models is to generate diverse and realistic samples by learning the underlying patterns and styles of the real world. The speaker presents examples of how these models can perform various image manipulation tasks, such as upscaling, domain knowledge transfer, and texture synthesis. The speaker also highlights the advancements made in these models, such as Progressive GANs, which allow for generating high-resolution images, and NVIDIA's "This person does not exist" website, which uses a large parameter space to learn orthogonal parameters that enable the decomposition of different image components.

  • 01:15:00 In this section, the speaker explains a taxonomy of generative models, which can involve modeling explicit or implicit density functions. Generative adversarial networks (GANs) model density functions implicitly through coupling generator and discriminator networks, while variational autoencoders (VAEs) model density functions explicitly. The power of deep learning lies in representation learning, and the speaker encourages creativity and experimentation with the young field's many open architectures and models. The use of pretext tasks, such as predicting self or filling in missing patches, can help to learn meaningful latent representations of the world and move towards truly generative models that can sample from a true distribution of latent space parameters.

  • 01:20:00 In this section, the speaker discusses the concept of capturing the diversity of real-world images through meaningful latent space parameters in variational autoencoders (VAEs). By constraining the latent space parameters to be orthogonal and distinct from each other, the resulting images can be indistinguishable from real people. Additionally, the speaker notes that the application of generative models and representation learning is a rapidly growing field with limitless possibilities in various domains.
 

Regulatory Genomics - Deep Learning in Life Sciences - Lecture 07 (Spring 2021)

The lecture covers the field of regulatory genomics, including the biological foundations of gene regulation, classical methods for regulatory genomics, motif discovery using convolutional neural networks, and the use of machine learning models to understand how sequence encodes gene regulation properties. The speaker explains the importance of regulatory motifs in gene regulation and how disruptions to these motifs can lead to disease. They introduce a new model using a convolutional neural network that maps sequencing reads to the genome and counts how many five-prime ends each base pair on the two strands has. The model can be used for multiple readouts of different proteins and can be fitted separately or simultaneously using a multitask model. The speaker also shows how the model can analyze any kind of assay, including genomic data, using interpretation frameworks that uncover biological stories about how syntax affects TF cooperativity. The models can make predictions that are validated through high-resolution CRISPR experiments.

The video discusses how deep learning can improve the quality of low-coverage ATAC-seq data by enhancing and denoising signal peaks. AtacWorks is a deep learning model that takes in coverage data and uses a residual neural network architecture to improve signal accuracy and identify accessible chromatin sites. The speaker demonstrates how AtacWorks can be used to handle low-quality data and increase the resolution at which single-cell chromatin accessibility can be studied. They also describe a specific experiment on hematopoietic stem cells that used ATAC-seq to identify specific regulatory elements involved in lineage priming. The speaker invites students to reach out for internships or collaborations.

  • 00:00:00 In this section, the lecturer introduces the field of regulatory genomics and invites guest speakers to discuss influential papers and provide opportunities for collaboration and internships for the students. The lecture is the start of Module 2 on gene regulation and covers the biological foundations of gene regulation, classical methods for regulatory genomics, and motif discovery using convolutional neural networks. The lecture emphasizes the complexity of the genetic code, allowing for the construction and development of a self-healing organism with intricate interconnections across every aspect of the body, from head to toes.

  • 00:05:00 In this section, the complexity of cells and how they remember their identity despite having no contextual information is discussed. The regulatory circuitry of cells is also highlighted, which is based on a set of primitives and constructs that allow cells to remember the state of every piece of the genome. The packaging of DNA in both structural and functional constructs is integral to this process, enabling cells to compact so much DNA inside them. This packaging is done through nucleosomes, the little beads in the beads-on-a-string view of DNA, composed of four types of histone proteins, each with a long amino acid tail that can be post-translationally modified with different histone modifications. These modifications work together with additional epigenomic marks, such as methylation of CpG dinucleotides directly on the DNA, to enable cells to remember their identity.

  • 00:10:00 In this section, the speaker discusses the three types of modifications in epigenomics: DNA accessibility, histone modifications, and DNA methylation. He explains how these modifications can affect gene regulation and the binding of transcription factors. By using the language of epigenomics, one can program every cell type in the body by tuning the compacting of the DNA to specific signatures of promoter regions. Promoters, transcribed regions, repressed regions, and enhancer regions are all marked by different sets of marks that can be identified and studied. Enhancers, in particular, are extremely dynamic and marked by H3K4 monomethylation, H3K27 acetylation, and DNA accessibility.

  • 00:15:00 In this section, the speaker explains the concept of "chromatin states": different states of the chromatin corresponding to enhancers, promoters, transcribed regions, and repressed regions, among others. A multivariate hidden Markov model is used to discover these chromatin states and to locate enhancer, promoter, and transcribed regions in different cell types of the body. The speaker also explains how proteins recognize DNA: transcription factors use DNA-binding domains to recognize specific DNA sequences in the genome. He then covers DNA motifs and position weight matrices, which capture the sequence specificity of a binding site, and information-theoretic measures that distinguish binding sites for regulators; a minimal sketch of these computations follows below.
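
As a concrete illustration of these ideas, here is a minimal sketch of building a position weight matrix from a set of aligned binding sites and computing per-position information content; the binding sites and motif are hypothetical examples, not taken from the lecture.

```python
import numpy as np

# Hypothetical aligned binding sites for one transcription factor (one per row).
sites = ["TGACGTCA", "TGACGTAA", "TGATGTCA", "TGACGACA"]

# Position weight matrix: per-position nucleotide frequencies.
alphabet = "ACGT"
counts = np.zeros((len(sites[0]), 4))
for site in sites:
    for pos, base in enumerate(site):
        counts[pos, alphabet.index(base)] += 1
pwm = counts / counts.sum(axis=1, keepdims=True)

# Information content per position (bits) against a uniform background:
# IC = 2 + sum_b p_b * log2(p_b). Fully conserved positions approach 2 bits.
with np.errstate(divide="ignore", invalid="ignore"):
    ic = 2 + np.sum(np.where(pwm > 0, pwm * np.log2(pwm), 0.0), axis=1)
print(np.round(ic, 2))
```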

  • 00:20:00 In this section of the lecture, the speaker discusses the importance of regulatory motifs in gene regulation and how disruptions to these motifs can lead to disease. The speaker then explains three technologies for probing gene regulation: chromatin immunoprecipitation (ChIP-seq), DNA accessibility assays, and ATAC-seq. These technologies can be used to map the locations of enhancers and to discover the language of DNA using motifs and deep learning models.

  • 00:25:00 In this section of the video, the speaker discusses the use of machine learning models to understand how sequence encodes gene-regulation properties. She introduces different experiments that profile regulatory DNA and highlights the need to understand the complex syntax of regulatory elements that drives specific responses. The problem is modeled as a machine learning task in which the genome is partitioned into chunks of a thousand base pairs, and each base pair is associated with some signal from the experiment (see the sketch below).
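
A minimal sketch of this framing, using randomly generated stand-ins for the genome sequence and the experimental signal (the 1,000 bp window follows the lecture; everything else is illustrative):

```python
import numpy as np

# Toy setup: partition a chromosome into fixed 1,000 bp chunks and pair each
# chunk of sequence with the experimental signal over that window.
rng = np.random.default_rng(0)
chrom_len = 100_000
sequence = rng.choice(list("ACGT"), size=chrom_len)          # stand-in genome
signal = rng.poisson(2.0, size=chrom_len).astype(float)      # per-base read counts

window = 1_000
examples = []
for start in range(0, chrom_len - window + 1, window):
    seq_chunk = "".join(sequence[start:start + window])
    sig_chunk = signal[start:start + window]
    examples.append((seq_chunk, sig_chunk))

print(len(examples), "training examples of", window, "bp each")
```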

  • 00:30:00 In this section, the speaker discusses the traditional approach of summarizing genetic information by mapping sequences to scalars using various machine learning models. This approach loses information: read-coverage profiles at single-nucleotide resolution contain geometries that reflect protein-DNA interaction, producing high-resolution footprints, and these intricate details are lost when the signal is collapsed into a scalar. To address this, the speaker builds a new model that works at the data's most basic resolution: sequencing reads are mapped to the genome, and the number of five-prime ends at each base pair on the two strands is counted. A convolutional neural network then translates sequence directly into these real-valued readouts, yielding a straight sequence-to-profile model (sketched below).
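
The following is a minimal PyTorch sketch of such a sequence-to-profile network in the spirit of what is described here (dilated residual convolutions over one-hot DNA, a per-base two-strand profile head, and a total-count head). The layer sizes and names are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class SeqToProfile(nn.Module):
    """Sketch of a sequence-to-profile CNN; sizes are illustrative."""
    def __init__(self, n_filters=64, n_dilated=4):
        super().__init__()
        self.stem = nn.Conv1d(4, n_filters, kernel_size=25, padding=12)
        # Stack of dilated convolutions to grow the receptive field.
        self.dilated = nn.ModuleList(
            nn.Conv1d(n_filters, n_filters, kernel_size=3,
                      padding=2 ** i, dilation=2 ** i)
            for i in range(1, n_dilated + 1)
        )
        # Profile head: predicted 5' end counts per base on both strands.
        self.profile_head = nn.Conv1d(n_filters, 2, kernel_size=75, padding=37)
        # Count head: one total-count prediction per strand.
        self.count_head = nn.Linear(n_filters, 2)

    def forward(self, x):            # x: (batch, 4, length), one-hot DNA
        h = torch.relu(self.stem(x))
        for conv in self.dilated:    # residual connections around each layer
            h = h + torch.relu(conv(h))
        profile = self.profile_head(h)            # (batch, 2, length)
        counts = self.count_head(h.mean(dim=2))   # (batch, 2)
        return profile, counts

model = SeqToProfile()
one_hot = torch.zeros(1, 4, 1000)   # a dummy 1 kb one-hot sequence
profile, counts = model(one_hot)
print(profile.shape, counts.shape)
```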

  • 00:35:00 In this section, the speaker explains the loss functions used for modeling the counts of reads falling on a sequence and how those reads are distributed across base pairs: a mean squared error on the total counts, combined with a multinomial negative log-likelihood for the precise distribution of reads at each base pair (a sketch follows below). The model supports multiple readouts of different proteins, fitted separately or simultaneously as a multitask model. The speaker applies this model to four famous pluripotency transcription factors (Oct4, Sox2, Nanog, and Klf4) in mouse embryonic stem cells, using ChIP-nexus experiments with high-resolution footprints.
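
A sketch of this two-part loss, assuming the model outputs per-base profile logits and per-strand log total counts (the names and the count weighting are illustrative; the multinomial term is computed up to its constant coefficient):

```python
import torch
import torch.nn.functional as F

def profile_loss(pred_profile, pred_logcounts, true_counts, count_weight=1.0):
    """Two-part loss: MSE on (log) total counts plus a multinomial negative
    log-likelihood on how reads distribute across base pairs.
    Shapes: pred_profile/true_counts (batch, strands, length),
            pred_logcounts (batch, strands)."""
    true_totals = true_counts.sum(dim=-1)                       # (batch, strands)
    # Count loss: mean squared error on log(1 + total counts).
    count_loss = F.mse_loss(pred_logcounts, torch.log1p(true_totals))
    # Profile loss: treat predicted profile as multinomial logits and score
    # the observed per-base counts under that distribution (up to a constant).
    log_probs = F.log_softmax(pred_profile, dim=-1)
    multinomial_nll = -(true_counts * log_probs).sum(dim=-1).mean()
    return multinomial_nll + count_weight * count_loss
```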

  • 00:40:00 In this section, the speaker focuses on the accuracy of the models' predictions about enhancers in the genome, which are highly accurate despite some noise and differences from the observed data due to denoising, imputation, and other factors. To evaluate genome-wide performance, they use the Jensen-Shannon divergence between predicted and observed profiles, compared against the concordance between replicate experiments, which is computed to provide upper and lower bounds on achievable performance (see the sketch below). The speaker then explains their interpretation approach: the DeepLIFT algorithm recursively decomposes the contributions of neurons across layers down to individual nucleotides, providing high-resolution interpretations of which pieces of the sequence drive binding by each of the four transcription factors and revealing a combinatorial syntax.
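
For the profile metric, the Jensen-Shannon comparison between an observed and a predicted profile can be sketched as follows (toy arrays; SciPy's `jensenshannon` returns the distance, i.e., the square root of the divergence):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical observed and predicted per-base read profiles over one region.
observed = np.array([0, 2, 10, 40, 12, 3, 1, 0], dtype=float)
predicted = np.array([1, 3, 12, 35, 14, 2, 1, 0], dtype=float)

# Normalize to probability distributions, then compare; lower means more similar.
jsd = jensenshannon(observed / observed.sum(), predicted / predicted.sum())
print(f"Jensen-Shannon distance: {jsd:.3f}")
```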

  • 00:45:00 In this section, the speaker discusses two methods used to summarize the patterns learned by the model across the whole genome. The first, TF-MoDISco, takes all the sequences bound by a protein of interest and computes DeepLIFT scores for each nucleotide in every sequence; the sequences are then clustered by similarity and collapsed into non-redundant motifs. The second method focuses on syntax, the higher-order arrangements of motifs that drive cooperative binding. Using the example of the Nanog motif, the neural network detects important nucleotides flanking the core site and identifies periodic patterns at precisely ten and a half base pairs, one helical turn, indicating that Nanog binds DNA in a way that involves interactions on the same face of the DNA helix (a periodicity-detection sketch follows).
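
One way such a ~10.5 bp periodicity can be detected, sketched here on simulated motif-to-motif spacings; the circular-statistics scoring is an illustrative choice of mine, not the lecture's method.

```python
import numpy as np

# Hypothetical distances (bp) between a core motif and flanking influence
# peaks, e.g. derived from per-base contribution scores around many sites.
rng = np.random.default_rng(1)
period = 10.5
spacings = np.concatenate([
    rng.normal(k * period, 0.8, size=200) for k in range(1, 5)
])

# Score candidate periods by how tightly the spacings cluster around integer
# multiples: the mean resultant length of spacings mapped onto a circle of
# that period peaks at the true period.
candidates = np.arange(8.0, 13.0, 0.1)
scores = [np.abs(np.exp(2j * np.pi * spacings / p).mean()) for p in candidates]
print(f"best period ≈ {candidates[int(np.argmax(scores))]:.1f} bp")
```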

  • 00:50:00 In this section, the speaker discusses a soft syntax preference in DNA, demonstrated through a preferred motif spacing at multiples of ten and a half base pairs. The model learns this syntax from subtle spikes in the signal seen in the genome, allowing it to recognize co-localized sites and the spacing that drives binding. The speaker also describes in silico experiments designed to probe how syntax drives the binding of different proteins: a synthetic experiment in which two motifs are embedded in a random sequence and the spacing between them is varied while predicting protein binding (sketched below), and an in silico CRISPR experiment in which actual enhancers are mutated and the model predicts the effects on the binding of the four proteins. The speaker notes that the syntax is learned in the higher layers of the model: removing those layers makes the learned syntax disappear completely.
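
A sketch of the synthetic spacing experiment, assuming a trained model exposed through a hypothetical `predict_binding` function; the motifs shown are placeholders, not the actual Oct4/Nanog motifs.

```python
import numpy as np

def embed_pair(motif_a, motif_b, spacing, length=1000, seed=0):
    """Build a random background sequence with two motifs embedded a fixed
    number of base pairs apart."""
    rng = np.random.default_rng(seed)
    seq = list(rng.choice(list("ACGT"), size=length))
    start = length // 2 - (len(motif_a) + spacing + len(motif_b)) // 2
    seq[start:start + len(motif_a)] = motif_a
    b_start = start + len(motif_a) + spacing
    seq[b_start:b_start + len(motif_b)] = motif_b
    return "".join(seq)

# Sweep the spacing; with a trained model, record predicted binding at each
# spacing. `predict_binding` is a hypothetical stand-in for that model.
for spacing in range(0, 43):
    seq = embed_pair("TTTGCAT", "CATTGTC", spacing)  # placeholder motifs
    # score = predict_binding(seq)
    # Peaks in score at multiples of ~10.5 bp would indicate cooperativity
    # tied to the same face of the DNA helix.
```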

  • 00:55:00 In this section, the speaker explains an experiment using an enhancer bound by Oct4 and Nanog to show the effects of deleting specific motifs: the model predicts the effect of each deletion, and the experiments show the same effect, validating the model's predictions. The speaker introduces BPNet, a model that can analyze any kind of profiling assay, with interpretation frameworks that uncover biological stories about how syntax affects TF cooperativity; its predictions are validated through high-resolution CRISPR experiments. The talk ends with a discussion of a recent paper on deep learning-based enhancement of epigenomic ATAC-seq data, a collaboration between the speaker's team and the Buenrostro lab.

  • 01:00:00 In this section, the concept of assaying chromatin accessibility via sequencing (ATAC-seq) is explained. The peaks in the coverage track represent the active regulatory regions of the genome, allowing identification of active regulatory elements in different cell types or tissues. ATAC-seq can also be performed at the single-cell level, providing higher-resolution insight into the biology. However, data quality can be an issue: sequencing depth, sample preparation, and the number of cells in a single-cell ATAC-seq experiment can all impact the results. AtacWorks is a deep learning model developed to address some of these issues.

  • 01:05:00 In this section, the speaker discusses the AtacWorks tool, which takes in the coverage track from a noisy experiment and uses a residual neural network architecture to denoise and enhance the ATAC-seq signal, as well as to identify the locations of peaks, i.e., accessible chromatin sites. The model uses one-dimensional convolutional layers with dilated convolutions, and a multi-part loss function that measures both the accuracy of the denoised coverage track and the classification accuracy of peak locations (a sketch follows below). Unlike sequence-based models, AtacWorks takes in only coverage data rather than genome sequence, making it more transferable across different types of cells. The speaker explains the simple training strategy used to train the model and shows example results on different human cell types.
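
A minimal PyTorch sketch in the spirit of this architecture: residual dilated 1-D convolutions over a coverage track (no sequence input), with a regression head for the denoised track and a classification head for peak calls. The layer sizes, names, and loss weighting are illustrative assumptions, not the actual AtacWorks implementation.

```python
import torch
import torch.nn as nn

class CoverageDenoiser(nn.Module):
    """Sketch of a residual 1-D conv denoiser over coverage tracks."""
    def __init__(self, channels=32, n_blocks=4):
        super().__init__()
        self.stem = nn.Conv1d(1, channels, kernel_size=51, padding=25)
        self.blocks = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3,
                      padding=2 ** i, dilation=2 ** i)
            for i in range(n_blocks)
        )
        self.denoise_head = nn.Conv1d(channels, 1, kernel_size=1)
        self.peak_head = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, coverage):              # (batch, 1, length)
        h = torch.relu(self.stem(coverage))
        for conv in self.blocks:              # residual dilated convolutions
            h = h + torch.relu(conv(h))
        denoised = self.denoise_head(h)       # regression: clean coverage
        peak_logits = self.peak_head(h)       # classification: peak or not
        return denoised, peak_logits

def multipart_loss(denoised, peak_logits, clean, peak_labels, w=0.5):
    """Multi-part loss: coverage regression plus peak classification."""
    reg = nn.functional.mse_loss(denoised, clean)
    cls = nn.functional.binary_cross_entropy_with_logits(peak_logits, peak_labels)
    return reg + w * cls
```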

  • 01:10:00 In this section, the speaker explains how deep learning can improve the quality of low-coverage ATAC-seq data by denoising the signal and enhancing peaks that were previously hard to identify. They show examples of how AtacWorks distinguishes peaks from nearby noise and accurately identifies the locations of accessible chromatin in different cell types, even in new data that was not present during training. They also discuss how AtacWorks can reduce the cost of experiments by producing the same quality of results from less sequencing. Additionally, they demonstrate how AtacWorks handles low-quality ATAC-seq data by cleaning up background noise and identifying peaks that closely match those from high-quality data. Finally, they measure the performance of AtacWorks by looking at the enrichment of coverage around transcription start sites (sketched below).
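
A sketch of that metric: average the coverage in windows centered on annotated TSSs and normalize by the window edges, which approximate background. The window sizes and toy data are illustrative; on real signal the curve peaks sharply at the center.

```python
import numpy as np

def tss_enrichment(coverage, tss_positions, flank=2000, edge=100):
    """Average coverage in windows around TSSs, normalized by the window
    edges, which approximate background."""
    windows = np.stack([
        coverage[tss - flank: tss + flank + 1]
        for tss in tss_positions
        if tss - flank >= 0 and tss + flank < len(coverage)
    ])
    mean_profile = windows.mean(axis=0)
    background = np.concatenate([mean_profile[:edge], mean_profile[-edge:]]).mean()
    return mean_profile / background   # enrichment curve, centered on the TSS

# Toy usage: uniform Poisson coverage gives enrichment near 1; real ATAC-seq
# signal of good quality shows strong central enrichment.
rng = np.random.default_rng(2)
coverage = rng.poisson(1.0, size=1_000_000).astype(float)
tss_positions = rng.integers(2_000, 998_000, size=500)
curve = tss_enrichment(coverage, tss_positions)
print(f"enrichment at TSS: {curve[len(curve) // 2]:.2f}")
```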

  • 01:15:00 In this section, the speaker discusses how deep learning can address the limited ability to study small populations of cells in single-cell ATAC-seq data. They randomly select a subset of cells from an abundant cell type to obtain a noisy signal, then train an AtacWorks model to take the signal from a few cells and denoise it to predict what the signal from many cells would look like. Once trained, the model can be applied to very small populations of cells to predict what the data would have looked like with more cells sequenced (see the sketch below). This approach significantly increases the resolution at which single-cell chromatin accessibility can be studied, and the models transfer across experiments, cell types, and even species.
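
A sketch of how such noisy/clean training pairs can be constructed from a per-cell signal matrix; the matrix and sizes are illustrative stand-ins.

```python
import numpy as np

def make_training_pair(cell_signals, n_subsample, rng):
    """Aggregate signal from all cells of an abundant cell type as the
    'clean' target, and signal from a small random subset as the 'noisy'
    input for a denoising model."""
    clean = cell_signals.sum(axis=0)
    subset = rng.choice(len(cell_signals), size=n_subsample, replace=False)
    noisy = cell_signals[subset].sum(axis=0)
    return noisy, clean

# Hypothetical per-cell coverage matrix: (n_cells, n_positions).
rng = np.random.default_rng(3)
cell_signals = rng.poisson(0.05, size=(2000, 10_000)).astype(float)
noisy, clean = make_training_pair(cell_signals, n_subsample=50, rng=rng)
# A denoising model is then trained to map `noisy` -> `clean`, and later
# applied to genuinely small cell populations.
```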

  • 01:20:00 In this section, the speaker discusses a single-cell sequencing experiment on hematopoietic stem cells, which can differentiate into either the lymphoid or the erythroid lineage. The experiment revealed heterogeneity across the single-cell population and identified sub-populations of cells primed to differentiate into one of the two lineages. The team used AtacWorks to denoise the ATAC-seq signal and identify the specific regulatory elements that control the lineage-priming process. They acknowledge the team involved in the project and invite students to reach out for internships or collaborations.
Regulatory Genomics - Deep Learning in Life Sciences - Lecture 07 (Spring 2021)
  • 2021.03.16
  • www.youtube.com