
 

Machine Learning for Pathology - Lecture 19



Machine Learning for Pathology - Lecture 19 - MIT Deep Learning in the Life Sciences (Spring 2021)

The lecture covers various aspects of the application of deep learning in computational pathology, including the challenges and limitations of the technology. The speaker discusses the need for caution in trusting algorithms blindly and emphasizes the importance of understanding what a network is learning. The lecture explores several examples of how deep learning is being used in cancer diagnosis, prognosis, and treatment response assessment to develop prognostic and predictive tools for precision medicine. A second speaker then discusses the challenges of developing multi-drug treatments for tuberculosis and proposes various lab projects to tackle the issue. Overall, the lecture underscores the potential of deep learning in pathology, while also acknowledging its limitations and the need for a multi-disciplinary approach to ensure its effective deployment in clinical settings.

In this YouTube video titled "Machine Learning for Pathology - Lecture 19 - MIT Deep Learning in the Life Sciences (Spring 2021)," the second speaker discusses her team's attempts to address batch-to-batch and cell-to-cell heterogeneity in machine learning for pathology using typical variation normalization (TVN) and a k-nearest neighbor approach. She also describes using morphological profiling to classify drugs based on their effects on bacteria and developing a data-driven approach to designing and prioritizing combinations of drugs using both supervised and unsupervised learning. Additionally, the speaker thanks her lab members for their contributions to drug synergy versus antagonism studies, highlighting the importance of considering the larger context for understanding and advancing research in the field.

  • 00:00:00 In this section, Anant Madabhushi discusses the impact of deep learning in the field of computational pathology, specifically with regard to medical image analysis. While the digitization of pathology has made it a hotbed for the application of deep learning due to the vast amount of data available, Madabhushi cautions that the specialized methodologies that involve hand-crafting features through decades of expertise may not have been surpassed by deep learning methods. He also provides some statistics on cancer diagnosis and mortality rates to underscore the importance of accurately diagnosing cancer at an early stage through the use of imaging. Madabhushi hopes to share his lessons learned and thoughts on where and how deep learning can be most useful in this field.

  • 00:05:00 In this section, the speaker discusses the issue of overdiagnosis and overtreatment of cancers, particularly with indolent ones like prostate cancer. Despite advances in biomarkers and therapeutics, overdiagnosis and overtreatment remain problematic and contribute to financial toxicity for patients. The speaker then explores the potential for machine learning in the context of cancer diagnosis, prognosis, and treatment response assessment to help develop prognostic and predictive tools for precision medicine. While there are already tools like gene expression-based assays, they have limitations and do not account for intra-tumor heterogeneity. Machine learning presents an opportunity to improve upon these limitations and better manage and treat cancers.

  • 00:10:00 In this section, the lecturer discusses the use of digitized pathology slides and advanced machine learning image analysis to identify features and patterns that can't be visually discerned by human pathologists. By identifying individual cells, lymphocytes, and cancer cells, data scientists can use network theory to examine the spatial architecture of individual cells and analyze different quantitative metrics from the spatial arrangement of the individual cells to better understand diagnosis, prognosis, and treatment response of patients. This process allows for a non-invasive and cloud-based approach to pathology analysis.
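
To make the spatial-architecture idea concrete, here is a minimal illustrative sketch (not the lecture's actual pipeline) that builds a Delaunay graph over hypothetical nuclear centroids and computes a few simple quantitative metrics from the cell arrangement; all names and inputs are assumptions.

```python
# Minimal sketch: build a cell graph from nuclear centroids and compute
# simple spatial-architecture metrics (illustrative, not the lecture's exact pipeline).
import numpy as np
from scipy.spatial import Delaunay

def cell_graph_metrics(centroids: np.ndarray) -> dict:
    """centroids: (n_cells, 2) array of x, y positions of detected nuclei."""
    tri = Delaunay(centroids)
    # Collect the unique edges of the Delaunay triangulation.
    edges = set()
    for simplex in tri.simplices:
        for i in range(3):
            a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
            edges.add((a, b))
    lengths = np.array([np.linalg.norm(centroids[a] - centroids[b]) for a, b in edges])
    degrees = np.zeros(len(centroids))
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return {
        "mean_edge_length": lengths.mean(),
        "edge_length_std": lengths.std(),
        "mean_degree": degrees.mean(),
    }

# Example: random centroids standing in for segmented nuclei.
rng = np.random.default_rng(0)
print(cell_graph_metrics(rng.uniform(0, 1000, size=(200, 2))))
```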

  • 00:15:00 In this section, the speaker discusses the impact of deep learning in the field of computational pathology, where the amount of data in pathology slides exceeds that of any other medical imaging domain. He describes a stacked sparse autoencoder published six years earlier that was trained on hand annotations of individual cells, allowing the network to pick up on smaller details such as intensity gradients and the rough elliptical shapes of the cells. The network was trained on hand-labeled patches, with and without cells, produced by students who broke each image down into a series of bounding boxes. Although some cells were missed, the network was able to pick up on the nuances of the different cell types.

  • 00:20:00 In this section, the speaker discusses the limitations of deep learning in pathology, specifically with regard to staining and annotations. Staining can have a significant impact on the fidelity of segmentations, and the network wasn't trained in the most robust way due to the discrepancy between small and large cells. The speaker also discusses their work in training a CNN algorithm based on unsupervised feature generation to distinguish between normal hearts and those at risk for heart failure. The CNN algorithm outperformed the pathologists, achieving an AUC of 0.97 compared to the pathologists' AUC of only 0.74.

  • 00:25:00 In this section, the speaker discusses a surprising discovery they made while running the same algorithm on two sets of patients from the same institution and scanner. Despite no differences in the pathology of the images, the second set's AUC dropped dramatically due to a small software upgrade that subtly changed image features. This underscored the need for caution in blindly trusting algorithms, even in seemingly controlled settings. A figure panel also showed that while unsupervised feature generation with CNNs mainly learned convolutions that were sensitive to pre-analytic sources of variation, it also highlighted the importance of certain types of cells and their spatial arrangements. This led to a subsequent approach that generated an AUC comparable to the initial high score but with more resilience to variations across sites and scanners.

  • 00:30:00 In this section, the speaker discusses the importance of understanding what a network is learning and being cautious about trusting brute force algorithms in medical diagnosis. He shares an example of a network that learned to distinguish between huskies and wolves based solely on the presence of snow in the background, which emphasizes the need for caution when interpreting network results. Despite these limitations, the speaker identifies the utility of deep learning in detection and segmentation tasks in pathology and shares an interactive tool called Quick Annotator, which enables users to segment out a few representative examples, train a network in the background, and fine-tune the results in an interactive learning mode.

  • 00:35:00 In this section, the speaker discusses the challenges with the annotation process for pathology images, particularly the lack of time available for pathologists. To address this issue, the speaker explains how handcrafted features can help improve the efficiency of the annotation process. They give examples of using deep learning to identify different tissue compartments and types of cells, and then invoking graph networks to look at spatial statistics and the interplay of different cell types within tissue compartments. The speaker also describes how deep learning was used to segment out the collagen fibers and assign a vector to their orientation, which was then used to determine the entropy and prognostic value for breast cancer patients. Finally, the speaker presents a new study on prostate cancer that uses deep learning to do the segmentation of glands and then looks at the spatial arrangement and architecture of the glands to predict which patients will have recurrence after surgery.

  • 00:40:00 In this section, the speaker discusses a head-to-head comparison between a commercial molecular assay for predicting prostate cancer outcomes and an image-based approach using deep learning algorithms. The results showed that the image-based approach combined with two simple clinical factors performed almost twice as well as the costly molecular assay. Moreover, the image-based approach using deep learning algorithms yielded interpretable and validated features, which could be analyzed at a much lower cost compared to the molecular assay. The speaker also highlighted the need for interpretability in clinical applications of deep learning and emphasized the importance of handcrafted feature engineering in conjunction with deep learning approaches.

  • 00:45:00 In this section, the focus is on the challenges of interpretability in machine learning for pathology, particularly in the context of designing multi-drug therapies for tuberculosis (TB). The lack of interpretability poses a significant challenge for clinicians, who need to understand the representations underlying the models to trust their decisions. The speaker emphasizes the need to constantly question the network and not take anything for granted. They also discuss the importance of starting with the simplest methodology first and deciding when to use deep learning. The lab's work on TB highlights the difficulty in treating the disease, the need for multi-drug therapies, and the significant heterogeneity involved.

  • 00:50:00 In this section, the speaker discusses the challenges of developing multi-drug treatments for tuberculosis due to the diverse micro-environments of the bacteria in the lung, which require different drugs to ensure susceptibility. The speaker notes that while there are currently many drugs available for TB treatment, the vast unexplored combination space makes it difficult to test every potential combination. The speaker proposes two lab projects to tackle this issue: first, narrowing down the single drug space through imaging to identify the pathway of action of new drugs, and second, using machine learning to make systematic combination measurements and develop classifiers to predict the most effective novel combinations. The lab uses time-lapse imaging to capture the changes in the bacteria's cell morphology to assess different treatment outcomes.

  • 00:55:00 In this section, the speaker describes a project that used unsupervised learning and clustering to associate similar drug profiles in E. coli. They hypothesized that when profiles look the same, those drugs have a similar mechanism of action. They applied this idea to TB, but the cells did not take up the stain as expected, and the morphological features did not look very distinct from each other. However, they still found statistically significant differences from untreated cells in some treatment groups. The typical pipeline for cytological profiling was established, and they hoped to do a classification trial to try and figure out which treatment groups looked most similar to each other. They found that the pathogens were responding to drugs, but were diverse in their mechanism of response, and had extremely thick cell walls making it difficult for drugs to get in.

  • 01:00:00 In this section of the lecture, the speaker discusses their team's attempts to address the batch-to-batch and cell-to-cell heterogeneity of their experiments in machine learning for pathology. They tried using a neural net, which didn't work due to the variable data. They then used a method called typical variation normalization (TVN), developed by their collaborator Mike Ando at Google, to align the covariance matrices produced by the principal component analysis (PCA) of the untreated controls from each experiment to reduce non-biological variations. They also incorporated cell-to-cell heterogeneity metrics and shifted from using PCA to a k-nearest neighbor approach to capture the subtle morphological changes. They used a stochastic approach to avoid fragility and selected a new set of untreated controls for each classification trial.
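
As a rough illustration of the normalization-plus-kNN idea described above, the sketch below whitens each batch's morphological features against a PCA fit on that batch's untreated controls and then classifies treated profiles with k-nearest neighbors. It is a deliberate simplification, not the published TVN code, and every variable name is a hypothetical placeholder.

```python
# Sketch of TVN-style normalization: whiten each batch's features using a PCA
# fit only on that batch's untreated controls, then classify treated profiles
# with k-nearest neighbors. Illustrative only; not the published TVN code.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def tvn_normalize(features, is_control, batch_ids, n_components=10):
    """Whiten features per batch against that batch's untreated controls."""
    normalized = np.empty((features.shape[0], n_components))
    for b in np.unique(batch_ids):
        in_batch = batch_ids == b
        pca = PCA(n_components=n_components, whiten=True)
        pca.fit(features[in_batch & is_control])      # fit on controls only
        normalized[in_batch] = pca.transform(features[in_batch])
    return normalized

# Hypothetical usage with precomputed morphological features:
# X_norm = tvn_normalize(X, is_control, batch_ids)
# knn = KNeighborsClassifier(n_neighbors=5).fit(X_norm[train_idx], drug_labels[train_idx])
# predicted_moa = knn.predict(X_norm[test_idx])
```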

  • 01:05:00 In this section, the speaker describes the process of using morphological profiling to classify drugs based on their effects on bacteria. The process involves treating bacteria with a low and high dose of a drug, fixing and staining the bacteria, extracting features, normalizing the data, and performing stochastic simulation. The resulting consensus classification is about 75% accurate, and a network diagram is used to visualize the connections between drugs. However, the speaker notes that one drug, bedaquiline, was misclassified as a cell wall acting agent, which led to the hypothesis that it was inducing an energy crisis in the bacteria. This hypothesis was confirmed by growing the bacteria on fatty acids, which resulted in a different classification.

  • 01:10:00 In this section of the lecture, the speaker discusses the mechanism of action of the drug Bedaquiline on tuberculosis, and how it depends on the metabolic state of the bacteria. The speaker also describes using morphological profiling to determine proximal damages and secondary effects of antibacterials on TB. They explain that this method provides a targeted approach to help direct researchers towards the pathway space they should focus on for secondary studies. The speaker also touches on measuring combinations of drugs using a checkerboard assay, which is traditionally inefficient for high-order combinations in TB treatment.

  • 01:15:00 In this section, the speaker discusses the challenges associated with measuring high-order drug combinations in tuberculosis and presents a solution called DiaMOND (diagonal measurement of n-way drug interactions). DiaMOND is a geometric optimization of the checkerboard assay that preserves the unit of a dose-response curve and measures the most information-rich parts of the checkerboard. By projecting a line, the speaker explains how the degree of drug interaction can be quantified with the fractional inhibitory concentration. DiaMOND has been used to efficiently measure up to 10-way drug combinations. The speaker discusses a large dataset that was used to tackle the two major issues in designing multi-drug combinations using in vitro studies in tuberculosis. The study measured all single, pairwise, and three-way combinations of drugs in vitro across eight different growth environments to computationally merge them together, modeling what happens in different animal models. The speaker concludes that the drug interaction profiles are highly dependent on the growth environment, and there is no single combination that is synergistic across all conditions.
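
The fractional inhibitory concentration mentioned above can be illustrated in a few lines: each drug's dose in the combination is normalized by its single-agent dose for the same effect level, and the terms are summed, with values below 1 suggesting synergy and above 1 antagonism. This is the standard textbook form of the score, not necessarily the exact computation used in DiaMOND.

```python
# Sketch of a fractional inhibitory concentration (FIC) score for quantifying
# drug interactions: doses in the combination that achieve a fixed effect
# (e.g., 50% growth inhibition) are normalized by each drug's single-drug dose
# for the same effect. Generalizes to n drugs by summing n terms.
def fic_score(combo_doses, single_agent_doses):
    """combo_doses[i]: dose of drug i in the combination at the target effect.
    single_agent_doses[i]: dose of drug i alone giving the same effect."""
    return sum(c / s for c, s in zip(combo_doses, single_agent_doses))

# Hypothetical two-drug example: each drug used at a quarter of its
# single-agent inhibitory dose in the combination -> FIC = 0.5 (synergy).
print(fic_score([0.25, 0.5], [1.0, 2.0]))  # 0.5
```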

  • 01:20:00 In this section, the speaker discussed their data-driven approach to designing and prioritizing combinations of drugs using machine learning. They utilized both supervised and unsupervised learning to assemble their data into a data cube and found a strong signal that delineates combinations based on whether they would be better than the standard of care or not. They also found a way to limit the number of growth conditions they make their measurements in using different supervised learning methods such as random forest models. The speaker highlighted that the simpler approach worked better for them to put a path forward on how best to explore the combination space systematically and efficiently using validated in vitro models. Overall, their approach could help reduce the number of in vitro experiments and lead to the best combinations of drugs.

  • 01:25:00 In this section, the speaker thanks the individuals in her lab who have worked on various difficult and messy projects, which include drug synergy versus antagonism studies. These studies ultimately help to provide a larger context for machine learning and deep learning in the life sciences, highlighting that they are a small piece of a much larger equation. The importance of considering this larger context is emphasized as it is not always the right approach, but necessary for understanding and advancing research in the field. Overall, the speaker's talk was very illuminating and provided valuable insights into the intersection of machine learning and pathology.
 

Deep Learning for Cell Imaging Segmentation - Lecture 20



Deep Learning for Cell Imaging Segmentation - Lecture 20 - MIT ML in Life Sciences (Spring 2021)

In this video, the speakers discuss the use of deep learning for cell tracking, which involves determining the movement of cells in time-lapse imaging. They explain that traditional manual tracking methods are costly and time-consuming, and that deep learning methods can significantly speed up the process while also providing higher accuracy. The speakers discuss various deep learning architectures for cell tracking, including U-Net, StarDist, and DeepCell. They also note that one of the challenges in cell tracking is distinguishing between cells that are close together or overlap, and that methods such as multi-object tracking or graph-based approaches can help address this issue. The speakers emphasize the importance of benchmarking different deep learning methods for cell tracking and providing open access datasets for reproducibility and comparison. They also highlight the potential applications of cell tracking in various fields, such as cancer research and drug discovery.

  • 00:00:00 In this section, Juan Caicedo discusses the concept of image-based phenotyping, which is a method for understanding biological systems through the use of microscopy and other imaging techniques. He explains how images of biological structures, like cells, can be quantified for different phenotypes, including cell size and DNA content, and used to guide decisions about treatments and drug discovery. Caicedo gives an example of a successful drug candidate for leukemia that was discovered through the precise measurement of cell size using microscopy images, leading to its eventual approval by the FDA. He highlights the potential impact of image-based profiling in the field of biology and drug development.

  • 00:05:00 In this section, the focus is on the challenge of comparing populations of cells that have different characteristics and identifying which treatments are effective. This requires more information and strategies for extracting information from cell images, which is where image-based profiling comes in. This involves measuring the morphology or state of cells from images to extract quantitative information for drug discovery and functional genomics. The two computational problems associated with this approach are cell segmentation and single-cell representation learning, where the aim is to identify where single cells are in images without having to spend time and energy adjusting segmentation algorithms for different image types. Ultimately, the goal is to create segmentation algorithms for cells that work as well as face detectors in natural images.

  • 00:10:00 In this section, the speaker talks about the BioImage Challenge 2018, which was aimed at making computer vision technologies work for segmentation in biology. The challenge involved creating an annotated dataset, splitting it into training and testing partitions, defining a metric of success, and providing feedback to the participants through a scoring system based on intersection over union. Participants were expected to use a supervised machine learning model to learn the relationships between inputs and outputs and generate a segmentation map of the image that they provided as input. The winners were those who were able to segment the final test set more accurately according to the metric used.
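
For reference, the intersection-over-union quantity behind the challenge's scoring system can be computed as below; this is the generic formulation, and the competition itself averaged precision over several IoU thresholds.

```python
# Intersection-over-union (IoU) of a predicted and a ground-truth object mask,
# the building block of the segmentation scoring metric described above.
import numpy as np

def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return intersection / union if union > 0 else 0.0
```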

  • 00:15:00 In this section, the speaker discusses the top three competitors in a cell imaging segmentation competition, and their use of different architectures for their machine learning models. The third-place team used the Mask R-CNN architecture, which decomposes an image into regions and generates candidates that are reviewed by a network to determine if they are real objects or not, before identifying the exact bounding box and mask to separate the object from the background. The second-place team used an image pyramid network, which computes multiple feature maps to generate intermediate outputs and aggregates information from all different resolutions to generate the final output. The speaker notes that although the architecture plays a role in achieving high accuracy for cell segmentation, the way in which careful calibration and cross-validation experiments are run is also crucial.

  • 00:20:00 In this section, the speaker discusses a novel approach to image segmentation. Rather than using binary masks to determine the location of objects in an image, the solution involves predicting distance maps or angle maps that measure distances in different directions from the center of the cell. The outputs were manually engineered to provide more precise measurements of object location, which resulted in second place in the competition. Although this idea was novel at the time, subsequent works have evaluated its value and found it to be robust, especially for crowded images with many objects. The encoder-decoder architecture used was not innovative; the novelty came from replicating the exact architecture in 32 different models to form an ensemble, which helped the team place so highly in the competition.
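
A minimal sketch of how a distance-map regression target of the kind described above can be derived from an instance-label mask, assuming SciPy's Euclidean distance transform; angle maps can then be obtained from the gradient of this map. This is illustrative, not the competitor's actual code.

```python
# Sketch of turning an instance-label mask into a per-pixel distance map,
# the kind of regression target described above (distance from each object
# pixel toward the background, normalized per object).
import numpy as np
from scipy import ndimage

def distance_map(instance_labels: np.ndarray) -> np.ndarray:
    """instance_labels: integer mask, 0 = background, k > 0 = object k."""
    out = np.zeros_like(instance_labels, dtype=float)
    for k in np.unique(instance_labels):
        if k == 0:
            continue
        obj = instance_labels == k
        # Distance of each object pixel to the nearest non-object pixel,
        # normalized so every object peaks at 1 regardless of its size.
        d = ndimage.distance_transform_edt(obj)
        out[obj] = d[obj] / d.max()
    return out
```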

  • 00:25:00 In this section, the speakers discuss the performance of an ensemble approach versus simpler models for cell image segmentation. They explain that while the ensemble approach can be computationally intensive, simpler models may still be effective in practice. They also discuss the limitations of competitions and note that it would be helpful to analyze individual models within an ensemble in order to prune them down to only the most accurate ones. The speakers then go on to evaluate the improvements that can be made in facilitating biology research through segmentation, showing that optimizing algorithms for specific image types can be time-consuming and accuracy can vary by image type. They also note that imbalances in annotations and difficulty in segmenting certain image types can present challenges in real-world situations.

  • 00:30:00 In this section, the speaker discusses the challenges of parsing different types of imaging techniques, from small fluorescent images to the pink and purple (H&E-stained) images that are harder to segment. There are different approaches to segmenting images, such as training one model per image type or using classical algorithms with adjusted parameters. Additionally, there are now pre-trained models available for cell segmentation, such as nucleAIzer, Cellpose, and Mesmer. However, there are still open challenges in segmentation, such as collecting larger data sets and optimizing the time experts spend on identifying objects. The speaker also briefly touches on the importance of measuring the phenotype of cells using machine learning methods that can learn features beyond classical morphology measurements.

  • 00:35:00 In this section, the speaker discusses the use of machine learning methods in cell imaging segmentation for drug discovery. Perturbation experiments are used where cells are treated with compounds, but batch effects can cause noise and confound understanding of the phenotype. As there is no ground truth, a weakly supervised learning method is used, where a neural network is used to classify the applied compound. The goal is to obtain features to organize the cells in a meaningful way, which can inform whether compounds are similar or not. The evaluation involves observing clusters of compounds that share similar biological effects, with the aim of reducing the search space to useful compounds. Comparison of deep learning features versus classical features shows a significant difference.

  • 00:40:00 In this section, the speaker discusses the use of deep learning for cell imaging segmentation, specifically in determining biologically meaningful connections among compounds and identifying the impact of mutations in cancer. By comparing the wild type of a gene to a mutant, researchers can measure the phenotypic similarity between them to determine whether the mutant is driving the cancer or not. However, batch correction remains a challenge in deep learning, as it can influence the features learned from the images. The speaker suggests using domain adaptation, where a neural network is used with two heads for compound classification and batch determination. The negative gradient is then used to destroy potential information associated with the batch, resulting in clearer phenotypic determination. Overall, the speaker concludes that images are a great source of information for biological discovery but also acknowledges the open challenges in representation learning and explainable models.
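
The two-headed domain-adaptation setup can be sketched roughly as below: a shared backbone feeds a compound-classification head and a batch-prediction head, with a gradient reversal layer so that batch information is adversarially removed from the shared features. Layer sizes and names are illustrative assumptions, not the speaker's implementation.

```python
# Minimal PyTorch sketch of domain adaptation via gradient reversal: one head
# classifies the compound, the other predicts the batch, and the reversal layer
# pushes the shared features to be uninformative about batch.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the sign of the gradient flowing back from the batch head.
        return -ctx.lam * grad_output, None

class TwoHeadProfiler(nn.Module):
    def __init__(self, n_features, n_compounds, n_batches, lam=1.0):
        super().__init__()
        self.lam = lam
        self.backbone = nn.Sequential(nn.Linear(n_features, 256), nn.ReLU())
        self.compound_head = nn.Linear(256, n_compounds)
        self.batch_head = nn.Linear(256, n_batches)

    def forward(self, x):
        z = self.backbone(x)
        compound_logits = self.compound_head(z)
        batch_logits = self.batch_head(GradReverse.apply(z, self.lam))
        return compound_logits, batch_logits

# Training would minimize cross-entropy on both heads; the reversal makes the
# backbone features adversarial to batch prediction.
```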
 

Deep Learning Image Registration and Analysis - Lecture 21



Deep Learning Image Registration and Analysis - Lecture 21 - MIT ML in Life Sciences (Spring 2021)

In this lecture, Adrian Dalca delves into the topic of aligning medical images and the optimization problem behind it. He proposes a novel method called VoxelMorph, which involves using unlabeled data sets to train neural networks for image registration. The speaker also discusses the challenge of robustness to new data and sequences that neural networks have not seen before and proposes simulating diverse and extreme conditions to train robust models. The speaker compares classical registration models to VoxelMorph and SynthMorph models, with the latter being remarkably robust. Lastly, the speaker discusses the development of a function that generates templates based on desired properties rather than learning a template directly, and the potential use of capsule video endoscopy for detecting colon abnormalities.

The speaker in this lecture discusses various machine learning approaches to overcome the lack of medical data, specifically in the context of colonoscopy videos for polyp detection. They introduce a deep learning image registration and analysis architecture that utilizes pre-trained weights and random initialization to address domain shift and improve performance. The lecture also covers weakly supervised learning, self-supervised learning, and weakly supervised video segmentation. The speaker acknowledges the challenges faced in using machine learning approaches in medical data analysis and encourages testing these approaches in real medical procedures to reduce workload.

  • 00:00:00 In this section of the lecture, Adrian Dalca discusses the importance of aligning medical images and the optimization problem behind it. He explains that aligning images to a common reference frame is central to analyzing medical images, as it allows for the identification of structures and diseases, as well as comparison between subjects. However, the traditional alignment step was very time-consuming, taking up to two hours per brain, which hindered the development of sophisticated models. Dalca introduces a significantly faster method, which takes less than a minute on a CPU and less than a second on a GPU, and allows for faster and more efficient research in this field. He defines alignment or registration as finding a deformation field that matches up images and notes that it has been extensively researched in various domains, including computer vision and computational biology.

  • 00:05:00 In this section, the speaker discusses the evolution of image registration methods, starting with the classical models and progressing to the learning-based methods that emerged three years ago. However, the latter methods, though effective, are hampered by the lack of a ground-truth deformation field to use for supervised data. The speaker proposes a novel method that involves using unlabeled data sets to train neural networks, resulting in more elegant and efficient end-to-end solutions for image registration. The framework involves using the loss functions from classical models to optimize an entire new neural network, resulting in higher accuracy and faster speeds.

  • 00:10:00 In this section, the speaker describes a method for image registration using deep learning techniques, which borrows from classical methods but optimizes a neural network to output deformation fields rather than optimizing the fields directly. The deformation field is applied to all images in a data set, and stochastic gradient techniques are used to optimize the network. The speaker explains how differentiable losses are used to ensure the smoothness of the deformation field, and the results are evaluated by comparing anatomical structures before and after the warping process, as well as measuring volume overlaps. The proposed method, called VoxelMorph, is able to estimate the output of an optimization procedure and provides an approximation for probabilistic models, offering elegant connections between images, deformation fields, and uncertainty estimates.
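
A minimal sketch of the kind of unsupervised registration objective described above: an intensity-similarity term on the warped moving image plus a smoothness penalty on the predicted deformation field. The warping function is left as a placeholder (in practice a spatial-transformer layer), and the exact losses in VoxelMorph differ in detail.

```python
# Sketch of an unsupervised, VoxelMorph-style registration objective:
# image similarity after warping plus a smoothness penalty on the flow.
import torch
import torch.nn.functional as F

def smoothness_loss(flow):
    """flow: (batch, 2, H, W) 2D displacement field; penalize spatial gradients."""
    dx = flow[:, :, :, 1:] - flow[:, :, :, :-1]
    dy = flow[:, :, 1:, :] - flow[:, :, :-1, :]
    return (dx ** 2).mean() + (dy ** 2).mean()

def registration_loss(moving, fixed, flow, warp_fn, lam=0.01):
    """warp_fn(moving, flow) applies the deformation (assumed given,
    e.g. a grid_sample-based spatial transformer)."""
    warped = warp_fn(moving, flow)
    similarity = F.mse_loss(warped, fixed)   # intensity similarity term
    return similarity + lam * smoothness_loss(flow)
```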

  • 00:15:00 In this section, the speaker discusses their analysis of training a VoxelMorph neural network with only a few images, revealing that even with just 10 images, the deformation field output from the network is close to the state of the art. Additionally, the speaker touches on the issue of outlining specific areas of interest, such as the hippocampus in a brain, and how they were able to teach the network to identify this area without actually labeling it by having it perform a "soft segmentation" during training. Lastly, the speaker discusses the challenge of diverse medical images and how training networks on one modality only may limit their ability to work with other modalities, presenting a project that solves this problem.

  • 00:20:00 In this section, the speaker discusses the challenge of making neural networks that are robust to new data and sequences that they have not seen before. They propose simulating diverse and extreme conditions to expose the network to significant variability so that it decides to ignore some outliers, allowing for better generalization to real-world data. To achieve this, they randomly deform images, add different noise patterns, randomly fill in values and intensities and simulate various effects to generate data. They experimented with simulating diverse data for registrations and segmentation papers, and simulating random shapes, which gave them a deformation field that could be used to test the quality of the information.

  • 00:25:00 In this section, the speaker discusses the results of training different models for image registration and analysis. They trained VoxelMorph models and two versions of the SynthMorph model using different metrics for training. The classical models perform well, but the VoxelMorph models with variability and robustness perform even better. The models that were trained with images of simulated brains or blobs do roughly the same as VoxelMorph models and better than classical models. However, when it comes to registering between modalities, the models that were trained with same-contrast metrics collapse. Meanwhile, the SynthMorph models are remarkably robust, even with real images. However, the model capacity could lead to an issue where the features of real images might not be captured.

  • 00:30:00 In this section of the lecture, the speaker discusses the capacity of machine learning models and how the field is moving towards the use of more parameters. They simulate brain scans with different modalities and compare the performance of classical models, VoxelMorph, and their method, SynthMorph. They found that their method is robust as it is able to completely ignore contrast and only extract the necessary anatomy, which is being done by learning to ignore the response to contrast variation in the features of the network. They also introduce their new method, HyperMorph, which learns the effect of hyperparameters on registration fields. The potential of this method is that it only requires training one model and tuning it afterwards, which eliminates the need to train multiple models.

  • 00:35:00 In this section, the speaker discusses a technique called hypernetworks, which involves training a small network that takes a hyperparameter value as input and outputs the weights of a larger network that generates deformation fields for image registration. By tuning the hyperparameter value, the deformation field can be adjusted without requiring retraining, and a single HyperMorph model can capture a wide range of deformation field variations. This technique can be applied to various machine learning settings beyond image registration and can be useful in allowing for interactive tuning of the model or adjusting it based on validation data. The optimal hyperparameter value varies depending on the data set, the patients' age, and the registration task, among other factors.
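
The hypernetwork idea can be illustrated with a toy layer whose weights are generated by a small network from a single hyperparameter value, so that the hyperparameter can be swept at test time without retraining. This is a generic sketch, not the HyperMorph architecture; all sizes are assumptions.

```python
# Toy hypernetwork: a small network maps a hyperparameter to the weights of a
# linear layer of the main model, so one trained model covers a range of settings.
import torch
import torch.nn as nn

class HyperLinear(nn.Module):
    """A linear layer whose weights are generated from a hyperparameter."""
    def __init__(self, in_dim, out_dim, hyper_dim=16):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.hyper = nn.Sequential(
            nn.Linear(1, hyper_dim), nn.ReLU(),
            nn.Linear(hyper_dim, in_dim * out_dim + out_dim),
        )

    def forward(self, x, hyperparam):
        params = self.hyper(hyperparam.view(1, 1)).squeeze(0)
        w = params[: self.in_dim * self.out_dim].view(self.out_dim, self.in_dim)
        b = params[self.in_dim * self.out_dim:]
        return x @ w.t() + b

# Usage sketch: at test time, sweep the hyperparameter without retraining.
layer = HyperLinear(32, 8)
x = torch.randn(4, 32)
for lam in (0.1, 1.0, 10.0):
    y = layer(x, torch.tensor(float(lam)))
```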

  • 00:40:00 In this section of the lecture, the speaker discusses the importance of selecting different hyperparameter values for different regions of the brain when performing image registration. They also compare a model trained on real data with one that was trained on random data, explaining how the former is more susceptible to noise in different regions. They then introduce a project focused on the idea of aligning data to a common reference frame without first building a central average brain or using a fixed template. Instead, they propose estimating an atlas at the same time as registering images, and the resulting tool is shown to be flexible and able to solve many problems that were previously difficult to solve, such as building separate templates for different populations.

  • 00:45:00 In this section, the speaker discusses the concept of "conditional templates" in deep learning image registration and analysis, which involves learning a function that generates a template based on a desired property (such as age, sex, or genetic information) rather than learning a template directly. By feeding in patient data and age information, the network is able to learn a smooth age-dependent atlas that captures certain effects between different brains, such as changes in ventricle size. The speaker also discusses the potential for genetics-related analysis using similar methods, as well as the use of variational encoders and other machine learning concepts in this field.

  • 00:50:00 In this section of the lecture, the speaker discusses the motivation behind their work on automatic pathology detection for capsule video endoscopy, which is a collaboration between the Norwegian University of Science and Technology and a hospital in Norway. The human colon is susceptible to diseases such as colorectal cancer and ulcerative colitis which erode the smoothness of the colon walls and can lead to bleeding or other complications. Colonoscopies are recommended by doctors for individuals above the age of 50 but may not be accepted by patients. Capsule video endoscopies offer an alternative way to visualize the colon walls and detect abnormalities using a small pill-sized camera that transmits almost 50,000 frames to produce a large amount of data.

  • 00:55:00 In this section, the speakers discuss the challenges of imaging with capsule video endoscopy, in which an ingestible capsule captures images as it travels through the digestive tract. The capsule must be taken on an empty stomach and can miss features in the folds of the colon. Additionally, the capsule can become stuck or face geometric obstacles as it travels through the small intestine, potentially leading to surgery. The resulting video quality is not as good as HD image quality, with limited color and smoothness of transition. Despite these limitations, capsule video endoscopy can aid in diagnosing conditions such as diverticulitis, and doctors look for abnormalities in the video to guide treatment.

  • 01:00:00 In this section of the lecture, the speaker discusses the challenges of using machine learning approaches in medical data analysis, specifically in the context of colonoscopy videos for polyp detection. The main problem is the lack of data due to the expensive and slow nature of medical data acquisition and the difficulty in obtaining labeling by diverse pathologists. The speaker outlines several machine learning approaches to overcome the lack of data, such as transfer learning and supervised learning, and explains current deep learning approaches using RGB images, geometric features, and 3D convolutions. Finally, the speaker introduces the Y-Net approach for polyp detection, which involves using registration to align the colonoscopy images and improve polyp detection performance.

  • 01:05:00 In this section of the lecture, the speaker discusses a deep learning image registration and analysis architecture that utilizes pre-trained weights and random initialization to address domain shift and improve performance in object detection and image segmentation. The architecture consists of two encoders, one pre-trained from ImageNet and the other with randomized weights, along with augmentation to the input images. The learning rates for each encoder depend on the layer they are training on, and binary cross-entropy and the dice loss function are utilized. The architecture is tested on a dataset of videos containing polyps and achieves an F1 score of 85.9 using multiple variations of the same input. Finally, the speaker presents videos showcasing the effectiveness of the architecture.
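
For reference, a common way to combine binary cross-entropy with a soft Dice term over a predicted mask is sketched below; this is a standard formulation and may differ in detail from the loss actually used in the lecture's architecture.

```python
# Combined binary cross-entropy + soft Dice loss over a predicted polyp mask.
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, eps=1e-6):
    """logits, target: (batch, 1, H, W); target values in {0, 1}."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = (2 * intersection + eps) / (union + eps)
    return bce + (1 - dice).mean()
```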

  • 01:10:00 In this section, the lecturer discusses the challenge of collecting labelled data for an image registration problem and introduces the concept of multiple-instance learning with weak supervision. The assumption is that there's a positive bag with at least one instance of the pathology of interest, while the negative bag always has negative instances. The problem is formulated as finding which frames contain the pathology and can be optimized by predicting the individual contribution from each frame and optimizing the loss on the final video label of the aggregation. It is noted that this problem is challenging due to limited labelled data and the absence of data on individual components, requiring a weakly supervised approach.
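
A minimal sketch of attention-based multiple-instance learning in this spirit: per-frame features are pooled with learned attention weights, the pooled embedding is classified, so only the video-level label is needed, and the attention weights indicate which frames likely contain the pathology. Dimensions and names are illustrative assumptions, not the lecture's exact model.

```python
# Attention-based multiple-instance learning for video-level labels:
# per-frame features are weighted and summed, then classified.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=2048, hidden=128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, frame_features):
        """frame_features: (n_frames, feat_dim) for one video."""
        alphas = torch.softmax(self.attention(frame_features), dim=0)  # (n_frames, 1)
        video_embedding = (alphas * frame_features).sum(dim=0)         # (feat_dim,)
        return self.classifier(video_embedding), alphas  # video logit + frame weights

# The frame attention weights (alphas) are what localize the pathology inside
# a positively labelled video.
```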

  • 01:15:00 In this section, the speaker discusses how they extracted ResNet-50 features from videos with pathologies and normal videos, and passed them through residual LSTM blocks that contain bi-directional LSTMs with a skip connection. They explain that the goal is to find the alphas, which are the contribution of each frame to the final video classification problem. They also discuss exploiting high attention value frames to identify pathologies and separating them apart from the negative classes. The final loss function is a cross-entropy of the video classification and the separation of the bags between positive and negative bags. The speaker then shares how they performed an ablation study to determine where to learn attention, with the best results achieved by attending the final hidden representation and applying it to the final output. The approach was tested against other methods that use metric learning.

  • 01:20:00 In this section, the speaker discusses the use of self-supervised learning in medical imaging and the challenges it poses. They mention that one approach that has found some success is using a jigsaw problem where images are partitioned into patches and reconstructed. However, the issue with medical imaging is that there is no rotation invariant, making it difficult to find meaningful clusters. The speaker suggests that improving video frame localization through domain knowledge, such as understanding how different diseases manifest, could be a useful approach to improving pathology classification.

  • 01:25:00 In this section, the speaker discusses weakly supervised video segmentation and the need to detect where frames are localized in order to provide better explanations in medical settings. They also mention the design of self-supervised pretext tasks and contrastive learning as new and exciting approaches in this area, with new work being published every day. The speaker acknowledges the icomet project and encourages testing these approaches in real medical procedures to reduce workload. The host expresses appreciation for real practitioners solving medical problems and thanks the speaker for the informative lecture.
 

Electronic health records - Lecture 22



Electronic health records - Lecture 22 - Deep Learning in Life Sciences (Spring 2021)

The emergence of machine learning in healthcare is due to the adoption of electronic medical records in hospitals and the vast amount of patient data that can be utilized for meaningful healthcare insights. Disease progression modeling is discussed utilizing longitudinal data found in disease registries, which can pose challenges due to high-dimensional longitudinal data, missingness, and left and right censoring. The lecture explores the use of non-linear models like deep Markov models to handle these challenges and effectively model the non-linear density of longitudinal biomarkers. Additionally, the speaker discusses the use of domain knowledge to develop new neural architectures for the transition function and the importance of incorporating domain knowledge into model design for better generalization. There is also experimentation with model complexity with regard to treatment effect functions, and the speaker plans to revisit this question on a larger cohort to determine further findings.

  • 00:00:00 In this section, Rahul Krishnan, a senior researcher at Microsoft Research, explains the emergence of machine learning in healthcare due to the digitization of electronic health record data. The adoption of electronic medical record systems in hospitals led to a vast amount of patient data which could be utilized for meaningful healthcare insights. Krishnan highlights the use of disease registries, which are more focused datasets on a single disease, released by non-profit organizations for researchers to study and answer questions. Machine learning techniques such as unsupervised learning are being used to investigate the substructure of these datasets and building tools to aid clinicians. The presentation focuses on disease progression modeling and some of the work that is being done by researchers in this field.

  • 00:05:00 In this section, the speaker discusses disease progression modeling utilizing longitudinal data found in disease registries. Disease progression modeling has existed for decades and attempts to build statistical models that can capture the complex and messy data found in disease registries, including baseline covariates, longitudinal biomarkers, and treatment information. This problem is often posed as unsupervised learning, where models aim to maximize the log probability of observing a patient's longitudinal biomarker sequence conditioned on their baseline information and sequence of interventions. The speaker presents a new approach for disease progression modeling which will be published at ICML this year.

  • 00:10:00 In this section, the speaker discusses the challenges of using electronic health records to model disease progression in the context of multiple myeloma, a rare cancer of the bone marrow. Because the disease is so rare, there are often only a small number of patients to learn from, making it difficult to do good modeling and density estimation. Additionally, healthcare data presents challenges such as high-dimensional longitudinal data with nonlinear variation, missingness, and left and right censoring. The speaker suggests using non-linear models like deep Markov models to handle these challenges and effectively model the non-linear density of longitudinal biomarkers.

  • 00:15:00 In this section, the lecture describes a latent variable model for electronic health records, where the data is generated by the latent variables and observations obtained over time. The model assumes that the choice of medication prescribed by a doctor is dependent on the values of clinical biomarkers obtained from previous observations. The speaker also addresses the issue of missing data, which can be overcome by marginalizing out the missing variables during maximum likelihood estimation. However, for variational inference using an inference network, the model requires approximations to estimate the missing data, and further research is needed to understand how missingness affects the bias of the approximate posterior distribution.
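
Marginalizing missing biomarkers is simple under a diagonal-Gaussian observation model: the missing dimensions integrate out, which amounts to masking them out of the log-likelihood. The snippet below is an illustrative simplification of that step, not the model's actual inference code.

```python
# Sketch of marginalizing missing biomarkers under a diagonal Gaussian
# observation model: dropping the missing dimensions from the log-likelihood
# is equivalent to integrating them out.
import numpy as np

def masked_gaussian_loglik(x, mean, var, observed_mask):
    """x, mean, var, observed_mask: (n_biomarkers,) arrays; mask is 1 if observed."""
    ll = -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
    return float((ll * observed_mask).sum())  # missing entries contribute nothing

# Hypothetical example: the third biomarker is unobserved at this visit.
x = np.array([1.2, 0.4, 0.0])
print(masked_gaussian_loglik(x, np.zeros(3), np.ones(3), np.array([1, 1, 0])))
```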

  • 00:20:00 In this section, the speaker explains how a model can be used to predict a patient's medical history by modeling their interactions with a doctor over time. The model uses a latent representation, which changes over time, to predict the patient's medical status. The speaker highlights the challenges in modeling medical data due to non-linearity and the rarity of certain diseases. They explore the use of domain knowledge to develop a new neural architecture for the transition function. The speaker also discusses the use of a global clock and local clocks to track the duration of treatment and elapsed time until a major progression event, respectively. They explain how to approximate the mechanistic effect of drugs and incorporate this knowledge into the model.

  • 00:25:00 In this section, the speaker discusses using pharmacokinetics and pharmacodynamics to approximate the effect of drugs being prescribed for cancer treatment on a patient's tumor. They propose three new neural architectures to model the effect of multiple drugs being given to patients jointly, combining them using an attention mechanism to create a single function. The goal is to do conditional density estimation, using domain knowledge to combat overfitting. The model, called SSNPKPD, is applied to a cohort of multiple myeloma patients treated according to the current standard of care, with 16 clinical biomarkers over time, nine indications of treatments, and 16 baseline features.

  • 00:30:00 In this section, the speaker discusses the results of using different models to analyze clinical data, specifically focusing on the use of deep learning and state-space models. They compare the effectiveness of the different models in generalizing to new data, and find that the use of SSNPKPD consistently results in better performance across linear and non-linear baselines. They also conduct an ablation analysis to identify which biomarkers contribute the most to the gains seen in the models, and find that the use of local and global clocks is helpful in modeling the dynamics of the data. Additionally, they use the latent space of the trained model to further explore and understand the behavior of the data over time.

  • 00:35:00 In this section of the lecture, the speaker discusses the results of using the SSNPKPD model for forecasting a patient's future clinical biomarkers based on their baseline biomarkers. The model shows a greater fit to the data compared to a linear baseline, indicating that the latent representations captured by SSNPKPD retain relevant patient history for predicting future clinical biomarkers. The speaker summarizes the main takeaway from the talk, which is the importance of incorporating domain knowledge into model design for better generalization, and highlights the opportunities for future research in combining different data modalities in healthcare. The speaker also notes the ongoing validation of the results in a larger cohort and the possibility of incorporating the model into clinical decision support tools and model-based reinforcement learning frameworks.

  • 00:40:00 In this section, the speaker discusses their experimentation with model complexity in regards to treatment effect functions. They tried variations of the model by creating copies of the treatment effect functions, ranging from three to twelve, and found that there was a point where the additional complexity did not significantly improve performance and even decreased it. However, when they removed some of the treatment effect functions, they found some drop in performance but still outperformed the linear model. The speaker plans to revisit this question of generalization on a larger cohort with the VA to determine the extent of these findings.
 

Deep Learning and Neuroscience - Lecture 23



Deep Learning and Neuroscience - Lecture 23 - Deep Learning in Life Sciences (Spring 2021)

The lecture discusses the interplay between deep learning and neuroscience, specifically in the area of visual science. The goal is to reverse engineer human visual intelligence, which refers to the behavioral capabilities that humans exhibit in response to photons striking their eyes. The speaker emphasizes explaining these capabilities in the language of mechanisms, such as networks of simulated neurons, to enable predictive built systems that can benefit both brain sciences and artificial intelligence. The lecture explores how deep learning models are hypotheses for how the brain executes sensory system processes and the potential applications beyond just mimicking the brain's evolution. Furthermore, the lecture shows practical examples of how neural networks can manipulate memories and change the meaning of something.

This video discusses the potential of deep learning in understanding the cognitive functions of the brain and leveraging this understanding for engineering purposes. The speaker highlights the relevance of recurrent neural networks with their memory and internal dynamics capabilities in this area. The lecture explores the ability of neural systems to learn through imitation and how this can be used to learn representations, computations, and manipulations of working memory. The video also covers the difficulty in finding evidence of feedback learning as a learning condition and the potential of error-correcting mechanisms to tune the system. The lecture concludes by reflecting on the diversity of topics covered in the course and how deep learning can aid in interpreting cognitive systems in the future.

  • 00:00:00 In this section, the speaker discusses the interplay between deep learning and neuroscience, specifically in the area of visual science. He explains how deep learning models can be viewed as scientific hypotheses for how aspects of brain function may work and how neuroscientists and cognitive scientists evaluate the quality of those hypotheses with respect to the data. DiCarlo's talk focuses on the goal of reverse engineering human visual intelligence, which refers to the behavioral capabilities that humans exhibit in response to photons striking their eyes. He emphasizes the importance of explaining these capabilities in the language of mechanisms, such as networks of simulated neurons, to enable predictive built systems that can benefit both brain sciences and artificial intelligence.

  • 00:05:00 In this section, the lecturer discusses visual intelligence and how the brain estimates what is out there in a scene, such as identifying cars or people; however, predicting what will happen next and other physics-driven problems are still a challenge for scientists to understand. Despite this, scientists have made significant progress in modeling the fundamental visuals that we process in each 200-millisecond glimpse of a scene, which is also known as core object recognition. The lecturer provides examples of tests that measure our ability to recognize objects and compare them to other species, such as computer vision systems and non-human primates like rhesus monkeys.

  • 00:10:00 In this section, the speaker discusses the ability of humans and primates to distinguish between objects. He notes that humans and primates perform similarly on visual recognition tasks, with humans only performing slightly better. Additionally, the speaker discusses the deep learning systems and how they compare to the visual recognition abilities of humans and primates. The speaker then switches to discussing the areas of the rhesus monkey brain involved in visual recognition tasks and highlights the inferotemporal cortex as the highest-level area. Finally, the speaker notes the typical time scales for neural activity patterns to emerge in the inferotemporal cortex and how they match the time needed for overt behavioral responses.

  • 00:15:00 In this section of the video lecture, the speaker discusses how researchers study the response of individual neurons in the visual cortex of animals like monkeys to images using invasive recording electrodes. By measuring patterns of electrical activity from neurons in response to different images, researchers can quantify the response using mean spike rates. These patterns of activity can be clumped together by similarities in their selectivity, and special areas of clustering for certain types of objects, like faces, have been identified in the visual cortex. The use of chronic recording arrays allows researchers to record from the same neural sites for weeks or months and measure responses to thousands of images.

  • 00:20:00 In this section, the speaker explains an experiment in which neural data was recorded while an animal was fixating or performing a task or observing images. By training linear decoders on small samples of data, patterns emerged that were indistinguishable from those seen in humans and monkeys. This allowed for the development of a powerful set of feature spaces that could be used in brain-machine interface applications to visualize certain percepts. The speaker then discusses the non-linear transformations that occur between the neural activity and the image, suggesting that this area is where deep learning and vision science come together.
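
A toy version of the linear-decoding analysis described above, with synthetic firing rates standing in for recorded neural responses: a logistic-regression decoder is trained to read out object identity from population activity and evaluated with cross-validation.

```python
# Sketch of training a linear decoder on trial-by-trial spike rates to read out
# object identity. Data are synthetic placeholders, not recorded IT responses.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_neurons, n_objects = 600, 100, 8
labels = rng.integers(0, n_objects, n_trials)
# Each object drives a different mean firing pattern plus trial-to-trial noise.
object_tuning = rng.normal(0, 1, (n_objects, n_neurons))
rates = object_tuning[labels] + rng.normal(0, 2, (n_trials, n_neurons))

decoder = LogisticRegression(max_iter=1000)
print(cross_val_score(decoder, rates, labels, cv=5).mean())  # decoding accuracy
```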

  • 00:25:00 In this section, the speaker discusses how deep convolutional networks were initially built based on principles known in neuroscience, such as the concept of edge detection, filtering, output nonlinearities, and gain control. However, as these models were tested against neural data in visual areas of the brain, they fell short and were unable to predict the response patterns of individual neurons in V4. While these models were hypothesis builds for neuroscientists, they were inadequate in explaining how the visual system works. Despite the failure of these early models, they have served as inspiration for ongoing work in separating the learned filters in deep networks from those observed in V1.

  • 00:30:00 In this section, the speaker discusses how the collaboration between neuroscience and deep learning has allowed for the optimization of unknown parameters in artificial neural networks, resulting in models that closely mimic the neural response patterns of the primate brain. The speaker notes that the breakthrough came in implementing a loop that allowed engineers to optimize the micro parameters of the filters in deep convolutional neural networks. By doing this, the models that were produced were viewed as new hypotheses about what might be going on in the visual system, allowing for comparison with biological neural networks in the brain. The speaker goes on to show examples of how these comparisons have been made, resulting in early mechanistic hypotheses about brain function. Overall, this collaboration has allowed for the development of in silico ventral stream neurons that closely mimic those found in the biological ventral stream, leading to greater insight into how the brain processes visual information.

  • 00:35:00 In this section, the speaker explains that the deep learning models they have developed are hypotheses for how the brain executes sensory system processes, specifically in the domain of visual object recognition. They note that these models are not perfect and have some discrepancies, which they aim to optimize and improve upon in the future. The speaker also discusses the broader applications of deep learning in engineering and AI, emphasizing that these models can be used as a tool to guide further scientific understanding and optimization. They conclude by stating the need for more data and models towards more accurate representations of the brain's processes.

  • 00:40:00 In this section, the speaker discusses the potential for innovation in deep learning and artificial intelligence beyond just mimicking the brain's evolution. They suggest that most of the innovation will come from the choice of architecture, and that the optimization tools will be available to allow for that optimization. Questions about recurrence may give insight into the subconscious elements of cognition, and the anatomy of the brain connects recurrent pathways to downstream areas that are more involved in cognition. The speaker also touches on skip connections, the remaining gray areas, and how current work is attempting to approach these issues.

  • 00:45:00 In this section of the video, the speaker discusses the concept of neoteny and how it affects the proportion of hard-coded functions and filters in the visual cortex across species. Moving up the visual system, there is more plasticity in the brain: monkeys have visual areas up to a certain level, while humans have additional brain tissue beyond that, allowing for more flexibility. The speaker believes there is plenty of room for flexibility in the brain; much of it is part of our shared primate system, but part of it goes beyond that, and that is okay. The next speaker then discusses their work on thinking about brains as recurrent neural networks and how studying the intersection between artificial and real neural systems can help us understand how both work.

  • 00:50:00 In this section, the focus is on how efficient and sparse coding can be used to learn an efficient representational basis in artificial and real neural systems. Recurrent neural networks learn to store and modify internal representations and memories, allowing them to separate overlapping signals in a way similar to the cocktail party effect. Real neural systems are likewise excellent at storing and manipulating representations, as seen in working memory. By studying brain-like behaviors in recurrent networks, the goal is to find principles that expand the capabilities of artificial recurrent networks and help to explain how the real ones work.

  • 00:55:00 In this section of the lecture, the position of a rat is decoded from neurons called place cells, which track the rat's movement as it moves around in space. The rat can also manipulate its neural representation to plan future trajectories before it even moves. The lecture then explores how neural networks can manipulate memories, such as the songbird's ability to learn to sing by imitating adults. The lecture discusses how neural networks can learn complex processes of manipulating information by observing examples, and introduces the concept of a chaotic attractor as a memory model, and a simple non-linear dynamical system called a reservoir as a neural network model. The control parameter of the reservoir is used to modify the network's representation of whatever memory it has learned, and the lecture provides practical examples of how this control can change the meaning of something.
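
A minimal sketch of place-cell decoding in the spirit of the description above; the Gaussian tuning curves and the population-vector decoder are illustrative assumptions, not the specific method used in the lecture.

```python
# Sketch of position decoding from simulated place cells (hypothetical tuning
# curves; not the specific decoder used in the lecture).
import numpy as np

rng = np.random.default_rng(1)
n_cells, track_len = 50, 100.0
centers = np.linspace(0.0, track_len, n_cells)   # preferred locations (cm)
width = 5.0                                       # tuning-curve width (cm)

def firing_rates(pos):
    """Gaussian place-field responses plus Poisson spiking noise."""
    rates = np.exp(-0.5 * ((pos - centers) / width) ** 2) * 20.0
    return rng.poisson(rates)

def decode(spikes):
    """Population-vector estimate: activity-weighted average of field centers."""
    return float(np.sum(spikes * centers) / (np.sum(spikes) + 1e-9))

true_pos = 37.0
est = decode(firing_rates(true_pos))
print(f"true position {true_pos:.1f} cm, decoded {est:.1f} cm")
```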

  • 01:00:00 In this section, the speaker discusses how context modulatory ability impacts learning and capacity for the neural network. They explain that biasing the network with context variables means that more data is needed for training to learn common parameters. The speaker also talks about using the reservoir computing method to store memories in neural networks and how simple schemes of learning to imitate observed inputs are enough to store memories. They then discuss modifying memories inside neural networks by looking at translation of attractors in the x1 direction and changing the value of the context parameter c for each translation.
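
The following is a minimal echo-state-network sketch of the reservoir idea described here: a readout is trained to imitate the observed input, a context value c is fed in as an extra input, and the trained network then sustains the signal by feeding its own output back in. All sizes and constants are arbitrary choices for illustration.

```python
# Minimal echo-state-network sketch: store a signal by learning to imitate it,
# with a context parameter c fed in as an extra input (illustrative only).
import numpy as np

rng = np.random.default_rng(2)
N, T = 300, 2000
W = rng.normal(0, 1.0 / np.sqrt(N), (N, N)) * 0.9     # recurrent weights
w_in = rng.normal(0, 1.0, N)                           # input weights
w_c = rng.normal(0, 1.0, N)                            # context weights

def run_reservoir(u, c):
    x, states = np.zeros(N), []
    for t in range(len(u)):
        x = np.tanh(W @ x + w_in * u[t] + w_c * c)
        states.append(x.copy())
    return np.array(states)

# Teacher signal: a sine wave; the context value c can later shift its "meaning".
t = np.arange(T)
u = np.sin(2 * np.pi * t / 50.0)
X = run_reservoir(u, c=0.5)

# Ridge-regression readout trained to predict the next input (imitation).
reg = 1e-4
w_out = np.linalg.solve(X[:-1].T @ X[:-1] + reg * np.eye(N), X[:-1].T @ u[1:])

# Autonomous mode: feed the readout back in as the input to sustain the memory.
x, y = X[-1], u[-1]
for _ in range(200):
    x = np.tanh(W @ x + w_in * y + w_c * 0.5)
    y = x @ w_out
print("sustained output sample:", round(float(y), 3))
```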

  • 01:05:00 In this section, the speaker discusses the ability of reservoirs to learn to interpolate and extrapolate transformation operations on their internal representation of attractor manifolds. The team provided four training examples of a Lorenz attractor squeezed in the x1 direction and performed training with feedback. The reservoirs were found to interpolate and extrapolate transformation operations that can be fairly arbitrary, such as stretches and other variations. The team also found that reservoirs can predict the global bifurcation structure of the Lorenz attractor and the bifurcation diagrams of several other dynamical normal forms, such as the saddle-node and supercritical pitchfork bifurcations. The neural networks can even learn to predict non-dynamical kinematic trajectories, as in the example of a modified Jansen linkage.

  • 01:10:00 In this section of the lecture, the speaker discusses a method called invertible generalized synchronization, which is a way to formalize the idea of mapping stimuli to neurodynamics in a neural system. The speaker explains that to form a representation, neurons must form a distributed representation instead of individually encoding specific parts of the input stimuli. They must also be able to drive themselves with their own representation, which is the key mechanism behind storing inputs as memories. Finally, the speaker demonstrates that recurrent neural networks can sustain chaotic memories, allowing them to translate and transform memories.

  • 01:15:00 In this section, the speaker discusses the ability of neural systems to learn by imitating observed examples and how this can be used to learn representations, computations, and manipulations of working memory. The conversation then shifts to the question of feedback learning and how it applies to the models presented. While there is evidence of linear separability and reconstructability of stimuli in certain parts of the visual cortex, the speaker notes the difficulty of finding evidence for feedback learning, as it is a fairly extreme learning condition. Error-correcting mechanisms are suggested as a way to tune the system, but the discussion also covers the idea of a fixed set of parameters in which outcomes are judged against expectations about the outside world, with salient memories forming when expectation deviates greatly.

  • 01:20:00 In this section, the lecturer emphasizes the potential of deep learning in understanding the cognitive functions of the brain and engineering them. Recurrent neural networks, with their ability for memory and internal dynamics, are especially relevant in this area. The lecturer encourages thinking of these systems as living and breathing entities, rather than just function approximators. The core of these cognitive systems lies in the RNN, although they can be augmented with convolutional neural networks for inputs and outputs. The hippocampus and the connections it makes to different aspects of the nervous system are cited as a fascinating example of how memories are encoded across an interacting system of co-firing neurons. The lecture concludes by reflecting on the diversity of topics covered in the course and how deep learning can aid in interpreting cognitive systems in the future.
Deep Learning and Neuroscience - Lecture 23 - Deep Learning in Life Sciences (Spring 2021)
  • 2021.05.19
  • www.youtube.com
MIT 6.874/6.802/20.390/20.490/HST.506 Spring 2021. Prof. Manolis Kellis. Deep Learning in the Life Sciences / Computational Systems Biology. Playlist: https://you...
 

MIT 6.S192 - Lecture 1: Computational Aesthetics, Design, Art | Learning by Generating

This lecture covers a variety of topics related to computational aesthetics, design, and art. The role of AI in democratizing access to art creation, design automation, and pushing the boundaries of art is discussed, as well as the challenges in quantifying aesthetics and achieving visual balance in design using high level and low-level representations. The lecturer also highlights the potential of computational design to uncover patterns and convey messages effectively, with examples involving color semantics and magazine cover design. Crowdsourcing experiments are used to determine color associations with various topics and the potential applications of this method in different areas are explored. Overall, the lecture introduces the role of AI in creative applications and the potential to revolutionize the way we create art, design, and other forms of creative expression.

The video discusses the use of computational aesthetics, design, and art to generate creative works using generative models, such as StyleGAN and DALL-E. The lecturer also emphasizes the importance of learning by generating and encourages viewers to break down problems and use data to come up with innovative and creative solutions. However, the speaker also addresses the limitations of generative models, such as biased data and a limited ability to generalize and think outside the box. Nonetheless, the lecturer assigns students to review the provided code and experiment with the various techniques for generating aesthetically pleasing images, while encouraging participation in a Socratic debate between Berkeley and MIT on computational aesthetics and design.

  • 00:00:00 In this section of the lecture, the speaker discusses the motivations for implementing AI in art, aesthetics, and creativity. They explain that art is a key aspect of human evolution and communication, and AI may democratize access to art creation, nurture creativity, and push the boundaries of art. With millions of photos uploaded every day and people exposed to roughly 650 ads per day, AI can help produce good designs automatically and understand what makes a design good or bad. Finally, the speaker argues that AI will play a critical role in a future where AI creates movies, plays, and more every second, leading to the question of whether we want to shape that future.

  • 00:05:00 In this section, the speaker discusses the role of AI in art, aesthetics, and creativity. He explains that convolutional neural networks (CNNs) can be biased towards textures, but this bias can be reduced by generating images in different styles and incorporating them into the training data. Furthermore, he mentions that in 2018, a painting made using a generative model was sold for nearly half a million dollars. He also addresses the question of whether aesthetics can be quantified, stating that philosophers and artists have been discussing this topic for generations. Lastly, he touches on the goals of the course, which involve learning how to apply AI algorithms to creative applications and solving interesting problems.

  • 00:10:00 In this section of the video, the instructor responds to a question about whether prior knowledge of deep learning is necessary for the course. He explains that while the course will touch upon deep learning, it is not the primary focus and that there are other resources for learning the topic. He then goes on to discuss his previous work on quantifying aesthetics, noting that measuring aesthetics is not a new concept and that there are already established models, like Birkhoff's model from the early 20th century, that can be used to quantify aesthetics in various contexts such as visual design, poetry and even interfaces.

  • 00:15:00 In this section, the speaker discusses the quantification of aesthetics and the challenges in achieving this, using visual balance as an example. Good representations are necessary, both high level and low level. High-level representations can include visual balance and rhythm while low-level representations rely upon features extracted using neural networks. Data is also necessary to quantify aesthetics, including what kind of data is used and where it comes from. The speaker explains how balance is often taught to designers by intuition, but engineers want to quantify it and determine its meaning in design.

  • 00:20:00 In this section, the speaker discusses the notion of visual rightness and balance in design, also known as harmony. He talks about the work of Arnheim, who suggested that placing design elements in specific hot spots may create visual balance. The speaker explores whether this hypothesis can be confirmed through data-driven analysis and studies the salient parts of an image using a saliency algorithm, overlaying its results on the structural net. He uses a crawler to collect over 120,000 images from a photography website to study the patterns of saliency on these images.

  • 00:25:00 In this section, a saliency algorithm was applied to the collected data set and a mixture of Gaussians was fit to the salient locations in order to find patterns in aggregated images across categories such as portraits, architecture, and fashion. The hot spots of salience that emerged show a similarity to Arnheim's theory about the center of mass and to the rule of thirds. However, the results may be influenced by the way photographers crop images, as shown in studies on the validity of the rule of thirds.
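
A small sketch of the analysis described above, assuming the salient (x, y) locations have already been extracted; synthetic points stand in for real saliency output.

```python
# Sketch: fit a mixture of Gaussians to (x, y) locations of salient pixels
# aggregated across one image category (synthetic points stand in for real
# saliency output).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Pretend the saliency maps concentrate around two "hot spots".
pts = np.vstack([
    rng.normal([0.33, 0.40], 0.05, size=(500, 2)),
    rng.normal([0.66, 0.60], 0.05, size=(500, 2)),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(pts)
print("hot-spot centers (normalized image coordinates):")
print(np.round(gmm.means_, 2))
```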

  • 00:30:00 In this section, the lecturer discusses the topic of computational aesthetics and design. They mention the availability of the AVA dataset which contains annotations for aesthetics, semantics, and photography style. The lecturer then demonstrates how deep learning algorithms can learn and predict aesthetics ratings and suggests that this can be used to enhance and tweak images. The lecture then moves onto discussing the potential of computational design and its importance in uncovering patterns in design and expressing oneself better.
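
As a hedged sketch of the kind of aesthetics predictor mentioned here, the snippet below fits a small regression head on placeholder image features; in practice the features would come from a pretrained CNN and the ratings from the AVA annotations.

```python
# Sketch of an aesthetics-rating regressor on top of image features
# (random tensors stand in for AVA images and ratings; a pretrained CNN
# backbone would normally supply the features).
import torch
import torch.nn as nn

class AestheticHead(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, feats):
        return self.mlp(feats).squeeze(-1)   # predicted mean rating

feats = torch.randn(32, 512)                 # placeholder backbone features
ratings = torch.rand(32) * 10.0              # placeholder 0-10 ratings
model = AestheticHead()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    loss = nn.functional.mse_loss(model(feats), ratings)
    opt.zero_grad(); loss.backward(); opt.step()
print("final training MSE:", float(loss))
```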

  • 00:35:00 In this section of the lecture, the speaker introduces the concept of computational design and discusses the difference between design and art. The problem in design is given, and the designer's job is to convey a message to solve that problem, while artists define the problem themselves and use artistic techniques to solve it. The principles of design, such as communication over decoration, can be challenging to convey to a machine, but various theories, metrics, and rules, including gestalt and color harmony, can be used to create and recommend content automatically. The speaker also provides an example of automated design software that can layout text and design elements on top of a given background image.

  • 00:40:00 In this section of the video, the speaker discusses how he created an automatic design system for magazine covers by choosing complementary colors, drawing on work by Itten and Matsuda as well as Kobayashi, who studied color combinations for 30 years and how colors can be associated with words such as romantic, soft, and neat. Based on this work, the speaker built a system that gives recommendations to users based on the colors they choose and creates styles for magazine covers. Additionally, the speaker explored whether data from professional designers could be used to extract patterns in color palettes for magazine covers.

  • 00:45:00 In this section of the video, the speaker discusses their project which involved collecting a data set of magazine covers from 12 different genres in order to simultaneously find the text, genre, and color combinations used on the covers. The speaker used topic modeling to extract different topics, which are a combination of words and colors, and showed how word clouds and color palettes can be used to visualize these topics. The speaker also discussed the use of crowdsourcing to determine if the results of the project were universal or not.
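
A toy sketch of the joint word-and-color topic modeling described above: each cover is represented as a bag of title words plus quantized color tokens, so each learned topic mixes both. The documents and token names are made up for illustration.

```python
# Sketch: LDA over "documents" that mix cover words with quantized color
# tokens, so each topic is a joint word/color distribution (toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

covers = [
    "romance love spring color_pink color_red",
    "tech gadgets future color_blue color_gray",
    "fashion style runway color_black color_gold",
    "wedding bride flowers color_pink color_white",
    "startup code ai color_blue color_black",
]
vec = CountVectorizer()
X = vec.fit_transform(covers)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")   # each topic mixes words and color tokens
```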

  • 00:50:00 In this section, the speaker discusses a crowdsourcing experiment that they conducted to understand whether different cultures and demographics agree on color associations with various topics. The experiment involved showing a color palette randomly chosen from a topic and then showing different word clouds and asking the subjects to match them. Over 1,000 participants from various countries participated, and the resulting correlation or relevance matrix revealed some interesting patterns. The experiment showed that, for the most part, participants agreed on the color associations with various topics, although there were some exceptions. The speaker also highlighted the potential applications of this method in designing color palettes for different types of products.

  • 00:55:00 In this section of the lecture, the speaker discusses various applications of color semantics in tasks such as color palette recommendation, image retrieval, recoloring, and even web design. She demonstrates how algorithms can be used to recommend colors and magazine covers based on specific concepts or themes, as well as to analyze and visualize patterns in web design over time. The use of convolutional neural networks is also demonstrated in identifying color palettes and website design trends from specific eras.

  • 01:00:00 In this section, the speaker discusses the use of computational design and aesthetics in predicting the year of a design. They explain that it's not just colors that the model takes into consideration, but also high-level features such as typography. The accuracy of the classification was not mentioned, but it was noted to be higher than chance. Computational design has also been used to analyze ads, create logos and icons, and design fashion color palettes.

  • 01:05:00 In this section, the speaker discusses the use of generative models in fashion, product design, and art. He shows examples of datasets that are used to understand fashion elements, such as colors and tags, and mentions colleagues who use similar datasets to recommend product design. The speaker also talks about generative models that can take an input sketch and output a product design or alter an image to look like a different fashion item. Additionally, he touches on topics related to computational art and creativity, including style transfer and content generation tools.

  • 01:10:00 In this section of the video, the professor discusses the use of computational art and artificial intelligence in generating creative works, including image and style transfer, content generation, and generative models for videos. The discussion includes several examples of recent works in these areas, including StyleGAN, DALL-E by OpenAI, and generative models for video pose modification. Despite these advancements, the question remains whether machines can truly be artists or whether creativity and art belong only to humans.

  • 01:15:00 In this section, the speaker discusses their excitement in the direction of learning by generating and shares some results. They explain that learning by generating is interesting because it is a way to train AI to develop algorithms based on how humans learn to solve problems. The speaker also addresses a question about quantifying aesthetics and mentions that one way to bridge the gap between high-level terms in human language and computational terms is to use data and models, incorporating cultural concepts and even asking people for their opinions through crowdsourcing.

  • 01:20:00 In this section of the video, the speaker discusses the importance of using data in machine learning to avoid bias and come up with interesting results. He encourages listeners to think about how to design algorithms or representations that can lead to innovative and creative solutions. The speaker believes that creativity and innovation are essential components of artificial intelligence and cites examples of how they have been used in the design of objects and concepts. He emphasizes that learning by generating is an effective way to develop problem-solving skills and encourages listeners to break down bigger problems into smaller subsets and solve them one at a time.

  • 01:25:00 In this section of the video, the speaker discusses the concept of generalization and thinking outside the box in creativity and AI. The speaker presents the question of whether or not generative models are capable of generalization and out-of-distribution thinking. To explore this topic, the speaker introduces the concept of steerability of generative adversarial networks (GANs) and demonstrates the ability to manipulate images by finding a walk in the latent space of the generator. They show that current GAN models can exhibit transformations like zooming in and out, shifting and rotating. The speaker explains the process of finding a latent vector to manipulate the image and uses this to show the potential of generative models in creativity and innovation.
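
The sketch below illustrates the steerability idea under simplifying assumptions: a latent direction w is optimized so that walking the latent code reproduces a known image-space edit. A tiny random linear map stands in for the real generator, and a circular shift stands in for transformations like zoom or shift.

```python
# Toy sketch of GAN "steerability": learn a latent direction w so that
# G(z + a*w) approximates a pixel-space shift of G(z). A tiny random linear
# generator stands in for a real pretrained GAN.
import torch

torch.manual_seed(0)
latent_dim, img_dim = 16, 64
G = torch.nn.Linear(latent_dim, img_dim)      # placeholder "generator"
for p in G.parameters():
    p.requires_grad_(False)

def edit(img, alpha):
    """Target transformation in image space: a circular shift by alpha pixels."""
    return torch.roll(img, shifts=int(alpha), dims=-1)

w = torch.zeros(latent_dim, requires_grad=True)
opt = torch.optim.Adam([w], lr=1e-2)
for step in range(500):
    z = torch.randn(32, latent_dim)
    alpha = float(torch.randint(1, 4, (1,)))
    walked = G(z + alpha * w)                 # walk in latent space
    target = edit(G(z), alpha)                # same edit applied in image space
    loss = torch.nn.functional.mse_loss(walked, target)
    opt.zero_grad(); loss.backward(); opt.step()
print("final walk loss:", float(loss))
```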

  • 01:30:00 In this section of the video, the speaker discusses the limitations of generative models such as BigGAN and why they have them. He explains that biases can be introduced to the model, which are also present in the semantics of the classes. This means that a model can generalize, but not as well as a human can. The speaker goes on to show that the model can go out of the distribution of the dataset and transform the way images look to some extent, but only if the underlying dataset is diverse. The paper suggests that one way to overcome the limitations of biased data is to augment it, such as by zooming in on or rotating images.

  • 01:35:00 In this section of the video, the lecturer discusses the use of latent space to generate aesthetically pleasing images through transformations. The transformations can be achieved by walking or steering in latent space to change image color, zooming, rotation, camera-like changes, and more. The lecturer also discusses the use of a neural network to detect image aesthetics, providing feedback on whether a walking direction or transformation generates more aesthetically pleasing images. The lecture encourages students to participate in an upcoming Socratic debate between Berkeley and MIT on computational aesthetics and design. Additionally, the lecturer assigns students to review the provided code and experiment with the various techniques for generating aesthetically pleasing images.

  • 01:40:00 In this section of the video, the speaker discusses the repository of their work and encourages viewers to use PyTorch rather than TensorFlow to run the notebooks provided. They also explain the Colab system used to visualize the results of the code, and emphasize the importance of generating images and reporting the results. The speaker also reminds viewers that they can email them with any questions and thanks them for participating in the course.
MIT 6.S192 - Lecture 1: Computational Aesthetics, Design, Art | Learning by Generating
  • 2021.01.21
  • www.youtube.com
First lecture of MIT 6.S192: Deep Learning for Art, Aesthetics, and Creativity, by Ali Jahanian. In this lecture, I start introducing the course and discuss C...
 

MIT 6.S192 - Lecture 2: A Socratic debate, Alyosha Efros and Phillip Isola

In this video, Alyosha Efros and Phillip Isola discuss the idea of using images to create shared experiences. They argue that this can help to bring back memories and create a sense of nostalgia.

This video is a debate between two professors at MIT about the role of data in artificial intelligence. Efros argues that data is essential to AI, while Isola counters that data can be a hindrance to AI development.

  • 00:00:00 In this lecture, Alyosha Efros and Phillip Isola discuss the view of generative models as a new type of data. Efros argues that the current era of generative models is just like data, but better. Isola describes how generative models work, and how they can be used to create interesting content.

  • 00:05:00 In this lecture, Alyosha Efros and Phillip Isola discuss the power of generative models. Generative models allow us to create data points that are decorated with extra functionality, such as a latent variable that can be used to modify the image. This opens up a lot of possibilities for creativity and scientific visualization.

  • 00:10:00 The video discusses the idea of manipulating images through latent space. They explain how this can be done by searching for a direction that maps to a meaningful transformation in image space. They give the example of making an image more memorable by zooming in on it. Finally, they discuss how this technique can be used to visualize the concept of what it means for something to be memorable.

  • 00:15:00 This video discusses the concept of generative models, which are a type of data that can be manipulated to create new images. The video showcases the ability of these models to compositionally create new images by adding different parts of different images together. The video also discusses the limitations of generative models, such as their bias towards certain objects or their inability to accurately depict certain scenes.

  • 00:20:00 Alyosha Efros and Phillip Isola discuss the concept of data plus plus, which is a way of thinking about data that includes both the data itself and the methods used to generate it. Efros argues that this perspective is useful because it allows for more meaningful interpolation between data points. Isola questions how one chooses the path between two data points, and Efros explains that the model chooses the shortest path, which often looks the most natural.

  • 00:25:00 In this video, Phillip Isola and Alyosha Efros debate the merits of the DALL-E algorithm. Efros argues that the algorithm is impressive because it is able to understand language. Isola counters that the algorithm is not actually understanding language, but is instead modeling words and n-grams.

  • 00:30:00 The speaker argues that GANs are not really creative because they are only trained on highly curated data. He suggests that bi-directional mapping is the best way to go if you can afford it.

  • 00:35:00 In this lecture, Alyosha Efros and Phillip Isola debate the merits of data-driven vs. model-based approaches to artificial intelligence research. Efros argues that increasingly, models will become the primary interface to data, and that data scientists will need to learn how to work with models instead of data sets. Isola agrees, and adds that the data sets used to train these models are becoming increasingly large and complex.

  • 00:40:00 This video is a lecture by Alyosha Efros and Phillip Isola on the topic of context in art. Efros talks about how a photograph from an artwork by Michael Galinsky called Malls Across America made a deep impression on him, and how the context in which the photograph is viewed can affect its meaning. Isola talks about how a photograph of a girl looking at the sea can bring back memories and sensations for those who were alive during the time period in which it was taken.

  • 00:45:00 This video is a discussion between two professors about the concept of nostalgia, and how it can be used to appreciate art. They use the example of a photo of two friends in front of a door, which is only meaningful to the two of them because of their shared memories. They argue that this type of nostalgia can be found in many different forms, and that it can be a pleasurable experience for those who are able to recall memories.

  • 00:50:00 In this video, Alyosha Efros and Phillip Isola discuss the idea of using images to evoke shared experiences among people from a given city. They argue that this can help to bring back memories and create a sense of nostalgia.

  • 00:55:00 The painting "Olympia" by Édouard Manet caused a huge scandal when it was first exhibited in 1865 due to its nudity and flattened skin tone. Some believe that the hand placement in the painting was what drove people insane.

  • 01:00:00 This lecture is about how art can be interpreted in different ways, depending on the context in which it is viewed. The example used is the painting "Reclining Venus" by Amedeo Modigliani, which caused outrage when it was first displayed because it was seen as a parody of a famous painting of a nude woman. However, when viewed in the context of other paintings of nude women, it can be seen as a valid work of art.

  • 01:05:00 In the YouTube video "MIT 6.S192 - Lecture 2: A Socratic debate, Alyosha Efros and Phillip Isola", the two discuss the meaning behind paintings by Russian painter Zlotnikov and American painter Hurst. Efros argues that the direction of the paintings is determined by the feelings of freedom and crowdedness that they evoke. Isola counters that the direction is determined by the black square painting by Malevich, which he sees as the ultimate resolution of a particular direction.

  • 01:10:00 Phillip Isola and Alyosha Efros debate the meaning of art, specifically a black square painting by Malevich. Isola argues that the painting is a signifier for nothing, while Efros argues that it is a natural progression for Malevich.

  • 01:15:00 The point of this video is that we may be overestimating the complexity of machines, and that what looks like magic to us may just be the result of simple processes. Braitenberg's book "Vehicles" is used as an example of how complex behaviors can emerge from simple interactions.

  • 01:20:00 In this lecture, Efros and Isola debate the nature of creativity and novelty. Efros argues that both are the result of incremental changes and that the creative process is usually very smooth. Isola counters that novelty is often the result of randomness and luck.

  • 01:25:00 This is a debate between two people about the role of context in art and science. One person argues that context is necessary for art to be meaningful, while the other argues that context is not necessary and that art can be novel without it.

  • 01:30:00 In this lecture, Efros and Isola debate the role of luck in scientific success. Efros argues that luck plays a significant role, while Isola argues that there are ways to plan for greatness.

  • 01:35:00 In this lecture, Alyosha Efros and Phillip Isola debate the role of luck in creativity, with Efros arguing that there must be more to it than just luck. Isola argues that data plus plus (the combination of data and operations) is the key to creativity, and that once you have the right data, the possibilities are endless.

  • 01:40:00 This YouTube video is a debate between Alyosha Efros and Phillip Isola about the differences between working with data and models, and whether or not data will become obsolete. Efros argues that data is already becoming less important as models become more advanced, and that eventually models will surpass humans in intelligence. Isola argues that data is still the gold standard and that models can never do more than the data they are based on.

  • 01:45:00 In this debate, MIT professors Alyosha Efros and Phillip Isola discuss the relationship between art and AI. Efros argues that computation is the best way to think about the relationship, and that there is a strong connection between art and evolution. Isola agrees that there is a connection between the two, but argues that current models are not capable of extrapolating new information from data, and that this is the key to truly creative AI.

  • 01:50:00 It was great to chat with Phillip and Alyosha about art and computation. They both think that art is at the forefront of a new paradigm of thinking and that computation can be used to help explore new ideas.

  • 01:55:00 In this lecture, Alyosha Efros and Phillip Isola engage in a Socratic debate about the role of data in artificial intelligence. Efros argues that data is essential to AI, while Isola counters that data can be a hindrance to AI development.
 

MIT 6.S192 - Lecture 3: "Efficient GANs" by Jun-Yan Zhu

The lecture covers the challenges of training GAN models, including the need for high computation, large amounts of data, and complicated algorithms that require extensive training sessions. However, the lecturer introduces new methods that make GANs learn faster and train on less data, such as compressing teacher models using a general-purpose GAN compression framework and applying differentiable data augmentation. The lecture also demonstrates interactive image editing with GANs and emphasizes the importance of large and diverse datasets for successful GAN training. The code for running the models is available on GitHub with step-by-step instructions for running them on different types of data. The lecture concludes by discussing the importance of model compression for practical purposes.

  • 00:00:00 In this section, the speaker introduces the concept of efficient GANs and how expensive GANs are. While GANs have been used for various content creation and creativity tasks, developing new algorithms or performing real-time performance requires high-end GPUs. For the development of the GauGAN project, the researcher required hundreds of high-end GPUs for training, and even after a year's development, the team had to buy an expensive laptop to carry the project around. The cost of training GANs and developing algorithms is expensive, and currently, it is challenging for universities to compete with big companies such as NVIDIA or DeepMind.

  • 00:05:00 In this section, the speaker explains the three main obstacles preventing more users from utilizing GANs effectively, namely the need for high computation, large amounts of data, and a complicated algorithm that requires a lot of training sessions. He explains that GANs are computationally expensive due to the high-quality images and pre-processing steps required to train the model. Additionally, the large datasets and the need for labels further make the training of GANs more challenging. However, he introduces new methods that can make GANs learn faster and train on less data, which can help content creators and artists with limited access to resources to train and test their own models.

  • 00:10:00 In this section of the lecture, Jun-Yan Zhu introduces a method of compressing teacher models using the general-purpose framework of GANs compression. The goal is to find a student model with fewer filters that can produce the same kind of output as the teacher model. The method involves creating a loss function to ensure that the distribution of the student's zebra output looks very similar to the teacher's output, the student's intermediate feature representation is very similar to the teacher's, and the student's output looks like a zebra according to an adversarial loss. The process also involves a search for the optimal number of channels, which can produce the same results while reducing the model's size and training time. The process of sharing weights across different configurations makes it possible to train multiple configurations without training them individually, thereby reducing the training time.
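
A rough sketch of the combined objective described above (match the teacher's output, match intermediate features through a channel adapter, plus an adversarial term); the toy modules and weighting are placeholders, not the actual GAN Compression implementation.

```python
# Sketch of the compression objective: a small student generator mimics a
# large teacher (output + intermediate features) while an adversarial term
# keeps outputs realistic. Toy modules, illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGen(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.feat = nn.Conv2d(3, ch, 3, padding=1)
        self.out = nn.Conv2d(ch, 3, 3, padding=1)
    def forward(self, x):
        h = F.relu(self.feat(x))
        return self.out(h), h

teacher, student = TinyGen(ch=64), TinyGen(ch=16)
adapt = nn.Conv2d(16, 64, 1)                      # match feature channel counts
D = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                  nn.Linear(8, 1))

x = torch.randn(4, 3, 32, 32)                     # e.g. horse images
with torch.no_grad():
    t_out, t_feat = teacher(x)
s_out, s_feat = student(x)

recon = F.l1_loss(s_out, t_out)                   # match the teacher's output
distill = F.mse_loss(adapt(s_feat), t_feat)       # match intermediate features
adv = F.softplus(-D(s_out)).mean()                # non-saturating adversarial term
loss = recon + distill + 0.1 * adv
loss.backward()
print(float(recon), float(distill), float(adv))
```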

  • 00:15:00 In this section, Jun-Yan Zhu discusses the process of training and evaluating GAN models through different configurations, along with the use of various loss functions to mimic teacher models and share weights across different configurations. Results were presented for models of different sizes and computational costs, along with the idea of compressing models to achieve real-time performance on mobile devices. The application of this idea to StyleGAN2 was also introduced, showing how low-cost models can be used for image editing before applying the final output from the original model.

  • 00:20:00 In this section, the speaker demonstrates a demo of interactive image editing with GANs. The goal of the demo is to enable users to edit an image in various attributes such as adding a smile or changing the hair color and to get immediate feedback based on their changes. The system employs a smaller model that produces consistent output with the big model to ensure the preview remains informative. Once the edits are finalized, the original model can be run to generate a high-quality output. The interactive editing is faster and provides high-quality results compared to existing non-deep learning content creation software.

  • 00:25:00 In this section of the lecture, Professor Jun-Yan Zhu discusses the challenges of training GAN models, citing the need for large amounts of high-quality data for effective performance. While it is possible to use rendering software or other tools to speed up the process and generate previews, training custom models requires collecting significant amounts of annotated data. Zhu gives the example of training a StyleGAN2 model on a dataset of only 50 or 100 faces, which resulted in distorted images. The lecture highlights the importance of large and diverse datasets for successful GAN training.

  • 00:30:00 In this section, the speaker discusses the importance of having a sufficient amount of training data in GAN models. They demonstrate that when training on smaller data sets, the discriminator can easily overfit and classify all the images correctly but will have trouble generalizing to real images. This leads to the generator producing many garbage images or collapsing. The speaker emphasizes that if one were to use GANs for their own purposes or on small data sets, overfitting becomes much more severe, and obtaining enough data is crucial for creating efficient GANs.

  • 00:35:00 In this section, the professor discusses the idea of data augmentation to combat overfitting in machine learning, which involves creating multiple versions of a single image to increase the data set without collecting new samples. However, applying this method to GAN training is more complicated: if the augmentations are applied only to the real images, the generator learns to reproduce those augmentation artifacts in its outputs. To avoid this issue, the professor suggests augmenting both real and fake images, and doing so only for discriminator training, to balance out the differences in augmented data between the generator and discriminator.

  • 00:40:00 In this section, the speaker discusses the concept of differentiable augmentation as an approach to bridge the gap between the generator's and discriminator's objectives in GANs. The main idea is to augment both fake and real images in a differentiable way so that the gradients from the discriminator can be back-propagated to the generator. The speaker demonstrates through examples that differentiable augmentation allows for better results with minimal training data, thus reducing the need for large scale datasets. The speaker concludes that differentiable augmentation is a crucial technique to remember when training GANs.
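
A simplified sketch of the differentiable-augmentation idea: the same random, differentiable transforms are applied to both real and generated images wherever the discriminator sees them, so gradients can flow back through the augmentation to the generator. The specific transforms and toy networks here are illustrative only.

```python
# Simplified differentiable augmentation: the same differentiable transforms
# are applied to real and fake images in both the D and G losses.
import torch
import torch.nn.functional as F

def diff_augment(x):
    # Brightness jitter (differentiable with respect to x).
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    # Random shift via padding + crop (indexing is still differentiable in x).
    pad = F.pad(x, (2, 2, 2, 2))
    dx, dy = [int(v) for v in torch.randint(0, 5, (2,))]
    return pad[:, :, dy:dy + x.size(2), dx:dx + x.size(3)]

def d_loss(D, G, real, z):
    fake = G(z).detach()
    return (F.softplus(-D(diff_augment(real))).mean()
            + F.softplus(D(diff_augment(fake))).mean())

def g_loss(D, G, z):
    # Gradients flow through the augmentation back into G.
    return F.softplus(-D(diff_augment(G(z)))).mean()

# Toy modules just to exercise the losses.
G = torch.nn.Sequential(torch.nn.Linear(8, 3 * 16 * 16),
                        torch.nn.Unflatten(1, (3, 16, 16)))
D = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 16 * 16, 1))
real, z = torch.randn(4, 3, 16, 16), torch.randn(4, 8)
print(float(d_loss(D, G, real, z)), float(g_loss(D, G, z)))
```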

  • 00:45:00 In this section, the lecturer explains that all the code for running the model is available on GitHub with step-by-step instructions for running it on different types of data, even on personal facial images. They also discuss the specific tools available for designers and artists, and the lecturer mentions that David Bau will talk about online tools to visualize and monitor internal units. The model compression process is also discussed, with the goal of developing the ability to compress a model once and deploy it to multiple devices, which is important for practical purposes, as it saves developers time while reducing the time needed for users to access the model.
 

MIT 6.S192 - Lecture 5: "Painting with the Neurons of a GAN" by David Bau

David Bau discusses the evolution of machine learning and the potential for creating self-programming systems. He introduces generative adversarial networks (GANs) and explains how they can be trained to generate realistic images. Bau discusses his process for identifying correlations between specific neurons in a Progressive GAN and certain semantic features in generated images. He demonstrates how he can add various elements to an image, such as doors, grass, and trees, with the help of a GAN. Additionally, he discusses the challenge of adding new elements to a GAN and the ethical concerns surrounding the realistic renderings of the world.

  • 00:00:00 In this section, David Bau discusses the evolution of machine learning, from its roots in statistical analysis to its potential for creating self-programming systems. As an academic researcher, he believes that now is an interesting time to ask questions about the direction of the field and the implications of machine learning models. The main problem he will be addressing in his talk is image generation, and he introduces the process of collecting a dataset of real images and training a generator network to recreate them.

  • 00:05:00 In this section, David Bau introduces generative adversarial networks (GANs) and explains how they can be trained to generate realistic images. He describes how the trick with GANs is to first train a discriminator to classify whether an image is real or fake, and then connect this discriminator to the generator so the generator learns to create images that fool it. However, he notes that the generator can initially trick the discriminator with simple patterns that don't resemble realistic images, so the process is iterated, going back and forth between generator and discriminator to produce more and more realistic images. Finally, he shows examples of images generated by GANs, which are often difficult to distinguish from real images.
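
A minimal sketch of that alternating loop on toy one-dimensional data, just to make the back-and-forth between discriminator and generator concrete (network sizes and data are arbitrary):

```python
# Minimal GAN training loop on toy 1-D data (real samples from a Gaussian),
# showing the alternating discriminator/generator updates described above.
import torch
import torch.nn as nn

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0          # "real" data distribution
    fake = G(torch.randn(64, 4))

    # 1) Train the discriminator to classify real vs. fake.
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the generator to fool the discriminator.
    g_loss = bce(D(G(torch.randn(64, 4))), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print("fake mean ~", float(G(torch.randn(1000, 4)).mean()), "(target 2.0)")
```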

  • 00:10:00 In this section, the speaker discusses some of the artifacts seen in GAN-generated images, such as watermarks, and how they originate from the training set. He goes on to explain how he found the neurons associated with the watermark impressions and how he can turn them off. By turning off the watermark neurons, the output obtained from the generator becomes free of the watermark and related artifacts, an exciting finding showing that there are switches within networks that control different features of generated imagery.
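
A small sketch of what "turning off" units can look like in code: a forward hook zeroes selected channels in an intermediate layer of a toy generator. The unit indices are arbitrary placeholders, not the watermark units identified in the lecture.

```python
# Sketch of ablating specific units in a generator layer with a forward hook
# (toy network; the unit indices are hypothetical placeholders).
import torch
import torch.nn as nn

gen = nn.Sequential(
    nn.ConvTranspose2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
)

units_to_ablate = [3, 7, 12]           # hypothetical "watermark" channels

def ablate(module, inputs, output):
    output[:, units_to_ablate] = 0.0   # zero those feature maps
    return output

handle = gen[0].register_forward_hook(ablate)
z = torch.randn(1, 16, 8, 8)
img_without_units = gen(z)
handle.remove()                        # restore normal behavior
img_with_units = gen(z)
print("mean change:", float((img_with_units - img_without_units).abs().mean()))
```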

  • 00:15:00 In this section, David Bau discusses his process for identifying correlations between specific neurons in a Progressive GAN and certain semantic features in generated images. He explains that this was achieved by testing each neuron individually to see where it was activating the most, indicating certain features it was associated with. Through this process, he was able to identify neurons that correlated with trees, building parts like windows and doors, chairs, and even domes. Bau notes that this was achieved without any supervised training or labels and shows how the network has learned to differentiate between diverse examples of these features, representing them in distinct components.

  • 00:20:00 In this section, David Bau discusses the goal of mapping out all the different neurons in a model for generating kitchens, which resulted in catalogs of different types of correlated neurons. Bau found that middle layers of the model had neurons that correlated highly with semantic objects, while later layers had more physical correlations. Bau discovered that the correlations were so striking that it led to interesting applications, including turning on and off different objects in an image generation. Bau demonstrated how turning off some tree neurons removed the trees from the scene and the generator filled in what was behind the trees. Conversely, turning neurons on caused a door to appear in the scene, where the generator filled in the appropriate size, orientation, and style of the door.

  • 00:25:00 In this section of the video, David Bau shows how he can add various elements to an image, such as doors, grass, and trees, with the help of a GAN. By only activating specific neurons that correlate with a particular object or element, he can manipulate the image's semantics. He also discusses the limitations of GANs, such as only being able to edit randomly generated images, which can be solved with an inversion problem that requires learning how to run the model backward.

  • 00:30:00 In this section, David Bau discusses the limitations of using a Generative Adversarial Network (GAN) to generate images, as it may reveal things that the network cannot do. However, it is possible to fine-tune the network weights to generate a very nearby network that hits a target image exactly, while keeping the network relatively unchanged, making editing still possible. Bau demonstrates using this technique for modifying real photos by inverting the photo through the network, obtaining a starting image, fine-tuning the network to output the target image, and then editing the image. The process allows for adding or removing objects, such as domes and doors, that match the architectural style of the image.
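
A hedged sketch of the inversion step described here: a latent code is optimized so a toy generator reproduces a target image, followed by a light fine-tuning of the weights to close the remaining gap. This is only the general recipe, not the exact procedure from the talk.

```python
# Sketch of GAN inversion by latent optimization, then light weight
# fine-tuning to hit the target more exactly (toy generator only).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
G = torch.nn.Sequential(
    torch.nn.Linear(32, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 3 * 16 * 16), torch.nn.Unflatten(1, (3, 16, 16)),
)
target = torch.rand(1, 3, 16, 16)          # stand-in for the real photo

# Stage 1: optimize the latent code so G(z) approximates the target.
z = torch.zeros(1, 32, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
for step in range(300):
    loss = F.mse_loss(G(z), target)
    opt.zero_grad(); loss.backward(); opt.step()
print("reconstruction error after inversion:", float(loss))

# Stage 2 (optional): lightly fine-tune G itself to close the remaining gap.
opt_w = torch.optim.Adam(G.parameters(), lr=1e-4)
for step in range(100):
    loss = F.mse_loss(G(z.detach()), target)
    opt_w.zero_grad(); loss.backward(); opt_w.step()
print("error after light fine-tuning:", float(loss))
```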

  • 00:35:00 In this section of the video, David Bau explains how he used GAN technology to modify images by fine-tuning the network to overfit on a specific image. By changing the pre-trained weights in a way that tries not to disturb the core layers of the network too much, Bau was able to edit images and create a rough approximation of the target image. However, the network does not generalize this knowledge, meaning it cannot generate meaningful changes for any image other than the target image.

  • 00:40:00 In this section, David Bau discusses the challenge of adding new elements to a generative adversarial network (GAN). Although the system can be trained to generate images of a specific object, it is difficult to teach it new concepts if there is no prior dataset or rule encoded. Bau, therefore, developed a technique to modify the weights of a pre-trained model to accommodate new rules, such as adding trees to the top of towers or drawing Cadillacs in front of buildings, without retraining the model. He demonstrates the application in StyleGAN2, where users can specify a rule and manipulate the output according to their preferences.

  • 00:45:00 In this section, David Bau discusses how he can select a few examples from his generated images and find the shared neurons responsible for their shape using the GAN. Once selected, he can redefine their representation and generate new images by computing the right changes to the GAN's model to turn, for example, the tops of pointy towers into trees. Bau shows that this process is affected by all images of pointy towers in his search results, leading to a completely new representation of the pointy tower images. Additionally, Bau explains that each layer of the GAN can be thought of as solving a simple problem of matching key-value pairs used as a memory for context representation. He notes that the weight matrix is the solution to the least squares problem, and changing a rule in the key-value pair of one layer is also a least squares problem, which can be written in the same way for comparison.

  • 00:50:00 In this section, David Bau discusses a method to change one thing that a network has memorized without changing the entire rule, allowing for the creation of models that represent things that don't exist yet. This is accomplished by finding a key and writing in a new value, using rank one updates in specific directions to change only the key's values. This allows users to change the rules inside a GAN and use them to create things based on their imagination rather than only on the training data. This method can also be employed where there is not enough data, providing a potential path to creating new worlds using machine learning.
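
A simplified numpy sketch of the rank-one rewrite idea: treat the layer's weight as a least-squares map from keys to values, then add a rank-one update so one chosen key maps to a new value while other directions are left largely untouched. The covariance weighting used in the full method is omitted here.

```python
# Simplified rank-one "rule rewrite": W maps keys to values; to change what
# one key produces, add a rank-one update so W' @ k_star = v_star.
import numpy as np

rng = np.random.default_rng(0)
d_k, d_v, n = 8, 6, 50

# Original associative memory: W is the least-squares map from keys to values.
K = rng.normal(size=(d_k, n))           # stored keys (columns)
V = rng.normal(size=(d_v, n))           # stored values
W = V @ np.linalg.pinv(K)               # solves min ||W K - V||_F

# New rule: make one selected key produce a new value.
k_star = K[:, 0]
v_star = rng.normal(size=d_v)           # e.g. "tree top" instead of "tower top"

delta = np.outer(v_star - W @ k_star, k_star) / (k_star @ k_star)
W_new = W + delta                       # rank-one update

print("new rule satisfied:", np.allclose(W_new @ k_star, v_star))
other = K[:, 1]
print("relative change for another key:",
      float(np.linalg.norm((W_new - W) @ other) / np.linalg.norm(W @ other)))
```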

  • 00:55:00 In this section, David Bau discusses the potential for his method to change the rules of the world by making them more visible and manipulatable by humans, and allowing people to build a better world. He also addresses a question about whether this method can work with multiple different models or is only successful when taking a hat from within this model and putting it on a horn. He explains that currently, the method is only able to rewire one model, but it is an obvious goal to be able to move a piece of computation from one neural network to another. Lastly, he talks about the ethical concerns surrounding the realistic renderings of the world and how it is already being misused, citing the deep fakes phenomenon and the creation of millions of fake Facebook profiles using face generators.

  • 01:00:00 In this section, David Bau discusses the implications and potential consequences of generating realistic images using deep neural networks. While forensics work on detecting fake images is necessary, he emphasizes that it is more exciting to understand the internal structure and learn how these models work inside. Transparency in understanding the deep network is essential, as these neural networks are not good at answering the question of why they make certain decisions. Bau's goal is to disassemble the rules the network applies internally to make its decisions and to develop a way of asking why, helping to define transparency as a crucial ethical aspect of deep neural networks. Furthermore, Bau's work on GAN dissection shows that you can identify neurons that contribute to bad-looking artifacts, which can improve the quality of output in these networks.

  • 01:05:00 In this section, David Bau discusses how some GANs have artifacts or distortions in their generated images that can sometimes be removed or reduced with certain learning methods. He suggests that, while the current generation of GANs may be more advanced than what he experimented with, it would still be worth investigating whether this phenomenon still occurs. David notes that asking the right questions and learning to do so is essential in this field and invites anyone interested in his work to reach out to him.
MIT 6.S192 - Lecture 5: "Painting with the Neurons of a GAN" by David Bau
  • 2021.01.27
  • www.youtube.com
https://people.csail.mit.edu/davidbau/home/ More about the course: http://deepcreativity.csail.mit.edu/ Information about accessibility can be found at https:/...
 

MIT 6.S192 - Lecture 7: "The Shape of Art History in the Eyes of the Machine" by Ahmed Elgammal

Ahmed Elgammal, a professor of Computer Science and founder of the Art and Artificial Intelligence Lab, discusses the use of AI in understanding and generating human-level creative products. Elgammal discusses the scientific approach to art history and the importance of advancing AI to understand art as humans do. He also discusses the use of machine learning to classify art styles, analyzing the internal representations, identifying differences between art styles, and quantifying creativity in art through AI. Elgammal also proposes the concept of prime objects in art history and explores the potential for AI to generate art, recognizing the limitations of current AI approaches in creative pursuits. However, Elgammal also discusses ongoing experiments to push the AI network boundaries to create abstract and interesting art.

Ahmed Elgammal also discusses the results of a Turing-style test to determine whether humans can distinguish art created by a GAN from art made by humans, using human-made artworks as a baseline. Humans thought art made by the GAN was produced by humans 75% of the time, emphasizing the concept of style ambiguity and its importance in connecting computer vision and machine learning with art history and artistic interests.

  • 00:00:00 In this section, Professor Ahmed Elgammal, a professor in the Department of Computer Science at Rutgers University and founder of the Art and Artificial Intelligence Lab, discusses his passion for art and how he realized the importance of combining AI and art. He explains that art is much more than object recognition and involves layers of context, understanding emotions, and historical and social contexts that require cognitive and intellectual abilities similar to those of humans. He believes that understanding and generating human-level creative products is fundamental to show that AI algorithms are intelligent and discusses the question of combining aesthetics and subjectivity with objectivity and science. Professor Elgammal advocates for a scientific approach to art history and stresses the importance of advancing AI to understand art as humans do.

  • 00:05:00 In this section, Ahmed Elgammal discusses the idea that any aspect of art, even the creative and subjective elements, can be studied objectively through the eyes of a machine. He explains that his goal is to understand the implications of looking at art through AI and how it can advance AI and the understanding of art history. Elgammal talks about his work in quantifying the different elements and principles of art and style, including how to characterize the sequence and evolution of art style change over time and what factors influence these changes. He also discusses the limitations of current AI approaches in understanding the concept of style in art.

  • 00:10:00 In this section, the speaker discusses a supervised machine learning problem of classifying different art styles, using visual encodings that capture different levels of features. The progress of this line of research is traced from the era of hand-crafted HOG features to deep learning. The machine is able to classify art styles at the level of a first-year art history student. The speaker argues that having the machine classify art is important for understanding the characteristics of style and what drives style changes. The machine's internal representations of these styles are difficult to interpret, but studying the relationship between how the machine identifies style and how art historians think about style can provide useful information. For instance, Heinrich Wölfflin's theory of style suggests visual schemas that differentiate the elements of different styles.

  • 00:15:00 In this section, Elgammal discusses the use of machine learning to classify art styles and analyze the internal representation of the machine's classification. They trained several CNN models, including VGGNet and ResNet, to do style classification in a supervised manner. By analyzing the internal representation, they found that a small number of factors can explain most of the variations in Western art history, with the first two modes of variation explaining up to 74% of the variance, regardless of the network used. They also found that there is nothing about object or composition that matters when it comes to classifying art styles. This approach provides a data-driven way of understanding how the machine classifies art and provides insights into the structure of art history.
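
A small sketch of this kind of representation analysis: run PCA on the network's features for many paintings and check how much variance the first modes explain. Random features with planted low-dimensional structure stand in for real CNN activations.

```python
# Sketch of the representation analysis: PCA on the style classifier's
# internal features, inspecting the variance explained by the first modes
# (random features stand in for real CNN activations).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_paintings, feat_dim = 2000, 512
# Give the fake features a dominant low-dimensional structure.
basis = rng.normal(size=(2, feat_dim))
coords = rng.normal(size=(n_paintings, 2)) * [5.0, 3.0]
features = coords @ basis + rng.normal(scale=0.5, size=(n_paintings, feat_dim))

pca = PCA(n_components=10).fit(features)
print("variance explained by first two modes: %.1f%%"
      % (100 * pca.explained_variance_ratio_[:2].sum()))
```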

  • 00:20:00 In this section, the lecturer discusses how although machines are not informed about the timelines of various art styles, they can learn to classify these styles by themselves through the images provided. This is confirmed by the fact that the machine puts art in historical order as there is a 0.7 correlation between the progression of styles and time. The lecturer delves into the two factors that help explain 75% of art history, which are planar versus recessional and linear versus painterly. He notes that art history went through a 360-degree cycle over the last 500 years in western civilization and this is captured in one diagram created from the representation that the machine learned from looking at art styles.

  • 00:25:00 In this section, the speaker discusses the use of AI in determining the differences between art styles. While some styles, such as Renaissance and Baroque, can be distinguished using specific factors, such as color and texture, other styles like Impressionism cannot be identified through these factors. The activation manifolds of the AI networks show how art movements have changed over time, with particular emphasis on the works of Cezanne, who acted as a bridge between Impressionism and early 20th-century styles such as Cubism and Abstraction. Additionally, certain Renaissance artworks are pulled away from the Renaissance cloud, with particular artists such as El Greco and Durer influencing modern art. The talk then transitions to a discussion of quantifying creativity in art through AI.

  • 00:30:00 In this section, Elgammal discusses the development of an algorithm to assess the creativity of a painting given its context and art history. He argues that the ability to assess creativity is critical for machines that create art, and that the algorithm must define creativity in a quantifiable way. Elgammal suggests that there are two main conditions for a product to be called creative: it must be novel compared to prior work, and it has to be of some value, meaning it will become influential. He looks at different ways to describe creativity and explores the limitations of algorithms that assess creativity, arguing that they must consider the context of art history.

  • 00:35:00 In this section, Ahmed Elgammal discusses the limitations of algorithms in art history, including what he calls the "closed world limitation" of the available data and the "artistic concept quantification limitation" of the visual encoding used. He suggests that the parameters of the algorithm can be used to interpret creativity scores and understand how they affect the results. Elgammal proposes a directed graph between paintings with a weight reflecting their visual similarity, and uses this to create a formulation for creativity based on influence and novelty. The resulting formula is an instance of a network centrality problem and can be interpreted as a random walk in a Markov chain with alpha set to one.
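
The snippet below is an illustrative variant of this formulation, not the exact formula from the lecture: paintings form a time-ordered graph weighted by visual similarity, and an iterative score rewards works that differ from earlier paintings but are echoed by later ones.

```python
# Illustrative creativity-scoring sketch (not the lecture's exact formula):
# a PageRank-style iteration over a time-ordered visual-similarity graph,
# combining novelty with respect to the past and influence on the future.
import numpy as np

rng = np.random.default_rng(0)
n = 6                                    # paintings ordered in time
sim = rng.random((n, n))
sim = (sim + sim.T) / 2
np.fill_diagonal(sim, 0)

score = np.ones(n)
for _ in range(100):
    new = np.zeros(n)
    for i in range(n):
        novelty = 1.0 - sim[i, :i].mean() if i > 0 else 1.0
        influence = sim[i, i + 1:] @ score[i + 1:] / max(n - i - 1, 1)
        new[i] = 0.5 * novelty + 0.5 * influence
    score = new / new.sum()
print("creativity scores (time-ordered):", np.round(score, 3))
```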

  • 00:40:00 In this section, the lecturer relates the creativity formula to eigenvector-centrality measures such as PageRank from social-network analysis: the weighted variant of PageRank is inverted, so that heavily echoing earlier work lowers a painting's score while being echoed by later work raises it. The same machinery can be extended to separate originality from influence. The scores were evaluated on the WikiArt and Artchive collections, for which no creativity labels exist, and the machine singled out works such as Picasso's Les Demoiselles d'Avignon as marking the beginning of Cubism.

  • 00:45:00 In this section, Ahmed Elgammal discusses validating the creativity scores with a "time machine" experiment, an idea that came about because of a mistake in the dating of one of Mondrian's paintings. The experiment takes works from the Renaissance or Baroque period and artificially moves them to a later period, and takes modern works and moves them back to the Renaissance. The results showed a consistent drop in the creativity score when Renaissance and Baroque works were moved forward in time, and an increase when modern works were moved back, supporting the claim that the score genuinely captures novelty and influence; a toy version of this check is sketched below.
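
A toy version of such a check (assuming the `creativity_scores(X, years)` helper from the sketch above; the painting index and the 300-year shift are arbitrary illustrations) might look like this:

```python
# Toy "time machine" check: move one painting's date, recompute the scores,
# and compare. Uses creativity_scores(X, years) from the previous sketch.
idx = 0
baseline = creativity_scores(X, years)[idx]

shifted = list(years)
shifted[idx] += 300            # pretend the work was painted 300 years later
moved = creativity_scores(X, shifted)[idx]

print("creativity change when moved forward in time:", moved - baseline)
```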

  • 00:50:00 In this section, Ahmed Elgammal discusses the concept of prime objects in art history and how they can give birth to new styles. He compares prime objects to prime numbers in mathematics, drawing parallels between their unpredictable appearance and their ability to shape subsequent work. Elgammal also explores the potential for AI to generate art, discussing Creative Adversarial Networks (CANs) and their ability to learn about style and deviate from its norms. However, he notes that the generator in a standard GAN is limited: it is trained only to create samples that fool the discriminator, with no incentive to be creative.

  • 00:55:00 In this section, the speaker discusses how artists have to keep innovating to push against habituation, but if they innovate too much, audiences struggle to enjoy the result. The aim is therefore to push the network to be innovative while staying within the overall distribution of art. The speaker explains that a style-ambiguity loss is added: the discriminator also classifies the style of the generated art, and the generator is rewarded when that classification is ambiguous, which drives the machine to explore the boundaries between styles (a minimal sketch of this loss follows below). In their experiments, adding style ambiguity led the machine to generate interesting abstract works with new compositions and color combinations that remained within the distribution of what viewers find appealing.
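
A minimal sketch of the style-ambiguity idea (not the lecture's or the CAN paper's exact training code; the two-headed discriminator and the function names are assumptions) combines the usual adversarial term with a cross-entropy term against the uniform distribution over style classes:

```python
# Sketch of a CAN-style generator loss: make fakes look like art (adversarial
# term) while keeping their style classification ambiguous (uniform target).
import torch
import torch.nn.functional as F

def generator_loss(d_realfake_logits, d_style_logits, ambiguity_weight=1.0):
    # (1) standard GAN term: the real/fake head should call the fakes "real"
    adv = F.binary_cross_entropy_with_logits(
        d_realfake_logits, torch.ones_like(d_realfake_logits))
    # (2) style-ambiguity term: cross-entropy between the style head's
    #     prediction and the uniform distribution over the K style classes,
    #     so confident style predictions are penalized
    k = d_style_logits.shape[-1]
    uniform = torch.full_like(d_style_logits, 1.0 / k)
    ambiguity = -(uniform * F.log_softmax(d_style_logits, dim=-1)).sum(-1).mean()
    return adv + ambiguity_weight * ambiguity
```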

  • 01:00:00 In this section, Ahmed Elgammal presents the results of a Turing-style test asking whether humans can distinguish art created by the GAN from art created by humans. Artworks from the Art Basel fair serve as a baseline, and the study found that humans judged the machine-generated images to be human-made 75 percent of the time, compared with 85 percent for a set of abstract art and only 48 percent for works from the Art Basel collection. Elgammal also discusses style ambiguity as a way to create images that are recognized as art yet do not belong to any specific style. He closes by emphasizing the importance of connecting computer vision and machine learning with art history and artistic interests.
MIT 6.S192 - Lecture 7: "The Shape of Art History in the Eyes of the Machine" by Ahmed Elgammal
  • 2021.01.28
  • www.youtube.com
Abstract: In this talk, I will argue that teaching the machine how to look at art is not only essential for advancing artificial intelligence, but also has t...