Lecture 2: Image Formation, Perspective Projection, Time Derivative, Motion Field

In this lecture, the concept of perspective projection and its relationship with motion are discussed extensively. The lecturer demonstrates how differentiating the perspective projection equation relates the motion of brightness patterns in the image to motion in the real world. The lecture also covers topics such as the focus of expansion, continuous and discrete images, and the importance of having a reference point for texture when estimating an object's velocity in an image. Additionally, the lecture touches on total derivatives along curves and the issue of equation counting and constraints when trying to recover the optical flow vector field.

The speaker covers various topics such as the brightness gradient, the motion of an object, the 2D case, and isophotes. One challenge in computing an object's velocity is the aperture problem, which arises when the brightness gradients at different pixels are proportional to one another; it is addressed by weighting contributions from different image regions or by seeking least-squares (minimum-error) solutions. The lecture then delves into the different cases of isophotes and emphasizes the importance of computing a meaningful answer, as opposed to a noisy one, when determining velocity, using the concept of noise gain, which measures how sensitive the computed result is to changes in the image measurements.

  • 00:00:00 In this section, the lecturer discusses perspective projection and motion. Perspective projection involves a relationship between points in the 3D world and the 2D image, which can be represented through suitable coordinate systems. They explain that differentiation of the perspective equation can help in measuring motion of brightness patterns in the image, which can then be used to determine motion in the real world. The lecturer reduces the complexity of the equations by utilizing more easily digested symbols such as velocities in the x and y directions.

  • 00:05:00 In this section, the lecturer explains how to use motion vectors to find the focus of expansion, a point in the image where there is no motion. This point is significant because it allows us to determine the direction of motion simply by connecting it to the origin, and it tells us something about the environment or the motion. The lecturer goes on to show how the pattern of the image will appear if the focus of expansion is at a certain point, and how the vector diagram can be drawn to show the motion field.

  • 00:10:00 In this section of the lecture, the concept of focus of expansion and compression is introduced in the context of image formation and perspective projection. The equation describes vectors radiating outwards from the focus of expansion, which is important in measuring distance and velocity. The ratio W/Z determines the size of the vectors, and when the motion is reversed the focus of expansion becomes a focus of compression. By taking the ratio Z/W, the time to impact can be estimated, which is useful for landing spacecraft or measuring distance. The idea is then introduced in vector form, although it is not immediately useful.

  • 00:15:00 In this section, the speaker discusses the perspective projection equation and how it can be used to introduce image coordinates. The focus of expansion is introduced as the point where r dot is zero, which corresponds to z. By differentiating each component with respect to time, we can derive equations for motion in 3D and motion in depth. The speaker also uses a result from the book's appendix to transform the equations into a general statement about the flow, allowing for the expression of image motion in terms of world motion.

  • 00:20:00 In this section, the lecturer discusses the concept of image motion and its relationship to the z-axis. The resulting image motion is found to be perpendicular to the z-axis, which is not surprising since the image is only in two dimensions with velocities in the x and y directions. The lecture then explores the concept of radial motion and its effect on image motion, with the conclusion that if the object is moving directly towards or away from the observer, there is no image motion. The lecturer concludes by examining examples of flow fields in which the vectors are not all of the same length, demonstrating that while unpleasant, this can be advantageous.

  • 00:25:00 In this section, the lecturer discusses how understanding the forward process of image formation can help solve the inverse problem of recovering depth from motion fields. The lecturer notes that depth and velocity are the two key factors affecting the appearance of the motion field, and knowing one can help calculate the other. However, recovering both can lead to an ill-posed problem with multiple or no solutions. The lecturer also briefly touches on image brightness patterns, which can be represented as a 2D pattern of brightness values, and color representation using RGB values, which will be discussed later. Lastly, the lecturer explains that images can be represented as either continuous or discrete, with digital images being quantized in space and typically on a rectangular grid.

  • 00:30:00 In this section of the lecture, the professor discusses the difference between continuous and discrete domains in image processing. While in practice images are often represented by arrays of numbers with two indices, using continuous functions can make it easier to understand certain operations, such as taking integrals. Additionally, the professor talks about approximating the x and y derivatives of brightness with difference methods, and the importance of the brightness gradient in image processing. The lecture also touches on 1D sensors and how they can be used for imaging, with motion serving as a means to scan the image. The professor poses the problem of determining the velocity of motion between two frames of an image and gives an example of an optical mouse mapping the surface of a table.

  • 00:35:00 In this section, the lecturer discusses the assumptions made in optical mouse technology, in particular the constant brightness assumption when looking at a surface. He also explains how a small linear approximation of a curve can be used to determine motion by analyzing the change in brightness between frames. The lecturer introduces partial derivative notation as well as the components of the brightness gradient, which can be used for edge detection. Finally, the relation ΔE = E_x Δx is derived and divided through by Δt, which in the limit gives the one-dimensional brightness change constraint E_x u + E_t = 0.

  • 00:40:00 In this section of the lecture, the speaker discusses how to recover motion from a single pixel in a 1D image, giving u = -E_t/E_x. The result allows the speaker to recover motion, but this approach does not carry over directly to 2D images. The speaker explains that larger E_t values indicate faster movement, and that there is a problem when E_x is zero or small, since dividing by it turns small measurement errors into noisy estimates.

  • 00:45:00 In this section of the lecture, the speaker discusses the importance of having a reference point with texture when estimating an object's velocity in an image. This type of measurement can be noisy and unreliable unless certain image conditions are met. However, the results can be improved dramatically by using multiple pixels and applying techniques like least squares to reduce the error. By combining multiple pixels, the standard deviation of the measurements can be reduced by the square root of n, which is significant for large images. However, it's important to weight the measurements by the slope (gradient) of the texture so that low-slope areas, where the estimate is noisy, do not contaminate the result. Finally, the analysis is extended to 2D images, and several approaches are discussed (a minimal 1D least-squares sketch appears after this list).

  • 00:50:00 In this section, the lecturer explains how video frames can be conceptualized as a three-dimensional volume of brightness values with x, y, and t as axes. The lecture then goes on to describe partial derivatives and how they are derived from differences of neighboring pixels in the x, y, or t direction. The lecturer then explores the concept of total derivatives along curves, specifically related to the brightness gradient of an object in motion. Using the chain rule, the total derivative can be expressed in terms of partial derivatives, allowing for the prediction of how the object's brightness will change over time. Finally, the lecture introduces the concept of finding u and v from image sequences.

  • 00:55:00 In this section, the lecturer discusses the issue of equation counting and constraints when trying to recover the optical flow vector field. In the case of one unknown u and one constraint equation, it is possible to obtain a finite number of solutions. However, with two unknowns u and v and one constraint equation, it appears hopeless. The constraint equation is derived from the assumption that the images don't change in brightness as they move. The lecturer shows that plotting the constraint equation in velocity space reveals it to be a line, which is a significant development in solving the problem. The goal is to pin the solution down to a single point in velocity space and so recover the precise optical flow vector field.

  • 01:00:00 In this section of the video, the speaker discusses the importance of the brightness gradient in determining the motion of an object. The brightness gradient is a vector that points across the transition between dark and bright areas, perpendicular to the isophotes. The speaker explains that when making a localized measurement, there are not enough equations to determine the motion of an object. However, it is possible to determine the motion in the direction of the brightness gradient. The speaker then moves on to discuss the 2D case and states that multiple constraints need to be used to determine the motion of an object. To demonstrate this, the speaker solves a simple pair of linear equations to recover the values of u and v.

  • 01:05:00 In this section, the lecturer explains how to invert a 2x2 matrix and use it to solve the set of linear equations for image motion. However, in some edge cases, the determinant of the matrix can be zero, which means that the brightness gradients are proportional to each other, resulting in the aperture problem. This problem suggests that contributions to different image regions need to be weighted differently, rather than just averaging the result. To solve this problem, we need to search for the values of u and v that make the equation zero, or as small as possible.

  • 01:10:00 In this section, the speaker discusses a constraint that applies in an ideal case where the correct values of u and v result in an integrand of zero when integrated over the entire image. This can be the basis for a strategy to find the correct values of u and v. The speaker notes that this approach may fail when there is no light or texture in the scene, resulting in zero values for E_x and E_y. The speaker then explains how the integrand is turned into something always positive by squaring it and minimizing it, leading to a calculus problem of two equations with two unknowns (a least-squares sketch of this 2D case appears after this list). However, this can fail if the determinant of the two-by-two matrix is zero, which can occur if E_x is zero everywhere or if E_x and E_y are everywhere in a fixed ratio (parallel gradients).

  • 01:15:00 In this section, the speaker discusses the different cases of isophotes, which are curves of equal brightness. The isophotes can be at a 45-degree angle, parallel lines, or curved lines. However, the speaker emphasizes that the most general case is isophotes at some angle because it encompasses all the other cases. They also mention that the only problem arises when isophotes are parallel lines, which can be overcome by looking for areas in the image where the brightness gradient changes a lot, such as corners or areas with high isophote curvature. Finally, the speaker introduces the concept of noise gain and encourages students to send any questions they have about the lecture or the upcoming homework assignment.

  • 01:20:00 In this section, the lecturer discusses the importance of computing a meaningful answer, rather than a noisy one, when determining the velocity of motion. He explains the concept of noise gain, which refers to how sensitive the computed result is to changes (noise) in the image measurements, and how it impacts the velocity calculation. He then goes on to describe a one-dimensional transformation where the forward function is known and the goal is to invert it in a way that is sensible and not overly sensitive to noise.
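
The 1D estimate sketched in the 00:40:00 and 00:45:00 entries can be written in a few lines. The following is a minimal NumPy illustration, not code from the lecture: it assumes the 1D brightness change constraint E_x u + E_t = 0 at every pixel and combines pixels by least squares, so u = -sum(E_x E_t) / sum(E_x^2); the function name and the synthetic test pattern are invented for the example.

```python
import numpy as np

def estimate_velocity_1d(frame0, frame1, dt=1.0):
    """Least-squares 1D velocity from two frames of a line sensor.

    Assumes the brightness change constraint E_x * u + E_t = 0 at every
    pixel; combining pixels by least squares gives
        u = -sum(E_x * E_t) / sum(E_x ** 2).
    """
    ex = np.gradient((frame0 + frame1) / 2.0)   # spatial derivative E_x
    et = (frame1 - frame0) / dt                 # temporal derivative E_t
    return -np.sum(ex * et) / np.sum(ex ** 2)

# Synthetic check: a smooth pattern shifted right by 0.3 pixels per frame.
x = np.arange(200, dtype=float)
pattern = lambda x: np.sin(0.10 * x) + 0.5 * np.sin(0.023 * x)
print(estimate_velocity_1d(pattern(x), pattern(x - 0.3)))   # close to 0.3
```
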
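
For the 2D case in the 01:05:00 and 01:10:00 entries, minimizing the integral of (E_x u + E_y v + E_t)^2 over the image for a single (u, v) leads to a symmetric 2x2 linear system. The sketch below assumes that fixed-flow model (one velocity for the whole image, as for an optical mouse); the names and test data are illustrative.

```python
import numpy as np

def fixed_flow(frame0, frame1, dt=1.0):
    """Single (u, v) for the whole image, as for an optical mouse.

    Minimizes the sum of (E_x*u + E_y*v + E_t)**2 over all pixels, which
    gives a symmetric 2x2 linear system.  A small determinant signals
    the degenerate case in which the gradients are parallel everywhere.
    """
    ey, ex = np.gradient((frame0 + frame1) / 2.0)   # d/drow, d/dcol
    et = (frame1 - frame0) / dt
    a = np.array([[np.sum(ex * ex), np.sum(ex * ey)],
                  [np.sum(ex * ey), np.sum(ey * ey)]])
    b = -np.array([np.sum(ex * et), np.sum(ey * et)])
    if abs(np.linalg.det(a)) < 1e-12:
        raise ValueError("aperture problem: gradients are degenerate")
    return np.linalg.solve(a, b)                    # (u, v)

# Synthetic check: a smooth texture shifted by (u, v) = (0.4, -0.2).
yy, xx = np.mgrid[0:64, 0:64].astype(float)
pattern = lambda x, y: np.sin(0.2 * x) * np.cos(0.15 * y) + 0.3 * np.sin(0.07 * (x + 2 * y))
print(fixed_flow(pattern(xx, yy), pattern(xx - 0.4, yy + 0.2)))  # about [0.4, -0.2]
```
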
Source: MIT 6.801 Machine Vision, Fall 2020. Instructor: Berthold Horn. Complete course: https://ocw.mit.edu/6-801F20

Lecture 3: Time to Contact, Focus of Expansion, Direct Motion Vision Methods, Noise Gain

In this lecture, the concept of noise gain is emphasized as it relates to machine vision processes, with a focus on different directions and variations in accuracy. The lecturer discusses the importance of accurately measuring vectors and understanding gain to minimize errors in calculations. The talk covers the concept of time to contact, the focus of expansion, and motion fields, with a demonstration of how to compute radial gradients to estimate time-to-contact. The lecturer also demonstrates how to overcome limitations in frame-by-frame calculations using multi-scale superpixels, with a live demonstration using a web camera. Overall, the lecture provides useful insights into the complexities of machine vision processes and how to measure various quantities accurately.

The lecture discusses various aspects of motion vision and their application in determining time to contact, focus of expansion, and direct motion vision methods. The speaker demonstrates tools for visualizing intermediate results, but also acknowledges their limitations and errors. Additionally, the problem of dealing with arbitrary motions in image processing is tackled, and the importance of neighboring points moving at similar velocities is emphasized. The lecture also delves into the patterns affecting the success of direct motion vision methods and introduces new variables to define the time to contact and the focus of expansion (FOE) more conveniently. Finally, the process of solving three linear equations in three unknowns to understand how different variables affect motion vision is discussed, along with the parallelization of the process to speed up computation.

  • 00:00:00 In this section, the lecturer discusses noise gain, which refers to the relationship between errors in measurement and errors in estimating quantities related to the environment. He uses an example of an indoor GPS system that uses Wi-Fi access points to illustrate the idea. The accuracy of the system is limited by the measurement of round trip times from the phone to the access point and back with high precision. The lecturer emphasizes that the noise gain analysis of some machine vision process will be different in different directions and will not be a single number. Rather, accuracy can be determined pretty well in one direction, but not in another, depending on how you move around.

  • 00:05:00 In this section of the video, the lecturer discusses the concept of using transponders to determine position and the corresponding errors this can cause. He explains that if two transponders are used and positioned in a line, determining accuracy in a certain direction becomes difficult due to small changes in distance. However, if the transponders are positioned 90 degrees apart, accuracy is improved. Furthermore, the lecturer explains the use of circles as they relate to determining the locus of possible positions with the same amount of error.

  • 00:10:00 In this section, the lecturer explains the concept of forward transformation which takes us from a quantity in the environment that needs to be measured to something that can be observed in an instrument. He explains that the measurement may not be perfect and therefore the noise in the quantity of interest is related to the noise in the measurement by the derivative of the transfer function. The lecturer also highlights the significance of the noise gain, emphasizing that a small value of f prime of x is not good as the resulting uncertainty in the quantity being measured would be large.

  • 00:15:00 In this section, the speaker discusses how to measure vectors and the importance of understanding gain in these measurements. They explain that measuring a vector requires a little more complexity than measuring a scalar quantity, but it can still be done by applying linear transformations. The speaker emphasizes that a crucial aspect of vector measurements is understanding gain, which involves taking into account anisotropy and determining the magnitude of the change in results and measurements. Inverting the coefficient matrix involves dividing by its determinant, so it is crucial that this value not be zero or too small, which would amplify errors in the calculations. The speaker provides an example of a two-by-two matrix to explain how to obtain an inverse matrix.

  • 00:20:00 In this section of the lecture, the concept of noise gain is applied to an example involving motion and solving for the variables u and v. It is explained that if the determinant is small, noise will be amplified significantly, and this is because the brightness gradients at the two pixels are similar in orientation, providing little difference in information. A diagram of the velocity space is used to show how the two lines intersect and how a small shift in one line can cause a large change in the intersection point, which is not a desirable case. However, not all hope is lost, as it is noted that the noise gain may not be equally high in all directions and it is useful to know which component can be trusted. The lecture then continues to review the constant brightness assumption and constraint equation before moving on to the concept of time to contact.

  • 00:25:00 In this section, the lecturer discusses the optical mouse problem and how to deal with it using a least squares approach. The goal is to find the correct velocity using measurements of E_x, E_y, and E_t, but these measurements are usually corrupted by noise, so the minimum of the integral (not zero) will be our estimate of u and v. The lecturer goes over some calculus to determine the minimum and explains the importance of minimizing this integral. They then move on to simple cases where u and v are predictable, such as in the case of focus of expansion, and review the relationship between world coordinates and image coordinates in perspective projection.

  • 00:30:00 In this section, the speaker discusses the relationship between the velocities, distances, and the focus of expansion for motions with zero velocity in the x and y directions. The talk covers the quantity W/Z, where W is the component of motion in the z-direction; its reciprocal Z/W is a distance divided by a speed, so it is measured in seconds, and this is the time to contact, which tells how long it will take before one crashes into an object if nothing changes. The speaker then goes on to demonstrate, with a simple example, how the focus of expansion works when someone is moving towards a wall and what the motion field would look like in that scenario.

  • 00:35:00 In this section, the speaker explains that while we might think that finding vectors is the easiest approach to solving the problem of finding the focus of expansion, the reality is that all we have are images that are brightness patterns, and there are no vectors within them. Instead, we need to use the image data of an expanding or shrinking image to solve this problem. The speaker shows a diagram of the vectors showing compression rather than expansion but emphasizes that the focus of expansion is an essential factor in this experiment. The speaker also introduces the idea of the radial gradient, which is the dot product of the brightness gradient with the radial vector measured from the principal point (the image center); this can be used to measure the inverse of the time to contact using brightness derivatives at one point in the image. However, these numbers are subject to noise, and estimating derivatives makes things worse, so this method is not very accurate.

  • 00:40:00 In this section, the lecturer explains how to compute radial gradients and use them to estimate the time-to-contact of an image. The radial gradient is computed by taking the dot product of the image gradient with a radial vector in a polar coordinate system erected in the image. The lecturer then shows how to use least squares to minimize the difference between the computed radial gradient and the theoretical value of zero for a point source of light. This is applied to a simple case of motion along the optical axis, where the estimation of the parameter c gives the time-to-contact.

  • 00:45:00 In this section of the lecture, the professor explains his approach to estimating time to contact using direct motion vision methods. He uses calculus to minimize the mean squared error in the presence of noise and derives the formula for c, which is the inverse of the time to contact. The key is to estimate the brightness gradient using neighboring pixels in the x and y directions, compute the radial gradient G, and accumulate sums (integrals) over all pixels involving G and G squared. With these, the time to contact can be estimated easily from the formula for c (a minimal sketch of this computation appears after this list). The method is simple and effective, with no need for high-level processing or sophisticated object recognition techniques, making it a direct computation of time to contact.

  • 00:50:00 In this section, the speaker discusses measuring the position of a bus using image analysis techniques. By measuring the number of pixels in the image of the bus and how it changes over time, one can determine the bus's position accurately. However, this process requires a high level of precision and can become challenging when dealing with more complex scenarios. To demonstrate these techniques, the speaker uses a program called MontiVision, which processes images to estimate the time to contact and focus of expansion with various objects. The program computes three values to optimize the accuracy of image-based analysis, but as the results are noisy, they require constant improvement to be effective.

  • 00:55:00 In this section, the lecturer discusses a method to calculate time to contact and the limitations of doing so using frame-by-frame calculations. These limitations include image focus changes and the failure of the method to adjust for larger velocities in closer objects. The lecturer demonstrates how to overcome these limitations by using multi-scale superpixels, or grouping pixels together to improve image processing speed and accuracy. Finally, the lecturer shows a live demonstration using a web camera to display the time to contact based on the movement of the camera.

  • 01:00:00 In this section, the lecturer demonstrates a tool that can display intermediate results, whereby the x derivative controls red and the y derivative controls green, giving a three-dimensional effect, akin to rapid variation of a gradient in a topographic map. Furthermore, the radial derivative, g, is demonstrated to point outwards, and when multiplied by the time derivative, E_t, can determine motion. However, it is acknowledged that such a tool has limitations and errors, which are calculable; there is no magic in the code, which makes it a fascinating and comprehensible tool.

  • 01:05:00 In this section, the lecturer discusses the problem of dealing with arbitrary motions in image processing. He notes that the problem arises from the fact that u and v, which refer to motion in the x and y directions, respectively, may be different throughout the image. This can lead to a million equations in two million unknowns, making the problem look unsolvable. The lecturer suggests that additional assumptions may be needed to solve the problem, but notes that in most cases, neighboring points in the image are moving at the same or similar velocities, providing additional information. He also cautions that the solution may fail if there is zero radial gradient in the image, and explains what that means.

  • 01:10:00 In this section, the lecturer discusses the patterns that can affect the success of using direct motion vision methods to calculate time to contact. The lecturer explains that some patterns, like an x shape, have gradients changing in different directions and, therefore, provide valuable information for calculating time to contact. However, another pattern, like a pie chart, fails to provide this information as the gradients are consistent in their direction. The lecturer also mentions that the algorithm could pick up non-zero E_x and E_y from tiny specks or fibers that exist even in relatively uniform patterns like a piece of paper. Finally, the lecture introduces two new variables, fU/Z and fV/Z, that will help define the time to contact and the focus of expansion (FOE) more conveniently in the equations.

  • 01:15:00 In this section, the speaker discusses the formula for calculating the focus of expansion, which is based on the two parameters a and b, and how f does not show up in the formula. While for many purposes, f is needed to compute distance and velocity, the time to contact computation does not require f. The speaker then formulates a problem as a least squares problem with a finite number of parameters a, b, and c, and proceeds to differentiate the integral to find the derivative of the integrand.

  • 01:20:00 In this section of the lecture, the speaker explains how to solve three linear equations in three unknowns to find out how different variables affect motion vision. The solution has a closed form, which is beneficial as it allows conclusions to be drawn quickly, rather than having to recompute with different parameters. The accumulators are built from derivatives in the horizontal direction, the vertical direction, and the radial (g) direction, and these feed the coefficients (a sketch of this closed-form solve appears after this list). The coefficient matrix is symmetrical, which gives an understanding of the stability of the solution.

  • 01:25:00 In this section of the lecture, the speaker discusses parallelizing the process of running through six accumulators in an image and adding to them as you go. This process does not require interactions between pixels and can therefore speed up if run on a GPU. These accumulators do not depend on changes in time as they are just accumulating brightness patterns and texture within the image. The remaining three accumulators do depend on changes in time. Once all accumulators are accounted for, three equations in three unknowns must be solved.
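
The direct time-to-contact computation from the 00:40:00 and 00:45:00 entries, assuming pure motion along the optical axis, can be sketched as follows. With G = x E_x + y E_y (the radial gradient, with x and y measured from the image center) the constraint becomes E_t = C G with C = W/Z, and least squares over all pixels gives C = sum(G E_t) / sum(G^2). The sign conventions and names below are this sketch's own, not necessarily the lecture's.

```python
import numpy as np

def time_to_contact(frame0, frame1, dt=1.0):
    """Direct time to contact for motion along the optical axis.

    Uses E_t = C * G with G = x*E_x + y*E_y (x, y measured from the
    image centre) and C = W/Z; least squares over all pixels gives
    C = sum(G*E_t) / sum(G*G), and the time to contact is -1/C
    (in units of dt, positive when approaching).
    """
    ey, ex = np.gradient((frame0 + frame1) / 2.0)
    et = (frame1 - frame0) / dt
    rows, cols = frame0.shape
    y, x = np.mgrid[0:rows, 0:cols].astype(float)
    x -= (cols - 1) / 2.0                      # coordinates relative to
    y -= (rows - 1) / 2.0                      # the principal point
    g = x * ex + y * ey                        # radial gradient
    c = np.sum(g * et) / np.sum(g * g)
    return -1.0 / c

# Synthetic check: magnify the pattern as if approaching a wall, TTC = 40 frames.
size = 129
y, x = np.mgrid[0:size, 0:size].astype(float) - (size - 1) / 2.0
pattern = lambda x, y: np.sin(0.15 * x) * np.sin(0.11 * y)
s = 1.0 / (1.0 - 1.0 / 40.0)                   # per-frame magnification
print(time_to_contact(pattern(x, y), pattern(x / s, y / s)))   # roughly 40
```
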
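
The closed-form solution with accumulators described in the 01:15:00 through 01:25:00 entries can be sketched in the same style. Assuming pure translation relative to a plane of constant depth, the flow is u = A - xC, v = B - yC with A = fU/Z, B = fV/Z, C = W/Z; substituting into the brightness change constraint and minimizing the squared residual gives a symmetric 3x3 system built from six accumulators that need no time derivative and three that do. This is a sketch under those assumptions, not the lecture's exact formulation.

```python
import numpy as np

def direct_motion(frame0, frame1, dt=1.0):
    """Closed-form recovery of A = f*U/Z, B = f*V/Z and C = W/Z.

    Assumes translation relative to a plane of constant depth, so the
    flow is u = A - x*C, v = B - y*C.  Substituting into the brightness
    change constraint and minimizing the squared residual over all
    pixels gives a symmetric 3x3 system.  FOE = (A/C, B/C); TTC = -1/C.
    """
    ey, ex = np.gradient((frame0 + frame1) / 2.0)
    et = (frame1 - frame0) / dt
    rows, cols = frame0.shape
    y, x = np.mgrid[0:rows, 0:cols].astype(float)
    x -= (cols - 1) / 2.0
    y -= (rows - 1) / 2.0
    g = x * ex + y * ey                               # radial gradient
    # Six accumulators that need no time derivative ...
    m = np.array([[np.sum(ex * ex),  np.sum(ex * ey), -np.sum(ex * g)],
                  [np.sum(ex * ey),  np.sum(ey * ey), -np.sum(ey * g)],
                  [-np.sum(ex * g), -np.sum(ey * g),   np.sum(g * g)]])
    # ... and three that do.
    rhs = np.array([-np.sum(ex * et), -np.sum(ey * et), np.sum(g * et)])
    a, b, c = np.linalg.solve(m, rhs)
    return {"foe": (a / c, b / c), "ttc": -1.0 / c}

# Synthetic check: approach a textured wall while drifting sideways.
size = 129
y, x = np.mgrid[0:size, 0:size].astype(float) - (size - 1) / 2.0
pattern = lambda x, y: np.sin(0.15 * x) * np.sin(0.11 * y)
A, B, C = 0.5, -0.3, -1.0 / 80.0            # true TTC = 80, FOE = (-40, 24)
print(direct_motion(pattern(x, y), pattern((x - A) / (1 - C), (y - B) / (1 - C))))
```
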
Source: MIT 6.801 Machine Vision, Fall 2020. Instructor: Berthold Horn. Complete course: https://ocw.mit.edu/6-801F20

Lecture 4: Fixed Optical Flow, Optical Mouse, Constant Brightness Assumption, Closed Form Solution

In Lecture 4, the lecturer discusses topics such as fixed optical flow, the optical mouse, the constant brightness assumption, closed-form solutions, and time to contact. The constant brightness assumption leads to the brightness change constraint equation, which relates movement in the image with the brightness gradient and the rate of change of brightness. The lecturer also demonstrates how to model situations where the camera or the surface is tilted, and discusses the benefit of multi-scale averaging in handling large motions. Additionally, the lecture explores the use of time to contact in various autonomous situations and compares different control systems for landing planetary spacecraft. Finally, the lecture touches on the projection of a line and how it can be defined using perspective projection.

The speaker discusses the applications of image processing, including how vanishing points can be used to recover the transformation parameters for camera calibration and how calibration objects with known shapes can determine the position of a point in the camera-centric system. The lecture also covers the advantages and disadvantages of using different shapes as calibration objects for optical flow algorithms, such as spheres and cubes, and how to find the unknown center of projection using a cube and three vectors. The lecture ends by highlighting the importance of taking radial distortion parameters into account for real robotics camera calibration.

  • 00:00:00 In this section, the lecturer talks about image formation and motion tracking. They discuss perspective projection equations and the focus of expansion, which is the point towards which movement is happening. The constant brightness assumption is introduced, which means that in many circumstances, the brightness of an image of a point in the environment will not change over time. The lecturer explains how this assumption leads to the brightness change constraint equation, which relates movement in the image with brightness gradient and rate of change of brightness. The lecture also covers how solving for velocity requires additional constraints and how everything moving at the same speed can be an extreme form of constraint.

  • 00:05:00 In this section of the lecture, the speaker discusses the technique of minimizing error to estimate u and v in optical flow problems where there is a constant u and v for the entire image, as in the case of an optical mouse. This process is highly over-constrained, but we can obtain a linear equation in the unknowns, with a symmetric two-by-two coefficient matrix. The speaker shows how to compute the derivatives and the conditions under which this method won't work. They also describe a particular type of image in which E_x and E_y are in the same ratio everywhere; in that degenerate case the coefficient matrix is singular and the method fails.

  • 00:10:00 In this section, the lecturer talks about the degenerate case in which the isophotes, the curves where E(x, y) is constant, are parallel straight lines that differ only in the constant c. This type of image poses problems for optical mouse systems because they cannot measure the sliding along the lines, making it impossible to determine that component of the motion. The lecture then introduces the concept of time to contact, which depends on fractional changes (ratios) rather than absolute values, enabling the system to work without calibration. The lecturer also differentiates the relation obtained by assuming the object's true size is constant, so that the product of image size and distance is constant and its derivative is zero.

  • 00:15:00 In this section, the lecturer explains a simple relationship that translates a certain percentage change in image size between frames into a certain percentage change in distance, which directly translates into the time to contact (TTC). The lecturer emphasizes the importance of accurately measuring the image size when estimating TTC this way, since the fractional change in the image from frame to frame is relatively small when the TTC is large (a sketch of this size-based estimate appears after this list). The lecturer also discusses the assumptions made in the time to contact relative to a planar surface, noting that the assumption of constant Z still applies.

  • 00:20:00 In this section, the lecturer discusses how to model situations where the camera or the surface is tilted. In the case of a tilted plane, the depth will no longer be constant in the image. The equation for a plane is a linear equation in x and y, which makes for a more complicated model. Generally, the equations might become too complicated, and there might not be a closed-form solution. However, it's better to focus first on cases where there is a closed-form solution. If the surface is not planar, we can approximate it by polynomials to set up a least squares problem. Unfortunately, we won't find a closed-form solution, so we need a numerical solution. Nonetheless, we have to be careful introducing more variables, because it lets the solution squiggle off in another direction, losing any advantage over modeling the surface as planar.

  • 00:25:00 In this section, the speaker discusses the issues with multi-scale implementation in optical flow. Despite the successful implementation, he mentions that the accuracy of the results decreases as the motion in the image gets larger. One way to handle this issue is to work with smaller images, which reduces the motion per frame. The speaker also discusses the benefit of multi-scale averaging, which involves working with smaller and smaller versions of the image to handle large motions (a block-averaging pyramid sketch appears after this list). The amount of work increases with the number of levels, but each level is only a quarter the size of the one before, so the added computational effort is modest. The speaker emphasizes that the process of multi-scale optimization is more complicated than the simple two-by-two block averaging that was used in the previous lecture.

  • 00:30:00 In this section of the lecture, the speaker discusses how working at multiple scales can greatly improve the results of optical flow computations. He explains that subsampling should be done after low-pass filtering to prevent aliasing, and while one could subsample by a less aggressive factor, such as the square root of 2, it is often ignored in favor of the simpler two-by-two block averaging method. The speaker also mentions several interesting applications of optical flow, such as using time to contact to prevent airplane accidents and to improve spacecraft landing on Jupiter's moon, Europa. He explains how a control system can use time to contact measurements to change rocket engine acceleration and bring down a spacecraft more reliably.

  • 00:35:00 In this section, the lecture discusses a simple system for maintaining a constant time to contact during descent, which can be used in various autonomous situations, such as cars or spacecraft. The basic idea is to adjust the force applied by the engine based on whether the measured time to contact is shorter or longer than desired, in order to keep it constant (a toy simulation of this controller appears after this list). This method does not depend on any specific texture or calibration, but rather relies simply on the ratio between height and speed. The equation for this system can be solved as an ordinary differential equation, whose solution has the descent speed proportional to the height z.

  • 00:40:00 In this section, the lecturer discusses a constant time-to-contact control system and compares it with a more traditional approach for landing in planetary spacecraft. The constant time-to-contact control system is advantageous as it is more energy-efficient since it constantly keeps the time to contact constant and does not require detailed knowledge about the distance to the surface and the velocity. The lecturer shows the computations of the time to contact under constant acceleration and emphasizes that the time to contact is always half of what is observed using a constant height strategy.

  • 00:45:00 In this section, the lecturer discusses the concept of constant acceleration control and how it compares to traditional approaches for estimating distance and velocities. He then introduces the generalization of optical flow, which is called fixed flow, and explains that it assumes that the motion of all parts of the image is the same. However, in cases where there are independent motions or a small number of unknowns, the system can be over-determined. He also discusses the ill-posed problem of under-constrained systems and how a heavy constraint can be used to solve it.

  • 00:50:00 In this section, the lecturer discusses how neighboring points in an image do not move independently, but rather tend to move at similar velocities, which creates constraints for optical flow. However, this constraint is not a straightforward equation and requires more precise tools to solve. If these tools are not available, the image can be divided into smaller pieces where the assumption of constant velocity in that area is less significant. But this division also creates trade-offs between the resolution and uniformity of brightness in those areas. The lecture also touches on the idea of vanishing points and how they can be used for camera calibration or determining the relative orientation of two coordinate systems.

  • 00:55:00 In this section of the lecture, the professor discusses the projection of a line and how it can be defined in various ways, including algebraically and geometrically. He explains that a line in 3D can be defined by a point and a direction using a unit vector, and that different points on the line have different values of s. The professor goes on to explain how this can be projected into the image using perspective projection, resulting in a messy equation with variables x, y, and z. However, by making s very large, the equation can be simplified and the effects of camera calibration and imaging systems can be studied.

  • 01:00:00 In this section, the speaker talks about vanishing points, which result from lines that converge to a point in the image plane. These vanishing points can be used to learn something about the geometry of the image, which can be applied in real-life scenarios such as warning police officers, construction workers, and other people who may be in danger due to an oncoming car. The camera can determine the rotation of its camera-centric coordinate system relative to the road by finding a vanishing point. Parallel lines have the same vanishing point, meaning that if there is a series of parallel lines that form a rectangular shape, three vanishing points are expected.

  • 01:05:00 In this section, the lecturer discusses two applications of image processing: finding the vanishing points to recover the transformation parameters for camera calibration, and using calibration objects with known shapes to determine the position of a point in the camera-centric system. The lecturer explains that finding the vanishing points enables the recovery of the camera's pan and tilt relative to the direction of the road and the horizon. The lecture also covers the need to recover the position of the lens above the image plane and the height of the center projection for accurate camera calibration. The lecturer suggests using a calibration object with a known shape, such as a sphere, to determine the position of a point in the camera-centric system.

  • 01:10:00 In this section, the lecturer discusses the advantages and disadvantages of using different shapes as calibration objects for optical flow algorithms. While spheres are relatively easy to make and obtain, they can be noisy and not very accurate when projecting them into the image plane. On the other hand, cubes have significant advantages due to their right angles and parallel lines, which correspond to the vanishing points. The lecturer explains how finding the vanishing points could help determine the image projections of three vectors pointing in 3D along the lines. This information can be used to calibrate optical flow algorithms more accurately.

  • 01:15:00 In this section, the speaker talks about finding the unknown center of projection, P, by using a calibration object such as a cube and three vectors: A, B, and C. The three vectors are at right angles to each other, which helps to create three equations that solve for the three unknowns of P. However, the second-order terms in the quadratic equations make it possible to have multiple solutions, which is where Bézout's theorem comes in. Using the theorem, the speaker shows that the maximum number of solutions is the product of the orders of the equations. To simplify the equations, the speaker subtracts them pairwise, leading to three linear equations that can be used to find the unknowns.

  • 01:20:00 In this section, we learn that while there are three linear equations, they are not linearly independent: the planes they define intersect in a line, and the third plane contains that line, so it provides no additional information. Combining the line with one of the original quadratic constraints leaves just two solutions. This technique is helpful for calibrating a camera and finding the position of the center of projection. However, real cameras have radial distortion parameters that need to be taken into account for real robotics camera calibration.
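
The size-based time-to-contact estimate from the 00:10:00 and 00:15:00 entries reduces to one line: if the object's true size is fixed, image size s and distance Z satisfy s Z = constant, so TTC = Z / (-dZ/dt) = s / (ds/dt). A tiny sketch with made-up numbers:

```python
def ttc_from_size(size_prev, size_curr, dt=1.0):
    """Time to contact from the fractional growth of an object's image.

    If the object's true size is constant, image size s and distance Z
    satisfy s * Z = const, so Z / (-dZ/dt) = s / (ds/dt).  The result
    is in the same units as dt.
    """
    ds_dt = (size_curr - size_prev) / dt
    if ds_dt <= 0:
        raise ValueError("image is not growing; object is not approaching")
    return size_curr / ds_dt

# An image of a car grows from 100 to 102 pixels in one frame (1/30 s):
print(ttc_from_size(100.0, 102.0, dt=1.0 / 30.0))   # 1.7 seconds to contact
```
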
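
The multi-scale handling of large motions described in the 00:25:00 and 00:30:00 entries rests on repeatedly halving the image by 2x2 block averaging: a motion of 8 pixels at full resolution is only 1 pixel three levels up, where the linearized constraint applies. A minimal sketch (the block average doubles as a crude low-pass filter; a careful implementation would smooth more before subsampling):

```python
import numpy as np

def block_average(image):
    """Halve the resolution by averaging disjoint 2x2 blocks."""
    rows, cols = image.shape
    image = image[:rows - rows % 2, :cols - cols % 2]   # crop to even size
    return 0.25 * (image[0::2, 0::2] + image[1::2, 0::2] +
                   image[0::2, 1::2] + image[1::2, 1::2])

def pyramid(image, levels=4):
    """List of images, each half the resolution of the one before."""
    out = [image]
    for _ in range(levels - 1):
        out.append(block_average(out[-1]))
    return out

shapes = [im.shape for im in pyramid(np.zeros((256, 256)))]
print(shapes)   # [(256, 256), (128, 128), (64, 64), (32, 32)]
```
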
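
The constant time-to-contact landing idea from the 00:35:00 and 00:40:00 entries can be illustrated with a toy one-dimensional simulation. The controller below only ever looks at the measured TTC = z / (-dz/dt); the proportional gain, the gravity value, and the stopping threshold are arbitrary illustrative choices, not parameters from the lecture.

```python
def simulate_constant_ttc_descent(z0=1000.0, v0=-50.0, ttc_ref=20.0,
                                  gain=2.0, g=1.62, dt=0.05):
    """Toy 1D lander that adjusts thrust to hold time to contact constant.

    The control law only uses the measured TTC = z / (-dz/dt); it never
    needs the height or the velocity separately.  All parameter values
    are illustrative.
    """
    z, v, t = z0, v0, 0.0
    while z > 0.5 and t < 600.0:
        ttc = z / -v if v < 0 else float("inf")
        # Throttle up when contact would come sooner than desired.
        thrust = max(0.0, g + gain * (ttc_ref - ttc))
        v += (thrust - g) * dt
        z += v * dt
        t += dt
    return t, v

t_down, v_down = simulate_constant_ttc_descent()
print(f"touchdown after {t_down:.0f} s at {v_down:.3f} m/s")  # a soft landing
```
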
Source: MIT 6.801 Machine Vision, Fall 2020. Instructor: Berthold Horn. Complete course: https://ocw.mit.edu/6-801F20

Lecture 5: TCC and FOR MontiVision Demos, Vanishing Point, Use of VPs in Camera Calibration

The lecture covers various topics related to camera calibration, including the use of vanishing points in perspective projection, triangulation to find the center of projection and principal point in image calibration, and the use of orthonormal matrices to represent rotation. The lecturer also explains the mathematics of finding the focal length of a camera and how to use vanishing points to determine the orientation of a camera relative to a world coordinate system. Additionally, the TCC and FOR MontiVision demos are discussed, along with the importance of understanding the geometry behind equations in solving problems.

The lecture covers various topics related to computer vision, including the influence of illumination on surface brightness, how matte surfaces can be measured using two different light source positions, and the use of albedo to solve for the unit vector. The lecture also discusses the vanishing point in camera calibration and a simple method to measure brightness using three independent light source directions. Lastly, the speaker touches on orthographic projection as an alternative to perspective projection and the conditions necessary for using it in surface reconstruction.

  • 00:00:00 In this section, the speaker demonstrates the use of the TCC and FOR MontiVision demos on a webcam pointed at a keyboard. They discuss the importance of time-to-contact calculations and the factors that affect those calculations. The speaker also discusses the concept of vanishing points in perspective projection and how they can be used in camera calibration. They explain the equation for time-to-contact calculations and how the sign of dz/dt affects the image of moving objects.

  • 00:05:00 In this section, the lecturer discusses the concept of a vanishing point in camera calibration: the point where the line through the center of projection parallel to a given family of parallel lines pierces the image plane. As points on any of the parallel lines recede into the distance, their projections onto the image approach this vanishing point. This concept allows for the determination of relationships between coordinate systems and camera calibration, which is useful for object recognition in computer vision applications. The lecturer provides an example of a world of rectangular objects with sets of parallel lines that define a coordinate system, which can be projected onto the image plane for calibration.

  • 00:10:00 In this section, the speaker talks about vanishing points and their use in camera calibration. The speaker explains that there are three vanishing points which can be determined accurately by extending parallel lines, and these points can be used to find the center of projection. The center of projection is where the relationship between the coordinate system in the object and the coordinate system in the image plane is established. By connecting the center of projection to the vanishing points in the image plane, three vectors can be created, and these vectors can be used to find the point where the directions to the vanishing points are right angles to each other. The speaker notes that the locus of all the places you could be from which the vanishing points will be at right angles to each other is a circle.

  • 00:15:00 In this section, the lecturer discusses the 3D version of this camera-calibration construction. He explains that the constraint on the position of the center of projection is that it lies on a sphere, and how to use spheres to narrow down the possibilities for the center of projection. The lecturer then discusses linear equations and straight lines, as well as parameterizing straight lines through theta and rho. The parameterization is useful as it avoids singularities and provides a two-degree-of-freedom space for lines.

  • 00:20:00 In this section, the lecturer discusses the representation of planes in three dimensions using linear equations with three unknowns. He explains that there are actually only three degrees of freedom, rather than four, due to a scale factor. This duality means that there is a mapping between planes and points in 3D, similarly to the mapping between lines and points in 2D. The lecturer then introduces the problem of camera calibration, comparing it to the problem of multilateration in robotics, which involves intersecting three spheres.

  • 00:25:00 In this section, the speaker explains how to solve for the common intersection point of three spheres in 3D space. Each sphere is defined by an equation with second-order terms, so the system of quadratics could in principle admit up to eight possible solutions. However, by subtracting one sphere's equation from another's, a linear equation can be obtained instead. Repeating this process for all sphere pairs yields three linear equations in three unknowns, which can then be solved. While this seems like a perfect solution, it is important to note that the matrix created by this method is often singular, and therefore non-unique in its solution.

  • 00:30:00 In this section, the speaker discusses the issue of manipulating equations and losing important information in the process. He explains that while it is perfectly fine to derive new equations, one must be careful not to throw away the original equations as they may still contain crucial information needed to solve the problem. He demonstrates this using the example of linear and quadratic equations, and how some equations can be thrown away while others must be kept in order to get the desired number of solutions. The speaker also highlights the importance of understanding the geometry behind the equations, as it can provide valuable insights that may not be immediately evident just from the algebra.

  • 00:35:00 In this section of the transcript, the speaker discusses triangulation and how to find the center of projection and the principal point in image calibration. They explain that the center of projection can be found using three known points which yields three planes, and the center can be found at their intersection. To find the principal point, they drop the perpendicular from the center of projection into the image plane. They also discuss the vanishing points which can be used to detect if an image has been modified or cropped.

  • 00:40:00 In this section, the lecturer discusses the use of vanishing points in photogrammetry and camera calibration. He explains how vanishing points can be used to determine the authenticity of images and explores the various hoaxes related to exploration. He then delves into the mathematics of finding the third component of a vector and solving a quadratic equation to determine focal length. He goes on to explain a special case where focal length can be determined without the need for solving a quadratic equation. The video is part of a lecture series on the technical aspects of computer vision.

  • 00:45:00 In this section, the speaker discusses the application of vanishing points in camera calibration specifically for determining the orientation of a camera relative to a world coordinate system. The speaker explains that by identifying features such as the curb and road markings in the image, which are supposedly parallel, they can produce a vanishing point that can be recognized in the image. The speaker also explains that in the ideal case where all three vanishing points are available, the edges of the rectangular object being captured by the camera can be used to define the x and y axes and subsequently determine the rotation between the camera coordinate system and the world coordinate system.

  • 00:50:00 In this section, the speaker explains the process of finding the unit vectors of the object coordinate system measured in the camera coordinate system. The unit vectors must be at right angles to each other and are then assembled into the transformation matrix (a sketch of this construction appears after this list). The transformation matrix represents the orientation of one coordinate system relative to the other, and the speaker says that they will be doing more of this in the future.

  • 00:55:00 In this section, the lecturer discusses the concept of an orthonormal matrix, in which the rows are perpendicular to each other and each row has unit magnitude; such a matrix represents a rotation. By determining the direction of the coordinate axes in the object, it is relatively easy to go back and forth between the two coordinate systems, which is particularly useful for camera calibration. Finally, the lecture touches on the concept of brightness, where observed brightness depends on the material of the surface, the light source, the incident and emergent angles, and the azimuth angles.

  • 01:00:00 In this section of the video, the speaker discusses the concept of illumination and how it affects the apparent brightness of surfaces. They explain that the power that a surface gets from a light source is affected by the angle at which the surface is tilted relative to the light source direction, which can be calculated using the cosine of the angle. The speaker then introduces the idea of a matte surface, which reflects light in various directions but has the special property that it appears equally bright from any direction. They go on to discuss how to determine the orientation of such a surface by measuring its brightness with two different light source positions.

  • 01:05:00 In this section, the speaker discusses the non-linearity involved in solving for n, which is a unit vector. By using measurements of brightness, cosine theta i can be estimated, and the cone of possible directions of the surface normal can be determined. If two separate measurements are taken, two cones of directions are created, and only the intersection of those cones, consisting of two possible directions, gives a normal direction. However, the constraint that it has to be a unit normal means that those two possible directions must now be intersected with a unit sphere to make a final determination. The speaker explains that by using albedo, which defines the reflectivity of a surface, a linear equation problem can be created to determine how bright something is in the image plane. The albedo value ranges from zero to one and indicates how much of the energy going into an object is reflected back versus how much is absorbed and lost.

  • 01:10:00 In this section, the lecture sets up the three-source photometric measurement. It introduces a three-vector that bundles the unknowns (the albedo times the unit normal) and recovers both by multiplying the brightness measurements by the inverse of the matrix whose rows are the light-source directions (a sketch of this solve appears after this list). However, this method is limited when the light sources are coplanar, meaning they lie in the same plane, or if two rows of the matrix are the same, in which case it is impossible to invert the matrix. The lecture also notes the implications of these constraints for astronomers, as they need to ensure light sources are not in the same plane.

  • 01:15:00 In this section, the speaker discusses a simple method to measure brightness using three independent light source directions, which can be pre-computed and efficiently implemented. It is suggested that exploiting the three sets of sensors in a camera (RGB) can be useful for this purpose. A lookup table can be built to calibrate surfaces based on the known shape of a sphere and its surface orientation can be calculated to measure brightness in three images. However, real surfaces do not follow this simple rule and a lookup table can be used to invert the numerical values for surface orientation. Lastly, the speaker touches on orthographic projection as an alternative to perspective projection.

  • 01:20:00 In this section, the speaker explains the conditions necessary for using orthographic projection in reconstructing surfaces from images. He shares that the assumption is based on the range in depth being very small compared to the depth itself, allowing for the constant magnification which is required for this projection. The orthographic projection is used for simplification in the process of reconstructing surfaces from images.
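
The construction in the 00:45:00 through 00:55:00 entries, turning vanishing points into a rotation matrix, can be sketched as follows. Assuming the principal point and focal length f are known, the 3D direction toward a vanishing point at image position (x, y) is proportional to (x, y, f); the directions toward the vanishing points of two perpendicular edge families, plus their cross product, form an orthonormal rotation matrix. All numbers below are made up for illustration.

```python
import numpy as np

def rotation_from_vanishing_points(vp_x, vp_y, f):
    """Rotation between an object's edge directions and the camera frame.

    vp_x and vp_y are image coordinates (relative to the principal
    point) of the vanishing points of two perpendicular edge families;
    f is the focal length in the same units.  The direction toward a
    vanishing point at (x, y) is proportional to (x, y, f).
    """
    a = np.array([vp_x[0], vp_x[1], f], dtype=float)
    b = np.array([vp_y[0], vp_y[1], f], dtype=float)
    a /= np.linalg.norm(a)
    b -= a * np.dot(a, b)          # absorb measurement noise: re-orthogonalize
    b /= np.linalg.norm(b)
    c = np.cross(a, b)             # completes the right-handed triad
    # Columns are the object's x, y, z axes expressed in camera coordinates.
    return np.column_stack([a, b, c])

# Two vanishing points of a box's horizontal edges, assumed focal length 800 px:
R = rotation_from_vanishing_points((650.0, 40.0), (-980.0, 55.0), 800.0)
print(np.round(R.T @ R, 6))        # identity: R is orthonormal
```
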
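
The three-source scheme from the 01:05:00 through 01:15:00 entries becomes a single linear solve once the unknowns are bundled as m = albedo * n, using the Lambertian model E_i = albedo * (s_i . n). The sketch below assumes known, non-coplanar unit light-source directions; all values are illustrative.

```python
import numpy as np

def photometric_stereo(brightness, light_dirs):
    """Albedo and unit normal from three brightness measurements.

    Lambertian model: E_i = albedo * dot(s_i, n).  `light_dirs` holds
    the three unit source directions as rows and must be non-singular,
    i.e. the sources must not be coplanar.
    """
    m = np.linalg.solve(np.asarray(light_dirs, float),
                        np.asarray(brightness, float))
    albedo = np.linalg.norm(m)
    return albedo, m / albedo

# Three non-coplanar unit light directions (illustrative):
s = np.array([[0.0, 0.0, 1.0],
              [0.8, 0.0, 0.6],
              [0.0, 0.8, 0.6]])
n_true = np.array([0.3, -0.2, 1.0])
n_true /= np.linalg.norm(n_true)
e = 0.75 * s @ n_true              # synthetic measurements, albedo 0.75
print(photometric_stereo(e, s))    # recovers 0.75 and n_true
```
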
Source: MIT 6.801 Machine Vision, Fall 2020. Instructor: Berthold Horn. Complete course: https://ocw.mit.edu/6-801F20

Lecture 6: Photometric Stereo, Noise Gain, Error Amplification, Eigenvalues and Eigenvectors Review

Throughout the lecture, the speaker explains the concepts of noise gain, eigenvalues, and eigenvectors when solving systems of linear equations in photometric stereo. The lecture discusses the conditions for singular matrices, the relevance of eigenvalues in error analysis, and the importance of linear independence in avoiding singular matrices. The lecture concludes with a discussion of Lambert's Law and surface orientation, and highlights the need to represent surfaces using a unit normal vector or points on a unit sphere. Overall, the lecture provides insight into the mathematical principles underlying photometric stereo and highlights the challenges of accurately recovering the topography of the moon from earth measurements.

In Lecture 6, the speaker discusses how to use the unit normal vector and the gradients of a surface to find surface orientation and plot brightness as a function of surface orientation. They explain how to use the p-q parameterization to map possible surface orientations and show how this plane of slopes (gradient space) can be used to plot brightness at different orientations. The speaker also discusses how to rewrite the dot product of the unit vector toward the light source and the unit normal vector in terms of the gradients, in order to find the curves in pq space where that quantity is constant. The lecture ends with an explanation of how cones created by spinning the line to the light source can be used to find conic sections of different shapes.

  • 00:00:00 In this section of the video, the lecturer discusses noise gain in the 1D case, where there is one unknown and one measurement, and explains that if the curve has low slope, a small error in the measurement can be amplified into a large error in the result. Moving on to the 2D case, the discussion shifts to eigenvectors and eigenvalues, which are characteristic of a matrix: an eigenvector is a vector that, when multiplied by the matrix, yields a vector pointing in the same direction. The lecturer provides details on how to find these vectors and how many there are, noting that the scale of the vectors doesn't matter and that there can be more than one eigenvector.

  • 00:05:00 In this section, the speaker discusses the concept of a singular matrix and its relevance in solving systems of linear equations. A singular matrix is one in which the determinant is zero. For an n-by-n real symmetric matrix, the determinant of the matrix minus lambda times the identity is an nth-order polynomial in lambda, with n roots. This means that in the case of a homogeneous set of equations, there are multiple solutions, rather than a unique solution, if the determinant is zero. This is important when dealing with multi-dimensional problems such as motion recovery for an optical mouse, where the error in certain directions may be different from the error in other directions. Thus, a more nuanced picture is needed beyond just identifying a small determinant as problematic.

  • 00:10:00 In this section of the lecture, the speaker discusses homogeneous equations and their interesting properties, including the condition for a set of homogeneous equations to have a non-trivial solution. The determinant of the matrix is also discussed, as well as the eigenvalues and eigenvectors. The eigenvectors will be special directions in which the property of the eigenvalues holds, and they will be orthogonal. The eigenvalues will determine how much the error will be amplified, which is important for measuring error in practice. Though finding eigenvalues and eigenvectors for large matrices is often done using software, it is useful to understand the process at a basic level.

  • 00:15:00 In this section, the speaker discusses eigenvectors and eigenvalues in solving homogeneous equations for a 2x2 case. To find the eigenvectors, the speaker shows that the solutions must be perpendicular to the rows of the matrix. For each of the two values of lambda, the vectors obtained from the two rows point in the same direction, and they can be normalized to give unit eigenvectors. The technique extends to an n-by-n matrix, which provides n eigenvectors and corresponding eigenvalues with which to discuss error amplification.
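
As a small numerical illustration of this 2x2 case (not from the lecture; the matrix and error values below are made up), NumPy's symmetric eigendecomposition shows how the component of a measurement error along each eigenvector is amplified by one over the corresponding eigenvalue:

```python
import numpy as np

# A made-up, nearly singular 2x2 real symmetric matrix.
A = np.array([[4.00, 2.00],
              [2.00, 1.01]])

# Columns of V are orthonormal eigenvectors; lam holds the eigenvalues (ascending).
lam, V = np.linalg.eigh(A)
print("eigenvalues:", lam)          # one eigenvalue is tiny -> nearly singular direction

# Error amplification: solving A x = b maps an error e in b to A^{-1} e,
# whose component along eigenvector v_i is scaled by 1 / lambda_i.
e = np.array([0.01, 0.01])          # small, made-up measurement error
x_err = np.linalg.solve(A, e)
for i in range(2):
    print(f"gain along eigenvector {i}: {1.0 / lam[i]:.1f}")
print("resulting error in the solution:", x_err)
```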

  • 00:20:00 In this section, the lecturer explains how to extend the dot product notation to matrices and shows that if the eigenvalues are all different, then all of the eigenvectors are orthogonal. He also mentions that if some of the roots are the same, this doesn't force the eigenvectors to be orthogonal, but he can select two out of all possible eigenvectors that are orthogonal to each other. This helps in constructing a basis for the vector space. The lecturer also talks about how to think of vectors as column vectors or skinny matrices and shows how the dot product can be written in both ways.

  • 00:25:00 In this section, the lecturer discusses eigenvectors and how any vector can be re-expressed in terms of them. By expanding an arbitrary measurement vector along the eigenvectors and multiplying by the matrix, one sees that different components are magnified by different amounts along those special directions. This is known as the error gain. However, since we are dealing with inverse problems where the inverse matrix is used, the lecturer introduces the dyadic (outer) product of vectors to apply the idea.

  • 00:30:00 In this section, the speaker talks about eigenvectors and eigenvalues, and how they can be used to rewrite a matrix in various ways. They explain that the terms in this expansion depend on the eigenvalues, but the eigenvectors themselves do not, so they can be factored out. They go on to discuss how this approach can be used to check the properties of the eigenvalues, and why this is important in solving a vision problem. Specifically, they explain that the matrix used to solve this problem often multiplies components of the signal by 1 over lambda i, so if lambda i is small, it can create an ill-posed problem that is not stable.

  • 00:35:00 In this section, the lecturer discusses eigenvectors and eigenvalues in the context of error analysis. He explains that if one of the eigenvectors has a small eigenvalue, even a small error in measurement can result in a large change in the result. The direction of the isophote corresponds to the eigenvector with a small eigenvalue, making it difficult to detect accurate motion, whereas the gradient direction is more forgiving. The lecturer then moves on to discuss photometric stereo, a technique for recovering surface orientation by taking multiple pictures of an object under different lighting conditions. He explains that the albedo parameter is used to describe how much light the surface reflects and that it can help constrain surface orientation.

  • 00:40:00 In this section, the lecturer explains the process of using three different light sources to obtain three measurements, so that a problem with three unknowns and three measurements results. This allows the orientation of the surface to be disambiguated using linear equation solving, giving a simple and cheap way to compute the solution, as sketched below. The lecturer notes that the two-solution ambiguity arises from a quadratic, which can be avoided by folding the albedo into the unit normal so that the unknown becomes an arbitrary 3-vector and the equations become linear. Additionally, the video mentions the importance of linearly independent rows to avoid singular matrices.
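
A minimal sketch of that linear formulation (a Lambertian surface with the albedo folded into the normal is assumed; the source directions and brightness values below are made up for illustration):

```python
import numpy as np

# Rows are the three known light-source directions; they must not be coplanar,
# otherwise the matrix is singular and the method fails.
S = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.8],
              [0.0, 0.6, 0.8]])

E = np.array([0.90, 0.75, 0.60])    # brightness measured under each source (made up)

m = np.linalg.solve(S, E)           # solve S (rho * n) = E: linear, no quadratic ambiguity
rho = np.linalg.norm(m)             # albedo
n = m / rho                         # unit surface normal
print("albedo:", rho, "unit normal:", n)
```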

  • 00:45:00 In this section of the lecture, photometric stereo, error amplification, and eigenvalues and eigenvectors are discussed. The redundancy of measurements when the sum of the light sources is zero is explored, and it is shown that if three vectors in three-dimensional space are coplanar, then the method will fail. However, if they are not coplanar and are placed at right angles to each other, the results will be more reliable. The lecture also references the use of photometric stereo to create topographic maps of the moon based on different illuminations from the sun.

  • 00:50:00 In this section of the lecture, the professor discusses the challenges of trying to obtain the topography of the moon from earth measurements. Although it is possible to take measurements at different positions in the moon's orbit, this method does not work because the vectors are nearly coplanar. The professor also talks about the lambertian assumption, which assumes that an object has a perfectly diffuse and uniform reflectance, but notes that it is not the case with the surface of the moon. However, this assumption is useful for comparing two illumination intensities, which can be achieved by illuminating one side with one source and the other side with another source and then balancing it so that the two sides appear equally bright when looked at from the same angle.

  • 00:55:00 In this section of the lecture, the professor discusses the experiments conducted by Lambert which led to the discovery of Lambert's Law, which explains how surfaces reflect light when illuminated from different angles. The law states that the brightness is proportional to the cosine of the incident angle. The discussion also highlights the need to talk about surface orientation and how it can be represented using a unit normal vector or by points on a unit sphere. The professor mentions that this phenomenological model is a postulated behavior and not an exact representation of real surfaces. The section ends by introducing a Taylor series expansion.
  • 01:00:00 In this section of the video, the speaker discusses the relationship between the unit normal notation and the gradient notation in computational problems. They explain how to switch back and forth between the two notations and give examples of how this is helpful for solving problems in different domains, such as Cartesian coordinates and polar coordinates. The speaker also shows how to find tangents in a surface and explains how to use the direction of those tangents to find the relation between the unit normal and p and q, which represent the gradients on the surface.

  • 01:05:00 In this section, the lecturer discusses how to map all possible surface orientations using the unit normal vector of the surface, and how this information is useful for machine vision. The cross product of two tangent vectors lying in the surface gives the direction of the unit normal vector, which can then be normalized to get the direction of the surface. By projecting the surface orientations into a 2D plane using the p-q parameterization, one can visualize all possible surface orientations. Points on this plane correspond to different p and q values and therefore different surface orientations, including the floor and any surface above the floor with the same orientation. The lecturer notes that although machine vision can recover surface orientation, patching together these orientations to make a complete surface is a separate, but over-determined problem.
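
A small sketch of the conversion described here, going from the gradient (p, q) to the unit normal via the cross product of the two surface tangents and back again (the sample values are arbitrary):

```python
import numpy as np

def normal_from_gradient(p, q):
    # Tangents (1, 0, p) and (0, 1, q) lie in the surface z(x, y);
    # their cross product (-p, -q, 1) points along the normal.
    n = np.array([-p, -q, 1.0])
    return n / np.linalg.norm(n)

def gradient_from_normal(n):
    # Inverse relation, valid as long as the z-component of n is nonzero.
    return -n[0] / n[2], -n[1] / n[2]

n = normal_from_gradient(0.5, -0.25)
print("unit normal:", n)
print("recovered (p, q):", gradient_from_normal(n))
```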

  • 01:10:00 In this section of the video, the speaker explains how a slope plane can be used as a tool to plot brightness as a function of surface orientation in machine vision. Each point on the plane corresponds to a particular surface orientation, and the brightness values can be determined experimentally from a patch of material at different angles of orientation. However, a single measurement of brightness can't recover two unknowns, and multiple measurements are needed to pin down the orientation of the surface element. This concept is then related to photometric stereo and the Lambertian surface, wherein brightness is proportional to the cosine of the incident angle, and isophotes are looked for in the slope plane.

  • 01:15:00 In this section, the speaker discusses rewriting the direction to the light source so that the same transformation performed on the unit normal n can also be applied to the source direction. This introduces a point (ps, qs) in the plane where the incident light rays are parallel to the surface normal, which is the orientation of maximum brightness for a Lambertian surface. By rewriting n dot s in this form, the curves in pq space where that quantity is constant can be determined. After multiplying it all out, what remains is a second-order equation in p and q, which corresponds to a conic section; examples given are the parabola and the ellipse.
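
For reference, the quantity being rewritten here is the Lambertian reflectance map in gradient space; setting it equal to a constant and multiplying out yields the second-order equation mentioned above (this is the standard form, with (p_s, q_s) the source direction in gradient space):

```latex
R(p,q) = \hat{n}\cdot\hat{s}
       = \frac{1 + p\,p_s + q\,q_s}
              {\sqrt{1 + p^2 + q^2}\;\sqrt{1 + p_s^2 + q_s^2}},
\qquad
R(p,q) = c \;\Rightarrow\;
(1 + p\,p_s + q\,q_s)^2 = c^2\,(1 + p^2 + q^2)(1 + p_s^2 + q_s^2)
```

The right-hand equation is second order in p and q, i.e. a conic section (ellipse, parabola, or hyperbola, depending on c and the source direction).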

  • 01:20:00 In this section, the speaker discusses a diagram that can be used for graphics, where a surface is plotted along with a diagram containing a set of isophotes for various types of surfaces, including parabolas, ellipses, circles, lines, points, and hyperbolas. The brightness of the surface is read from the diagram and used as a gray level or color in the plotted image. The unit normal can be obtained from the surface and used to determine the corresponding point among the isophotes. The diagram changes when the light source is moved, so the point of intersection of two sets of isophotes must be determined to get a unique solution. Three light sources are used instead of two, since two light sources may yield a finite set of solutions rather than a single one.

  • 01:25:00 In this section, the speaker explains how the line to the light source can be spun around at different angles to create nested cones. These cones can be cut by a plane, resulting in conic sections that are not always ellipses, but also hyperbolas and even parabolas. The speaker also clarifies that cosine theta cannot be negative in practice and leaves the question of where the curve turns from being a closed curve to an open one as a puzzle for future homework problems. The lecture concludes with a reminder to sign up on Piazza for homework and announcement updates.

Lecture 7: Gradient Space, Reflectance Map, Image Irradiance Equation, Gnomonic Projection



Lecture 7: Gradient Space, Reflectance Map, Image Irradiance Equation, Gnomonic Projection

This lecture discusses gradient space, reflectance maps, and image irradiance equations. The lecturer explains how to use a reflectance map to determine surface orientation and brightness for graphics applications, and how to create a numerical mapping from surface orientation to brightness using three pictures taken under different lighting conditions. They also introduce the concept of irradiance and its relationship to intensity and radiance, as well as the importance of using a finite aperture when measuring brightness. Additionally, the lecture touches on the three rules of how light behaves after passing through a lens, the concept of foreshortening, and how the lens focuses rays to determine how much of the light from a patch on the surface is concentrated into the image.

In this lecture, the speaker explains the equation for determining the total power delivered to a small area in an image, which takes into account solid angles and cosine theta. They relate this equation to the f-stop in cameras and how aperture size controls the amount of light received. The speaker also discusses image irradiance, which is proportional to the radiance of objects in the real world, and how brightness drops off as we go off-axis. They move on to discuss the bi-directional reflectance distribution function, which determines how bright a surface will appear depending on the incident and emitted direction. The lecturer explains that reflectance can be measured using a goniometer and that realistically modeling how an object reflects light is important. They also explain the concept of the Helmholtz reciprocity for the bi-directional reflectance distribution function. The lecture then moves on to discuss applying gradient space to surface material models and reminds students to keep updated on homework information.

  • 00:00:00 In this section, the concept of gradient space is introduced to explore what determines brightness in an image. Brightness usually depends on illumination and geometry, such as surface orientation, so the orientation of the surface patch must be specified when determining brightness. Mention is also made of unit normals, and of p and q, which are convenient shorthands for the slopes of the surface. The brightness of a Lambertian surface depends on the orientation of the surface relative to the light source. Many matte surfaces are reasonable approximations of a Lambertian surface, and such approximations can be handy; however, many situations, such as astronomical surfaces and microscopy, are not well served by this approximation.

  • 00:05:00 In this section of the lecture, the speaker discusses the concept of the reflectance map, a diagram that shows how bright a surface is supposed to look based on its orientation. This diagram can be used to determine surface orientation and brightness for graphics applications. The speaker then goes on to explain how this concept can be extended to non-Lambertian surfaces and how to build a lookup table for determining brightness based on surface orientation. Additional information and constraints can be used to further refine the estimation of surface orientation.

  • 00:10:00 In this section, the lecturer discusses how to use a calibration object, such as a sphere, for calibration. By taking an image of a lit-up sphere and fitting a circle to its outline, one can estimate its center and radius in the image. For a sphere there is a convenient relationship: the vector from the center to a point on the surface and the unit surface normal at that point are parallel, making it easy to determine the surface orientation. This method can also be used for the Earth, with some modifications to the definition of latitude. By computing p and q using the formula from the previous lecture, one can determine n and the surface orientation for each point in the image.

  • 00:15:00 In this section, the lecture discusses the process of building a numerical mapping from surface orientation to brightness in three pictures taken under different lighting conditions. The goal is to use this information to calculate the surface orientation when later taking three images of an object under the same lighting conditions. The lecturer explains the implementation of this process, which involves creating a three-dimensional array in the computer where each box has p and q values. The images are then quantized to discrete intervals and used to put information into the array. The lecture also addresses issues such as quantization effects and empty cells that may never get filled in.
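
A rough sketch of the lookup-table construction described here, assuming the three calibration images and the per-pixel (p, q) values from the sphere are already available as NumPy arrays (the bin count, value range, and function names are illustrative choices, not the lecture's):

```python
import numpy as np

def build_lookup(E1, E2, E3, p, q, n_bins=64, e_max=1.0):
    """Map quantized brightness triples (E1, E2, E3) to surface orientation (p, q)."""
    table = np.full((n_bins, n_bins, n_bins, 2), np.nan)    # empty cells stay NaN
    def idx(E):
        return np.clip((E / e_max * (n_bins - 1)).astype(int), 0, n_bins - 1)
    i1, i2, i3 = idx(E1), idx(E2), idx(E3)
    table[i1, i2, i3, 0] = p     # a real implementation might average duplicate hits
    table[i1, i2, i3, 1] = q     # and interpolate to fill cells that never get hit
    return table

def look_up(table, e1, e2, e3, n_bins=64, e_max=1.0):
    def i(E):
        return min(int(E / e_max * (n_bins - 1)), n_bins - 1)
    return table[i(e1), i(e2), i(e3)]    # (p, q), or NaN if the cell was never filled
```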

  • 00:20:00 In this section, the speaker explains gradient space, a 2D space that is mapped into a 3D space of measurements without actually filling that space: the measurements form a surface in that space, and points on the surface can be addressed using p and q. When going from two images to three, the albedo factor rho is introduced, which scales e1, e2, and e3 linearly. Calibration objects are painted white and measured, which defines the surface for rho equal to one; for other values of rho, the cube can be filled in to generate other surfaces. The resulting lookup table, whose entries are p, q, and rho, is a 3D-to-3D lookup table. If something goes wrong, it shows up as a value of the albedo rho other than one, indicating an error or an unexpected blockage of one of the three light sources. This helps in recognizing cast shadows or interreflections from surfaces that are too close together (as with overlapping donut shapes), and in segmenting and breaking the image down into parts.

  • 00:25:00 In this section of the lecture, the speaker discusses ways to segment cast shadows and areas of high reflection using gradient space and reflectance maps. There is a methodical way of filling in table values with corresponding voxel values. The speaker also introduces the concept of irradiance, which is the power per unit area of a light source hitting a surface. This concept is not very useful in the context of image processing, as we are not exposing the sensor directly to the illumination. The speaker explains that there is terminology for the quantity of emitted power divided by area, but it is useless for image processing.

  • 00:30:00 In this section, the speaker explains the concept of intensity and what it means to measure how much radiation is going in a certain direction from a point source. The solid angle is defined to normalize the measurement; its units are steradians, the 3D analogue of radians. The solid angle allows a set of directions of any shape to be measured, and the full set of directions around a point is four pi steradians. Additionally, the speaker touches on the importance of accounting for foreshortening when a surface patch is inclined relative to the direction to the center of the sphere, such as when the lens of a camera is tilted relative to an off-center subject.

  • 00:35:00 In this section of the video, the concept of intensity and radiance are explained. Intensity is defined as power for a solid angle, while radiance is power per unit area per unit solid angle. Radiance is the more useful quantity when it comes to measuring what reaches an observer or camera from a surface. In the image plane, brightness is measured as irradiance, which is the brightness we measure in terms of the radiance of the surface.

  • 00:40:00 In this section, the lecturer discusses the relationship between measuring energy and power, and how they are proportional to each other. He also talks about the importance of using a finite aperture when measuring brightness, and the problems that arise when using the pinhole model. The lecturer introduces the ideal thin lens and its three rules, including the central ray being undeflected, and the ray from the focal center emerging parallel to the optical axis. He explains how lenses provide the same projection as the pinhole while giving a finite number of photons, and the penalty for using them at a certain focal length and distance.

  • 00:45:00 In this section, the video explains the three rules for how light behaves after passing through a thin lens. Rule number one states that a ray through the focal point, after going through the lens, emerges parallel to the optical axis. Rule number two states that a ray arriving parallel to the optical axis from the right passes through the focal point. Finally, rule number three is a combination of the first two rules. The video uses similar triangles to derive the lens formula, which relates the focal length to the object and image distances (see the formula below). Although lenses are impressive analog computers that can redirect rays of light, they cannot achieve a perfect redirection due to their physical limitations.
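
The similar-triangles argument leads to the usual thin-lens formula; writing a for the object distance, b for the image distance, and f for the focal length (symbols chosen here, not necessarily the lecture's):

```latex
\frac{1}{a} + \frac{1}{b} = \frac{1}{f}
```

As the object distance a goes to infinity, b approaches f, which is why distant scenes are focused essentially at the focal plane.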

  • 00:50:00 In this section, the video discusses how lenses deal with rays coming from various directions, and how trade-offs exist between different kinds of defects, such as radial distortion. The video also explains the concept of irradiance and object radiance, and how a diagram of a simple imaging system can be used to determine how much power is coming off an object patch and how much ends up in an image patch through illumination. Additionally, the video notes the assumption that flat image planes and lenses are used in cameras.

  • 00:55:00 In this section of the lecture, the speaker discusses how to relate the foreshortening effect of the unit vector on the surface of an object to the incident light onto the image sensor. He writes down a formula for solid angle and takes into account the foreshortening effect by multiplying by cosine alpha and dividing by f secant alpha squared. He then relates the irradiance in the image to the total energy coming off that patch and the area delta i. Finally, he talks about how the lens focuses the rays and how the solid angle that the lens occupies when viewed from the object determines how much of the light from that patch on the surface is concentrated into the image.
  • 01:00:00 In this section of the lecture, the speaker explains the equation for the total power delivered to a small area in an image, which takes into account the solid angle and cosine theta. The power per unit area is then found by dividing the total power by the area, which is what is actually measured. The speaker also relates this equation to the f-stop in cameras, which determines how open the aperture is and therefore controls the amount of light received. The aperture size is usually measured in steps of square root of 2, and the image irradiance goes inversely with the square of the f-stop.

  • 01:05:00 In this section, the speaker discusses how image irradiance, which is the brightness in the image, is proportional to the radiance of objects in the real world. The brightness of the surface radiance is proportional to the brightness in the image irradiance, making it easy for us to measure brightness in the image. However, brightness drops off as we go off-axis, represented by cosine to the fourth alpha, which must be taken into account when using a wide-angle lens. Although this effect is not very noticeable, it can be compensated for in the image processing chain. This formula justifies the idea of measuring brightness using gray levels in the image and shows that it has something to do with what is in the real world.
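
Putting the pieces of the last few sections together gives the standard image irradiance equation, where E is image irradiance, L is scene radiance, d is the aperture diameter, f is the focal length, and alpha is the off-axis angle; the cosine-to-the-fourth falloff mentioned above is the last factor:

```latex
E = L\,\frac{\pi}{4}\left(\frac{d}{f}\right)^{2}\cos^{4}\alpha
```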

  • 01:10:00 In this section, the lecturer explains the concept of bi-directional reflectance distribution function, which determines how bright a surface will appear depending on the incident and emitted direction. The lecturer reveals that the reflectance ratio is not as simple as saying white reflects all light coming in, and black reflects none of it. The lecturer also discussed the customary use of polar and azimuth angles to specify the direction of light coming in or light going out. The bi-directional reflectance distribution function is essential in determining reflectance, and it measures the power going out divided by the power going in.
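
The BRDF referred to here is usually written as the ratio of the radiance leaving toward the viewer to the irradiance arriving from the source, as a function of the incident and emitted polar and azimuth angles:

```latex
f(\theta_i, \phi_i;\, \theta_e, \phi_e)
  = \frac{\delta L(\theta_e, \phi_e)}{\delta E(\theta_i, \phi_i)}
```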

  • 01:15:00 In this section of the lecture, the speaker discusses reflectance, which is defined as how bright an object appears when viewed from a specific position divided by how much energy is being put in from the source direction. The speaker explains that reflectance can be measured using a goniometer, which is an angle measurement device that helps explore a four-dimensional space. The speaker notes that many surfaces only require the difference between two angles to accurately measure reflectance, making the process simpler for certain objects. Realistically modeling how an object reflects light is important, and measuring reflectance allows for this realistic modeling rather than just approximating with a well-known model.

  • 01:20:00 In this section, the professor discusses materials that require the full four-dimensional model to calculate their appearance, such as iridescent items with microstructures that produce color through interference, and semi-precious stones like tiger's eye, which have tightly packed microstructures on the scale of the wavelength of light. The professor also introduces the concept of Helmholtz reciprocity for the bi-directional reflectance distribution function, which states that if you interchange the incident and emitted directions, you should get the same value, making data collection easier.

  • 01:25:00 In this section, the speaker discusses a technique used by a professor during a debate. The speaker initially thought the professor was highlighting their lack of knowledge by referencing a book in German, but later realized it was just a debating technique. The lecture then moves on to discuss applying gradient space to surface material models to determine surface shade on objects such as the moon and rocky planets in our solar system. The speaker also reminds students to keep up to date on any extensions or important information regarding the homework through Piazza.

Lecture 8: Shading, Special Cases, Lunar Surface, Scanning Electron Microscope, Green's Theorem




Lecture 8: Shading, Special Cases, Lunar Surface, Scanning Electron Microscope, Green's Theorem

In this lecture, the professor covers several topics related to photometry and shading. He explains the relationship between irradiance, intensity, and radiance and how they are measured and related. The lecture also introduces the bi-directional reflectance distribution function (BRDF) to explain how illumination, surface orientation, and material determine brightness. The lecturer further discusses the properties of an ideal Lambertian surface and its implications for measuring incoming light and avoiding confusion when dealing with Helmholtz reciprocity. The lecture also covers the process of converting from the gradient to the unit normal and how it relates to the position of the light source. Finally, the lecture explains how measuring brightness can determine a surface's steepness or slope direction.

The lecture covers various topics related to optics and computer vision. The professor discusses using shape from shading techniques to obtain a profile of an object's surface to determine its shape. He then switches to discussing lenses and justifies the use of orthographic projection. The lecturer also talks about removing perspective projection in machine vision by building telecentric lenses and demonstrates various tricks to compensate for aberrations due to glass's refractive index variation with wavelengths. Finally, the speaker introduces the concept of orthographic projection, which simplifies some of the problems associated with perspective projection.

  • 00:00:00 In this section, the lecturer reviews key concepts from the previous lecture on photometry. He defines irradiance, intensity, and radiance and explains how they are measured and related. He then introduces the relationship between the radiance of a surface and the irradiance of the corresponding part of an image, which can be used to talk about brightness both out in the world and inside a camera. The lecturer explains how this relationship is affected by the aperture on the lens, which limits the solid angle and area of the image.

  • 00:05:00 In this section, the focus is on determining the radiance of a surface in relation to the amount of illumination, geometry, and material. The bi-directional reflectance distribution function (BRDF) is introduced to explain how illumination affects the orientation and material of a surface. The BRDF is a function of the incident direction and emitted direction of light, which can be calculated by computing the total output power divided by the total input power. In addition, the BRDF has to satisfy a constraint, wherein it must come out the same if the directions to the source and the viewer are interchanged. While some models of surface reflectance violate this constraint, it is not critical to human or machine vision, making it a shortcut in reducing the number of measurements needed to be taken.

  • 00:10:00 In this section of the lecture, the professor discusses the properties of an ideal Lambertian surface: it appears equally bright from every viewing direction, and an ideal Lambertian surface also reflects all the incident light. The professor explains that this simplifies the formula, since the BRDF does not depend on two of the four angular parameters. He then discusses how to deal with distributed sources, like the lights in a room, by integrating over a hemisphere of incident directions. The professor explains that we also need to integrate over all emitted directions and shows how to express the area of a patch on the hemisphere using the polar angle and azimuth. Finally, he mentions that the BRDF f is constant.

  • 00:15:00 In this section, the lecture discusses the concept of shading and the reflection of light from a surface. The lecture highlights that the light falling on a surface depends on the incoming radiation and the angle of incidence. Since all of the light gets reflected, the power deposited on the surface is E cosine theta i times the area of the surface, and when the reflected light is integrated it must equal the incoming light. The lecture calculates the constant value of the BRDF from this condition and concludes that f is 1 over pi for the Lambertian surface (see the integral below). It is noted that the reflected energy is not radiated equally in all directions, and it is explained how foreshortening affects the power emitted from a surface.
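
In symbols, the normalization argument is: requiring an ideal Lambertian surface (constant BRDF f) to re-radiate all of the incident power over the hemisphere of emitted directions, with the foreshortening factor cos theta_e included, fixes the constant:

```latex
\int_{0}^{2\pi}\!\!\int_{0}^{\pi/2} f\,\cos\theta_e\,\sin\theta_e\,d\theta_e\,d\phi_e
  = f\,\pi = 1
  \quad\Longrightarrow\quad
  f = \frac{1}{\pi}
```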

  • 00:20:00 In this section of the lecture, the professor discusses the concept of a Lambertian surface, which appears equally bright from all viewing directions. However, when a surface element is viewed at a grazing angle, its foreshortened area shrinks, so if it emitted the same power in every direction the power per unit foreshortened area would become infinite. Instead, the emitted power falls off with the cosine of the emittance angle, so the power per unit foreshortened area per unit solid angle, i.e. the radiance, stays constant. Accounting for this foreshortening is what leads to the factor of one over pi instead of one over two pi. The lecture then goes on to explain how to use this knowledge to measure incoming light and avoid confusion when dealing with Helmholtz reciprocity.

  • 00:25:00 In this section, the lecturer introduces a type of surface that is different from a Lambertian surface and is quite important in many applications. This type of surface is one over the square root of cosine theta i times cosine theta e, and it satisfies Helmholtz reciprocity. The radiance of this type of surface is affected by foreshortening, and it is used to model the surfaces of the lunar and rocky planets as well as some asteroids. The lecture explains how to determine the isophotes of this surface, which are nested circles in 3D space, but are projected as ellipses in the image plane, giving insight on brightness contour maps.

  • 00:30:00 In this section, the speaker discusses the difficulty in finding the way to shade a certain material in 3D space. They explain that the previous method used in a lab won't work for this material, so a new approach is needed. The speaker then demonstrates using unit normals to find the constant values of all points on the surface, which must be perpendicular to a fixed vector. He then shows that this implies that all unit vectors on the surface with the same brightness must lie in a plane, revealing useful information about the material. Finally, the speaker uses spherical coordinates to try and gain a better understanding.

  • 00:35:00 In this section, the lecturer discusses how to choose a coordinate system when dealing with the shading of the lunar surface, as a good choice of coordinates can prevent an algebraic mess. They recommend a coordinate system in which the sun and earth are at z = 0, which reduces the calculation to only one unknown. The lecture also briefly touches on the appearance of the full moon: because of its non-Lambertian microstructure, the disk appears roughly uniformly bright and does not look like a shaded sphere. The Hapke model predicts this type of behavior well. Finally, the lecture dives into the formula for n dot s over n dot v, ultimately arriving at a simplified version using spherical coordinate vectors.

  • 00:40:00 In this section, the lecturer discusses the relationship between the brightness and azimuth of the lunar surface. They explain that all points on the surface with the same brightness have the same azimuth, so that lines of constant longitude are isophotes. This is very different from a Lambertian surface. Despite the moon having an albedo comparable to coal, it appears very bright in the sky because there are no comparison objects against which to judge its reflectance. However, we can use photometric stereo to determine the surface orientation of the moon, and potentially even its shape, by taking multiple pictures of the surface under different illumination conditions. The Hapke model is then expressed in terms of the gradient.

  • 00:45:00 In this section, the lecturer discusses the process of converting from gradient to unit vector and how it relates to the position of the light source. They explain that the square root is necessary to ensure satisfaction of Helmholtz, and in taking the ratio of certain dot products, a linear equation is obtained for the isophotes that can be plotted in pq space. The lecturer notes that while these lines are not equally spaced due to the square root, they are parallel, and there is one line where brightness is zero, indicating a 90-degree turn away from the incoming radiation. Overall, this section covers the mathematical concepts underlying the calculation of isophotes and the relationship between position and brightness of light sources in a given space.
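
Written out in gradient space (with the viewer along the z-axis and the source at (p_s, q_s)), the ratio discussed here becomes linear in p and q, which is why the isophotes are parallel straight lines; the square root that enforces Helmholtz reciprocity makes them unequally spaced:

```latex
\frac{\hat{n}\cdot\hat{s}}{\hat{n}\cdot\hat{v}}
  = \frac{1 + p\,p_s + q\,q_s}{\sqrt{1 + p_s^2 + q_s^2}},
\qquad
R(p,q) = \sqrt{\frac{\hat{n}\cdot\hat{s}}{\hat{n}\cdot\hat{v}}}
  = \text{const}
  \;\Longrightarrow\;
  1 + p\,p_s + q\,q_s = \text{const}
```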

  • 00:50:00 In this section, the lecturer discusses the advantages of linear shading in photometric stereo, which enables easy solving of various problems. With two different lighting conditions, the two linear equations intersect, and the point of intersection is the surface orientation. The lecturer notes that there is no ambiguity with Lambertian shading, a problem with the previous method, where there were up to four solutions. The lecturer also demonstrates that the first spatial derivatives rotate the same way as the coordinate system, and this is beneficial in determining the surface orientation in a particular direction without knowing the entire orientation of the surface.

  • 00:55:00 In this section, the lecturer explains how measuring brightness can determine a surface’s steepness or slope direction, allowing researchers to gather a profile of a surface by measuring the brightness or reflectivity of points vertically and horizontally. The process requires an initial condition to start, which is measuring the surface's brightness and incrementally finding z. However, the accuracy of the measurement can be affected by variation in reflectivity and inaccuracies in measuring brightness.

  • 01:00:00 In this section, the professor discusses how to obtain a profile of an object's surface to determine its shape using shape from shading techniques. He explains how, by running a profile across an object, he can get the shape of the profile as long as he knows the initial value. However, he cannot get the profile's absolute vertical position if he does not know the initial value. He then applies this technique to the moon to get various profiles of the surface to explore the shape of the object. The professor also talks about heuristics to stitch together 3D surfaces from the profiles. Later, he switches topics to talk about lenses and justifies the use of orthographic projection.

  • 01:05:00 In this section, the lecturer discusses how compound lenses, consisting of multiple elements, compensate for aberrations through carefully designed arrangements. He notes that glass's refractive index varies with wavelengths, causing chromatic aberrations, but compound lenses of different materials can compensate for this. The lecturer explains how thick lenses can be approximated using nodal points and principal planes, and how a neat trick of making t (thickness between nodal points) negative can result in a short telephoto lens. This technique can significantly reduce the length of a telephoto lens while maintaining its long focal length and small field of view.

  • 01:10:00 In this section, the lecturer demonstrates two tricks to remove perspective projection in machine vision. The first trick involves moving one of the nodes to infinity, which reduces the effect of varying magnification with distance. By building a telecentric lens with a far distant center of projection, the cone of directions becomes more parallel, and the magnification remains constant regardless of distance. The second trick involves moving the other node, which changes the magnification when the image plane is not in exactly the right place. To achieve a sharp image, the lens needs to be focused by changing the focal length of the glass or moving the lens relative to the image plane.

  • 01:15:00 In this section of the lecture, the speaker discusses the issues with the cosine-to-the-fourth law and the changing magnification when the center of projection is not at infinity. He explains how moving the nodal point far out and using double telecentric lenses can eliminate these issues, since the radiation then reaches each sensor element perpendicular to the sensor. Additionally, the speaker discusses the need for little lenslets to concentrate the incoming light into a smaller area and avoid aliasing, which can occur when there are high-frequency components in the signal. Finally, the speaker mentions the relevance of low-pass filtering and the importance of sampling at twice the bandwidth of the signal in order to reconstruct it perfectly.

  • 01:20:00 In this section, the lecturer discusses how low pass filtering with block averaging can reduce aliasing problems when using a lenslet array to measure light from a large area. This method works well if the light comes in perpendicular to the sensor, which is achieved by using telecentric lenses. However, the lecture then goes on to explain that in certain cases, such as when the changes in depth in a scene are smaller than the depth itself, it is more convenient to use orthographic projection. This enables a linear relationship between x and y in the world and x and y in the image, allowing for measurement of distances and sizes of objects independent of how far away they are.

  • 01:25:00 In this section, the speaker introduces the concept of orthographic projection, which is useful for practical applications with telecentric lenses and simplifies some of the problems that will be discussed. They note that while some may think this approach only works for Lambertian surfaces, it actually works for everything, but the equations get messy for other reflectance models. The speaker explains that the kind of reconstruction addressed next can be done under perspective projection, but it is complicated and not very insightful; by changing to orthographic projection, many of these problems become clearer.
 

Lecture 9: Shape from Shading, General Case - From First Order Nonlinear PDE to Five ODEs



Lecture 9: Shape from Shading, General Case - From First Order Nonlinear PDE to Five ODEs

This lecture covers the topic of shape from shading, a method for interpreting the shapes of objects using variations in image brightness. The lecturer explains the process of scanning electron microscopy, where a secondary electron collector is used to measure the fraction of an incoming electron beam that makes it back out, allowing for the estimation of surface slope. The lecture also discusses the use of contour integrals, moments, and least squares to estimate surface derivatives and find the smallest surface given measurement noise. The speaker derives five ordinary differential equations for the shape from shading problem and also explains the concept of the Laplacian operator, which is used in image processing operations.

In this lecture on "Shape from Shading," the speaker discusses various approaches to solve equations for the least square solution to shape from shading. The lecturer explains different techniques to satisfy the Laplacian condition, adjust pixel values, and reconstruct surfaces using image measurements and slope computations from different points. The lecture covers the topics of initial values, transform of rotating, and inverse transform through minus theta. The lecturer concludes with a discussion of the generalization of these equations for arbitrary reflectance maps and the importance of examining scanning electron microscope images to provide concrete examples of shading interpretation.

  • 00:00:00 In this section of the lecture, the professor introduces shape from shading, the method for recovering the shapes of objects from image brightness measurements. He explains how this method differs from photometric stereo, which requires multiple exposures. The professor also discusses different types of surface materials and their reflecting properties, including the Hapke model for the reflection from rocky planets and a third model for microscopy. He presents a comparison of electron microscopy methods and explains why scanning electron microscopes produce images that humans find easy to interpret, due to their specific variations in brightness, which become brighter toward the edges.

  • 00:05:00 In this section, the lecturer discusses the importance of shading in images, which plays a significant role in interpreting the shape of objects. The lecturer presents images of a moth's head and an ovoid football-like shape that have variations in brightness depending on their surface orientation, allowing us to easily interpret their shapes. Interestingly, despite the non-lambertian surface of the football-like object, humans are still able to interpret its shape accurately. The lecture then delves into the workings of scanning electron microscopes, which use a beam of accelerated electrons to create images of the object's surface.

  • 00:10:00 In this section, the process of creating shaded images with scanning electron microscopy is described. Electrons at several kilo-electron-volts hit the object; some bounce off as backscatter, but most penetrate and create secondary electrons by losing energy and knocking electrons off atoms, ionizing them. Some of the secondary electrons escape the object and are gathered by an electrode while the beam scans the object in a raster-like fashion. The current measured there is used to modulate a light beam in a display, and by controlling the deflection the image can be magnified thousands to tens of thousands of times, making the technique more powerful than optical microscopy.

  • 00:15:00 In this section of the lecture, the speaker explains the process of measuring a surface's orientation using a secondary electron collector. The collector measures the fraction of the incoming beam that makes it back out, with highly inclined surfaces resulting in more current due to more secondary electrons escaping. By plotting a reflectance map, brightness versus orientation, the slope of the surface can be determined, but not its gradient, leaving two unknowns and one constraint. This problem is an example of the shape from shading problem, where the goal is to estimate the surface shape from a pattern of brightness.

  • 00:20:00 In this section of the lecture, the speaker discusses the use of a reflectance map to determine the slope or gradient of a surface. They explain that this method can be used for various surfaces and not just for certain types. The discussion also covers needle diagrams and how they can be used to determine surface orientation and shape. The speaker explains that while this is a simple problem, it is over-determined as there are more constraints than unknowns. This allows for a reduction in noise and a better result. The lecture ends with a demonstration of integrating out p to determine the change in height from the origin.

  • 00:25:00 In this section, the speaker discusses how to integrate the known data to estimate heights anywhere along the x-axis or y-axis, which can be combined to fill in the whole area. However, the p and q values used are subject to measurement noise, meaning there is no guarantee that measuring p and q in different ways will lead to the same answer. To solve this problem, a constraint on p and q must be put in place; p and q must satisfy this constraint for any loop, and the large loop can be decomposed into small loops that cancel each other out to make sure the constraint is true for the large loop as well.

  • 00:30:00 In this section, the lecturer discusses the relation between a contour integral and an area integral in the context of measuring the derivatives of a surface with photometric stereo or other vision methods. The lecture shows how the slope can be estimated at the center of a stretch where the slope is roughly constant, and uses a Taylor series expansion to derive an equation relating the derivatives of the surface z of x and y. It is noted that finding an exact z of x, y that reproduces the measured p and q is generally impossible, so a more elegant way is presented to find a least squares approximation.

  • 00:35:00 In this section of the lecture, the speaker discusses the benefit of reducing computations from all pixels to just the boundary of a region in machine vision. The speaker uses the example of computing the area and position of a blob through contour integrals and moments, which can be efficiently calculated by tracing the outline instead of counting pixels. The lecture goes on to apply Green's theorem to match the contour integral to the computation of moments.
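
A small sketch of the boundary-only computation described here: by Green's theorem, the area and centroid (zeroth and first moments) of a blob can be computed from its traced outline alone; the closed polygon below is a made-up example:

```python
import numpy as np

def area_and_centroid(xs, ys):
    """Area and centroid of a closed polygon from its boundary alone (Green's theorem)."""
    x2, y2 = np.roll(xs, -1), np.roll(ys, -1)       # next vertex along the contour
    cross = xs * y2 - x2 * ys
    area = 0.5 * np.sum(cross)                      # A = 1/2 * contour integral of (x dy - y dx)
    cx = np.sum((xs + x2) * cross) / (6.0 * area)   # first moments, also boundary-only
    cy = np.sum((ys + y2) * cross) / (6.0 * area)
    return area, (cx, cy)

# Counter-clockwise unit square as the example outline.
xs = np.array([0.0, 1.0, 1.0, 0.0])
ys = np.array([0.0, 0.0, 1.0, 1.0])
print(area_and_centroid(xs, ys))                    # -> (1.0, (0.5, 0.5))
```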

  • 00:40:00 In this section, the lecturer discusses how to find the surface that best fits our measurements. Ideally, we would find a surface whose x and y derivatives match the p and q obtained from the image. However, due to measurement noise this will not be possible, so instead we make the discrepancy as small as possible by solving a least squares problem. Z is a function with infinite degrees of freedom, so ordinary calculus does not apply directly; instead, we can differentiate with respect to each of the finite number of unknowns on a grid and set the result equal to zero to obtain many equations.

  • 00:45:00 In this section of the lecture, the speaker discusses the process of finding a value of z for every grid point to minimize the error between observed values and estimated derivatives in both the x and y directions. To do this, the speaker explains that they need to differentiate and set the result equal to zero for all possible values of i and j, which results in a set of linear equations that are solvable using least squares. However, the speaker warns of a potential problem if the identifier names i and j are not replaced with other names, which can result in getting the wrong answer. Despite having a large number of equations, the equations are sparse, making them easier to solve.

  • 00:50:00 In this section, the speaker goes over the process of using first-order nonlinear partial differential equations to derive five ordinary differential equations for the shape from shading problem. They explain the steps of differentiation for the terms inside a square, matching terms, and considering various values of k and l. The lecturer simplifies the final equation and separates the terms to identify the x and y derivatives of p and q respectively. The goal is to ultimately find a solution for all points in the image.

  • 00:55:00 In this section, the speaker explains the computational molecule diagram, which is a graphic way of estimating derivatives in machine vision. He uses this to show how to derive the Laplacian operator which is used heavily in image processing operations. He explains that the Laplacian is rotationally symmetric and there are derivative operators that are very useful in edge detection that are also rotationally symmetric.
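
A minimal sketch of the computational-molecule idea: the standard 5-point stencil for the Laplacian, applied directly with array slicing (grid spacing taken as 1; the test surface is arbitrary):

```python
import numpy as np

def laplacian(z):
    """5-point computational molecule: sum of the four neighbors minus four times the center."""
    lap = np.zeros_like(z)
    lap[1:-1, 1:-1] = (z[:-2, 1:-1] + z[2:, 1:-1] +
                       z[1:-1, :-2] + z[1:-1, 2:] - 4.0 * z[1:-1, 1:-1])
    return lap

z = np.fromfunction(lambda i, j: i**2 + j**2, (5, 5), dtype=float)
print(laplacian(z))   # interior entries are 4, since the Laplacian of x^2 + y^2 is 4
```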

  • 01:00:00 In this section, the speaker discusses a discrete approach to solving equations for the least square solution to shape from shading, rather than using the calculus of variation. The resulting equations, although having many variables, are sparse which makes the iterative solution possible. The speaker explains how to solve these equations using an iterative approach that involves computing local averages of neighbor pixels and adding a correction based on image information. The speaker notes that while iterative solutions are easy to propose, showing that they converge is difficult, but textbooks suggest that they do.
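
A rough sketch of that iterative idea, assuming measured gradient fields p = dz/dx and q = dz/dy on a unit grid; each pass replaces the height by the local average of its neighbors plus a correction derived from the image data (a simplified, wrap-around-border version for illustration, not the lecture's exact scheme):

```python
import numpy as np

def reconstruct_height(p, q, n_iters=2000):
    """Least-squares surface from gradients p = dz/dx, q = dz/dy (Jacobi-style iteration)."""
    z = np.zeros_like(p)
    for _ in range(n_iters):
        # Local average of the four neighbors (borders handled by wrap-around for simplicity).
        avg = 0.25 * (np.roll(z, 1, axis=0) + np.roll(z, -1, axis=0) +
                      np.roll(z, 1, axis=1) + np.roll(z, -1, axis=1))
        # Correction from the measurements: central-difference estimate of p_x + q_y.
        div = 0.5 * (np.roll(p, -1, axis=1) - np.roll(p, 1, axis=1)) + \
              0.5 * (np.roll(q, -1, axis=0) - np.roll(q, 1, axis=0))
        z = avg - 0.25 * div
    return z - z.mean()   # absolute height is unrecoverable, so fix the mean at zero
```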

  • 01:05:00 In this section, the lecturer discusses an approach to satisfy the Laplacian condition by adjusting pixel values using a simple equation with sparse terms. This approach is related to solving the heat equation and can be done efficiently in parallel, making it stable even with measurement noise. The technique can be applied to photometric stereo data to reconstruct a surface in a least squares way, providing a reasonable solution that matches experimental data. However, the lecturer acknowledges that this approach is not directly useful beyond photometric stereo and that there are more challenging problems to solve, such as single image reconstructions.

  • 01:10:00 In this section, the lecturer discusses a simple case of the reflectance map with parallel straight lines as isophotes. The parallel lines make it possible to rotate to a more useful coordinate system and maximize the information in one direction while minimizing it in another. The lecture provides the relationship between p, q, p prime, and q prime, the angle theta given by a triangle, and the inverse transform of rotating through minus theta. Ultimately, the lecture analyzes the general case with squiggly lines and discusses the concept of shape from shading.

  • 01:15:00 In this section, the lecturer talks about how to reconstruct a surface using image measurements and slope computations from different points. The lecture also notes that adding a constant to the height z does not change the Laplacian of z in any way, implying that the reconstruction determines only relative depth, not absolute height. Consequently, an initial value for z is required to pin down a reconstruction.

  • 01:20:00 In this section, the speaker discusses the challenge of having potentially different initial values for each row in the computation of solutions for a surface's shape with Shape from Shading. While it would be easy to deal with an overall change in height, having different initial values for each row requires a different initial curve that can be mapped back to the original, unrotated world. The speaker suggests using an initial curve, which is some function of eta, to explore the surface by moving along these curves, computing them independently, and then altering the speed at which to explore the solution.

  • 01:25:00 In this section, the speaker explains that multiplying by a constant makes the equations simpler: the movement in the x and y directions becomes proportional to ps and qs respectively, while in the z direction there is a straightforward formula. The lecture concludes with a discussion about generalizing these equations to arbitrary reflectance maps and the importance of examining scanning electron microscope images to provide concrete examples of shading interpretation.

Lecture 10: Characteristic Strip Expansion, Shape from Shading, Iterative Solutions



Lecture 10: Characteristic Strip Expansion, Shape from Shading, Iterative Solutions

In this lecture, the instructor covers the topic of shape from shading using brightness measurements in the concept of image formation. This involves understanding the image irradiance equation, which relates brightness to surface orientation, illumination, surface material, and geometry. They explain the method of updating p and q variables by using two separate systems of equations that feed into each other, and tracing out a whole strip using the brightness gradient. The lecture also discusses the challenges of solving for first-order non-linear PDEs, and different methods of stepping from one contour to another as you explore the surface. Finally, the instructor discusses the implementation of the characteristic strip expansion and why a sequential approach may not be the best method, recommending parallelization and controlling the step size.

In Lecture 10, the professor discusses various methods for solving shape-from-shading problems, including using stationary points on the surface and constructing a small cap shape around it to estimate the local shape. The lecturer also introduces the concept of the occluding boundary, which can provide starting conditions for solutions, and discusses recent progress in computing solutions for the three-body problem using sophisticated numerical analysis methods. Additionally, the lecture touches on the topic of industrial machine vision methods and the related patterns that will be discussed in the following lecture.

  • 00:00:00 In this section, the instructor provides announcements regarding the first quiz and proposal submission for the term project. The term project involves implementing a solution to a machine vision problem, and the students should submit a short proposal by the 22nd. The instructor then talks about the change of pace in covering industrial machine vision, where they will look at patents instead of published papers or textbooks. In the process, the students will learn about patent language, which is essential for entrepreneurs involved in startups. Finally, the instructor provides examples of student projects such as implementing subpixel methods for edge detection or time to contact on an android phone.

  • 00:05:00 In this section, the lecturer discusses the different aspects of image formation, focusing specifically on the concept of shape from shading using brightness measurements. This requires an understanding of the image irradiance equation, which relates brightness to surface orientation, illumination, surface material, and geometry. The reflectance map is used to simplify this equation and serves as a way of summarizing the detailed reflecting properties, although it is derived from the bi-directional reflectance distribution function (BRDF). The lecture goes on to explain how this concept was applied to the reflecting properties of the moon and other rocky planets, resulting in a set of equations that allow for determination of surface orientation in certain directions.

  • 00:10:00 In this section, the speaker discusses the rule for taking a small step in the image and the corresponding small step in height, using orthographic projection. He explains that this simplifies the math and ties into the assumption of a telecentric lens and a faraway light source, which makes the Lambertian-style assumptions possible. The overall process involves solving three ordinary differential equations numerically with the forward Euler method, feeding in the brightness through the Hapke-type surface model. The speaker shows how to express this in terms of p and q and then derives the image irradiance equation.

  • 00:15:00 In this section, the speaker discusses the direct relationship between the measured quantity of surface brightness and the solution needed for a specific surface. He explains that there is a constant called rs, which depends on the source position, that is used to simplify the solution. The technique involves taking the brightness, squaring it, multiplying it by rs, and subtracting one, which gives the derivative in the z direction. The speaker also explains how to obtain an initial condition for the differential equations and how a curve can be defined using parameters. The method is then generalized to tackle the general case where the slope cannot be locally determined.
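
As a rough illustration of this special case, the following is a minimal forward-Euler sketch under the assumptions stated in the summary above: the step is taken in the source direction (ps, qs), and rs·E² − 1 (with rs a source-dependent constant, taken here as given) supplies the slope along that step. The function name and step size are illustrative only, not the lecturer's code.

```python
def hapke_step(x, y, z, E, ps, qs, rs, delta=0.01):
    """One forward-Euler step of the special-case (Hapke-type) solution.

    The step is taken in the direction of the light source (ps, qs); the
    change in height follows the rule described in the summary: square the
    measured brightness E, multiply by rs, subtract one.  rs is a constant
    that depends on the source position and is assumed to be given.
    """
    dx = ps * delta
    dy = qs * delta
    dz = (rs * E**2 - 1.0) * delta   # slope of the surface in the source direction
    return x + dx, y + dy, z + dz
```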

  • 00:20:00 In this section, the lecturer discusses constructing a solution using a characteristic strip expansion. To do so, one needs to calculate the change in height to know how z is going to change. They presume that we start off with x, y, and z, along with the surface orientation p and q; update rules are given for x and y, and the change in the height z is given by an equation. Updating p and q as we go is necessary, resulting in a characteristic strip carrying surface orientation, which is more information than just having a curve. The lecturer explains how to update p and q by using a two-by-two matrix of the second partial derivatives of height, which correspond to curvature.

  • 00:25:00 In this section, the lecturer discusses how to calculate the curvature matrix for a 3D surface, which is more complicated than for a curve in the plane. The curvature matrix requires a whole matrix of second-order derivatives called the Hessian matrix. However, using higher-order derivatives to continue the solution would lead to more unknowns. Therefore, the image irradiance equation is needed, particularly the brightness gradient, as changes in surface orientation correspond to curvature that affects image brightness. By looking at the common matrix H in both the curvature and brightness gradient equations, calculating H would allow for an update in x, y, z, p, and q, completing the method.

  • 00:30:00 In this section, the lecturer discusses the concept of solving for H using two linear equations. H appears in both of these equations, but since we have two equations and three unknowns, we can't solve for H. However, by using a specific delta x and delta y, we can control the step size and pick a particular direction in which to compute delta p and delta q. The lecturer also explains that the direction may change as the surface is explored. By plugging this into the equation, we can find how to change p and q to solve the problem.
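
Written out in standard shape-from-shading notation, the two linear relationships that share the Hessian H of the height are:

```latex
\begin{pmatrix} \delta p \\ \delta q \end{pmatrix}
  = H \begin{pmatrix} \delta x \\ \delta y \end{pmatrix},
\qquad
H = \begin{pmatrix} z_{xx} & z_{xy} \\ z_{xy} & z_{yy} \end{pmatrix},
\qquad
\begin{pmatrix} E_x \\ E_y \end{pmatrix}
  = H \begin{pmatrix} R_p \\ R_q \end{pmatrix}
```

Choosing the step (δx, δy) proportional to (Rp, Rq) therefore makes (δp, δq) proportional to the measured brightness gradient (Ex, Ey), so H itself never has to be computed.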

  • 00:35:00 In this section, the lecturer discusses the five ordinary differential equations required to solve for the z variable in the image irradiance equation, and introduces a method for generating a strip using the brightness gradient to update the p and q variables. The lecturer goes on to explain the interesting part of the solution involving two systems of equations that feed into each other, and how they determine the gradient direction and can be used to trace out a whole strip. Ultimately, the partial differential equation is reduced to simple, ordinary differential equations using p and q to make the equation look less intimidating.
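
In the usual notation, with the dot denoting differentiation along the strip parameter, the five ordinary differential equations are:

```latex
\dot{x} = R_p, \qquad
\dot{y} = R_q, \qquad
\dot{z} = p\,R_p + q\,R_q, \qquad
\dot{p} = E_x, \qquad
\dot{q} = E_y
```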

  • 00:40:00 In this section, the speaker discusses the challenges of first-order non-linear PDEs in solving for brightness in the context of shape from shading. This is a departure from the typically second-order, linear PDEs found in physics, which means a special method is required for solving these types of PDEs. The general case for any R of p and q is discussed and then applied to two specific surface properties: Hapke surfaces and the scanning electron microscope. In the Hapke case, the update rules for x and y turn out to be proportional to ps and qs, respectively.

  • 00:45:00 In this section, the lecturer explains the method for updating x, y, and the height z using the characteristic strip expansion and shape from shading with iterative solutions. The method involves differentiating the reflectance map with respect to p and q to obtain the updates for x and y, and using p·Rp + q·Rq to update the height. The lecture notes that this method can be used on scanning electron microscope images and also touches on the concept of base characteristics, which involves projecting the characteristic strips onto the image plane to explore as much of the image as possible.
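
A minimal forward-Euler sketch of one step along a characteristic strip is shown below; the reflectance-map derivatives Rp and Rq at the current (p, q), and the measured brightness gradient (Ex, Ey) at the current image position, are assumed to be supplied by the caller, and the function name and step size are illustrative only.

```python
def strip_step(x, y, z, p, q, Rp, Rq, Ex, Ey, dxi=0.01):
    """One forward-Euler step of the characteristic strip equations.

    Rp, Rq  -- partial derivatives of the reflectance map R(p, q),
               evaluated at the current (p, q)
    Ex, Ey  -- brightness gradient measured in the image at (x, y)
    dxi     -- step along the strip parameter; dividing the step instead
               by (p*Rp + q*Rq) would give constant-height steps, as
               noted in the next summary point
    """
    x_new = x + Rp * dxi
    y_new = y + Rq * dxi
    z_new = z + (p * Rp + q * Rq) * dxi
    p_new = p + Ex * dxi
    q_new = q + Ey * dxi
    return x_new, y_new, z_new, p_new, q_new
```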

  • 00:50:00 In this section, the speaker discusses the implementation of the characteristic strip expansion and why a sequential approach may not be the best method. Because the solutions along each curve are found independently, a separate process can be run along each curve, making the computation parallelizable. The speed of the computation, which needs a reasonable step size, is discussed, and a simple case where the step size is controlled so that z increases by a constant amount is examined. By dividing the equations by p·Rp + q·Rq, the rate of change of z becomes one, so the solutions along all curves advance together, with contours at increasing values of z.

  • 00:55:00 In this section of the lecture, the speaker discusses different ways of stepping from one contour to another as you explore the surface. They mention the option of stepping in constant-size increments in the z direction, or having a constant step size in the image, which requires dividing all equations by a suitable factor. Another option is stepping in constant-size increments in 3D, where the sum of the squares of the increments is 1, and finally, there is the possibility of stepping along isophotes, the contours of constant brightness in the image. However, some of these methods may have issues, such as different curves running at varying rates or division by zero, so it is essential to take note of these limitations.

  • 01:00:00 In this section of the lecture, the professor discusses the dot product of the two gradients in the image and the reflectance map, but doesn't go into too much detail. Moving from contour to contour in the image allows for easier tying together of neighboring solutions, and crude numerical analysis methods can provide sufficient results. The professor then goes on to discuss the recent progress in computing solutions for the three-body problem and how sophisticated numerical analysis methods are being used to solve equations that would otherwise be difficult, if not impossible, to solve analytically.

  • 01:05:00 In this section, the lecturer discusses the challenge of needing an initial curve, along with the surface orientation on it, to start exploring a surface with these machine vision methods. Fortunately, the image irradiance equation gives one constraint on the orientation along the curve, and we know the curve lies in the surface, which allows us to compute the derivatives and solve a linear equation. This means that we can find the orientation and get rid of the need for an initial strip on the object if we can find special points on the object where we know the shape, orientation, etc.

  • 01:10:00 In this section, the speaker discusses the concept of the occluding boundary, which is the place where an object curls around, such that the part on one side is visible and the other is not. If we construct a surface normal at such a point, it will be parallel to a vector constructed in the image at the occluding boundary, which gives us starting conditions to begin our solutions. However, we can't use the slope ratios p and q from the occluding boundary to start the equations, since the slope there is infinite. The speaker also introduces the concept of stationary points, which are unique, isolated, global extrema that occur at the brightest points on an object's surface when it is illuminated. These points provide us with the orientation of the surface at that spot, which is valuable information for solving shape-from-shading problems.

  • 01:15:00 In this section, the lecturer discusses the stationary points of the reflectance map and image, which correspond to maxima or minima depending on the imaging technique used. However, stationary points do not allow the solution to be started directly, because there is no change in the dependent variables there; the solution can only be started by moving away from the stationary point using an approximation of the surface. The idea is to construct a small planar or cap-shaped patch using the orientation at the stationary point and then step out by a small radius to start the solution. By doing so, the solution can get away from the stationary point and start iterating toward a better answer.

  • 01:20:00 In this section of the lecture, the speaker discusses the concept of stationary points on curved surfaces in relation to shape from shading. The idea is to find a unique solution for the curvature of a surface that has a stationary point. The speaker explains that these points are important in human perception and can affect the uniqueness of a solution. The lecture then goes on to explain the process of finding the curvature of a surface using an example, where it is assumed that the surface has an SEM-type (scanning electron microscope) reflectance map and a stationary point at the origin. The gradient of the image is found to be zero at the origin, confirming the presence of an extremum at that point. However, the gradient cannot be used to estimate local shape because it is zero at the origin, so a second derivative is required.

  • 01:25:00 In this section, the speaker explains how taking the second partial derivatives of brightness can provide information about the shape and how to recover it, by estimating the local shape at stationary points and constructing a small cap shape around them. Additionally, the speaker introduces the topic of industrial machine vision methods and the related patents that will be discussed in the subsequent lecture.
Lecture 10: Characteristic Strip Expansion, Shape from Shading, Iterative Solutions
  • 2022.06.08
  • www.youtube.com
MIT 6.801 Machine Vision, Fall 2020. Instructor: Berthold Horn. View the complete course: https://ocw.mit.edu/6-801F20 YouTube Playlist: https://www.youtube.com/p...
 

Lecture 11: Edge Detection, Subpixel Position, CORDIC, Line Detection (US patent 6408109)



Lecture 11: Edge Detection, Subpixel Position, CORDIC, Line Detection (US patent 6408109)

This YouTube video titled "Lecture 11: Edge Detection, Subpixel Position, CORDIC, Line Detection (US 6,408,109)" covers several topics related to edge detection and subpixel location in machine vision systems. The speaker explains the importance of patents in the invention process and how they are used in patent wars. They also discuss various edge detection operators and their advantages and limitations. The video includes detailed explanations of the mathematical formulas used to convert Cartesian coordinates to polar coordinates and determine edge position. The video concludes by discussing the importance of writing broad and narrow claims for patents and the evolution of patent law over time.

In Lecture 11, the speaker focuses on different computational molecules for edge detection and derivative estimation, with an emphasis on efficiency. Sobel and Roberts Cross operators are presented for calculating the sum of the squares of gradients, with variations in formula and technique discussed. To achieve subpixel accuracy, multiple operators are used, and techniques such as fitting a parabola or using a triangle model are presented to determine the peak of the curve. Additionally, the lecture discusses alternatives to quantization and issues with gradient direction on a square grid. Overall, the lecture stresses the importance of considering many details to achieve good performance for edge detection.

  • 00:00:00 In this section, the lecturer introduces the topic of industrial machine vision and its importance in manufacturing processes, including the use of machine vision for alignment and inspection in integrated circuit manufacturing and for pharmaceutical label readability. The lecturer explains the purpose of patents as a way to gain a limited monopoly on the use of an invention in exchange for explaining how it works, to benefit society in the long term. The structure and metadata of a patent are also discussed, including the patent number and title, the patent date, and the use of patents as ammunition in patent wars between companies. The lecture then briefly describes a patent by Bill Silver at Cognex, a leading machine vision company, on detection and sub-pixel location of edges.

  • 00:05:00 In this section, the lecturer discusses the process of edge detection in digital images, where focus is given to the transition between different brightness levels. The lecturer notes that finding edges to sub-pixel accuracy is crucial in the conveyor belt and integrated circuit worlds as it significantly reduces the bits needed to describe something. The lecture further explains that this process can be achieved with a higher pixel camera, but it's costly, and therefore software that can perform it at lower costs would be beneficial. The lecturer also explains that a 40th of a pixel can be achieved, which is a significant advantage, but it comes with challenges. The lecture concludes with a discussion of patent filing and how the process has changed over time, including the arcane language used in the documents, and the delay experienced in submitting a patent application.

  • 00:10:00 In this section of the video, the speaker discusses various technical papers and patents related to edge detection in machine vision, which dates back to the 1950s. The first famous paper on this topic was by Roberts in 1965, which used a simple but misleading edge detector. The speaker also mentions other papers and patents related to edge detection, discussing the advantages and disadvantages of various edge detection operators, including Sobel's operator, the Roberts cross edge detector, and Bill Silver's alternate operators for hexagonal grids. The speaker emphasizes the importance of edge detection in various applications and the ongoing efforts of engineers and researchers to improve edge detection algorithms.

  • 00:15:00 In this section, the lecture explains the advantages and disadvantages of hexagonal-grid cameras in terms of resolution and rotational symmetry, but notes that the extra trouble of working with a hexagonal grid was too much for engineers to handle. The lecture then goes on to discuss converting from Cartesian to polar coordinates, using the magnitude of the gradient and its direction rather than the brightness gradient components themselves, despite the expense of taking square roots and arc tangents. The lecture then explores alternative solutions, such as using look-up tables or the CORDIC method, which estimates the magnitude and direction of a vector through iterative rotations that reduce the residual angle, using only minimal arithmetic operations.
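
To illustrate the idea only (not the specific fixed-point, table-driven implementation described in the patent), here is a small CORDIC-style vectoring sketch; the function name and iteration count are arbitrary choices.

```python
import math

def cordic_polar(x, y, iterations=16):
    """CORDIC-style vectoring: estimate magnitude and direction of (x, y)
    using only additions, halvings (shifts in fixed point), and a small
    table of arctan values.  A floating-point sketch for illustration.
    """
    angle = 0.0
    if x < 0:                              # fold the left half-plane over;
        x, y = -x, -y                      # third-quadrant results may exceed
        angle = math.pi                    # pi and need wrapping to (-pi, pi]
    for i in range(iterations):
        step = math.atan(2.0 ** -i)        # precomputed table in practice
        if y > 0:                          # rotate clockwise toward y = 0
            x, y = x + y * 2.0 ** -i, y - x * 2.0 ** -i
            angle += step
        else:                              # rotate counter-clockwise
            x, y = x - y * 2.0 ** -i, y + x * 2.0 ** -i
            angle -= step
    # each micro-rotation stretches the vector by sqrt(1 + 2**(-2i))
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    return x / gain, angle                 # (magnitude, direction in radians)
```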

  • 00:20:00 In this section of the lecture, the speaker discusses edge detection and subpixel position algorithms. They explain how to locate where the gradient is large and use non-maximum suppression to keep only local maxima of the gradient magnitude along the gradient direction. The speaker also talks about quantizing the directions of the gradient and notes that looking further afield can give a larger range of directions. To find the actual peak of the gradient, a parabola is fit to the data and differentiated to find the peak. Finally, the lecture discusses the expected behavior of brightness when working with a Mondrian model of the world.

  • 00:25:00 In this section, the video discusses techniques for achieving subpixel accuracy in edge detection. One approach involves quantizing directions and finding the peak, but there can be ambiguity about which point to choose along the edge. Another method is to perform a perpendicular interpolation to find the edge point with the greatest proximity to the center pixel. However, the actual edge position may not fit the assumed models, which can introduce bias. The video suggests a simple correction to calibrate out the bias and improve accuracy.

  • 00:30:00 In this section, the lecturer discusses ways to improve edge detection accuracy in machine vision systems. The patent he is examining suggests using different powers of "s" to remove bias and increase accuracy based on the specific system being used. The direction of the gradient also affects bias and requires compensation for even higher accuracy. The overall diagram of the system includes estimating brightness gradients, finding magnitude and direction, non-maximum suppression, and peak detection to interpolate position and compensate for bias using the closest point to the maximum on the edge. The invention provides an apparatus and method for subpixel detection in digital images and is summarized in a short version at the end of the patent.

  • 00:35:00 In this section, the speaker discusses the process of patenting an invention and how it relates to patent litigation. They explain how inventors often create both an apparatus and method in order to cover all bases and how this can result in unnecessary claims. The speaker describes a case in which a Canadian company, Matrox, was accused of violating a patent through their software implementation of what was in the patent. Expert witnesses were brought in to analyze the code and in the end, the conclusion was that it was all software and not patentable. The section also covers the importance of making a patent as broad as possible and thinking of all possible modifications, which can make patents written by lawyers difficult to read.

  • 00:40:00 In this section of the video, the speaker goes over formulas and a detailed explanation of how to convert Cartesian coordinates to polar coordinates. They also explain the different formulas used for finding peaks in parabolas and triangular waveforms. The video then goes into patents and the process of claiming what you think you came up with to protect it. The speaker reads out the first claim, which is an apparatus for detecting and subpixel location of edges in a digital image, and breaks down the different components that make up the claim, including a gradient estimator, a peak detector, and a subpixel interpolator. The importance of having multiple claims is also discussed, as it protects against future claims and infringement.

  • 00:45:00 In this section of the lecture, the speaker discusses how to write and structure claims for patents. He explains that the first claim in a patent is usually a broad claim, followed by narrower claims that are more specific to ensure that even if the broad claim is invalidated, the narrower claims may still stand. The speaker then goes on to examine the claims in the patent for gradient estimation, highlighting some of the conditions that need to be met for each claim to be valid. Finally, he explains how patent law has evolved over time with regards to the length of a patent’s validity and the rules surrounding priority claims.

  • 00:50:00 In this section, the video discusses edge detection in machine vision. The Mondrian model of the world is introduced, which involves condensing images into just discussing the edges to find where something is on a conveyor belt or line up different layers of an integrated circuit mask. Edge detection is defined as a process for determining the location of boundaries between image regions that are different and roughly uniform in brightness. An edge is defined as a point in an image where the image gradient magnitude reaches a local maximum in the image gradient direction or where the second derivative of brightness crosses zero in the image gradient direction. The video also touches on multi-scale edge detection and explains the downside of having infinite resolution for an image.
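
Writing n for the unit vector in the gradient direction, the two conditions in that definition can be stated compactly as a stationary point of the gradient magnitude along n, or a zero crossing of the second directional derivative of brightness:

```latex
\frac{\partial}{\partial n} \lVert \nabla E \rVert = 0
\qquad \text{or} \qquad
\frac{\partial^{2} E}{\partial n^{2}} = 0,
\qquad
\hat{n} = \frac{\nabla E}{\lVert \nabla E \rVert}
```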

  • 00:55:00 In this section of the lecture, the speaker discusses edge detection and the problems with measuring an edge that is perfectly aligned with a pixel. To combat this, the speaker explains the use of a Laplacian edge detector, which looks for zero crossings and draws contours, making it easier to locate the edge. However, this method can lead to worse performance in the presence of noise. The speaker also covers the concept of an inflection point and how it relates to the maximum of the derivative, which can be used to define the edge. The lecture also covers brightness gradient estimation and the use of operators at 45-degree angles to reference the same point.

  • 01:00:00 In this section of the lecture, the speaker discusses edge detection and the estimation of derivatives using different computational molecules. Two operators used by Roberts are introduced, which can be used in calculating the sum of squares of gradients in the original coordinate system. The concept of Sobel operators is also mentioned, and the estimation of the derivative using an averaging technique is discussed. The lowest order error term of the estimation is shown to be second order, making it not very reliable for curved lines. The higher-order terms are also introduced to improve accuracy.
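
As a concrete sketch of these computational molecules (a NumPy illustration, not the implementation discussed in the lecture or patent; names are illustrative), the Roberts cross differences along the two diagonals and the familiar Sobel kernels can be written as follows:

```python
import numpy as np

def roberts_gradient_squared(E):
    """Squared gradient magnitude from the Roberts cross operator:
    differences along the two diagonals of each 2x2 block of pixels.
    The sum of squares is rotation invariant (up to a constant scale
    set by the diagonal pixel spacing), so no coordinate rotation is
    needed.  E is a 2-D array of image brightness values.
    """
    d1 = E[1:, 1:] - E[:-1, :-1]     # difference along one diagonal
    d2 = E[1:, :-1] - E[:-1, 1:]     # difference along the other diagonal
    return d1 ** 2 + d2 ** 2

# The Sobel kernels, for comparison: a central difference in one
# direction combined with 1-2-1 smoothing in the other.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T
```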

  • 01:05:00 In this section, the lecturer describes using an operator to approximate a derivative for edge detection, allowing for a higher-order error term so that it works for a curved line as long as its third derivative isn't too large. By averaging two values and finding an estimate of the derivative, even derivatives that are offset by half a pixel can be used. Comparing two operators with the same lowest-order error term, the one with a smaller multiplier is found to be advantageous. However, applying the operator to estimate both the x and y derivatives leads to inconsistencies, which can be dealt with by using a two-dimensional operator. This approach is also useful for computing derivatives over a whole cube of data when estimating optical flow.

  • 01:10:00 In this section, the speaker emphasizes the importance of efficiency in these operators when performing edge detection over millions of pixels. By arranging the computations cleverly, the operator can be reduced from six operations to four. The speaker mentions the Roberts cross operator and Irwin Sobel, who modified the operator by averaging over a 2x2 block to reduce noise, at the cost of also blurring the image.

  • 01:15:00 In this section of the video, the lecturer discusses how to avoid the half-pixel offset problem in edge detection by using multiple operators. The discussion includes formula variations and implementation preferences. The lecture also explains the next steps, including the conversion from Cartesian to polar coordinates for the brightness gradient, quantization of the gradient direction, and scanning for maximum values. At this stage subpixel accuracy has not yet been achieved, because positions are still quantized to the pixel grid. The lecturer explains how to keep only the maxima by suppressing non-maxima in the image.
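
A small sketch of the quantization and non-maximum-suppression step might look like the following, assuming array coordinates with gx the derivative along columns and gy along rows, interior pixels only, and illustrative function names:

```python
import numpy as np

def quantize_direction(gx, gy):
    """Quantize a gradient vector to the nearest of the 8 neighbour
    directions on a square grid.  Returns an index 0..7 corresponding to
    angles 0, 45, ..., 315 degrees."""
    angle = np.arctan2(gy, gx)                    # in (-pi, pi]
    return int(np.round(angle / (np.pi / 4))) % 8

# (row, col) offsets for the 8 quantized directions above
STEPS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def is_local_max(mag, i, j, direction):
    """Non-maximum suppression at interior pixel (i, j): keep it only if
    its gradient magnitude beats its two neighbours along the quantized
    gradient direction.  The asymmetric comparison (strict on one side,
    non-strict on the other) acts as a tie-breaker for flat peaks."""
    di, dj = STEPS[direction]
    return mag[i, j] > mag[i + di, j + dj] and mag[i, j] >= mag[i - di, j - dj]
```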

  • 01:20:00 In this section, the video discusses the need for asymmetrical conditions in edge detection and a tie breaker for situations where g zero equals g plus or equals g minus. To find the peak of the curve, the video describes fitting a parabola to the edge with a tie-breaker, and it is shown that the s computed this way is limited in magnitude to half. Another method shown is a little triangle model, which assumes that the slopes of the two lines are the same and estimates the vertical and horizontal positions, resulting in the formula for s. Both methods are for achieving subpixel accuracy, and the video suggests that the triangle model may seem odd but is effective in certain circumstances.
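

For the parabola fit, a minimal sketch of the sub-pixel offset computation is shown below (the standard three-point vertex formula; the function name is illustrative). With g_0 at least as large as both neighbours, the numerator can never exceed the denominator in magnitude, which is why the offset is limited to one half, as noted above.

```python
def parabola_peak_offset(g_minus, g_0, g_plus):
    """Sub-pixel peak offset from three gradient-magnitude samples one
    pixel apart, by fitting a parabola through them and locating its
    vertex.  With g_0 at least as large as both neighbours, the result
    is limited in magnitude to one half."""
    denom = 2.0 * (2.0 * g_0 - g_plus - g_minus)
    if denom == 0.0:                 # perfectly flat: no unique peak
        return 0.0
    return (g_plus - g_minus) / denom
```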

  • 01:25:00 In this section, the lecturer discusses the shape of an edge in the case of defocus, specifically how it affects the method of recovering the actual edge position. He also talks about alternatives to the quantization of gradient direction and how it can be problematic, particularly on a square grid where there are only eight directions. This problem shows that there are many details to consider if one wants good performance, such as finding a good way to compute derivatives.
Lecture 11: Edge Detection, Subpixel Position, CORDIC, Line Detection (US 6,408,109)
  • 2022.06.08
  • www.youtube.com
MIT 6.801 Machine Vision, Fall 2020. Instructor: Berthold Horn. View the complete course: https://ocw.mit.edu/6-801F20 YouTube Playlist: https://www.youtube.com/p...