AMD Developer Central: OpenCL Programming Webinar Series. 2. Introduction to OpenCL
2 - Introduction to OpenCL
This video provides a detailed introduction to OpenCL, a platform for parallel computing that can use CPUs and GPUs to accelerate computations. Programs written in OpenCL can be executed on different devices and architectures, allowing code to be ported across platforms. The video discusses the execution models in OpenCL, including data parallelism and task parallelism, and covers the objects and commands used in OpenCL, such as memory objects, command queues, and kernel objects. It also examines the advantages and limitations of OpenCL, such as the need for explicit memory management and the potential for significant performance improvements in parallel programs.
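As a rough sketch of how the objects mentioned in the summary fit together (context, command queue, memory objects, program and kernel objects), host code along the following lines launches a trivial data-parallel kernel. The kernel source, array size, and device choice here are illustrative assumptions, and error checking is omitted for brevity.

```c
#include <CL/cl.h>
#include <stdio.h>

/* Illustrative kernel source: doubles every element of a buffer. */
static const char *src =
    "__kernel void scale(__global float *a) {"
    "    size_t i = get_global_id(0);"
    "    a[i] = 2.0f * a[i];"
    "}";

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* Context and command queue: the host submits work to the device through the queue. */
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

    /* Program and kernel objects built from OpenCL C source. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "scale", NULL);

    /* Memory object initialized from host data. */
    float data[256];
    for (int i = 0; i < 256; ++i) data[i] = (float)i;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, NULL);

    /* Data-parallel launch: one work item per array element. */
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    size_t global = 256;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);
    printf("data[10] = %f\n", data[10]);

    /* Explicit cleanup, reflecting OpenCL's manual resource management. */
    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(prog);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```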
AMD Developer Central: OpenCL Programming Webinar Series. 3. GPU Architecture
3 - GPU Architecture
This video provides an overview of GPU architecture, noting the origins and primary use of GPUs as graphics processors. GPUs are designed for processing pixels with a high degree of parallelism, in contrast to CPUs, which are designed for scalar processing with low-latency pipelines. The architecture of GPUs is optimized for graphics-specific tasks, which may not be suitable for general-purpose computation. The speaker explains how the GPU maximizes the throughput of a set of threads rather than minimizing the execution latency of a single thread. The architecture of the GPU engine block is also discussed, including local data shares, wavefronts, and work groups. The video explores GPU architecture features that help increase the amount of packing the compiler can do, including issuing dependent operations in a single packet and supporting dependent counters with the global data share. Although GPU and CPU core designs share some similarities, their workloads would need to converge before the designs themselves do.
In this video on GPU architecture, the speaker also delves into the concept of barriers and their function. When a work group contains multiple wavefronts on a GPU, barriers are used to synchronize those wavefronts. However, if a work group contains only a single wavefront, barriers become meaningless and are reduced to no-ops.
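A minimal, hypothetical kernel makes the barrier behaviour concrete: work items in one work group stage data in local memory (the local data share) and must synchronize on a barrier before reading each other's results. The reduction pattern below is a generic example, not code from the webinar.

```c
/* Each work group sums its chunk of the input into local memory. The barrier
 * guarantees every wavefront in the group has written its value before any
 * work item reads it back; with a single wavefront per group the barrier can
 * be dropped, as noted above. */
__kernel void group_sum(__global const float *in,
                        __global float *group_totals,
                        __local float *scratch)
{
    size_t lid = get_local_id(0);
    size_t gid = get_global_id(0);
    size_t lsz = get_local_size(0);

    scratch[lid] = in[gid];
    barrier(CLK_LOCAL_MEM_FENCE);          /* all writes to scratch visible */

    /* Tree reduction within the work group. */
    for (size_t stride = lsz / 2; stride > 0; stride /= 2) {
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];
        barrier(CLK_LOCAL_MEM_FENCE);      /* resynchronize after each step */
    }

    if (lid == 0)
        group_totals[get_group_id(0)] = scratch[0];
}
```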
AMD Developer Central: OpenCL Programming Webinar Series. 4. OpenCL Programming in Detail
4 - OpenCL Programming in Detail
In this video, the speaker provides an overview of OpenCL programming, discussing its language, platform, and runtime APIs. They elaborate on the programming model, which requires fine-grained parallelization, work items and work groups (or threads), synchronization, and memory management. The speaker then discusses the n-body algorithm and its O(n²) computational complexity. They explain how the OpenCL kernel code updates the position and velocity of particles using Newtonian mechanics, how a cache is introduced to hold particle positions, and how the kernel uses float vector data types. The speaker also covers how the host code interacts with OpenCL kernels by setting parameters and arguments explicitly, transferring data between the host and the GPU, and enqueuing kernel execution with synchronization. Finally, the video explores how to modify the OpenCL code to support multiple devices, synchronize data between the GPUs, and assign each device its own half-sized arrays.
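The webinar's actual source is not reproduced here, but a simplified n-body kernel along the following lines illustrates the O(n²) force accumulation and the float4 vector types described above; the parameter names, data layout, and softening term eps are assumptions for illustration.

```c
/* Sketch of one n-body step: each work item owns one particle, accumulates
 * the gravitational acceleration from all n particles (the O(n^2) part), and
 * integrates position and velocity. Positions are float4 with .w = mass. */
__kernel void nbody_step(__global const float4 *pos_in,
                         __global float4 *pos_out,
                         __global float4 *vel,
                         const int n,
                         const float dt,
                         const float eps)   /* softening avoids divide-by-zero */
{
    int i = get_global_id(0);
    float4 p = pos_in[i];
    float4 a = (float4)(0.0f);

    for (int j = 0; j < n; ++j) {
        float4 q = pos_in[j];
        float4 d = q - p;
        d.w = 0.0f;                          /* keep the mass lane out of the math */
        float r2   = d.x * d.x + d.y * d.y + d.z * d.z + eps;
        float invr = rsqrt(r2);
        a += d * (q.w * invr * invr * invr); /* G folded into the masses */
    }

    float4 v = vel[i] + a * dt;
    vel[i] = v;
    pos_out[i] = (float4)(p.xyz + v.xyz * dt, p.w);
}
```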
The second part discusses further aspects of OpenCL programming. It covers topics such as the double-buffer scheme for synchronizing the updated particle positions between two arrays, OpenCL's limitations, and the difference between global and local pointers in memory allocation. Additionally, it highlights optimization techniques for OpenCL programming, including vector operations, controlled memory access, and loop unrolling, along with tools available for analyzing OpenCL implementations, such as profilers. The presenter recommends the OpenCL standard as a resource for OpenCL programmers and provides URLs for the standard and the ATI Stream SDK. The video also addresses questions on topics such as memory sharing, code optimization, memory allocation, and compute unit utilization.
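For the double-buffer scheme and explicit argument passing mentioned above, a host-side loop might look roughly like the sketch below. The buffer and kernel handles are assumed to have been created as in the earlier setup, and the argument order follows the hypothetical kernel above.

```c
#include <CL/cl.h>

/* The kernel reads from one position buffer and writes the other; swapping
 * the two cl_mem handles after each step avoids an extra copy. */
void run_steps(cl_command_queue queue, cl_kernel kernel,
               cl_mem pos_a, cl_mem pos_b, cl_mem vel,
               cl_int n, cl_float dt, cl_float eps,
               size_t global_size, int steps)
{
    for (int s = 0; s < steps; ++s) {
        /* Arguments are set explicitly by index, as the video notes. */
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &pos_a);   /* read  */
        clSetKernelArg(kernel, 1, sizeof(cl_mem), &pos_b);   /* write */
        clSetKernelArg(kernel, 2, sizeof(cl_mem), &vel);
        clSetKernelArg(kernel, 3, sizeof(cl_int), &n);
        clSetKernelArg(kernel, 4, sizeof(cl_float), &dt);
        clSetKernelArg(kernel, 5, sizeof(cl_float), &eps);

        clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                               &global_size, NULL, 0, NULL, NULL);

        /* Swap read/write buffers for the next iteration. */
        cl_mem tmp = pos_a; pos_a = pos_b; pos_b = tmp;
    }
    clFinish(queue);   /* wait for all enqueued steps to complete */
}
```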
AMD Developer Central: OpenCL Programming Webinar Series. 5. Real World OpenCL Applications
5 - Real World OpenCL Applications
In this video, Joachim Deguara talks about a multi-stream video processing application he worked on, with a key focus on performance optimization. The video covers various topics such as decoding video formats, using DMA to transfer memory between the CPU and GPU, double buffering, executing kernels, using event objects to synchronize and profile operations, OpenCL-OpenGL interop, processing swipe transitions between video streams, and choosing between OpenCL and OpenGL when implementing processing algorithms. Joachim also discusses various samples and SDKs available for OpenCL applications, but notes that there is currently no sample code available for the specific application discussed in the video.
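As an illustration of the event-based synchronization and profiling the talk describes, a command queue created with CL_QUEUE_PROFILING_ENABLE lets the host time an individual kernel launch from device-side timestamps. This is a generic sketch, not code from the application shown in the video.

```c
#include <CL/cl.h>

/* Returns the device-side execution time of one kernel launch in milliseconds.
 * Assumes the queue was created with CL_QUEUE_PROFILING_ENABLE. */
double profile_kernel_ms(cl_command_queue queue, cl_kernel kernel,
                         size_t global_size)
{
    cl_event evt;
    cl_ulong start = 0, end = 0;

    clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                           &global_size, NULL, 0, NULL, &evt);
    clWaitForEvents(1, &evt);   /* synchronize on this specific command */

    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(start), &start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(end), &end, NULL);
    clReleaseEvent(evt);

    return (double)(end - start) * 1e-6;   /* nanoseconds -> milliseconds */
}
```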
AMD Developer Central: OpenCL Programming Webinar Series. 6. Device Fission Extensions for OpenCL
6 - Device Fission Extensions for OpenCL
In this video, the speaker covers various topics related to device fission extensions for OpenCL. They explain the different types of extensions and how device fission allows large devices to be divided into smaller ones, which is useful for reserving a core for high-priority tasks or ensuring specific work groups are assigned to specific cores. They discuss the importance of retaining sequential semantics when parallelizing vector pushback operations, using parallel patterns to optimize the process, and creating native kernels in OpenCL. The speaker also demonstrates an application that utilizes device fission for OpenCL and discusses memory affinity and the future of device fission on other devices.
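The idea of carving a smaller device out of a larger one can be sketched with the OpenCL 1.2 core API clCreateSubDevices, which standardized the functionality of the device fission extension covered in the webinar (the extension itself exposes clCreateSubDevicesEXT with analogous property names). The partition counts below are illustrative and assume a CPU device with at least two compute units.

```c
#include <CL/cl.h>

/* Split a CPU device so one compute unit can be reserved for a
 * latency-sensitive task and the rest handle throughput work. */
void split_device(cl_device_id cpu)
{
    cl_uint units = 0;
    clGetDeviceInfo(cpu, CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(units), &units, NULL);

    /* Reserve 1 compute unit; give the remaining units to the second sub-device. */
    cl_device_partition_property props[] = {
        CL_DEVICE_PARTITION_BY_COUNTS,
        1, (cl_device_partition_property)(units - 1),
        CL_DEVICE_PARTITION_BY_COUNTS_LIST_END,
        0
    };

    cl_device_id sub[2];
    cl_uint num_sub = 0;
    clCreateSubDevices(cpu, props, 2, sub, &num_sub);

    /* sub[0] and sub[1] can now each get their own context and command queue,
     * so specific work is pinned to specific cores. */
}
```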
AMD Developer Central: OpenCL Programming Webinar Series. 7. Smoothed Particle Hydrodynamics
7 - Smoothed Particle Hydrodynamics
This video discusses Smoothed Particle Hydrodynamics (SPH), a technique for solving fluid dynamics equations, specifically the Navier-Stokes equations. The video explains the different terms in the equations, including the density, pressure, and viscosity terms, and how they are approximated using a smoothing kernel. The numerical algorithm used for SPH, as well as the use of spatial indexing and Interop, is also discussed. The speaker explains the process of constructing a spatial index and neighbor map and how the physics are computed. The video invites viewers to download and use the program and discusses the limitations of the simulation. The speaker then answers questions from the audience about GPU performance, incompressible behavior, and using cached images.
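As a rough illustration of the density term discussed above, the following kernel sums m_j · W(r, h) over all particles using the common poly6 smoothing kernel from the SPH literature; the webinar's own smoothing kernel and data layout may differ, and a real implementation would restrict the loop to the neighbours found through the spatial index the speaker describes.

```c
/* Brute-force SPH density estimate: positions are float4 with .w = mass,
 * h is the smoothing radius. */
__kernel void compute_density(__global const float4 *pos,
                              __global float *density,
                              const int n,
                              const float h)
{
    int i = get_global_id(0);
    float4 pi = pos[i];
    float h2 = h * h;
    /* Normalisation constant of the poly6 kernel: 315 / (64 * pi * h^9). */
    float norm = 315.0f / (64.0f * 3.14159265f * pown(h, 9));

    float rho = 0.0f;
    for (int j = 0; j < n; ++j) {
        float4 d = pos[j] - pi;
        float r2 = d.x * d.x + d.y * d.y + d.z * d.z;
        if (r2 < h2) {
            float w = h2 - r2;
            rho += pos[j].w * norm * w * w * w;   /* m_j * W(r, h) */
        }
    }
    density[i] = rho;
}
```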
AMD Developer Central: OpenCL Programming Webinar Series. 8. Optimization Techniques: Image Convolution
8 - Optimization Techniques: Image Convolution
In this video, Udeepta D. Bordoloi discusses optimization techniques for image convolution.
AMD Developer Inside Track: How to Optimize Image Convolution
How to Optimize Image Convolution
This video discusses various methods for optimizing image convolution, including using the local data share, optimizing constants, and using larger local areas to improve efficiency. The speaker emphasizes the importance of minimizing processing time in image convolution to improve overall performance and highlights a method of reusing data through local memory. The video suggests optimization steps such as using images or textures, starting from a brute-force version, and passing options to the compiler. A step-by-step article about optimizing image convolution techniques is available on the AMD developer website.
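The data-reuse step the video highlights can be sketched as a tiled convolution kernel in which each work group cooperatively loads its image tile, plus a halo, into local memory once and then convolves out of that shared copy. Tile size, filter radius, a 16x16 work-group size, and clamped edge handling are illustrative assumptions, not the article's exact approach.

```c
#define RADIUS 2
#define TILE   16

__kernel void convolve(__global const float *img,
                       __global float *out,
                       __constant float *filt,       /* (2*RADIUS+1)^2 weights */
                       const int width,
                       const int height)
{
    __local float tile[TILE + 2 * RADIUS][TILE + 2 * RADIUS];

    int lx = get_local_id(0), ly = get_local_id(1);
    int gx = get_global_id(0), gy = get_global_id(1);
    int x0 = get_group_id(0) * TILE - RADIUS;        /* tile origin incl. halo */
    int y0 = get_group_id(1) * TILE - RADIUS;

    /* Cooperatively load the (TILE+2R)^2 tile; each work item loads a few
     * pixels in a strided loop. Out-of-range reads are clamped to the edge. */
    for (int y = ly; y < TILE + 2 * RADIUS; y += TILE)
        for (int x = lx; x < TILE + 2 * RADIUS; x += TILE) {
            int sx = clamp(x0 + x, 0, width - 1);
            int sy = clamp(y0 + y, 0, height - 1);
            tile[y][x] = img[sy * width + sx];
        }
    barrier(CLK_LOCAL_MEM_FENCE);

    if (gx >= width || gy >= height) return;

    /* Every filter tap now comes from local memory instead of global memory. */
    float sum = 0.0f;
    for (int fy = 0; fy <= 2 * RADIUS; ++fy)
        for (int fx = 0; fx <= 2 * RADIUS; ++fx)
            sum += filt[fy * (2 * RADIUS + 1) + fx] * tile[ly + fy][lx + fx];
    out[gy * width + gx] = sum;
}
```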
AMD Developer Central: OpenCL Technical Overview. Introduction to OpenCL
Introduction to OpenCL
In this video, Michael Houston provides an overview of OpenCL, an industry standard for data-parallel computation targeting multi-core CPUs, mobile devices, and other forms of silicon. OpenCL aims to unify previously competing proprietary implementations such as CUDA and Brook+, which simplifies development for independent software vendors. It separates code that runs on the device from code that manages the device through a queuing system, designed with feedback from game developers. OpenCL is designed to work well with graphics APIs, creating a ubiquitous computing language that can be used for applications such as photo and video editing, as well as artificial intelligence systems, modeling, and physics. The presenter also discusses the use of OpenCL for Hollywood rendering and hopes to see more work in this area.
AMD Developer Central: Episode 1: What is OpenCL™?
What is OpenCL™?
This video provides an introduction to OpenCL and its design goals, which focus on using various processors to accelerate parallel computations rather than sequential ones. OpenCL enables writing portable code for different processors using kernels, global and local dimensions, and work groups. Work items within a work group can collaborate by sharing resources, but synchronization between work items in different work groups is not possible. Optimal problem dimensions vary for different types of processing, so choosing them well matters for performance. OpenCL can fully utilize a system's capabilities by expressing task and data parallelism together using the OpenCL event model.
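A toy kernel (not from the video) shows how the global index space relates to work groups and local IDs: in each dimension, a work item's global ID is its group ID times the work-group size plus its local ID. The output layout and width parameter are assumptions for illustration.

```c
__kernel void show_ids(__global int *out, const int width)
{
    /* Reconstruct the global coordinates from group and local IDs. */
    int gx = get_group_id(0) * get_local_size(0) + get_local_id(0);
    int gy = get_group_id(1) * get_local_size(1) + get_local_id(1);
    /* Equivalent to get_global_id(0) / get_global_id(1) when no global offset is used. */

    /* Record which work group produced each output element. */
    out[gy * width + gx] = (int)(get_group_id(1) * get_num_groups(0) + get_group_id(0));
}
```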