
Accelerating ML Inference at Scale with ONNX, Triton and Seldon | PyData Global 2021
In the video "Accelerating ML Inference at Scale with ONNX, Triton and Seldon | PyData Global 2021," Alejandro Saucedo of Seldon Technologies discusses the challenges of scaling machine learning inference and how to use ONNX and Triton to optimize and productionize models. Using the GPT-2 TensorFlow model as a use case, the session covers pre-processing, selecting optimal tokens, and deploying the model using Tempo and the Triton inference server. Saucedo emphasizes the need to abstract infrastructure complexities and facilitate easy deployment while ensuring reproducibility and compliance. The talk concludes by highlighting collaborations with open-source projects that provide end-to-end training and deployment components.
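As a rough illustration of the final serving step only, the sketch below sends a request to a Triton inference server over HTTP using the tritonclient package. The server URL, model name ("gpt2-onnx"), tensor names ("input_ids", "logits"), and the token ids are assumptions for the sketch, not details from the session, which itself works through Tempo and Seldon.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton inference server exposed on the default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical GPT-2 token ids produced during pre-processing.
input_ids = np.array([[15496, 11, 995]], dtype=np.int64)

# Tensor and model names must match the deployed model's configuration.
infer_input = httpclient.InferInput("input_ids", list(input_ids.shape), "INT64")
infer_input.set_data_from_numpy(input_ids)

response = client.infer(model_name="gpt2-onnx", inputs=[infer_input])
logits = response.as_numpy("logits")

# Greedy choice of the next token from the last position's logits.
print(int(np.argmax(logits[0, -1])))
```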
AI Show Live - Episode 62 - Multiplatform Inference with the ONNX Runtime
In the "Multiplatform Inference with the ONNX Runtime" episode of the AI Show Live, hosts showcase how to deploy a super resolution model and an object detection model on multiple platforms using the ONNX Runtime framework. They discuss pre-processing and post-processing steps for both mobile and web platforms, demonstrate the benefits of using a single solution, explain the process of converting a PyTorch model to an ONNX model, and showcase how to preprocess data for inference with the ONNX Runtime. Additionally, they demonstrate the implementation of the BERT natural language processing model using Onnx Runtime in C#. The code and open-source models are available for customization for users' solutions.
In the second part of the AI Show Live episode, the presenters cover a variety of topics related to running inference with the ONNX Runtime. They demonstrate the process of text classification using an example from the ONNX inference examples and explore the installation of packages and tools needed to build BERT classification models in C#. They also discuss the use of IntelliCode with VS 2022 and walk through the steps of preparing for model inference, including creating tensors, configuring the ONNX Runtime inference session, and post-processing the output. Additionally, they touch on the importance of consulting model documentation and selecting the correct tokenizer for accurate results.
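The episode walks through these steps in C#; a rough Python equivalent of the same sequence (tokenize, configure the session, run, post-process) is sketched below. The model file, tensor names, and tokenizer checkpoint are placeholders and should be taken from the model's documentation, as the episode advises.

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("ONNX Runtime makes inference fast.", return_tensors="np")

# Configure the inference session before loading the model.
options = ort.SessionOptions()
options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session = ort.InferenceSession("bert_classifier.onnx", options,
                               providers=["CPUExecutionProvider"])

outputs = session.run(None, {
    "input_ids": encoded["input_ids"].astype(np.int64),
    "attention_mask": encoded["attention_mask"].astype(np.int64),
    "token_type_ids": encoded["token_type_ids"].astype(np.int64),
})

# Post-process the logits into class probabilities.
logits = outputs[0]
probabilities = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(probabilities)
```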
Applied Machine Learning with ONNX Runtime
Jennifer Looper, a Principal Education Cloud Advocate at Microsoft, discusses the convergence of app building, machine learning, and data science in this video. She recommends building smart apps for the web and explores various JavaScript APIs, including ml5.js, Magenta.js, PoseNet, and Brain.js, for incorporating machine learning technology into apps. Looper emphasizes the usefulness of scikit-learn for classic machine learning and recommends it as a powerful tool that does not require the heavyweight machinery of neural networks. She also discusses ONNX Runtime, which optimizes training and inferencing for models built on ONNX's common set of operators, and uses data sourced from Kaggle to explain the process of performing a basic classification task with supervised machine learning. The speaker then demonstrates how to build a recommendation engine using machine learning models and suggests visiting Microsoft's online resources for learning more about machine learning. She concludes that ONNX Runtime is suitable for beginners as part of their curriculum or for anyone who wants to learn more about machine learning.
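A minimal sketch of that classic workflow, assuming the skl2onnx converter and a stand-in dataset (the talk uses data from Kaggle; iris keeps the example self-contained):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as ort

# Train a classic scikit-learn classifier.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=500).fit(X, y)

# Convert it to ONNX so it can be served anywhere ONNX Runtime runs, including the web.
onnx_model = convert_sklearn(clf, initial_types=[("input", FloatTensorType([None, 4]))])
with open("classifier.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

session = ort.InferenceSession("classifier.onnx", providers=["CPUExecutionProvider"])
labels = session.run(None, {"input": X[:2].astype(np.float32)})[0]
print(labels)
```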
Bring the power of ONNX to Spark as it never happened before
In this video, Shivan Wang from Huawei explains how to bring the power of ONNX to Spark for inference. He discusses the challenges of deploying DL models on Spark and how the Spark community has initiated a proposal, called SPIP, to simplify the process. The speaker also discusses Huawei's AI processor, Ascend, and the Ascend AI ecosystem, which includes multiple Ascend processor models and Atlas hardware. He suggests adding CANN as a new execution provider in ONNX Runtime so that ONNX models can run on Ascend hardware directly, without the need for model translation. Finally, he mentions that the POC code for bringing the power of ONNX to Spark is almost complete, and he welcomes interested users to leave a message to discuss and potentially provide resources for testing purposes.
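The SPIP and execution-provider work described in the talk is still in progress; as a hedged illustration of one way ONNX inference is commonly run on Spark today, the sketch below scores a DataFrame with a pandas UDF. The model path, tensor name, and single-feature schema are placeholders, not details from the proposal.

```python
import numpy as np
import pandas as pd
import onnxruntime as ort
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import FloatType

spark = SparkSession.builder.appName("onnx-on-spark").getOrCreate()
df = spark.createDataFrame(pd.DataFrame({"feature": np.random.rand(100)}))

@pandas_udf(FloatType())
def score(feature: pd.Series) -> pd.Series:
    # The model is loaded inside the UDF for simplicity; path and tensor name are placeholders.
    session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    inputs = feature.to_numpy(dtype=np.float32).reshape(-1, 1)
    outputs = session.run(None, {"input": inputs})[0]
    return pd.Series(outputs.ravel().astype(float))

df.withColumn("prediction", score("feature")).show(5)
```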
Builders Build #3 - From Colab to Production with ONNX
The video illustrates the process of deploying a project from Colab to production by using ONNX. The presenter covers various aspects such as pre-processing signals, modifying code for deployment, creating a handler on AWS Lambda, accepting audio input on a website, uploading a function to S3, and deploying dependencies for ONNX. Despite encountering some difficulties, the speaker successfully deploys their model with AWS and suggests that future steps could load a base64-encoded file object in the browser or read the raw audio bytes with a sound-file library.
Additionally, the video showcases the use of the SimCLR model for contrastive learning in audio, building a catalog of songs by feeding them into the model, and training it with PyTorch to attain zero loss and recall at k=1. The presenter discusses the challenges of using PyTorch in production and proposes ONNX as a solution. The video demonstrates how to export and load the PyTorch model in ONNX format and execute inference. It also shows how to process audio files using the torchaudio and NumPy libraries and troubleshoots issues when setting up a PyTorch model for deployment. The video offers insights on how to shift models from development in Colab notebooks to production environments.
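As a hedged sketch of what the final serving piece can look like, here is a minimal Lambda-style handler that decodes base64 audio and runs an exported encoder with ONNX Runtime. The file name "encoder.onnx", the tensor name "input", and the event shape are assumptions rather than the video's exact code.

```python
import base64
import io

import numpy as np
import onnxruntime as ort
import soundfile as sf

# Load the exported model once at cold start so warm invocations reuse it.
session = ort.InferenceSession("encoder.onnx", providers=["CPUExecutionProvider"])

def handler(event, context):
    # The request body carries base64-encoded audio uploaded from the website.
    audio_bytes = base64.b64decode(event["body"])
    waveform, sample_rate = sf.read(io.BytesIO(audio_bytes), dtype="float32")
    embedding = session.run(None, {"input": waveform.reshape(1, -1)})[0]
    return {"statusCode": 200, "body": embedding.tolist()}
```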
Combining the power of Optimum, OpenVINO™, ONNX Runtime, and Azure
The video showcases the combination of Optimum, OpenVINO, ONNX Runtime, and Azure to simplify the developer's workflow and improve the accuracy and speed of their models. The speakers demonstrate the use of helper functions, ONNX Runtime, and the OpenVINO Execution Provider to optimize deep learning models. They also show how to optimize Hugging Face models using quantization in the Neural Network Compression Framework (NNCF) and illustrate the training and inference process using Azure ML, Optimum, ONNX Runtime, and OpenVINO. The demonstration highlights the power of these tools in improving the performance of models while minimizing the loss of accuracy.
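A minimal sketch of the Optimum side of that workflow, assuming a public Hugging Face checkpoint as a stand-in and a recent Optimum release (where the export flag is export=True; older releases used from_transformers=True):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# Export the Hugging Face model to ONNX and load it with ONNX Runtime via Optimum.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Combining these tools simplifies the workflow."))
```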
Faster Inference of ONNX Models | Edge Innovation Series for Developers | Intel Software
The OpenVINO Execution Provider for ONNX Runtime is discussed in this video. It is a cross-platform machine learning model accelerator that allows for the deployment of deep learning models on a range of Intel compute devices. By using the OpenVINO toolkit, which is optimized for Intel hardware, and setting the provider as the OpenVINO Execution Provider in the code, developers can accelerate inference of ONNX models with advanced optimization techniques. The video emphasizes the simplicity of the modification required to utilize the tools discussed.
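A minimal sketch of that small modification, assuming the onnxruntime-openvino package is installed and "model.onnx" stands in for your model:

```python
import onnxruntime as ort

# Request the OpenVINO Execution Provider first and fall back to CPU if it is unavailable.
session = ort.InferenceSession(
    "model.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers are actually active
```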
Faster and Lighter Model Inference with ONNX Runtime from Cloud to Client
In this video, Emma from the Microsoft Cloud and AI group explains the Open Neural Network Exchange (ONNX) and ONNX Runtime, which is a high-performance engine for inferencing ONNX models on different hardware. Emma discusses the significant performance gain and reduction in model size that ONNX Runtime INT8 quantization can provide, as well as the importance of accuracy. She demonstrates the end-to-end workflow of ONNX Runtime INT8 quantization and presents the results of a baseline model using PyTorch quantization. Additionally, Emma discusses ONNX Runtime's ability to optimize model inference from cloud to client, and how the runtime can achieve a footprint of less than 300 kilobytes on both Android and iOS platforms by default.
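A minimal sketch of one form of ONNX Runtime INT8 quantization (dynamic, weight-only); the file names are placeholders, and the full workflow shown in the video may differ:

```python
import os
from onnxruntime.quantization import quantize_dynamic, QuantType

# Convert FP32 weights to INT8; input/output paths are placeholders.
quantize_dynamic(
    model_input="model_fp32.onnx",
    model_output="model_int8.onnx",
    weight_type=QuantType.QInt8,
)

print(os.path.getsize("model_fp32.onnx"), "->", os.path.getsize("model_int8.onnx"), "bytes")
```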
Fast T5 transformer model CPU inference with ONNX conversion and quantization
By converting the T5 transformer model to ONNX and applying quantization, it's possible to shrink the model to roughly a third of its original size and increase the inference speed by up to 5 times. This is particularly useful for deploying a question generation model such as T5 on a CPU with sub-second latency. Additionally, the Gradio app offers a visually appealing interface for the model. The T5 transformer model from Hugging Face is utilized, and the FastT5 library is used for the ONNX conversion and quantization. Implementing these optimizations can result in significant cost savings for production deployments of these systems.
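A minimal sketch of that pipeline, assuming fastT5's export_and_get_onnx_model helper and the t5-small checkpoint as stand-ins for the video's fine-tuned question generation model:

```python
from fastT5 import export_and_get_onnx_model
from transformers import AutoTokenizer

model_name = "t5-small"

# Export the T5 encoder/decoder to ONNX and apply quantization via fastT5.
model = export_and_get_onnx_model(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Illustrative prompt; the video uses a T5 model fine-tuned for question generation.
prompt = "generate question: ONNX Runtime speeds up transformer inference on CPU."
tokens = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    input_ids=tokens["input_ids"],
    attention_mask=tokens["attention_mask"],
    max_length=32,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```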
Azure AI and ONNX Runtime
The video covers various aspects of machine learning and its deployment. It discusses the evolution of data science, the challenges of framework compatibility, the use of Azure AI and ONNX Runtime for model deployment, the creation of ML environments, and the limitations of ONNX Runtime. The speaker emphasizes ONNX's standardization and its support for multiple frameworks, which makes it easier to optimize models for different hardware. The video also notes the absence of a benchmark for hardware preferences and the need to use multiple tools to overcome the limitations of ONNX.