Quantitative trading - page 8

 

Rama Cont and Francesco Capponi: "Cross-Impact in Equity Markets"

Rama Cont and Francesco Capponi delve into the concept of cross-impact in equity markets through their analysis of order flow and price data. They assert that cross-impact signifies that the price of an asset is influenced not only by its own order flow but also by the order flow of other assets. While previous theoretical studies have attempted to derive the consequences of cross-impact effects and extend single asset optimal trade execution models to multiple assets, Cont and Capponi propose a more streamlined approach to explain correlations between asset returns and order flow.
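
As an illustration of this claim, here is a minimal, self-contained simulation sketch (not the authors' code; all parameters are made up): when order flow imbalances are correlated across two assets and each price responds only to its own imbalance, returns still appear correlated with the other asset's order flow.

```python
# Illustrative sketch: correlated OFIs + purely diagonal (self-only) impact still
# produce non-zero correlation between one asset's return and the other's OFI.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_assets = 50_000, 2

# Correlated OFIs generated from a common factor plus idiosyncratic noise.
common = rng.standard_normal(n_obs)
ofi = 0.7 * common[:, None] + 0.7 * rng.standard_normal((n_obs, n_assets))

# Diagonal impact: each return is driven only by the asset's own OFI.
impact = np.array([0.5, 0.8])          # hypothetical self-impact coefficients
returns = ofi * impact + 0.1 * rng.standard_normal((n_obs, n_assets))

corr = np.corrcoef(np.hstack([returns, ofi]).T)
print("corr(return_1, OFI_2):", round(corr[0, 3], 3))   # non-zero despite zero cross-impact
print("corr(return_2, OFI_1):", round(corr[1, 2], 3))
```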

They argue that a comprehensive matrix of price impact coefficients is not necessary to account for these correlations. Instead, they contend that the observed correlations can be attributed to the fact that market participants often engage in trading multiple assets, thereby generating correlated order flow imbalances across assets. To identify the significance of cross-impact coefficients and the main drivers of execution costs, the presenters suggest using a principal component analysis (PCA) on the correlation matrices of returns and order flow imbalances.
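
A hedged sketch of that PCA step is shown below, using synthetic data in place of the equity order-flow panel; the helper function and all dimensions are illustrative assumptions, not the paper's code.

```python
# Sketch: extract the first principal component of the correlation matrix of a panel,
# applied separately to returns and to order flow imbalances (OFIs).
import numpy as np

def first_principal_component(x: np.ndarray) -> np.ndarray:
    """Return scores of the first principal component of the correlation matrix of x."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)        # standardize columns
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)         # eigenvalues in ascending order
    leading = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
    return z @ leading                              # common-factor scores

rng = np.random.default_rng(1)
ofi = rng.standard_normal((10_000, 20))             # placeholder OFI panel (T x N)
returns = rng.standard_normal((10_000, 20))         # placeholder return panel (T x N)

ofi_factor = first_principal_component(ofi)
ret_factor = first_principal_component(returns)
print("corr(OFI factor, return factor):", np.corrcoef(ofi_factor, ret_factor)[0, 1])
```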

Cont and Capponi propose a parsimonious model for cross-impact in equity markets, focusing on a stock's own order flow imbalance and the correlation of order flow imbalances across stocks. They find that a one-factor model for order flow imbalance is sufficient to explain the cross-correlations of returns. This model can be utilized for portfolio execution and transaction cost analysis, with the presenters recommending a reliable model for single-asset impact coupled with a good model for the common factors in order flow across assets.
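
That recipe could look roughly like the sketch below, assuming a simple PCA-based common factor for order flow; the function names and OLS estimators are illustrative, not the authors' exact procedure.

```python
# Sketch of the recommended recipe: (1) model the common factor in order flow imbalance,
# (2) use a single-asset impact regression of each stock's return on its own OFI.
import numpy as np

def one_factor_decomposition(ofi: np.ndarray):
    """Split each column of an OFI panel into a loading on the first PC plus a residual."""
    z = (ofi - ofi.mean(axis=0)) / ofi.std(axis=0)
    _, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    factor = z @ eigvecs[:, -1]                              # common OFI factor
    loadings = z.T @ factor / (factor @ factor)              # OLS loading per stock
    residual = z - np.outer(factor, loadings)                # idiosyncratic OFI
    return factor, loadings, residual

def single_asset_impact(returns_i: np.ndarray, ofi_i: np.ndarray) -> float:
    """Self-impact coefficient from a univariate regression of returns on own OFI."""
    return np.cov(returns_i, ofi_i)[0, 1] / np.var(ofi_i, ddof=1)
```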

The speakers stress the importance of establishing a causal model and interpretation for the equation. They express their readiness to share additional materials and updates, emphasizing their commitment to furthering understanding in this area of research.

  • 00:00:00 In this section of the video, Rama Cont and Francesco Capponi discuss the concept of cross-impact in equity markets. They investigate this concept by analyzing order flow and price data from equity markets, and explain that market impact, or the execution of trades moving the price of an asset, contributes to the cost of execution. They also show that price movements are driven by the aggregate imbalance between supply and demand, and define the notion of order flow imbalance as a useful tool for building impact models.

  • 00:05:00 In this section, Rama Cont and Francesco Capponi discuss the linear impact of order flow imbalance in centralized order book markets. This aggregate imbalance between supply and demand is what drives the price, which can be expressed as a regression model in which the impact coefficient is an inverse measure of liquidity. The coefficient is strongly inversely related to the depth of the order book, and the impact coefficient can be extracted through a covariance calculation. While the study was previously done for single stocks, market participants are also interested in correlations across multiple assets, and positive correlations have been found between the order flow imbalance and returns of different securities.

  • 00:10:00 In this section, Rama Cont and Francesco Capponi discuss the concept of cross-impact and its theoretical and empirical studies. They explain that cross-impact refers to the fact that the price of an asset is not influenced solely by its own order flow but also by the order flow of other assets. Empirical studies have documented a positive correlation between the order flow of one asset and the price moves of another asset, at least within a homogeneous asset class. Theoretical studies have tried to derive the consequences of such cross-impact effects and have extended single-asset optimal trade execution models to multiple assets, where the model includes cross-impact effects. However, this leads to a large number of cross-impact coefficients that need to be estimated.

  • 00:15:00 In this section, the presenters discuss the concept of cross-impact and its relevance in explaining observable market phenomena. They question whether a full matrix of price impact coefficients is necessary to explain the correlations between asset returns and order flow in a market and whether a more parsimonious approach is possible. They also draw analogies to action at a distance in physics and discuss the need for an underlying mechanism that links assets together in order to establish causal impact. The aim is to design a multi-asset impact model that only includes necessary coefficients and avoids unnecessary complexity.

  • 00:20:00 In this section, the speakers argue that the concept of cross-impact is unnecessary to explain the co-variations in price moves and order flow imbalance observed in equity markets. The observed correlations can be explained by the fact that market participants often trade in multiple assets, generating correlated order flow imbalances across assets, which in turn lead to correlations in the returns of different assets. The speakers present a causal model diagram showing that the price of an asset is driven by its order flow imbalance, which is the algebraic sum of all buy and sell orders generated by market participants, including multi-asset trading strategies. They argue that the single-asset impact model is sufficient to explain these correlations and no additional cross-impact model is needed.

  • 00:25:00 In this section, the traditional view of supply and demand driving the price for each asset, creating correlations in the order flow imbalance, is compared to the cross-impact model, which posits a mechanism that influences the return of a stock at a distance. These assumptions can be tested with the available data on order flow and returns by conditioning on variables in the diagram and performing conditional regressions. The construction of impact models for multiple assets and the inherent identification problem it poses are discussed. A linear model with two variables, returns and OFI, is used to create matrices of theta and beta coefficients. The net order flow for a stock is defined as the inflow to the bid queue minus the outflow from the ask queue.

  • 00:30:00 In this section, Rama Cont and Francesco Capponi discuss the covariance of returns with the order flow imbalance and how it relates to the beta matrix and cross-impact. They emphasize that there is no need to have off-diagonal elements in beta to get off-diagonal elements in the covariance matrix, as the covariance can come from either the correlation of order flows or the cross-impact matrix. The example of two stocks with no correlation in the order flows but with cross-impact coefficients highlights the importance of knowing the correlation of order flows to identify the cross-impact coefficient. The covariance matrix is affected by both the correlation and the cross-impact coefficients in the model, which they illustrate numerically in different scenarios.

  • 00:35:00 In this section, Rama Cont and Francesco Capponi discuss the difference between modeling the correlation of order flow across stocks and modeling genuine cross-impact. They explain that simply observing a non-zero correlation between the order flow of one asset and the return of another asset does not imply that you need a non-zero cross-impact coefficient in the model. They also present examples of low order flow correlation with high cross-impact, and vice versa, to show that it is impossible to infer the cross-impact from these covariances alone. Finally, they discuss the data they analyzed, which includes the net order flow, order flow imbalances, and returns of 67 stocks from the NASDAQ 100 over two and a half years, and explain how they redefined and normalized the returns and order flow imbalances.

  • 00:40:00 In this section of the video, the speakers examine the relationship between the correlation of stock returns and the correlation of order flow imbalances of different stocks. By plotting these two correlations against each other for each pair of stocks, the speakers demonstrate that for the vast majority of stock pairs the two are very close to equal, suggesting that the correlation between returns and order flow imbalances could simply be due to the correlation between the imbalances themselves. To test whether a multivariate market impact model is needed, the speakers use a regression analysis and find that the cross-impact coefficients are very close to zero, indicating that even if they were identifiable, they contribute only a tiny fraction of the total impact.

  • 00:45:00 In this section, Rama Cont and Francesco Capponi propose a different approach to identify the significance of cross-impact coefficients and the main drivers of execution costs. They suggest using a principal component analysis (PCA) on the correlation matrices of returns and order flow imbalances and using a factor model for the order flow imbalance. The first principal component of the factor model is used to test the remaining significance of any cross-impact coefficient, and the residual of the regression is interpreted as the idiosyncratic order flow due only to activity in that stock. The approach aims to disentangle the contribution of the idiosyncratic component of a stock's own order flow from the common component due to cross trading.

  • 00:50:00 In this section of the video, Rama Cont and Francesco Capponi discuss the correlation between the first principal components of returns and order flow imbalances and the ETFs that track the Nasdaq 100 and S&P 500. They find that the first principal component of returns has a correlation of 91% with the overall return on the ETF that tracks the Nasdaq 100. Similarly, the first principal component of order flow imbalance has a correlation of 82% with the order flow imbalance on the ETF QQQ that tracks the same index. They also observe that the first principal components of both the returns and the order flow imbalance are related to the overall market movement. This leads them to explain their two-step approach: take out the commonality in order flow and then explain returns.

  • 00:55:00 In this section, Rama Cont and Francesco Capponi discuss cross-impact in equity markets, that is, how a stock's own order flow imbalance and the commonality of order flow among stocks affect a stock's return. They show that the self-impact coefficient is a major determinant of a stock's return, while the cross-impact coefficients are very small, and almost all become negative once the principal component is taken into account. They then test how much the cross-impact terms contribute to explaining returns and execution costs, assess whether they are statistically and economically significant, and question their stability over time.

  • 01:00:00 In this section, Rama Cont and Francesco Capponi discuss the magnitude and significance of cross-impact in equity markets. They conclude that while the statistics may be significant, the economic magnitude is small, and there is hardly any distinguishable difference in explanatory power when all the other order flow imbalances are included in the regression. They argue for a more parsimonious way of modeling impact and suggest using only a stock's own order flow imbalance and the correlation of order flow imbalances to model impact. They also emphasize the importance of stability over time and analyze subsamples to check that the cross-impact coefficients are stable.

  • 01:05:00 In this section, Rama Cont and Francesco Capponi summarize their findings on cross-impact models in equity markets. They argue that the phenomenon of positive covariation between returns and order flow imbalance across different stocks can be explained without introducing high-dimensional models with many coefficients. A simple one-factor model for order flow imbalance suffices to explain these patterns of cross-correlations of returns. They suggest that a better approach to building multi-asset impact models is to focus on building models of common factors in order flow, such as a linear factor model or a principal component analysis of order flow. Deploying a single-asset impact model relating a stock's own order flow to its return is then sufficient for explaining the magnitude of execution costs in portfolio execution.

  • 01:10:00 In this section, Rama Cont and Francesco Capponi discuss the practical applications of their model, specifically in the context of portfolio execution and transaction cost analysis (TCA). The model allows for the quantification of execution costs, taking into account the commonality in order flows between assets. The difference in execution costs between a single asset and a portfolio is linked to the commonality factor. The model can be used to measure portfolio-level execution costs and helps to better understand the impact of trading portfolios. They suggest using a good model for single asset impact coupled with a good model for the common factors in order flow across assets.

  • 01:15:00 In this section, the speakers discuss the use of the first principal component of returns in equation 12. They note that there is a high correlation between using the principal component of OFI and using returns, but they argue that they wanted to follow their causal analysis and model the commonality in order flow imbalances to explain returns. They emphasize the importance of having a causal model and interpretation for the equation. The speakers thank the audience for their attention and express their willingness to share further materials and updates.
Rama Cont and Francesco Capponi: "Cross-Impact in Equity Markets"
  • 2020.10.13
  • www.youtube.com
Title: "Cross-Impact in Equity Markets" Joint Work with Francesco CapponiAbstract: The empirical finding that market movements in stock prices may be correl...
 

Adam Grealish: "An Algorithmic Approach to Personal Investing"

Adam Grealish, Director of Investing at Betterment, provides insights into the company's algorithmic approach to personal investing and its goal-based strategy. Betterment utilizes a robo-advisory model, leveraging algorithms and minimal human intervention to deliver investment advice and management to its customers.

Grealish identifies the key factors that determine investment outcomes as keeping costs low, tax optimization, intelligent trading, asset allocation, and security selection, with Betterment placing the strongest emphasis on the first three. The company employs the Black-Litterman optimization technique to construct globally diversified portfolios and continuously monitors target weights across its customer base of half a million individuals. Tax optimization, including strategies like tax-loss harvesting, asset location, and lot sorting, offers opportunities to outperform the market.
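
For orientation, a minimal sketch of the standard Black-Litterman posterior-return formula follows; the assets, views, and confidence levels are invented for illustration and are not Betterment's inputs.

```python
# Standard Black-Litterman blend of equilibrium returns with investor views (textbook form).
import numpy as np

def black_litterman_posterior(pi, sigma, P, Q, omega, tau=0.05):
    """Blend equilibrium returns `pi` (from reverse optimization) with views (P, Q, omega)."""
    tau_sigma_inv = np.linalg.inv(tau * sigma)
    omega_inv = np.linalg.inv(omega)
    precision = tau_sigma_inv + P.T @ omega_inv @ P
    return np.linalg.solve(precision, tau_sigma_inv @ pi + P.T @ omega_inv @ Q)

# Tiny illustrative example: two assets, one view that asset 0 outperforms asset 1 by 2%.
sigma = np.array([[0.04, 0.01], [0.01, 0.09]])   # hypothetical covariance matrix
pi = np.array([0.05, 0.07])                      # equilibrium (CAPM-implied) excess returns
P = np.array([[1.0, -1.0]])                      # view portfolio
Q = np.array([0.02])                             # view return
omega = np.array([[0.0004]])                     # view uncertainty
print(black_litterman_posterior(pi, sigma, P, Q, omega))
```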

In the second part of his discussion, Grealish distinguishes Betterment's approach from traditional automated financial advisors. Unlike the "one-size-fits-all" approach of traditional robo-advisors, Betterment's algorithmic approach considers individual factors such as goals, time horizon, and risk tolerance. This customization allows for personalized portfolios tailored to each investor's unique situation. Betterment also offers additional features like tax-loss harvesting and tax-coordinated portfolios to maximize tax efficiency and increase returns.

Grealish further delves into the specifics of Betterment's investment strategies. The company encourages long-term allocation stability, adjusting portfolios only once a year to move toward the target allocation. They utilize trigger-based rebalancing algorithms to manage drift from the target allocation and minimize risks. Betterment's portfolios are constructed using broad market cap-based ETFs, optimizing exposure to risky asset classes with associated risk premiums.
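
A toy version of such a trigger-based check might look like the sketch below; the 3% threshold and the weights are illustrative assumptions, not Betterment's parameters.

```python
# Sketch of a drift-triggered rebalancing check.
import numpy as np

def portfolio_drift(current_weights: np.ndarray, target_weights: np.ndarray) -> float:
    """Half the sum of absolute weight deviations: the fraction of the portfolio off target."""
    return 0.5 * np.abs(current_weights - target_weights).sum()

def needs_rebalance(current_weights, target_weights, threshold=0.03) -> bool:
    return portfolio_drift(np.asarray(current_weights), np.asarray(target_weights)) > threshold

print(needs_rebalance([0.68, 0.32], [0.60, 0.40]))   # True: 8% of the portfolio has drifted
```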

Cost optimization is a significant aspect of Betterment's investment philosophy. The company takes advantage of the trend of decreasing fees on ETFs, reviewing the entire universe of ETFs on a quarterly basis. The selection process considers factors beyond expense ratio, including tracking error and trading costs, resulting in low-cost portfolios for Betterment's customers.

Tax optimization is another crucial element of Betterment's strategy. Grealish explains the importance of tax management and outlines three effective strategies: tax-loss harvesting, asset location, and lot sorting. Tax-loss harvesting involves selling securities at a loss to realize capital losses for tax purposes, while asset location maximizes after-tax returns by allocating assets across accounts strategically. Lot sorting entails selling lots with the largest losses first to optimize tax benefits.
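
The lot-sorting idea can be illustrated with a small sketch; the `TaxLot` data model and the numbers are hypothetical, not Betterment's implementation.

```python
# Sketch of lot sorting: when raising cash, sell tax lots with the largest losses first,
# then smaller losses, then the smallest gains before larger gains.
from dataclasses import dataclass

@dataclass
class TaxLot:
    shares: float
    cost_basis: float    # per-share purchase price
    price: float         # current per-share price

    @property
    def unrealized_gain(self) -> float:
        return (self.price - self.cost_basis) * self.shares

def sell_order(lots: list[TaxLot]) -> list[TaxLot]:
    """Sort lots from largest loss to largest gain."""
    return sorted(lots, key=lambda lot: lot.unrealized_gain)

lots = [TaxLot(10, 100, 90), TaxLot(5, 80, 120), TaxLot(20, 110, 95)]
for lot in sell_order(lots):
    print(lot.unrealized_gain)    # -300.0, -100.0, then 200.0
```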

Grealish acknowledges the impact of investor behavior on investment outcomes. Betterment combats negative behavior by implementing smart defaults, using automation, and encouraging goal-based investing. The company employs intentional design and data analysis to prompt users to take action when they deviate from their financial goals.

In terms of future developments, Grealish discusses the potential uses of AI in the fintech space. Betterment is exploring AI applications in automating financial tasks like robo-advising and cash management. The company aims to make financial services that were previously limited to high-net-worth individuals and institutions accessible to a broader audience. However, the complexity of individualizing tax preparation poses challenges in this area.

Overall, Adam Grealish provides valuable insights into Betterment's algorithmic approach to personal investing, emphasizing goal-based strategies, cost optimization, tax management, and behavior mitigation.

  • 00:00:00 Adam Grealish introduces Betterment, which is an online automated investment advising platform that uses a goal-based approach to investment management. Its aim is to deliver high returns to customers through optimal investment strategies. Betterment has a direct-to-customer business, a white-label technology platform for financial advisors, and a 401k business. The term "robo-advisor" accurately describes Betterment's approach, as it provides digital financial advice through algorithms executed by software with minimal human intervention.

  • 00:05:00 Adam Grealish, the Director of Investing for Betterment, explains their approach to investing, which is based on algorithms and mathematical modeling. The Betterment platform offers a completely hands-off investment management experience without the need for human interaction, as well as access to human advisors for those who want it. According to Grealish, the key factors that determine investment outcomes are keeping costs low, tax optimization, trading intelligently, asset allocation, and security selection. However, Betterment focuses primarily on the first three, as they are considered the most deterministic in achieving financial goals, while placing less emphasis on asset allocation and security selection. They use the Black-Litterman optimization technique to create a globally diversified portfolio and achieve the optimal return for their investors.

  • 00:10:00 In this section, Adam Grealish discusses how they help investors choose how much risk to take based on specific investment goals and time horizons. The app provides recommendations on how much risk to take, with projections on what that might look like over time. They then manage target weights through daily monitoring, doing so across half a million customers with up to 800,000 individual portfolios monitored on a daily basis. Rebalancing is viewed primarily as a risk management tool and is done in a tax-efficient manner when cash flows arise, when dividends are paid, or when taking fees out of the account. Grealish discusses a paper by Bushi (2012) that highlights the benefits of rebalancing a portfolio with uncorrelated securities over time. Finally, they differentiate goals based on liquidation profiles and build out a glide path based on how long the horizon is.

  • 00:15:00 Adam discusses how their algorithmic approach to personal investing works. They encourage investors to keep their allocation for a long time, adjusting it only once a year to move towards their target allocation. The team adjusts their clients' target allocation on a monthly basis, which allows marginal dollars to get closer to the correct risk target without having to do a rebalancing trade that entails potential risks. Their portfolios are entirely based on broad market cap-based ETFs, and they optimize their exposure to risky asset classes that carry a risk premium. The team employs a trigger-based rebalancing algorithm that measures the drift from the target allocation and, when it gets too far away, rebalances to manage risk. Lastly, Grealish notes that there is a big disconnect between knowing a lot about finance and knowing a lot about personal finance.

  • 00:20:00 In this section, Adam Grealish discusses the trend of decreasing fees on ETFs, which has been advantageous for Betterment as it is an independent robo-advisory firm that is not tied to any individual fund family. Betterment has a quarterly fund selection process in which the entire investable universe of ETFs is reviewed, and funds are ranked not only on their expense ratio but also on other factors such as tracking error and trading costs. Betterment focuses on the total annual cost of ownership, or "TACO" score, which is determined by the cost to hold, the cost to trade, and other factors. The process results in a low-cost portfolio for Betterment.

  • 00:25:00 In this section of the video, Adam Grealish discusses various aspects of Betterment's investment approach. He explains that their expected returns are generated through reverse optimization from the CAPM, and they use a Monte Carlo simulation engine that operates at the tax-lot level to test their tax strategies. Grealish also notes that disintermediating the fund industry by holding individual securities is an interesting idea that may lead to more tax-harvesting opportunities and personalization but has operational costs associated with it. Additionally, he explains how Betterment weighs the costs to hold and trade investments to provide an accurate measure of their total cost.

  • 00:30:00 Adam Grealish, Director of Investing at Betterment, discusses the importance of managing taxes in retail investing and outlines three strategies for effective tax management: tax-loss harvesting, asset location, and lot sorting. Tax-loss harvesting involves selling securities at a loss to realize capital losses for tax purposes and buying correlated securities to maintain market exposure. Betterment aims to maximize losses harvested while maintaining target risk allocations and avoiding wash sales, which occur when an investor sells a security at a loss and buys a substantially identical security within 30 days. Grealish also notes that tax management presents opportunities for outperforming the market and can result in substantial tax savings in certain situations.

  • 00:35:00 Adam advises against blindly switching back into a primary security after 30 days, as this can increase your tax liability: you could realize one dollar in long-term losses but then four dollars in short-term capital gains, leading to negative tax arbitrage. He also highlights that the lower tax rate on qualified dividends only kicks in after a 60-day holding period, and switching back too quickly can harm your tax efficiency. Grealish recommends choosing a secondary security with high correlation to the primary, comparable fees, and sufficient liquidity to ensure tax efficiency. When it comes to harvesting, Grealish suggests setting a threshold where expected benefits should be greater than transaction costs and opportunity costs, which can be determined using options theory, particularly if securities have high volatility. Grealish's back test shows an annual offset of close to 2 percent, but he warns that blindly following this strategy may not always be optimal.

  • 00:40:00 In this section, Adam Grealish discusses the benefits of tax loss harvesting and gives advice on how to apply it effectively in a personal account. Tax loss harvesting can be an effective way to manage risk, and back tested results show that it drives after-tax alpha. However, users need to consider transaction costs and the opportunity cost of future wash sales when applying this strategy for personal accounts. Asset location is another strategy which can maximize after-tax returns. By allocating assets across accounts to preserve the target allocation and the risk of the portfolio, users can boost their after-tax returns.

  • 00:45:00 Adam Grealish discusses tax treatments for different security types and provides an algorithmic approach to personal investing. He explains how to optimize investing in three accounts by moving the inefficient assets to tax-advantaged accounts and the efficient ones to taxable ones. This involves considering the growth rates, dividend yields, liquidation taxes, and qualified dividend income ratios of the assets and setting up the problem as a linear programming one. This algorithmic approach to investing adds about 50 basis points annually to a non-optimized strategy.

  • 00:50:00 In this section, Adam Grealish talks about tax-lot management and how Betterment helps its users sort all of their lots and sell the biggest losses first before moving on to gains and selling the smallest ones first. He also highlights the importance of losses for tax purposes and how they can be used against capital gains, written off against income, or carried forward. Grealish then discusses the issue of tax rate uncertainty and how Betterment addresses it through its Black-Litterman process by incorporating after-tax outperformance as a view and specifying a level of confidence around it. They then do a robust optimization on their posterior returns and construct an optimal portfolio out of that, while revisiting their capital market assumptions and strategic asset location on an annual basis. Finally, he elaborates on the increased allocation to muni bonds in their taxable portfolio due to their higher after-tax expected performance.

  • 00:55:00 Adam Grealish discusses the topic of behavior and how it affects retail investors. He explains how investors tend to buy when the market goes up and sell when it drops, which leads to underperformance and decreased wealth. To combat this, robo-advisors set smart defaults, use automation, and encourage goal-based investing to promote better behavior. Adam also mentions studies that quantify the annualized underperformance due to investor behavior, typically ranging from 1-4%.

  • 01:00:00 Adam discusses Betterment's approach to combatting bad investing behavior through intentional design and data analysis. He notes that around three-quarters of their accounts are not engaged in market timing, and the company closely monitors customer activity. Betterment uses color design to indicate when a customer is off track to meet their financial goal, prompting them to take action to get back on track. During times of market uncertainty, the company relies on its platform to test different messaging and interventions and found that notifying customers about negative market trends caused alarm and led to negative outcomes. Instead, interventions and messaging within the app proved to be more effective in reducing negative outcomes and increasing customer deposits.

  • 01:05:00 In this section, Adam Grealish, Director of Investing at Betterment, discusses the extent to which algorithmic investing is motivated by the desire to collect assets and whether it is ethical. He points out that the system primarily affects individuals who are off-target in their goals or on the margins of being on track, and says that there are better ways to extract assets if that were the company's goal. Other strategies he discusses include changing savings and deposits or altering one's goal plan. Grealish also describes Betterment's approach to mitigating behavioral biases, such as its "tax impact preview" feature that shows clients potential tax liabilities and has proved effective at reducing the likelihood of rash decision-making.

  • 01:10:00 Adam discusses the potential uses of AI in the fintech space. He believes that some of the first places AI will be seen is in automating peripheral pieces of finance like robo-advising and cash management. Betterment, for instance, is exploring the use of AI to map an external account to a proxy ticker and use transaction data for advising people on how much cash they should have in their checking account. Grealish also suggests that in the long-term, Betterment aims to put a financial advisor at the center of everyone's financial life and make things that were only available to ultra-high net worth and institutional investors broadly available, including tax preparation. However, individualizing tax preparation would make the problem space much more complex.

  • 01:15:00 Adam Grealish from Betterment explains that state-specific municipal bonds are not on the Betterment platform because it is not always obvious that being in-state is the best option, and it is a little bit like an off-menu item. While the Betterment platform allows you to link external accounts for any other real estate holdings and manually track your net worth, resource-intensive risk-return assessments of other funds are also not available. Betterment focuses on thinking about asset classes rather than precluding an asset class for tax reasons and is unique in the robo-advisory space due to its structure as an independent advisor and its push into customers' daily transactions, becoming a more full-service financial advisor. The company runs some of its research computations on AWS, although it is not a high user of AWS or existing public APIs yet.

  • 01:20:00 In this section, Adam Grealish discusses the trading process for Betterment. While they considered internalization of order flows for their customers, this option was ultimately not pursued due to its classification as an alternative trading venue. Betterment instead has its own trading desk, with trades executed via Apex, which also clears for them. Customers are not charged transaction costs, only the flat platform fee, which keeps trading infrequent. Betterment's portfolios are composed of equity and bond ETFs, with tax-aware fund choices on the bond side. Additionally, Betterment tracks returns relative to their expected return, so performance can be broken down into expected and realized components.
Adam Grealish: "An Algorithmic Approach to Personal Investing"
  • 2020.09.17
  • www.youtube.com
In this talk, Adam Grealish of Betterment will explore how technology can be used to improve investor outcomes. Technology and automation can play a signific...
 

Miquel Noguer i Alonso: "Latest Development in Deep Learning in Finance"

In this comprehensive video, Miquel Noguer i Alonso explores the potential of deep learning in the field of finance, despite the inherent complexities and empirical nature of the industry. Deep learning offers valuable capabilities in capturing non-linear relationships and recognizing recurring patterns, particularly in unstructured data and financial applications. However, it also presents challenges such as overfitting and limited effectiveness in non-stationary situations. To address these challenges, the integration of factors, sentiment analysis, and natural language processing can provide valuable insights for portfolio managers dealing with vast amounts of data. It is important to note that there is no one-size-fits-all model, and deep neural networks should not replace traditional benchmark models. Additionally, Alonso highlights the significance of BERT, an open-source and highly efficient language model that demonstrates a deep understanding of numbers in financial texts, making it particularly valuable for financial datasets.

Throughout the video, Alonso shares important insights and discusses various aspects of utilizing deep learning models in finance. He explores transforming financial data into images for analysis using convolutional neural networks, leveraging auto-encoders for non-linear data compression, and applying memory networks for time series analysis. Collaboration between domain experts and machine learning practitioners is emphasized as a critical factor for effectively addressing finance-related problems using deep learning techniques.

Alonso delves into the challenges encountered when working with deep learning in finance, such as the dynamic nature of the data generating process and the need to develop models that can adapt to these changes. He highlights concepts from information theory, complexity, and compressing information to find the most concise representation. The Universal Approximation Theorem is discussed, emphasizing the ability of deep neural networks to approximate any function with arbitrary precision, but generalization is not guaranteed. The speaker recommends further exploration of research papers on regularization, intrinsic dimensions of neural networks, and over-parameterized neural networks.

The speaker also touches upon the idea of an interpolating regime, where deep neural networks can uncover larger function classes that identify interpolating functions with smaller norms. They discuss the qualitative aspects of deep neural networks, emphasizing the varying importance of different layers and their role in time series prediction. However, it is stressed that linear models still serve as benchmarks, and the results of deep learning models should be compared against them.

Alonso provides insights into the performance of deep learning models in finance, showcasing the results of using long short-term memory networks with multiple stocks and demonstrating their superiority over other neural networks. Deep learning models are shown to outperform linear models in selecting the best stocks in the S&P 500, resulting in better information ratios out-of-sample. The speaker underscores that deep learning consistently performs well and can be a reliable choice when selecting a model.
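
As a rough illustration of the kind of experiment described, the sketch below fits a small LSTM on a synthetic panel of lagged returns; the data, window length, and architecture are assumptions, not the settings reported in the talk.

```python
# Sketch: LSTM on rolling windows of a multi-stock return panel, predicting one stock's
# next-day return. A linear regression on the flattened windows would be the benchmark.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
n_obs, lookback, n_stocks = 5_000, 30, 30
returns = 0.01 * rng.standard_normal((n_obs, n_stocks))      # placeholder return panel

# Build (sample, time, feature) windows; target is the next-day return of stock 0.
X = np.stack([returns[t - lookback:t] for t in range(lookback, n_obs)])
y = returns[lookback:, 0]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(lookback, n_stocks)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=256, verbose=0)
```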

Factors play a crucial role in deep learning models for finance, enabling exploration of non-linear relationships with returns. The utilization of non-linearity distinguishes this approach from pure time series exercises. The speaker also emphasizes the importance of parameter selection during the training period and cautions against assuming that using more data always leads to improved accuracy. It is important to note that these models do not incorporate costs or real-life considerations, as they are primarily for research purposes based on historical data.

The speaker clarifies the focus of their paper, highlighting that the intention is not to claim that deep neural networks are superior but rather to emphasize the need for them to be used alongside traditional benchmark models. The significance of capturing non-linear relationships and understanding recurring cycles is discussed, along with the need to consider parameters such as the learning window. Deep neural networks may provide unique insights in specific scenarios by capturing second or third order effects that linear models may overlook. However, it is stressed that there is no universal model, and deep neural networks should complement existing benchmark models rather than replacing them.

The application of natural language processing, specifically sentiment analysis, in finance is also explored. Given the vast amount of information generated in the markets, big data tools are essential for investigating and analyzing high-dimensional spaces. Machine learning, particularly deep learning, proves valuable in dealing with these challenges. Language models can be leveraged for tasks like sentiment analysis, which can provide insights into market momentum. Scraping the internet has proven to be an efficient approach for detecting information changes that may indicate shifts in the market. Overall, natural language processing offers valuable insights for portfolio managers dealing with large volumes of data.

In the video, the speaker delves into the two approaches to sentiment analysis in finance. The traditional method involves counting the frequency of positive and negative words, while the more advanced approach utilizes deep learning and word embeddings to grasp the contextual and semantic meaning of words. The speaker highlights the effectiveness of the bidirectional encoder representations from transformers (BERT), a cutting-edge language model that offers a more accurate and efficient representation of words. BERT's ability to understand numbers in financial texts is particularly crucial for accurate financial analysis. Other function approximators like multi-layer perceptrons, memory networks, and convnets (convolutional neural networks) are also mentioned as useful tools in finance.
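
The two approaches can be contrasted with a small sketch: a dictionary count (with a tiny stand-in word list rather than a full finance lexicon such as Loughran-McDonald) and, commented out, a transformer-based classifier from the `transformers` library, which would use a general-purpose rather than finance-tuned model.

```python
# Dictionary-based sentiment: count positive minus negative words (illustrative word lists).
POSITIVE = {"beat", "growth", "upgrade", "profit"}
NEGATIVE = {"miss", "loss", "downgrade", "default"}

def dictionary_sentiment(text: str) -> int:
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(dictionary_sentiment("Earnings beat estimates, analysts upgrade the stock"))  # 2

# Contextual approach (requires the `transformers` package and a model download):
# from transformers import pipeline
# classifier = pipeline("sentiment-analysis")
# print(classifier("Earnings beat estimates, analysts upgrade the stock"))
```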

Additionally, the speaker discusses the concept of transforming financial data into images and employing convolutional neural networks for analysis. This approach proves especially beneficial for unsupervised learning problems. The use of auto-encoders for non-linear data compression and memory networks for time series analysis is introduced. Memory networks can be suitable for analyzing time series data if the environment is sufficiently stable. Furthermore, the speaker touches upon the use of transformer models for language processing in finance and provides insights into their implementation using TensorFlow.
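
A hedged sketch of the "returns as images" idea follows: rolling windows of standardized returns are stacked into 2-D arrays and passed to a small CNN; the shapes, labels, and layer sizes are illustrative, since the talk does not specify an architecture.

```python
# Sketch: treat a (window x stocks) grid of returns as a one-channel image and classify it.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
window, n_stocks, n_samples = 32, 32, 2_000
images = rng.standard_normal((n_samples, window, n_stocks, 1)).astype("float32")
labels = (rng.random(n_samples) > 0.5).astype("float32")      # placeholder up/down labels

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window, n_stocks, 1)),
    tf.keras.layers.Conv2D(8, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(images, labels, epochs=1, batch_size=64, verbose=0)
```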

Regarding the implementation of open-source deep learning models in finance, the speaker emphasizes that while specific training for financial applications may be required, it is an achievable goal due to the abundance of open-source code available. Collaboration between domain experts and machine learners is crucial for solving finance-related problems, as there are numerous opportunities for leveraging machine learning in the field. The speaker notes that while handcrafted natural language processing approaches are currently utilized in finance, deep learning models have yet to be widely adopted in the industry.

The video also delves into traditional handcrafted NLP methods in finance, where practitioners use dictionaries to recognize entities such as JP Morgan while guarding against typos. The effectiveness of various machine learning algorithms, such as long short-term memory networks and BERT, is discussed. BERT is considered the state of the art in published research. The potential of machine learning for cross-sectional investments is also explored, suggesting the use of factors or returns to assist machines in interpreting flat returns or factors.

Addressing the difficulty of finding optimal hyperparameter values in deep learning, the speaker acknowledges that it can be an NP-hard problem. Human data scientists with experience and intuition must make heuristic choices based on their expertise. The challenge of understanding and interpreting deep neural networks is highlighted, as even mathematicians struggle to formulate equations to explain their exceptional performance. Qualitative analysis is often employed in such cases. However, over time and after working with various datasets, data scientists can develop an intuition for selecting the most appropriate parameters for specific situations.

  • 00:00:00 Miquel Noguer i Alonso discusses the application of deep learning in finance. He notes that deep learning has been successful in other areas such as image recognition and language models, but it is complicated to see how it can be successfully applied in finance due to the empirical and noisy nature of the industry. Despite the complexity, there are exciting possibilities for the use of deep learning in unstructured data and financial applications. The Artificial Intelligence Finance Institute is collaborating with universities and firms to research the use of AI in finance.

  • 00:05:00 In this section of the video, Miquel Noguer i Alonso discusses the potential of using machine learning models in finance and the lack of research being conducted in this area. He goes on to highlight the various fields of machine learning that can be used in finance, including supervised, unsupervised, and reinforcement learning. Noguer i Alonso encourages researchers to focus on building more tools for unsupervised learning as there is currently limited research in this area. He concludes by stating that there is no place in finance where machine learning cannot be utilized for purposes such as predicting credit losses and organizing data sets.

  • 00:10:00 The speaker introduces deep learning as an engine for classification, regression, and unsupervised learning problems through the use of non-linear functions. The neural network is explained as a non-linear function with a large number of parameters, which has led to warnings from statisticians and engineers about its feasibility. However, the pioneers of deep learning found the right combinations of activation functions, numbers of layers, and neurons that make it work against statistical expectations. The speaker also discusses the various architectures of deep learning, such as convolutional neural networks, recurrent neural networks, and transformers.

  • 00:15:00 The speaker discusses the pros and cons of deep learning in finance. On the plus side, deep learning models are better at capturing non-linearity and the expressive nature of datasets, and show efficiency in multivariate time series. They are also competitive with boosting trees, one of the best techniques for categorical and numerical data. However, the main cons are overfitting due to the large number of parameters in deep learning models and their lack of effectiveness in non-stationary situations, which is a big issue in finance as time series keep changing. The speaker notes that current models do not provide good solutions to this problem.

  • 00:20:00 Miquel Noguer i Alonso explains the challenges faced by deep learning in finance, particularly in the changing nature of the data generating process and how to create models that can work within it. One solution he suggests comes from information theory; the idea of complexity and compressing information to the shortest program possible. He also discusses the Universal Approximation Theorem and how it guarantees deep nets can approximate anything with arbitrary precision, but it is not guaranteed that they will generalize. He encourages readers to read a paper by Sun that argues that regularization is not sufficient for generalization, and recommends papers on the intrinsic dimensions of neural networks and over-parameterized neural networks.

  • 00:25:00 In this section, the speaker talks about the so-called interpolating regime, in which heavily over-parameterized deep nets fit the training data exactly, which might lead to the discovery of larger function classes that contain interpolating functions with smaller norms. The idea is that, despite the huge number of parameters, the network can find simpler representations. They also discuss the qualitative aspects of the models, such as how all the layers are not created equal and the role of deep neural networks in time series prediction. However, the benchmark models are still the linear models, and the results need to be compared with these benchmarks.

  • 00:30:00 The speaker discusses the performance of deep learning models in finance. They demonstrate the results of using long short-term memory networks with 30 stocks instead of just one, and note that the absolute error is lower in comparison to other neural networks. The speaker also shows how deep learning models outperform linear models in selecting the best stocks in the S&P 500, resulting in better information ratios out-of-sample. Overall, deep learning is found to be consistently close to the best models and is a good choice when choosing a model blindly.

  • 00:35:00 The speaker discusses the use of factors in deep learning models for finance. Factors such as quality, value, and momentum are used to investigate non-linear relationships with returns. The difference between this method and a pure time series exercise is the use of non-linearity. The speaker also discusses the importance of training period parameters, noting that using more data does not necessarily mean better accuracy. The model does not include costs or real-life considerations, as it is purely for research purposes and based on past data.

  • 00:40:00 In this section, the speaker discusses the paper they are updating and clarifies that the claim in the paper is not that deep nets are better but rather that they need to be run alongside traditional benchmark models. Additionally, the speaker explains that deep nets are useful in capturing nonlinear relationships and learning recurring cycles. However, the parameters, such as the window in which networks learn, also need to be considered. Furthermore, deep nets may be telling us different things in some regimes because they learn second or third order effects that a linear model may miss. The speaker also emphasizes that there is no one-size-fits-all model and that deep nets should not replace traditional benchmark models.

  • 00:45:00 Miquel Noguer i Alonso discusses the use of natural language processing in finance, specifically sentiment analysis. With the vast amount of information being generated in the markets, big data tools are needed to investigate, and machine learning, especially deep learning, can be useful for dealing with high-dimensional spaces. Language models can be used for tasks such as sentiment analysis, which can be a precursor of momentum in finance. Scraping the internet has also proven to be an efficient way to search for information changes that may indicate market shifts. Overall, natural language processing can provide useful insights for portfolio managers when dealing with large amounts of data.

  • 00:50:00 In this section, the speaker discusses the use of sentiment analysis in finance and the two ways in which it can be done: the traditional method of counting the frequency of positive and negative words and the more advanced method of using deep learning and word embeddings to understand the context and semantics of the words. The most advanced model is the bidirectional encoder representations from transformers (BERT), which allows for a more efficient and accurate representation of the words. This technology can be useful in areas such as wealth management and supply chain problems.

  • 00:55:00 In this section, Miquel Noguer i Alonso discusses the latest developments in deep learning in finance with a focus on the bidirectional encoder architecture, BERT, and the importance of numbers in language models. BERT is an open-source, highly efficient language model that can be trained on financial datasets, which can save time and human effort. It performs better than other models and is particularly good at understanding numbers in financial texts, which is crucial for accurate analysis. Multi-layer perceptrons, memory nets, and convnets are other function approximators that are useful in finance.

  • 01:00:00 Miquel Noguer i Alonso discusses the idea of transforming financial data into images and using convolutional neural networks to analyze them, which may be particularly useful for unsupervised learning problems. He also introduces the concept of auto-encoders, which can be used for non-linear compression of data, and memory networks, which may be suitable for time series analysis if the environment is stable enough. Finally, Noguer i Alonso mentions the use of transformer models for language processing in finance and how to implement these models in TensorFlow.

  • 01:05:00 In this section of the video, Miquel Noguer i Alonso, the Director of Financial Innovation and Senior Lecturer in Finance at the ESADE Business School, discusses the feasibility of implementing open source deep learning models in finance. He explains that there is a lot of open source code available, and while it may require training specifically for financial applications, it is not an unreachable goal. Alonso also emphasizes the importance of collaboration between domain experts and machine learners to solve finance-related problems, as there are many opportunities for machine learning in finance. Additionally, he notes that while there are handcrafted NLP approaches being used in finance, deep learning models are not widely adopted in this industry yet.

  • 01:10:00 The speakers discuss traditional handcrafted methods in finance, which involve people using dictionaries to describe entities like JP Morgan and ensuring there are no typos. They go on to discuss the use of machine learning in finance and the effectiveness of various algorithms, such as long short-term memory networks and BERT, which they suggest is currently the state of the art in published research. The speakers also discuss the potential for using machine learning for cross-sectional investments and suggest using factors or returns to help the machine make sense of flat returns or factors.

  • 01:15:00 In this section, Noguer i Alonso discusses the difficulty of finding optimal values in deep learning and how it can be an NP-hard problem, requiring the skill and intuition of a human data scientist to make heuristic choices based on experience. He highlights the challenges in understanding and interpreting deep nets, as even mathematicians struggle to create equations to understand why they work so well, and instead must fall back on qualitative analysis. Despite these challenges, after working with several datasets, data scientists can develop an intuition about the best parameters to use for a given situation.
Miquel Noguer i Alonso: "Latest Development in Deep Learning in Finance"
  • 2020.03.19
  • www.youtube.com
Title of Seminar: "Latest Developments in Deep Learning in Finance"Date of Seminar: 9/24/19Speaker Bio: Miquel Noguer i Alonso (Artificial Intelligence Finan...
 

Gordon Ritter: "Reinforcement Learning and the Discovery of Arbitrage Opportunities"

In this video, Gordon Ritter explores the application of reinforcement learning in the context of financial markets, specifically focusing on discovering arbitrage opportunities within derivatives trading. He emphasizes the significance of complex multi-period planning and strategy when faced with uncertainty. Ritter demonstrates the use of value functions to guide the search for optimal policies and proposes a reward function equal to the single-period wealth increment minus a constant times the squared deviation of that increment from its mean.
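
In code, the reward described here can be sketched as below; `kappa` and the running-mean estimate are illustrative choices rather than Ritter's exact implementation.

```python
# Sketch: one-period wealth increment penalized by its squared deviation from a running
# mean, which pushes a reward-maximizing agent toward mean-variance-style behavior.
def mean_variance_reward(delta_wealth: float, mean_estimate: float, kappa: float = 1e-4) -> float:
    return delta_wealth - 0.5 * kappa * (delta_wealth - mean_estimate) ** 2

# Example: a $500 gain when the typical increment is $100.
print(mean_variance_reward(500.0, 100.0))   # 500 - 0.5 * 1e-4 * 400**2 = 492.0
```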

Ritter discusses the process of creating a simulation that includes an arbitrage opportunity without explicitly instructing the machine where to find it. He highlights the use of stochastic simulations to model financial markets and suggests that with enough data, an agent trained through reinforcement learning can identify market arbitrage. However, he acknowledges the limitations of reinforcement learning, such as overfitting and the challenges in handling unforeseen scenarios. Further testing, such as exploring gamma neutrality trading strategies, is proposed to expand the capabilities of trained agents.

The video includes an analysis of the performance of a reinforcement learning agent compared to a baseline agent in derivatives hedging. The trained agent demonstrates significant cost savings while maintaining a similar range of realized volatility, showcasing its ability to make trade-offs between cost and risk. Ritter discusses the relevance of value functions in reinforcement learning for derivatives trading, as derivative prices themselves can be seen as a form of value function.

Ritter also highlights the importance of constructing appropriate state vectors and action spaces in reinforcement learning. Including relevant information in the state vector and defining appropriate actions are essential for effective decision-making. He presents the use of Ornstein-Uhlenbeck processes as a means to model mean-reverting dynamics, which can potentially lead to arbitrage opportunities.
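
A minimal simulation sketch of such Ornstein-Uhlenbeck dynamics follows; the parameter values are illustrative, not those used in Ritter's experiments.

```python
# Euler-Maruyama simulation of the mean-reverting OU process dX = theta*(mu - X) dt + sigma dW.
import numpy as np

def simulate_ou(n_steps: int, x0: float, mu: float, theta: float, sigma: float,
                dt: float = 1.0, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(1, n_steps):
        x[t] = x[t - 1] + theta * (mu - x[t - 1]) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

# A hypothetical mean-reverting price path around a level of 50.
prices = 50.0 + simulate_ou(1_000, x0=5.0, mu=0.0, theta=0.1, sigma=0.5)
```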

Additionally, the video discusses the challenges of using short-term returns for trading opportunities and the limitations of finite state spaces. Ritter suggests employing continuous state spaces and function approximation methods, such as model trees and neural networks, to address these challenges and improve the estimation of value functions.
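
The underlying update can be sketched in its simplest tabular form, with the understanding that the point above is to replace the table with a function approximator (a model tree or neural network) when the state space is continuous; the states, actions, and parameters below are hypothetical.

```python
# One-step Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

q = defaultdict(float)                       # tabular action-value function
actions = (-100, 0, 100)                     # e.g. sell / hold / buy a round lot
q_update(q, state=(0, 50.0), action=100, reward=1.2, next_state=(100, 50.5), actions=actions)
```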

Finally, Ritter acknowledges that while reinforcement learning can be a valuable tool in discovering arbitrage opportunities, it is not a guaranteed approach in real-life trading. He concludes by highlighting the potential of reinforcement learning to uncover profitable trades through stochastic systems but cautions against expecting it to find arbitrage opportunities if they do not exist in the market. The limitations of reinforcement learning, including overfitting and its inability to handle unforeseen scenarios, are also recognized.

  • 00:00:00 In this section, Gordon Ritter talks about learning through experience in order to optimize rewards over time in an uncertain environment. He provides examples of how robots can navigate through a room and how gazelles learn to walk by sending signals to their leg muscles. He also mentions that the best Go player in the world is now an agent trained by reinforcement learning methods, which is the focus of his talk. Ritter emphasizes the importance of complex multi-period planning and strategy in the presence of uncertainty, and how reinforcement learning can be applied in finance to discover arbitrage opportunities.

  • 00:05:00 Gordon Ritter explains the concept of reinforcement learning, which is the process of an agent interacting with the environment and taking action to optimize a reward signal. The agent observes the state of the environment and determines if their actions resulted in a positive or negative reward. Reinforcement learning involves the use of value functions to structure the search for optimal policies to maximize the expectation of long-term reward. Ritter notes that the idea of value functions is familiar to those with a background in mathematical finance.

  • 00:10:00 In this section of the video, Gordon Ritter discusses the concept of reinforcement learning, specifically the Hamilton-Jacobi-Bellman equation, which is used to find the value function of an optimal policy. However, he notes that in real-world scenarios it is sometimes not feasible to solve the equation explicitly. Ritter then introduces the action-value function, which gives the expected long-term gain of taking a particular action in a given state and following a policy thereafter. The goal of reinforcement learning is to find the optimal policy by finding the Q, or action-value, function that corresponds to it. Ritter then poses the question of whether artificial intelligence can discover an optimal dynamic trading strategy in a realistic scenario, taking into account the costs of trading such as bid-offer spread and commissions. He suggests that if there were an arbitrage in the market, an agent produced by reinforcement learning would be able to find it with enough data.

  • 00:15:00 Gordon Ritter discusses the use of reinforcement learning to discover arbitrage opportunities in financial markets. Ritter argues that, unlike traditional methods that rely on consistent arbitrage-free prices, reinforcement learning can be used to find out if there are any arbitrage opportunities in a given dynamical system. This approach can be used to train the algorithm to find strategies with high Sharpe ratios, which can be used to identify statistical arbitrage, which is not a pure arbitrage but a good trading strategy. Ritter claims that such an approach is similar to AlphaGo Zero, which learned to play Go with zero human guidance and beat human champions.

  • 00:20:00 Gordon Ritter explains the assumptions used when maximizing expected utility of wealth and how it is mathematically equivalent to maximizing the mean-variance quadratic form. He clarifies that a quadratic function cannot itself be a utility function and explains the reward signal he uses to train rational agents to act like von Neumann-Morgenstern investors. He suggests a reward equal to the single-period wealth increment minus a constant times its squared deviation from the mean, a variance penalty (see the sketch after this list), and advises on choosing what to put in the state, emphasizing the importance of including relevant information that helps the agent make good decisions.

  • 00:25:00 Gordon Ritter discusses how to construct a state vector and action space in reinforcement learning. He explains that in order for an agent to learn to use a signal to make a trading decision, that signal must be included in the state vector. Furthermore, the action space should include choosing which execution strategy to use, choosing a parameter in an algorithm to change its behavior, or deciding whether to cross the spread or join a queue on the near side of the order book. Ritter also provides an example of how Ornstein-Uhlenbeck processes can be used in finance to model mean-reverting dynamics, which could lead to an arbitrage opportunity; a simulation of such a process is sketched after this list.

  • 00:30:00 In this section, Gordon Ritter discusses building a stochastic simulation that contains at least an approximate arbitrage, in the form of a statistical arbitrage rather than a guaranteed profit. He emphasizes that the agent has to figure out everything by playing the game and losing a few times. The simulation has a spread cost and an impact cost based on a linear price impact function, and he sometimes experiments with a multiplier in front of the overall cost. He says that the state vector can be quite simple: the state only contains what the agent holds and the price, which carries the signal. Finally, he notes that this is just a proof of concept, since it is not guaranteed to work in real-life trading.

  • 00:35:00 Gordon Ritter discusses the process of creating a simulation that has an arbitrage opportunity without explicitly telling the machine where to look for it. He explains that it works by learning a value function through a classical method called Q-learning. However, he admits that he does not particularly like the tabular (lookup-table) representation of the Q function, because each matrix element has to be learned independently, with no continuity across neighboring states. Ritter also presents a plot of the value function as a function of the price for various actions, showing the emergence of a no-trade zone around the equilibrium price.

  • 00:40:00 In this section, Gordon Ritter discusses the limitations of using short-term returns for trading opportunities and the challenges that arise when using a finite state space. He suggests using continuous state spaces and function approximation methods, such as model trees, to estimate the Bellman value function Q and find the best unknown function that fits the training data. This method allows for a more efficient and effective way of approximating the value function and finding trading opportunities.

  • 00:45:00 Gordon Ritter discusses the use of statistical machine learning techniques, such as function approximators, to train reinforcement learning agents to approximate the expected long-term reward. By using a better function approximator, such as a neural network, the Bellman value function can be approximated more accurately as a continuous function, allowing for a better understanding of optimal actions. Ritter then applies these techniques to the example of derivatives hedging, where banks would like to neutralize risks in positions without dumping the derivatives on the market. The goal is to use reinforcement learning agents that can optimally trade a basket of derivatives based on dynamic replicating strategies, allowing for automatic hedging and reducing costs from market impacts.

  • 00:50:00 In this section, Gordon Ritter discusses the state variables that must exist, minimally, in a European options market to enable a dynamic replicating portfolio strategy. He states that the state variables that would go into computing the delta in a Black-Scholes type world are the underlying price and the time to expiration, with the strike price of the option being part of the definition of what the option is. Furthermore, he mentions that the state does not need to contain the option Greeks, and the agent is expected to learn those nonlinear functions itself. He concludes by saying that the machine can only learn from experience, so a large experience set is generated through simulation.

  • 00:55:00 Gordon Ritter discusses the output of his reinforcement learning agent, which trades off cost against volatility, and compares it to a baseline agent that uses delta hedging. The trained agent tracks the delta-hedge position more smoothly, whereas the baseline agent trades excessively and incurs higher costs from strict delta hedging. The trained agent has learned to make a trade-off between cost and risk, and Ritter notes that it is acceptable to accept some volatility in exchange for a large cost saving. Although the market was simulated with high trading costs, the trained agent still performed better than the baseline agent.

  • 01:00:00 In this section, the speaker presents histograms of simulations to compare the performance of the Delta agent and the reinforcement learning method. The Delta agent shows highly predictable realized vol, but the trained agent shows significant cost savings while maintaining a similar range of realized vol. The speaker suggests further testing, such as looking at trading strategies that achieve gamma neutrality, which could potentially be discovered by the agent. The speaker concludes that the use of value function-based methods, as seen in reinforcement learning, intersect well with the field of derivatives trading, as derivative prices themselves are a form of value function.

  • 01:05:00 Gordon Ritter explains that reinforcement learning can be used to discover arbitrage opportunities by training a stochastic system that can find profitable trades. However, if the system fails to find any opportunity after millions or billions of simulations, it may indicate that the market does not admit arbitrage. He also discusses the limitations of reinforcement learning, including overfitting and the inability to handle infinite trades and unforeseen scenarios such as flash crashes.
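
The reward signal and the mean-reverting signal process described in the 00:20:00 and 00:25:00 segments can be sketched as follows. This is an illustrative reading of the summary above, not Ritter's exact formulation, and the constants are assumptions.

```python
# Illustrative only: a quadratic-utility-style reward and an Ornstein-Uhlenbeck
# signal of the kind described above.  kappa, mu, theta, and sigma are assumptions.
import numpy as np

def reward(delta_wealth, mean_delta_wealth=0.0, kappa=1e-4):
    """One-period wealth increment penalized by its squared deviation from the
    mean, so the agent behaves like a mean-variance investor."""
    return delta_wealth - 0.5 * kappa * (delta_wealth - mean_delta_wealth) ** 2

def simulate_ou(n=1_000, mu=100.0, theta=0.1, sigma=0.5, seed=0):
    """Ornstein-Uhlenbeck process: the price mean-reverts toward mu at rate theta."""
    rng = np.random.default_rng(seed)
    p = np.empty(n)
    p[0] = mu
    for t in range(1, n):
        p[t] = p[t - 1] + theta * (mu - p[t - 1]) + sigma * rng.standard_normal()
    return p
```
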
Gordon Ritter: "Reinforcement Learning and the Discovery of Arbitrage Opportunities"
Gordon Ritter: "Reinforcement Learning and the Discovery of Arbitrage Opportunities"
  • 2019.05.30
  • www.youtube.com
Seminar Date: March 20, 2019Info: Reinforcement learning is a way of training a machine to find an optimal policy for a stochastic optimal control system, ...
 

Marcos Lopez de Prado: "The 7 Reasons Most Machine Learning Funds Fail"



Marcos Lopez de Prado: "The 7 Reasons Most Machine Learning Funds Fail"

Marcos Lopez de Prado delivered a comprehensive presentation outlining the reasons behind the failure of most machine learning funds in the finance industry. He stressed the significance of several key factors that contribute to success in this domain.

One of the primary factors highlighted by de Prado was the absence of a well-formulated theory in discretionary funds. He noted that many investment conversations lack a constructive and abstract approach due to the lack of a solid theoretical foundation. Without a theory to guide decision-making, discretionary funds struggle to interact with others and test their ideas, resulting in poor choices and potential losses.

De Prado also discussed the detrimental effects of working in isolated silos within machine learning funds. He emphasized that collaboration and communication are essential for success, warning against hiring numerous PhDs and segregating them into separate tasks. Instead, he advocated for a team-based approach where specialists work independently but possess knowledge of each other's expertise, leading to better strategies and outcomes.

Specialization within the team was another crucial aspect highlighted by de Prado. He stressed the importance of assembling a group of specialists capable of handling complex systems and tasks. These experts should possess independent skills while understanding the overall strategy and being aware of their colleagues' fields of expertise. This meta-strategy paradigm is valuable not only for developing effective strategies but also for making informed decisions in uncertain situations, including hiring, investment oversight, and defining stopping criteria.

Proper handling of financial data was another key factor discussed by de Prado. He emphasized the need to achieve stationarity in data while preserving valuable information. He suggested fractionally differentiating the data so that memory from previous observations is retained, enabling predictions at critical points. Additionally, he advised choosing a weight threshold so that the stationary series remains almost perfectly correlated with the original series without carrying excessive memory. De Prado cautioned against defaulting to simple returns where there are no liquid futures contracts, recommending the use of a single observation in most scenarios.
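
As an illustration of the idea (not de Prado's exact implementation), fractional differentiation applies binomially decaying weights to past observations, so the series becomes closer to stationary while retaining memory; the differencing order d and the weight cut-off below are assumed parameters.

```python
# Sketch of fixed-width fractional differentiation; d and the weight threshold
# are illustrative choices, not recommendations from the talk.
import numpy as np

def frac_diff_weights(d, threshold=1e-4, max_len=1_000):
    """Weights of (1 - B)^d: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k."""
    w = [1.0]
    for k in range(1, max_len):
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        w.append(w_k)
    return np.array(w)

def frac_diff(series, d=0.4, threshold=1e-4):
    """Fractionally differentiate a 1-D array, keeping memory of past values."""
    w = frac_diff_weights(d, threshold)
    width = len(w)
    out = np.full(len(series), np.nan)
    for t in range(width - 1, len(series)):
        # w[0] weights the current value, w[1] the previous one, and so on
        out[t] = np.dot(w, series[t - width + 1:t + 1][::-1])
    return out
```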

Sampling frequency and appropriate labeling of data were also addressed by de Prado. He proposed basing the sampling frequency on the arrival of market information rather than relying on conventional methods like daily or minute observations. By using techniques like dollar bars, which sample whenever a fixed dollar value has been transacted, one can ensure that each sample contains roughly equal amounts of information. Proper labeling of observations, such as using the triple-barrier labeling method, allows for the development of risk-aware strategies, taking into account price dynamics and the possibility of being stopped out.
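
A compact sketch of both ideas under assumed thresholds follows: dollar bars that close whenever a fixed dollar value has traded, and a simplified triple-barrier labeler with profit-taking, stop-loss, and time barriers. This is a toy version for illustration, not the method as presented in the talk.

```python
# Toy sketches of dollar bars and triple-barrier labeling; all thresholds are
# illustrative assumptions.
import numpy as np

def dollar_bars(prices, volumes, bar_dollar_value=1_000_000.0):
    """Return indices at which a bar closes, once the traded dollar value
    since the last bar exceeds bar_dollar_value."""
    closes, running = [], 0.0
    for i, (p, v) in enumerate(zip(prices, volumes)):
        running += p * v
        if running >= bar_dollar_value:
            closes.append(i)
            running = 0.0
    return np.array(closes)

def triple_barrier_label(prices, start, take_profit=0.02, stop_loss=0.02,
                         max_holding=20):
    """Label +1 if the upper barrier is touched first, -1 for the lower
    barrier, 0 if the vertical (time) barrier expires first."""
    p0 = prices[start]
    end = min(start + max_holding, len(prices) - 1)
    for t in range(start + 1, end + 1):
        ret = prices[t] / p0 - 1.0
        if ret >= take_profit:
            return 1
        if ret <= -stop_loss:
            return -1
    return 0
```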

The concept of meta-learning, where one machine learning model predicts whether another model's predictions are correct, was discussed as a means to balance precision and recall. By composing two separate models, one can manage the trade-off between precision and recall using their harmonic mean (the F1 score). De Prado recommended employing different machine learning algorithms for these distinct tasks to optimize performance.
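
A minimal sketch of the two-model composition on synthetic data: a primary model proposes the trade and a secondary model decides whether to act on it, with the harmonic mean of precision and recall (the F1 score) as the evaluation metric. The data, features, and model choices are placeholders, not recommendations from the talk.

```python
# Sketch of composing a primary (direction) model with a secondary
# (act / don't act) model, scored by the F1 measure.  Synthetic data only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((5_000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(5_000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Primary model: proposes the side of the trade (here: up vs. not up).
primary = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
side_tr = primary.predict(X_tr)
side_te = primary.predict(X_te)

# Secondary model: learns when the primary model's call is actually correct.
meta_y_tr = (side_tr == y_tr).astype(int)
secondary = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    np.column_stack([X_tr, side_tr]), meta_y_tr)

act = secondary.predict(np.column_stack([X_te, side_te]))
final = side_te * act                      # trade only when the filter agrees
print("precision:", precision_score(y_te, final),
      "recall:", recall_score(y_te, final),
      "F1:", f1_score(y_te, final))
```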

De Prado highlighted the challenges of applying machine learning in finance, emphasizing the need for human experts to filter data before using machine learning algorithms. Financial data is inherently messy and non-iid, making it difficult to link specific observations to assets. Moreover, the constant changes in financial markets due to regulations and laws necessitate a careful and nuanced approach to implementing machine learning algorithms. Simply plugging financial data into a machine learning model is not sufficient for success in finance.

Addressing the issues of non-uniqueness and overfitting was another significant aspect of de Prado's presentation. He proposed a methodology to determine the uniqueness of observations, recommending the removal from the training set of observations whose information overlaps in time with the testing set, a process known as "purging." This helps create more accurate machine learning models by aligning with the independence assumptions of cross-validation techniques. De Prado also warned against the dangers of overfitting, emphasizing that repeatedly back-testing strategies can lead to false positives and diminishing usefulness over time. Considering the number of trials involved in discovering strategies is crucial to avoid overfitting and false positives. De Prado advised setting a high threshold for the performance of strategies to mitigate the risks associated with overfitting.
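
A simplified sketch of purging on synthetic label windows: training observations whose information windows overlap the test window (optionally extended by an embargo) are dropped. The window construction here is an assumption for illustration, not the talk's specification.

```python
# Sketch of purging overlapping observations from the training set.
# Each observation i uses information over [starts[i], ends[i]] (e.g. the span
# of its triple-barrier label).  Window construction is illustrative.
import numpy as np

def purged_train_indices(starts, ends, test_start, test_end, embargo=0):
    """Keep a training index only if its information window does not overlap
    the test window (extended by an optional embargo)."""
    test_end_emb = test_end + embargo
    keep = []
    for i, (s, e) in enumerate(zip(starts, ends)):
        overlaps = (s <= test_end_emb) and (e >= test_start)
        if not overlaps:
            keep.append(i)
    return np.array(keep)

# Example: 100 observations, each label spanning 5 periods; test fold = [60, 70].
starts = np.arange(100)
ends = starts + 5
train_idx = purged_train_indices(starts, ends, test_start=60, test_end=70,
                                 embargo=2)
```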

The concept of the "deflated Sharpe ratio" was introduced by de Prado, illustrating that many hedge funds exhibit negative skewness and positive excess kurtosis, even if fund managers did not intentionally target these characteristics. This is primarily because fund managers are evaluated based on the Sharpe ratio, and these statistical properties can inflate the ratio. De Prado emphasized the importance of considering the sample size and number of trials involved in producing a discovery when analyzing returns. He cautioned against investing in strategies with a low probability of achieving a true Sharpe ratio greater than zero.
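
The calculation underlying this idea can be sketched via the probabilistic Sharpe ratio, which estimates the probability that the true Sharpe ratio exceeds a benchmark given skewness, kurtosis, and sample length; the deflated version further raises the benchmark to account for the number of trials. The code below is a hedged sketch with synthetic inputs, not de Prado's implementation.

```python
# Sketch of the probabilistic Sharpe ratio, the building block behind the
# deflated Sharpe ratio.  The full deflated version would additionally raise
# the benchmark to reflect the number of trials attempted.
import numpy as np
from scipy.stats import norm, skew, kurtosis

def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0):
    r = np.asarray(returns, dtype=float)
    n = len(r)
    sr = r.mean() / r.std(ddof=1)                 # per-period Sharpe ratio
    g3 = skew(r)                                   # sample skewness
    g4 = kurtosis(r, fisher=False)                 # raw (non-excess) kurtosis
    denom = np.sqrt(1.0 - g3 * sr + (g4 - 1.0) / 4.0 * sr ** 2)
    return norm.cdf((sr - sr_benchmark) * np.sqrt(n - 1) / denom)

# Example with synthetic daily returns.
rng = np.random.default_rng(0)
psr = probabilistic_sharpe_ratio(0.0005 + 0.01 * rng.standard_normal(1_000))
```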

Achieving a balance between model fit and overfitting was underscored by de Prado. He advised against striving for a perfect fit, as it can lead to overconfidence and increased risk. Instead, he recommended finding a way to preserve the important memory in the data while still applying statistical models effectively. De Prado also cautioned against using overly complicated models, as they make feeding in data and cross-pollinating ideas more difficult, impeding the overall effectiveness of machine learning algorithms.

De Prado addressed the phenomenon in the industry where certain traits or metrics become preferred, leading to a convergence of strategies. Comparing it to the breeding of dogs, where human preference and aesthetic shape certain traits, he explained how the use of specific metrics, such as the combination of Sharpe ratio and negative skewness, has become favored in hedge funds, even if it was not initially targeted. Addressing this phenomenon proves challenging, as it occurs without any specific triggering event.

Furthermore, de Prado emphasized the importance of using recent price data when forecasting, as it holds greater relevance for the immediate future. He recommended employing exponential weight decay to determine the sample length when using all available data. Additionally, he highlighted the significance of controlling the number of trials and avoiding isolated work environments as common pitfalls leading to the failure of machine learning funds. He noted that finance differs from other fields where machine learning has made significant advancements, and hiring statisticians may not always be the most effective approach for developing successful trading algorithms.
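
One simple way to realize the "use all data, but weight recent data more" idea is an exponentially decaying sample weight passed to estimators that accept one; the half-life below is an arbitrary assumption, not a value from the talk.

```python
# Sketch: exponentially decaying sample weights so recent observations matter
# more; the half-life is an illustrative choice.
import numpy as np

def exponential_sample_weights(n_obs, half_life=250):
    """Weight of the most recent observation is 1; weights halve every
    half_life observations going back in time."""
    age = np.arange(n_obs)[::-1]                 # oldest observation has the largest age
    return 0.5 ** (age / half_life)

weights = exponential_sample_weights(1_000)
# e.g. model.fit(X, y, sample_weight=weights) for estimators that support it
```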

In summary, Marcos Lopez de Prado's presentation shed light on the reasons why most machine learning funds fail in the finance industry. He emphasized the need for a well-formulated theory, team collaboration, specialization, proper handling and differentiation of financial data, appropriate sampling and labeling, addressing challenges like non-uniqueness and overfitting, and incorporating human expertise in implementing machine learning algorithms. By understanding these factors and taking a careful and nuanced approach, practitioners can increase the likelihood of success in the dynamic and complex world of finance.

  • 00:00:00 Marcos Lopez de Prado discusses how the lack of a well-formulated theory in discretionary funds prevents people from having a truly constructive and abstract conversation about investments. When attending investment conferences, he finds most conversations to be anecdotal with no real theory being discussed. As a result, discretionary funds may suffer from an inability to interact with others and test theories. This lack of a well-formulated theory may lead to poor decision-making and eventually result in loss of business.

  • 00:05:00 In this section, Marcos Lopez de Prado discusses why most machine learning funds fail, citing the issue of working in silos as a major factor. He explains that it is impossible to hire 50 PhDs and put them to work together in silos, each working on the same tasks repeatedly, without any collaboration or communication. This often leads to multiple strategies being attempted, resulting in faulty discoveries, failed implementations, and eventually, the fund being shut down. Lopez de Prado asserts that developing strategies requires a team effort, and many strategies are needed to achieve success.

  • 00:10:00 Marcos Lopez de Prado emphasizes the importance of specialization within a group as a key factor for success in implementing machine learning in finance. He argues that creating a team of specialists is essential for building a high-performance infrastructure capable of handling complex systems such as industrial processes or machine learning strategies. The individual experts should be able to work independently, but still, be knowledgeable about the whole game plan and be aware of each other's fields of expertise and the queries and issues relevant to them. This meta-strategy paradigm is not just useful for developing strategies but for making decisions under uncertainty, including hiring, overseeing investments, and stopping criteria for strategies.

  • 00:15:00 In this section, Marcos Lopez de Prado emphasizes the importance of properly handling financial data to achieve stationarity while preserving the most valuable information. Fully differencing the data has a cost: it erases valuable signal information, making it essentially impossible to predict anything. He therefore suggests fractionally differentiating the data to preserve some memory of previous observations, which allows one to discern whether a series is at a critical point for making a prediction. The combination of partial differencing and stationarity provides useful information for classical analysis.

  • 00:20:00 The speaker discusses how to create stationary series. By choosing an appropriate weight threshold, it is possible to obtain a stationary series that is almost perfectly correlated with the original series without carrying too much memory. If the correlation with the original series is virtually zero, achieving stationarity is useless. Additionally, the speaker observes that there are no liquid-futures cases where using plain returns is justified, and advises against using them even on daily data. He suggests that a single observation would suffice in most cases.

  • 00:25:00 The speaker discusses the importance of sampling frequency and suggests that it should be based on the amount of information that arrives at the market, rather than using canonical methods such as daily or one-minute observations. He gives an example of using dollar bars, which sample based on the dollar value transacted, to ensure that the sample includes equal amounts of information, rather than just equal amounts of time or price. The speaker also emphasizes the importance of taking prices into account when sampling, as price provides critical information that affects market activity.

  • 00:30:00 Marcos Lopez de Prado discusses the importance of sampling and labeling data correctly in finance. He explains that it is crucial to take more samples when a lot of information is arriving at the market because they contain more information. He suggests using the triple-barrier labeling method to label observations correctly by taking into account what happens to the price and how it reached that particular outcome. Doing so allows one to develop a strategy that takes into account risk levels, which is important because most people need to follow risk levels and need to consider the possibility of being stopped out.

  • 00:35:00 Marcos López de Prado discusses the concept of meta-learning, where a machine learning model is used to predict whether another machine learning model's prediction is correct. He explains the importance of composing the two decisions into two different models and how it is useful in achieving precision and recall in machine learning algorithms. López de Prado also introduces the harmonic mean of precision and recall (the F1 score), which is used to balance the trade-off between the two, and suggests using different machine learning algorithms to take care of the two very different tasks.

  • 00:40:00 In this section, Marcos Lopez de Prado explains the challenges of using machine learning in finance. He emphasizes the importance of having human experts filter the data before using machine learning algorithms, as financial data is messy and non-iid, meaning it is not easy to link a particular observation to a particular patient, or in this case, a particular asset. Furthermore, financial markets change constantly due to new regulations and laws, which can significantly impact the performance of a machine learning model. Therefore, using machine learning in finance requires a careful and nuanced approach, and cannot simply be implemented by plugging financial data into a machine learning algorithm.

  • 00:45:00 Marcos Lopez de Prado discusses the issue of non-uniqueness of observations and proposes a methodology to address it. He suggests identifying the amount of overlap in each observation and determining their uniqueness to derive a solution. As cross-validation techniques assume that observations are independent and identically distributed, he also recommends identifying which observations should be removed from the training set so that this IID assumption is not violated. This process, called "purging," removes training observations whose information overlaps with information in the testing set, resulting in more accurate machine learning models in finance.

  • 00:50:00 In this section, Marcos Lopez de Prado discusses the seventh reason why most machine learning funds fail, which is overfitting. He explains that even if the Sharpe ratio of a strategy is zero, by repeatedly back-testing the strategy, one can eventually find an amazing strategy on paper. However, repeated back-testing can lead to false positives, and it becomes less useful over time. To avoid overfitting and false positives, one needs to take into account the number of trials involved in the discovery: the more backtests one runs, the higher the performance threshold one should demand of the resulting strategy.

  • 00:55:00 Marcos Lopez de Prado explains the concept of the deflated Sharpe ratio, motivated by the observation that most hedge funds have negative skewness and positive excess kurtosis, despite fund managers not intentionally targeting these moments. This is because fund managers are evaluated based on the Sharpe ratio, and statistically, negative skewness and positive excess kurtosis can inflate this ratio. De Prado highlights the importance of considering the sample size and number of trials involved in producing a discovery when analyzing returns, and warns against investing in a strategy that has a low probability of having a true Sharpe ratio greater than zero.

  • 01:00:00 Marcos Lopez de Prado emphasizes the importance of balancing the tradeoff between fitting your model to the data and avoiding overfitting. He suggests not focusing too much on achieving a perfect fit because it may lead to overconfidence and increased risk. Instead, he recommends finding a way to preserve the data's memory while still being able to apply statistical models effectively. Lopez de Prado also notes that using models that are too complicated can make cross-pollination and feeding in data difficult.

  • 01:05:00 Marcos Lopez de Prado explains how certain traits or metrics can become the preferred ones in machine learning funds and hedge funds, leading to a convergence in the industry. Using the example of breeding dogs, where certain traits are preferred because of human preference and aesthetic, he compares this phenomenon to the combination of a high Sharpe ratio and negative skewness, which has become the preferred profile for hedge funds, even though it was not initially targeted. He notes that addressing this phenomenon is challenging, as it happens without a particular event occurring.

  • 01:10:00 In this section, Marcos López de Prado discusses the importance of using recent price data when forecasting, as it is more relevant for the immediate future. He suggests using all available data with an exponential decay in weighting to decide sample length. López de Prado also emphasizes the need to control for the number of trials and to avoid working in silos, as these are common reasons why machine learning funds fail. Additionally, he highlights that finance is different from other fields where machine learning has made significant advancements, and hiring statisticians is not always the best approach for developing a successful trading algorithm.
Marcos Lopez de Prado: "The 7 Reasons Most Machine Learning Funds Fail"
Marcos Lopez de Prado: "The 7 Reasons Most Machine Learning Funds Fail"
  • 2019.05.13
  • www.youtube.com
Seminar Date: September 5, 2017For more information, please visit Marcos Lopez de Prado's website: http://www.quantresearch.org/Summary: In this popular thro...
 

Irene Aldridge: "Real-Time Risk in Long-Term Portfolio Optimization"



Irene Aldridge: "Real-Time Risk in Long-Term Portfolio Optimization"

Irene Aldridge, President and Managing Director of Able Alpha Trading, delivers a comprehensive discussion on the impact of high-frequency trading (HFT) on long-term portfolio managers and the systemic changes in the marketplace that affect the entire industry. She explores the increasing automation in finance, driven by advancements in big data and machine learning, and its implications for portfolio optimization. Additionally, Aldridge delves into the challenges and opportunities presented by intraday volume data and proposes a step-by-step approach that integrates real-time risk identification using big data. She advocates for a more nuanced portfolio optimization strategy that incorporates microstructural factors and suggests the use of factors as a defensive measure. Aldridge also touches upon the three-year life cycle of quantitative strategies, the potential of virtual reality and automation in data analysis, and the application of a computer matrix in portfolio optimization.

Throughout her presentation, Aldridge challenges the misconception that high-frequency trading has no impact on long-term portfolio managers. She argues that systemic changes in the marketplace affect all investment strategies, regardless of their time horizon. Drawing on her expertise in electrical engineering, software development, risk management, and finance, Aldridge emphasizes the importance of exploring new areas such as real-time risk assessment and portfolio optimization.

Aldridge highlights the significant shift towards automation in the financial industry, noting that manual trading has given way to automated systems in equities, foreign exchange, fixed income, and commodities trading. To remain relevant, industry participants have embraced big data and machine learning techniques. However, she acknowledges the initial resistance from some traders who feared automation would render their expertise obsolete.

The speaker explores the evolution of big data and its role in portfolio optimization. She points out that the availability of vast amounts of structured and unstructured data has revolutionized the financial landscape. Aldridge explains how techniques like singular value decomposition (SVD) enable the processing of large datasets to extract valuable insights. SVD is increasingly used for automating portfolio allocation, with the aim of incorporating as much data as possible to inform investment decisions.

Aldridge delves into the process of reducing data dimensions using singular value decomposition. By plotting singular values derived through this process, researchers can identify the vectors that contain significant information while treating the remaining vectors as noise. This technique can be applied to various financial data sets, including market capitalization, beta, price, and intraday volatility. The resulting reduced dataset provides reliable guidance for research purposes and aids in identifying crucial factors for long-term portfolio optimization.
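
As a small illustration of this dimension-reduction step (not Aldridge's pipeline), an SVD of a standardized stock-by-factor matrix orders components by the variance they explain, so only the leading vectors need to be kept; the synthetic data and the 90% energy cut-off are assumptions.

```python
# Sketch of SVD-based dimension reduction on a matrix of stock characteristics
# (rows = stocks, columns = factors such as market cap, beta, price,
# intraday volatility).  Synthetic data; the 90% cut-off is arbitrary.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8))                  # 500 stocks, 8 raw factors
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize the columns

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the leading singular vectors that explain ~90% of the total energy,
# and treat the remainder as noise.
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
k = int(np.searchsorted(energy, 0.90)) + 1
X_reduced = U[:, :k] * s[:k]                       # stock loadings on k factors
```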

The speaker discusses the common factors employed by portfolio analysts, such as price, market risk (beta), market capitalization, and dividend yield. Institutional activity is also an important factor, and Aldridge highlights the use of big data to analyze tick data and detect patterns. Recognizing institutional activity provides visible signals to market participants, leading to increased volume and favorable execution.

Aldridge distinguishes between aggressive and passive HFT strategies and their impact on liquidity. Aggressive HFT strategies, characterized by order cancellations, can erode liquidity and contribute to risk, while passive HFT strategies, such as market-making, can reduce volatility by providing liquidity. She also notes institutional investors' preference for volume-weighted average prices, and the use of time-weighted average prices in certain markets, such as foreign exchange, where volume information may not always be available.

The speaker addresses the challenges posed by intraday volume data, given the multitude of exchanges, shrinking time intervals, and the need to determine the best bid and best offer among multiple exchanges. Despite these challenges, Aldridge sees significant opportunities for innovation and further research in slicing and analyzing intraday volume data. She mentions the Securities Information Processor (SIP), mandated under SEC regulation, which aggregates limit orders from multiple exchanges, but acknowledges the ongoing challenge of reconciling and resolving issues across different exchanges.

Aldridge highlights the unexplored microstructural factors and risks in portfolio optimization. While long-term portfolio managers traditionally focus on risk-return characteristics and overlook microstructural factors, Aldridge suggests incorporating them as inputs and leveraging the wealth of data available. She proposes a step-by-step approach that involves using singular value decomposition to predict performance based on previous returns and utilizing big data to identify and address real-time risks. Algorithms can help identify and leverage complex intricacies in exchanges, such as pinging orders, that may go unnoticed by human traders.

In challenging the limitations of traditional portfolio optimization, Aldridge introduces a more comprehensive approach that integrates microstructural factors and other market dynamics. She highlights the disruptive potential of factors like ETFs and flash crashes and emphasizes that correlation matrices alone may not suffice for analyzing risk. By considering independent microstructural factors that go beyond broader market movements, Aldridge advocates for a nuanced portfolio optimization strategy that can enhance returns and improve Sharpe ratios. Further details on her approach can be found in her book, and she welcomes questions from the audience regarding high-frequency trading.

Aldridge further delves into the persistence of high-frequency trading within a day and its implications for long-term portfolio allocation. She illustrates this with the example of Google's intraday high-frequency trading volume, which exhibits stability within a certain range over time. Aldridge highlights the lower costs associated with high-frequency trading in higher-priced stocks and the lower percentage of high-frequency trading volume in penny stocks. Additionally, she notes that coding complexity often deters high-frequency traders from engaging with high-dividend stocks. Aggressive high-frequency trading strategies involve market orders or aggressive limit orders placed close to the market price.

The speaker explains the three-year life cycle of a quantitative strategy, shedding light on the challenges faced by quants in producing successful strategies. The first year typically involves bringing a successful strategy from a previous job and earning a good bonus. The second year is marked by attempts to innovate, but many struggle to develop a successful strategy during this period. In the third year, those who have found a successful strategy may earn a good bonus, while others may opt to leave and take their previous strategy to a new firm. This contributes to a concentration of similar high-frequency trading strategies, which may be tweaked or slightly modified and often execute trades around the same time. Aldridge emphasizes that high-frequency trading, like other forms of automation, is beneficial and should not be dismissed.

Aldridge concludes her presentation by discussing the potential of virtual reality and automation in data analysis. She touches on the usefulness of beta-based portfolios and factors, using the example of purchasing a pair of socks versus buying a Dell computer and how changes in beta affect their prices differently. The importance of normalizing returns and addressing randomness in business days is also highlighted. Aldridge suggests employing factors as a form of defense and emphasizes that using factors can be an enjoyable approach.

In one section, Aldridge explains the application of a coefficient matrix in determining the importance, or weight, of each stock in a portfolio. The approach incorporates the variance-covariance matrix and shrinkage techniques to adjust returns and achieve a more precise outcome. By identifying patterns in previous days' returns, the model can predict future outcomes and optimize the portfolio. While the discussed toy model represents a basic example, it illustrates the potential of this kind of matrix computation for long-term portfolio optimization.
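
A toy version of this kind of matrix computation, under assumed inputs: estimate a covariance matrix from recent returns, shrink it toward a diagonal target, and back out minimum-variance coefficients per stock. The shrinkage intensity and data are illustrative, not Aldridge's calibration.

```python
# Toy sketch: shrunk covariance matrix and minimum-variance portfolio weights.
# Shrinkage intensity, sample size, and returns are all illustrative.
import numpy as np

rng = np.random.default_rng(0)
returns = 0.0005 + 0.01 * rng.standard_normal((250, 10))   # 250 days, 10 stocks

S = np.cov(returns, rowvar=False)                  # sample covariance matrix
target = np.eye(S.shape[0]) * np.trace(S) / S.shape[0]
delta = 0.3                                        # assumed shrinkage intensity
S_shrunk = (1 - delta) * S + delta * target

ones = np.ones(S.shape[0])
w = np.linalg.solve(S_shrunk, ones)                # unnormalized min-variance weights
w = w / w.sum()                                    # coefficient per stock
```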

In summary, Irene Aldridge's presentation provides valuable insights into the impact of high-frequency trading on long-term portfolio managers and the evolving landscape of the financial industry. She emphasizes the role of automation, big data, and machine learning in portfolio optimization. Aldridge discusses the challenges and opportunities presented by intraday volume data, advocates for incorporating microstructural factors, and proposes a step-by-step approach to real-time risk identification. Her ideas contribute to a more nuanced understanding of portfolio optimization and highlight the potential of virtual reality and automation for data analysis. Aldridge's comprehensive approach encourages portfolio managers to embrace technological advancements and leverage the vast amounts of data available to make informed investment decisions.

Furthermore, Aldridge emphasizes the importance of considering microstructural factors that often go unnoticed in traditional portfolio optimization. By incorporating factors such as ETFs and flash crashes into the analysis, portfolio managers can gain a more accurate understanding of market dynamics and associated risks. She challenges the notion that correlation matrices alone are sufficient for risk analysis and proposes a more sophisticated approach that takes into account independent microstructural factors. This approach has the potential to enhance portfolio returns and improve risk-adjusted performance.

Aldridge also sheds light on the intricate world of high-frequency trading. She discusses the distinction between aggressive and passive HFT strategies, highlighting their impact on market liquidity and volatility. While aggressive strategies involving order cancellations may erode liquidity and increase risk, passive strategies focused on limit orders and market-making can provide liquidity and reduce volatility. Understanding the dynamics of high-frequency trading and its implications on portfolio allocation is essential for long-term portfolio managers.

In addition, Aldridge discusses the challenges and opportunities associated with intraday volume data. With multiple exchanges and shrinking time intervals, effectively analyzing and interpreting this data can be complex. However, Aldridge sees this as an opportunity for innovation and further research. She mentions the Securities Information Processor (SIP), mandated under SEC regulation, which aggregates limit orders from various exchanges to determine the best bid and best offer. However, she acknowledges that reconciling and resolving issues between different exchanges remains a challenge.

Aldridge's presentation also emphasizes the importance of using factors as a form of defense in portfolio optimization. By considering various factors beyond traditional risk-return characteristics, portfolio managers can gain deeper insights and improve their decision-making process. Factors such as market capitalization, beta, price, and intraday volatility can provide valuable information for optimizing long-term portfolios.

Lastly, Aldridge touches on the potential of virtual reality and automation in data analysis. These technological advancements offer new possibilities for analyzing complex financial data and gaining a deeper understanding of market dynamics. By harnessing the power of automation and leveraging virtual reality tools, portfolio managers can enhance their data analysis capabilities and make more informed investment decisions.

In conclusion, Irene Aldridge's discussion on the impact of high-frequency trading and the evolving financial landscape provides valuable insights for long-term portfolio managers. Her exploration of automation, big data, and machine learning highlights the transformative potential of these technologies in portfolio optimization. By incorporating microstructural factors, utilizing factors as a form of defense, and embracing technological advancements, portfolio managers can adapt to the changing market dynamics and unlock new opportunities for achieving optimal long-term portfolio performance.

  • 00:00:00 Irene Aldridge discusses the misconception that high-frequency trading does not impact long-term portfolio managers. While many managers claim that they can hold assets for a long time and thus avoid the impact of high-frequency trading, Aldridge argues that it actually does affect long-term portfolio managers. She explains how systemic changes in the marketplace, and how they impact everyone, can lead to implications for portfolio managers whether their investment strategy is long-term or short-term. Aldridge has a background in electrical engineering, software development, risk management, and finance, and her work includes exploring new areas such as real-time risk and portfolio optimization.

  • 00:05:00 In this section, the speaker discusses the shift towards automation in the financial industry, and how even a decade ago, most trading was done manually. However, now automation has become prevalent not only in equity trading, but also in foreign exchange, fixed income, and commodities trading. The goal of automation is to replace human trading, and those who remain relevant in the industry have embraced big data and machine learning to stay up-to-date. However, some traders were resistant to sharing their knowledge with computers, fearing that it would lead to immediate automation and their own obsolescence.

  • 00:10:00 Irene Aldridge talks about the evolution of big data and how it is being used in portfolio optimization. She notes that just a few years ago, most financial institutions did not have access to large amounts of data, but this has changed, and there are now databases of structured and unstructured data that can be processed in different ways to yield useful insights. One such method is the singular value decomposition (SVD), which reduces vast amounts of data into more manageable forms. Aldridge explains how SVD is being used to automate portfolio allocation, which is an industry that is on the brink of automation. Even though some firms still use researchers to analyze monthly data and make investment decisions based on that data, the trend is to incorporate as much data as possible to inform investment decisions.

  • 00:15:00 Irene Aldridge discusses the process of reducing data dimensions through singular value decomposition. By plotting the singular values extracted through this process, researchers can determine which vectors contain significant information and focus on keeping those vectors while considering the rest as noise. This technique can be applied to a variety of data sets, including financial data such as market capitalization, beta, price, and intraday volatility. The resulting reduced data set provides reliable guidance for research purposes and helps to identify important factors for long-term portfolio optimization.

  • 00:20:00 In this section, Irene Aldridge discusses the factors that are commonly used by portfolio analysts, such as price and market risk or beta. Market capitalization and dividend yield are also factors used in portfolio optimization that are included in the framework used by companies such as MSCI, Barra, and others. Aldridge explains how they estimate institutional activity by applying big-data techniques to tick data and looking for specific patterns. Institutional activity is important because it is a visible signal to market participants, who may pounce on a detected order, increasing its volume and leading to favorable execution of the order.

  • 00:25:00 Irene Aldridge discusses the difference between aggressive and passive HFT strategies, which both impact liquidity. Aggressive HFT strategies can be alpha driven and involve a lot of order cancellations, which erodes liquidity and contributes to risk, while passive HFT strategies, which involve purely limit orders like market-making, can reduce volatility by providing more liquidity. Institutional investors prefer a volume-weighted average price, while time-weighted average prices are still used in some markets like foreign exchange where volume isn't always available. Overall, HFT is a complex topic that has both benefits and risks.

  • 00:30:00 In this section, Irene Aldridge discusses the structure of data columns and the challenges that come with intraday volume data, given the large number of exchanges, the shrinking time intervals of changes, and the issue of finding the best bid and best offer among multiple exchanges. Despite the challenges, she believes that intraday volume data can be sliced and diced in many different ways and presents an opportunity for innovation and further research. She also mentions the Securities Information Processor (SIP), mandated under SEC regulation, that aggregates limit orders from multiple exchanges and determines the best bid and best offer, but notes that reconciling and resolving issues between different exchanges is still a challenge.

  • 00:35:00 The speaker explains that while long-term portfolio managers are primarily concerned with risk-return characteristics and do not concern themselves with execution, there are many microstructures and risk factors that are completely unexplored that could be used as inputs, as well as a lot of data that could provide new information and insights. They propose a step-by-step approach that involves using singular value decomposition to predict performance based on previous returns, and leveraging big data to identify and address real-time risks. The speaker also notes that there are a lot of pinging orders and other complexities in exchanges that are not always obvious to human traders, but can be identified and leveraged using algorithms.

  • 00:40:00 In this section, Irene Aldridge discusses the limitations of traditional portfolio optimization for long-term investing and introduces a new approach that integrates microstructure and other market factors into the optimization process. She explains how factors such as ETFs and flash crashes can disrupt the market and how correlation matrices may not be sufficient for analyzing risk. By considering microstructural factors that are independent of larger market movements, Aldridge proposes a more nuanced approach to portfolio optimization that can improve returns and Sharpe ratios. She notes that her approach is covered in more detail in her book and takes questions from the audience about high-frequency trading.

  • 00:45:00 Irene Aldridge explains the persistence of high-frequency trading within a day and how it affects long-term portfolio allocation. She notes that while the intraday high-frequency trading share of volume could in principle range from 0 to 100 percent, over time it has been quite stable; for Google, for example, it stays within a range of roughly 36-42%. This stability persists for other stocks as well. High-frequency trading has a lower cost when trading higher-priced stocks, and there is a lower percentage of high-frequency trading volume in penny stocks. Additionally, high-frequency traders tend to avoid high-dividend stocks due to coding complexity. Aggressive high-frequency trading is the kind that uses market orders or aggressive limit orders placed close to the market price.

  • 00:50:00 Irene Aldridge explains the three-year life cycle of a quantitative strategy, where in the first year, the quant brings a successful strategy from their previous job and earns a good bonus, in the second year they try to innovate but many people struggle to produce a successful strategy, and in the third year, if they found something good, they may earn a good bonus, otherwise they leave and take their previous strategy to a new shop. This contributes to the concentration of similar high-frequency trading strategies, which could be tweaked or slightly modified, and often execute almost at the same time. Aldridge believes that high-frequency trading is beneficial and should not be demonized, since it is simply automation, just like robots that clean floors or a home automation system that controls heating and cooling.

  • 00:55:00 Irene Aldridge, President and Managing Director of Able Alpha Trading, discusses the potential of virtual reality and automation for data analysis. She also touches on the usefulness of beta-based portfolios and factors, citing the example of buying a pair of socks versus buying a Dell computer and how changes in beta affect their prices differently. She emphasizes the importance of normalizing returns and addresses the issue of randomness in business days. Lastly, Aldridge covers the use of factors as a form of defense and suggests that using factors can be fun.

  • 01:00:00 In this section, Aldridge discusses the use of a coefficient matrix to determine the importance, or weight, of each stock in a portfolio. The rows of the matrix represent each stock, with the first row representing Apple and the other rows holding market data for the remaining stocks. By incorporating the variance-covariance matrix and shrinkage, the model adjusts the returns to reach a more precise outcome; the coefficients are fitted to the previous days' returns and used to predict from there. While the toy model described is just a basic example, it showcases how such a matrix computation can be used to optimize a portfolio.
 

Basics of Quantitative Trading



Basics of Quantitative Trading

In this video on the basics of quantitative trading, algorithmic trader Shaun Overton discusses the challenges and opportunities involved in algorithmic trading. Overton explains that data collection, analysis, and trading are the three simple problems involved in algorithmic trading, though the process can get complicated due to finding high-quality data and proper analysis. It can be challenging to select the right platform with good data and features to meet the trader's goals, with the most popular platforms being MetaTrader, NinjaTrader, and TradeStation, depending on the trading type one prefers. Overton also discusses the harsh reality of how easy it is to blow up accounts when trading in the live market, and how important it is to manage risk. Additionally, he explains how quantitative traders can predict overextended moves in the market and discusses the impact of currency wars.

The "Basics of Quantitative Trading" video on YouTube covers various strategies for algorithmic trading, including sentiment analysis and long-term strategies based on chart lines; however, the biggest returns are made during big tail events and trends. Attendees of the video discuss different platforms for backtesting, challenges of integrating multiple platforms for trading analysis, and the increasing interest in formalizing and automating trading strategies. Some long-term traders seek automation as they have been in the game for a long time, and NinjaTrader for programming languages is recommended but has limitations.

  • 00:00:00 Algorithmic trader Shaun Overton explains the three simple problems involved in algorithmic trading: data collection, analysis, and trading. However, the process can become complicated due to obstacles such as finding high-quality data and proper analysis, especially as trading requires careful examination of data. Free data options are not recommended, as they may contain duplicates or gaps in the data. Meanwhile, paid data options can be out of the retail trader's league, costing thousands of dollars per instrument. Nonetheless, trading can be simplified by using platforms that offer software and broker APIs.

  • 00:05:00 The speaker discusses the different software options available for analyzing data and placing trades. The most popular platforms for forex trading are MetaTrader, NinjaTrader, and TradeStation, depending on the type of trading one prefers. MetaTrader is overwhelmingly the most popular, and there are more than a thousand brokers around the world that offer it. The speaker explains that using a pre-built platform like these options makes trading and analyzing data more straightforward and avoids the need to recode analysis multiple times when it comes time to trade. The speaker also goes over the different programming languages used by each platform.

  • 00:10:00 The speaker discusses different platforms for quantitative trading and explains how Multicharts has become popular by copying TradeStation's platform and language. However, there are differences between the languages and it is not always completely compatible. The speaker also talks about the importance of data in quantitative trading and the challenges that come with each platform. He notes that MetaTrader is simple to use but not sophisticated enough for more complex analysis, and the data provided is often of poor quality. Overall, the speaker highlights the importance of carefully selecting a platform with good data and features that meet the trader's goals.

  • 00:15:00 Shaun Overton discusses the challenges of collecting and storing data for quantitative trading strategies. He explains the difficulties in trying to store years' worth of testing data and the limitations that brokers place on obtaining data due to server limitations. He notes that while MetaTrader offers free data, it is not high quality data, while NinjaTrader provides good data but has a steep learning curve to set up. He also warns about the dangers of programming strategies specific to a certain broker as it marries the trader to that particular broker, making it difficult to switch if they are unsatisfied. He lists reasons traders might be upset with a broker, including bad service and bad execution.

  • 00:20:00 Shaun Overton explains some of the issues and games that brokers play to make money off of traders and their trades. Brokers can manipulate market pricing and trades to force traders to pay more for their trades by showing one price and then making traders accept a worse price. Additionally, a trader can receive bad execution from poor latency or software failure. Currently, the biggest issue with algorithmic trading is institutionalized corruption and how institutions can steal money from traders due to technological accidents, as well as Dark Pools and other trading venues that have their own rules in place to manipulate trades.

  • 00:25:00 The speaker discusses the limitations of broker-specific platforms for quantitative trading. While they may be efficient for extremely simple strategies, they have limitations and cannot support anything more sophisticated. The speaker recommends stable platforms like NinjaTrader and MultiCharts, which have good research quality and allow for custom programming and GUI adjustments. However, the speaker warns that these platforms are not suitable for managing portfolios or running funds as they lack the ability to talk to multiple charts and require a lot of manual labor.

  • 00:30:00 Shaun Overton discusses the harsh reality of how easy it is to blow up accounts when trading in the live market: 90-95% of accounts are closed within six months to a year. There are two ways brokers make money, by commissions or by taking on risk, and frequently the more popular and lucrative way is to take the other side and profit from clients' trading losses. Regular traders make money when volatility is low, but when it's high, they get decimated. Risk management is talked about, but for most people it's just hot air, and they continue to lose money by not managing their risk.

  • 00:35:00 Shaun discusses how volatility affects quantitative trading strategies and how retail traders tend to be wrong in their market predictions. He explains how the ratio of long vs. short positions can be tracked by brokers with access to client accounts and how this information can be used to predict overextended moves. Overton notes that this information is becoming more widely available, with websites like MyFxBook and OANDA publishing data on market positioning. However, he cautions that while this information can be a gold mine for brokers, it may not provide steady cash flow and may result in periods of large losses.

  • 00:40:00 Shaun Overton discusses the potential for quantitative traders to look into client funds of major banks to devise long and short strategies based on the percentage of trades going in a certain direction. He also comments on the skepticism of retail investors participating in the stock market, particularly in light of recent negative news, leading to a withdrawal of billions of dollars since the last crash. Overton also mentions a recent news story on CNBC regarding big fund managers and their impact on the shares of big companies, demonstrating the power of institutional money in moving the market.

  • 00:45:00 It is discussed how institutional trading, especially in forex, may not be as influential in the market as retail trading due to the average account size of traders. However, larger valuations and greater amounts of money traded lead to more people interfering with prices, and even small events such as drunk trading could have an impact on the market. The main driver of currencies is interest rates, and in the ongoing currency war, where everyone wants a zero interest rate, it becomes harder to determine which country's currency is the weakest. Lastly, Japan's currency pair, dollar-yen, is analyzed in terms of its history and how its price going down relates to the dollar weakening and the yen strengthening.

  • 00:50:00 Shaun Overton discusses the impact of currency wars on exporters. He explains how exporters such as Toyota are heavily impacted when the value of the currency in which they operate increases in value. Overton states that there is currently a currency war among major currencies, where countries are trying to devalue themselves, with everyone competing to be zero. Therefore, traders need to be speculating on who is going to do the worst job at destroying a currency, as they will be the best in this environment. Overton feels that the Dollar is currently a disaster, but the best disaster so far. Country-specific social risks and events, such as September 11th and the Fukushima disaster, can also impact currency prices.

  • 00:55:00 Speakers discussed trading in thin markets and exotic currencies. It was mentioned that for algorithmic trading, you need liquidity and a thin spread, which makes it difficult to trade in less popular currencies like the South African Rand or Turkish Lira. Furthermore, the spread of these currencies can be 8 or 9 times more than it costs to trade the Euro against the Dollar, making it challenging to make a profit. Regarding strategies for those with less than 50k in their accounts, speakers mention the importance of focusing on things like the Commitments of Traders report in futures markets to gain insights into market positions.

  • 01:00:00 A group discusses various strategies for algorithmic trading, including sentiment analysis and a simple long-term strategy based on chart lines. The challenge with trading is understanding the distribution of returns since most of the time it is just noise. However, the biggest returns are made during big tail events and trends. Therefore, the best strategies do not consistently make money but grab opportunities when they are there. Despite the desire for signals and action, it is best to let the market do what it is going to do. Quantopian, a platform for analyzing market data and backtesting strategies, is also mentioned.

  • 01:05:00 In this section, attendees of the "Basics of Quantitative Trading" YouTube video discuss different platforms they use for backtesting and optimization, as well as the challenges of integrating multiple platforms for trading analysis and strategy development. While some attendees note that Quantopian provides a platform for individual analysis and is negotiating contracts with brokers to potentially solve platform integration challenges, others discuss the limitations of platforms like NinjaTrader and the difficulties of integrating them with other platforms, with some highlighting the fact that they are better suited for manual trading or as simple backtesting tools. Additionally, Shaun Overton notes that his business is built around formalizing and automating traders' own strategies, with attendees noting that both individual traders and markets are showing increasing interest in formalizing and automating their trading strategies.

  • 01:10:00 Traders attending the seminar ask about the benefits of automating certain trading strategies. Shaun Overton notes that some traders who have been in the game for 10, 20, or even 30 years simply want to automate their strategies so they no longer have to watch them all day. On trading-specific programming languages, Overton endorses NinjaTrader because it is built on C#, but notes that there are limits to what can be done within it.
Basics of Quantitative Trading
  • 2013.02.26
  • www.youtube.com
http://www.onestepremoved.com/ Shaun Overton speaks to the meetup group Dallas Algorithmic Traders about quantitative trading. Most members of the audience h...
 

What is a quant trader?



What is a quant trader?

"What is a quant trader?" is a video where Michael Halls-Moore delves into the world of quant trading, explaining how math and statistics are used to develop trading strategies and analyze market inefficiencies. While quant funds primarily focus on short-term strategies, the speaker highlights that low-frequency and automated approaches are also utilized. Institutional traders prioritize risk management, while retail traders are driven by profits. Effective market regime detection is crucial but challenging due to random events in the market. It is advised for quant traders not to rely solely on a single model but to constantly research and test new ones to account for known and unknown market dynamics. Despite the risks involved, successful quant traders can achieve an impressive 35% annual return on fees.

In the video, Michael Halls-Moore provides an insightful perspective on the concept of a "quant trader." He explains that quant traders employ mathematical and statistical techniques in the field of finance, utilizing computational and statistical methods. Their work encompasses a broad range of activities, from programming trading structures to conducting in-depth research and developing robust trading strategies. While buying and selling rules play a role, they are not the sole focus, as quant traders operate within a larger system where signal generators are just one component.

Quant funds typically engage in higher-frequency trading and strive to optimize their technology and their use of market microstructure. The timeframes involved in quant trading vary greatly, ranging from microseconds to weeks. Retail traders have a significant opportunity in adopting higher-frequency style strategies.

Contrary to popular belief, quant trading is not solely focused on high-frequency trading and arbitrage. It also incorporates low-frequency and automated strategies. However, due to their scientific approach of capitalizing on physical inefficiencies in the system, quant funds predominantly concentrate on short-term strategies. The speaker emphasizes the importance of having a blend of scientific and trading backgrounds to thrive in the field of quant trading.

A notable distinction between retail and institutional traders lies in their approach to risk management. Retail traders are primarily driven by profit motives, whereas institutional traders prioritize risk management, even if it means sacrificing potential returns. Institutional traders adopt a risk-first mentality and emphasize due diligence, stress testing, and implementing downside insurance policies to mitigate risks effectively.

Risk management involves various techniques, such as adjusting leverage based on account equity using mathematical frameworks like the Kelly criterion. More conservative traders opt for reducing drawdowns to achieve a controlled growth rate. Leading risk indicators like the VIX are utilized to gauge future volatility. In these trades, the risk management system holds more significance than the entry system. While stop losses are employed in trend following, mean reversion strategies call for reevaluating and exploring different scenarios and historical data for drawdown planning. Prior to implementing trading algorithms, backtesting phases are conducted to manage risk factors effectively.
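
To make the leverage-adjustment idea concrete, here is a minimal sketch of Kelly-style position sizing in Python. The return series, the half-Kelly scaling factor, and the account size are illustrative assumptions, not parameters from the video.

```python
import numpy as np

def kelly_leverage(returns, risk_free=0.0, fraction=0.5):
    """Estimate a Kelly-style leverage from a history of periodic returns.

    Full Kelly for a roughly Gaussian return stream is mean excess return
    divided by variance; many practitioners scale it down (e.g. half Kelly)
    to reduce drawdowns -- the 'controlled growth rate' idea above.
    """
    excess = np.asarray(returns) - risk_free
    full_kelly = excess.mean() / excess.var()
    return fraction * full_kelly

# Illustrative daily strategy returns (assumed data).
rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=750)

leverage = kelly_leverage(daily_returns, fraction=0.5)  # half Kelly
account_equity = 100_000                                # assumed account size
gross_exposure = leverage * account_equity
print(f"half-Kelly leverage: {leverage:.2f}x, target exposure: {gross_exposure:,.0f}")
```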

The video discusses the value of filtering out trading strategies, using backtesting as a tool to screen them rather than as a green light to put them straight into production. It highlights the importance of expecting worse drawdowns during walk-forward testing and of using a filtration mechanism to decide whether a strategy is suitable for implementation. The conversation then turns to Nassim Nicholas Taleb's views on fat tails and to how machine learning can be used to switch between range-trading and trend-trading strategies by detecting market regimes.
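
As a rough illustration of treating backtests as a filter, the sketch below walks a return stream through successive in-sample/out-of-sample windows and rejects the strategy if out-of-sample drawdowns blow out relative to in-sample ones. The number of splits, the 2x drawdown tolerance, and the synthetic returns are arbitrary assumptions rather than anything prescribed in the video.

```python
import numpy as np

def max_drawdown(returns):
    """Maximum peak-to-trough drawdown of the cumulative equity curve."""
    equity = np.cumprod(1.0 + np.asarray(returns))
    peaks = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peaks))

def walk_forward_filter(returns, n_splits=4, tolerance=2.0):
    """Reject (return False) if any out-of-sample drawdown exceeds
    `tolerance` times the drawdown of the preceding in-sample window."""
    chunks = np.array_split(np.asarray(returns), n_splits + 1)
    for i in range(n_splits):
        in_sample, out_of_sample = chunks[i], chunks[i + 1]
        if max_drawdown(out_of_sample) > tolerance * max_drawdown(in_sample):
            return False
    return True

# Illustrative return stream (assumed data).
rng = np.random.default_rng(1)
returns = rng.normal(0.0004, 0.012, size=1250)
print("strategy passes walk-forward filter:", walk_forward_filter(returns))
```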

Effective market regime detection is a critical aspect of quantitative finance, but it is difficult because regimes are driven by events that are hard to predict, such as interest rate cuts and shifts in market trends. More sophisticated firms track fundamental data and incorporate it into their models to improve regime detection. When trading, the selection of stocks or ETFs depends on the specific market, and choosing the right assets can be a complex task. The speaker emphasizes that a combination of mathematical models and market fundamentals is the best defense against Black Swan events, since previous periods of high volatility can provide insight into future volatility and market changes.
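
Regime detection can be prototyped in many ways; one crude starting point, sketched below under assumed parameters, is to label periods as high- or low-volatility regimes from rolling realized volatility. This is only an illustration of the idea, not the method used by any firm mentioned in the video.

```python
import numpy as np
import pandas as pd

def volatility_regimes(returns, window=21, quantile=0.75):
    """Label each day as a 'high' or 'low' volatility regime.

    Rolling annualized volatility above its own `quantile` is tagged
    'high'; everything else 'low'. Window and quantile are assumptions.
    """
    r = pd.Series(returns)
    rolling_vol = r.rolling(window).std() * np.sqrt(252)
    threshold = rolling_vol.quantile(quantile)
    return np.where(rolling_vol > threshold, "high", "low")

# Illustrative data: a calm first half followed by a turbulent second half.
rng = np.random.default_rng(2)
returns = np.concatenate([rng.normal(0, 0.005, 500), rng.normal(0, 0.02, 500)])
labels = volatility_regimes(returns)
print(pd.Series(labels).value_counts())
```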

The video further explores the potential returns and risks of quant trading. With a strong educational background, such as a PhD, and an efficient management process, quant traders can earn annual returns of around 35%. However, high-frequency quants can be hurt when the underlying hardware or exchange changes, potentially causing their systems to break.

Despite the risks involved, achieving a consistent return of 15% to 20% by exploiting an edge that persists over the long term is considered a good outcome. Quant traders neither rely on a single magic algorithm nor panic when faced with problems. Instead, they study statistical properties that may be painful to analyze and prepare in advance to navigate potential challenges.

The video emphasizes the importance of avoiding overreliance on a single model in quantitative trading. Models cannot accurately predict all future events, as evidenced by historical Wall Street crashes and investment failures resulting from model shortcomings. It is essential for quant traders to continually research and test new models, evaluating their performance. Drawdown periods are an inherent part of the trading journey, and traders must be prepared to navigate them.

In conclusion, while some traders may become overly focused on micromanaging their models, it is vital to understand if a model accounts for all market dynamics, including the unknown unknowns. Quant traders should adopt a multidimensional approach, combining mathematical models with market fundamentals to gain a comprehensive understanding of market behavior. By constantly refining and diversifying their strategies, quant traders can increase their chances of success in an ever-evolving financial landscape.

  • 00:00:00 In this section, Michael Halls-Moore explains what a "quant trader" is: someone who applies mathematics and statistics to finance in a computational way. This can range from programming trading infrastructure to hardcore research and strategy development. Buying and selling rules matter less than people assume; signal generators are just one part of a larger system. Quant funds usually deal with higher-frequency trading and focus on optimizing technology and exploiting market microstructure. Typical timeframes for quant traders range from microseconds to weeks, and the retail trader's biggest opportunity lies in higher-frequency style strategies.

  • 00:05:00 In this section, we learn that quant trading is not all about high frequency trading and arbitrage, as it also includes low frequency and automated strategies. However, quant funds generally focus on short-term strategies due to their scientific approach of exploiting physical inefficiencies in the system. The speaker believes that having a mix of both a science and trading background is crucial to succeed in quant trading. When it comes to risk management, he notes a cultural difference between retail and institutional trading, where the latter has a risk-first mentality and emphasizes due diligence, stress testing, and downside insurance policies.

  • 00:10:00 In this section, the video discusses the different approaches used by retail and institutional traders regarding risk management. While retail traders are primarily profit-driven, institutional traders focus on risk management, even if the potential returns are only a fraction of what's possible. The video mentions the Kelly criterion as a mathematical means of adjusting leverage based on account equity, with more conservative traders scaling leverage down to reduce drawdowns and achieve a more controlled growth rate. Leading risk indicators like the VIX are used to gauge future volatility. The risk management system matters more than the entry system in these trades: stop losses are used in trend following but not in mean reversion, where traders instead re-think scenarios and study historical data for drawdown planning. Before putting trading algorithms live, backtest phases are conducted to manage risk factors.

  • 00:15:00 In this section, the interviewer and quant trader discuss the importance of filtering out trading strategies, using backtesting to screen strategies rather than to promote them directly into production. They highlight the importance of expecting worse drawdowns during walk-forward testing and of using a filtration mechanism to decide whether a strategy is suitable for implementation. The conversation then turns to Taleb's belief in fat tails and to how machine learning could be used to switch between range-trading and trend-trading strategies by detecting market regime shifts.

  • 00:20:00 In this section, the speaker emphasizes the importance of effective market regime detection in quant finance. The problem is that regimes are driven by effectively random events, such as interest rate drops and shifts in market trends, which makes them hard to detect. More sophisticated firms track fundamental data and incorporate it into their models. Depending on what one is trading, there are different numbers of stocks or ETFs to choose from, and selecting the right ones can be tricky. The speaker also believes that Black Swan defense depends on a mix of mathematical models and market fundamentals, since past periods of extreme volatility can help one anticipate future volatility and market changes.

  • 00:25:00 In this section, the video explains the returns quant traders can expect and the risks that come with them. A quant trader can earn an annual return of around 35%, helped by a PhD-level background and an efficient management process. However, high-frequency quants may suffer when the underlying hardware or exchange changes, causing their systems to crash. Despite these risks, returning 15 to 20% by exploiting an edge that persists over the long term is a good outcome. Quant traders neither have a single magic algorithm nor panic when they face problems; they work through statistical properties that are painful to analyze and prepare for them in advance.

  • 00:30:00 In this section, the speaker discusses why relying too heavily on a single model is inadvisable in quantitative trading, since no model can accurately predict all future events. He cites classic Wall Street crashes and investment failures that stemmed largely from model shortcomings. The speaker emphasizes the importance of continually researching new models and evaluating their performance; even so, drawdown periods will always occur. In conclusion, while some traders may end up micromanaging their models, it is essential to understand whether the model accounts for all the market dynamics, including the unknown unknowns.
What is a quant trader?
  • 2013.12.02
  • www.youtube.com
http://www.onestepremoved.com/ Shaun Overton interviews Michael Halls-Moore, a quantitative developer. Mike jumped from postgraduate school straight into alg...
 

PyCon Canada 2015 - Karen Rubin: Building a Quantitative Trading Strategy (Keynote)



PyCon Canada 2015 - Karen Rubin: Building a Quantitative Trading Strategy (Keynote)

Continuing the discussion, Karen Rubin delves into the findings and insights from her study on female CEOs in the Fortune 1000 companies. The analysis reveals that female CEOs yield a return of 68%, while male CEOs generate a return of 47%. However, Karen emphasizes that her data does not yet demonstrate that female CEOs outperform their male counterparts. She considers this study as an intriguing concept within high-revenue and high-market capitalization companies.

Motivated by her findings, Karen emphasizes the importance of diversity in the finance and technology industry. She encourages more women to join the field and participate in shaping investment strategies. She believes that incorporating ideas such as investing in female CEOs can contribute to the creation of a diverse and inclusive fund.

Expanding the discussion, Karen touches upon other factors that may influence the success of CEOs, including their gender, whether they were hired internally or externally, and even their birth month. She acknowledges the theory that companies may appoint female CEOs when the organization is performing poorly and subsequently replace them with male CEOs to reap the benefits of restructuring, but she has not yet found a way to trade on this theory. Additionally, she notes that stock prices often decline after a CEO announcement, although she remains unsure whether this effect differs between women and men.

In conclusion, Karen highlights that building a quantitative trading strategy for CEOs involves considering various factors and conducting thorough analysis. While her study provides valuable insights into the performance of female CEOs, she emphasizes the need for further research and exploration to gain a more comprehensive understanding of gender dynamics in executive leadership and its impact on investment outcomes.

  • 00:00:00 In this section, the speaker introduces herself and her experience writing an algorithm to invest in the market. As VP of Product at Quantopian, a crowdsourced hedge fund, she needed to write an algorithm to understand what her users were doing so she could build effective software for them. She became interested in investing in female CEOs after reading the Credit Suisse gender report and wondered whether she could build a strategy that bought companies led by female CEOs historically and sold when they were no longer the CEO.

  • 00:05:00 In this section, Karen Rubin talks about the initial steps she took in building a quantitative trading strategy. She needed to get a historical list of all the female CEOs within a specific time period in order to create a simulation of what happened over time. Karen explains that getting and cleansing the data took up a significant amount of her time during the early stages of the project, as she had to manually search for and analyze each CEO's start and end dates and the corresponding ticker symbols. She also talks about the challenges of ensuring that the pricing data was accurate and cleansed before analyzing it. Despite the small sample size, Karen continued to push forward with her study.

  • 00:10:00 In this section, Karen Rubin explains how her backtest works and how simulation is used in her strategy. The simulation steps through historical data as if she were trading in the actual market, making buy and sell decisions based on her list of female CEOs. She compares the first version of the algorithm to a benchmark, the S&P 500. She later rewrites the strategy with help from her resident quant because the first version failed to account for leverage.

  • 00:15:00 In this section of the video, Karen Rubin discusses how she rebalanced her trading strategy to ensure an equal weighted portfolio across all companies. Her algorithm buys and sells companies and calculates the value of her portfolio to ensure she's not losing money or having to borrow money to make buys in the future. She also discusses feedback she received from the Reddit and Hacker News community who questioned whether her strategy was reliant on Yahoo and Alibaba's stock prices. She removed Yahoo from her strategy to test this theory and found that while it impacted overall returns, it was not the sole cause of those returns.

  • 00:20:00 In this section, the speaker discusses how to avoid sector bias by creating a sector-neutral portfolio: the portfolio is divided equally across sectors, and every company within a sector gets an equal share of that sector's allocation (a minimal allocation sketch follows this list). For example, if Healthcare has three companies, each gets one-third of the Healthcare allocation, whereas in Consumer Cyclical, with roughly 20 companies, each gets one-twentieth of that sector's allocation. The resulting returns of the speaker's strategy are 275 percent, versus 251 percent for the equal-weighted benchmark and 122 percent for the S&P 500. While some argue that benchmarks like the S&P 500 are not a fair comparison because their constituents are not equally weighted, the RSP Guggenheim Equal Weight S&P 500 ETF provides a better point of comparison.

  • 00:25:00 In this section, Karen Rubin discusses the challenge of finding the right benchmark for investing in female CEOs. While the Fortune 1000 seems like the natural choice, purchasing its historical constituents list is costly, so she instead builds the Quanto 1000, a new benchmark formed by ranking all companies by revenue and selecting the top 1000. Comparing her algorithm's returns to the Quanto 1000 and the S&P 500, she finds that it outperformed both, with a 43% difference. She also explores a new dataset on CEO changes from EventVestor, which allows her to build comparative strategies for male and female CEOs. The results show that a strategy investing in female CEOs from the date they take the position until the date they leave returned 28% over a seven-year period, compared to 44% for male CEOs.

  • 00:30:00 In this section, Karen details the results of her study on female CEOs in the Fortune 1000 companies. The analysis showed that female CEOs are returning 68% while male CEOs return 47%. However, Karen believes that her data doesn't show that female CEOs are outperforming their male counterparts yet. She thinks that this study provides an interesting idea about female CEOs in high-revenue and high-market cap companies. Karen wants to encourage diversity in the finance and technology industry and invites more women to join the field. She believes in the importance of bringing ideas like investing in female CEOs to create a diverse fund.

  • 00:35:00 In this section, the speaker discusses various factors that can influence the success of CEOs, including their gender, internal or external hiring, and birth month. She also addresses the theory that companies will bring in female CEOs when they are performing poorly and then replace them with male CEOs to reap the benefits of restructuring. However, she has not been able to arbitrage this theory. Additionally, she notes that stock prices often go down after a CEO announcement, but she is unsure whether this trend is different for women CEOs versus men CEOs. Overall, there are many factors to consider when building a quantitative trading strategy for CEOs.
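
As referenced in the 00:20:00 summary above, a sector-neutral equal-weight allocation can be sketched as follows. The ticker-to-sector mapping and the portfolio value are hypothetical examples, not Karen Rubin's actual universe.

```python
from collections import defaultdict

def sector_neutral_weights(sector_by_ticker, portfolio_value):
    """Split capital equally across sectors, then equally across
    the companies within each sector."""
    sectors = defaultdict(list)
    for ticker, sector in sector_by_ticker.items():
        sectors[sector].append(ticker)

    per_sector = portfolio_value / len(sectors)
    allocations = {}
    for sector, tickers in sectors.items():
        per_name = per_sector / len(tickers)
        for ticker in tickers:
            allocations[ticker] = per_name
    return allocations

# Hypothetical universe: 3 healthcare names, 5 consumer cyclical names.
universe = {
    "HCA1": "Healthcare", "HCA2": "Healthcare", "HCA3": "Healthcare",
    "CC1": "Consumer Cyclical", "CC2": "Consumer Cyclical",
    "CC3": "Consumer Cyclical", "CC4": "Consumer Cyclical", "CC5": "Consumer Cyclical",
}
for ticker, dollars in sector_neutral_weights(universe, 1_000_000).items():
    print(ticker, round(dollars, 2))
```
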
PyCon Canada 2015 - Karen Rubin: Building a Quantitative Trading Strategy (Keynote)
  • 2015.11.24
  • www.youtube.com
PyCon Canada 2015: https://2015.pycon.ca/en/schedule/67/Talk Description:Upon joining Quantopian, in order to understand her users better, Karen Rubin embark...
 

Machine Learning for Quantitative Trading Webinar with Dr. Ernie Chan



Machine Learning for Quantitative Trading Webinar with Dr. Ernie Chan

Dr. Ernie Chan, a prominent figure in the finance industry, shares his insights and experiences with machine learning in trading. He begins by reflecting on his early attempts at applying machine learning to trading and acknowledges that it didn't initially yield successful results. Dr. Chan emphasizes the importance of understanding the limitations of machine learning in trading, particularly in futures and index trading, where data may be insufficient.

However, he highlights the potential of machine learning in generating profitable trading strategies when applied to individual tech stocks, order book data, fundamental data, or non-traditional data sources like news. To address the limitations of data availability and data snooping bias, Dr. Chan suggests utilizing resampling techniques such as oversampling or bagging. These techniques can help expand the data set, but it's crucial to preserve the serial autocorrelation in time series data when using them for trading strategies.
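
One resampling scheme that roughly preserves serial autocorrelation is a block bootstrap, sketched below. The block length and the synthetic AR(1)-style input series are assumptions for illustration; the video does not specify this particular implementation.

```python
import numpy as np

def block_bootstrap(series, n_samples, block_len=10, seed=0):
    """Generate bootstrap replicas of a time series by concatenating
    randomly chosen contiguous blocks, so short-range autocorrelation
    inside each block is preserved (unlike i.i.d. resampling)."""
    series = np.asarray(series)
    rng = np.random.default_rng(seed)
    n = len(series)
    blocks_per_replica = int(np.ceil(n / block_len))
    replicas = []
    for _ in range(n_samples):
        starts = rng.integers(0, n - block_len + 1, size=blocks_per_replica)
        blocks = [series[s:s + block_len] for s in starts]
        replicas.append(np.concatenate(blocks)[:n])
    return np.array(replicas)

# Illustrative autocorrelated series (AR(1)-style, assumed data).
rng = np.random.default_rng(3)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.3 * x[t - 1] + rng.normal(0, 0.01)

resampled = block_bootstrap(x, n_samples=100, block_len=20)
print(resampled.shape)  # (100, 500)
```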

Feature selection plays a vital role in successful machine learning applications in trading. Dr. Chan stresses the importance of reducing data-snooping bias by selecting relevant features or predictors. He explains that while many people believe more features are better, in trading a feature-rich data set is a curse: it tends to produce spurious correlations between features and the target, and with them poor results. He discusses three feature selection algorithms: forward feature selection, classification and regression trees (CART), and random forests, which help identify the most predictive variables.
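
To make the feature-selection step concrete, here is a small scikit-learn sketch of two of the approaches mentioned, forward feature selection and random-forest importance ranking, run on synthetic data. The feature names, model choices, and selection sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic "technical indicator" features predicting up/down next-day moves.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           random_state=0)
feature_names = np.array([f"indicator_{i}" for i in range(X.shape[1])])

# Forward feature selection: add one feature at a time, keeping the ones
# that improve cross-validated predictability.
fwd = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=3, direction="forward", cv=5)
fwd.fit(X, y)
print("forward selection picked:", feature_names[fwd.get_support()])

# Random forest: rank features by impurity-based importance.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranked = feature_names[np.argsort(rf.feature_importances_)[::-1]]
print("random forest ranking:", ranked[:5])
```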

Dr. Chan delves into the support vector machines (SVM) classification algorithm, which aims to predict future one-day returns and their positive or negative nature. SVM finds a hyperplane to separate data points and may require nonlinear transformations for effective separation. He also touches on other machine learning approaches, such as neural networks, but highlights their limitations in capturing relevant features and their unsuitability for trading due to the non-stationary nature of financial markets.
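
A minimal version of the SVM setup described, using lagged returns as features and an RBF kernel as the nonlinear transformation, might look like the sketch below. The synthetic return series, the number of lags, and the train/test split are all assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Illustrative daily returns (assumed data).
rng = np.random.default_rng(4)
returns = rng.normal(0.0003, 0.01, size=1500)

# Features: the previous 5 days' returns; label: sign of the next day's return.
n_lags = 5
X = np.column_stack([returns[i:len(returns) - n_lags + i] for i in range(n_lags)])
y = (returns[n_lags:] > 0).astype(int)

split = int(0.7 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# The RBF kernel plays the role of the nonlinear transformation
# applied before the hyperplane separation.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print("out-of-sample hit rate:", round(model.score(X_test, y_test), 3))
```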

The webinar also emphasizes the importance of a customized target function in a trading strategy. Dr. Chan recommends techniques such as stepwise regression and decision trees for developing predictive models. He also underscores a statistical point: the uncertainty in a strategy's estimated return shrinks roughly with the square root of the number of trades, so enough trades are needed before a result can be trusted. The Sharpe ratio is presented as an effective benchmark for evaluating strategy effectiveness, since it incorporates this idea, with a ratio of two or greater considered favorable.
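
The point about trade counts and the Sharpe ratio can be illustrated with a small calculation: the standard error of a strategy's mean return shrinks like 1/sqrt(N), and an annualized Sharpe ratio maps onto a t-statistic once the length of the track record is known. The daily P&L figures below are made up for illustration.

```python
import numpy as np

def annualized_sharpe(daily_returns, periods_per_year=252):
    r = np.asarray(daily_returns)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def t_statistic(daily_returns):
    """t-stat of the mean daily return: mean / (std / sqrt(N)).
    More observations shrink the standard error like 1/sqrt(N)."""
    r = np.asarray(daily_returns)
    return r.mean() / (r.std(ddof=1) / np.sqrt(len(r)))

# Illustrative daily P&L of a strategy over two years (assumed data).
rng = np.random.default_rng(5)
daily = rng.normal(0.0008, 0.01, size=504)

print(f"annualized Sharpe: {annualized_sharpe(daily):.2f}, "
      f"t-stat: {t_statistic(daily):.2f}")
# Note: t-stat ~= annualized Sharpe * sqrt(years), so a Sharpe of two
# sustained over a couple of years is already statistically meaningful.
```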

Dr. Chan provides valuable insights into the application of machine learning in the finance industry, highlighting its potential in certain areas while cautioning against its limitations. He emphasizes the importance of feature selection, data resampling, and selecting an appropriate target function for successful machine learning applications in quantitative trading.

  • 00:00:00 In this section, Dr. Ernie Chan shares his background and experiences with machine learning in the finance industry. He discusses how he did not succeed in applying machine learning to trading despite his expertise in the field and working for well-known firms. Dr. Chan shares that the goal of the talk is to explain the pitfalls of machine learning and why it doesn't work in trading, as well as how it can work in trading. He notes that when he first started using machine learning in trading, he made the mistake of thinking it would work on daily bars and using technical indicators as inputs, which ultimately did not yield successful results.

  • 00:05:00 In this section, Dr. Ernie Chan discusses the limitations of using machine learning algorithms on futures and index trading due to insufficient data and the risk of data snooping bias. He believes that machine learning has more potential in generating profitable trading strategies when applied to individual tech stocks, order book data, fundamental data, or non-traditional data such as news. To overcome the limitation of insufficient data and data snooping bias, Dr. Chan suggests using resampling techniques, such as oversampling or bagging. While resampling can expand the data set, careful consideration must be given to preserve the serial autocorrelation in time series data when using these techniques for trading strategies.

  • 00:10:00 In this section, Dr. Chan discusses using multiple days as input instead of just a single day (a trigram-like construction) to preserve autocorrelation. He also emphasizes the importance of reducing data-snooping bias, which can be accomplished by reducing the number of features or predictors. While many people think that having more features is better, that is not the case in trading, where a feature-rich data set is a curse because of spurious correlations between features and the target. Feature selection is therefore critical, and machine learning algorithms that support feature selection are ideal for trading. Dr. Chan highlights three such algorithms: stepwise regression, random forests, and LASSO regression. He cautions that neural network and deep learning algorithms, which do not select features but take everything and mix it together, are not ideal for trading.

  • 00:15:00 In this section, Dr. Ernie Chan discusses three different feature selection algorithms: forward feature selection, classification and regression trees (CART), and random forest. Forward feature selection involves adding features to linear regression models one at a time until the algorithm identifies which ones improve predictability. On the other hand, CART is similar to a decision tree and operates hierarchically with conditions imposed at each iteration for classification purposes. Random forest is a technique that can be applied to different classification algorithms by combining bagging with random subspace, which involves oversampling data and undersampling predictors to achieve a balance between data and features. Dr. Chan provides an example table with hypothetical features to predict tomorrow's return in order to better explain the concept.

  • 00:20:00 In this section, Dr. Ernie Chan discusses reducing the feature set using classification algorithms such as classification and regression trees. He explains that there are many techniques for this, such as under-sampling or using mutual information, but that the ones he presents are the simplest and best known. Using a representative sample from the data, he demonstrates how the algorithm identifies which technical indicators are useful in predicting future returns and which values of those indicators generate positive or negative returns. Once a subset of the data has been classified, the process is repeated to identify other variables for better classification.

  • 00:25:00 In this section, Dr. Ernie Chan explains that machine learning algorithms work by finding predictive variables and parameters useful for the classifier, and iterating until no statistical significance is found. Machine learning algorithms are often statistical regression systems with more details and conditions on the data. He goes on to discuss the support vector machines classification algorithm, which aims to predict future one-day returns and whether they will be positive or negative. The algorithm tries to find a hyperplane to cut through the data, but often, a nonlinear transformation is required to find a separation. This transformation is critical to making the support vector machine work effectively.

  • 00:30:00 In this section, Dr. Chan discusses the need to resample data when there is not enough for the machine learning algorithm to learn from, although how much is needed depends on the number of predictors. He outlines how Support Vector Machines classify data: although less of a feature selection algorithm than stepwise regression or classification trees, the SVM finds a hyperplane that can cut through a space of any dimension. He notes that neural networks fit the data with a large nonlinear function rather than the linear function used in regression, and that deep learning is simply a neural network with many layers but relatively few nodes per layer, making it easier to capture features in stages.

  • 00:35:00 In this section, Dr. Ernie Chan discusses the concept of using a neural network for quantitative trading. He explains that a neural network is a powerful tool because it can approximate any nonlinear function and is capable of predicting tomorrow's return given today's variables. However, he also notes that the neural network doesn't work well in trading because the financial markets are not stationary, and it's difficult to capture relevant features using this approach. He emphasizes that the neural network uses all the inputs and doesn't select features, making it challenging to find variables that have a causal effect on the market.

  • 00:40:00 In this section, Dr. Ernie Chan explains when machine learning is useful to traders. Machine learning is helpful when traders lack intuition about their data or the market, or if they do not have a mathematical model of their data. Additionally, machine learning can help traders develop intuition when there are too many features or when they do not know which features are important. However, if traders have good intuition and a simple mathematical model, they are better off building simple models rather than using machine learning. Traders should also be careful when using machine learning if they have too little data or if there have been regime changes to their market because a poor market model can lead to algorithms that fall apart when there is a regime change.

  • 00:45:00 In this section of the webinar, Dr. Ernie Chan explains the importance of using stationary data when applying machine learning techniques in quantitative trading. He notes that many statistical and technical tests can be used to assess the stationarity of a data set, but the results are often ambiguous. Dr. Chan also expresses skepticism about the effectiveness of reinforcement learning and deep learning in trading, given the lack of successful out-of-sample replication. Additionally, he emphasizes the need for a customized target function in a trading strategy and suggests techniques such as stepwise regression or decision trees for predictive modeling.

  • 00:50:00 In this section, Dr. Ernie Chan discusses target function selection in machine learning for quantitative trading, explaining that the availability of data determines the choice of target function. If the target is a one-month return, daily returns become the input, and the time scale of the target variable must match that of the predictor variables. Dr. Chan also compares Adam and deep learning methods, stating that deep learning is weaker at feature selection. The section then turns to defining different regimes and how one can define them based on one's preferred criteria. Lastly, Dr. Chan emphasizes that the number of trades in quantitative trading is not by itself the determining factor for success.

  • 00:55:00 In this section, Dr. Ernie Chan discusses why the number of trades matters for statistical significance: the uncertainty in a strategy's average return shrinks roughly in proportion to one over the square root of the number of trades, so enough trades are needed before the returns can be trusted. He explains that the Sharpe ratio is an excellent measure of statistical significance because it incorporates this concept in its construction; a strategy with a Sharpe ratio of two or greater is considered to work effectively. Although the last question mentioned by Christophe may be too technical, Dr. Chan believes the Sharpe ratio is a good benchmark for strategy effectiveness.
Machine Learning for Quantitative Trading Webinar with Dr. Ernie Chan
  • 2017.03.28
  • www.youtube.com
Quantitative trading and algorithmic trading expert Dr. Ernie Chan teaches you machine learning in quantitative finance. You will learn:1) The pros and cons ...