Machine learning in trading: theory, models, practice and algo-trading - page 3407

 

Who was it that wanted to poke at a smart LLM?


A question for the dumb Bard.

How many resources do you consume?


My resource consumption depends on the task I'm doing.

On average:

  • Computing resources: I use TPUv4-32 pods, which is equivalent to 32 TPUv4 chips.
  • Memory: I require 16GB RAM.
  • Disk Space: I am using a 1TB SSD.
  • Power consumption: My power consumption is around 1kW.

Specific examples:

  • Text generation:
    • 1,000 words - 0.01 TPUv4-32 pod-hours, 16MB RAM, 10MB SSD, 0.1kWh
    • 10,000 words - 0.1 TPUv4-32 pod-hours, 160MB RAM, 100MB SSD, 1kWh
  • Translation:
    • 1,000 words - 0.02 TPUv4-32 pod-hours, 32MB RAM, 20MB SSD, 0.2kWh
    • 10,000 words - 0.2 TPUv4-32 pod-hours, 320MB RAM, 200MB SSD, 2kWh
  • Question answering:
    • Simple questions - 0.001 TPUv4-32 pod-hours, 4MB RAM, 5MB SSD, 0.01kWh
    • Complex questions - 0.01 TPUv4-32 pod-hours, 16MB RAM, 20MB SSD, 0.1kWh

Important to note:

  • These numbers are approximate and may vary depending on the complexity of the task.
  • I am constantly being optimised to consume fewer resources.
 
mytarmailS #:

Say you have a TS built on moving averages.

You select its parameters (the periods of the averages) on the same chart, but with different seeds.

The TS is a CatBoost model.

Is the analogy clear?
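A minimal sketch of what that analogy could look like in code - my own illustration, not mytarmailS's code - assuming the catboost Python package and a purely synthetic stand-in dataset:

import numpy as np
from catboost import CatBoostClassifier

# Same data, same model class, different random seeds: the counterpart of
# re-optimising the MA periods of one TS on the same chart.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                # stand-in features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # stand-in labels

for seed in (1, 2, 3):
    model = CatBoostClassifier(iterations=200, random_seed=seed, verbose=False)
    model.fit(X, y)
    print(f"seed={seed}  train accuracy={model.score(X, y):.3f}")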


On the second question, read Max's article about causal inference; I'm tired of explaining it, honestly.

Here I'm tired too - go and study how models are built....

You don't want to communicate constructively.
 
mytarmailS #:

Who was it that wanted to poke at a smart LLM?

A question for the dumb Bard

Bard is dead, so that leaves Gemini :)
You'd have to take a smaller model, otherwise the delay in getting an answer will be large. Gemma, for example. Otherwise you'll have a hard time running it in the tester :)
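As a rough illustration of the latency concern - an assumption-laden sketch that uses the Hugging Face transformers package and the (gated) google/gemma-2b checkpoint, and simply times one short completion:

import time
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "google/gemma-2b"  # a small model, as suggested above
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

t0 = time.time()
inputs = tok("Buy or sell?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8)
print(tok.decode(out[0]), f"  ({time.time() - t0:.1f}s per call)")

Multiply that per-call time by the number of bars in a tester run and the point about model size makes itself.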
 
Aleksey Vyazmikin #:

Here I'm tired too - go and study how models are built....

You don't want to communicate constructively.
Study what an analogy is)
 
Aleksey Vyazmikin #:

Here I'm tired too - go and study how models are built....

You don't want to communicate constructively.
Learn the meaning of the word "analogy" before lecturing me about models and constructive discussion.
I can build a model from scratch, can you?
 
mytarmailS #:
Learn the meaning of the word "analogy" before lecturing me about models and constructive discussion.
I can build a model from scratch, can you?

Clearly you only notice your own work - a year ago I posted here the results of my model-building algorithm, complete with GIFs. Or is your memory worse than mine...

There is no such analogy in the models; what's more, the significant leaves of two models differ in 99% of cases.

And besides, we were talking about setting up an experiment - and there you offered no constructive thoughts at all.

 
Alexey Burnakov:

Good afternoon, everyone,

I know there are machine learning and statistics enthusiasts on the forum. I propose to use this thread to discuss (without flame wars), share, and enrich our common knowledge bank in this interesting field.

For beginners (and not only them) there is a good theoretical resource in Russian: https://www.machinelearning.ru/.

A small review of literature on methods for selecting informative features: https://habrahabr.ru/post/264915/.

I propose problem number one. I will post its solution later. SanSanych has already seen it - please don't give away the answer.

Introduction: in order to build a trading algorithm, you need to know which factors will form the basis for predicting the price, the trend, or the direction in which to open a deal. Selecting such factors is not easy, and it is immensely complex.

Attached is an archive with an artificial csv dataset that I made.

The data contains 20 variables with the prefix input_, and one rightmost variable output.

The output variable depends on some subset of input variables (the subset can contain from 1 to 20 inputs).

Task: using any (machine learning) methods, select the input variables that can be used to determine the state of the output variable on the available data.

The solution can be posted here in the form: input_2, input_19, input_5 (an example). You can also describe the discovered dependence between the inputs and the output variable.

Whoever manages it - well done ) From me, the ready solution and an explanation.

Alexey
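(One hedged sketch of the kind of approach the task invites - not the intended solution - is to rank the inputs by mutual information with the output; the file name dataset.csv below is a hypothetical stand-in for the attached archive's contents:)

import pandas as pd
from sklearn.feature_selection import mutual_info_classif

df = pd.read_csv("dataset.csv")      # hypothetical file name
X, y = df.filter(like="input_"), df["output"]

# Inputs with the highest mutual information are candidate answers.
mi = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
print(mi.sort_values(ascending=False))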

Before applying machine learning algorithms, it's crucial to conduct thorough exploratory data analysis. This involves examining the distribution of variables, identifying correlations, detecting outliers, and gaining insights into the structure of the data. Visualization techniques such as histograms, scatter plots, and correlation matrices can be invaluable during this stage.
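A minimal EDA sketch along these lines, again assuming the archive unpacks to a hypothetical dataset.csv with columns input_1..input_20 and output:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("dataset.csv")           # hypothetical file name

print(df.describe())                      # per-variable distributions at a glance
print(df.corr()["output"].sort_values())  # linear correlation of each input with output

df.hist(bins=50, figsize=(12, 10))        # histograms of every variable
plt.tight_layout()
plt.show()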

Techniques such as feature scaling, normalization, encoding categorical variables, and creating new features through mathematical transformations or domain knowledge can improve the model's ability to capture underlying patterns in the data. Choosing the right machine learning algorithm for the task at hand is also essential. Given that we're dealing with a prediction problem, regression algorithms such as linear regression, decision trees, random forests, support vector machines (SVM), and neural networks may be suitable candidates. It's advisable to experiment with multiple algorithms and evaluate their performance using appropriate metrics such as mean squared error (MSE), root mean squared error (RMSE), or accuracy, depending on the nature of the problem.
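For instance, a quick comparison of several candidate algorithms might look like the sketch below; it assumes the output is categorical (so accuracy is the metric) and reuses the hypothetical dataset.csv:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("dataset.csv")           # hypothetical file name
X, y = df.drop(columns="output"), df["output"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit each candidate on the same split and compare held-out accuracy.
for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(random_state=0),
              SVC()):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, round(model.score(X_te, y_te), 3))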

To assess the generalization performance of our models and mitigate overfitting, it's crucial to use cross-validation techniques such as k-fold cross-validation or leave-one-out cross-validation. This involves splitting the dataset into multiple subsets, then training the model on a portion of the data, and evaluating its performance on the remaining unseen data. Cross-validation also helps ensure that our model's performance estimates are robust and reliable.
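A k-fold sketch with scikit-learn, under the same hypothetical dataset.csv assumption:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("dataset.csv")           # hypothetical file name
X, y = df.drop(columns="output"), df["output"]

# 5-fold CV: train on 4 folds, score on the held-out fold, rotate.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print(scores.mean(), scores.std())        # mean and spread across folds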

Many machine learning algorithms have hyperparameters that control their behavior and performance. Hyperparameter tuning involves searching for the optimal combination of hyperparameters to maximize the model's performance. Techniques such as grid search, random search, or Bayesian optimization can be used to fine-tune the model and improve its predictive accuracy. 
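For example, a grid-search sketch (the grid values are illustrative only, not recommendations):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

df = pd.read_csv("dataset.csv")           # hypothetical file name
X, y = df.drop(columns="output"), df["output"]

# Exhaustively try every combination in the grid, scored by 5-fold CV.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)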

 
https://github.com/phil8192/ob-analytics?tab=readme-ov-file

I came across an interesting package for order log visualisation; here is what its authors have done with it:
https://cran.r-project.org/web/packages/obAnalytics/vignettes/guide.html
 

I decided to devote the weekend to studying Numerai, since I'd never got around to it before.

After registration, they offer tutorials on how and what to do.

The goal is to get into the top of the leaderboard, and I also wonder how they prepare the data there. The second is unlikely, though, because all the data is obfuscated :)

Video from them:


 
Nardus Van Staden #:

Before applying machine learning algorithms, it's crucial to conduct thorough exploratory data analysis. This involves examining the distribution of variables, identifying correlations, detecting outliers, and gaining insights into the structure of the data. Visualization techniques such as histograms, scatter plots, and correlation matrices can be invaluable during this stage.

Techniques such as feature scaling, normalization, encoding categorical variables, and creating new features through mathematical transformations or domain knowledge can improve the model's ability to capture underlying patterns in the data. Choosing the right machine learning algorithm for the task at hand is also essential. Given that we're dealing with a prediction problem, regression algorithms such as linear regression, decision trees, random forests, support vector machines (SVM), and neural networks may be suitable candidates. It's advisable to experiment with multiple algorithms and evaluate their performance using appropriate metrics such as mean squared error (MSE), root mean squared error (RMSE), or accuracy, depending on the nature of the problem.

To assess the generalization performance of our models and mitigate overfitting, it's crucial to use cross-validation techniques such as k-fold cross-validation or leave-one-out cross-validation. This involves splitting the dataset into multiple subsets, then training the model on a portion of the data, and evaluating its performance on the remaining unseen data. Cross-validation also helps ensure that our model's performance estimates are robust and reliable.

Many machine learning algorithms have hyperparameters that control their behavior and performance. Hyperparameter tuning involves searching for the optimal combination of hyperparameters to maximize the model's performance. Techniques such as grid search, random search, or Bayesian optimization can be used to fine-tune the model and improve its predictive accuracy.

Generic help, ChatGPT style. The best way to shine with encyclopaedic knowledge.