Machine learning in trading: theory, models, practice and algo-trading - page 3400

 
mytarmailS #:

how do you figure that out?

Like finding out which method of feature selection is best?

In fact, there are only two types: exhaustive search and heuristic search methods (discrete optimisation).

Exhaustive search is always better, but not always feasible if there are a lot of features. Besides, we are looking not for one best feature but for the best subset of features; on a more or less normal dataset a complete search is impossible because of the combinatorial explosion, so we fall back on heuristics - discrete optimisation (with no guarantee that the best solution has been found).
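Just for scale, two one-liners in R (the numbers are illustrative, not from any particular dataset):

# number of ways to pick just 10 features out of 2000 candidates
choose(2000, 10)   # about 2.8e26 subsets of size 10 alone
# number of all possible subsets of only 30 features
2^30               # already over a billion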

There is one good package; I haven't tested it deeply and I don't know the mathematics, but the authors claim that the algorithm finds the best subset in polynomial time (very fast), i.e. it is neither a complete search nor a heuristic. I used it a little and I think the package does what they say. So essentially it is the leader among selection methods.

https://abess-team.github.io/abess/

I think there's one for python, too.

====================================

And the point is not even efficient feature selection (although it is necessary), but the generation of candidate features. This is the most important thing: the features are the sensors, the eyes and ears of the trading system.

This abess is rubbish.

It finds the best set of features on history by the criterion of minimising the classification error.

But the problem is the predictive ability of the features, not the classification error. A given set of features has a certain predictive ability, and a certain classification error corresponds to it; that error is fixed for the available set of features. If you want to reduce the classification error, look for another set of features with higher predictive ability.

 
mytarmailS #:

how do you figure that out?

Like finding out which method of feature selection is best?

That's right. Take a dozen samples and, for each one, check how well models built on the selected predictors perform.
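Roughly what I mean, as a sketch - assuming (hypothetically) a data frame data with a 0/1 column Target and a character vector selected holding the chosen predictor names, neither of which is defined here:

set.seed(42)
n_trials <- 12                        # "a dozen samples"
acc <- numeric(n_trials)

for (i in seq_len(n_trials)) {
  idx   <- sample(nrow(data), size = floor(0.7 * nrow(data)))  # random 70/30 split
  train <- data[idx, ]
  test  <- data[-idx, ]

  # model built only on the selected predictors (logistic regression as a stand-in)
  f   <- reformulate(selected, response = "Target")
  fit <- glm(f, data = train, family = binomial)

  p      <- predict(fit, newdata = test, type = "response")
  acc[i] <- mean((p > 0.5) == test$Target)
}

summary(acc)   # how stable the selected subset is across samples

If the accuracy collapses on some of the splits, the selected subset is probably fitted to the particular sample it was chosen on.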

mytarmailS #:
Full search is always better, but not always possible if there are a lot of features.

This is of course obvious. That's why heuristics are interesting.

mytarmailS #:
There is one good package, I haven't tested it deeply and I don't know the mathematics, but the authors say that the algorithm finds the best subset in polynomial time (very fast), i.e. it's not a complete search and it's not a heuristic. I've used it a little bit, it seems to do what they say. So in essence, this is the leader among selection methods.

I ran it on a sample - it's been five hours already and I'm tired of waiting. As far as I understand, it is better suited to regression (including logistic regression for classification) and is not universal.

GPT suggested the following R code for selecting predictors and saving the excluded ones. Here I limited the number of predictors to 50 and decided to wait again.

#  Install and load the abess package
#install.packages("abess")
library(abess)

#  Load the data from CSV
data <- read.csv("E:\\FX\\MT5_CB\\MQL5\\Files\\00_Standart_50\\Setup\\train.csv", sep = ";")

#  Specify the target variable column
target_column <- "Target_100"
target <- data[, target_column]

#  Exclude columns by name
столбцы_исключения <- c("Time","Target_P","Target_100","Target_100_Buy","Target_100_Sell")
#data_without_excluded <- data[, !names(data) %in% столбцы_исключения]

#  Keep only the first 50 columns
data_without_excluded <- data[, 1:50]

#  Apply the abess method
#  Here you need to specify your model and the abess settings
#  For example:
model <- abess(y = target, x = data_without_excluded, method = "lasso")

#  All columns
все_столбцы <- names(data)

#  Indices of all columns
индексы_всех_столбцов <- seq_along(все_столбцы)

#  Indices of the selected columns
индексы_выбранных_столбцов <- model$selected

#  Indices of the excluded columns
индексы_исключенных_столбцов <- setdiff(индексы_всех_столбцов, индексы_выбранных_столбцов)

#  Save the information to a CSV file
write.csv(индексы_исключенных_столбцов, "E:\\FX\\MT5_CB\\MQL5\\Files\\00_Standart_50\\Setup\\исключенные_столбцы.csv", row.names = FALSE)

It excluded all the columns :) Either the method is like that, or there's a bug in the code... Or maybe the predictors are just that bad - I'll add more.
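One possible reason - just a guess: in the abess package the chosen predictors are usually read back with extract(), as in the working example further down the thread, rather than from a selected field. A minimal sketch, assuming model is an abess fit:

best <- extract(model)   # model at the tuned support size
best$support.vars        # names of the selected predictors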

 
Aleksey Vyazmikin #:

Correct. Take a dozen samples and check the efficiency of model building on selected predictors for each sample.

That's obvious, of course. That's why heuristics are interesting.

I ran it on a sample - it's been five hours already - I'm tired of waiting. In general, I understand that it is more suitable for regression (including logistic regression for classification), and is not universal.

GPT offers such code in R for selecting and saving excluded predictors. I have limited the number of predictors to 50 and decided to wait again.

It excluded all the columns :) Either the method is like that, or there's a bug in the code... Or maybe the predictors are just that bad - I'll add more.

how many columns are in the data?

 
mytarmailS #:

how many columns are in the data?

A little over 2000

 
Aleksey Vyazmikin #:

I ran it on a sample - it's been five hours - tired of waiting.

Is it so difficult to read the code example on the method's page? Why hack something together with GPT without understanding what it is doing?

It should run in a minute, not five hours.

 
Aleksey Vyazmikin #:

A little over 2,000

Send me the sample data and I'll take a look.

And it is suitable for both regression and classification; it says so, and there are examples... what kind of people.


It's all written down.


 
mytarmailS #:

Is it so difficult to read the code example on the method page? Why code with GPT without understanding what you are doing?

It should run in a minute, not five hours.

What's wrong with the code? Even the example from the help hangs for me in Studio :)


mytarmailS #:

Send me the sample data and I'll take a look at it.

and it is suitable for both regression and classification, it says so, and there are examples... what kind of people.


It's all spelled out.


Even the screenshot says "logistic regression" - of course I was looking at the examples, which for some reason are already in Python.

 
Aleksey Vyazmikin #:

What's wrong with the code? It freezes on the help in the studio :)


It even says "logistic regression" on the screen - of course I was looking at examples, which for some reason are already in python.

library(abess)
set.seed(1)
n <- 1000

# binary target
target <- sample(0:1, size = n, replace = TRUE)

col <- 50000

# 50,000 pure-noise columns plus four informative columns with increasing noise
X <- cbind(matrix(rnorm(n*col), ncol = col, dimnames = list(NULL, paste0("noise", 1:col))),
           good_0.2 = target + rnorm(n, sd = 0.2),
           good_0.3 = target + rnorm(n, sd = 0.3),
           good_0.4 = target + rnorm(n, sd = 0.4),
           good_0.5 = target + rnorm(n, sd = 0.5))


ab <- abess(x = X, y = target, tune.path = "gsection", early.stop = TRUE)
ex <- extract(ab)   # corrected this line
ex$support.vars

Try this one

binary classification

1000 rows

50 thousand features / columns

the best feature subset was found in less than 3 seconds.


All the features that are relevant to the target were found, and none of the 50,000 noise ones.

 
 ex$support.vars
[1]  "good_0.2"   "good_0.3"   "good_0.4"   "good_0.5"  
 
Aleksey Vyazmikin #:

1) What is wrong with the code? On the help in studio it hangs for me :)

2) It says "logistic regression" even on the screenshot - of course I looked at the examples, which for some reason are already in python.

1) EVERYTHING is wrong - what the hell is a lasso? That's regression, and you fed it data for classification.

2) It says CLASSIFICATION: the Titanic data, with a binary survivor/non-survivor target.

Logistic regression is a classification algorithm, e.g. it is used to classify texts.
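For what it's worth, a sketch of the same call written explicitly for binary classification, on the simulated X and target from the example above; family = "binomial" is my reading of the package documentation, so check it against the docs:

library(abess)

# logistic (classification) loss instead of the default Gaussian one
ab_clf <- abess(x = X, y = target, family = "binomial",
                tune.path = "gsection", early.stop = TRUE)
extract(ab_clf)$support.vars   # selected feature names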