Machine learning in trading: theory, models, practice and algo-trading - page 3405

 
Aleksey Vyazmikin #:

Please write out the last piece of code in full. I don't know R well and don't quite understand where to insert the code fragments.

I did it like this:

I got an error:

I've been hearing this phrase from you for about 5 years, maybe it's time?


Code for linear feature screening:

data <- data.table::fread("D:\\train.csv", sep = ";") |> as.data.frame()
original_column_names <- colnames(data)
target <- data$Target_100
# drop the time stamp and all target columns before screening
data <- data[, !(names(data) %in% c("Time", "Target_P", "Target_100", "Target_100_Buy", "Target_100_Sell"))]
# positions of linearly dependent columns (screened on the first 500 rows only)
bad_columns <- caret::findLinearCombos(data[1:500, ])$remove
# 0-based indices of those columns in the original column order
bad_columns_idx <- which(original_column_names %in% colnames(data)[bad_columns]) - 1
# "индексы_колонок.csv" = "column_indices.csv"
write.csv(bad_columns_idx, "E:\\...\\индексы_колонок.csv", row.names = FALSE)
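
For anyone unsure what findLinearCombos actually flags, here is a minimal self-contained toy on synthetic data (my own example, not the train.csv above): an exact duplicate and an exact linear mix both end up in $remove.

# toy check of caret::findLinearCombos on synthetic columns
set.seed(1)
m <- cbind(c1 = rnorm(50), c2 = rnorm(50))
m <- cbind(m,
           c3 = m[, "c1"],                      # exact duplicate of c1
           c4 = 2 * m[, "c1"] - 3 * m[, "c2"])  # linear mix of c1 and c2
caret::findLinearCombos(m)$remove  # flags columns 3 and 4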



Selection code:

data <- data.table::fread("D:\\train.csv", sep = ";") |> as.data.frame()
original_column_names <- colnames(data)
target <- data$Target_100
data <- data[, !(names(data) %in% c("Time", "Target_P", "Target_100", "Target_100_Buy", "Target_100_Sell"))]
data <- scale(data)  # standardise the predictors
# table(target)  # target class balance
#################################################
###  without class balancing                  ###
#################################################
library(abess)
model <- abess(y = target, x = data, tune.path = "gsection", early.stop = TRUE)
ex <- extract(model)
ex$support.vars  # names of the selected predictors
# save the column names
# write.csv(ex$support.vars, "E:\\FX\\MT5_CB\\MQL5\\Files\\00_Standart_50\\Setup\\Pred.csv", row.names = FALSE)
# 0-based indices of the columns that were NOT selected
drop_columns_idx <- which(!original_column_names %in% ex$support.vars) - 1
# save the indices to a CSV file ("Оставшиеся_предикторы" = "remaining predictors")
# write.csv(drop_columns_idx, "E:\\FX\\MT5_CB\\MQL5\\Files\\00_Standart_50\\Setup\\Оставшиеся_предикторы.csv", row.names = FALSE)
#################################################
###  with class balancing by upsampling       ###
#################################################
x <- caret::upSample(x = data, y = as.factor(target), list = TRUE)
x$y <- as.numeric(as.character(x$y))
# table(x$y)  # target class balance after upsampling
model <- abess(y = x$y, x = x$x, tune.path = "gsection", early.stop = TRUE)
ex <- extract(model)
ex$support.vars  # names of the selected predictors
# save the column names
# write.csv(ex$support.vars, "E:\\FX\\MT5_CB\\MQL5\\Files\\00_Standart_50\\Setup\\Pred.csv", row.names = FALSE)
# 0-based indices of the columns that were NOT selected
drop_columns_idx <- which(!original_column_names %in% ex$support.vars) - 1
# save the indices to a CSV file
# write.csv(drop_columns_idx, "E:\\FX\\MT5_CB\\MQL5\\Files\\00_Standart_50\\Setup\\Оставшиеся_предикторы.csv", row.names = FALSE)
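
A note on the "- 1" above (my addition): the saved indices are 0-based, presumably for an external consumer such as CatBoost's ignored-features list (the MT5_CB path hints at that, but it is my assumption). To read them back into R, assuming the commented write.csv was actually run, shift back to 1-based indexing:

# hypothetical round trip: reload the raw frame and drop the saved columns
raw   <- data.table::fread("D:\\train.csv", sep = ";") |> as.data.frame()
drop0 <- read.csv("E:\\FX\\MT5_CB\\MQL5\\Files\\00_Standart_50\\Setup\\Оставшиеся_предикторы.csv")[[1]]
kept  <- raw[, -(drop0 + 1)]  # R indexing is 1-based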
 
mytarmailS #:
I've been hearing this phrase from you for about 5 years, maybe it's time?

Well, so far Python syntax is clearer and the community is larger, so it is easier to find answers to your questions. And ChatGPT can now adequately suggest the uncomplicated things, too.

If local participants would help with understanding the R code, there might be some point in studying it to the level of conscious copying.

mytarmailS #:
Code for linear feature screening:

Thanks for writing it out in complete form.

I managed to run it without errors and set it off training models.

 
Aleksey Vyazmikin #:

I managed to run it without errors and set it off training models.


Waiting for the results🙄
 
fxsaber #:

Without knowing the context, when I see phrases like that I immediately start thinking about dependence on the time of day. The lower the timeframe (from H1 on down), the more weakly it can be traced in the increments.

Sorry for the remark. I treat you with respect.

The time it takes to study market relationships properly is roughly equal to the time it takes to learn programming properly plus the time it takes to study the application domain.

 

OFFTOP.

I noticed that some time after publishing a post I can no longer edit it: the "edit" button just disappears. Is this some kind of innovation, or is something glitching on my end?

 
mytarmailS #:

OFFTOP

I noticed that some time after publishing a post I can no longer edit it: the "edit" button just disappears. Is this some kind of innovation, or is something glitching on my end?

The time allowed for editing posts has been reduced. Significantly. You have to express your thoughts correctly the first time.

 
mytarmailS #:

Waiting for the results🙄

Here are the results. The first graph in each series is rebalancing, the second is linear feature screening, and the third is nothing (the original).
Train sample:

[balance graphs]

Test sample:

[balance graphs]

Exam sample (mean values: 2551, 3117, 2866):

[balance graphs]
It turns out that only the screening of linearly dependent features gave any improvement in the mean financial result of the models.

Even though the probability of choosing a losing strategy is relatively low, such a spread of financial results is depressing.

At the same time, I know that it is possible to squeeze far better results out of the sample - here is one of the options, training on model leaves (without model selection):

[balance graph]
 
Aleksey Vyazmikin #:

Here are the results -

I don't get it. Explain in plain words what helped, what didn't, what was better, what was worse, and what was best.

 
mytarmailS #:

I don't get it. Explain in plain words what helped, what didn't, what was better, what was worse, and what was best.

Well, graphs say it better than words here. In brief: there is no particular improvement (relative to the original), but "linear feature screening" did show itself better, if measured by the mean balance of the models on an independent sample. At the same time, balancing plus abess selection managed to isolate significant predictors on which a model can already be built - consider that the backbone. It may be worth running the balancing a dozen times in a loop and pulling out all the resulting predictors.

One thing is obvious: these methods are fast, but far from optimal.
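
For what it's worth, a rough sketch of that "a dozen balancings in a loop" idea (my own sketch, reusing data and target from the selection code above; untested on the author's data):

library(abess)
all_vars <- character(0)
for (i in 1:12) {
  set.seed(i)  # a different upsample on every pass
  x <- caret::upSample(x = data, y = as.factor(target), list = TRUE)
  x$y <- as.numeric(as.character(x$y))
  m <- abess(y = x$y, x = x$x, tune.path = "gsection", early.stop = TRUE)
  all_vars <- union(all_vars, extract(m)$support.vars)
}
all_vars  # the pooled "backbone" of predictors across all runs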

 
Aleksey Vyazmikin #:

Well, graphs say it better than words here. In brief: there is no particular improvement (relative to the original), but "linear feature screening" did show itself better, if measured by the mean balance of the models on an independent sample. At the same time, balancing plus abess selection managed to isolate significant predictors on which a model can already be built - consider that the backbone. It may be worth running the balancing a dozen times in a loop and pulling out all the resulting predictors.

For the graphs to speak better than words, it would be good to explain what they mean in general, what is shown on them, how it was all calculated, and so on.

Aleksey Vyazmikin #:

One thing is obvious: these methods are fast, but far from optimal.

What do you mean, far from optimal?

Take the screening of linearly dependent features: the essence of the method is to throw out the features that essentially duplicate others.

You had 2410 features; after the screening you have 500; the model works the same or better; the method works 100%.

How is that far from optimal?

You still don't seem to understand what you were doing.
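
A side note on those numbers: the screening code above was run on data[1:500, ], and a matrix with 500 rows has at most 500 linearly independent columns, so with 2410 features at least 1910 are guaranteed to be flagged as dependent. A quick way to see the rank bound:

# with 500 rows, even pure noise has rank at most 500
set.seed(42)
wide <- matrix(rnorm(500 * 2410), nrow = 500)  # 500 rows x 2410 columns
qr(wide)$rank  # 500: no more than min(rows, columns) independent columns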
