Machine learning in trading: theory, models, practice and algo-trading - page 3622

 
I don't know the first thing about graphs, unfortunately.
 
mytarmailS #:
I don't know the first thing about graphs, unfortunately.

reciprocally )

 
Maxim Dmitrievsky #:

reciprocally )

You wrote that you need to find correlation in the dataset.
When I tried to train models based on different regressions, I noticed that the higher the correlation between the features, the better the model trained.
And preprocessing the dataset by centring the data reduced the error.
You also write that the features still have to be found. Maybe a search for features by correlation could be added somehow?
And the PCA algorithm seems to reduce a large sample by selecting the principal components.
Just sharing a thought, just in case.
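
(A minimal sketch of the idea above, assuming the features sit in a pandas DataFrame X with a target Series y; the 0.1 threshold is arbitrary and every name here is illustrative, not the contest code.)

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Screen features by absolute correlation with the target
corr = X.corrwith(y).abs()
selected = corr[corr > 0.1].index
X_sel = X[selected]

# Centring only (no scaling), as mentioned above
X_centered = StandardScaler(with_std=False).fit_transform(X_sel)

# PCA keeps the principal components explaining 95% of the variance
X_pca = PCA(n_components=0.95).fit_transform(X_centered)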

 
Roman #:

You wrote that you need to find correlation in the dataset.
When I tried to train models based on different regressions, I noticed that the higher the correlation between the features, the better the model trained.
And preprocessing the dataset by centring the data reduced the error.
You also write that the features still have to be found. Maybe a search for features by correlation could be added somehow?
And the PCA algorithm seems to reduce a large sample by selecting the principal components.
Just sharing a thought, just in case.

https://en.m.wikipedia.org/wiki/Correlation_does_not_imply_causation

That's what this contest is about.

Correlation gives at most 0.36 accuracy on new data. You can get 1.0 on the training data.
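
(For illustration only: a toy sketch of that gap, on purely synthetic data with random labels, so there is nothing real to learn; all names here are made up.)

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))      # synthetic features
y = rng.integers(0, 2, size=2000)    # random labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(model.score(X_tr, y_tr))   # close to 1.0: the training set is memorised
print(model.score(X_te, y_te))   # near 0.5: nothing generalises to new data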
 
Maxim Dmitrievsky #:

Correlation gives at most 0.36 accuracy on new data.

I get a maximum of 0.274; I'm ashamed to submit such a result )))

On average, how much time do you spend on one training cycle?

 
Evgeni Gavrilovi #:

I get a maximum of 0.274; I'm ashamed to submit such a result )))

How much time do you spend on one training cycle?

The slowest part there is the markup; training itself is fast. I mark up 1/10 of the dataset first, for speed, and see what comes out :)

There's no point in submitting anything below 0.5; it no longer gets into the top 10.
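
(A sketch of marking up only a tenth of the data first; df and mark_up are hypothetical placeholders, not the actual markup code.)

# Hypothetical: df holds the raw dataset, mark_up() produces the labels
subset = df.iloc[: len(df) // 10]   # first tenth only, for a fast iteration
labels = mark_up(subset)            # inspect the result before labelling everything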
 
Maxim Dmitrievsky #:
The slowest part there is the markup; training itself is fast.

Is "fast" 5-10 minutes? For some people even an hour would seem fast )

 
Evgeni Gavrilovi #:

Is "fast" 5-10 minutes? For some people even an hour would seem fast )

Less than a minute in Colab, with CatBoost.
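
(A minimal CatBoost sketch of that setup; the parameters are illustrative, and X_train / y_train / X_test are assumed to come from the notebook discussed below.)

from catboost import CatBoostClassifier

# A few hundred iterations over a modest dataset typically finish
# well under a minute on a Colab instance
model = CatBoostClassifier(iterations=500, depth=6, verbose=False)
model.fit(X_train, y_train)
preds = model.predict(X_test)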

 
Evgeni Gavrilovi #:

Select the part of the dataset for markup and training, at the very end of the Computing section (X_train, y_train):

print(f"Creating X_y_group_train from {len(names_datasets_train)} datasets and graphs")
MAX_SAMPLES = 1000
#  Получаем первые MAX_SAMPLES ключей
first_keys_f = list(names_datasets_train.keys())[:MAX_SAMPLES]
#  Создаем новый словарь с первыми MAX_SAMPLES записями
first_dict_f = {k: names_datasets_train[k] for k in first_keys_f}
first_keys_l = list(names_graphs_train.keys())[:MAX_SAMPLES]
#  Создаем новый словарь с первыми MAX_SAMPLES записями
first_dict_l = {k: names_graphs_train[k] for k in first_keys_l}

X_y_group_train = create_all_columns(
    {
        pearson_correlation: first_dict_f,
        #  enhanced_pearson_correlation: first_dict_f,
        #  fast_regression_analysis: first_dict_f,
        #  ttest: first_10_dict_f,
        #  mutual_information: first_10_dict_f,  #  uncomment this line to add features but at high computational cost
        label: first_dict_l,
    },
    n_jobs=-1,
)
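
(One possible tidier variant of the key slicing above, using itertools.islice; the behaviour is the same.)

from itertools import islice

first_dict_f = dict(islice(names_datasets_train.items(), MAX_SAMPLES))
first_dict_l = dict(islice(names_graphs_train.items(), MAX_SAMPLES))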
 
Maxim Dmitrievsky #:

Select a part of the dataset for markup and training

Thanks, hadn't even thought of that