Machine learning in trading: theory, models, practice and algo-trading - page 3613

 
mytarmailS #:
You can't understand the whole process from one function. Show it with data and a model, so that one can run the whole process and make sense of it.

Then I'll have to explain even more... :) You know there will be nothing but questions. I don't want to, because I haven't finished everything I want to do. The gist of the idea has already been described. I haven't fully tested it myself yet.

The input is any labeled dataset; the output is the same dataset with corrected labels. You train on it, then test on other (new) data. Other, simpler modifications of these functions were posted earlier in the thread.

All of that code would then have to be explained, and an article is a better place for that, because it is long.
 
Maxim Dmitrievsky #:

Then I'll have to explain even more... :) You know there will be nothing but questions. I don't want to, because I haven't finished everything I want to do. The gist of the idea has already been described. I haven't fully tested it myself yet.

The input is any labeled dataset; the output is the same dataset with corrected labels. You train on it, then test on other (new) data. Other, simpler modifications of these functions were posted earlier in the thread.

All of that code would then have to be explained, and an article is a better place for that, because it is long.

Just the full code would eliminate all questions.

 
mytarmailS #:

Just the full code would eliminate all questions.

No, then the questions about the tester will start: "there must be look-ahead (peeking into the future) somewhere", "how does this work", "what do I click here"... And "how do I load the file correctly", "which library version", "which Python version", "I can't get it to work". There's a lot of code in there.
 
Maxim Dmitrievsky #:
No, then the questions about the tester will start: "there must be look-ahead (peeking into the future) somewhere", "how does this work", "what do I click here"... And "how do I load the file correctly", "which library version", "which Python version", "I can't get it to work". There's a lot of code in there.

OK, take a simple, well-known public dataset like Fisher's iris, or make your own synthetic one, apply your function and show the code for it, without the model, testing, tester, etc.

 
mytarmailS #:

OK, take a simple, well-known public dataset like Fisher's iris, or make your own synthetic one, apply your function and show the code for it, without the model, testing, tester, etc.

Iris has more than two classes, and you need a binary target. I can take something as an example later, yes.

Or you can take the simplest version. It only fixes the labels in the dataset and doesn't throw anything away; there is no meta-model.

Try rewriting it in R via ChatGPT; it should be able to handle it.

```python
import pandas as pd
from sklearn.cluster import KMeans

def fix_labels_subset_mean(dataset: pd.DataFrame, n_clusters=200, subset_size=100) -> pd.DataFrame:
    # Work on a copy so the caller's DataFrame is not modified
    dataset = dataset.copy()
    # Features are all columns except the binary 'labels' target
    features = dataset.drop(columns=['labels'])
    # Cluster the samples with KMeans in feature space
    dataset['clusters'] = KMeans(n_clusters=n_clusters).fit(features).labels_
    # Compute the mean of 'labels' for each cluster
    cluster_means = dataset.groupby('clusters')['labels'].mean()
    # Sort clusters by how far their mean is from 0.5 and keep the subset_size farthest ones
    sorted_clusters = cluster_means.sub(0.5).abs().sort_values(ascending=False).index[:subset_size]
    # Map each selected cluster to a hard label: 0.0 if its mean is below 0.5, else 1.0
    mean_to_new_value = {cluster: 0.0 if mean < 0.5 else 1.0
                         for cluster, mean in cluster_means.items() if cluster in sorted_clusters}
    # Relabel rows in the selected clusters only; all other rows keep their original label
    dataset['labels'] = dataset['clusters'].map(mean_to_new_value).fillna(dataset['labels'])
    return dataset.drop(columns=['clusters'])
```
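To make the relabeling step concrete, here is the same groupby / mean / threshold logic on a ten-row toy table. As an assumption made only so the result is deterministic, the cluster ids are written by hand instead of coming from KMeans. With subset_size=2, the two clusters whose mean label is farthest from 0.5 are forced to their majority label, while the 50/50 cluster keeps its original labels.

```python
import pandas as pd

# Toy table: 'clusters' is given by hand here, standing in for KMeans labels_
df = pd.DataFrame({
    'clusters': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2],
    'labels':   [1, 1, 1, 0, 0, 1, 0, 1, 0, 0],
})

# Mean label per cluster: 0 -> 0.75, 1 -> 0.5, 2 -> 0.0
cluster_means = df.groupby('clusters')['labels'].mean()

# Keep the subset_size clusters whose mean is farthest from 0.5 (here clusters 2 and 0)
subset_size = 2
selected = cluster_means.sub(0.5).abs().sort_values(ascending=False).index[:subset_size]

# Selected clusters get a hard label; cluster 1 (mean exactly 0.5) is not selected
new_value = {c: 0.0 if m < 0.5 else 1.0 for c, m in cluster_means.items() if c in selected}
df['labels'] = df['clusters'].map(new_value).fillna(df['labels'])

print(df['labels'].tolist())
# [1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

Only the lone 0 in cluster 0 is flipped to 1; clusters that are already pure stay the same, and ambiguous ones are not touched, which is why nothing is thrown away.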
 
Maxim Dmitrievsky #:

Iris has more than two classes, and you need a binary target. I can take something as an example later, yes.

Or you can take the simplest version. It only fixes the labels in the dataset and doesn't throw anything away; there is no meta-model.

Try rewriting it in R via ChatGPT; it should be able to handle it.

Iris has three classes; you can make a binary target in one line, e.g. 1 if the sample belongs to the first class and 0 if it belongs to the second or third.
The result is a binary target.


Yes, I can feed it to GPT and translate it into another language, but I can't be sure it works correctly... I need to control what data goes in, how it is transformed and what comes out. I need a normal reproducible example, you know?
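A reproducible binary dataset of the kind suggested above can be built from the iris data bundled with scikit-learn. This is a sketch under one assumption: the "first class" is taken to be setosa (target 0 in scikit-learn). Setosa is linearly separable from the other two species, so this is an easy target; versicolor (target 1) against the rest would be a harder test.

```python
from sklearn.datasets import load_iris

# Load iris as a DataFrame: four feature columns plus 'target' (0, 1, 2)
iris = load_iris(as_frame=True)
df = iris.frame

# One-vs-rest binary target: 1 for class 0 (setosa), 0 for classes 1 and 2.
# pop() removes 'target', so 'labels' ends up as the last column.
df['labels'] = (df.pop('target') == 0).astype(int)

print(df.shape)            # (150, 5)
print(df['labels'].sum())  # 50
```

With only 150 rows, the default n_clusters=200 of fix_labels_subset_mean cannot be used, since KMeans needs at least as many samples as clusters; something like n_clusters=10 with a smaller subset_size would be needed.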
 
mytarmailS #:
Iris has three classes; you can make a binary target in one line, e.g. 1 if the sample belongs to the first class and 0 if it belongs to the second or third.
The result is a binary target.


Yes, I can feed it to GPT and translate it into another language, but I can't be sure it works correctly... I need to control what data goes in, how it is transformed and what comes out. I need a normal reproducible example, you know?

I'll come up with something later. Without a tester, it can then only be evaluated through the model's errors.

 

The trick is that if you apply this function, you'll immediately see an improvement on new data after training, unless the dataset is completely random. That is, train and test will look more similar, with less overfitting.
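One way to check this claim through model errors alone, without a tester, is sketched below. The setup is my assumption, not the author's: synthetic data from scikit-learn's make_classification, where flip_y=0.2 assigns a random class to 20% of the samples as label noise; a 50/50 split without shuffling; a RandomForestClassifier trained once on the raw training labels and once on the corrected ones; and n_clusters=100, subset_size=50. The function is restated compactly (with a fixed KMeans seed) so the block runs on its own. Whether the train/test gap actually shrinks depends on the data, so the printed numbers are something to inspect, not a guaranteed result.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def fix_labels_subset_mean(dataset, n_clusters=200, subset_size=100):
    # Compact restatement of the function posted above (fixed seed added only for
    # reproducibility): force the labels of the subset_size clusters whose mean
    # label is farthest from 0.5 to that cluster's majority label
    dataset = dataset.copy()
    features = dataset.drop(columns=['labels'])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    clusters = pd.Series(km.labels_, index=dataset.index)
    means = dataset['labels'].groupby(clusters).mean()
    selected = means.sub(0.5).abs().sort_values(ascending=False).index[:subset_size]
    new_value = {c: 0.0 if m < 0.5 else 1.0 for c, m in means.items() if c in selected}
    dataset['labels'] = clusters.map(new_value).fillna(dataset['labels'])
    return dataset


# Synthetic binary data; flip_y=0.2 gives 20% of samples a randomly assigned class
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           flip_y=0.2, random_state=0)
data = pd.DataFrame(X, columns=[f'f{i}' for i in range(X.shape[1])])
data['labels'] = y.astype(float)

# First half for training, second half kept untouched for testing (no shuffling)
train, test = train_test_split(data, test_size=0.5, shuffle=False)
fixed_train = fix_labels_subset_mean(train, n_clusters=100, subset_size=50)

X_te, y_te = test.drop(columns=['labels']), test['labels']
results = {}
for name, tr in [('raw', train), ('fixed', fixed_train)]:
    X_tr, y_tr = tr.drop(columns=['labels']), tr['labels']
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    # Train accuracy uses the labels the model was trained on;
    # test accuracy always uses the original, uncorrected test labels
    train_acc, test_acc = model.score(X_tr, y_tr), model.score(X_te, y_te)
    results[name] = (train_acc, test_acc)
    print(f'{name:5s} train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}')
```

Only the training half is corrected; relabeling the test half as well would make the test score meaningless. On real market data the test part should come strictly after the training part in time, which is what shuffle=False imitates here.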