Machine learning in trading: theory, models, practice and algo-trading - page 3518

 

There is, in general, an ingeniously simple way to determine whether your ML model will work on new data.

To do this, you only need the set of labels :) because they are a binarised representation of the original series.

import math

def calculate_entropy(series):
  """
  Calculates the entropy of a binary series using Shannon's formula.

  Args:
      series: A list of binary values (0s and 1s).

  Returns:
      The entropy of the series (float).
  """
  counts = {0: 0, 1: 0}
  for bit in series:
    counts[bit] += 1
  total_length = len(series)
  probabilities = {key: value / total_length for key, value in counts.items()}
  entropy = 0
  for p in probabilities.values():
    if p > 0:  # avoid log2(0), which is undefined
      entropy += -p * math.log2(p)
  return entropy
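As a quick sanity check of the formula above (a sketch of mine, not part of the original post): balanced binary labels give the maximum entropy of 1 bit, while skewed labels give less.

```python
import math

def shannon_entropy(p1):
    """Entropy of a binary source with P(1) = p1, in bits."""
    return -sum(p * math.log2(p) for p in (1 - p1, p1) if p > 0)

print(shannon_entropy(0.5))  # balanced labels: exactly 1 bit
print(shannon_entropy(0.9))  # skewed labels: about 0.47 bits
```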
 
Forester #:

If it overfits, that means training it less.

It seems that if I am not being heard, neither is anyone else - everyone here is talking to himself.

I wrote about the quantum segment - it is actually just the range of some predictor, nothing more. We take the range, take all the zeros and ones from the sample, weight them for balance, and plot the balance: if the label is "1", add +1; if it is "0", add -1*K_Balance.

In effect, this is a graph of how the probability bias drifts over time - across the chronology of the sample.
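A minimal sketch of the weighting described above (assuming K_Balance is simply the ones-to-zeros ratio, which the post does not spell out; the labels here are hypothetical):

```python
import numpy as np

# Hypothetical labels falling into one quantum segment (range of a predictor)
labels = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])

# Weight that balances the two classes (assumed definition of K_Balance)
k_balance = np.sum(labels == 1) / np.sum(labels == 0)

# +1 for label "1", -1 * K_Balance for label "0"
steps = np.where(labels == 1, 1.0, -k_balance)
balance = np.cumsum(steps)  # probability-bias drift over the sample chronology
print(balance)
```

By construction the curve ends at zero; what matters is the shape of the drift in between.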

 
Maxim Dmitrievsky #:

Check it for yourself; it does not seem to be empty, judging by my first tests.

 
Maxim Dmitrievsky #:

There is, in general, an ingeniously simple way to determine whether your ML model will work on new data.

To do this, you only need the set of labels :) because they are a binarised representation of the original series.

Need proofs with graphs :)

 

This is an experiment illustrating that even a relatively good classification does not guarantee a profit.

The code contains an array (sequence) of "-1" (loss) and "1" (profit) entries that sum to zero. There are also two normal distributions: from one we draw a losing financial result, from the other a profitable one.

The experiment is run 100 times, and we observe the different balance curves and the histogram of their final values.

By default the profit distribution sits higher than the loss distribution; you can play with different variants - the code is below. There are no fixed stop levels, so this experiment shows very clearly that a model cannot be evaluated by the balance alone or by classification metrics alone; some kind of composite criterion is needed.

import numpy as np
import matplotlib.pyplot as plt

# Parameters of the normal distributions
mean_N1, std_N1 = -50, 25  # for the first distribution (roughly -100 to 0)
mean_N2, std_N2 = 100, 25  # for the second distribution (roughly 0 to 200)

# The given sequence of losses (-1) and profits (1)
sequence = np.array([1, 1, -1, 1, 1, -1, -1, -1, 1, -1])

# Number of experiments
num_experiments = 100

# Array for storing the balance results
balances = []

# Running the experiments
for _ in range(num_experiments):
    values = np.where(sequence == 1, np.random.normal(mean_N2, std_N2, len(sequence)), np.random.normal(mean_N1, std_N1, len(sequence)))
    balance = np.cumsum(values)  # cumulative sum tracks the balance at each step
    balances.append(balance)

# Visualizing the balances
plt.figure(figsize=(12, 6))

# Plotting the balance curves
plt.subplot(1, 2, 1)
for i, balance_curve in enumerate(balances):
    plt.plot(balance_curve, label=f'Experiment {i + 1}', alpha=0.7)  # alpha for transparency
plt.title('Balances Over Experiments')
plt.xlabel('Step')
plt.ylabel('Balance')
#plt.legend()

# Histogram of the final balance distribution
plt.subplot(1, 2, 2)
final_balances = [balance[-1] for balance in balances]
plt.hist(final_balances, bins=20, color='skyblue', edgecolor='black')
plt.title('Distribution of Final Balances')
plt.xlabel('Balance')
plt.ylabel('Frequency')

plt.tight_layout()
plt.show()
 
СанСаныч Фоменко #:
Does LSTM work with radically unbalanced classes - with two classes differing by a factor of ten?

Post a sample; at the very least I will estimate its training potential with my method.

 
Aleksey Vyazmikin #:

Need proofs with graphs :)

No proofs yet, because I use random sampling, and the entropy of such a series always tends to 1, i.e. it is random.

 
mytarmailS #:

Check it for yourself; it does not seem to be empty, judging by my first tests.

I don't get it.

 
Maxim Dmitrievsky #:

No proofs yet, because I use random sampling, and the entropy of such a series always tends to 1, i.e. it is random.

If I find a correlation between the results on OOS and an estimate over the labels - for example, via entropy - that will speak for itself.

 

PE is the permutation entropy of the labels before training.

Below each PE is the R2 of the model trained on those labels, with the OOS taken into account.

Iteration: 0, Cluster: 4, PE: 1.4114993035356607
R2: 0.9750027827074201
Iteration: 0, Cluster: 14, PE: 1.4024791111873602
R2: 0.9254099918204924
Iteration: 0, Cluster: 8, PE: 1.41096302580775
R2: 0.9713689561861256
Iteration: 0, Cluster: 0, PE: 1.4269597630754562
R2: 0.9807136795397998
Iteration: 0, Cluster: 1, PE: 1.391583598392451
R2: 0.9600008089806283
Iteration: 0, Cluster: 11, PE: 1.4537902469647772
R2: 0.9720898913608796
Iteration: 0, Cluster: 10, PE: 1.3738852280483222
R2: 0.9536059212630769
Iteration: 0, Cluster: 7, PE: 1.37156426933497
R2: 0.9702039222164988
Iteration: 0, Cluster: 3, PE: 1.433485632603243
R2: 0.9846474447504004
Iteration: 0, Cluster: 12, PE: 1.4031034270604625
R2: 0.9480575294534516
Iteration: 0, Cluster: 2, PE: 1.3916341170184174
R2: 0.9587764283979536
Iteration: 0, Cluster: 9, PE: 1.4121627190055983
R2: 0.9449264868011292
Iteration: 0, Cluster: 6, PE: 1.4026169498968089
R2: 0.5991722238532007
Iteration: 0, Cluster: 5, PE: 1.321808319045704
R2: 0.9698055619808859
Iteration: 0, Cluster: 13, PE: 1.4465424887848997
R2: -0.05071422654396962
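The printed pairs above can be checked for a linear relationship directly (a quick check of mine; the values are copied by hand from the output and rounded to four decimals):

```python
import numpy as np

# PE / R2 pairs from the output above (rounded)
pe = np.array([1.4115, 1.4025, 1.4110, 1.4270, 1.3916, 1.4538, 1.3739, 1.3716,
               1.4335, 1.4031, 1.3916, 1.4122, 1.4026, 1.3218, 1.4465])
r2 = np.array([0.9750, 0.9254, 0.9714, 0.9807, 0.9600, 0.9721, 0.9536, 0.9702,
               0.9846, 0.9481, 0.9588, 0.9449, 0.5992, 0.9698, -0.0507])

# Pearson correlation coefficient between label PE and out-of-sample R2
r = np.corrcoef(pe, r2)[0, 1]
print(f"Pearson correlation PE vs R2: {r:.3f}")
```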

All the values are very similar, so nothing can be determined from them :)

That is because the partitioning method is always the same. Other datasets are needed - you can run the calculation on yours. An example calculation:

import numpy as np
from itertools import permutations

def permutation_entropy(time_series, order=5, delay=1):
    #  Create a list of all possible permutations of the given order
    permutations_list = list(permutations(range(order)))
    perm_count = {perm: 0 for perm in permutations_list}
    
    #  Calculate the permutation patterns in the time series
    for i in range(len(time_series) - delay * (order - 1)):
        #  Extract the sequence
        sequence = time_series[i:i + delay * (order - 1) + 1:delay]
        #  Find the permutation pattern
        sorted_index_tuple = tuple(np.argsort(sequence))
        #  Increment the permutation count
        perm_count[sorted_index_tuple] += 1

    #  Normalize the counts to get a probability distribution
    perm_count_values = np.array(list(perm_count.values()))
    perm_probabilities = perm_count_values / np.sum(perm_count_values)
    
    #  Calculate the permutation entropy
    pe = -np.sum([p * np.log2(p) for p in perm_probabilities if p > 0])
    
    return pe

#  Example usage
time_series = np.array([4, 7, 9, 10, 6, 11, 3])
order = 2
delay = 1
pe = permutation_entropy(time_series, order, delay)
print(f"Permutation Entropy: {pe}")
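Note that the PE values printed earlier exceed 1 because this implementation uses log2 without dividing by the maximum possible value, log2(order!). A sketch of a normalised variant (a common convention, not taken from the original post) that maps the result into [0, 1]:

```python
import math
import numpy as np
from itertools import permutations

def normalized_permutation_entropy(ts, order=3, delay=1):
    """Permutation entropy divided by its maximum log2(order!), so the result lies in [0, 1]."""
    counts = {}
    for i in range(len(ts) - delay * (order - 1)):
        # Ordinal pattern of the window starting at i
        pattern = tuple(np.argsort(ts[i:i + delay * (order - 1) + 1:delay]))
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = np.array(list(counts.values()), dtype=float)
    probs /= probs.sum()
    pe = -np.sum(probs * np.log2(probs))
    return pe / math.log2(math.factorial(order))

print(normalized_permutation_entropy(np.arange(100)))  # fully ordered series: zero entropy
print(normalized_permutation_entropy(np.random.default_rng(0).random(10000)))  # white noise: close to 1
```

With this scaling, "tends to 1" for random labels and "close to 0" for strongly ordered ones become directly comparable across different orders.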