Machine learning in trading: theory, models, practice and algo-trading - page 3388

 
Aleksey Vyazmikin #:

Nothing is clear. The probability of finding the same example as in the training sample? Where?

the same row in the dataset

if you only have 1,000 rows

Roughly speaking, with 18+ features you are training the classifier to memorize every row, because the rows don't even repeat

and in causal inference, you can't match examples to calculate statistics.
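
A minimal sketch of that point, assuming 1,000 rows and features discretized to two values (illustrative numbers, not the graph from the book):

```python
# Share of rows that have an exact twin in the dataset as features are added.
# Assumption: 1,000 rows, each feature takes 2 equally likely values.
import numpy as np

rng = np.random.default_rng(0)
n_rows = 1_000

for n_features in (2, 6, 10, 14, 18, 22):
    X = rng.integers(0, 2, size=(n_rows, n_features))
    # For each row, check whether an identical row appears elsewhere.
    _, inverse, counts = np.unique(X, axis=0, return_inverse=True, return_counts=True)
    frac_with_twin = np.mean(counts[inverse] > 1)
    print(f"{n_features:2d} features: share of rows with an exact twin ~ {frac_with_twin:.3f}")
```

With 10 features most rows still have a twin; by 18 practically none do, which is the "memorize every row" regime described above.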
 
Aleksey Vyazmikin #:

1. How do you get this matrix? What are the numbers there?

2. I'm talking about rules. In my approach I don't care how or from what a rule is derived, but if its response is similar to another one's on the training sample, it carries no additional information.

1. any feature values

2. I will surprise you: no one cares how the features were created, everyone evaluates features by the response alone.
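
For illustration, a minimal sketch of scoring features by the response alone; mutual information is used here only as one example of such a score, and the data are synthetic:

```python
# Response-only feature scoring: the score sees (feature, target) pairs and
# knows nothing about how the feature columns were constructed.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 1_000
informative = rng.normal(size=n)   # drives the target
noise = rng.normal(size=n)         # unrelated to the target
y = (informative + 0.3 * rng.normal(size=n) > 0).astype(int)

X = np.column_stack([informative, noise])
scores = mutual_info_classif(X, y, random_state=0)
print(dict(zip(["informative", "noise"], scores.round(3))))
```

Whatever produced the columns (indicators, rules, anything else) never enters the score.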
 
Maxim Dmitrievsky #:

Why is a large number of features evil? An interesting graph from a book on causal inference.

Probability of finding the same example in the training sample, depending on the number of features.

If you have more than 14 (and even 10) features, you get a lot of rules that you can't reduce without loss.


This is all within the causal framework.
In models with unstructured features (text, images), a few thousand features is the norm.
 
mytarmailS #:
It's all within the causal framework...
In models with unstructured features (text, images), a few thousand features is the norm.

Efficient compression algorithms like seq2seq are used inside neural networks, so that's also fair.

 
Maxim Dmitrievsky #:

Efficient compression algorithms like seq2seq are used inside neural networks, so that's also fair.

If we are talking about text, then in 95% of cases the usual word count is used: how many times did a word occur in a given observation? 0, 1, 103...

And to make the feature matrix take up less space, it is stored in a sparse-matrix format; this pays off because 95% of the matrix values are zeros.

For images, it's convolutions.

And seq2seq is exotic for a rare problem.
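
A minimal sketch of the word-count / sparse-matrix point with scikit-learn's CountVectorizer (the example texts are made up):

```python
# Bag-of-words counts stored as a sparse matrix: mostly zeros, only the
# non-zero cells are actually kept in memory.
from sklearn.feature_extraction.text import CountVectorizer

texts = [
    "buy the dip buy",
    "sell the rally",
    "the trend is your friend",
]

vec = CountVectorizer()
X = vec.fit_transform(texts)            # scipy.sparse matrix of word counts
print(vec.get_feature_names_out())      # vocabulary = columns of the matrix
print(X.toarray())                      # dense view: how often each word occurred
print(f"stored non-zeros: {X.nnz} of {X.shape[0] * X.shape[1]} cells")
```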
 
mytarmailS #:
If we are talking about text, then in 95% of cases the usual word count is used: how many times did a word occur in a given observation? 0, 1, 103...

And to make the feature matrix take up less space, it is stored in a sparse-matrix format; this pays off because 95% of the matrix values are zeros.

For images, it's convolutions.

And seq2seq is exotic for a rare problem.

These are different architectures, layer cakes. It's hard to compare. We're talking about normal classification or regression. In this case it looks like a universal law.

 
Maxim Dmitrievsky #:

These are different architectures, layer cakes. It's hard to compare. We're talking about normal classification or regression. In this case it looks like a universal law.

It's all the same thing.

I'm not talking about neural networks, I'm talking about the structure of how the features are fed in.

----------------------------------------------------------------------

Oh, I remember, it's called a bag of words.



What's new, unfamiliar, incomprehensible, complicated?


The same table of features + any ML


This is working with unstructured data (text): we translate it into a structured bag-of-words form and then do anything else we want with it.
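
A minimal sketch of "the same table of features + any ML", again with scikit-learn; the texts and labels are invented:

```python
# Unstructured text -> bag-of-words table -> any ordinary classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["buy the dip", "strong buy signal", "sell everything", "sell the rally"]
labels = [1, 1, 0, 0]   # toy labels: 1 = bullish, 0 = bearish

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["buy now", "sell now"]))
```

Any other model could stand in for the logistic regression; the bag-of-words table is just an ordinary feature table.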

 
mytarmailS #:
It's all the same.

I'm not talking about neural networks, I'm talking about the structure of how the features are fed in.

----------------------------------------------------------------------

Oh, I remember, it's called a bag of words.



What's new, unfamiliar, incomprehensible, complicated?


The same table of features + any ML


This is working with unstructured data (text): we translate it into a structured bag-of-words form and then do anything else we want with it.

That's from a different topic. No matter how you transform them, the dimensionality of the input vector must stay below a certain threshold, otherwise you will not be able to detect a pattern. Categorical features probably allow a larger limit on vector length. Plus, consider the dependence on the number of rows: on huge data the number of features can be larger.
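
A back-of-the-envelope sketch of how that threshold grows with the number of rows, assuming binary-discretized features and an arbitrary 5% "rows with a twin" cutoff:

```python
# Largest feature count for which at least min_twin_share of the rows are
# still expected to have an exact duplicate row in the dataset.
def max_features(n_rows: int, min_twin_share: float = 0.05, values_per_feature: int = 2) -> int:
    k = 1
    while True:
        # P(a given row has an identical row among the other n_rows - 1 rows)
        p_twin = 1.0 - (1.0 - values_per_feature ** -k) ** (n_rows - 1)
        if p_twin < min_twin_share:
            return k - 1
        k += 1

for n_rows in (1_000, 100_000, 10_000_000):
    print(f"{n_rows:>10,} rows -> roughly {max_features(n_rows)} binary features")
```

Under these assumptions roughly 14 features fit 1,000 rows, about 20 fit 100,000 and about 27 fit 10,000,000; the exact values are artefacts of the cutoff, only the trend matters.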
 
Maxim Dmitrievsky #:
That's from a different topic. No matter how you transform them, the dimensionality of the input vector must stay below a certain threshold, otherwise you will not be able to detect a pattern. Categorical features probably allow a larger limit on vector length. Plus, consider the dependence on the number of rows: on huge data the number of features can be larger.
What do you mean, a different topic)))
The whole world does it and everyone is happy)
 
mytarmailS #:
What do you mean, a different topic)))
The whole world does it and everyone is happy)
Well, do it like the rest of the world does, then. Those are the kind of answers you'll get.
You could also take 100500 optimisers and slather yourself in them, that's an option too :)