Need help! Can't solve the problem, I'm hitting hardware limitations

 
komposter:

...

There are many similar sequences of trades and each sequence is sorted by time.

Will one sequence fit in memory?

You can write an Expert Advisor. At startup, the Expert Advisor loads sequence No. N (a parameter in the properties window) and trades on that sequence. Then optimize over the sequence number.

The task is not quite clear yet; so far, a lot of different things could be imagined.

 
Integer:

Will one sequence fit in memory?

You can write an Expert Advisor. At startup, the Expert Advisor loads sequence No. N (a parameter in the properties window) and trades on that sequence. Then optimize over the sequence number.

The task is not quite clear yet; so far, a lot of different things could be imagined.

If we have 20 GB (21 474 836 480 bytes) and 1 million sequences, we get about 21 475 bytes per sequence on average (~21 KB). I think that should fit in memory, even on a phone ))

Oh. By the way, what about distributed computing? We should think in that direction too...

 
I think he's got the history files from the signals.)
 

Sorry again for the pause; I've been experimenting with a RAM drive (not very successfully so far). Replying to everyone in order.

Candid:

But the sequences are independent of each other, right? Then why can't we loop through the dates of a single loaded sequence all at once? That, by the way, might just be an opportunity to move to some efficient recurrence algorithm, but that's a matter of luck. The million-by-million problem size will remain, but the file will be read only once.

Of course, a problem where the number of steps stays the same at each iteration (i.e. the search area does not narrow as the computation proceeds) does not look very robust. But that, of course, is subjective.

They are independent. But how can we loop through all the sequences at once without loading them into memory?

The number of steps can be reduced if we figure out how to read a sequence from the right place (the last X trades before the currently analyzed date).

Urain:
Does the whole database fit into 10 billion lines or not? All files combined.

One file for the million sequences (I write each one on 9 lines for convenience).

Well, or a million files, it doesn't matter.

ALXIMIKS:

Did I understand correctly the following:

1) A 20 GB file consists of about a million sequences sorted by time

2) The size of each sequence may vary and depends on the number of trades the sequence contains

3) The average size of a sequence is 20 GB / 10^6 = 20 KB, so we can guarantee that a single sequence can be loaded completely?

4) The K coefficient depends only on the trades within the given sequence

5) For each sequence we must find K (10^6 values in total) and select the top 10

  1. Yes
  2. Yes
  3. ~20 KB; yes, we can guarantee that
  4. Yes, a single run uses one Criterion with fixed settings. But in the future I would like subsequent runs (with a changed Criterion or other settings) to be computed quickly as well.
    Until I had such volumes, I simply loaded everything into memory and ran through it in a loop, computing everything I needed.
  5. The Criterion value is calculated for every trade of the sequence, starting from trade #X (X being the number of trades needed for the calculation).
    The best sequence must be selected at every point in the history (the best one being the sequence with the best Criterion value at the current moment; the Criterion is calculated as of the last closed trade); a plain-code sketch of this follows right after the list.
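A hedged sketch of point 5 (the struct and function names are my assumptions, not the actual data format): here the Criterion is a rolling average of the profits of the last X closed trades of a single sequence, recalculated at every trade from #X onward.

#include <cstddef>
#include <vector>

// Hypothetical trade record: only the field this Criterion needs.
struct Trade {
    double profit;   // profit of the closed trade
};

// Rolling Criterion: average profit of the last X closed trades.
// criterion[i] is defined for i >= X-1 and covers trades i-X+1 .. i.
std::vector<double> RollingCriterion(const std::vector<Trade>& seq, std::size_t X)
{
    std::vector<double> criterion(seq.size(), 0.0);
    double windowSum = 0.0;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        windowSum += seq[i].profit;                      // trade i enters the window
        if (i >= X) windowSum -= seq[i - X].profit;      // trade i-X leaves the window
        if (i + 1 >= X) criterion[i] = windowSum / X;
    }
    return criterion;
}

The same loop runs independently for every sequence, which is what later makes the per-sequence part of the calculation easy to split up.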

ALXIMIKS:

And what if we create another file with the values of the distances between sequences?

I didn't understand this and the next one.

Candid:
By the way, yes, you could load batches of sequences at once.

That won't help, all of them are needed.

Silent:

There's something I don't understand here.

Here's the criterion: everything lies within the interval from Date1 to Date2.

That is, it reads like this:

Why not break the file into many intervals from Date1 to Date2? There would be already-processed sequences that could be closed, right?

The interval "Date1 - Date2" currently covers all transactions of all sequences.

And the idea of splitting it into several smaller intervals is quite sensible. True, then the information would have to be read from disk again every time the parameters changed... But that's already something.

Candid:
Apparently one of the results of a single date pass is a new date.

Yes, but I think we can find a point where none of the sequences has an open trade and make a break there.

Then move on to the next interval. I'll give it a try.
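A hedged sketch of how such a break point could be found (the names and the event representation are my assumptions; it presumes the open/close timestamps alone fit in memory, or that the same sweep is run over a time-sorted event stream): treat every trade as an open/close pair and record the moments at which no trade of any sequence is open.

#include <algorithm>
#include <cstdint>
#include <vector>

// One event per trade boundary: +1 when a trade opens, -1 when it closes.
struct Event {
    std::int64_t time;   // e.g. seconds since 1970
    int          delta;  // +1 = open, -1 = close
};

// Times at which no trade of any sequence is open, i.e. candidate
// points where the big Date1-Date2 interval can be split.
std::vector<std::int64_t> FindBreakPoints(std::vector<Event> events)
{
    std::sort(events.begin(), events.end(),
              [](const Event& a, const Event& b) {
                  if (a.time != b.time) return a.time < b.time;
                  return a.delta > b.delta;   // at equal times count opens before closes,
              });                             // so a back-to-back close/open is not a gap
    std::vector<std::int64_t> breaks;
    int openCount = 0;
    for (const Event& e : events) {
        openCount += e.delta;
        if (openCount == 0) breaks.push_back(e.time);   // nothing is open right after this close
    }
    return breaks;
}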

 
ALXIMIKS:

If the problem is this:

given a series: 1 2 3 4 5 6 7 8 9

and a window width, say 4, then you need to slide a window of that width along the entire series and find some value inside the window (e.g. the minimum):

first you look in 1 2 3 4, then 2 3 4 5, then 3 4 5 6, then 4 5 6 7, and so on.

If X (the number of trades) were fixed (4 in your example) and there were no other parameters - yes. But as it stands - no.
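For the fixed-width case ALXIMIKS describes (and only that case, as the reply above notes), the window value can be obtained in a single pass instead of rescanning each window; a minimal sketch with the minimum as the example value:

#include <cstddef>
#include <deque>
#include <vector>

// Minimum of every window of width W over the series, in O(n) total.
std::vector<int> SlidingWindowMin(const std::vector<int>& series, std::size_t W)
{
    std::vector<int> result;
    std::deque<std::size_t> idx;                    // candidate indices, values increasing
    for (std::size_t i = 0; i < series.size(); ++i) {
        while (!idx.empty() && series[idx.back()] >= series[i])
            idx.pop_back();                         // drop candidates beaten by the new element
        idx.push_back(i);
        if (idx.front() + W <= i)
            idx.pop_front();                        // the front has slid out of the window
        if (i + 1 >= W)
            result.push_back(series[idx.front()]);
    }
    return result;
}

// For the series 1 2 3 4 5 6 7 8 9 with W = 4 this yields 1 2 3 4 5 6.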

Vinin:
I would ponder this task

The conditions are stated above, welcome to join our ranks ;)

marketeer:
Since the problem is rather academic (it sounds like an interview question for hiring a programmer) and many people have shown interest in it, why not formulate it more strictly in terms of an input data format, so that everyone could generate 20 GB of test data and submit their practical solution?

No problem.

Generate sequences of random trades (in chronological order, say over one year) with all their attributes: open date and time, open price, SL, TP, close date and time, close price. Number the sequences from 1 to a million.

The task: out of all the trades of all sequences, build a new series of consecutive trades (not overlapping in time), selected by a certain criterion.

Let the criterion be the average profit of the last 20 trades of the sequence.

Example result:

  1. Sequence #250, trade #53, criterion = 51: 2013.01.31 00:00 - 2013.02.12 12:30 (the criterion was calculated over trades #33-52, i.e. trade #53 itself was not used)
  2. Sequence #1222, trade #28, criterion = 75: 2013.02.13 10:00 - 2013.02.13 02:21
  3. And so on, to the end of the year.
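A hedged sketch of such a generator (the trade counts, price range and distributions below are my assumptions; only the list of attributes comes from the post). It writes N sequences of random, chronologically ordered trades to a semicolon-separated text file, with times written as unix seconds for brevity:

#include <cstdio>
#include <random>

int main()
{
    const int  kSequences   = 1000000;          // a million sequences
    const long kYearStart   = 1356998400L;      // 2013.01.01 00:00 UTC, assumed range
    const long kYearSeconds = 365L * 24 * 3600;

    std::mt19937_64 rng(42);
    std::uniform_int_distribution<int>     tradesPerSeq(30, 60);     // assumed count
    std::uniform_int_distribution<long>    gapSec(60, 6 * 3600);     // pause between trades
    std::uniform_int_distribution<long>    lifeSec(60, 12 * 3600);   // trade duration
    std::uniform_real_distribution<double> price(1.20, 1.40);        // assumed instrument range
    std::uniform_real_distribution<double> dist(0.0010, 0.0100);     // SL/TP distance

    std::FILE* out = std::fopen("sequences.csv", "w");
    for (int s = 1; s <= kSequences; ++s) {
        long t = kYearStart;
        int  n = tradesPerSeq(rng);
        for (int i = 0; i < n && t < kYearStart + kYearSeconds; ++i) {
            t += gapSec(rng);
            long   tClose = t + lifeSec(rng);
            double open   = price(rng);
            double sl     = open - dist(rng);
            double tp     = open + dist(rng);
            double close  = price(rng);
            // sequence;trade;open time;open price;SL;TP;close time;close price
            std::fprintf(out, "%d;%d;%ld;%.5f;%.5f;%.5f;%ld;%.5f\n",
                         s, i + 1, t, open, sl, tp, tClose, close);
            t = tClose;
        }
    }
    std::fclose(out);
    return 0;
}

Lowering kSequences gives a small file for debugging; raising the sequence count and trades per sequence moves the output toward the volumes discussed in this thread.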

joo:
So I take it we're talking about a homemade tester/optimiser?

Yeah, something like that.

sergeev:

nah, there's something different there.

I guess some broker/provider has got the deal database. :)

If only =)

sergeev:

I'll repeat the task in simpler terms.

No, that's not it. I gave a concrete example; it can be considered as close to the real problem as possible.

 
elugovoy:

A typical database task. But there's no way around data aggregation here. You could write into a separate table the unique attributes of each sequence (with its dates), the average profit K and the variance D, and then look for the top 10 sequences closest to the criterion you need. With indexes on these fields, the search won't take that long (even over a million records). Then, once you have the right 10 sequences, you can dig into the raw data, but that's no longer a million searches, since we have a date limit.

If the Criterion were static...

What if its parameters change?
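As plain code, elugovoy's aggregate idea might look roughly like this (a hedged sketch, not the database variant itself, and it shares the limitation noted above: the summaries have to be recomputed whenever the Criterion's parameters change):

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-sequence summary, computed once per Criterion setting.
struct SequenceSummary {
    int    id;          // sequence number
    double avgProfit;   // K: average profit over the last X trades
    double variance;    // D: variance of those profits
};

// Top 10 sequences by average profit. Only the summaries (one small
// record per sequence) are needed in memory, not the 20 GB of raw trades.
std::vector<SequenceSummary> Top10(std::vector<SequenceSummary> all)
{
    const std::size_t k = std::min<std::size_t>(10, all.size());
    std::partial_sort(all.begin(), all.begin() + k, all.end(),
                      [](const SequenceSummary& a, const SequenceSummary& b) {
                          return a.avgProfit > b.avgProfit;
                      });
    all.resize(k);
    return all;
}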

elugovoy:

It's still a mystery what we should be looking for. What should the result of all these operations be? If we're talking about making decisions on opening/closing orders, any processing of such a volume will take quite a long time.

Yes, and then there will be trading. But recalculation will only be needed for the freshest data; we won't need to recalculate the entire history.

elugovoy:

There is another point. Since we're talking about trades, maybe it makes sense to separate the trades by symbol? And write similar trading robots for EURUSD, USDJPY, etc.

It's one instrument...

elugovoy:

It seems to me that this way you can only identify the strategy (or the robot's parameter set) that was used to trade a certain sequence, and switch to it in a certain market situation.

Exactly.

 
Integer:

Will one sequence fit in memory?

You can write an Expert Advisor. At startup, the Expert Advisor loads sequence No. N (a parameter in the properties window) and trades on that sequence. Then optimize over the sequence number.

It will.

But we don't need a single (final) result; we need the result at every point in time.

In principle, we could use the cloud by passing the sequence number and the date up to which to read as parameters. But that's hardly faster than recalculating one file)

elugovoy:

Oh. By the way, what about distributed computing? We should think in this direction too...

What would we calculate in parallel? The Criterion value for different sequences?

But they would need to be loaded into memory for that, and it doesn't matter whether they are loaded from one process (Expert Advisor, terminal) or from several. Maybe we'd get 8 (or 12, 16, 20) GB instead of 4. But then the results have to be "glued" together afterwards.

 
komposter:

What would we calculate in parallel? The Criterion value for different sequences?

The idea is to divide all of the histories into, say, 100 groups and for each group calculate the optimal set separately.

Then, for each group, keep only the histories that are in the group's optimal set and proceed to the next step with a smaller number of histories. Theoretically, that parallelizes 100-fold.

And memory is fine: the group size can be adjusted.

 
TheXpert:

The idea is to divide all of the histories into, say, 100 groups and for each group calculate the optimal set separately.

Then, for each group, keep only the histories that are in the group's optimal set and proceed to the next step with a smaller number of histories. Theoretically, that parallelizes 100-fold.

And memory is fine: the group size can be adjusted.

If you load 100 parts out of 100 in parallel, there isn't enough memory =)

And if you load them sequentially (memorizing the best variant each time), then where is the parallelization? And the file will still be read each time the job goes to a machine.

I think it's possible to invent a clever partial loading mechanism, but it has to be invented.

For example, on the first read, find for each pass the last trade closed before the start date, step back and read the previous X trades, and remember the point in the file where that trade ends.

After that, find the first trade in the results, and from then on work only with fresh data: read the file from the remembered point up to the new current date, and each time shift the trades in the array (we get a fixed-size array of X elements).

This solves the problem of repeated reading (it's simply no longer needed) and of memory (provided we can fit X million trades in memory).

I will move in this direction.
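A minimal sketch of that partial-loading scheme (the record layout and names are my assumptions, not the actual format; it presumes the trades of one sequence are stored contiguously): the state kept per sequence is a fixed-size window of the last X trades plus the file offset at which to resume reading.

#include <cstdint>
#include <cstdio>
#include <vector>

// Simplified fixed-size binary trade record (not the exact layout used here).
struct TradeRecord {
    std::uint32_t openTime, closeTime;            // seconds since 1970
    float         openPrice, closePrice, sl, tp;
};

// Per-sequence state kept in memory between steps.
struct SequenceState {
    std::vector<TradeRecord> lastX;               // window of the last X trades
    std::size_t              head = 0;            // next slot to overwrite
    long                     fileOffset = 0;      // where to resume reading (set on the first full pass)
    long                     endOffset  = 0;      // offset just past this sequence's last record
};

// Push a trade into the fixed-size window, overwriting the oldest one.
void PushTrade(SequenceState& st, const TradeRecord& tr, std::size_t X)
{
    if (st.lastX.size() < X) st.lastX.push_back(tr);
    else { st.lastX[st.head] = tr; st.head = (st.head + 1) % X; }
}

// Read only the fresh part of one sequence: trades closed up to 'now',
// starting from the offset remembered on the previous step.
void AdvanceSequence(std::FILE* f, SequenceState& st, std::uint32_t now, std::size_t X)
{
    std::fseek(f, st.fileOffset, SEEK_SET);
    TradeRecord tr;
    while (std::ftell(f) < st.endOffset
           && std::fread(&tr, sizeof(tr), 1, f) == 1) {
        if (tr.closeTime > now) {                          // not due yet: rewind and stop
            std::fseek(f, -(long)sizeof(tr), SEEK_CUR);
            break;
        }
        PushTrade(st, tr, X);
    }
    st.fileOffset = std::ftell(f);                         // resume here on the next step
}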

 

Switching to binary files, by the way, gave almost no gain in size.

The text format I had optimized was very compact: the full date was stored once per pass (the opening of the first trade), and all the others (both open and close times) were stored as offsets from the previous date; SL, TP and PriceClose were likewise stored as offsets from the open price.

When I switched to the binary format, that optimization lost its point: an offset in seconds (uint) takes as much space as a full date (I dropped long; the year 2030 is plenty for me), while ushort is not enough (a maximum offset of only 18 hours). Same with the prices: I could have moved the offsets to ushort, but then I'd have to add overflow checks (e.g. when SL = 0), and the gain would be only 6 bytes across all 3 prices. I decided not to do it.

But I did throw out a bit of unnecessary information, so I gained about 20%. The projected size of the original file (which was 20 GB) is 7-8 GB; converting it will take a couple of hours.

Well, and I also saved the CPU time that was being spent on parsing the text lines.
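For illustration, a hedged sketch of a packed record along the lines just described (the field names, types and exact field set are my assumptions, not the actual format): full dates as uint seconds, the open price in full, and the three remaining prices as offsets from it.

#include <cstdint>
#include <cstdio>

#pragma pack(push, 1)
// One closed trade, packed. Times are full dates in uint seconds
// (plenty past 2030); the three price offsets stay 4 bytes wide,
// since squeezing them into ushort would need overflow checks
// for a gain of only ~6 bytes per trade.
struct PackedTrade {
    std::uint32_t openTime;     // seconds since 1970
    std::uint32_t closeTime;    // seconds since 1970
    double        openPrice;    // base price
    float         closeOffset;  // PriceClose - openPrice
    float         slOffset;     // SL - openPrice (becomes -openPrice when SL = 0, i.e. no stop)
    float         tpOffset;     // TP - openPrice (becomes -openPrice when TP = 0, i.e. no take profit)
};
#pragma pack(pop)

int main()
{
    std::printf("record size: %zu bytes\n", sizeof(PackedTrade));  // 28 bytes with this layout
    return 0;
}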