Need help! Can't solve the problem, I'm hitting hardware limitations

 
elugovoy:

Yurichik, I meant without any fiddling with file processing, compression, etc.: purely working with SQL and the robot/indicator logic. I've worked with many databases; the only real problem was making MQL and SQL work together )). I ended up with a neat solution without arrays and structures.

In general, I prefer not to reinvent the wheel and solve problems by the best means.

Zhenya, yes, I get you...

That's exactly what I prefer

... especially if a professional tool is available... better to integrate with it nicely...

 

What a discussion! Thank you all for your participation!

I'll reply to everyone together, trying not to miss anything (and anyone).

1. Move to x64

I only belatedly realised that I hadn't specified the terminal version. I'm working with MT4, and I'm not going to migrate this EA to MT5 yet. MT4, unfortunately, is 32-bit only.

Removing the 4 GB memory limit on 32-bit Windows 8 / 8.1 won't help either: my system is x64 as it is.

2. Cut to pieces / read in parts / load small blocks

It won't work for the conditions of the task.

I'll try to give more details below, maybe it will give a better understanding of the problem.

3. Buy more memory / rent a powerful server or cloud / move everything to SSD

I will, of course, put another 16 GB of memory into the available slots, but that's not the answer.

That volume is not the limit, and expanding capacity only solves a special case of the problem.

I think this should be resorted to only when it's certain that the algorithm has been squeezed to 100%.

4. Compress data

That's what I'm doing now.

The text cache (currently 20 GB) has been compressed by a factor of 2, and will probably shrink some more.
It doesn't solve the memory-size problem by itself, but it brings some of the other options (a RAM disk) closer.

I will also convert it to binary; that will speed up reading and reduce the volume.
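Just to show the direction I mean by "binary": a minimal sketch of writing one deal as a fixed-size record (the field set and sizes here are made up for illustration, not the real cache layout):

// Hypothetical record layout: time, type, lots, open price, close price, profit.
// hOut is a handle from FileOpen("cache.bin", FILE_BIN|FILE_WRITE).
void WriteDealBinary(int hOut, datetime dealTime, int dealType, double lots,
                     double priceOpen, double priceClose, double profit)
  {
   FileWriteInteger(hOut, (int)dealTime);   // 4 bytes instead of "2014.01.15 10:30:00"
   FileWriteInteger(hOut, dealType);        // 4 bytes
   FileWriteDouble(hOut, lots);             // 8 bytes
   FileWriteDouble(hOut, priceOpen);        // 8 bytes
   FileWriteDouble(hOut, priceClose);       // 8 bytes
   FileWriteDouble(hOut, profit);           // 8 bytes
  }
// 40 bytes per deal, and reading back with FileReadInteger/FileReadDouble
// needs no string parsing at all.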


It is not clear how the wizards from http://www.nanex.net/historical.html compress the data. Their data structures are quite redundant.

5. Compress/encode the data and decompress/decode the desired chunk before use

Accepted as an option.

But something tells me this would also take a long time (here we'd get bogged down in the processor instead).

6. Transfer the calculation to an external program (Matlab/R/etc.)

I don't want to; there are many reasons.

Unless it's an environment that is well integrated with MT. But even that would take time to learn the software/environment and/or to order the solution from a third-party developer. That's inconvenient and costly at the development stage (which is where I am).

Anyway, I'm trying to stay in the sandbox for now, with all its pros and cons.


7. Create an index file and work with the indexes

I don't see how this can help.

I would still have to repeatedly retrieve data from the main file (constantly re-reading it).
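If I've understood the suggestion correctly, the index would be something like one (offset, count) pair per sequence over the binary cache. A rough sketch with hypothetical names, assuming the 40-byte records from point 4:

// One index entry per sequence: where its block starts and how many deals it holds.
struct SeqIndex { long offset; int count; };

void BuildIndex(int hData, SeqIndex &index[], int total)
  {
   ArrayResize(index, total);
   for(int i = 0; i < total; i++)
     {
      index[i].offset = (long)FileTell(hData);         // start of sequence i
      index[i].count  = FileReadInteger(hData);        // number of deals in it
      FileSeek(hData, index[i].count * 40, SEEK_CUR);  // skip its records
     }
  }
// Later, FileSeek(hData, index[i].offset, SEEK_SET) jumps straight to sequence i.
// But every criterion recalculation still has to pull the deals from the 20 GB file,
// which is exactly the re-reading I'm trying to avoid.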


8. Use a database

A very tempting option, given:

  • scalability and portability (you can rent a server or just hook up a neighbouring PC);
  • automation and good workflow for many processes that would otherwise have to be hacked together by hand;
  • and other benefits.

But there are some disadvantages:

  • A relatively new environment for me (I haven't worked with it closely, only basic queries);
  • The complexity of installation on a single client machine (standalone version);
  • Probably something else.

Anyway, if the other options don't work out, I'll probably come back to this one.


9. Understand and test new, unfamiliar terms

This is just a note to myself for the future and/or a request to the authors for more information ;)

Left unexplored: file mapping, hash-based solutions, B-trees.


10. Move the terminal with all its caches to a virtual RAM disk

So far this is the most promising option (in terms of cost/benefit ratio).

SoftPerfect RAM Disk is installed; I'll finish compressing the cache, rewrite the calculator to read the file constantly, and check the performance.

11. State the task properly =)

Very good advice, especially considering how scant the input information was.

As promised, I'll try to give more details.

There are many similar sequences of trades, each sequence is sorted by time.

The deals in different sequences are different and unevenly distributed in time (and distributed differently in each sequence). The number of deals varies. But all of them fall within the interval from Date1 to Date2.

The task is to step from Date1 to Date2 with a step of M minutes (or better, exactly at the points where trades of any of the sequences are made) and find the sequence that is better than the others by criterion K. (A separate task is not only to find the best sequence but to sort the whole set by the criterion and output the top 10; that is optional and not required yet.)

Criterion K is calculated from the X previous trades of the corresponding sequence; almost all the information about each of those X trades is used in the calculation (profit alone, for example, is not enough).


The criterion (K), the number of deals (X) and the other parameters that influence the result are set by the user. That is, they cannot be hard-coded into the algorithm.

Something like this.

In principle, the file could be restructured so that the trades of all sequences form one linear stream.

But how can that be done without loading all the information into memory? And then how would I recalculate the criterion if the deals of one sequence are "smeared" all over the file?

Now, hopefully, the task is clear.
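To make it more concrete, the brute-force version of the above looks roughly like this (schematic sketch only; SEQUENCES, X, params and Criterion() are placeholders, not real code from the EA):

// Outer loop over time, inner loop over all sequences - the part that doesn't fit in memory.
for(datetime t = Date1; t <= Date2; t += M * 60)     // or step exactly by the deal times
  {
   int    bestSeq = -1;
   double bestK   = -DBL_MAX;
   for(int s = 0; s < SEQUENCES; s++)                // ~1,000,000 sequences
     {
      // take the X deals of sequence s closed before t;
      // almost every field of each deal is needed, not just the profit
      double K = Criterion(s, t, X, params);         // user-configurable criterion
      if(K > bestK) { bestK = K; bestSeq = s; }
     }
   // use bestSeq at time t, then move to the next point
  }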

Once again, thank you very much for your participation, discussions, questions, links and concrete replies:

TheXpert, Urain, sergeev, elugovoy, anonymous, meat, ALXIMIKS, IvanIvanov, Integer, C-4, marketeer, barabashkakvn, Silent, GT788, papaklass, grizzly_v, artemiusgreat, YuraZ, Candid, Contender and server

Thank you!

 
komposter:

What a discussion! Thank you all for your participation!

....

A very tempting option, considering:

  • scalability and portability (you can rent a server or just hook up a neighbouring PC);
  • automation and good workflow for many processes that would otherwise have to be hacked together by hand;
  • and other benefits.

But there are some disadvantages:

  • A relatively new environment for me (I haven't worked with it closely, only basic queries);
  • The complexity of installation on a single client machine (standalone version);
  • Probably something else.

In general, if the other options don't work out, I'll probably come back to this one.


If we go in the direction of SQL


  • A relatively new environment for me (I haven't worked with it closely, only basic queries);

The learning curve can slow you down quite a bit here.

If you choose this option, it is better to build all the business logic with stored procedures, leaving the Expert Advisor only two functions: send a request to the server and receive a completely finished result.

All the calculations are done on the server.
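Roughly how it could look from the EA side (the DLL and its functions here are purely hypothetical, just to show the division of labour; MT4 has no built-in SQL client, so a real project would go through some ODBC/ADO connector):

// Hypothetical bridge DLL - the names are placeholders, not a real library.
#import "sql_bridge.dll"
   int    SqlConnect(string connection_string);
   string SqlCall(int handle, string stored_procedure_call);
   void   SqlClose(int handle);
#import

void AskServer()
  {
   // the EA only sends a request and receives a finished answer;
   // all the heavy work is done by the stored procedure on the server
   int    db     = SqlConnect("Server=...;Database=deals;");
   string answer = SqlCall(db, "EXEC FindBestSequence @date='2014.01.15', @X=20");
   SqlClose(db);
   Print(answer);
  }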

  • The complexity of installation on a single client machine (standalone version);

In fact, you can find plenty of descriptions on the web of how to set up an SQL server (Oracle or Sybase + CentOS, for example, or Oracle, Sybase, MS SQL + Windows on a separate machine).

Oracle is a little harder to learn: fewer experts, less literature.

MS SQL probably has the largest amount of information and literature on the web, so there will be no difficulties; there are many descriptions online and plenty of books in the shops.

MS SQL 2012 is, by its parameters, already very close to Oracle (as of 2014).

SQL + Linux is usually chosen for production environments; if you know nothing about Linux, it's better to use Windows.

 

komposter:

There are many similar sequences of deals, each sequence is sorted by time.

The deals in different sequences are different and unevenly distributed in time (and distributed differently in each sequence). The number of deals varies. But all of them fall within the interval from Date1 to Date2.

The task is to step from Date1 to Date2 with a step of M minutes (or better, exactly at the points where trades of any of the sequences are made) and find the sequence that is better than the others by criterion K. (A separate task is not only to find the best sequence but to sort the whole set by the criterion and output the top 10; that is optional and not required yet.)

Criterion K is calculated from the X previous trades of the corresponding sequence; almost all the information about each of those X trades is used in the calculation (profit alone, for example, is not enough).


The criterion (K), the number of deals (X) and the other parameters that influence the result are set by the user. That is, they cannot be hard-coded into the algorithm.

Something like this.

In principle, the file could be restructured so that the trades of all sequences form one linear stream.

But how can that be done without loading all the information into memory? And then how would I recalculate the criterion if the deals of one sequence are "smeared" all over the file?

Now, I hope, the task is clear.

Traditionally, I'm a bit slow in the morning :).

Will one sequence fit in memory, or is even that already a problem?

If it does fit, where do the repeated reads from disk come from? When the user changes the criteria and parameters?

If so, is the change made by some algorithm, or manually, on some subjective grounds?

 
Candid:

Traditionally, I'm a bit slow in the morning :).

Will one sequence fit in memory, or is even that already a problem?

If it does fit, where do the repeated reads from disk come from? When the user changes the criteria and parameters?

If so, is the change made by some algorithm, or manually, on some subjective grounds?

A million sequences.

Then, in each of them, the point with the right date is found and the preceding history is analysed.

And the best of the sequences is chosen.

And then we move on to the next point in the "story".

 
I think that if you don't want to bother end users with installing a DBMS, there is the option of synthesizing several (say, a dozen) typical K-criteria (call them, for example, conservative trading, aggressive trading, etc.) and actually building them into the algorithm, calculating the change of the selected indicator over time once for all sequences, and then working with one-dimensional vectors.
 
marketeer:
I think that if you don't want to bother end users with installing a DBMS, there is the option of synthesizing several (say, a dozen) typical K-criteria (call them, for example, conservative trading, aggressive trading, etc.) and actually building them into the algorithm, calculating the change of the selected indicator over time once for all sequences, and then working with one-dimensional vectors.

Let's say there are only 2 criteria.

But there are several parameters that configure each criterion, and each parameter can take different values.

Even if we take, very roughly, just a few values of each parameter, we get not a one-dimensional vector but a 3- or 4-dimensional array.

Then there is definitely not enough memory =)
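Purely for illustration, with made-up round numbers: even 10 values for each of 3 parameters is already 10 × 10 × 10 = 1,000 combinations of the criterion; precomputed for a million sequences that is 10^9 values, i.e. about 8 GB at 8 bytes per value, and that is before multiplying by the number of date points.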

 
komposter:

A million sequences.

Then, in each of them, the point with the right date is found and the preceding history is analysed.

And the best of the sequences is chosen.

And then we move on to the next point in the "story".

The entire sequence is loaded into memory. Then the point with the desired date is found and the preceding history is analysed. The criterion value is compared with the best achieved so far. If it is better, the criterion value and whatever we need to know about the best sequence are remembered. Then the next sequence is loaded in place of the processed one. And so on, a million times.

The file is read sequentially and only once.

What is wrong?

 
Candid:

The entire sequence is loaded into memory. Then the point with the desired date is found and the preceding history is analysed. The criterion value is compared with the best achieved so far. If it is better, the criterion value and whatever we need to know about the best sequence are remembered. Then the next sequence is loaded in place of the processed one. And so on, a million times.

The file is read sequentially and only once.

What's wrong?

That's right.

And then the "right date" is shifted to the closing point of a deal from the selected sequence, and the algorithm is repeated.

And so on, a million more times =)

 
komposter:

That's right.

And then the "right date" is shifted to the closing point of a deal from the selected sequence, and the algorithm is repeated.

And so on, a million more times =)

But the sequences are independent of each other, right? Then why can't we run the cycle over dates on one loaded sequence at a time? This, by the way, might just be an opportunity to switch to some efficient recurrence algorithm, but that's a matter of luck. The million-by-million size will remain, but the file will be read only once.

Of course, a problem where the number of steps stays the same at each iteration (i.e. the search area does not narrow as the computation proceeds) does not look very robust. But that is subjective, of course.
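Schematically (with the same kind of placeholders as before: SEQUENCES, DATE_POINTS, X, LoadSequence() and Criterion() are not real code), the inverted loop keeps a running "best so far" per date, so the file is still read only once:

void ScanAllSequences()
  {
   // bestK[]/bestSeq[] hold the current winner for every date point
   double bestK[];   ArrayResize(bestK, DATE_POINTS);   ArrayInitialize(bestK, -DBL_MAX);
   int    bestSeq[]; ArrayResize(bestSeq, DATE_POINTS); ArrayInitialize(bestSeq, -1);

   for(int s = 0; s < SEQUENCES; s++)        // file read sequentially, once
     {
      LoadSequence(s);                       // the only disk access for this sequence
      for(int d = 0; d < DATE_POINTS; d++)   // all date points for this sequence
        {
         double K = Criterion(s, d, X);
         if(K > bestK[d]) { bestK[d] = K; bestSeq[d] = s; }
        }
     }
   // Memory: two arrays of DATE_POINTS elements plus one sequence at a time,
   // instead of the whole 20 GB file.
  }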