Need help! Can't solve the problem, I'm hitting hardware limitations - page 16

 
komposter: How do you calculate the criterion for the last X deals of a sequence, given such a file?


We have a fragment of the file in memory; we go through it and form a sample of the necessary length for calculating the criterion, selecting only the deals that belong to the same sequence. Then we calculate the criterion on this sample. Incidentally, recursion could in theory be used during the selection.

Or did I misunderstand the question?

P.S. Of course, we have to move backward through the file when forming the sample.
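For concreteness, a minimal C++ sketch of this backward pass, assuming the fragment is already in memory and that a deal record carries a sequence ID, a close time, and a profit (these field names are illustrative assumptions, not the actual file format):

```cpp
#include <cstdint>
#include <vector>

// Illustrative deal record; the field names are assumptions, not the real format.
struct Deal {
    int64_t sequenceId;  // which sequence the deal belongs to
    int64_t closeTime;   // deals are ordered by time in the file
    double  profit;
};

// Walk the in-memory fragment backward, collect the last X deals of one
// sequence, and compute the criterion on them (here: average profit).
double CriterionForLastX(const std::vector<Deal>& fragment,
                         int64_t sequenceId, std::size_t X)
{
    double sum = 0.0;
    std::size_t found = 0;
    for (std::size_t i = fragment.size(); i-- > 0 && found < X; ) {
        if (fragment[i].sequenceId == sequenceId) {
            sum += fragment[i].profit;
            ++found;
        }
    }
    return (found == X) ? sum / X : 0.0;  // too few deals: no criterion
}
```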

 
Candid:

We have a fragment of the file in memory; we go through it and form a sample of the necessary length for calculating the criterion, selecting only the deals that belong to the same sequence. Then we calculate the criterion on this sample. Incidentally, recursion could in theory be used during the selection.

Or did I misunderstand the question?

P.S. Of course, we have to move backward through the file when forming the sample.

The problem of inserting new data remains - it has to be solved somehow.

Why go through and pick out the white socks many times, when it is easier to make one pass, throw the white ones into one basket and the black ones into another, and then ask what is in each basket and in what quantity?

 
komposter:

A chunk is read; its size is determined by the number of trades in a particular sequence up to the SeekDate.

By the way, if the starting point of each sequence is known, the desired dates can be found by binary search, since the trades are ordered by time.
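With std::lower_bound this lookup takes a few lines. A sketch, assuming the deals of one sequence sit contiguously in memory, sorted by close time:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Deal {
    int64_t closeTime;  // trades are ordered by time
    double  profit;
};

// Index of the first deal at or after seekDate: everything before it is
// history usable for the criterion. O(log n) instead of a linear scan.
std::size_t FirstDealAtOrAfter(const std::vector<Deal>& deals, int64_t seekDate)
{
    auto it = std::lower_bound(deals.begin(), deals.end(), seekDate,
        [](const Deal& d, int64_t t) { return d.closeTime < t; });
    return static_cast<std::size_t>(it - deals.begin());
}
```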
 
ALXIMIKS:

The problem of inserting new data remains - it has to be solved somehow.

Why go through and pick out the white socks many times, when it is easier to make one pass, throw the white ones into one basket and the black ones into another, and then ask what is in each basket and in what quantity?

Too much data is also bad :)

The problem is that it's not whites and blacks being selected here, but the ones that are locally whiter. So calculating a global degree of blackness doesn't save us. By the way, I started in this thread precisely with a suggestion of calculating the criterion continuously.

P.S. By the way, nothing prevents processing several files together - the cache for each one would simply have to be smaller. But there seems to be headroom in the cache size.

That is, the new data can simply be accumulated in another file.

P.P.S. By the way, slicing the file into several smaller files will ease the problem of sorting.
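The P.P.S. describes what is essentially an external merge sort: sort each small file, then merge. A minimal in-memory sketch, where each vector stands in for one of the smaller files (purely illustrative):

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Sort each slice independently, then k-way merge them with a min-heap.
std::vector<int64_t> MergeSortedSlices(std::vector<std::vector<int64_t>> slices)
{
    for (auto& s : slices) std::sort(s.begin(), s.end());  // sort each "file"

    using Head = std::pair<int64_t, std::size_t>;  // value, slice index
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heap;
    std::vector<std::size_t> pos(slices.size(), 0);

    for (std::size_t i = 0; i < slices.size(); ++i)
        if (!slices[i].empty()) heap.push({slices[i][0], i});

    std::vector<int64_t> out;
    while (!heap.empty()) {
        auto [v, i] = heap.top();
        heap.pop();
        out.push_back(v);
        if (++pos[i] < slices[i].size()) heap.push({slices[i][pos[i]], i});
    }
    return out;
}
```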

 
komposter:

1. If the Criterion were static... What if its parameters change?

2. Yes, then there will be a trade. But recalculation will only be needed for the most recent data; there's no need to churn through the entire history.

3. This is one tool...

4. Exactly.

1. From what I said above: "Let the criterion be the average profit of the last 20 trades of the sequence." This should be understood as one criterion - the moving expectation of profit. What others are there?

In the database, generate a table with a sequence identifier and the corresponding moving averages. Sequences that don't fit the conditions should be deleted immediately. This should be done by a procedure running concurrently at the DBMS level, on request from the robot, with the process status displayed in the robot.

Let's say, FilterAvgProfit(pProfitValue, pTrades, pDeviation),

where pProfitValue is the target profit, pTrades is the number of trades for the moving average of profit, and pDeviation is the allowed deviation from pProfitValue.

The result is a filled table with sequence IDs and average profit values.

Similarly, you can write stored procedures for each of the criteria (a sketch of this one follows at the end of this post).

2. If part of the data is discarded (using "fresh data" rather than the entire million records), it will give a performance gain.

3. It wasn't quite clear from the problem statement. Now it's OK.

4. I understand that if we are talking about strategy selection, this operation should not be performed too often (say, on every bar or immediately before opening an order). This approach is reasonable if the current strategy shows N losing trades in a row: then we can choose another one, and the time it takes to "make a decision" cannot be avoided. Alternatively, perform such a selection once a week (on the weekend, when the market is closed) and either confirm the currently selected strategy or switch to another one. A list of optimal strategies could be recommended to the trader for the given conditions; then, with a clear head when the market opens (on Monday), the trader confirms the choice (or earlier, via an e-mail alert before the market opens, etc.).

Something like that.
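As referenced in point 1, a sketch of what FilterAvgProfit(pProfitValue, pTrades, pDeviation) would compute, written here in C++ rather than as a stored procedure; the container layout is an assumption made for illustration:

```cpp
#include <cmath>
#include <cstdint>
#include <map>
#include <vector>

// For each sequence: the moving average profit over its last pTrades trades,
// keeping only sequences within pDeviation of the target pProfitValue.
std::map<int64_t, double> FilterAvgProfit(
    const std::map<int64_t, std::vector<double>>& profitsBySequence,
    double pProfitValue, std::size_t pTrades, double pDeviation)
{
    std::map<int64_t, double> result;  // sequence ID -> average profit
    for (const auto& [id, profits] : profitsBySequence) {
        if (profits.size() < pTrades) continue;  // not enough trades yet
        double sum = 0.0;
        for (std::size_t i = profits.size() - pTrades; i < profits.size(); ++i)
            sum += profits[i];
        const double avg = sum / pTrades;
        if (std::fabs(avg - pProfitValue) <= pDeviation)  // passes the filter
            result[id] = avg;
    }
    return result;
}
```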

 

Memory is allocated once, for an array of sequence structures.

The sequence structure includes: the number (No.), an array of structures of all deals of the sequence [X], the Criterion value, and the file pointer position.

At the next step, only the elements of the structure are filled in (including the deal arrays). The deals in the array are shifted, so there are always only the last X deals of each sequence in memory.
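In C++ terms, the structure described above might look like this (the field names and the shifting helper are assumptions for illustration):

```cpp
#include <cstdint>
#include <vector>

struct Deal {
    int64_t closeTime;
    double  profit;
};

// One sequence: number, the last X deals, the Criterion value, and the
// current position of the file pointer.
struct Sequence {
    int64_t           no;         // sequence number
    std::vector<Deal> lastDeals;  // holds at most X deals
    double            criterion;  // cached Criterion value
    int64_t           filePos;    // file pointer position
};

// Shift the window: once X deals are stored, drop the oldest, append the new.
void PushDeal(Sequence& s, const Deal& d, std::size_t X)
{
    if (s.lastDeals.size() == X)
        s.lastDeals.erase(s.lastDeals.begin());  // shift out the oldest deal
    s.lastDeals.push_back(d);
}
```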

You allocate memory for an array of structures and get:

  • an array of No.,
  • an array of arrays of structures of all deals of the sequence [X],
  • an array of Criterion values,
  • an array of file pointer positions.

Why do you need an array of Criterion values and an array of file pointer positions? (Have you thought about storing just one Criterion and the last deal?)

Have I got it right:

First pass - search on the interval from 0 to SeekDate;

then find the best criterion, and FindDate = trade closing time + 1;

now search on the interval from "trade closing time" to SeekDate?

And do you need X trades in that interval to calculate the criterion for each sequence?
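To make the question concrete, here is the interval bookkeeping as I read it, as a sketch (FindDate and SeekDate come from the post above; the rest is assumed):

```cpp
#include <cstdint>

struct Pass {
    int64_t from;      // start of the search interval
    int64_t seekDate;  // end of the search interval (exclusive)
};

// First pass searches [0, SeekDate). After the best criterion is found,
// FindDate = trade closing time + 1 becomes the start of the next interval,
// which runs up to the next SeekDate.
Pass NextPass(int64_t bestTradeCloseTime, int64_t nextSeekDate)
{
    Pass next;
    next.from = bestTradeCloseTime + 1;  // the FindDate from the post
    next.seekDate = nextSeekDate;
    return next;
}
```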

 
komposter:

Sharing the results of my research.

A binary cache file of 7529 MB reads:

  • From hard disk: in 212.3 sec (35.46 MB/sec)
  • From RAM disk: in 88.1 sec (85.46 MB/sec)
It's hard to call the difference cosmic, though I have the most ordinary hard drive (then again, the memory isn't fast either).

Conclusion: a roughly 2.4x speedup (212.3 s vs 88.1 s) when reading a large file from a RAM disk.

Strange results.

Here's from our production server system under load:

  • with SSD: 200 MB per second, NTFS
  • with RAM: 2000-2500 MB per second, FAT32, SoftPerfect RAM Disk 3.4.5

Without RAM disks, it takes many times longer to build projects.

 
Renat:

Strange results.

Here's from our production server system under load:

  • with SSD: 200 MB per second, NTFS
  • with RAM: 2000-2500 MB per second, FAT32, SoftPerfect RAM Disk 3.4.5

Without RAM disks, it takes many times longer to build projects.

This is what I was saying - you have to read large files in large chunks; otherwise small reads can take up to 10 times longer.
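A minimal sketch of chunked reading; the 16 MB buffer is an arbitrary illustrative size, not a figure from the thread:

```cpp
#include <cstdio>
#include <vector>

// Read a large file in big sequential chunks instead of many small reads.
bool ReadInLargeChunks(const char* path)
{
    const std::size_t kChunk = 16 * 1024 * 1024;  // assumed chunk size
    std::vector<char> buffer(kChunk);

    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;

    std::size_t n;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), f)) > 0) {
        // ... process n bytes of the chunk here ...
    }
    std::fclose(f);
    return true;
}
```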
 
papaklass:

In my opinion, the solution to the problem lies in coding the raw data.

If you can't get away from reading the raw data multiple times, you need to convert it into a format suitable for multiple reads.

One possible option is to convert each record to a 16-bit number. Each field of the original record should be allocated a specific number of bits in the number. For example:

The most significant bit of the number:

- "0" means a negative trade result;

- "1" means a positive trade result.

The least significant bit of the number:

- "0" denotes a BUY trade;

- "1" denotes a SELL trade.

and so on.

Thus, instead of repeatedly reading a source file with many fields, the work is reduced to repeatedly reading a single number field, which should give a significant speedup.

By and large, the source file could be generated in the encoded format right away, although the information in it would no longer be human-readable.

But not in "16-bit" but in 64-bit, Andrew has an x64 processor, so the minimum unit of volume when accessing memory is 64 bits. Even if you read one byte from memory, the processor will still read 8 bytes (two double words).

 
komposter:

Yes, in this form the task is parallel - each time the SeekDate changes, you can run a simultaneous search for the best Criterion on different parts of the sequence set. For example, we divide them into 20 parts and give the task to 20 Expert Advisors. Each of them should read the file, find its deal, and send back only its best sequence (No., Criterion, and file position). A sketch of this partitioning appears at the end of the page.

Thank you very much!

There you go, another thing ))
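For illustration, a sketch of the divide-into-parts idea with threads standing in for the 20 Expert Advisors; the Best record mirrors the (No., Criterion, file position) triple from the quote, everything else is assumed:

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

struct Best {
    int64_t sequenceNo = -1;
    double  criterion  = 0.0;
    int64_t filePos    = 0;
};

// Split the candidate set into nParts, search each part in parallel,
// then reduce the per-part winners to a single best result.
Best ParallelBest(const std::vector<Best>& candidates, unsigned nParts)
{
    if (nParts == 0) return {};
    std::vector<Best> partBest(nParts);
    std::vector<std::thread> workers;
    const std::size_t chunk = (candidates.size() + nParts - 1) / nParts;

    for (unsigned p = 0; p < nParts; ++p) {
        workers.emplace_back([&, p] {
            const std::size_t lo = p * chunk;
            const std::size_t hi = std::min(candidates.size(), lo + chunk);
            for (std::size_t i = lo; i < hi; ++i)  // scan this part only
                if (candidates[i].criterion > partBest[p].criterion)
                    partBest[p] = candidates[i];
        });
    }
    for (auto& w : workers) w.join();

    Best best;
    for (const auto& b : partBest)  // reduce: keep the single best sequence
        if (b.criterion > best.criterion) best = b;
    return best;
}
```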