Crazy cache of testing agents

 

Good day to you all!

Ran into the following problem:

Having 32 logical processors in the system, I use 32 local agents for optimization (plus another 40 remote ones).

Each agent rather quickly builds up a cache of a completely inadequate size, 2-2.6 GB, which in total comes to more than 70 GB per day! The cache never deletes itself and keeps growing. The only thing that stopped the madness was running out of disk space, after which the agents simply stop working.

The questions are as follows:

Has anyone faced such a problem? How do I deal with it? What can cause such large cache sizes?

I wrote a request to the service desk; so far, silence.

 
P.S.: the terminal is 64-bit, latest build.
 
alrane:

Good afternoon, everyone!

Ran into the following problem:

Having 32 logical processors in the system, I use 32 local agents for optimization (plus another 40 remote ones).

Each agent rather quickly builds up a cache of a completely inadequate size, 2-2.6 GB, which in total came to more than 70 GB in a day! The cache never deletes itself and keeps growing. The only thing that stopped the madness was running out of disk space, after which the agents simply stop working.

The questions are as follows:

Has anyone faced such a problem? How do I deal with it? What can cause such large cache sizes?

I wrote a request to the service desk; so far, silence.

The cache size depends on the number of generated ticks (i.e. the longer the testing period and the more symbols, the bigger the cache).
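As a rough illustration of how tick count drives cache size, here is a back-of-envelope estimator. All the constants in it (ticks per symbol per day, bytes per stored tick) are assumptions for illustration, not the tester's actual figures:

```python
# Back-of-envelope estimate of tick-cache size.
# The constants are illustrative assumptions, not actual tester internals:
# ~200,000 ticks per symbol per trading day, ~12 bytes per stored tick.

def estimate_cache_gb(days, symbols, ticks_per_symbol_day=200_000, bytes_per_tick=12):
    """Rough cache size in GB for a test spanning `days` days over `symbols` symbols."""
    return days * symbols * ticks_per_symbol_day * bytes_per_tick / 1e9

# A one-year test over three symbols already lands in the 2-3 GB range
# the original post complains about:
print(round(estimate_cache_gb(365, 3), 2))  # → 2.63
```

Under these assumptions, per-agent caches of a couple of gigabytes are exactly what long multi-symbol optimizations would produce.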

In your case the main problem is probably the number of agents, because currently (build 1495) each agent uses its own cache instance!

The cache space is freed after 5 minutes of agent idle time.

Additionally, the tick history for agents in the tester can take up space if cloud agents are used (the tick history is eventually cleaned up too, but on a timescale of days or weeks).

By the way, cloud agents and local agents are different things. In the screenshot, cloud agents on the same computer are added to the local network farm, and voilà: we get 8 test agents on a processor with two cores and four logical processors (whether it's worth doing this is another question).

 
Ashes:

In your case the main problem is probably the number of agents, since currently (build 1495) each agent uses its own cache instance!

Therein lies (may the developers forgive me) the stupidity of how the tester is organized: the number of agents turns from an advantage into a problem.

The tester has no settings at all, so it is impossible to tune it for your system. As a result we get hard-drive abuse, with a huge number of small files rewritten (up to 800 GB/day on a 120 GB SSD with 32 agents in the system), and, funnily enough, the cores sit idle all the while.

I partially solved the problem by running 4 separate testers in portable mode on different physical drives, including a RAM disk, since the tester leaves a large amount of memory unused.

By the way, running an agent with its cache on a RAM disk often increases performance by up to 3 times! Which once again points to how poorly the tester is organized.
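The claimed speedup is easy to check for yourself. A minimal sketch that times many small, synced file writes into a directory; run it once against an SSD folder and once against your RAM-disk folder (both paths are whatever you created yourself, so they are assumptions here):

```python
# Time how long it takes to write many small fsynced files into a directory.
# Absolute numbers depend entirely on your hardware; only the ratio between
# an SSD directory and a RAM-disk directory is interesting.
import os
import time

def small_file_write_time(dir_path, n_files=200, size=4096):
    """Seconds to write `n_files` files of `size` bytes, each forced to disk."""
    data = os.urandom(size)
    start = time.perf_counter()
    for i in range(n_files):
        path = os.path.join(dir_path, f"chunk_{i}.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the write through the OS cache
    return time.perf_counter() - start
```

Comparing `small_file_write_time(r"C:\ssd_tmp")` with `small_file_write_time(r"R:\ram_tmp")` (both paths hypothetical) shows how much agent time goes to disk latency rather than computation.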

Ashes:

By the way, cloud agents and local agents are different things. In the screenshot, cloud agents on the same computer are added to the local network farm, and voilà: we get 8 test agents on a CPU with two cores and four logical processors (whether you should do that is another question).

You should not do that, for the same reason: the cores will also end up waiting for data from the disk, only now at double the volume. I think this will only decrease performance.
 
alrane:
This is the (may the developers forgive me) stupidity of how the tester is organized, where the number of agents turns from an advantage into a problem.

The tester has no settings at all, so it is impossible to tune it for your system. As a result we get hard-drive abuse, with a huge number of small files rewritten (up to 800 GB/day on a 120 GB SSD with 32 agents in the system), and, funnily enough, the cores sit idle all the while.

...
By the way, running an agent with its cache on a RAM disk often increases performance by up to 3 times! Which once again points to how poorly the tester is organized.

...

Write to Service Desk.

 
I've written before. It's no use.
 

Having to read several gigabytes of data from a drive is "disgusting organisation"? Even just reading 1 GB of data from an SSD at an average speed of 200 MB/s takes about 5 seconds. And what if there are 4-32 agents doing it?
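That arithmetic is simple enough to write down. The 200 MB/s figure comes from the post above; the assumption that agents naively share a single drive's bandwidth is mine:

```python
# Time to read `gb` gigabytes at `mb_per_s` MB/s, assuming `agents` agents
# share one drive's bandwidth linearly (a simplifying assumption).

def read_time_seconds(gb, mb_per_s=200, agents=1):
    return gb * 1024 / mb_per_s * agents

print(round(read_time_seconds(1), 2))             # 1 GB, one agent: → 5.12
print(round(read_time_seconds(1, agents=32), 2))  # 32 agents on one drive: → 163.84
```

So with 32 agents hammering the same SSD, the same 1 GB per agent stretches from seconds to minutes, which is why the cores appear idle.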

Just think about the technical side of the task. Nothing is free, and nobody can reduce the technical requirements to zero.

The technical solution and the level of agent optimization are amazing: we put a huge amount of work into it and scraped milliseconds out of every process. Don't forget the data volumes: add more RAM, install bigger SSDs, set up RAM disks, and everything will speed up.

The prices of all these things are reasonable now, but the class and volume of the tasks being solved require a serious approach.

 
alrane:

Each agent rapidly builds up a cache of a completely inadequate size, 2-2.6 GB, totaling over 70 GB in a day! The cache never deletes itself and keeps growing. The only thing that stopped this madness was running out of disk space, after which the agents simply stop working.

What is there to cache in such volumes?!
 
fxsaber:
What is there to cache in such volumes?!
Usually traders prefer to look at the size of the folder without noticing that it contains tens of gigabytes of their own, personally generated, extensive logs.

Everything is fine with the data caches. We keep them on disk and hold them in memory while waiting for re-runs. Note how much faster recalculation is on the same agent (take one agent and one run to see the effect).



One more thing: we work very sparingly with disks. We write in large blocks and have a clear understanding of the peculiarities of SSDs.
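The idea behind "writing in large blocks" — accumulating small records in memory and flushing them in big chunks instead of issuing thousands of tiny writes — can be sketched like this (a generic illustration of the technique, not MetaQuotes' actual code; the 4 MB batch size is an arbitrary assumption):

```python
# Buffer small records in memory and flush them to disk in large chunks.
# The 4 MB batch size is an arbitrary assumption; tune it to your hardware.

def write_batched(path, records, batch_bytes=4 * 1024 * 1024):
    buf = bytearray()
    with open(path, "wb") as f:
        for rec in records:
            buf += rec
            if len(buf) >= batch_bytes:
                f.write(buf)   # one large sequential write
                buf.clear()
        if buf:
            f.write(buf)       # flush whatever remains
```

This turns, say, a million 100-byte writes into a few dozen multi-megabyte ones, which is far friendlier to an SSD's erase blocks than rewriting many small files.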
 
Renat Fatkhullin:
Usually traders prefer to look at the size of the folder without noticing that it contains tens of gigabytes of their own, personally generated, extensive logs.
We were talking about gigabytes of cache per local agent. I still do not understand what could be stored there in such quantities.
alrane:
I partially solved the problem by running 4 separate testers in portable mode on different physical drives, including a RAM disk, since the tester leaves a large amount of memory unused.

By the way, running an agent with its cache on a RAM disk often increases performance by up to 3 times! Which once again points to how poorly the tester is organized.

Why does moving to a RAM disk multiply performance, if the level of agent optimization is "amazing"? Logical questions, in my opinion, though unpleasant ones.

P.S. We need some software that erases the agent logs. They are useless in such quantities. And Print+Alert should be disabled by the user in the code during optimizations.
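A minimal sketch of such a cleanup script, assuming the agent logs live under `Agent-*/logs/*.log` inside the tester folder — that layout is an assumption, so run it in dry-run mode and verify the list before deleting anything:

```python
# Delete agent log files under a tester directory.
# The Agent-*/logs/*.log layout is an assumption; verify it on your install.
import glob
import os

def purge_agent_logs(tester_root, dry_run=True):
    """Return the matched log files; delete them only when dry_run=False."""
    pattern = os.path.join(tester_root, "Agent-*", "logs", "*.log")
    matched = glob.glob(pattern)
    for log in matched:
        if not dry_run:
            os.remove(log)
    return matched
```

Call it with `dry_run=True` first, inspect the returned paths, and only then rerun with `dry_run=False`.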

 

fxsaber:
We were talking about gigabytes of CACHE per local agent. I still do not understand what could be stored there in such quantities.

Instead of making forum statements, take a look for yourself.

Why does moving to a RAM disk multiply performance, if the level of agent optimization is "amazing"? Logical questions, in my opinion, though unpleasant ones.

Because we do not have the right to eat 100% of the RAM for the cache and hold it there indefinitely. But if a person has created a 32-64 GB RAM disk himself, moved the agents there, and started working with the disk actively, then yes, disk operations can be sped up many times over.

But specifically disk operations, not "everything at once by a factor of N".

That the tester handles data amazingly well is evident to anyone who uses it constantly and gets real benefit from warmed-up tester caches waiting in the background for new runs. Experimentation usually means tens or hundreds of tester runs with constant recompilation of the code.

P.S. We need some kind of software to erase the agent logs. We don't need them in such quantities. And Print+Alert should be disabled by the user in the code during optimizations.

The tester's logs are erased automatically; those who use the tester know this. And the tester's caches are wiped by the terminal as soon as it sees that the tester is no longer in use.

The topic starter opened the thread in "how much longer?!" mode and made unsubstantiated claims. If he had provided properly collected data, 50% of the questions would have disappeared at the data-collection stage.