Need help! Can't solve the problem, I'm hitting hardware limitations - page 6

 
YuraZ:

>>> I don't know what I'm looking for ...

>>> You have to go through all the sequences repeatedly and do some calculations.

Well - yes - search - but it searches through 20 gigs...

In principle, the task comes down to some kind of search and comparison

I'm going by what the author wrote.

Maybe the data can't be shrunk - compressed - indexed


it is logical to put the data into SQL

pass the business logic to the server + data

the Expert Advisor will send only short data to the server for enumeration and calculation

and receive a ready answer.

Searching with brute force is the Middle Ages.
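
A minimal sketch of the "data plus logic on the server" idea, assuming an in-memory SQLite database as a stand-in for a real SQL server. The table name `sequences`, the column `payload` and the helper `find_sequence_ids` are hypothetical and chosen only for illustration; the client sends a short pattern and gets back a ready answer instead of scanning the data itself.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the real database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
)
# The index lets the database locate a payload without scanning every row.
conn.execute("CREATE INDEX idx_sequences_payload ON sequences (payload)")
conn.executemany(
    "INSERT INTO sequences (payload) VALUES (?)",
    [("aaa",), ("bbb",), ("ccc",)],
)


def find_sequence_ids(connection, pattern):
    """Return the ids of stored sequences equal to the short pattern sent by the client."""
    rows = connection.execute(
        "SELECT id FROM sequences WHERE payload = ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


print(find_sequence_ids(conn, "bbb"))
```

Only the short pattern and the short result cross the network; the large data set never leaves the server.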
 
Integer:

Oh and I'm struck by a lot of things around here.

It seems to me that even a child should be able to understand it. If a text is compressed using an algorithm, it will be exactly the same today and tomorrow.

By the way - 3 terabytes were copied from server to server for hours - network of 1gb

when compressed into ZIP, 3 terabytes were compressed for over a day

When we bought the great LiteSpeed utility, which first compresses in server memory and then backs up,

the compression time for 3 terabytes was reduced to a few hours

Unpacking (to change or delete something) also takes several hours.
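
As a rough sanity check of the copy time, here is a back-of-the-envelope calculation. It assumes "1gb" means a 1 gigabit per second link running at its theoretical line rate and that a terabyte is 10^12 bytes; real transfers are slower because of protocol and disk overhead. The function name `transfer_hours` is made up for this sketch.

```python
def transfer_hours(size_bytes, link_bits_per_second):
    """Ideal transfer time in hours, ignoring protocol and disk overhead."""
    return size_bytes * 8 / link_bits_per_second / 3600


THREE_TERABYTES = 3 * 10**12  # assumption: decimal terabytes
ONE_GIGABIT_PER_SECOND = 10**9  # assumption: "1gb" is a 1 Gbit/s link

print(round(transfer_hours(THREE_TERABYTES, ONE_GIGABIT_PER_SECOND), 1))  # 6.7
```

So "hours" is the best case for a plain copy under these assumptions.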


Solving the compressed data search algorithm is cool!

maybe in the future someone will come up with algorithms to search for insertion and deletion in already compressed databases

... but there are no such algorithms in industrial scale yet


There are industrial databases such as Oracle and MS SQL; no one in the world stores data in compressed form if it is worked with intensively

 
YuraZ:

1. By the way, 3 terabytes were copied from server to server for several hours - 1gb network

when compressed into ZIP, 3 terabytes were compressed for over a day

I bought a cool utility called LiteSpeed, which first compresses in server memory and then backs up

the compression time for 3 terabytes was reduced to a few hours

unpacking (to change or restore, delete) also takes a few hours.


2. Solving the algorithm for searching in compressed data is cool!

3. maybe in the future someone will come up with algorithms to search for insertion and deletion in already compressed databases

4. ... but there are no such algorithms in industrial scale yet


There are industrial databases such as Oracle and MS SQL; no one in the world stores data in compressed form if it is worked with intensively

1. For the task at hand, data compression is performed only once and it can take a week to compress the data.

2) What's so cool about it?

3) Why should we invent something? The question is, do you need it or not?

4. So what if it is not?

 
Integer:

1. For the task at hand, data compression is done once, you can do it for a week.

2. What's so cool about it?

3. What's there to invent? The question is, is it necessary or not?

4. So what if it is not?

1) p1 only after solving p4

2) well - I don't know; maybe the question of (FAST) search in large data sets has already been thought through by enough qualified professionals, more than once - and so far there is no algorithm

3) God knows - search in compressed data could perhaps be invented, but the problem hasn't been solved, most likely because it's just not needed...

4) maybe - the best brains in the world will invent an algorithm for (FAST) search in compressed data

searching (SLOWLY) in compressed data is possible - by uncompressing and then searching - that is not in question...

 

No one is talking about searching in compressed data. We're talking about comparing two compressed sequences.

Suppose an array, "aaa", "bbb", "ccc". Each of the three array elements is compressed by itself, independently of the rest. Suppose we compress and get the array "a", "b", "c".

We have the sought string "bbb" which we need to find in the array. Before searching, we compress it and get "b". Now search and find.
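
The idea of comparing compressed elements can be sketched as follows, assuming zlib as the compressor (the thread does not name one) and assuming every element and the query are compressed with the same library version and settings, so equal inputs give equal outputs. The names `compress`, `compressed_index` and `find_exact` are hypothetical.

```python
import zlib


def compress(text):
    """Compress one element on its own, independently of all others."""
    return zlib.compress(text.encode("utf-8"))


elements = ["aaa", "bbb", "ccc"]
# Map each compressed element to its position in the original array.
compressed_index = {compress(element): i for i, element in enumerate(elements)}


def find_exact(query):
    """Compress the query the same way and look for an identical compressed element."""
    return compressed_index.get(compress(query))


print(find_exact("bbb"))  # 1
print(find_exact("bb"))   # None
```

This only answers "is there an element exactly equal to the query"; it does not find a query that is merely a part of a larger element.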

 
Integer:

No one is talking about searching in compressed data. We're talking about comparing two compressed sequences.

Suppose an array, "aaa", "bbb", "ccc". Each of the three array elements is compressed independently of the rest. Suppose we compress and get the array "a", "b", "c".

We have the desired string "bbb" which we need to find in the array. Before searching, we compress it and get "b". Now we search and find it.

the idea is clear...

and yet this methodology (with a quick search) is absent in industrial databases

there must be a reason

 
Integer:

Oh and I'm struck by a lot of things around here.

It seems to me that even a child should be able to understand it. If you compress some text with some algorithm, it will be exactly the same in compressed form today and tomorrow too.

Are you saying that using the same compression algorithm and compressing two different texts you can obtain two completely identical data sequences?

What makes you think I'm saying that?

 
YuraZ:

the idea is clear...

and yet this methodology (with a quick search) is not available in industrial databases

there must be a reason.

Of course there are reasons :)

Data compression is about eliminating redundancy. And the more efficiently the compression is done, the less redundancy there is. And the search method proposed above will not work, because in compressed text any part depends on the entire text.
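
A small illustration of that dependence, assuming zlib as the compressor and a made-up sample text: the fragment is present in the original text, but its separately compressed form does not appear inside the compressed text, so the only straightforward search is to decompress first.

```python
import zlib

# Assumption: an arbitrary repetitive sample text with a fragment in the middle.
filler = b"the quick brown fox jumps over the lazy dog; " * 20
text = filler + b"needle " + filler
fragment = b"needle"

packed_text = zlib.compress(text)
packed_fragment = zlib.compress(fragment)

# The fragment is in the original text...
found_in_plain = fragment in text
# ...but its compressed form is not a piece of the compressed text.
found_in_packed = packed_fragment in packed_text
# Slow but reliable: decompress, then search.
found_after_unpack = fragment in zlib.decompress(packed_text)

print(found_in_plain, found_in_packed, found_after_unpack)
```

The bytes that encode "needle" inside `packed_text` depend on what came before it in the stream, which is why they do not match the stand-alone `packed_fragment`.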

 
Contender:

Naturally, there are reasons :)

Data compression is about eliminating redundancy. And the more efficiently the compression is done, the less redundancy there is. And you can't search using the above method, because in a compressed text any part will depend on the whole text.

:-) that's what we're talking about ...
 
elugovoy:

What makes you think I'm saying that?

You're kind of hinting:

Well, it will give you 4-8x compression, as for text. Consider the fact that compression algorithms create their own transcoding trees for each file.

In other words, the source file will have one tree, but the portion of data you want to find will have a different tree.

Just wondering how you propose to search? even theoretically ))))
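
To illustrate the point about per-file trees, here is a minimal sketch that assumes plain Huffman coding as the "transcoding tree" (real archivers combine it with other steps). The function `huffman_codes` and both sample strings are made up for the example; it shows the same symbol receiving different bit codes in two different inputs.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a Huffman code table (symbol -> bit string) for one input."""
    frequencies = Counter(text)
    # Heap entries: (count, unique id, partial code table). The unique id
    # breaks ties so the dictionaries are never compared.
    heap = [
        (count, i, {symbol: ""})
        for i, (symbol, count) in enumerate(sorted(frequencies.items()))
    ]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {symbol: "0" for symbol in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)
        count_b, _, codes_b = heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing their codes with 0 and 1.
        merged = {symbol: "0" + code for symbol, code in codes_a.items()}
        merged.update({symbol: "1" + code for symbol, code in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, next_id, merged))
        next_id += 1
    return heap[0][2]


codes_first = huffman_codes("aaaaaaabcd")   # "b" is rare here
codes_second = huffman_codes("abbbbbbbcd")  # "b" is frequent here

print(codes_first["b"], codes_second["b"])
```

Because the code for "b" depends on the statistics of the whole input, the bits written for the same characters differ from one file to another.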

How to search, I wrote earlier.
