Need help! Can't solve the problem, I'm hitting hardware limitations

 
komposter:

Thought about it. Asked for an opinion:

What would be the speedup compared to reading a file and what would be the slowdown compared to working in memory?

A DBMS keeps its tables in memory whenever possible,

but not in your case, unless of course you have 32 GB of RAM.

So fitting 20 GB into 4 GB is a real optimization challenge.


If you want to simplify your task, create an ordinary RAM drive in memory.

If that is not possible, work from the hard drive.

 
1) Consider the SSD option. You can buy a fast 100 GB drive for about 5 roubles or even less.

2) Load your data into a DBMS.

3) Option 1 + option 2: load your data into the database, and place the database itself on the solid-state drive.

I think the last option will suit you fine. If not, switch from a desktop OS to a server OS.
 
There was an article here about data exchange between MKL and, for example, C#: you could move all the heavy operations there and read the file piece by piece without taking up all the RAM. Passing the data around as structures is quite handy and fast.
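
For illustration, here is a minimal sketch in C++ of that piece-by-piece approach (the file name and the 16 MB buffer size are just placeholders): only one buffer lives in RAM at any moment, no matter how large the file on disk is.

#include <cstdio>
#include <fstream>
#include <vector>

int main()
{
    // Placeholder name; the real 20 GB source file goes here.
    std::ifstream in("big_history.txt", std::ios::binary);
    if (!in) return 1;

    std::vector<char> buffer(16 * 1024 * 1024);   // 16 MB working buffer
    unsigned long long total = 0;

    // Read the file piece by piece; only the buffer occupies RAM.
    while (in.read(buffer.data(), buffer.size()) || in.gcount() > 0)
    {
        std::streamsize got = in.gcount();
        total += static_cast<unsigned long long>(got);
        // ... parse/convert this piece here, then let it be overwritten ...
    }

    std::printf("processed %llu bytes\n", total);
    return 0;
}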
 
komposter:

How much faster will it be compared to reading a file and how much slower will it be compared to working in memory?

Well, you don't just need to read the file, you also need to search, calculate, convert text to numbers, perform sorting, etc.

First, if the data is not updated often, you can create as many indexes as you like on the attributes involved in the searches (including aggregate attributes). The searches will then be faster (thanks to the indexes), and so will the computations.

Second, databases such as MySQL, MS SQL and Oracle cache data for repeated queries, which also gives some processing speed advantage.

Third, you can split a table into parts (partitions), say by year. Queries that select data for one year will then not read or search through the data located in other partitions.

Fourth, since your source data is in text form, it should become smaller when loaded into the database, simply because of the type conversion. For example, the number 124.223456221 takes 13 bytes as text but, depending on the column type, 4-8 bytes in the database; the date and time "2014-08-17 10:23:35" takes 19 bytes as text and 8 bytes in the database (see the small sketch after the fifth point).

Fifth, if you frequently use information aggregated over certain periods, you can aggregate that data once and store it in a separate table.
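
To illustrate the fourth point, here is a small sketch comparing the text representation with the native binary types a database column would typically use (the timestamp constant is simply the same moment expressed as Unix time, assuming UTC):

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    // The same values as text (how they sit in the source file)
    // and as native binary types (how a DBMS column would store them).
    const char *price_text = "124.223456221";        // 13 characters
    double      price_bin  = 124.223456221;          // 8 bytes

    const char   *stamp_text = "2014-08-17 10:23:35";  // 19 characters
    std::int64_t  stamp_bin  = 1408271015;             // same moment as Unix time (UTC), 8 bytes

    std::printf("price: %zu bytes as text, %zu bytes as double\n",
                std::strlen(price_text), sizeof(price_bin));
    std::printf("time : %zu bytes as text, %zu bytes as a 64-bit integer\n",
                std::strlen(stamp_text), sizeof(stamp_bin));
    return 0;
}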

Of course, if we are only talking about reading data into memory, WinAPI will do it faster, but what do you do with the data afterwards? Imagine that even to find the right piece of data you have to read everything from disk. Or you have to write the indexing functionality yourself, sort the data in the file, create index files for every search operation, and rewrite half of the DBMS's functionality. To process that amount of data with reasonable performance, that is what it takes.
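
For comparison, here is a minimal WinAPI sketch of the "just read it fast" side of the argument: memory-map only a window of the huge file instead of loading all of it (the path and the 64 MB window size are placeholders). Everything beyond that, i.e. searching, sorting and indexing, would still have to be written by hand.

#include <windows.h>
#include <cstdio>

int main()
{
    // Placeholder path to the 20 GB source file.
    HANDLE file = CreateFileW(L"C:\\data\\big_history.txt", GENERIC_READ,
                              FILE_SHARE_READ, NULL, OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    HANDLE mapping = CreateFileMappingW(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (mapping == NULL) { CloseHandle(file); return 1; }

    // Map a 64 MB window instead of the whole file; the offset must be
    // a multiple of the allocation granularity (64 KB on Windows).
    ULONGLONG   offset = 0;
    SIZE_T      window = 64 * 1024 * 1024;
    const char *view   = static_cast<const char *>(MapViewOfFile(
        mapping, FILE_MAP_READ,
        (DWORD)(offset >> 32), (DWORD)(offset & 0xFFFFFFFF), window));

    if (view != NULL)
    {
        // ... scan/parse this window, then move the offset and remap ...
        std::printf("first byte of the window: %c\n", view[0]);
        UnmapViewOfFile(view);
    }

    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}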

My opinion is unambiguous: a server DBMS (a file DBMS such as MS Access or SQLite will not work here) on a dedicated machine. Performance will be reasonable enough and the data will be easy to work with (SQL queries). Otherwise you will waste a lot of time writing the low-level "internals" for processing the file.

 
komposter:

Thought about it. Asked for an opinion:

What would be the speedup compared to reading a file and what would be the slowdown compared to working in memory?

(I have experience with databases over 3 TB and with relatively dwarf ones of 10-100 GB.)


But with certain hardware, say 64 GB of RAM or more and a good disk subsystem,

in this situation, compared with working with one huge file,

SQL will speed things up considerably, although the speed will of course depend on the SQL implementation:

- correct database design
- correct indexes
- correct database configuration

This implies splitting the data up (what elugovoy writes about is true).

A full implementation would require a separate server, a server OS and an SQL database;

if it is MS SQL, it must be no lower than 2008 (the software should preferably be 64-bit as well).

But in my opinion it will be rather labour- and hardware-intensive to implement... (64-bit is the ideal.)

--

If you have only 16 GB on your machine and it is used as a workstation,

just putting SQL Server on it will not be great, but it is still better than struggling with a text file.

But if you have no experience with SQL, some effort will be needed for the implementation.

 
barabashkakvn:

And if this file is compressed with an archiver, how big will it be (after all, text should compress very well)?

The time it takes to uncompress it on each pass will kill performance.

 
YuraZ:

The time it takes to uncompress it on each pass will kill performance.

I didn't mean unarchiving. If archiving can greatly reduce the size, then it makes sense to compress the information into an index file.
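
A minimal sketch of what such an index file could look like (file names are placeholders): one pass over the text file records the starting byte offset of every line into a compact binary file, so later lookups can seek straight to a record instead of rescanning 20 GB of text.

#include <cstdint>
#include <fstream>
#include <string>

int main()
{
    // Placeholder names: the big text file and the index built from it.
    std::ifstream text("big_history.txt", std::ios::binary);
    std::ofstream index("big_history.idx", std::ios::binary);
    if (!text || !index) return 1;

    std::string   line;
    std::uint64_t offset = 0;

    // Store the starting offset of every line as a fixed 8-byte value.
    while (std::getline(text, line))
    {
        index.write(reinterpret_cast<const char *>(&offset), sizeof(offset));
        offset += line.size() + 1;   // +1 for the '\n' consumed by getline
    }
    return 0;
}

A lookup can then read the N-th 8-byte entry to jump to record N; whether the index also stores (possibly compressed) key values next to each offset is a separate design decision.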
 
barabashkakvn:
I don't mean unarchiving. If archiving can greatly reduce the volume, then it makes sense to compress the information into an index file.

Originally it was:

barabashkakvn:
And if this file is compressed with an archiver, how big will it be (after all, text should compress very well)?

Hence the reaction to your post!


An index file - create one... ?! That is a separate topic.

It would be even cooler to write your own (SQL-like) server - but why?

 
YuraZ:

Originally it was:

Hence the reaction to your post!


An index file - create one... ?! That is a separate topic.

It would be even cooler to write your own (SQL-like) server - but why?

Originally there was a question to the author: how much would the file compress? You brought up the unpacking yourself.
 
barabashkakvn:
The original question to the author was how much the file would compress. ....

May I ask why?

It will compress by 70-80%, i.e. the 20 GB file will shrink to roughly 4-6 GB. And what will that give the author towards solving the problem he described?