Discussion of article "Parsing HTML with curl"

 

New article Parsing HTML with curl has been published:

The article provides the description of a simple HTML code parsing library using third-party components. In particular, it covers the possibilities of accessing data which cannot be retrieved using GET and POST requests. We will select a website with not too large pages and will try to obtain interesting data from this site.

One may ask "What's the point?" A simple solution is to access the site page directly from an MQL script and read the already known number of positions at the known page position. Then the received string can be further processed. This is one of the possible methods. But in this case the MQL script code will be tightly bound to the HTML code of the page. What if the HTML code changes? That is why we need a parser which enables a tree-like operation with an HTML document (the details will be discussed in a separate section). If we implement the parser in MQL, will this be convenient and efficient in terms of performance? Can such a code be properly maintained? That is why the parsing functionality will be implemented in a separate library. However, the parser will not solve all problems. It will perform the desired functionality. But what if the site design changes radically and will use other class names and attributes? In this case we will need to change the search object or event multiple objects. Therefore, one of our goals is to create the necessary code as quickly as possible and with the least effort. It will be better if we use ready-made parts. This will enable the developer to easily maintain the code and quickly edit it in case of the above situation.

We will select a website with not too large pages and will try to obtain interesting data from this site. The kind of data is not important in this case, however let us try to create a useful tool. Of course, this data must be available to MQL scripts in the terminal. The program code will be created as a standard DLL.

In this article we will implement the tool without asynchronous calls and multi-threading.

Author: Andrei Novichkov