Features of the mql5 language, subtleties and tricks - page 163

 
Nikolai Semko:

No, I would have noticed. Although I don't rule out that it's possible in some cases (when working with Unicode). In Java, for example, the char type is 2 bytes.
I tried parsing data from a crypto exchange in two ways: with this JSON library and by working with a char array.
The speed difference turned out to be 700 (!!!) times. I was shocked. Perhaps it was far from the best JSON implementation.


Characters are 16-bit little-endian (UTF-16LE), and strings obviously come from Pascal. And arrays, by the way, come from Fortran.

 
Nikolai Semko:

No, I would have noticed. Although I don't rule out that it's possible in some cases (when working with Unicode). In Java, for example, the char type is 2 bytes.
I tried parsing data from a crypto exchange in two ways: with this JSON library and by working with a char array.
The speed difference turned out to be 700 (!!!) times. I was shocked. Perhaps it was far from the best JSON implementation.

When an MQL string is passed to a DLL, the DLL side receives it as wchar_t*.
And type size mismatches are not unique to Java; they depend on the architecture, the operating system, or the hardware, I don't remember which.

700 times? Wow, I had just set this library aside for JSON parsing; so it's not worth using?
Is it better to convert with StringToCharArray and parse the array in a loop?

 
Roman:

700 times? Wow, I just set this library aside for JSON parsing; so it's not worth using?
Is it better to convert with StringToCharArray and parse the array in a loop?

I think so, yes, although you should always check it by taking some measurements. I don't rule out that the string functions weren't written optimally back then and have since been fixed.
I took these measurements more than a year ago.

The code will of course be larger when working with char arrays, but it is more flexible.
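To illustrate the char-array approach, here is a minimal sketch in C++ (C++ rather than MQL5 so it compiles on its own). It assumes a flat, well-formed JSON object without escaped quotes; the helper name `find_number` and the field names in the usage are hypothetical, and this is not a general JSON parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: scans a raw byte buffer for "key": and parses the
// number that follows by hand. Assumes a flat, well-formed JSON object with
// no escaped quotes. Returns false if the key or a number after it is missing.
bool find_number(const std::vector<char>& buf, const std::string& key, double& out) {
    const std::string pattern = "\"" + key + "\":";
    const std::size_t n = buf.size(), m = pattern.size();
    for (std::size_t i = 0; i + m <= n; ++i) {
        std::size_t k = 0;
        while (k < m && buf[i + k] == pattern[k]) ++k;
        if (k != m) continue;                    // no match at position i
        std::size_t j = i + m;
        while (j < n && buf[j] == ' ') ++j;      // skip spaces after the colon
        bool neg = false;
        if (j < n && buf[j] == '-') { neg = true; ++j; }
        double value = 0.0, scale = 1.0;
        bool seen_digit = false, after_dot = false;
        for (; j < n; ++j) {
            const char c = buf[j];
            if (c >= '0' && c <= '9') {
                seen_digit = true;
                if (after_dot) { scale /= 10.0; value += (c - '0') * scale; }
                else           { value = value * 10.0 + (c - '0'); }
            } else if (c == '.' && !after_dot) {
                after_dot = true;
            } else {
                break;                           // first non-numeric char ends the number
            }
        }
        if (!seen_digit) return false;
        out = neg ? -value : value;
        return true;
    }
    return false;
}
```

In MQL5 the buffer would come from StringToCharArray; the idea is a direct scan over raw bytes instead of building a full JSON object tree. Whether it is actually faster in a given case still needs measuring.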

 
Roman:

And most likely an MQL string is backed by short[], wchar_t[] or wchar_t*.
After all, MQL strings are in Unicode, and UTF is 2 bytes.
And StringToCharArray converts from short[] to char[].

unicode != utf && utf != 2 bytes (not all UTFs are alike) && MSVC is not the standard

The point of wchar_t is to fit any supported character into a single wchar_t (well, Microsoft does it their own way), and the input/output streams convert to/from the locale encoding themselves. There are no size or encoding guarantees. When accepting wchar_t in a DLL, think about whether that's correct. That is, of course, if you're interested in looking beyond the sandbox into the grown-up world.
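A quick C++ sketch of the "no size guarantees" point: the same wide literal occupies a different number of wchar_t units depending on the platform. The values in the comments are what MSVC on Windows and GCC/Clang on Linux typically do; the standard itself does not fix them.

```cpp
#include <cstddef>
#include <string>

// sizeof(wchar_t) is implementation-defined: typically 2 bytes with MSVC on
// Windows (UTF-16 code units) and 4 bytes with GCC/Clang on Linux (UTF-32).
constexpr std::size_t kWcharBytes = sizeof(wchar_t);

// U+1F600 lies outside the Basic Multilingual Plane. With a 4-byte wchar_t
// it takes one unit; with a 2-byte UTF-16 wchar_t it takes two (a surrogate
// pair), so "one character per wchar_t" does not hold on Windows.
std::size_t units_for_emoji() {
    const std::wstring s = L"\U0001F600";
    return s.size();
}
```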

 
Vict:

unicode != utf && utf != 2 bytes (not all UTFs are alike) && MSVC is not the standard

The point of wchar_t is to fit any supported character into a single wchar_t (well, Microsoft does it their own way), and the input/output streams convert to/from the locale encoding themselves. There are no size or encoding guarantees. When accepting wchar_t in a DLL, think about whether that's correct. That is, of course, if you're interested in looking beyond the sandbox into the grown-up world.

Yes, I know that Unicode and UTF are different encodings, as they should be.
I just wanted to abbreviate the word Unicode, and I guess it didn't come out right.

Besides, the Unicode reference says that the standard includes characters from almost every written language in the world.
The standard consists of two main parts: the Universal Character Set (UCS) and the Unicode Transformation Format (UTF).

Since Unicode already includes the UTF encodings, I put it that way for brevity.

I don't know if wchar_t* is correct or not.
I used what's in Renat's examples from the article on how to write a DLL.
MQL5 strings are in Unicode, which includes UTF, so I think it is logical that the article's example uses wchar_t*,
so that any supported character fits into one wchar_t.

As for there being no size/encoding guarantees, I didn't even know that. Maybe use C-style short* for purity, then?
Provided the MSVC IDE supports it correctly, of course.
Because the environment swallows a plain true and turns it into TRUE.

 

UTF-8 and UTF-16 have the corresponding bit widths.

In UTF-8, language pages are switched using special codes.

UTF-16 covers the full range of characters at once.

 
Edgar Akhmadeev:

UTF-8 and UTF-16 have the corresponding bit widths.

In UTF-8, language pages are switched using special codes.

UTF-16 covers the full range of characters at once.

Well, as I understand from what many people write on the forum, MQL5 strings are simply UTF-16.
And the MQL documentation says:
A text string is a sequence of characters in Unicode format with a trailing zero at the end.
Because of this, it is hard to tell which encoding an MQL5 string actually uses.
And if Unicode already includes all the UTF families, why use the word UTF at all and introduce confusion?
Unicode covers it all, plain and simple.
Or should we put it this way:
Unicode with the bit width of UTF-16?

Actually, one of the developers wrote earlier that
the MQL string type consists of two parts, an 8-byte buffer and a 4-byte pointer, 12 bytes in total.

 
Roman:

I know that Unicode and UTF are different encodings.
It just so happens that I wanted to abbreviate the word Unicode, and it probably didn't come out right.

Besides, the Unicode reference says that the standard includes characters from almost every written language in the world.
The standard consists of two main parts: the Universal Character Set (UCS) and the Unicode Transformation Format (UTF).

Since Unicode already includes the UTF encodings, I put it that way for brevity.

I don't know if wchar_t* is correct or not.
I used what's in Renat's examples from the article on how to write a DLL.
MQL5 strings are in Unicode, which includes UTF, so I think it is logical that the article's example uses wchar_t*,
so that any supported character fits into one wchar_t.

You're mixing things up. Unicode is a table of characters with codes; it used to fit in 0-65535 (2 bytes), then it grew. And spending 4 bytes per character is wasteful. That's where the variable-length UTF encodings came in handy (for example, UTF-8 encodes ASCII characters with one byte). So the Unicode table itself does not contain any UTF.
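To make "variable length" concrete, a minimal UTF-8 encoder sketch in C++ (assuming a valid code point as input, with no error handling):

```cpp
#include <string>

// Encodes one Unicode code point as UTF-8. Assumes cp <= 0x10FFFF and that
// cp is not a surrogate (0xD800-0xDFFF); no validation is done.
// The byte count grows with the code point: 1, 2, 3 or 4 bytes.
std::string utf8_encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 1 byte: ASCII, unchanged
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: e.g. Cyrillic
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: rest of the BMP
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: supplementary planes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, 'A' gives one byte, Cyrillic 'Ж' (U+0416) gives D0 96, and U+1F600 gives F0 9F 98 80.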

As for there being no size/encoding guarantees, I didn't even know that. Maybe use C-style short* for purity, then?
Provided the MSVC IDE supports it correctly, of course.
Because the environment swallows a plain true and turns it into TRUE.

The standard has char16_t and char32_t, fixed-size types. wchar_t has a different purpose.
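A small C++11 sketch of those types. Strictly speaking, the standard only guarantees that char16_t and char32_t have the same size as uint_least16_t and uint_least32_t (16 and 32 bits on mainstream compilers), but u"..." and U"..." literals are always UTF-16 and UTF-32, unlike L"..." whose width follows wchar_t.

```cpp
#include <climits>
#include <string>

// Width guarantees: at least 16 / 32 bits (exactly that on common platforms).
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t holds a UTF-16 code unit");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t holds any code point");

// Same three characters: 'A', Cyrillic 'Zhe' (U+0416), and U+1F600.
// UTF-16 needs a surrogate pair for U+1F600, so it has 4 units; UTF-32 has 3.
const std::u16string kUtf16 = u"A\u0416\U0001F600";
const std::u32string kUtf32 = U"A\u0416\U0001F600";
```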

 
Roman:

As I understood from what many people write on this forum, MQL5 strings are in UTF-16.
And the MQL documentation says:
A text string is a sequence of characters in Unicode format with a trailing zero at the end.
Because of this, it is hard to tell which encoding an MQL5 string actually uses.
And if Unicode already includes all the UTF families, why use the word UTF at all and introduce confusion?
Unicode covers it all, plain and simple.
Or should it be put this way:
Unicode with the bit width of UTF-16?

That's not all.

Just as ANSI Cyrillic = CP1251, we have:

Unicode:

UTF-8 = CP65001, // UNIX/Linux

UTF-16LE = CP1200, // Windows

UTF-16BE = CP1201,

UTF-32LE = ?

UTF-32BE = ?

ISO10646:

UCS-2 ~ UTF-16

UCS-4 = UTF-32

Confusion? Nope, never heard of it.
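The LE/BE suffix is only about byte order. A minimal C++ sketch (no byte order mark handling; the function name is made up for illustration) that writes the same UTF-16 code units in both orders:

```cpp
#include <string>
#include <vector>

// Serialises UTF-16 code units to bytes. UTF-16LE (Windows code page 1200)
// puts the low byte first; UTF-16BE (code page 1201) puts the high byte
// first. No byte order mark (BOM) is written.
std::vector<unsigned char> utf16_bytes(const std::u16string& s, bool little_endian) {
    std::vector<unsigned char> out;
    out.reserve(s.size() * 2);
    for (char16_t unit : s) {
        const unsigned char lo = static_cast<unsigned char>(unit & 0xFF);
        const unsigned char hi = static_cast<unsigned char>(unit >> 8);
        if (little_endian) { out.push_back(lo); out.push_back(hi); }
        else               { out.push_back(hi); out.push_back(lo); }
    }
    return out;
}
```

For u"A\u0416" this gives 41 00 16 04 in LE and 00 41 04 16 in BE: the same code units, only the byte order differs.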

 
Edgar Akhmadeev:

UTF-8 and UTF-16 have the corresponding bit widths.

In UTF-8, language pages are switched using special codes.

UTF-16 covers the full range of characters at once.

What code pages? What are you talking about? The "special codes" indicate how many bytes encode a character, because the encoding is variable-length. UTF-8 can encode any Unicode character, just like UTF-16. And UTF-16 is variable-length too (surrogate pairs).
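The surrogate-pair mechanism can be sketched in a few lines of C++ (assuming a valid code point; no error handling):

```cpp
#include <string>

// Encodes one code point as UTF-16. Assumes cp <= 0x10FFFF and that cp is
// not itself a surrogate. BMP code points take one 16-bit unit; anything
// above U+FFFF takes two units (a surrogate pair).
std::u16string utf16_encode(char32_t cp) {
    if (cp < 0x10000) {
        return std::u16string(1, static_cast<char16_t>(cp));
    }
    const char32_t v = cp - 0x10000;                                    // 20 significant bits
    const char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));    // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
    return std::u16string{high, low};
}
```

For U+1F600 this produces the pair D83D DE00, while 'A' (U+0041) stays a single unit.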