<format> in Visual Studio 2019 version 16.10

>>> Shared from Original Post C++ Team Blog

C++20 adds a new text formatting facility to the standard library, designed primarily to replace snprintf and friends with a fast and type safe interface. The standardized library is based on the existing {fmt} library, so users of that library will feel at home.

Before diving into how std::format works I want to thank Victor Zverovich, Elnar Dakeshov, Casey Carter, and miscco, all of whom made substantial contributions to this feature, and were the reason why we could complete it so quickly.

Overview

To start using <format> you need Visual Studio 2019 version 16.10 or later, and you need to compile with /std:c++latest. You can get the latest Visual Studio preview here.

The simplest and most common way to use <format> is to call:

template<class... Args>
string format(string_view fmt, const Args&... args);

fmt is the format string and args are the things you’d like to format. The format string consists of some text interspersed with curly-brace-delimited replacement fields. For example, "Format arguments {} {}!" is a format string for formatting two arguments. Each replacement field corresponds to the next argument passed, so std::format("Format arguments {} {}!", 2, 1) would produce the string "Format arguments 2 1!".

Format strings can also contain numbered replacement fields, for example "Format arguments {1} {0}!". These refer to the numbered argument passed in, starting from zero. Numbered and un-numbered (automatic) replacement fields cannot be mixed in the same format string.

There are all sorts of modifiers you can use to change the way a particular parameter is formatted. These are called “format specifiers” and are specified in the replacement field like so: std::format("{:<specifiers>}", <arg>). Let’s look at an example that has one of everything.

std::format("{:🐱^+#12.4La}", 4.f);

This returns the string “🐱+1.0000p+2🐱” (printing this string out to the console on Windows can be a bit difficult). Let’s go through what each component of the above string told std::format to do. First we have “🐱^”, the “fill and align” part of the format specifiers, saying we’d like our output center aligned and padded with cat emojis. Next we have “+”, meaning we’d like a sign character no matter what (the default is “-”, which prints a sign only for negative values, and you can also use a space to get a minus sign for negative values and a leading space otherwise). After that we specify “#”, meaning “alternate form”. For floats the alternate form causes format to always insert a decimal point. Next we specify “12.4” to get a width of 12 and a precision of 4. That means format will use the “fill” and “alignment” settings to make sure our output is at least 12 characters wide, and the float itself will be printed to 4 digits of precision. Next, the “L” specifier causes format to use locale-specific formatting to print things like decimal separators. Finally, “a” causes the output to be in hexfloat format. More detailed information about the possible format specifications can be found at cppreference.

For width and precision specifiers you may reference a format argument instead of using a literal value like so:

std::format("{0:{1}.{2}}", 4.2f, 4, 5);

This results in a width of 4 and a precision of 5. The rules for mixing automatic and manual indexing (don’t do it) still apply, but you can use automatic indexing to reference width and precision as in:

std::format("{:{}.{}}", 4.2f, 4, 5);

The assignment of automatic indices is performed left to right, so the above two examples are equivalent.

Performance

In general std::format performance should be in the same ballpark as fmt::format and snprintf if you compile your code with the /utf-8 option. If you don’t use the /utf-8 option then performance can be significantly degraded because we need to retrieve your system locale to correctly parse the format string. While we’re working to improve performance for this case in a future release, we recommend you use /utf-8 for the best experience.

Unicode

std::format doesn’t do any transcoding between different text encodings; however, it is aware of the “execution character set” and uses it to interpret the format string. The versions of std::format taking a wide (wchar_t) format string always interpret it as UTF-16. The versions of std::format taking a narrow (char) format string interpret the format string as UTF-8 if we detect the /utf-8 (or /execution-charset:utf-8) option. Otherwise we interpret the format string as being encoded in the active system codepage. This means that if you compile your code with a non-UTF-8 execution charset it may not run correctly on systems with a different system codepage setting. There’s also a significant performance cost to figuring out the system codepage, so for best performance we recommend you compile with /utf-8. We’re working to improve the performance of format in non-UTF execution character sets in future releases.

Unicode also comes into play when dealing with width and precision specification for strings. When we interpret the format string as UTF-8 or UTF-16 we compute the “estimated width” of a string taking into account a rough estimate of the size of each code-point. If we’re interpreting the format string as a non-Unicode encoding we just estimate the width as the number of code units (not code points) in the string. In a future release we’ll add grapheme clusterization to the width computations for Unicode encodings.

Locales

While we always parse the format string according to the rules above, the locale used for things like decimal separator positions can be customized. By default no locale is used. If you use the L specifier then some locale specific formatting may be used. By default it’s the current global locale as returned by a default constructed std::locale, however each formatting function has a version allowing you to pass in your own std::locale object to override that behavior.

Future work

Over the next few Visual Studio releases we’ll be improving the performance of std::format and fixing bugs. Additionally, C++23 will likely add compile-time format checking to format literals, and we may implement that before 2023 (for code you want to work great in C++23, don’t rely on catching std::format_errors from invalid format strings!). C++23 will also make a small change to the definitions of std::vformat_to and std::format_to that reduces code size but can be observable; for forward compatibility, make sure any custom formatters work with all output iterators. More information on these changes can be found in p2216r3. C++23 may also bring additional functionality like std::print and better ways to handle Unicode text.

Differences from {fmt} (not exhaustive)

For those familiar with {fmt}, a quick list of differences from the standardized version of the library:

  • Named arguments are not supported.
  • None of the miscellaneous formatting functions like fmt::print or fmt::printf are supported.
  • Format strings are not checked at compile time.
  • There is no support for automatically formatting types with an std::ostream& operator<<(std::ostream&, const T&) overload.
  • The behavior of some format specifiers is slightly different (for example the default alignment for void*, and allowing sign specifiers for unsigned types).

Give us feedback

Try out format in your own code, and file any bugs on our GitHub issue tracker.
