Writing UTF-8 files in C++

Let’s say you need to write an XML file with this content:

How do we write that in C++?

At first glance, you could be tempted to write it like this:

When you open the file in IE, for instance, surprise! It’s not rendered correctly:

So you could be tempted to say “let’s switch to wstring and wofstream”.

And when you run it and open the file again, no change. So, where is the problem? Well, the problem is that neither ofstream nor wofstream writes the text in UTF-8 format. If you want the file to really be in UTF-8, you have to encode the output buffer in UTF-8 yourself. To do that we can use WideCharToMultiByte(). This Windows API maps a wide character string to a new character string (which is not necessarily from a multibyte character set). The first argument indicates the code page; for UTF-8 we need to specify CP_UTF8.

The following helper functions encode a std::wstring into a UTF-8 stream, wrapped into a std::string.
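
A minimal sketch of such a helper, assuming the hypothetical name `to_utf8`. On Windows it calls WideCharToMultiByte() twice: first to ask for the required size, then to convert. The non-Windows branch is a hand-written fallback that is not part of the approach described here; it assumes a 32-bit wchar_t that holds one code point per element.

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper name: encodes a wide string as UTF-8 bytes.
std::string to_utf8(const std::wstring& text)
{
    if (text.empty()) return std::string();
#ifdef _WIN32
    // First call: ask how many bytes the UTF-8 form needs.
    const int size = WideCharToMultiByte(CP_UTF8, 0, text.data(),
                                         static_cast<int>(text.size()),
                                         nullptr, 0, nullptr, nullptr);
    if (size <= 0) return std::string();
    std::string utf8(static_cast<std::size_t>(size), '\0');
    // Second call: convert straight into the string's own buffer.
    WideCharToMultiByte(CP_UTF8, 0, text.data(),
                        static_cast<int>(text.size()),
                        &utf8[0], size, nullptr, nullptr);
    return utf8;
#else
    // Fallback (assumption: wchar_t is 32 bits, one code point each).
    std::string utf8;
    for (wchar_t wc : text) {
        const unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) {
            utf8 += static_cast<char>(cp);
        } else if (cp < 0x800) {
            utf8 += static_cast<char>(0xC0 | (cp >> 6));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            utf8 += static_cast<char>(0xE0 | (cp >> 12));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            utf8 += static_cast<char>(0xF0 | (cp >> 18));
            utf8 += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return utf8;
#endif
}
```

The returned std::string is only a byte container here: its size() is the number of UTF-8 bytes, not the number of characters.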

With that in hand, all you have to do is make the following changes:
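
A minimal sketch of what the final write might look like, assuming the text has already been encoded to UTF-8 by a helper like the one described above. The name `write_utf8_file` is hypothetical, and opening the stream in binary mode is an assumption made here so that the bytes reach the file unchanged.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes already-encoded UTF-8 bytes to a file as-is.
bool write_utf8_file(const std::string& path, const std::string& utf8)
{
    // Binary mode keeps the stream from translating any bytes (e.g. newlines).
    std::ofstream out(path.c_str(),
                      std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
    return out.good();
}
```

The narrow ofstream does no encoding of its own, which is why the conversion has to happen before the write.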

And now when you open the file, you get what you wanted in the first place.

And that is all!


18 comments until now

  1.

    If I have a (wide-char) string that contains UTF-8 characters and I want to write it to a file, how do I do that?

  2.
    mariusbancila @ 2009-02-17 15:02

    If it’s already UTF-8, then write it just like you’d write any other string. Use ofstream. You can see the last example in my post. Or maybe I’m missing something and didn’t understand you well.

  3.
    Torkel Bj?rnson @ 2009-02-25 01:54

    You can also do this:

    wofstream file("test");
    file.imbue(locale("en_US.utf8")); // can throw
    file << L"this is a naïve example" << endl;

    Or, if you know your default locale is UTF-8:
    file.imbue(locale(""));

  4.

    locale("en_US.utf8"); doesn’t work in Visual Studio. For "English_United States.1252" you must use locale("English"), but this does not set it to UTF-8.

    This method seems to work only on Linux.

    Mr. Marius’s example is the only working method for converting wchar_t to char UTF-8 on Windows/Visual Studio.

    And as far as the encoding goes… Notepad seems to automatically detect the encoding required to display UTF-8, as far as I tested.

  5.
    David Coorey @ 2009-03-18 01:30

    Thanks for this article, Marius. It was exactly what I needed to know.

    I sometimes wonder how I managed to do my job before the internet was around…

  6.

    I used this in order to create UTF-8 files:
    _wfopen(strFile, L"wt, ccs=UTF-8");

    Nothing else needed. :D

  7.

    [...] about UTF-8 Encoding. Then he gave a C++ code example of convert from/to UTF-16 to UTF-8. This is another example of writing UTF-8 in [...]

  8.

    Sorin, if you’re using _wfopen() with ccs=UTF-8, you automatically add BOM chars to the beginning of the file. The BOM tends to create more disadvantages than advantages.
    For instance, if you open the same file again with the same parameters (just to change its content), you must be careful to remove the old BOM section, because otherwise you end up with two BOMs. :D
    I ran into this situation in the past in a bug of one of my colleagues, and it isn’t too pretty.
    I use the std::ofstream class and a conversion class that contains a conversion method similar to Marius’s sample.
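
A minimal sketch of guarding against the doubled-BOM situation described above, assuming the file content has already been read into a std::string of raw bytes. The names `has_utf8_bom` and `strip_utf8_bom` are hypothetical.

```cpp
#include <string>

// Returns true if the byte string starts with the UTF-8 BOM (EF BB BF).
bool has_utf8_bom(const std::string& bytes)
{
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Removes every leading BOM, so rewritten content never carries two.
std::string strip_utf8_bom(std::string bytes)
{
    while (has_utf8_bom(bytes)) bytes.erase(0, 3);
    return bytes;
}
```

Stripping before rewriting means whichever API later adds a BOM adds exactly one.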

  9.

    Hi, I need help, please.
    I want to write a text like:

    “i am a student.i’m studing computer science at school.i love programming.i want to be very good in it.”

    to a text file (.dat or .txt) and then read it in C++
    in order to find out how many times a word, say “computer”, is repeated in the text.
    How?

  10.

    @shahab, sorry, this is not a forum where you can ask questions like this. I suggest taking this problem to a forum like http://www.codeguru.com/forum.

  11.

    Hi there,

    Thanks for the code, it’s exactly what I was looking for.
    Could you just help me out with one more thing? How can I read UTF-8 files?

  12.

    [...] set up some form of automatic conversion that hooks into the C++ streams library. For example, see Writing UTF-8 files in C++ by Marius Bancila. This is information I’m going to keep in mind, but my testing with GCC 4.5 [...]

  13.
    quandaso @ 2011-12-15 15:49

    Thanks, very useful!

  14.

    Would be nice to see a Linux-compatible article.

  15.

    Hmm, there are two things I don’t like about this:

    1. Casting away a const and then writing to the underlying buffer is undefined behavior. As of C++11 you could use &newbuffer[0], which gives you a non-const pointer and is designed for this purpose.

    2. Storing UTF-8 in the buffer of a string is surely asking for trouble… What if someone later attempts to use one of string’s algorithms on the buffer? Will it work? Maybe if the UTF-8 characters all happen to be narrow, probably not if there are any wide ones in there. This is also basically undefined behavior, and the only thing you could safely do with that std::string is immediately write it out.
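
A minimal sketch of both points above, with hypothetical names `copy_to_string` and `utf8_length`. The first function fills a std::string through &out[0] instead of a const_cast; the second shows why byte-oriented string operations can surprise: size() counts bytes, while the number of code points has to be computed separately.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies n bytes into a std::string through a non-const pointer to its buffer.
std::string copy_to_string(const char* src, std::size_t n)
{
    std::string out(n, '\0');
    if (n != 0) std::memcpy(&out[0], src, n);  // no const_cast needed
    return out;
}

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
std::size_t utf8_length(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
        if ((byte & 0xC0) != 0x80) ++count;
    return count;
}
```

For a string holding the UTF-8 bytes of “naïve”, size() reports 6 while utf8_length() reports 5, because ï takes two bytes.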

  16.
    hamed ahmad @ 2012-04-09 13:54

    Thanks, a great example, just what I needed.
    I had some problems with appending to a wstring, but there is always a workaround.

  17.

    OK, about the portable variant. It is easy if you use the C++11 standard (because it adds new headers such as <codecvt> for this). But if you want to write multiplatform code against older standards, you can use this method (as I did) to write with streams:
    1. Read the article about UTF converter for streams from this link (http://www.codeproject.com/Articles/38242/Reading-UTF-8-with-C-streams)

    2. Add “stxutif.h” to your project from sources above

    3. Open the file in ANSI mode and write the BOM at the start of the file first, like this:
    std::ofstream fs;
    fs.open(filepath, std::ios::out|std::ios::binary);
    // UTF-8 byte order mark; write() is used because the array is not null-terminated
    const char smarker[3] = { '\xEF', '\xBB', '\xBF' };
    fs.write(smarker, sizeof(smarker));
    fs.close();

    4. Then open file as UTF and write your content there:
    std::wofstream fs;
    fs.open(filepath, std::ios::out|std::ios::app);
    std::locale utf8_locale(std::locale(), new utf8cvt);
    fs.imbue(utf8_locale);

    fs<<..//write anything you wan…

  18.

    Marius: can you please tell me how to open and read a file that is in UTF-8 format?
