Wednesday, August 21, 2019

The standard encoding of Dart source files is UTF-8

Today I spent far more time than I'd intended trying to figure out whether I can rely on Dart source files being stored as UTF-8. The spec doesn't say anything about file encodings at all; it just says that "source text is represented as a sequence of Unicode code points" (section 20.1).

It's hard to find anything definitive, but the strongest practical factor seems to be that the Dart Common Front End has two scanner classes: StringScanner and Utf8BytesScanner. The StringScanner expects a String, and Dart's native String representation is UTF-16, like Chrome. When the front end reads bytes, such as from a file, it always uses the Utf8BytesScanner. And trying to run a Dart program stored as UTF-16, either with or without a BOM, does not work well.
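To make the difference between the two representations concrete, here is a minimal sketch using only dart:convert. It is not the front end's scanner code: isValidUtf8 and encodeUtf16LeWithBom are hypothetical helpers I made up for illustration. It shows that a String counts UTF-16 code units while the UTF-8 bytes of the same text can have a different length, and that UTF-16 bytes with a BOM are not well-formed UTF-8, which is at least consistent with a UTF-8-only byte scanner choking on them.

```dart
import 'dart:convert';

/// Returns true if [bytes] decode as well-formed UTF-8.
/// (Hypothetical helper, not part of the Dart front end.)
bool isValidUtf8(List<int> bytes) {
  try {
    // allowMalformed defaults to false, so malformed input throws.
    utf8.decode(bytes);
    return true;
  } on FormatException {
    return false;
  }
}

/// Encodes [text] as UTF-16 little-endian, prefixed with a BOM.
/// (Hypothetical helper, written by hand for this sketch.)
List<int> encodeUtf16LeWithBom(String text) {
  final bytes = <int>[0xFF, 0xFE]; // UTF-16LE byte order mark
  for (final unit in text.codeUnits) {
    bytes
      ..add(unit & 0xFF) // low byte first
      ..add(unit >> 8); // then high byte
  }
  return bytes;
}

void main() {
  const source = "void main() => print('é');";

  // String.length counts UTF-16 code units.
  print(source.length);

  // The UTF-8 form is one byte longer here, because 'é' takes two bytes.
  print(utf8.encode(source).length);

  // The same text as UTF-8 bytes is accepted, as UTF-16 bytes it is not.
  print(isValidUtf8(utf8.encode(source)));
  print(isValidUtf8(encodeUtf16LeWithBom(source)));
}
```

To check a real file, something like isValidUtf8(File(path).readAsBytesSync()) with dart:io should work, where path is whatever source file you want to test.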

So I conclude that UTF-8 is the only accepted encoding. And since this was hard to find out, I'm making this blog post so that future me, or someone else searching, can find it more easily. Or so that someone can tell me I'm wrong.
