c# - Invalid characters in File.ReadAllText

Question

score 14 · Accepted Answer

This is likely due to a mismatch in the Encoding. Use the ReadAllText overload which allows you to specify the proper Encoding to use when reading the file.

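The default overload assumes UTF-8 unless it detects a byte-order mark for another Unicode encoding, such as UTF-16 or UTF-32. A file in any other encoding (for example, a legacy code page) will come through incorrectly.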
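As a rough illustration of that byte-order-mark detection, the sketch below writes a UTF-16 file and reads it back with the one-parameter overload. The temp file and sample text are assumptions made for the demo, and the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.IO;
using System.Text;

// Assumed setup for the demo: a scratch file in the temp directory.
string path = Path.GetTempFileName();

// Encoding.Unicode is UTF-16 little-endian; File.WriteAllText writes its BOM (FF FE) first.
File.WriteAllText(path, "caf\u00E9", Encoding.Unicode);
byte[] raw = File.ReadAllBytes(path);
bool hasUtf16Bom = raw.Length >= 2 && raw[0] == 0xFF && raw[1] == 0xFE;

// One-parameter overload: the BOM is detected, so the bytes are decoded as UTF-16, not UTF-8.
string detected = File.ReadAllText(path);

Console.WriteLine(hasUtf16Bom);
Console.WriteLine(detected == "caf\u00E9");
File.Delete(path);
```

Without a byte-order mark there is nothing to detect, which is why a plain ISO-8859-1 or Windows-1252 file is decoded as UTF-8 and comes out damaged.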

score 13 · Accepted Answer

Most likely the file uses an encoding other than the default. If you know which one, you can specify it with the File.ReadAllText(String, Encoding) overload.

Code sample:

string readText = File.ReadAllText(path, Encoding.Default);  // <-- change the encoding to whatever the encoding really is

If you do not know the encoding, see this earlier SO question: How to use ReadAllText when the file encoding is unknown
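One common heuristic for an unknown encoding, sketched here under assumptions rather than taken from the linked question, is to try a strict UTF-8 decode first and fall back to a legacy encoding if it fails. `ReadTextWithFallback` is a hypothetical helper name, the fallback encoding is the caller's guess, and the sample bytes are made up for the demo (C# 9 top-level statements).

```csharp
using System;
using System.IO;
using System.Text;

// Sample file: "caf", then 0xE9 ("é" in ISO-8859-1), then "!". This is not valid UTF-8.
string path = Path.GetTempFileName();
File.WriteAllBytes(path, new byte[] { 0x63, 0x61, 0x66, 0xE9, 0x21 });

string text = ReadTextWithFallback(path, Encoding.GetEncoding(28591));
Console.WriteLine(text == "caf\u00E9!");
File.Delete(path);

// Hypothetical helper: strict UTF-8 first, then the caller-supplied fallback encoding.
// Byte-order-mark handling is left out of this sketch; a leading BOM would stay in the string.
static string ReadTextWithFallback(string filePath, Encoding fallback)
{
    byte[] bytes = File.ReadAllBytes(filePath);

    // throwOnInvalidBytes: true makes invalid sequences throw instead of becoming U+FFFD.
    var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
    try
    {
        return strictUtf8.GetString(bytes);
    }
    catch (DecoderFallbackException)
    {
        return fallback.GetString(bytes);
    }
}
```

This only tells you the file is not UTF-8. It cannot confirm that the fallback is the right guess, because single-byte encodings such as ISO-8859-1 accept any byte sequence.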

score 12 · Accepted Answer

You need to specify the encoding when you call File.ReadAllText, unless the file is actually in UTF-8, which it sounds like it's not. (Basically the one-parameter overload is equivalent to passing in UTF-8 as the second argument. It will also detect UTF-32 with an appropriate byte-order mark, I believe.)

The first thing is to work out which encoding it is in (e.g. ISO-8859-1 - but you need to check this) and then pass that as a second argument.

For example:

Encoding isoLatin1 = Encoding.GetEncoding(28591);
string text = File.ReadAllText(path, isoLatin1);

It's always important that you know what encoding binary data is using before you try to read it as text. That's true for files, network streams, anything.

score 0 · Accepted Answer

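The character you are seeing is the replacement character (U+FFFD).

It is used to replace an incoming character whose value is unknown or cannot be represented in Unicode. Compare the use of U+001A as a control character to indicate the substitute function.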

http://www.fileformat.info/info/unicode/char/fffd/index.htm

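This happens because the file's actual encoding does not match the encoding the program expects.

By default, ReadAllText assumes UTF-8. It is encountering a byte sequence that does not represent a valid UTF-8 character, so it substitutes the replacement character.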
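A minimal sketch that reproduces the effect, assuming a file whose bytes are ISO-8859-1 (the temp file and sample bytes are invented for the demo; C# 9 top-level statements):

```csharp
using System;
using System.IO;
using System.Text;

// "caf", then 0xE9 ("é" in ISO-8859-1), then "!". 0xE9 followed by 0x21 is invalid UTF-8.
string path = Path.GetTempFileName();
File.WriteAllBytes(path, new byte[] { 0x63, 0x61, 0x66, 0xE9, 0x21 });

// Default overload: no byte-order mark, so the bytes are decoded as UTF-8.
string asUtf8 = File.ReadAllText(path);

// Explicit overload with the encoding the bytes were actually written in.
string asLatin1 = File.ReadAllText(path, Encoding.GetEncoding(28591));

Console.WriteLine(asUtf8.IndexOf('\uFFFD') >= 0); // the invalid byte became U+FFFD
Console.WriteLine(asLatin1 == "caf\u00E9!");      // the original text is recovered
File.Delete(path);
```

Once the replacement character is in the string, the original byte is lost, so the fix has to happen at read time by passing the correct encoding.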
