java - How do I read this ANSI file extracted from a mainframe?

Question

http://www.2shared.com/document/VqlJ-1wF/test.html

1) What encoding is this file in? 2) What is the best way to read it in Java?

Currently I have:

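// Requires: import java.io.File; import java.util.Scanner;
// Reads the file line by line, decoding every byte as IBM850 text.
Scanner scanner = new Scanner(new File("test.txt"), "IBM850");
while (scanner.hasNextLine()) {
    StringBuffer buffer = new StringBuffer(scanner.nextLine());
    System.out.println("BUFFER = " + buffer.toString());
}
scanner.close();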

It prints a lot of nulls and garbage. What is the proper encoding to use?

score 2 · Accepted Answer

I have a lot of experience moving data between PCs and IBM midrange systems. It is clear that the file is not (pure) EBCDIC. Each "line" begins with ASCII characters:

CODE12312345678901502G830918

The odds of EBCDIC bytes happening to match that sequence are tiny, never mind that it is the same sequence on all three lines.

My best guess is an ASCII lead-in (or EBCDIC that has already been translated) followed by binary data. If it has been translated, the binary portion is almost certainly corrupted.

I may be able to say more shortly, after examining the file in hex.

Each "record" is delimited by hex 0D 0A 0D 0A, which is a pair of CRLF sequences.

I think it is most likely a fixed-field flat-file format in which the text fields are ASCII and the other fields are binary.
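A minimal sketch of how one might inspect such a file in Java: read it as raw bytes (no charset decoding), split on the 0D 0A 0D 0A delimiter described above, and print each record in hex. The class and method names (`RecordSplitter`, `split`, `toHex`) are hypothetical, and the in-memory sample bytes are made up for illustration. Note the assumption baked in here: binary fields could in principle contain the delimiter bytes themselves, so slicing by a known fixed record length would be safer if that length is available.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RecordSplitter {
    // Assumed delimiter: two CRLF pairs, as observed in the hex dump.
    static final byte[] DELIM = {0x0D, 0x0A, 0x0D, 0x0A};

    /** Splits raw bytes on DELIM without ever decoding them as text. */
    static List<byte[]> split(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i <= data.length - DELIM.length) {
            if (matchesAt(data, i)) {
                records.add(Arrays.copyOfRange(data, start, i));
                i += DELIM.length;
                start = i;
            } else {
                i++;
            }
        }
        if (start < data.length) {
            records.add(Arrays.copyOfRange(data, start, data.length)); // trailing record
        }
        return records;
    }

    /** True if the delimiter occurs at position pos. */
    static boolean matchesAt(byte[] data, int pos) {
        for (int k = 0; k < DELIM.length; k++) {
            if (data[pos + k] != DELIM[k]) {
                return false;
            }
        }
        return true;
    }

    /** Renders bytes as lowercase hex, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pass a file path as the first argument; otherwise a tiny made-up sample is used.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {0x43, 0x4F, 0x02, 0x47, 0x0F, 0x0D, 0x0A, 0x0D, 0x0A, 0x44, 0x45, 0x0C};
        for (byte[] record : split(data)) {
            System.out.println(record.length + " bytes: " + toHex(record));
        }
    }
}
```

Working on `byte[]` rather than through `Scanner` avoids the character decoding step that turns the binary fields into nulls and garbage.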

score 1 · Accepted Answer

By the looks of it, you have taken a binary mainframe file and done an ASCII conversion on it when transferring it to the PC. This will not work.

To illustrate what goes wrong, consider a 2-byte binary integer field with a value of 64 (X'0040'). It will be converted to 32 (X'0020') because X'40' is also the EBCDIC space character, and the ASCII converter translates every EBCDIC space to an ASCII space (X'20'). You really want binary and Packed-Decimal fields left alone.
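The effect can be reproduced in a few lines of Java. This is only a sketch of what a text-mode transfer does, under the assumption that the converter uses the US EBCDIC code page (Java charset `IBM037`, which needs the JDK's extended charsets to be present); real transfer tools may use a different code page. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TextModeCorruption {
    /** Simulates a text-mode transfer: decode the bytes as EBCDIC, re-encode as ASCII. */
    static byte[] ebcdicToAscii(byte[] ebcdic) {
        Charset ebcdicUs = Charset.forName("IBM037"); // assumed code page
        return new String(ebcdic, ebcdicUs).getBytes(StandardCharsets.US_ASCII);
    }

    /** Interprets two bytes as a big-endian unsigned 16-bit integer. */
    static int toUnsignedShort(byte[] b) {
        return ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] original = {0x00, 0x40};          // binary 64 as stored on the mainframe
        byte[] converted = ebcdicToAscii(original);
        System.out.println(toUnsignedShort(original) + " -> " + toUnsignedShort(converted));
    }
}
```

The byte X'40' is treated as a space character and rewritten as X'20', so the integer silently changes value even though the transfer reports no error.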

You have 3 options:

1) Convert all the Comp-3 / binary fields to text on the mainframe (Cobol, sort, Easytrieve etc. can do this), then do the transfer.
2) Do a binary transfer to the PC and write a program to read the file. The Java package JRecord (http://jrecord.sourceforge.net/) can read and write mainframe files.
3) Do a binary transfer and use a utility like the RecordEditor (http://record-editor.sourceforge.net/Record04.htm) to read it. The RecordEditor can read mainframe files and save them as CSV or fixed-width ASCII files. It can also use a Cobol Copybook to view the file.

What I can tell you is the file is 2000 bytes long on the mainframe and contains a lot of Packed-Decimal fields (Cobol Comp-3).

I have decoded the first 120 bytes of the first record:

Field     start     length   Value                    Hex Representation
n0        1         4        CODE                     434f4445        
n1        5         17       12312345678901502        3132333132333435363738393031353032       
n2        22        1        G                        47        
n3        23        6        830918                   383330393138        
n4        29        1        V                        56        
n5        30        3        2470                     02470f        
n6        33        4        0                        0000000f        
n7        37        3        2470                     02470f        
n8        40        2        09                       3039        
n9        42        5        290502                   000290502c        
n10       47        5        10842                    000010842c        
n11       52        5        279660                   000279660c        
n12       57        5        19072                    000019072c        
n13       62        5        11488                    000011488c        
n14       67        5        0                        000000000c        
n15       72        4        0                        0000000c        
n16       76        4        0                        0000000c        
n17       80        7        439914                   0000000439914c        
n18       87        7        0                        0000000000000c        
n19       94        7        0                        0000000000000c        
n20       101       4        7588                     0007588c        
n21       105       4        7588                     0007588c        
n22       109       4        0                        0000000c        
n23       113       4        0                        0000000c        
n24       117       5        0                        000000000c        

Where: 
Start  - Field start (byte number)
length - Field length (in bytes)
Value  - Field value
Hex representation - How the field is stored in the file in hex
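
The Packed-Decimal values in the table can be checked by hand with a small decoder. The following is a sketch, not JRecord's implementation: each byte holds two decimal digits, except the last byte, whose low nibble is the sign (C, A, E and F are treated as positive or unsigned, D and B as negative). The class name `PackedDecimal` and its helpers are hypothetical, and any implied decimal point (scale) is not known from the data, so the result is returned as a plain integer.

```java
class PackedDecimal {
    /** Unpacks a COMP-3 field into a long; throws if a digit nibble is not 0-9. */
    static long unpack(byte[] field) {
        long value = 0;
        for (int i = 0; i < field.length; i++) {
            int hi = (field[i] >> 4) & 0x0F;
            int lo = field[i] & 0x0F;
            value = value * 10 + digit(hi);
            if (i < field.length - 1) {
                value = value * 10 + digit(lo);
            } else if (lo == 0x0D || lo == 0x0B) {
                value = -value;              // negative sign nibble
            } else if (lo < 0x0A) {
                throw new IllegalArgumentException("Missing sign nibble");
            }
        }
        return value;
    }

    private static int digit(int nibble) {
        if (nibble > 9) {
            throw new IllegalArgumentException("Not a packed digit: " + nibble);
        }
        return nibble;
    }

    /** Converts a hex string such as "02470f" to bytes (helper for the examples). */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hex strings taken from fields n5, n9 and n17 in the table above.
        System.out.println(unpack(fromHex("02470f")));
        System.out.println(unpack(fromHex("000290502c")));
        System.out.println(unpack(fromHex("0000000439914c")));
    }
}
```

Run against the hex column of the table, this gives 2470, 290502 and 439914, matching the Value column.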

score 1 · Accepted Answer

IBM mainframe data is usually stored in one of the regional character encodings, such as Cp437 for the US or Cp870 for multilingual data.

score 1 · Accepted Answer

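It is definitely not encoded in EBCDIC (I spent the 70s and 80s working on IBM mainframes, so I recognize EBCDIC :-). It looks like ASCII with some binary components. The only way to interpret it properly is for the provider to supply a mapping that describes each record type (there may be one or more) and gives the data types of the embedded binary objects.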
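
Once such a mapping is available, applying it is mostly a matter of slicing each record by offset and length. A minimal sketch, assuming a layout of 1-based start positions and byte lengths like the one another answer on this page decoded for the first few text fields; the class `RecordLayout`, its methods, and the field names are hypothetical, and the sample record is assembled from the hex values quoted in that answer rather than read from the real file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RecordLayout {
    // Assumed layout: {name, 1-based start, length} for the leading ASCII text fields.
    static final Object[][] TEXT_FIELDS = {
        {"n0", 1, 4}, {"n1", 5, 17}, {"n2", 22, 1}, {"n3", 23, 6}, {"n4", 29, 1}
    };

    /** Returns the raw bytes of a field given its 1-based start and its length. */
    static byte[] slice(byte[] record, int start, int length) {
        return Arrays.copyOfRange(record, start - 1, start - 1 + length);
    }

    /** Decodes a field as ASCII text; only valid for fields the mapping marks as text. */
    static String text(byte[] record, int start, int length) {
        return new String(slice(record, start, length), StandardCharsets.US_ASCII);
    }

    /** Converts a hex string to bytes (helper for building the sample record). */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // First 32 bytes of a record, built from the hex values quoted on this page.
        byte[] record = fromHex("434f4445" + "3132333132333435363738393031353032"
                + "47" + "383330393138" + "56" + "02470f");
        for (Object[] f : TEXT_FIELDS) {
            System.out.println(f[0] + " = " + text(record, (Integer) f[1], (Integer) f[2]));
        }
    }
}
```

Binary and packed fields would need their own decoders keyed off the data type given in the mapping; only the text fields are safe to pass through a charset.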
