c - (C) How to fix this algorithm for z827 ASCII compression?

Question

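Noob warning.

I'm trying to write a compression program. It takes a .txt file containing ASCII characters as its argument and drops the leading 0 from each character's binary representation.

This is done using the last two bytes of two different integers. The character with the leading zero goes into the fourth byte of the integer 'writer', and the next character goes into the third byte of the integer 'temp'. I then shift 'temp' right once and OR it with 'writer', so that the leading-zero slot is filled with useful data. This repeats, with the shift counter incrementing after each character. The first case is a little odd. The algorithm isn't that complicated once you write it out on paper.

I feel like I've tried everything. I've gone over the algorithm many times. I'm fairly sure the problem occurs when shift_count reaches 8, but it should work fine. It doesn't. Here is how I can show it (the code is further down).

This is a hex dump of my output: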

0000000    3f  00  00  00  41  10  68  9e  6e  c3  d9  65  10  88  5e  c6
0000020    d3  41  e6  74  9a  5d  06  d1  df  a0  7a  7d  5e  06  a5  dd
0000040    20  3a  bd  3c  a7  a7  dd  67  10  e8  5d  a7  83  e8  e8  72
0000060    19  a4  c7  c9  6e  a0  f1  f8  dd  86  cb  cb  f3  f9  3c    
0000077

And the correct output:

0000000    3f  00  00  00  41  d0  3c  dd  86  b3  cb  20  7a  19  4f  07
0000020    99  d3  ec  32  88  fe  06  d5  e7  65  50  da  0d  a2  97  e7
0000040    f4  b4  fb  0c  7a  d7  e9  20  3a  ba  0c  d2  e3  64  37  d0
0000060    f8  dd  86  cb  cb  f3  79  fa  ed  76  29  00  0a  0a        
0000076
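
As a spot check on the expected dump, the first bytes after the 4-byte header (41 d0 3c) can be unpacked by hand. The sketch below is an illustration only: the name unpack7 is hypothetical, and it assumes the packing is least-significant-bit first, which is what the expected bytes suggest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: unpack up to n_chars 7-bit characters from n_in
 * packed bytes, assuming least-significant-bit-first packing.
 * Returns the number of characters written to out. */
size_t unpack7(const unsigned char *in, size_t n_in,
               unsigned char *out, size_t n_chars)
{
    assert(in != NULL && out != NULL);

    unsigned int acc = 0;   /* bit accumulator */
    int acc_bits = 0;       /* number of valid bits currently in acc */
    size_t produced = 0;

    for (size_t i = 0; i < n_in && produced < n_chars; i++) {
        /* append 8 new bits above the bits already held */
        acc |= (unsigned int)in[i] << acc_bits;
        acc_bits += 8;

        /* emit every complete 7-bit character */
        while (acc_bits >= 7 && produced < n_chars) {
            out[produced++] = (unsigned char)(acc & 0x7f);
            acc >>= 7;
            acc_bits -= 7;
        }
    }
    return produced;
}
```

Under that assumption, the bytes 41 d0 3c unpack to the characters 'A', ' ' and 's'.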

Code:

int compress(char *filename_ptr){

    int in_fd;
    in_fd = open(filename_ptr, O_RDONLY);

    //set pointer to the end of the file, find file size, then reset position 
    //by closing/opening
    unsigned int file_bytes = lseek(in_fd, 0, SEEK_END);
    close(in_fd);
    in_fd = open(filename_ptr, O_RDONLY);

    //store file contents in buffer
    unsigned char read_buffer[file_bytes];
    read(in_fd, read_buffer, file_bytes);

    //file where the output will be stored
    int out_fd;
    creat("output.txt", 0644);
    out_fd = open("output.txt", O_WRONLY);

    //sets file size in header (needed for decompression, this is the size of the
    //file before compression. everything after this we write this 4-byte int
    //is a 1 byte char
    write(out_fd, &file_bytes, 4);

    unsigned int writer;
    unsigned int temp;
    unsigned char out_char;

    int i;
    int shift_count = 8;
    for(i = 0; i < file_bytes; i++){


      if(shift_count == 8){
          writer = read_buffer[i];
          temp = temp & 0x00000000;
          temp = read_buffer[i+1] << 8;
          shift_count = 1;
      }else{

        //moves the next char's bits to the left, for the purpose of filling the
        //8 bit buffer (writer) via OR operation
        temp = read_buffer[i] << 8;
      }
      temp = temp >> shift_count;
      writer = writer | temp;

      //output right byte of writer
      unsigned int right_byte = writer & 0x000000ff;

      //output right_byte as a char
      out_char = (char) right_byte;

      //write_buffer[i] = out_char;
      write(out_fd, &out_char, 1);

      //clear right side of writer
      writer = writer & 0x0000ff00;

      //shift left side of writer to the right by 8
      writer = writer >> 8;        

      shift_count++;

    }

    return 0;
}

score 0 · Accepted Answer

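The input and the output seem to be too tightly coupled.

At some point, the program needs to read (roughly) the 80th octet from the input and write (roughly) the 70th octet to the output. Right?

What the loop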

for(i = 0; i < file_bytes; i++){
    ...
    ... = read_buffer[i];
    ...
    write(out_fd, &out_char, 1);
    ...
}

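actually seems to do is this: on the 70th pass through the loop (when 70==i), it reads the 70th octet from the input and writes the 70th octet to the output. On the 80th pass (when 80==i), it reads the 80th octet from the input and writes the 80th octet to the output.

You need to decide: does "i" represent the number of input characters processed, or the number of output characters processed? It can't be both, because you can't make 70 equal 80.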
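
To make the arithmetic concrete: each input character contributes 7 bits, so the output position advances at 7/8 the rate of the input position. A minimal sketch of that relationship follows; packed_size is a hypothetical name, not something from the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes needed to hold n_chars
 * 7-bit characters, rounding a partial final byte up. */
size_t packed_size(size_t n_chars)
{
    /* guard against overflow in n_chars * 7 + 7 */
    assert(n_chars <= ((size_t)-1 - 7) / 7);
    return (n_chars * 7 + 7) / 8;
}
```

With this formula, packed_size(80) is 70, which is the "80th octet in, 70th octet out" relationship described above.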

Perhaps something like this is closer to what you wanted:

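/* test.c
http://stackoverflow.com/questions/15080239/c-how-to-fix-this-algorithm-for-z827-ascii-compression
WARNING: untested code.
*/

#include <fcntl.h>    /* open, creat, O_RDONLY, O_WRONLY */
#include <unistd.h>   /* read, write, lseek, close */
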
int compress(char *filename_ptr){

    int in_fd;
    in_fd = open(filename_ptr, O_RDONLY);

    //set pointer to the end of the file, find file size, then reset position 
    //by closing/opening
    unsigned int file_bytes = lseek(in_fd, 0, SEEK_END);
    close(in_fd);
    in_fd = open(filename_ptr, O_RDONLY);

    //store file contents in buffer
    unsigned char read_buffer[file_bytes];
    read(in_fd, read_buffer, file_bytes);

    //file where the output will be stored
    //(a single open() with O_CREAT avoids leaking the descriptor that a
    //separate creat() call would return)
    int out_fd;
    out_fd = open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    //write the file size as a header (needed for decompression; this is the
    //size of the file before compression). everything written after this
    //4-byte int is a 1-byte char
    write(out_fd, &file_bytes, 4);

    unsigned int writer = 0; // bit accumulator; must start out empty
    unsigned int temp;
    unsigned char out_char;

    int i;
    int writer_bits = 0; // 0 bits of data in writer so far
    for(i = 0; i < file_bytes; i++){
      // i is the number of (7 bit ASCII) characters
      // read from the input so far.

      // add 7 more bits to the writer:
      // shift the next char left past the "writer_bits" good bits
      // already in the buffer, then merge it in via OR.
      // (the 0x7f mask drops the high bit, which is 0 for ASCII.)
      temp = (unsigned int)(read_buffer[i] & 0x7f) << writer_bits;
      writer = writer | temp;
      writer_bits = writer_bits + 7;

      // output 8 bits of data whenever
      // we have *at least* 8 bits of data in the writer buffer.
      if(writer_bits >= 8){

          //output the right (low) byte of writer as a char
          out_char = (unsigned char)(writer & 0x000000ff);
          write(out_fd, &out_char, 1);

          //shift left side of writer to the right by 8
          writer = writer >> 8;

          writer_bits = writer_bits - 8;
      }
      // otherwise: 7 or fewer bits in writer --
      // skip writing until next time.
    }

    // are there any leftover bits still in writer?
    // if so, flush them as one final, zero-padded byte
    // (taken from writer itself, not from a stale out_char).
    if(writer_bits > 0){
          out_char = (unsigned char)(writer & 0x000000ff);
          write(out_fd, &out_char, 1);
    }

    close(in_fd);
    close(out_fd);
    return 0;
}

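(Currently the program reads the entire input file into RAM and then writes the entire output file. Some programmers prefer to read a little at a time and then write a little at a time. Both approaches have their pros and cons.)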
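
For comparison, a read-a-little, write-a-little variant might look like the sketch below. It is an assumption-laden illustration, not the code above: pack7_stream is a hypothetical name, it uses stdio (getc/putc) instead of read/write, and it leaves out the 4-byte size header because a stream's length is not known up front.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical streaming variant: pack 7-bit characters from `in` into
 * `out` one character at a time, least-significant bit first.
 * Returns the number of bytes written, or -1 on a write error. */
long pack7_stream(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);

    unsigned int writer = 0;  /* bit accumulator */
    int writer_bits = 0;      /* number of valid bits in writer */
    long written = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        /* place the 7 new bits above the bits already held */
        writer |= (unsigned int)(c & 0x7f) << writer_bits;
        writer_bits += 7;

        /* emit a byte as soon as 8 bits are available */
        if (writer_bits >= 8) {
            if (putc((int)(writer & 0xff), out) == EOF)
                return -1;
            written++;
            writer >>= 8;
            writer_bits -= 8;
        }
    }

    /* flush any leftover bits as a final zero-padded byte */
    if (writer_bits > 0) {
        if (putc((int)(writer & 0xff), out) == EOF)
            return -1;
        written++;
    }
    return written;
}
```

The trade-off is the usual one: this version needs only a few bytes of state regardless of file size, while the whole-file version is simpler to reason about and knows the input length before it writes anything.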
