c - Most optimized strstr()-like function for my code

Question

I'm writing an HTTP parser and have these functions:

int parse_useragent(char* buf, int length){
    buf[length] = '\0'; /* NUL-terminate the line in place so strstr() stops at its end */
    if(strstr(buf, "MSIE") != NULL){
        return 1;
    }else if(strstr(buf, "Firefox") != NULL){
        return 2;
    }
    return DEFAULT_USERAGENT;
}

void parse_headers(unsigned char* buf, http_record_t * http){
    char * position = (char*)buf;
    char referer[] = "Referer";
    char useragent[] = "User-Agent";
    ...
    int length = getlinelength(position); // returns length of line
    while(length != 1){ // position points to the start of a line on every loop iteration
        if(strncmp(position, useragent, sizeof(useragent)-1) == 0){
            http->useragent = parse_useragent(position, length);
            fprintf(stderr,"parsing useragent \n");
        }else if(strncmp(position, referer, sizeof(referer)-1) == 0){
            fprintf(stderr,"parsing referer \n");
            char * tmp = malloc(REFERER_LENGHT * sizeof(char));
            if(tmp != NULL){
                parse_referer(tmp,position, length);
                strncpy(http->referer,tmp, REFERER_LENGHT * sizeof(char) - 1);
                free(tmp); // tmp is no longer needed once copied into http->referer
            }
        }else if(...

        position += length + 1;
        length = getlinelength(position);
    }
    return;
}

buf points to the start of the HTTP headers.

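I have a function like parse_useragent for each header, and I really need to optimize them. Packet length is usually < 1000, and a line rarely exceeds 100 characters. Will optimizing for such short strings make a noticeable difference?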

I know that some of these algorithms require a different approach than parsing line by line. Which one would you choose under these particular conditions?

Thanks for your help!

score 1 · Accepted Answer

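If you don't mind hardcoding the strings in your code, I think lex would be the fastest tool for this kind of task, because it explicitly builds a finite state automaton in the generated source code.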
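For only two fixed needles, the same finite-automaton idea can also be hand-coded in plain C. The sketch below is an illustration under stated assumptions, not what flex actually generates: `classify_useragent` and the `UA_*` codes are hypothetical names (1 and 2 mirror the question's parse_useragent, 0 stands in for DEFAULT_USERAGENT). It reads at most `len` bytes, so the line needs no NUL terminator. Each needle keeps a "characters matched so far" counter, and the pair of counters is the automaton state. Note that it returns whichever needle completes first, whereas the question's version prefers MSIE whenever both appear.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes mirroring the question's parse_useragent. */
enum { UA_DEFAULT = 0, UA_MSIE = 1, UA_FIREFOX = 2 };

/* Single pass over buf[0..len) looking for "MSIE" or "Firefox".
 * On a mismatch a counter restarts at 1 if the current character begins
 * its needle, else 0. That shortcut is only correct because neither
 * needle has a proper prefix that is also a suffix of itself; general
 * patterns need a full KMP/Aho-Corasick failure function instead. */
int classify_useragent(const char *buf, size_t len)
{
    static const char msie[] = "MSIE";
    static const char ff[] = "Firefox";
    size_t m = 0, f = 0; /* matched-prefix lengths for each needle */

    for (size_t i = 0; i < len; i++) {
        char c = buf[i];

        if (c == msie[m]) m++;
        else m = (c == msie[0]);
        if (m == sizeof msie - 1) return UA_MSIE;

        if (c == ff[f]) f++;
        else f = (c == ff[0]);
        if (f == sizeof ff - 1) return UA_FIREFOX;
    }
    return UA_DEFAULT;
}
```

Because the length is explicit, this could be called as classify_useragent(position, length) from the question's loop without the buf[length] = '\0' write.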

Here is sample lex code for this task:

%option noyywrap
%option case-insensitive
%{
enum Type{
    TFIREFOX = 0, TMSIE = 1
};
enum Type global_variable; /* the variable to store the parsing result */
%}

%%
FIREFOX {global_variable = TFIREFOX; yyterminate();}
MSIE {global_variable = TMSIE; yyterminate();}
.|\n { /* ignore every other character, including newlines */ }
%%

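int lex_strstr(char *buf, int n)
{
    global_variable = -1; /* -1 means "no match" */
    /* yy_scan_buffer() requires the last two bytes of buf to be NUL
       and returns NULL if they are not */
    YY_BUFFER_STATE bs = yy_scan_buffer(buf, n);
    if (bs == NULL)
        return -1;
    yy_switch_to_buffer(bs);
    yylex();
    yy_delete_buffer(bs); /* frees the buffer state, not buf itself */
    return global_variable;
}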
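The test strings in the example end in \0\0 because flex's yy_scan_buffer() requires the last two bytes of the buffer to be NUL (YY_END_OF_BUFFER_CHAR); those two bytes are not scanned. The header lines in the question's packet buffer are not laid out that way, so a line would have to be copied before it is passed to lex_strstr. A minimal sketch, assuming the caller supplies a scratch buffer; make_scan_buffer is a hypothetical helper, not part of flex:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes of a header line into dst and
 * append the two NUL bytes that yy_scan_buffer() expects at the end.
 * Returns the total size to pass as lex_strstr's n, or 0 if dst
 * (capacity cap) is too small to hold the line plus two NULs. */
size_t make_scan_buffer(char *dst, size_t cap, const char *line, size_t len)
{
    if (cap < 2 || len > cap - 2)
        return 0;
    memcpy(dst, line, len);
    dst[len] = '\0';
    dst[len + 1] = '\0';
    return len + 2;
}
```

In the question's loop this might be used as n = make_scan_buffer(scratch, sizeof scratch, position, length); followed by lex_strstr(scratch, (int)n) when n is non-zero. Keep in mind that lex_strstr returns 0 for Firefox and 1 for MSIE, not the question's 2 and 1.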

Save it to a file such as result.l and compile it with flex to get a C header file:

flex -o head.h result.l

And here is an example showing how it works:

#include <stdio.h>
#include "head.h"
int main()
{
    {
        char buf[] = "this is a test MSIE string\0\0";
        printf("%d\n", lex_strstr(buf, (sizeof buf)));
    }
    {
        char buf[] = "this is a test FIREFOX string\0\0";
        printf("%d\n", lex_strstr(buf, (sizeof buf)));
    }
    {
        char buf[] = "this is a test MSIEFIREFOX string\0\0";
        printf("%d\n", lex_strstr(buf, (sizeof buf)));
    }
    {
        char buf[] = "this is a test MIEFIEFOXdfa\0\0";
        printf("%d\n", lex_strstr(buf, (sizeof buf)));
    }
}

Result:

score 0 · Accepted Answer

Convert your strings to length-prefixed strings.

A lot of time is wasted looking for the null character that terminates a C string; length-prefixed strings eliminate that waste while adding very little overhead of their own.
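A minimal sketch of the idea, assuming a hypothetical lstr type (pointer plus explicit length) and lstr_find function; neither is from Safestr or any other library. The search uses only the standard memchr and memcmp, so nothing ever scans for a terminating NUL. Since the question's getlinelength already returns each line's length, a header line could be wrapped as (lstr){ position, (size_t)length } without writing into the packet buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed string view: no NUL terminator needed. */
typedef struct {
    const char *data;
    size_t len;
} lstr;

/* Builds an lstr from a string literal or char array (not a pointer). */
#define LSTR(s) ((lstr){ (s), sizeof(s) - 1 })

/* Offset of the first occurrence of needle in hay, or -1 if absent.
 * memchr jumps to each candidate first byte; memcmp confirms the rest. */
long lstr_find(lstr hay, lstr needle)
{
    if (needle.len == 0)
        return 0;
    if (needle.len > hay.len)
        return -1;

    const char *p = hay.data;
    const char *last = hay.data + (hay.len - needle.len); /* last valid start */

    while (p <= last) {
        p = memchr(p, needle.data[0], (size_t)(last - p) + 1);
        if (p == NULL)
            return -1;
        if (memcmp(p, needle.data, needle.len) == 0)
            return (long)(p - hay.data);
        p++;
    }
    return -1;
}
```

For example, lstr_find(LSTR("User-Agent: Mozilla/5.0 Firefox/3.0"), LSTR("Firefox")) yields 24.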

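There is actually a library under either a BSD or MIT license (read: free for anyone to use) that is fairly well optimized and has a very low bug count. But for the life of me I can't remember its name. It might be Safestr. See here for hints on finding Safestr, and here if the page loads.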
