I'm trying to parse a protocol where each message looks something like this:
[01001ACP01010100]
That is, each message has a start character ([), an end character (]), a 5-byte sequence number, and a type (ACP in this case). The data between the type and the end character is determined by the type.
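To make the layout concrete, here is a rough sketch that splits one complete message into its fields. `splitMessage` is a hypothetical name, and the sketch assumes the whole message is already in a single string and that the data part never contains a bracket.

```javascript
// Hypothetical helper: split one complete message into its fixed fields.
// Assumed layout: "[" + 5-digit seq + 3-letter type + data + "]",
// with no "[" or "]" inside the data (an assumption, not part of the spec above).
function splitMessage(msg) {
    var m = /^\[(\d{5})([A-Z]{3})([^\[\]]*)\]$/.exec(msg);
    if (m === null) {
        return null; // not a well-formed message
    }
    return { seq: m[1], typ: m[2], data: m[3] };
}

var parts = splitMessage("[01001ACP01010100]");
// parts is { seq: "01001", typ: "ACP", data: "01010100" }
```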
What I'm looking for is a way to declare the structure of all valid messages in one or more tables, and then make a parser that utilizes that table.
I'd also like a solution that can handle Node.js streams and partially transmitted messages.
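To illustrate what I mean by partially transmitted messages, here is a minimal framing sketch: it buffers incoming chunks and only hands over a message once both brackets have arrived. `makeFramer` and `feed` are hypothetical names, and it assumes "[" and "]" never occur inside the data.

```javascript
// Hypothetical framer: buffers chunks and calls onMessage for each complete "[...]".
// Assumes "[" and "]" never occur inside the data part.
function makeFramer(onMessage) {
    var buf = "";
    return function (chunk) {
        buf += chunk;
        for (;;) {
            var start = buf.indexOf("[");
            if (start === -1) {          // no message start in the buffer: drop it
                buf = "";
                return;
            }
            var end = buf.indexOf("]", start);
            if (end === -1) {            // partial message: keep it and wait for more
                buf = buf.slice(start);
                return;
            }
            onMessage(buf.slice(start, end + 1));
            buf = buf.slice(end + 1);
        }
    };
}

var seen = [];
var feed = makeFramer(function (msg) { seen.push(msg); });
feed("[01001ACP0101");      // first message is cut off mid-data
feed("0100][01002AC");      // rest of the first, start of the second
feed("P11110000]");
// seen is now ["[01001ACP01010100]", "[01002ACP11110000]"]
```

With a Node.js readable stream, something like `stream.setEncoding("ascii")` followed by `stream.on("data", feed)` should be enough to drive it, though I have not tested that against a real socket.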
My first attempt looked something like this:
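    // Build a parser for one fixed-length field; feed it one character at a time.
    // Returns the field once `length` valid characters have arrived,
    // null on an invalid character, and undefined while more input is needed.
    var make_parser = function (valid, length) {
        var buf = "";
        return function (char) {
            if (!valid(char)) {
                buf = "";
                return null;
            }
            buf += char;
            if (buf.length === length) {
                var ret = buf;
                buf = "";
                return ret;
            }
            return undefined;
        };
    };

    // isnum / isupper: simple single-character checks (assumed definitions)
    var isnum = function (char) { return char >= "0" && char <= "9"; };
    var isupper = function (char) { return char >= "A" && char <= "Z"; };

    var sub_parsers = {
        "beg" : make_parser(function (char) { return char === "["; }, 1), // start character
        "end" : make_parser(function (char) { return char === "]"; }, 1), // end character
        "seq" : make_parser(isnum, 5),   // sequence number
        "typ" : make_parser(isupper, 3)  // type (must be all uppercase)
    };
    var order = ["beg", "seq", "typ"];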
Then I keep the current state somewhere, and pump each character into the parser function corresponding to that state.
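Roughly, the driver I have in mind looks like the sketch below. `make_driver` and `pump` are hypothetical names, and the snippet carries a compact stand-in for `make_parser` so that it runs by itself. It only gets as far as the type, since the data after it depends on the type.

```javascript
// Compact stand-in for make_parser so this snippet runs by itself.
function make_parser(valid, length) {
    var buf = "";
    return function (char) {
        if (!valid(char)) { buf = ""; return null; }
        buf += char;
        if (buf.length < length) { return undefined; }
        var ret = buf;
        buf = "";
        return ret;
    };
}

var sub_parsers = {
    beg: make_parser(function (c) { return c === "["; }, 1),
    seq: make_parser(function (c) { return c >= "0" && c <= "9"; }, 5),
    typ: make_parser(function (c) { return c >= "A" && c <= "Z"; }, 3)
};
var order = ["beg", "seq", "typ"];

// Hypothetical driver: `state` indexes into `order`, and each character
// goes to the sub-parser for the current state.
function make_driver(onHeader) {
    var state = 0, fields = {};
    return function (char) {
        var name = order[state];
        var result = sub_parsers[name](char);
        if (result === null) {              // invalid character: start over
            state = 0;
            fields = {};
        } else if (result !== undefined) {  // field complete: advance
            fields[name] = result;
            state += 1;
            if (state === order.length) {
                onHeader(fields);
                state = 0;
                fields = {};
            }
        }
    };
}

var headers = [];
var pump = make_driver(function (h) { headers.push(h); });
"[01001ACP".split("").forEach(function (c) { pump(c); });
// headers[0] is { beg: "[", seq: "01001", typ: "ACP" }
```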
There are several problems with this approach:
- There are some states, such as the "typ" state above, where the value actually parsed affects how the rest of the message must be parsed. I have no way of encoding this in the table above.
- I'd like to have a table that encodes not only how to parse the messages, but also how to serialize new ones.
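To show the kind of table I am after, here is a sketch under stated assumptions, not a finished design. All names (`HEADER`, `BODIES`, `readFields`, `parse`, `serialize`) are hypothetical, and the ACP body layout (a single 8-character field of 0s and 1s) is invented purely for illustration. The parsed type selects the body layout, and the same tables drive serialization.

```javascript
// Hypothetical tables: header fields shared by all messages, plus one body
// layout per type. The ACP body layout is invented for illustration only.
var HEADER = [
    { name: "seq", length: 5, valid: /^[0-9]+$/ },
    { name: "typ", length: 3, valid: /^[A-Z]+$/ }
];
var BODIES = {
    ACP: [ { name: "flags", length: 8, valid: /^[01]+$/ } ]
};

// Read fixed-width fields from `text` starting at `pos`.
// Returns the position after the last field, or null on any mismatch.
function readFields(layout, text, pos, out) {
    for (var i = 0; i < layout.length; i++) {
        var f = layout[i];
        var value = text.slice(pos, pos + f.length);
        if (value.length !== f.length || !f.valid.test(value)) {
            return null;
        }
        out[f.name] = value;
        pos += f.length;
    }
    return pos;
}

function parse(msg) {
    if (msg.charAt(0) !== "[" || msg.charAt(msg.length - 1) !== "]") {
        return null;
    }
    var inner = msg.slice(1, -1), out = {};
    var pos = readFields(HEADER, inner, 0, out);
    if (pos === null || !Object.prototype.hasOwnProperty.call(BODIES, out.typ)) {
        return null;
    }
    pos = readFields(BODIES[out.typ], inner, pos, out); // body chosen by parsed type
    return pos === inner.length ? out : null;
}

// Serialize by walking the same tables; this sketch does not validate values.
function serialize(obj) {
    var layout = HEADER.concat(BODIES[obj.typ]);
    return "[" + layout.map(function (f) { return obj[f.name]; }).join("") + "]";
}

var parsed = parse("[01001ACP01010100]");
// parsed is { seq: "01001", typ: "ACP", flags: "01010100" }
var text = serialize(parsed);
// text is "[01001ACP01010100]"
```

This works on complete messages only, so it does not yet address the streaming requirement.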