Your question is very interesting, but the title is misleading.
This is what you need in terms of data model (x - first string, y - second string, * - distance matrix).
y <-- first string (scrolls from top down)
y
x x x x x x x x <- second string (scrolls from left to right)
y * * *
y * * *
y * * * <-- distance matrix (a donut) scrolls together with strings
and grows/shrinks when needed, as explained below
y
Have two relatively long (but still << N) character buffers and relatively small ( << buffers size) rectangular (start from square) distance matrix.
Make the matrix a donut - bi-dimentional ring buffer (can use the one from boost, or just std::deque).
When string fragments currently covered by the matrix are 100% match shift both buffers by one, rotate the donut around both axes, recalculating one new row/column in the distance matrix.
When match is <100% and is less than configured threshold then grow the size of the both dimensions of the donut without dropping any rows/columns and do it until either match gets above the threshold or you reach the maximum donut size. When match ratio hits the threshold from the below you need to scroll donut discarding head of x and y buffers and at the same time aligning them (only X needs moving by 1 when the distance matrix tells that X[i] does not exist in Y, but X[i+1,i+m] matches Y[j, j+m-1]).
As a result you will have a simple yet very efficient heuristic diff engine with deterministic limited memory footprint and all memory can be pre-allocated at startup so no dynamic allocation will slow it down at runtime.
Apache v2 license, in case you decide to go for it.