Diff algorithm, based on "An O(ND) Difference Algorithm and its Variations",
by Eugene Myers.
The basic idea is to put the lines of text A along the columns ("x") and the
lines of text B along the rows ("y"). You then look for the shortest "edit
path" from the upper left corner to the lower right corner. A horizontal step
(skipping a line of A) or a vertical step (skipping a line of B) is always
allowed, but a diagonal step from (x,y) to (x+1,y+1) is allowed only if
line x in text A is identical to line y in text B.
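
The grid model can be sketched with a plain dynamic program. This is not the
Myers algorithm itself, only a quadratic-space illustration of what "shortest
edit path" means; the function name shortest_edit_path_length is a
hypothetical one chosen for this sketch.

```python
def shortest_edit_path_length(a, b):
    """Fewest non-diagonal steps from (0, 0) to (len(a), len(b))."""
    n, m = len(a), len(b)
    # cost[y][x] = fewest horizontal/vertical steps needed to reach node (x, y)
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for x in range(1, n + 1):
        cost[0][x] = x  # top row: horizontal steps only
    for y in range(1, m + 1):
        cost[y][0] = y  # left column: vertical steps only
    for y in range(1, m + 1):
        for x in range(1, n + 1):
            best = min(cost[y][x - 1], cost[y - 1][x]) + 1
            if a[x - 1] == b[y - 1]:
                # a diagonal step is free, but only where the elements match
                best = min(best, cost[y - 1][x - 1])
            cost[y][x] = best
    return cost[m][n]
```

With A = "HELLO WORLD" and B = "LOW", compared character by character, this
returns 8: three characters can be matched diagonally and the remaining eight
characters of A must be skipped.
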
Myers' fundamental concept is the "furthest reaching D-path on diagonal k": a
D-path is an edit path starting at the upper left corner and containing
exactly D non-diagonal elements ("differences"). Among all D-paths that end on
diagonal k (where k = y - x), the furthest reaching one is the one that
contains the most diagonal elements.
Example:
    H E L L O   W O R L D
    ____
  L     \___
  O         \___
  W             \________
Since every D-path has exactly D horizontal or vertical elements, and each of
them moves the path to a neighboring diagonal (k - 1 or k + 1), it can only
end on the diagonals -D, -D+2, ..., D-2, D.
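
The parity argument can be checked by brute force. The sketch below ignores
the grid bounds and the actual texts and only enumerates the possible
sequences of non-diagonal steps; end_diagonals is a hypothetical helper name.

```python
from itertools import product

def end_diagonals(d):
    """Diagonals k = y - x on which a path with exactly d non-diagonal steps can end."""
    ends = set()
    # -1 stands for a horizontal step (x grows), +1 for a vertical step (y grows);
    # diagonal steps leave k unchanged, so they do not need to be enumerated.
    for steps in product((-1, +1), repeat=d):
        ends.add(sum(steps))
    return sorted(ends)
```

For d = 3 this yields [-3, -1, 1, 3], matching -D, -D+2, ..., D-2, D.
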
Since every furthest reaching D-path contains at least one furthest reaching
(D-1)-path (except for D=0), we can construct them recursively.
Since we are really interested in the shortest edit path, we can start
looking for a 0-path, then a 1-path, and so on, until we find a path that
ends in the lower right corner.
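
A minimal sketch of this search, assuming the convention k = y - x used above.
It keeps, for each diagonal, only the largest x found so far and derives the
(D+1)-paths from the D-paths. It uses a dictionary and explicit bounds checks
for readability, so it is an illustration of the idea rather than the
array-based loop from Myers' paper; the name shortest_edit_distance is
hypothetical.

```python
def shortest_edit_distance(a, b):
    """Smallest D such that some D-path ends in the lower right corner."""
    n, m = len(a), len(b)

    def slide(x, y):
        # Follow the diagonal for as long as the elements match.
        while x < n and y < m and a[x] == b[y]:
            x, y = x + 1, y + 1
        return x

    # frontier[k] = largest in-bounds x found on diagonal k by the current D-paths
    frontier = {0: slide(0, 0)}
    d = 0
    while True:
        # The lower right corner (n, m) lies on diagonal m - n.
        if frontier.get(m - n) == n:
            return d
        nxt = {}
        for k, x in frontier.items():
            y = x + k
            if x < n:  # horizontal step lands on diagonal k - 1
                nxt[k - 1] = max(nxt.get(k - 1, -1), slide(x + 1, y))
            if y < m:  # vertical step lands on diagonal k + 1
                nxt[k + 1] = max(nxt.get(k + 1, -1), slide(x, y + 1))
        frontier = nxt
        d += 1
```

For "HELLO WORLD" and "LOW" the loop stops at D = 8, and for two identical
inputs it returns 0 without leaving diagonal 0.
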
To save space, we do not store all paths (which would require quadratic
space). Instead, we generate the D-paths simultaneously from both ends. When
the two searches meet, we have found "the middle" of the path. From the end
points of that diagonal part, we can generate the rest recursively.
This only requires linear space.
The overall (runtime) complexity is:
    O(N * D^2 + 2 * N/2 * (D/2)^2 + 4 * N/4 * (D/4)^2 + ...)
    = O(N * D^2 * 4 / 3) = O(N * D^2)
(With each step, we have to find the middle parts of twice as many regions as
before, but the regions (as well as the D) are halved, so each level costs a
quarter of the previous one.)
So the overall runtime complexity stays the same with linear space, albeit
with a larger constant factor.
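
The constant factor comes from a geometric series: level i of the recursion
contributes 2^i * (N/2^i) * (D/2^i)^2 = N * D^2 / 4^i. The sketch below sums
these terms in units of N * D^2 using exact fractions; relative_cost is a
hypothetical helper name.

```python
from fractions import Fraction

def relative_cost(levels):
    """Total cost of the first `levels` recursion levels, in units of N * D^2."""
    total = Fraction(0)
    for i in range(levels):
        regions = Fraction(2) ** i          # twice as many regions per level
        size = Fraction(1, 2 ** i)          # each region is half as long
        diffs = Fraction(1, 2 ** i)         # and has half as many differences
        total += regions * size * diffs ** 2
    return total
```

The first two levels give 5/4, and the sum approaches 4/3 as more levels are
added, so the extra work of the recursion stays within a constant factor.
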