Parallel Scroll
Imagine a text file with two different versions. What if we wanted to view them side-by-side and also scroll them so that contents always match while scrolling?
First what pops up in my mind is github, bitbucket diff pages. Changes are highlighted and if we have two versions of one document, what diff tool does is that while scrolling it matches the accompany side so that user easily perceives the difference and this way the scroll is kind of “clever”.
How should “clever” scroll work?
When we have one document with two versions A-initial and B-changed if we scroll on side A and there is additional lines on side B for the corresponding part of the document, than side A should stop while changes is being scrolled on the side B but if there is deleted part on side B than it should stop while A continues scrolling. The same but vise versa is true when the B side is scrolled and A is an accompany.
All this already works just fine with current implementations of side-by-side view when documents are structured and of the same type.
Now imagine that we have two versions of unstructured HTML document, where you can’t count on HTML tags or other structural parts, but the idea is that you A and B sides still have to match while scrolling.
Proportionally scrolling algorithm
Lets start with simplified version and assume that we already have HTML elements defined by id or class, that can be used as markers. In this case algorithm for the scroll is following: for every n-th marker on both side of the document we calculate the distance from n-1 to n then calculate the proportion P=(n-(n-1))/(m-(m-1)) and while scrolling one side of the document for every pixel scrolled we multiply it by P to calculate scroll amount for the accompany side. This is quite simple, but lets have a closer look with an example:
Lets say we have two versions of the document A and B and we have markers: am1, am2, am3 and bm1, bm2, bm3. If we start scrolling on the A side from am1 to am2 , proportion will be P=(bm2-bm1)/(am2-am1) and X is the scroll amount, thus side B will be scrolled by P*X and vise versa if we start scrolling on B side proportion will be P=(am2-am1)/(bm2-bm1) and side A will be scrolled by P*X. When we reach the end of am1-am2 part of the document and move to am2-am3 we’ll just recalculate proportion for the scroll, rest remains the same. Thus our algorithm will always match the markers and hence the documents will be scrolled nicely as long as markers are placed correctly.
Placing markers
Using previous algorithm task is now shrinked to just placing markers correctlly. So how we do that?
Lets divide algorithm for placing markers in 4 steps.
markersFindAndPair()
To find usable markers we have two predefined variables:
requiredLength = 60 // Length of the marker
skipForwardMinSize = 1000 // Minimal distance between markers
Because our document is HTML our algorithm should not perceive tags as a part of the text and HTML structure should not be broken by placing markers.
On this step we create an object every time we find useful marker and save those for both(A,B) sides in markers array.
marker.string = markerString;
marker.startIndexA = startIndexA;
marker.endIndexA = endIndexA;
marker.startIndexB = startIndexB;
marker.endIndexB = endIndexB;
markers.push(marker);
markersDeduplicate()
While searching for marker we have requiredLength variable and we use that to avoid duplicates in markers array but chances are that there will still be duplicates anyway, so we filter our array with marker.string to be unique thus we will have markers array without duplicates.
markersLIS()
For the accompany side with the variable startIndex we calculate Longest Increasing Subsequence and we filter markers array with the result. This will give us markers array with correctly ordered markers, meaning: there won’t be any markers by same index that is at the start for A side and at the bottom for B.
markersPlace()
For this step we have markers array which is filtered from duplicates and markers are correctly matching for both sides of the document.
For placing markers we just have to use startIndex and endIndex variables and using string replace function wrap the content with following span.
<span class="marker marker-{id}">marker.string</span>
Rest is the proportionally scrolling algorithm we already discussed.
To sum up, simple change in task caused huge changes in algorithm but still without any diff algorithms we are able to match unstructured texts and get almost the same result — match documents during the scroll.