Last time, we investigated the HTML5 viewer for a document delivery DRM system, rehosting the viewer to give us unlimited access to documents – but only through the standard print procedure, which inserts watermarks and copyright information. This time, we'll investigate how we can interact with the JavaScript in the viewer to directly and automatically convert the document into a standard format.

As a ‘hip’, ‘with it’ modern HTML5/JavaScript single-page application, this viewer renders its preview of the document to an SVG container:

<div id="previewDisplay">
    <svg id="previewSvg">...</svg>
    <i class="fas fa-angle-left" id="previewPageLeft"></i>
    <i class="fas fa-angle-right" id="previewPageRight"></i>
</div>


A consequence of this is that the SVG preview data forms a complete, high-quality, scalable vector version of the document data, perfect for conversion to a PDF format. We don't need to reverse-engineer the document format to work out how to render it, since the JavaScript code has done it for us already!

Naturally, we could go through page by page and manually save the SVG data to a file, but we can do better. Examining the JavaScript window object, we notice that the developers have carelessly left various functions strewn about the global namespace. One of these is conveniently titled changePage; called as changePage(1), it moves the preview forward by one page. And from our explorations in the previous part, we know how to get the total number of pages in the document (PageCount).

With this, the developers have kindly given us all we need to write a simple script to iterate over all the pages and get the SVG data. We store it in a single JSON array, and download it using FileSaver.js to condense everything into a single file:

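// Load FileSaver.js, which provides saveAs(); run the export only once it has loaded
var script = document.createElement('script');
script.src = 'https://cdn.jsdelivr.net/npm/file-saver@2.0.2/dist/FileSaver.min.js';
script.onload = function () {
    var data = [];

    // Assumes the viewer is on the first page: record each page's SVG, then step forward
    for (var i = 0; i < viewer1.H.m.PageCount; i++) {
        data.push(document.getElementById('previewSvg').outerHTML);
        changePage(1);
    }

    // Bundle every page into one JSON array and download it
    var blob = new Blob([JSON.stringify(data)], {type: "application/json;charset=utf-8"});
    saveAs(blob, "svgs.json");
};
document.body.appendChild(script);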
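
Since the loop reads the DOM straight after each changePage call, it may be worth checking the download before converting it. What follows is a minimal Python sketch, not part of the original workflow. It assumes (the viewer's behaviour here is not known) that a redraw which had not finished would show up as the same SVG captured twice in a row. The function name check_pages is hypothetical.

```python
def check_pages(pages):
    """Return a list of human-readable problems found in the exported pages."""
    problems = []
    for i, svg in enumerate(pages):
        # Every entry should be the serialised <svg id="previewSvg"> element
        if not svg.lstrip().startswith('<svg'):
            problems.append('page %d does not look like an SVG element' % (i + 1))
        # Identical neighbours suggest the page had not been redrawn yet
        # (assumption: two genuine pages are never byte-for-byte equal)
        if i > 0 and svg == pages[i - 1]:
            problems.append('page %d is identical to page %d' % (i + 1, i))
    return problems
```

Passing it the list obtained from json.load on svgs.json should return an empty list for a clean export. Any reported duplicates would suggest adding a delay between page changes in the browser script.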


All we need now is a simple Python script to extract each of the SVG images, use the Inkscape CLI tool to convert them to PDF, and combine them into a single PDF file using Ghostscript:

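import json
import os
import subprocess

# Load the array of per-page SVG strings exported from the browser
with open('svgs.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

pdfs = []
for i, content in enumerate(data):
    # Write the page to a scratch SVG file, then convert it to a one-page PDF
    with open('tmp.svg', 'w', encoding='utf-8') as f:
        f.write(content)
    pdfs.append('tmp' + str(i+1) + '.pdf')
    # -A is --export-pdf in Inkscape 0.9x; Inkscape 1.x uses --export-filename instead
    subprocess.run(['inkscape', '-A', pdfs[-1], 'tmp.svg'], check=True)

# Concatenate the per-page PDFs, in order, into a single document
subprocess.run(['gs', '-sDEVICE=pdfwrite', '-o', 'Output.pdf'] + pdfs, check=True)

# Clean up the intermediate files
for pdf in pdfs:
    os.remove(pdf)
os.remove('tmp.svg')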
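
The -A flag in the script above is the Inkscape 0.9x short form of --export-pdf. Inkscape 1.x dropped it in favour of --export-filename. As a hedged sketch, the command could be chosen from the output of inkscape --version. The helper name inkscape_pdf_command is hypothetical, and the version string format ("Inkscape 1.2.2 (...)") is an assumption based on typical releases.

```python
import re


def inkscape_pdf_command(version_output, svg_path, pdf_path):
    """Build the Inkscape argument list for the installed major version.

    version_output is the text printed by `inkscape --version`.
    """
    match = re.search(r'Inkscape (\d+)\.', version_output)
    if match is None:
        raise ValueError('unrecognised version string: %r' % version_output)
    if int(match.group(1)) >= 1:
        # Inkscape 1.x: the output format is inferred from the .pdf extension
        return ['inkscape', '--export-filename=' + pdf_path, svg_path]
    # Inkscape 0.9x: -A is the short form of --export-pdf
    return ['inkscape', '-A', pdf_path, svg_path]
```

The version text can be obtained with subprocess.run(['inkscape', '--version'], capture_output=True, text=True).stdout. The returned list can then be passed to subprocess.run in place of the hard-coded command.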