Porting Python to Rust/WebAssembly: OpenTally dev log

Background and motivation

pyRCV2 is software for open-source election counting. It is intended to be usable across multiple platforms – as a web application, for the convenience of users less technologically inclined, and as a standalone CLI desktop application, for better performance. To this end, pyRCV2 is written in Python, and transpiled to Javascript using Transcrypt.

To date, this has had a number of advantages. Perhaps most notably, Python enabled rapid prototyping and highly interactive debugging, thanks in part to its flexible nature as a duck-typed interpreted language. Progress has been fast, and (lack of testing and validation excepting) pyRCV2 is probably the most feature-complete software for STV election counting that is publicly available.

However, this workflow has come with a number of disadvantages. Generally speaking, Transcrypt is only able to transpile Python code, not any native extensions; therefore, pyRCV2 would need to be written without using native extensions, which would be very slow. This was foreseen, and I designed pyRCV2 to interface with specialised Python native extensions and Javascript libraries, on desktop and web respectively. However, this does introduce a potential sources of bugs, at the interface layer, and in implementation differences between the extensions/libraries.

Secondly, although the use of transpilation should make development easier by obviating the need to separately maintain desktop and web codebases, subtleties of Transcrypt impair this somewhat. Transcrypt transpiles Python code in a crude manner by walking the AST with minimal contextual knowledge. In order to support operator overloading (used heavily within pyRCV2's counting logic) every use of an operator (including indexing an array!) must then be converted to a function call. This would be very expensive, and so operator overloading must be toggled off and on when needed for individual sections within the code. This introduces a highly frustrating source of bugs, where operator overloading is incorrectly disabled for particular code in Javascript, leading to correct behaviour in Python but incorrect behaviour in Javascript.

Finally, pyRCV2 has reached something of a performance bottleneck. Despite passing through to specialised platform-specific extensions/libraries for the most performance-sensitive tasks, the counting of large elections feels slower than it could be. Counting a ≈350,000 vote election can take up to 15 seconds in Python and 30 seconds in Javascript. While JIT compilation (PyPy) can bring the Python figure down to about 3 seconds, speeding up the interpreted Javascript is much more challenging.

It is this final issue – performance – that ultimately prompted me to look into alternatives.

Initial proofs-of-concept

Supposing that the poor performance of the transpiled Javascript can be chalked up to interpreted-language overhead, a natural solution presents itself in the form of WebAssembly. Particularly for numerical data, compilation to WebAssembly can result in performance gains of several orders of magnitude!

I also wanted to preserve the ability for the same codebase to be run on both desktop and web, so it would be ideal to find a language that can compile for both (not merely have an interpreter hosted on WebAssembly!). The list of options is surprisingly short, but does include Rust.

I know of Rust (primarily from /r/programmingcirclejerk…) though have never worked with it before. While fearless concurrency is perhaps of limited relevance to this project, zero-cost abstractions and a focus on performance have clear value. Rust's promised memory safety guarantees also have appeal to me as someone who usually works with high-level languages and would probably write more memory leaks than features were I to be using C.

I am keenly aware, though, of the need not to get carried away with the new shiny thing! Perhaps Rust/WebAssembly would not result in any performance gains, and any time spent rewriting pyRCV2 would be wasted! Some proofs-of-concept are in order.

The first proof of concept involved reading and parsing a BLT file, including all the ballot weights. This involves file/string parsing, and instantiating a large amount of number objects (in this proof of concept, using rational arithmetic). In Python, this could take over 2 seconds for a large election.

I wrote a quick Python script to call the BLT reading functions of the pyRCV2 library, and ran it on the 2019 Tasmanian Senate election. The election includes 351,988 votes, 44 candidates and 6 seats. Across 5 runs, the mean execution time (±95%CI) was 2.36 (±0.02) seconds. Using PyPy, this was reduced to 0.720 (±0.005) seconds.

I then sketched up an equivalent Rust program to do the same. To my happy surprise, the mean execution time (±95%CI) was a mere 0.178 (±0.002) seconds! This represents an over 3× increase in speed compared with PyPy!

Based on these promising results, I wrote an additional proof-of-concept, also including the distribution of first preferences. This is typically the most computationally expensive stage of the count, as every ballot paper must be examined. The Rust implementation was again faster, so I followed this up with one further proof-of-concept, running through an entire election count in Rust, compared with pyRCV2 (revision 339e149) with equivalent options --numbers rational --surplus-order order --method uig --round-votes 0 --round-quota 0.

This time, the Rust implementation was still faster than PyPy, but only by a disproportionately narrow margin. Profiling using perf revealed that a large amount of CPU time was spent cloning the state of the count after each stage. In pyRCV2, storing copies of previous stage states is used for breaking ties, and for rolling back an election to meet constraints. However, this represents a substantial amount of information, and was clearly very computationally expensive in Rust. Breaking ties based on previous stage states could be implemented without cloning the entire state, and so I adjusted the Rust code to not clone the states.

The results are summarised in the figure below. Error bars are shown, but some are too small to be seen.

For the full count, the mean execution time (±95%CI) was reduced from 2.17 (±0.04) seconds with PyPy to 0.460 (±0.001) seconds with Rust, nearly a 4× increase in speed.

Progress to date

Following the success of these proofs-of-concept, a substantial portion of pyRCV2's feature set has now been ported to Rust/WebAssembly as OpenTally. By way of benchmarking, the figure below compares the performance of pyRCV2 revision 339e149 (Python/PyPy/Transcrypt) with OpenTally revision de7a2ad (Rust/WebAssembly), across various different numeric representations. The election was the aforementioned 2019 Tasmanian Senate election, and was counted using Senate rules (--quota droop --quota-criterion geq --round-quota 0 --pp-decimals 2 --round-votes 0 --surplus uig --surplus-order by_order --exclusion by_value).

For context, Float represents native 64-bit floating-point numbers in all platforms. Fixed represents fixed-point arithmetic, using custom implementations based on decimal (Python), big.js (Transcrypt) and ibig (Rust/WebAssembly). Rational represents rational arithmetic, using fractions (Python), BigRational.js (Transcrypt), rug/GMP (Rust) and num-rational (WebAssembly).

Interestingly, in Python/PyPy, the count using fixed-point arithmetic appears to be significantly slower than using rational arithmetic. Probably this is due to inefficiencies introduced by my custom code, but may speak to the efficiency of Python's fractions library.

For a more apples-to-apples comparison, the following graph compares only the fastest desktop platforms:

The Rust implementation is significantly faster across each mode; roughly between 3× and 10× faster. Each election is counted in Rust in under 1 second.

And turning to the entire point of this endeavour, the following graph compares the web-based platforms:

Again, the Rust-based WebAssembly implementation is significantly faster across the board; roughly between 4× and 11× faster. Using floating- and fixed-point arithmetic, the election is counted in the browser using WebAssembly in under 1 second, more than an order of magnitude better than the ≥10 seconds seen using Transcrypt.

WebAssembly is also faster than Transcrypt using rational arithmetic, but the difference is smaller than the other modes, and smaller than the corresponding difference between desktop platforms. This is likely due to the use of the pure Rust num-rational instead of rug/GMP, as rug/GMP requires libc and is not available for WebAssembly.

Tradeoffs

These performance gains seen with WebAssembly are not without tradeoffs. Obviously, older browsers do not support WebAssembly, potentially reducing the accessibility of OpenTally. Additionally, the compiled WebAssembly is larger than the equivalent Javascript produced by Transcrypt. One factor contributing to this is Rust's monomorphisation of generics, which effectively results in one copy of the count implementation per numeric mode.

At the time of writing, pyRCV2 is approximately 102.2 KiB of Javascript when minified. When gzip-compressed (level 9), this amounts to 21.4 KiB to download.

On the other hand, OpenTally is currently 520.6 KiB of binary WebAssembly, and 17.2 KiB of minified Javascript. When gzip-compressed (level 9), this sums to 153.9 KiB to download.

Note also that OpenTally currently implements only a subset of pyRCV2's features. As OpenTally receives further development, this difference in file size is bound only to grow.

OpenTally is currently being developed at https://yingtongli.me/opentally/.