Porting Python to Rust/WebAssembly: OpenTally dev log
Background and motivation
To date, this has had a number of advantages. Perhaps most notably, Python enabled rapid prototyping and highly interactive debugging, thanks in part to its flexible nature as a duck-typed interpreted language. Progress has been fast, and (lack of testing and validation excepting) pyRCV2 is probably the most feature-complete software for STV election counting that is publicly available.
It is this final issue – performance – that ultimately prompted me to look into alternatives.
I also wanted to preserve the ability for the same codebase to be run on both desktop and web, so it would be ideal to find a language that can compile for both (not merely have an interpreter hosted on WebAssembly!). The list of options is surprisingly short, but does include Rust.
I know of Rust (primarily from /r/programmingcirclejerk…) though have never worked with it before. While fearless concurrency is perhaps of limited relevance to this project, zero-cost abstractions and a focus on performance have clear value. Rust's promised memory safety guarantees also have appeal to me as someone who usually works with high-level languages and would probably write more memory leaks than features were I to be using C.
I am keenly aware, though, of the need not to get carried away with the new shiny thing! Perhaps Rust/WebAssembly would not result in any performance gains, and any time spent rewriting pyRCV2 would be wasted! Some proofs-of-concept are in order.
The first proof of concept involved reading and parsing a BLT file, including all the ballot weights. This involves file/string parsing, and instantiating a large amount of number objects (in this proof of concept, using rational arithmetic). In Python, this could take over 2 seconds for a large election.
I wrote a quick Python script to call the BLT reading functions of the pyRCV2 library, and ran it on the 2019 Tasmanian Senate election. The election includes 351,988 votes, 44 candidates and 6 seats. Across 5 runs, the mean execution time (±95%CI) was 2.36 (±0.02) seconds. Using PyPy, this was reduced to 0.720 (±0.005) seconds.
I then sketched up an equivalent Rust program to do the same. To my happy surprise, the mean execution time (±95%CI) was a mere 0.178 (±0.002) seconds! This represents an over 3× increase in speed compared with PyPy!
Based on these promising results, I wrote an additional proof-of-concept, also including the distribution of first preferences. This is typically the most computationally expensive stage of the count, as every ballot paper must be examined. The Rust implementation was again faster, so I followed this up with one further proof-of-concept, running through an entire election count in Rust, compared with pyRCV2 (revision 339e149) with equivalent options
--numbers rational --surplus-order order --method uig --round-votes 0 --round-quota 0.
This time, the Rust implementation was still faster than PyPy, but only by a disproportionately narrow margin. Profiling using perf revealed that a large amount of CPU time was spent cloning the state of the count after each stage. In pyRCV2, storing copies of previous stage states is used for breaking ties, and for rolling back an election to meet constraints. However, this represents a substantial amount of information, and was clearly very computationally expensive in Rust. Breaking ties based on previous stage states could be implemented without cloning the entire state, and so I adjusted the Rust code to not clone the states.
The results are summarised in the figure below. Error bars are shown, but some are too small to be seen.
For the full count, the mean execution time (±95%CI) was reduced from 2.17 (±0.04) seconds with PyPy to 0.460 (±0.001) seconds with Rust, nearly a 4× increase in speed.
Progress to date
Following the success of these proofs-of-concept, a substantial portion of pyRCV2's feature set has now been ported to Rust/WebAssembly as OpenTally. By way of benchmarking, the figure below compares the performance of pyRCV2 revision 339e149 (Python/PyPy/Transcrypt) with OpenTally revision de7a2ad (Rust/WebAssembly), across various different numeric representations. The election was the aforementioned 2019 Tasmanian Senate election, and was counted using Senate rules (
--quota droop --quota-criterion geq --round-quota 0 --pp-decimals 2 --round-votes 0 --surplus uig --surplus-order by_order --exclusion by_value).
For context, Float represents native 64-bit floating-point numbers in all platforms. Fixed represents fixed-point arithmetic, using custom implementations based on decimal (Python), big.js (Transcrypt) and ibig (Rust/WebAssembly). Rational represents rational arithmetic, using fractions (Python), BigRational.js (Transcrypt), rug/GMP (Rust) and num-rational (WebAssembly).
Interestingly, in Python/PyPy, the count using fixed-point arithmetic appears to be significantly slower than using rational arithmetic. Probably this is due to inefficiencies introduced by my custom code, but may speak to the efficiency of Python's fractions library.
For a more apples-to-apples comparison, the following graph compares only the fastest desktop platforms:
The Rust implementation is significantly faster across each mode; roughly between 3× and 10× faster. Each election is counted in Rust in under 1 second.
And turning to the entire point of this endeavour, the following graph compares the web-based platforms:
Again, the Rust-based WebAssembly implementation is significantly faster across the board; roughly between 4× and 11× faster. Using floating- and fixed-point arithmetic, the election is counted in the browser using WebAssembly in under 1 second, more than an order of magnitude better than the ≥10 seconds seen using Transcrypt.
WebAssembly is also faster than Transcrypt using rational arithmetic, but the difference is smaller than the other modes, and smaller than the corresponding difference between desktop platforms. This is likely due to the use of the pure Rust num-rational instead of rug/GMP, as rug/GMP requires libc and is not available for WebAssembly.
Note also that OpenTally currently implements only a subset of pyRCV2's features. As OpenTally receives further development, this difference in file size is bound only to grow.
OpenTally is currently being developed at https://yingtongli.me/opentally/.