Reducing binary size with DIY dynamic dispatch: OpenTally dev log

Background

OpenTally is open source software for the counting of preferential voting elections. One feature of OpenTally is support for various different representations of numbers, from floating-point arithmetic through to exact rational representations.

In OpenTally, these are implemented using generics. For example, consider the following (simplified) example of one STV-related function:

/// Determine if the given candidate meets the quota
fn meets_quota<N: Number>(quota: &N, count_card: &CountCard<N>) -> bool {
	return count_card.votes >= *quota;
}

Most STV-related functions, then, are generic on the type of numeric representation. In Rust, generic functions are implemented using monomorphisation. The consequence is effectively that each type of numeric representation carries an entire copy of the STV counting code within the compiled binary.

For example, let us examine the contents of the compiled OpenTally WebAssembly binary to see what functions are taking this most space:

$ twiggy top html/opentally_bg.wasm
Shallow Bytes │ Shallow % │ Item
──────────────┼───────────┼──────────────────────────────────────────────────────────────────
        93328 ┊     7.99% ┊ "function names" subsection
        43191 ┊     3.70% ┊ data[0]
        13080 ┊     1.12% ┊ num_bigint::biguint::multiplication::mac3::ha37477f8913e22e7
        12014 ┊     1.03% ┊ core::slice::sort::recurse::hf136fbd531783fe0
         9793 ┊     0.84% ┊ opentally::stv::gregory::distribute_surpluses::h6eb01d642cc78e1b
         9104 ┊     0.78% ┊ opentally::stv::gregory::exclude_candidates::ha2137153cbdca7ad
         8814 ┊     0.75% ┊ opentally::stv::gregory::distribute_surpluses::hc660939b0891ae99
         8653 ┊     0.74% ┊ opentally::stv::gregory::distribute_surpluses::h7d4a0e91381a201e
         8149 ┊     0.70% ┊ sha2::sha256::soft::compress::hffa8886738bfb499
         8124 ┊     0.70% ┊ opentally::stv::gregory::exclude_candidates::h918bf262647b6830
         7946 ┊     0.68% ┊ opentally::stv::gregory::exclude_candidates::h8677c5d8e9352129
...

We can see that, for example, the functions opentally::stv::gregory::distribute_surpluses and opentally::stv::gregory::exclude_candidates have been duplicated multiple times, one for each type of numeric representation. The same is true of most of the STV-related functions.

In an earlier dev log we noted that OpenTally was quite large – 153.9 KiB gzip-compressed, compared to only 21.4 KiB for a previous JavaScript implementation. Since then, with the addition of further features, the difference has only grown, with OpenTally now clocking in at about 196.5 KiB gzip-compressed.

It is natural to surmise that some of this difference may be due to Rust's handling of generics. In the JavaScript, we can make use of duck typing to reuse the same code for different numeric representations, but Rust's monomorphisation results in one separate implementation per type.

Rust provides an alternative to monomorphisation in the form of trait objects with dynamic dispatch, but this cannot be used when methods have the return type Self, which our number types make heavy use of (e.g. fn add(self, rhs: Self) -> Self). Instead, we can implement our own quasi-‘dynamically dispatched’ wrapper around our numeric representations, instead of using generics.

The approach

The general approach is to implement an additional numeric representation, which we call DynNum, as a wrapper around the other numeric representations:¹

pub union DynNum {
	fixed: Fixed,
	gfixed: GuardedFixed,
	float64: NativeFloat64,
	rational: Rational,
}

Based on a global static flag, DynNum then passes through arithmetic operations to the underlying concrete types. For example:

static mut KIND: NumKind = NumKind::Fixed;

impl Add for DynNum {
	type Output = Self;
	fn add(self, rhs: Self) -> Self::Output {
		unsafe { // static mut and union field read are unsafe
			match KIND {
				NumKind::Fixed => {
					DynNum { fixed: self.fixed + rhs.fixed }
				}
				NumKind::GuardedFixed => {
					DynNum { gfixed: self.gfixed + rhs.gfixed }
				}
				// ... etc.
			}
		}
	}
}

In this way, there only needs to be a single instance of meets_quota and other STV-related functions implemented on DynNum, avoiding the duplication caused by monomorphisation. The tradeoff is that each numeric operation will involve a branch and additional indirection, which might impact performance.

Benchmarks

With the use of DynNum instead of monomorphisation, the uncompressed size of the compiled WebAssembly falls from 1.1 MiB (revision c070ec8) to 603.3 KiB (revision dde7952 in the dynnum branch), a reduction of 47%. When gzip compressed, the equivalent fall is from 272.0 KiB to 189.4 KiB. The smaller reduction in compressed size of 30% is understandable, given that the monomorphised instances of each function are probably highly similar.

In terms of performance, benchmarks were performed under the same terms as an earlier dev log, with the 351,988-vote 2019 Tasmanian Senate election and preset Australian Senate STV rules. Only WebAssembly implementations were compared. The mean execution time (±95%CI) was 4.57 (±0.07) seconds with monomorphisation, compared with 4.90 (±0.09) seconds with DynNum, which represents a slight 6.6% decrease in speed.

Discussion

The results of the benchmarks show that the DynNum strategy reduced compressed code size by 30% with a measurable but minimal impact on performance. Due to the small magnitude of the size reduction in absolute terms (less than 100 KiB compressed), and additional complexity of managing the DynNum code, this will not be merged into the master branch of OpenTally at this time, but remains a promising option for the future.

Footnotes

There are some hidden details here, such as that most of these fields must be wrapped by ManuallyDrop and we must explicitly handle dropping them, but we omit those details in this discussion for simplicity. ↩