Motivation

Double-entry bookkeeping is the foundation of modern accounting. I have recently been developing DrCr, open-source software for double-entry bookkeeping, which has proved fruitful – a further article about the software will be forthcoming. In simple terms, generating accounting reports from double-entry transaction data can be sketched out as follows:

  1. Get transaction data from the database
  2. Compute running totals across all transactions, for each account (trial balance)
  3. Transfer entries from the trial balance to the balance sheet or income statement, as appropriate

This was initially implemented in an imperative style, with code flow essentially mirroring this sequence. For example, GetDBTransactions() is called by GetTrialBalance(), which is in turn called by GetBalanceSheet(), generating the final report.

Complexity began to emerge when implementing facilities to calculate income tax and programmatically generate additional transactions representing income tax expenses, tax withholding, etc. The income tax step depends on the existing account balances (trial balance), and the transactions generated must in turn be reflected in an updated trial balance. In a stepwise imperative style, the sketch now looks like:

  1. Get transaction data from the database
  2. Compute running totals across all transactions, for each account (trial balance)
  3. Generate income tax transactions from the trial balance
  4. Update the trial balance using the income tax transactions
  5. Transfer entries from the updated trial balance to the balance sheet or income statement

Several additional considerations conspire to make the stepwise way of thinking about the process increasingly inapposite. Firstly, it is desirable to allow for extensibility, so that a user-supplied plugin can supply its own steps and modify the transactions or balances at any stage of the process – for example, the income tax steps could be implemented as a plugin so it could be disabled if not required, or swapped out for a different plugin if the user has different tax requirements.

Secondly, it is desirable to generate comparative reports; for example, one report for each month of the year. Using a naive stepwise approach, we could repeat the set of steps 12 times, one for each month, but this would be inefficient due to duplicating work. The closing balances at the end of January, for example, are also the opening balances at the start of February, so we need compute this only once, and not once in each month.

Thirdly, particularly if generating comparative reports, we could improve efficiency by running time-consuming operations in parallel. For example, the balances for each month could each be obtained from the database in parallel. However, for other steps, we do not easily know which steps are independent of others (and can be parallelised) and which are not.

Fourthly, generating trial balances by iterating over transactions and tallying running totals is computationally inefficient. The efficiency of this step could be improved by caching trial balances in the database. Step 1 can then be skipped entirely, and step 2 can be obtained directly from the database. However, because of the need for extensibility, it is possible that some subsequent plugin-supplied step may require the individual transaction data. If we naively start at step 1 and proceed step-by-step, we do not easily know at step 1 whether it can be skipped or not.

Based on these considerations, it becomes apparent that the process might be more aptly represented as a graph, rather than a linear sequence of steps. For example:

Graph sketching calculation steps

Approach

In DrCr, abstractions are implemented to describe the steps of generating reports. 3 core types of reporting products are identified, which abstracts the idea discussed above that some steps, say, depend on balance information, whereas others require more granular transaction information. The 3 types of reporting products are:

  • transaction list (Transactions)
  • cumulative balances for all accounts (trial balance) at a particular date (BalancesAt)
  • net balances between two dates (BalancesBetween)

For example, a balance sheet (‘as at xxx’) is generated from BalancesAt data, whereas an income statement (‘for the year ended xxx’) is generated from BalancesBetween data.

Reporting steps generate one or more reporting products, and may in turn depend on zero or more previous reporting products. For example, if CombineOrdinaryTransactions represents all transactions prior to income tax, the CalculateIncomeTax reporting step depends on CombineOrdinaryTransactions.BalancesBetween, and in turn generates CalculateIncomeTax.Transactions. When the income tax plugin is enabled, the balance sheet will depend on CalculateIncomeTax.BalancesAt, which combines CombineOrdinaryTransactions.BalancesAt with the transactions generated by CalculateIncomeTax.

Finally, reporting step builders describe how to dynamically generate reporting steps based on other reporting steps and reporting products. For example, the CalculateIncomeTax reporting step generates CalculateIncomeTax.Transactions, whereas the balance sheet depends instead on CalculateIncomeTax.BalancesAt. A built-in reporting step builder called UpdateBalancesAt notices that CalculateIncomeTax generates Transactions from BalancesAt, and describes how to combine the two to produce an updated BalancesAt product.

The following graph – generated by DrCr! – illustrates all reporting steps and reporting products required to generate a basic balance sheet and income statement. A rectangular node indicates a reporting step, a round node indicates a reporting product, and a dashed border indicates a reporting step generated by a reporting step builder.

Graph of DrCr calculation steps

Generating a report in DrCr proceeds by first specifying the target reporting products. For example, if we require a balance sheet, we know we must generate the AllTransactions.BalancesAt product. We then recursively identify steps which build that product (considering reporting step builders when required), and the reporting products those steps depend on. The recursive process terminates when we reach steps (such as getting balances from the database) which have no dependencies. This yields a directed acyclic graph, such as that above.

A topological sorting of the graph yields an order in which we can execute the reporting steps to generate the final product. Alternatively and more efficiently, we can spawn threads to execute in parallel every step which either has no dependencies or whose dependencies are already satisfied, queueing further steps for execution once their dependencies are generated.

To illustrate how this method is extensible, we can add more steps to the graph:

  1. In addition to DBBalances we can add another step called PostUnreconciledStatementLines, which CalculateIncomeTax will also depend on. The balances for PostUnreconciledStatementLines must be generated by iterating over transactions, and (unlike DBBalances) are not directly obtained from the database.
  2. We add steps CurrentYearEarningsToEquity and RetainedEarningsToEquity, which transfer the balances of all income statement accounts to equity on the balance sheet. These steps are applied only for the balance sheet, and not for the income statement.

This yields the following graph – also generated by DrCr:

Graph of more complex DrCr calculation steps

Note that:

  1. The graph correctly shows that PostUnreconciledStatementLines.BalancesAt depends on PostUnreconciledStatementLines.Transactions, whereas DBBalances.BalancesAt has no dependencies because it can be directly generated and no other step depends on the underlying transactions.
  2. The graph correctly shows that CurrentYearEarningsToEquity and RetainedEarningsToEquity are executed only when generating the balance sheet, and not the income statement.

Using this approach on a database with 10,000 postings, DrCr can generate 12 comparative balance sheet reports in 200 milliseconds, compared with several seconds under the naive stepwise implementation.

Source code for this implementation is in the DrCr repository.