The Complexity of Usage-Based Billing

It all started off with long, convoluted sales contracts that you had to sign in order to start using a product. Then came the revolution of SaaS subscription models, where you pay a monthly fee regardless of your usage of the service. Unfortunately, a subscription model doesn’t always operate in the interest of the customer or the company. As a byproduct of the limitations of subscription models, we've witnessed the rise of the usage-based billing model. It has the potential to revolutionize the way we consume and pay for services.

What is Usage-Based Billing?

Usage-based billing is the strategy of monetizing a product in proportion to how much of it a customer consumes. This aligns the incentives of the consumer and the producer. A well-known example is Uber’s pricing model, which has a base price plus an amount that increases with the distance you travel. From startups to tech giants such as AWS, GCP and Azure, a lot of companies have adopted the usage-based billing model, and its adoption will only increase as more people realize its value. The question then is, what does it take for your business to bill like the giants? Well, the first step is to meter your customers.

What is Metering?

Metering is at the core of usage-based billing. It is the process of collecting usage data that can then be aggregated and transformed into a bill according to a price plan. The data that I am talking about is found in different parts of your system, such as your infrastructure (data storage, network transfer), applications (number of active users, number of events, number of images, number of API calls), and maybe even external services (number of logins, number of emails). Once you grab those meters, you need to store them, aggregate them, create prices that map to the meters, and finally use those prices to charge the customers.
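
As a rough illustration of that flow, here is a minimal Python sketch that sums raw usage events per customer and meter and then maps the totals onto a price plan. The event fields, meter names and prices are all made up for the example, and it assumes the simplest possible plan: a flat price per unit.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    customer_id: str
    meter: str          # hypothetical meter names, e.g. "api_calls"
    quantity: Decimal


# Hypothetical price plan: a flat price per unit for each meter.
PRICE_PLAN = {
    "api_calls": Decimal("0.001"),
    "emails_sent": Decimal("0.01"),
}


def aggregate(events):
    """Sum the reported quantities per (customer, meter) pair."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for event in events:
        totals[(event.customer_id, event.meter)] += event.quantity
    return dict(totals)


def bill(events, price_plan=PRICE_PLAN):
    """Map aggregated usage onto the price plan; return the amount owed per customer."""
    owed = defaultdict(Decimal)
    for (customer_id, meter), quantity in aggregate(events).items():
        owed[customer_id] += quantity * price_plan[meter]
    return dict(owed)
```

Decimal is used instead of float so that money amounts don't pick up binary rounding errors.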

OK, so this sounds easy enough, right? Well, not really. There are several difficulties in implementing a system that actually does this accurately. Let’s take a look at what makes this problem so intricate and time-consuming, and why these giants have hundreds of engineers working on it.


What makes it Complex?

From start to finish, building a usage-based billing system isn’t easy. There are several unknowns that are difficult to foresee but cause a lot of problems if they aren’t addressed in the beginning. I will walk you through the major problems (definitely non-exhaustive) that are important to keep in mind.

Setting up the metering pipeline

There are several sources of data in your system, as I discussed before, and most companies want to charge based on some combination of those values. How do you collect this data? You need to set up a resilient and scalable data pipeline that any part of your system can send metrics to. If you’ve built this sort of thing before, you know that doing this while ensuring availability and high throughput is not easy. If your pipeline is down even for a little bit, then your entire system will be blocked on sending metering data until it’s up again. Some common solutions to this problem include creating a simple pub-sub mechanism or having a central data lake that parts of your system can write to directly. Each of these has its own tradeoffs. With a centralized data lake, you create a potential bottleneck if there are many input sources, whereas with a pub-sub mechanism you need to handle replication of your input queue to maintain scalability.
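
One common way to soften the "blocked while the pipeline is down" problem is to buffer events on the sending side. The sketch below is only an illustration of that idea, not a production client: `BufferedMeterClient` and the `send` callable are hypothetical names, the buffer is in memory and unbounded (a real system would likely spill to disk), and a failed send is assumed to raise `ConnectionError`.

```python
from collections import deque
from itertools import islice


class BufferedMeterClient:
    """Buffers metering events locally so callers never wait on the pipeline."""

    def __init__(self, send):
        # `send` is a hypothetical callable that ships one batch (a list of
        # events) to the pipeline and raises ConnectionError when it is down.
        self._send = send
        self._buffer = deque()  # assumption: in-memory and unbounded

    def record(self, event):
        """Called from application code; only appends, never does network I/O."""
        self._buffer.append(event)

    def pending(self):
        return len(self._buffer)

    def flush(self, batch_size=100):
        """Ship buffered events in order; keep them if the pipeline is down."""
        shipped = 0
        while self._buffer:
            batch = list(islice(self._buffer, batch_size))
            try:
                self._send(batch)
            except ConnectionError:
                break  # leave the events buffered for the next flush attempt
            # Remove events only after the batch was accepted.
            for _ in batch:
                self._buffer.popleft()
            shipped += len(batch)
        return shipped
```

Because events are removed only after a batch is accepted, a crash between sending and removing can deliver the same batch twice, so the receiving side still needs to handle duplicates.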


So much storage!


OK, so now you have that metering pipeline all good to go, but where are you actually going to store SO MANY metrics? If you have a backend that is rapidly growing and new meters being added every day, your data growth rate is definitely going to be super-linear if not downright exponential. These metrics are only relevant for post-processing and billing, so where are you going to store this data so that you can actually use it down the line? The primary difficulty is the COST combined with data retrieval speeds. Imagine trying to post-process the data to create invoices for thousands of customers while being bottlenecked by your slow storage layer. Additionally, you need to carefully manage how you tier and expire your data to walk the fine line between fast reads/writes and the cost of doing so. Again, many options are available, ranging from object stores such as AWS S3 all the way up to data warehouses such as Snowflake.
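
To make the tier/expire idea concrete, here is a tiny sketch of an age-based policy. The tier names and both time windows are purely illustrative assumptions; the right values depend on your billing cycle, dispute window and compliance requirements.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy; both windows are illustrative assumptions.
HOT_WINDOW = timedelta(days=35)    # current billing cycle plus a margin
WARM_WINDOW = timedelta(days=400)  # roughly a year, e.g. for billing disputes


def storage_tier(event_time, now):
    """Pick a storage tier for a metric based only on its age."""
    age = now - event_time
    if age <= HOT_WINDOW:
        return "hot"     # fast, expensive storage used for invoicing runs
    if age <= WARM_WINDOW:
        return "warm"    # cheaper storage with slower reads
    return "expire"      # old enough to delete or archive
```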


Not-so-fun aggregations

Pipeline, done. Storage, done. Time to get into the actual meat of the calculations. Aggregations for these varied streams of metrics can be done in real time and/or as a post-processing job that is run whenever you want to bill your customer. What sort of aggregations are we talking about here? We need to ensure that our metric streams are labelled or tagged correctly so that they can be attributed to the right customer. Additionally, each metric needs to carry information about which aggregation function should be used to combine its values scattered across time.
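
A small sketch of what "the right aggregation function per metric" could look like in Python. The meter names and the choice of `sum` versus `max` are assumptions for illustration: a counter such as API calls is summed, while something like active users might be billed on its peak.

```python
from collections import defaultdict

# Hypothetical meter definitions: each meter declares how its samples combine.
AGGREGATORS = {
    "api_calls": sum,      # counter: add up every sample in the period
    "active_users": max,   # peak value observed in the period
}


def aggregate_by_customer(samples):
    """Combine samples per (customer, meter) using that meter's function.

    `samples` is an iterable of (customer_id, meter, value) tuples; the
    customer_id is the label that attributes each sample to a customer.
    """
    grouped = defaultdict(list)
    for customer_id, meter, value in samples:
        grouped[(customer_id, meter)].append(value)
    return {
        (customer_id, meter): AGGREGATORS[meter](values)
        for (customer_id, meter), values in grouped.items()
    }
```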

The first problem is that the architecture used for such aggregations has to scale to handle high ingest rates without throttling output. Distributed compute systems such as Spark and fleets of AWS Lambda functions (or any serverless system offering almost infinite scalability) are good candidates for such an architecture. The bigger problem is that even if you know which system to use, you need to understand it thoroughly so that you can partition the data well enough to make maximum use of your architecture.
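
As one example of a partitioning choice, the sketch below assigns every event to a partition based on a stable hash of the customer ID, so that all of a customer's events land on the same worker. The function name is hypothetical and this is only one possible scheme.

```python
import hashlib


def partition_for(customer_id, num_partitions):
    """Map a customer to a partition using a hash that is stable across processes."""
    # Python's built-in hash() is randomized per process for strings, so a
    # cryptographic digest is used here purely for its stability.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Partitioning by customer keeps per-customer aggregation local to one worker, but a single very large customer can still overload its partition, which is exactly the kind of detail you need to understand about your data before settling on a scheme.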

Will it be accurate?

Even if you have gotten far enough to have a well-functioning system that can ingest, store and process your usage data, you still need to consider the accuracy of the system as well as its edge cases. Issues include:

Double counting: It is imperative to prevent duplicate metrics from being counted twice in the user’s bill.
Late arriving data: It is possible that not all the data from the previous billing cycle has arrived when it’s time to charge the user. How do you deal with grace periods?
Correcting old measurements: Sometimes systems behave erroneously and send out wrong metrics, but we definitely don’t want to overcharge or undercharge the customer because of this. There needs to be an easy-to-use method of retroactively correcting metrics in order to ensure system integrity.
Proration: Most customers want to be charged fairly. This means billing by the second, so they get charged for EXACTLY what they are using and nothing more. Implementing per-second proration has challenges related to time-dependent metrics (discussed below).
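
The first two issues can be sketched in a few lines of Python. This is a simplified illustration under some loud assumptions: every event carries a unique `event_id` assigned by the sender and an `occurred_at` timestamp, events are plain dicts, and the six-hour grace period is an arbitrary example value.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=6)  # hypothetical; tune to how late data arrives


def dedupe(events):
    """Keep only the first event seen for each event_id."""
    seen = set()
    unique = []
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            unique.append(event)
    return unique


def can_close_cycle(cycle_end, now):
    """Only invoice once the grace period for late events has passed."""
    return now >= cycle_end + GRACE_PERIOD


def events_for_cycle(events, cycle_start, cycle_end):
    """Attribute events by when the usage happened, not when it arrived."""
    return [
        event
        for event in dedupe(events)
        if cycle_start <= event["occurred_at"] < cycle_end
    ]
```

Events that show up after the grace period still need a policy of their own, for example rolling them into the next invoice.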


Dealing with time-dependent metrics

The intricacies of time-dependent metrics are generally overlooked when thinking about post-processing data. Let’s take a simple example:

A very commonly used usage-based metric to charge the customer for is storage. Assume the unit of storage is GB. A customer is using 10 GB of storage. How much do they owe? It clearly depends on how long the storage is allocated (per hour? per month?). So how can you handle this? You could start a timer in your backend, but what if they add another GB? Do you start another timer, and if so, how do you aggregate them?

This example asks a lot of open-ended questions, but it gets the point across that gauge meters (almost always time-dependent) have a lot of overlooked engineering complexity.
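
One possible answer to the timer question, sketched under assumptions: instead of running timers, record each change in allocated storage with its timestamp and integrate the value over the billing period afterwards to get GB-hours. The function name and the input shape are hypothetical.

```python
from datetime import datetime, timezone


def gb_hours(changes, period_start, period_end):
    """Integrate a storage gauge over a billing period.

    `changes` is a list of (timestamp, gb_allocated) pairs sorted by
    timestamp; each value is assumed to hold until the next change.
    """
    total = 0.0
    for i, (changed_at, gb) in enumerate(changes):
        # The value holds until the next change, or until the period ends.
        next_change = changes[i + 1][0] if i + 1 < len(changes) else period_end
        # Clip the segment to the billing period.
        segment_start = max(changed_at, period_start)
        segment_end = min(next_change, period_end)
        if segment_end > segment_start:
            hours = (segment_end - segment_start).total_seconds() / 3600
            total += gb * hours
    return total
```

For the example above, 10 GB held for the first 12 hours of a day and 11 GB for the remaining 12 hours comes to 10 × 12 + 11 × 12 = 252 GB-hours, which can then be priced like any other aggregated meter.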

Need to handle billing and invoicing

Now it's time to actually charge the customer. Luckily for us, great payment processors such as Stripe, Paddle and others already exist. Integrating with them is pretty easy, but what about some important questions like:

How do I charge for dynamic usage?
What if the customer wants detailed invoicing based on their metered usage?
How do I make the invoice compliant with different standards?
Can I customize the invoice or the email that gets sent with it?
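
For the detailed-invoicing question, here is a minimal sketch of turning aggregated usage into invoice line items. It does not call any payment processor's API; `build_invoice` and its fields are hypothetical, quantities are assumed to be integers or Decimals, and rounding each line to cents with half-up rounding is just one possible convention.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def build_invoice(customer_id, usage, unit_prices):
    """Build one line item per meter from aggregated usage.

    `usage` maps meter name -> quantity, `unit_prices` maps meter name ->
    Decimal price per unit. Both shapes are assumptions for this sketch.
    """
    lines = []
    for meter, quantity in sorted(usage.items()):
        amount = (Decimal(quantity) * unit_prices[meter]).quantize(
            CENT, rounding=ROUND_HALF_UP
        )
        lines.append(
            {
                "meter": meter,
                "quantity": quantity,
                "unit_price": unit_prices[meter],
                "amount": amount,
            }
        )
    # Sum the already-rounded lines so the total matches what the customer sees.
    total = sum((line["amount"] for line in lines), Decimal("0"))
    return {"customer_id": customer_id, "lines": lines, "total": total}
```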

Luckily, this definitely seems to be the easier aspect of building a usage-based system.

A Few Parting Words

I wanted to write this in order to go over some of the major problems you face when either migrating from a purely subscription model or creating a usage-based model from scratch. In no way is the list above exhaustive, but hopefully it gave you a deeper insight into the traps to watch out for when you are building your own solution or looking for an off-the-shelf product. It’s good to keep these problems in the back of your head so that the solution you end up going with actually addresses them.