SaaS Inception: The gotchas of dogfooding
With SaaS inception, we dogfood our product and benefit from using it.
It all started off with long, convoluted sales contracts that you had to sign in order to start using a product. Then came the revolution of SaaS subscription models, where you pay a monthly fee regardless of your usage of the service. Unfortunately, a subscription model doesn’t always operate in the interest of the customer or the company. As a byproduct of the limitations of subscription models, we've witnessed the rise of the usage-based billing model. It has the potential to revolutionize the way we consume and pay for services.
Usage-based billing is the strategy of monetizing a product in proportion to how much of it a customer consumes. This aligns the incentives of the consumer and the producer. A well-known example is Uber’s pricing model, which has a base price plus an amount that increases with the distance you travel. From startups to tech giants such as AWS, GCP and Azure, a lot of companies have adopted the usage-based billing model, and its adoption will only increase as more people realize its value. The question then is, what does it take for your business to bill like the giants? Well, the first step is to meter your customers.
Metering is at the core of usage-based billing. Metering is the process of collecting usage data that can then be aggregated and transformed into a bill according to a price plan. This data is found in different parts of your system, such as your infrastructure (data storage, network transfer), your applications (number of active users, number of events, number of images, number of API calls), and maybe even external services (number of logins, number of emails). Once you have those meters, you need to store them, aggregate them, create prices that map to the meters, and finally use those prices to charge the customers.
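To make that flow concrete, here is a minimal sketch of turning raw meters into a bill. The customer names, meter names and unit prices are all made up for illustration, and a real price plan would likely have tiers, minimums and currencies that this ignores.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical usage events: (customer_id, meter_name, quantity).
events = [
    ("acme", "api_calls", 1200),
    ("acme", "storage_gb", 10),
    ("globex", "api_calls", 300),
    ("acme", "api_calls", 800),
]

# Hypothetical price plan: a flat unit price per meter.
price_plan = {
    "api_calls": Decimal("0.001"),
    "storage_gb": Decimal("0.10"),
}


def compute_bills(events, price_plan):
    """Aggregate usage per customer and meter, then price it."""
    usage = defaultdict(lambda: defaultdict(int))
    for customer, meter, quantity in events:
        usage[customer][meter] += quantity

    bills = {}
    for customer, meters in usage.items():
        # Decimal avoids floating-point rounding surprises on money.
        bills[customer] = sum(
            (price_plan[meter] * qty for meter, qty in meters.items()),
            Decimal("0"),
        )
    return bills


bills = compute_bills(events, price_plan)
```

Every step here (collect, aggregate, price) is trivial at this scale. The rest of this post is about what happens when it is not.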
Ok, so this sounds easy enough, right? Well, not really. There are several difficulties in implementing a system that actually does this accurately. Let’s take a look at what makes this problem so intricate and time consuming, and why these giants have hundreds of engineers working on it.
From start to finish, building a usage based billing system isn’t easy. There are several unknowns that are difficult to foresee but cause a lot of problems if they aren’t addressed in the beginning. I can hopefully walk you through the major problems (definitely non-exhaustive) that are important to keep in mind.
There are several sources of data in your system, as I discussed before, and most companies want to charge based on some combination of those values. How do you collect this data? You need to set up a resilient and scalable data pipeline that any part of your system can send metrics to. If you’ve built this sort of thing before, you know that doing this while ensuring availability and high throughput is not easy. If your pipeline is down even for a little bit, then your entire system will be blocked on sending metering data till it’s up again. Some common solutions to this problem include creating a simple pub-sub mechanism or having a central data lake that parts of your system can write to directly. Each of these has its own tradeoffs. With a centralized data lake you create a potential bottleneck if there are many input sources, whereas with a pub-sub mechanism you need to handle replication of your input queue to maintain scalability.
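One way to keep the rest of your system from blocking on the pipeline is to buffer events locally and ship them in the background. The sketch below is only an illustration of that idea: `BufferedMeterClient` and the `send` callable are hypothetical names, and dropping events when the buffer is full is a simplification (for billing data you would more likely spill to disk).

```python
from collections import deque


class BufferedMeterClient:
    """Buffers metering events so callers never wait on the pipeline."""

    def __init__(self, send, max_buffer=10_000):
        # `send` is a hypothetical callable that ships one event and
        # raises ConnectionError when the pipeline is unreachable.
        self._send = send
        self._buffer = deque()
        self._max_buffer = max_buffer
        self.dropped = 0

    def record(self, event):
        """Called from application code; returns immediately."""
        if len(self._buffer) >= self._max_buffer:
            self.dropped += 1  # simplification: count and drop
            return
        self._buffer.append(event)

    def flush(self):
        """Called periodically; returns how many events were shipped."""
        sent = 0
        while self._buffer:
            event = self._buffer.popleft()
            try:
                self._send(event)
            except ConnectionError:
                # Pipeline is down: keep the event, preserve ordering,
                # and try again on the next flush.
                self._buffer.appendleft(event)
                break
            sent += 1
        return sent
```

Application code only ever calls `record`, so an outage in the pipeline shows up as a growing buffer rather than as blocked requests.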
Ok, so now you have that metering pipeline all good to go, but where are you actually going to store SO MANY metrics? If you have a backend that is rapidly growing and new meters being added every day, your data growth rate is definitely going to be super-linear if not downright exponential. These metrics are only relevant for post-processing and billing, so where are you going to store this data so that you can actually use it down the line? The primary difficulty is the COST combined with data retrieval speeds. Imagine trying to post-process the data to create invoices for thousands of customers while being bottlenecked by your slow storage layer. Additionally, you need to carefully manage how you tier and expire your data to walk the fine line between fast reads/writes and the cost of getting them. Again, many options are available, from object stores such as AWS S3 all the way up to data warehouses such as Snowflake.
Pipeline, done. Storage, done. Time to get into the actual meat of the calculations. Aggregations for these varied streams of metrics can be done real-time and/or as a post processing job that is run whenever you want to bill your customer. What sort of aggregations are we talking about here? We need to ensure that our metric streams are labelled or tagged correctly so that they can be attributed to the right customer. Additionally, there should be relevant information about what sort of aggregation functions to use to actually combine the same metric scattered across time.
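As a rough illustration of what "labelled correctly, with the right aggregation function" can look like, the sketch below groups samples by customer and meter and then applies a per-meter function. The meter names and the choice of `sum` versus `max` are assumptions made for the example, not a recommendation.

```python
from collections import defaultdict

# Hypothetical registry: which function combines samples of each meter.
AGGREGATIONS = {
    "api_calls": sum,     # counter: add up every sample in the period
    "active_users": max,  # peak: bill on the highest value seen
}


def aggregate(samples):
    """Group samples by (customer, meter) and apply the meter's function."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[(sample["customer"], sample["meter"])].append(sample["value"])
    return {
        (customer, meter): AGGREGATIONS[meter](values)
        for (customer, meter), values in grouped.items()
    }


# Each sample carries the labels needed to attribute it to a customer.
samples = [
    {"customer": "acme", "meter": "api_calls", "value": 120},
    {"customer": "acme", "meter": "api_calls", "value": 80},
    {"customer": "acme", "meter": "active_users", "value": 14},
    {"customer": "acme", "meter": "active_users", "value": 9},
    {"customer": "globex", "meter": "api_calls", "value": 30},
]
totals = aggregate(samples)
```

A sample without a customer label cannot be billed to anyone, which is why tagging has to happen at the source rather than at invoice time.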
The first problem is that the architecture used for such aggregations has to be scalable enough to handle high ingest rates without throttling output. Distributed compute systems such as Spark and fleets of AWS Lambda functions (or any serverless system offering almost infinite scalability) are good candidates for such an architecture. The bigger problem is that even if you know which system to use, you need to understand it thoroughly so that you can partition the data well enough to make the most of your architecture.
Even if you have gotten far enough to have a well functioning system that can ingest, store and process your usage data, you definitely need to consider the accuracy as well as the edge cases of the system. Issues include:
The different intricacies of time dependent metrics are generally overlooked when thinking about post processing data. Let’s take a simple example:
A very commonly used usage-based metric to charge the customer on is storage. Assume the unit of storage is GB. A customer is using 10 GB of storage; how much do they owe? It clearly depends on how long the storage is allocated (per hour? per month?). So how can you handle this? You could start a timer in your backend, but what if they add another GB? Do you start another timer, and if so, how do you aggregate them?
This example asks a lot of open-ended questions but gets the point across that gauge meters (almost always time dependent) have a lot of overlooked engineering complexity.
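One possible answer to the storage example, sketched under the assumption that you record every change in the storage level with a timestamp, is to bill on GB-hours instead of running timers. The function name and the event shape below are hypothetical.

```python
from datetime import datetime, timedelta


def gb_hours(changes, period_end):
    """Integrate a storage gauge over time.

    `changes` is a list of (timestamp, storage_gb) tuples sorted by
    timestamp; each entry sets the level from that moment until the
    next change (or until `period_end` for the last one).
    """
    total = 0.0
    boundaries = changes[1:] + [(period_end, None)]
    for (start, level), (end, _) in zip(changes, boundaries):
        hours = (end - start).total_seconds() / 3600
        total += level * hours
    return total


day_start = datetime(2024, 1, 1)
changes = [
    (day_start, 10),                        # 10 GB from midnight
    (day_start + timedelta(hours=12), 11),  # one more GB added at noon
]
usage = gb_hours(changes, period_end=day_start + timedelta(hours=24))
# 10 GB * 12 h + 11 GB * 12 h = 252 GB-hours
```

This still leaves questions open, such as what to do when a change event arrives late or out of order.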
Now it's time to actually charge the customer. Luckily for us, great payment processors such as Stripe, Paddle and others already exist. Integrating with them is pretty easy, but what about some important questions like:
Luckily, this definitely seems to be the easier aspect of building a usage based system.
There’s always room to spice it up by adding discounts, coupons and free trials. You want to provide an easy way for your customers to try out your product while still moving them onto a price plan once the free trial is over. One common kind of free trial is usage based. You need to figure out when a customer hits a usage cap, then stop their free trial, determine the usage tier they fit in and start billing them. As you can imagine, tracking this logic is not straightforward.
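The core of that trial logic can be sketched as a small state lookup. The cap, the tier names and the tier boundaries here are invented for the example; the hard part in practice is keeping the cumulative usage figure accurate and up to date.

```python
import bisect

# Hypothetical numbers: the trial ends at 1,000 units of usage.
FREE_TRIAL_CAP = 1_000

# Hypothetical tiers as (lower bound of cumulative usage, tier name),
# sorted by lower bound.
TIERS = [(0, "starter"), (10_000, "growth"), (100_000, "scale")]


def billing_state(cumulative_usage):
    """Return ("trial", None) or ("billing", tier_name)."""
    if cumulative_usage < FREE_TRIAL_CAP:
        return ("trial", None)
    lower_bounds = [bound for bound, _ in TIERS]
    # Index of the last tier whose lower bound is <= the usage.
    index = bisect.bisect_right(lower_bounds, cumulative_usage) - 1
    return ("billing", TIERS[index][1])
```

For example, `billing_state(999)` is still in the trial, while `billing_state(1_000)` flips the customer to billing on the lowest tier.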
What about discounts/promo codes on usage-based metrics? There are definitely some nuances here too. Let’s say you give a customer a 15% off promo code. When the customer applies it, should it count towards the entire billing cycle, or should it apply only to usage in the remaining part of the cycle? These cases need to be dealt with while generating the customer invoice and charging them, and the system also needs to be flexible enough to handle a variety of options.
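The two policies in the promo code example can be compared with a small sketch. It assumes charges are already broken down by day of the billing cycle, which is itself a design decision; the function and parameter names are hypothetical.

```python
from decimal import Decimal


def discounted_total(usage_charges, discount, applied_on_day=None):
    """Total for one billing cycle after a percentage discount.

    `usage_charges` is a list of (day_of_cycle, amount) tuples.
    With `applied_on_day=None` the discount covers the whole cycle;
    otherwise it covers only charges on or after that day.
    """
    total = Decimal("0")
    for day, amount in usage_charges:
        eligible = applied_on_day is None or day >= applied_on_day
        total += amount * (1 - discount) if eligible else amount
    return total


charges = [(1, Decimal("100")), (20, Decimal("100"))]
fifteen_percent = Decimal("0.15")

whole_cycle = discounted_total(charges, fifteen_percent)           # 170
from_day_15 = discounted_total(charges, fifteen_percent, 15)       # 185
```

Same usage, same promo code, two different invoices. Which one is right is a product decision, so the billing system has to support both.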
As a result of how dynamic usage based billing is, enhanced visibility has become a necessity for end customers, vendors and regulators.
In order to achieve enhanced visibility, your usage based billing system needs to be augmented to have a dashboard layer, a long term archival layer and also a way of computing value metrics that are useful for your business.
I wanted to write this to go over some of the major problems you run into when either migrating from a purely subscription model or creating a usage-based model from scratch. The list above is in no way exhaustive, but hopefully it gave you a deeper insight into the traps to watch out for when you are building your own solution or looking for an off-the-shelf product. It’s good to keep these problems in the back of your head so that the solution you end up going with addresses them.
This blog is the first in a series of blogs about the complexity of usage based billing. Stay tuned for more!
Please feel free to contact me at firstname.lastname@example.org if you have any questions or want to learn about how Octane solves this problem end-to-end!