Usage-Based Billing on Kubernetes

by Ben Hundley

Multi-tenant cloud infrastructure is the new wave. Monolithic applications are being replaced with microservices, and distributed architectures have produced smaller, more manageable components.

This movement is due in part to containers and Kubernetes -- the paradigm they promote is that software should be treated like “cattle, not pets,” which drives the ecosystem towards larger-scale automation that requires less manual system administration.

But an underappreciated aspect of container orchestration is how it can maximize the number of (isolated) processes running per machine, minimizing unused compute resources and dramatically reducing IT expenses.

Multi-tenancy and accounting troubles

In 2013, I started work on a project that would eventually become a fairly successful Elasticsearch hosting company. We had no official affiliation with Elastic, but at the time they were focused on high-value support contracts, which left a hole for our humble (and cheap!) hosting-based operation to fill.

The first version of our platform was built around a single shared Elasticsearch cluster. That meant we had to handle multi-tenancy at the application level, implementing RBAC in a frontend service that would appropriately segregate users.

[Image: Multi-tenant Elasticsearch v1]

Unfortunately, version one was fraught with issues. First, it wasn’t exactly stable. One bad (or just reckless) actor could crash the cluster or congest the frontend.

[Image: hosted Elasticsearch v1 on fire]

But even worse was our pricing system. To avoid a complex pay-as-you-go billing infrastructure, we created pricing tiers based on a customer’s document count. This was confusing for customers since they had to choose a subscription ahead of time, but more importantly, it wasn’t profitable.

As it turns out, counting the number of documents stored by a customer is a poor means of approximating cost -- it only very roughly translates to on-disk byte size. And even the amount of data stored didn’t tell the whole story.

At the time, we had one choice: every customer needed their own cluster. Instead of a shared Elasticsearch instance, each cluster would comprise its own isolated set of VMs.

[Image: hosted Elasticsearch v2]

This meant that we had to raise our prices, but it also ensured we would have consistent profit margins on the compute resources we sold. We essentially mirrored the on-demand pricing of the cloud by charging a premium on the base server price.
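The markup arithmetic behind this kind of pricing can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the hourly server price, and the 50% premium are all hypothetical and are not the figures we actually used.

```python
from decimal import Decimal

def dedicated_price(base_hourly: Decimal, node_count: int, markup: Decimal) -> Decimal:
    """Hourly price for a dedicated cluster: raw server cost plus a fixed premium.

    base_hourly, node_count, and markup are hypothetical inputs for illustration.
    """
    cost = base_hourly * node_count
    # A markup of 0.50 means the customer pays 150% of the raw server cost.
    return (cost * (Decimal("1") + markup)).quantize(Decimal("0.0001"))

# Three nodes at an assumed $0.10/hour each, sold at an assumed 50% premium.
price = dedicated_price(Decimal("0.10"), 3, Decimal("0.50"))
print(price)
```

Because the premium is a fixed percentage of the server price, the margin stays the same no matter how many nodes a customer runs, which is what made the profit on each cluster predictable.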

For me as the system’s operator, version two (a.k.a. “Dedicated” clusters) was a lifesaver. Sure, someone could still kill their cluster, but it was their cluster. In other words, we had moved the multi-tenancy from the application layer to the infrastructure layer. We no longer needed a custom frontend or RBAC implementation; we could just give each cluster its own load balancer.

Unfortunately, sometime in 2014, a salesman for AWS told us that we were the most under-utilized (read: wasteful) account in their Southeastern US territory. I was both thrilled and embarrassed by the impact we had made.

[Image: hosted Elasticsearch v2 wasteful]

More with less

“One of the key advantages of using virtualization in server consolidation, is the possibility to seamlessly ‘pack’ multiple under-utilized systems into a single physical host, thus achieving a better overall utilization of the available hardware resources.” (from Wikipedia)

In a way, developments in containers (LXC, Docker, etc.) were attempts to solve some of the issues of virtualization. The details are out of the scope of this article, but what’s important is that from a cost perspective, containerization and virtualization provide similar benefits -- we can do more with less.
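To make the "pack" idea concrete, here is a toy comparison between giving every workload its own machine and packing workloads onto shared machines. It uses first-fit decreasing, a textbook bin-packing heuristic chosen purely for illustration; it is not how the Kubernetes scheduler works, and the workload sizes and machine capacity are made-up numbers.

```python
def first_fit_decreasing(workloads_gb, machine_gb):
    """Return how many machines are needed when workloads share machines.

    Workloads are placed largest first, each on the first machine with room.
    """
    free = []  # remaining free memory on each machine opened so far
    for need in sorted(workloads_gb, reverse=True):
        if need > machine_gb:
            raise ValueError(f"workload of {need} GB cannot fit on any machine")
        for i, remaining in enumerate(free):
            if need <= remaining:
                free[i] = remaining - need
                break
        else:
            # No existing machine has room, so open a new one.
            free.append(machine_gb - need)
    return len(free)

# Hypothetical memory requests (GB) for seven small clusters, on 8 GB machines.
workloads = [2, 3, 1, 4, 2, 1, 3]
dedicated = len(workloads)                 # one machine per workload
packed = first_fit_decreasing(workloads, 8)
print(dedicated, packed)
```

With these assumed numbers, the dedicated layout needs seven machines while the packed layout needs two, and the gap between them is the idle capacity that was being paid for.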

And so, in 2015, our bloated Elasticsearch service made the switch to Kubernetes, in pursuit of a more responsible footprint.

[Image: hosted Elasticsearch on Kubernetes]

This final implementation of our service is still running today with minimal human help. When it was rolled out over five years ago, we had to migrate hundreds of clusters and many terabytes of data. But we immediately reduced our total footprint by tens of thousands of dollars’ worth of idle RAM and CPU. Some of those savings were passed to the customer, and some were used to improve our bottom line. It was a major improvement for everyone involved.

A nagging issue

We had a profitable and now economical multi-tenant service. But there was still an annoying variable in our bill: network costs.

At first I assumed network costs would be negligible -- something like a fraction of a percent of our bill. But as we started to acquire some higher-profile customers, we noticed network charges start to eat into our income.

We got a bill every month telling us exactly how many gigabytes of internet egress we had accrued, but that didn’t help us pinpoint our high-traffic customers. That would require better telemetry -- we would need to use our monitoring data to extrapolate charges on a per-customer basis. But even with that, we didn’t have a way to pass metered (pay-as-you-go / all-you-can-eat) charges to the user. Our system was based on the capacity you actively reserved (how many pods you had running).
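The kind of extrapolation described here could look roughly like the sketch below: take per-customer egress measurements from monitoring and split the provider's monthly egress charge in proportion to them. This is a minimal illustration, not something we had running; the sample format, the customer names, and the function name are all assumptions.

```python
from collections import defaultdict

def attribute_egress(samples, billed_total_usd):
    """Split one egress charge across customers in proportion to measured bytes.

    samples: iterable of (customer_id, egress_bytes) pairs from monitoring data
             (a hypothetical shape; real telemetry would need to be mapped to it).
    billed_total_usd: the egress line item from the cloud provider's bill.
    """
    per_customer = defaultdict(int)
    for customer_id, egress_bytes in samples:
        per_customer[customer_id] += egress_bytes

    total_bytes = sum(per_customer.values())
    if total_bytes == 0:
        # Nothing was measured, so nothing can be attributed.
        return {customer: 0.0 for customer in per_customer}

    return {
        customer: billed_total_usd * measured / total_bytes
        for customer, measured in per_customer.items()
    }

# Made-up measurements for two hypothetical customers and a $100 egress charge.
samples = [("acme", 600), ("globex", 300), ("acme", 100)]
shares = attribute_egress(samples, 100.0)
print(shares)
```

Proportional allocation only answers the question of who caused the charge. Turning that into a metered line item on a customer's invoice is the separate problem that a capacity-based billing system does not solve.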

In the end, variable network charges weren’t an issue that killed our business. But they were an enormous headache -- passing costs to the user, consistently and accurately, seemed to be a pipe dream.

Octane, cloud-native billing infrastructure

[Image: Octane pricing models]

At Octane, we’re working to make these issues a thing of the past. Our platform is designed to accurately track costs in your Kubernetes environments. That enables proper accounting for multi-tenant internal platforms (IT showback / chargeback), and it lets you model and apply custom pricing to the customers of a SaaS product.

We provide utility billing infrastructure for modern software applications -- invoice compilation, long-term archival, and tools to slice, dice, and visualize in real time. You can get started today by signing up, or if you have any questions you can contact