ExpressVPN’s “License Expired” app error: What actually happened?

On Thursday, June 29, 2017, we experienced a technical problem that caused some customers to incorrectly see a “license expired” message in the ExpressVPN apps. Affected customers were required to log out and log back into the apps to regain access to the VPN.

This post explains what caused the problem and the steps we’re taking to prevent such events from recurring.

What went wrong? The “license expired” error timeline:

System Diagram

To understand the root causes and follow-ups, here is a simplified version of the architecture of the affected system:

Why the “license expired” error happened

Cascading failures occurred in:

In summary:

The causes were a combination of misconfiguration and fragile design in a rarely used feature. Unfortunately, the bug triggered a state reserved for the rarely used volume discounting feature, and that state affected a large number of customers.

Follow-up steps we’re taking to prevent such problems from recurring

  1. We’re updating our apps to:
    • Define the “license expired” state positively. Apps will enter the license expired state only when specific error codes are present and not when data is absent.
    • Improve the definition of good quality data. Ignore incomplete data and try again later.
  2. In the backend system that created the invalid data, we are:
    • Adding integration tests that include the configuration data used in production. These tests must pass before new versions of software or configuration data are put into production.
    • Changing our workflow for managing configuration data. One reason for the invalid configuration was that the configuration data is encrypted, which makes it more difficult for developers to inspect. ExpressVPN uses a system called Ansible to manage and encrypt configuration. A separate blog post will describe our new practices for managing encrypted configuration data.
  3. In the API servers that pass data to client apps, we’ll add a feature to verify the quality of data. If the data doesn’t meet certain criteria, including size and completeness, the system will ignore updates and alert responsible engineers.
  4. We’ll adjust our development process for new features to:
    • Ensure all states are defined positively.
    • Ensure integration tests also include configuration data for the production environment.
    • Include test plans for automation and monitoring. In addition to testing the functional accuracy of code, we’ll also check the quality of data.
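
To make "defined positively" and "verify the quality of data" more concrete, here is a minimal Python sketch of the two ideas. It is an illustration only and not ExpressVPN's actual code: the error codes, field names, the `min_ratio` size threshold, and the use of a log warning as a stand-in for alerting engineers are all assumptions made for the example.

```python
import logging
from enum import Enum
from typing import Optional


class LicenseState(Enum):
    ACTIVE = "active"
    EXPIRED = "expired"
    UNKNOWN = "unknown"


# Hypothetical error codes; the real codes are not given in this post.
EXPIRED_ERROR_CODES = {"LICENSE_EXPIRED", "SUBSCRIPTION_ENDED"}

# Hypothetical fields that a complete record must carry.
REQUIRED_FIELDS = ("customer_id", "status")


def is_complete(record: Optional[dict]) -> bool:
    """Return True only if the record exists and every required field is filled."""
    if not record:
        return False
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def resolve_license_state(record: Optional[dict], previous: LicenseState) -> LicenseState:
    """Client side: enter EXPIRED only on an explicit error code."""
    if not is_complete(record):
        # Absent or incomplete data never expires a license; keep the
        # previous state so the app can try again later.
        return previous
    if record.get("error_code") in EXPIRED_ERROR_CODES:
        return LicenseState.EXPIRED
    if record["status"] == "active":
        return LicenseState.ACTIVE
    return previous


def accept_update(current: list, incoming: list, min_ratio: float = 0.5) -> list:
    """Server side: ignore an update that is suspiciously small or incomplete."""
    too_small = len(incoming) < len(current) * min_ratio
    incomplete = any(not is_complete(record) for record in incoming)
    if too_small or incomplete:
        # Stand-in for alerting the responsible engineers.
        logging.warning("Rejected update: too_small=%s incomplete=%s", too_small, incomplete)
        return current
    return incoming
```

With this shape, a missing or partial response leaves a customer in their previous state, and only an explicit error code moves the app into the license expired state. The server-side guard applies the same completeness check before data ever reaches the apps.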

ExpressVPN would like to apologize to customers affected by the expired license problem. We’re eager to learn from these mistakes, and we’re proud of our Support Team for noticing and responding to this issue very quickly.