Role-Based Authentication

Authentication can be tricky, but a good role-based implementation can offer more than just a security solution; used correctly, it can help deliver tremendous user and developer experiences.

The problem lies in how we approach authentication. In most development processes, it’s treated as a technical requirement. In this article, I’ll make the case for the opposite: by treating authentication and authorization as features, we put ourselves in a position to deliver better solutions wherever they’re involved.

Most projects need production-ready authentication. To address this need, a plethora of tools called identity providers have emerged. They aim to provide out-of-the-box authentication solutions, and almost all of them offer some concept of “role-based authentication.”

To put it briefly, role-based authentication is a way of assigning one or more labels to a user so that our application can determine if the user has permission to access a given resource.

Keycloak, which is a solution for managing identity and access, implements this mechanism the OAuth way: by including a known value at a known location in the user’s access token. An application can then check for this value to determine if access should be granted.
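
To make the check concrete, here is a minimal sketch in Python. It assumes the access token has already been signature-verified and decoded into a dictionary elsewhere, and that realm-level roles sit under the realm_access.roles claim, which is where Keycloak places them by default. The function names and the example payload are ours, purely for illustration.

```python
from typing import Any, Mapping, Set


def realm_roles(claims: Mapping[str, Any]) -> Set[str]:
    """Return the realm-level roles found in already-verified token claims."""
    # Assumption: roles live at claims["realm_access"]["roles"].
    realm_access = claims.get("realm_access") or {}
    return set(realm_access.get("roles") or [])


def has_role(claims: Mapping[str, Any], role: str) -> bool:
    """True if the token carries the given role at the expected location."""
    return role in realm_roles(claims)


# Hypothetical decoded payload, not taken from a real token.
example_claims = {
    "sub": "hypothetical-user-id",
    "realm_access": {"roles": ["clientUser"]},
}
```

A token with no realm_access claim simply yields an empty set of roles, so the check fails closed.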

How We Started

The idea for this article came about when we were implementing changes that would allow a new group of users access to 3AP Admin, our internal web dashboard. So for the rest of this post, we’ll be referring to this scenario.

At that time, we had a single role for each of the two types of users in the system: regular employees and admins. This project aimed to improve our client experience by giving our clients access to some of our internal tooling, such as employee details, team assignments, and billing info for our AI in a Box solution, Conperi.

However, we quickly ran into the issue of how we should authenticate the new users. The existing roles were far too broad, as we wanted the new users to only have access to a subset of the platform’s features. It was clear we needed a new role: the clientUser. This clientUser role would be added to Keycloak, and any non-employee user would automatically be granted the role.

Making Changes

Adding a new role was relatively simple in concept, but executing it proved to be more complex. The application code enforced the existing roles, so adding a new role meant writing changes to the codebase of numerous microservices, reviewing those changes, and deploying every changed service. Thanks to our fast CI/CD process, deployment wouldn’t be an issue. Instead, the biggest hindrance would be the time it took to change the existing code and review those changes.

This process highlighted a few problems. First, adding or changing roles required code changes, which were slow to write and then took a not-insignificant amount of time for a member of the dev team to review. That extra effort contributed to our reluctance to change the roles. Second, and perhaps more importantly, this code-based configuration could not vary gracefully between environments.

To address the second issue, we added code switches that required different roles in our dev and test environments where external integration was being tested. In our production environment, where we weren’t yet ready to allow external users, these switches could be disabled.
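
A switch of this kind might look roughly like the sketch below. It is a reconstruction for illustration, not our actual code: the APP_ENV variable, the "employee" and "admin" role strings, and the function names are all hypothetical.

```python
import os

# Hypothetical: environments where external (client) users are being tested.
EXTERNAL_USERS_ENABLED_IN = {"dev", "test"}


def allowed_roles_for_directory(environment):
    """Roles permitted to view the employee directory in a given environment."""
    roles = {"employee", "admin"}  # hypothetical names for the existing roles
    if environment in EXTERNAL_USERS_ENABLED_IN:
        # The code switch: only non-production environments admit clientUser.
        roles.add("clientUser")
    return roles


def can_view_directory(user_roles, environment=None):
    """Check access, defaulting to production when no environment is set."""
    env = environment or os.environ.get("APP_ENV", "production")
    return bool(set(user_roles) & allowed_roles_for_directory(env))
```

Every switch like this is one more branch to test now and one more piece of code to delete later.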

Of course, this resulted in additional code changes and follow-up tasks to remove the code when it was no longer necessary. As these follow-up tasks were more about code cleanliness than feature delivery, they were naturally a lower priority, so it was easy to let them sink into the backlog in favor of more visible features. Clearly, this wasn’t the ideal solution, so we set about identifying ways to improve the process. What we settled on was “have more roles.”

Impact

As counterintuitive as it sounds, having more roles solved both problems. Although it’ll require some changes to our codebase going forward, it didn’t result in any radical changes to our existing code or architecture.

The plan was to adopt a unique role for each feature within the application. Our employee directory feature, for example, would no longer be tied to the broad employee role. Instead, it would be available to any user with the people role. Down the line, read and write access could be further divided into people.read and people.write.
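
As a rough sketch of what per-feature roles look like at the code level, the Python below guards two directory operations with people.read and people.write. The decorator, the Forbidden exception, and the handler functions are hypothetical and framework-agnostic; a real service would take the roles from the verified access token rather than from an argument.

```python
from functools import wraps
from typing import Callable, Iterable


class Forbidden(Exception):
    """Raised when the caller lacks the role a feature requires."""


def require_role(role: str) -> Callable:
    """Guard a handler so it only runs for callers holding `role`."""
    def decorator(handler: Callable) -> Callable:
        @wraps(handler)
        def wrapper(user_roles: Iterable[str], *args, **kwargs):
            if role not in set(user_roles):
                raise Forbidden(f"missing role: {role}")
            return handler(user_roles, *args, **kwargs)
        return wrapper
    return decorator


@require_role("people.read")
def list_people(user_roles):
    # Placeholder result standing in for the employee directory.
    return ["(directory entries)"]


@require_role("people.write")
def update_person(user_roles, person_id, changes):
    # Placeholder update that echoes the requested change.
    return {"id": person_id, **changes}
```

Note that the code names only the feature role. Who holds that role (employees, admins, clients) is decided entirely outside the codebase.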

The beauty of this solution is that it required no up-front code changes to adopt. The adoption is iterative, with new roles introduced to features as they’re worked on. This minimizes unnecessary changes to long-functional code and allows the adoption to happen quietly in the background, with little-to-no focus taken from feature delivery.

The 3AP Admin platform consists of approximately a dozen distinct features, each of which would require multiple roles to expose many operations in a fine-grained manner. These changes dovetailed perfectly into our existing micro-frontend architecture, but it was undeniable that handling dozens of roles would be no easy task. Luckily, Keycloak offers the perfect solution: Groups.

Keycloak’s Groups let us attach a set of roles to a group so that every member of the group inherits them in bulk. This means we can create a clientUser group that replaces the previous clientUser role. This group can then be configured to include each of the roles that a client needs to access their features. What’s more, we can do the same for employees and admins down the line. When we create new groups, we can make use of the existing roles.
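
The relationship between groups and roles can be modeled in a few lines of Python. This is only a mental model: in practice the mapping lives in Keycloak’s configuration and the resolved roles arrive in the token, so none of this code would exist in our services. The teams.read and billing.read role names and the employee group are hypothetical.

```python
# Hypothetical group-to-role mapping, mirroring what would be configured
# in the identity provider rather than in application code.
GROUP_ROLES = {
    "clientUser": {"people.read", "teams.read", "billing.read"},
    "employee": {"people.read", "teams.read"},
}


def effective_roles(groups, direct_roles=()):
    """Union of roles granted directly and roles inherited from groups."""
    roles = set(direct_roles)
    for group in groups:
        # Unknown groups contribute no roles.
        roles |= GROUP_ROLES.get(group, set())
    return roles
```

Granting clients a new feature then means adding one role to the clientUser entry, with no change to any service that checks that role.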

With this solution in place, we can make changes to which features a user can access without a single code change. This dramatically reduces the time needed to make changes, especially in production.

Beyond the reasons listed above, this solution delivers additional benefits. A/B testing can easily be implemented by assigning a subset of users to a new group, and canary releases and beta testing can be done in both production and preproduction environments by granting extra roles to select users.
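
For example, a beta rollout could hinge on a single extra role. The sketch below assumes a hypothetical dashboard.beta role and a hypothetical pair of variants; which users see the beta is then controlled entirely by who is granted that role.

```python
def dashboard_variant(user_roles):
    """Pick which version of a feature to serve based on the caller's roles."""
    # "dashboard.beta" is a hypothetical role granted only to selected users.
    return "beta" if "dashboard.beta" in set(user_roles) else "stable"
```

Ending the experiment is a matter of removing the role from the group or the users, after which everyone falls back to the stable variant.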

Closing Thoughts

Adopting this strategy is ongoing in our 3AP Admin project. Still, it’s already proven effective and has been embraced fully in some of our other projects — including in our newest digital solution, where we’re practicing API-first development, with roles as a central consideration.

In the end, what I’ve set out here is no novel solution; huge products already use role-based authentication to secure complex APIs, and you’ll likely already know if adopting this approach would benefit your own projects. Fine-grained authentication can enable a degree of specificity that can benefit any project, and rethinking how authentication is considered can often lead to additional discoveries in big and small projects alike. If you’re on the fence, I’d encourage you to give this approach a try.