This post captures the results of an interview with Greg DeArment, Head of Infrastructure at Palantir.
Topics we covered included how Palantir manages the production environment for their modern data platform Foundry, with a special focus on Kubernetes.
Gourley: Greg, before we get started, can you tell me a bit about yourself?
DeArment: I joined Palantir in 2011 as a Forward Deployed Engineer, where I worked with many of our early commercial customers and was involved in moving our customer environments to the cloud. In late 2014, Mark Elliot (Palantir’s Chief Architect) and I came to the conclusion that we needed to start treating the way we deploy our Foundry platform as a software problem in itself. From this, we formed the Deployment Infrastructure product group, which I have led since. Prior to Palantir, I spent a few years working on modeling, simulation and wargaming platforms and ISR data analytics platforms for the government, and some time at an automated futures trading software startup.
Gourley: What does the Head of Infrastructure do at Palantir?
DeArment: I oversee a group of engineers who are responsible for building the systems we use to run, manage and monitor the Foundry and Gotham platforms that Palantir’s customers use. This includes our managed cloud hosting environments, configuration, continuous integration / continuous deployment (CI/CD) and telemetry infrastructure. My responsibilities include ensuring we are making the right engineering investments to support the platform capabilities we are pursuing in the Foundry platform, and enabling new hosting environments in which our customers want to run Foundry, including multi-cloud and on-premise data centers. A big focus for me over the last 18 months has been adapting to Kubernetes.
Gourley: Can you provide any context around how the rise of public clouds has shaped your views of infrastructure?
DeArment: For years, we’ve believed that public clouds provide organizations that are looking to become data driven a meaningful way to accelerate their digital transformation. Public clouds offer organizations a way of moving faster, but this comes with increased information security risk that must be addressed and managed.
Gourley: I agree of course, but it seems like public clouds also have many potential security benefits, depending on design choices like encryption, identity management, authorization, etc.
DeArment: Yes, and this has made security a priority design requirement in all of Palantir’s engineering activities. Said another way, our customers put their most sensitive data in Foundry, and as such, security will always be a priority in how we build product at Palantir. We had to make heavy investments up front in order to secure our public cloud environments and enable our customers to realize the benefits of public cloud without sacrificing security.
Gourley: One of the major trends in cloud computing has been the ability for enterprises to operate in multiple clouds, including their own internal clouds. Do you see this as a future driver of customer needs?
DeArment: As long as businesses and governments seek better ways to serve their missions, there will be a continuous drive toward more efficient use of resources. In the cloud, this means a continued commoditization of the capabilities of the major cloud providers and a continuous improvement of enterprise private cloud capabilities as well. We started offering managed cloud hosting in 2014 on top of AWS so we could offer Foundry as a SaaS offering. And we built this with an ability to leverage the community standards of containers and Kubernetes, which enable orchestration across clouds, public and private.
Gourley: You mentioned Kubernetes. Can you provide more context on Kubernetes, starting at the basics?
DeArment: The simple way to think of Kubernetes is as an operating system for managing workloads and services. It is far more, but that is one of the most wonderful features of this open-source platform. A more detailed description would be to say Kubernetes makes containerized applications super easy to develop, integrate, deploy, scale and manage across a wide variety of infrastructure.
Gourley: Can you provide a bit more on this new container approach?
DeArment: Modern applications are developed to run in “containers”, a way of packaging applications and their necessary dependencies in a portable, standardized format. This makes deploying easier and more repeatable across environments than deploying software directly on bare-bones operating systems. Enterprise architects and developers are probably familiar with the container solution Docker, but there are many others, and there is even a standard for containers from the Open Container Initiative (OCI). Whatever the container solution, there is a need for better orchestration and management of the containers. This is where Kubernetes comes in.
Gourley: Why should enterprises be interested in using Kubernetes?
DeArment: Over the last several years, there has been an industry-wide shift in how modern software platforms are developed. Software is moving from large monolithic services to micro-service oriented architectures. This has led to the prevalence of geographically distributed teams as a means to drive innovation and velocity across an organization. However, with many teams and services all moving at a faster pace, it has created the need for the infrastructure to handle the increased complexity of dependencies between services, configuration management and a higher rate of upgrades in production environments. Increasing the rate of upgrades is a blessing for both developers and customers (because they get improvements and new features continually rather than once every six months), but it is challenging for the deployment infrastructure. Kubernetes provides a sophisticated set of features to manage the complexity that this new era of software platforms must handle to be successful.
Gourley: You mentioned security and how Kubernetes can help. Can you tell me more?
DeArment: Absolutely. Running services inside of containers provides a level of isolation between processes that is nearly impossible to achieve without making sacrifices in performance, flexibility or complexity. This is especially important for data-driven platforms which are responsible for executing user-provided code, such as Foundry. Containers provide a way to prevent the work of two users from interfering with one another, whether there is malicious intent or not.
It’s important to remember that Kubernetes was essentially designed for a single-tenant, trusted environment, one where all the users who have access to run workloads are trusted actors. This can be quite concerning from an infrastructure security perspective when executing code on behalf of users. Compromising a single user’s credentials can make it easy for a malicious actor to gain access to other users’ data. While it is possible to configure Kubernetes to run in a more secure way, and there is an open-source multi-tenancy working group, we’ve found the process of securing it to a point where we are comfortable to be quite involved.
Gourley: Why is Kubernetes important to the future of Palantir’s infrastructure plans?
DeArment: Over the past few years, our internal engineering efforts and those of the open-source community converged around similar ideas and concepts — deployment of small stateless services which rely on the deployment infrastructure to provide configuration management, service discovery and automation for continuous deployment of changes to the environment. Many companies invested in a similar space but we feel Kubernetes best aligns with how we think about deploying and managing software and our existing infrastructure. As Kubernetes became more mature, we began integrating it into our deployment and cloud platform plans in order to improve the security posture of the Foundry platform and lower the total cost of ownership of the infrastructure necessary to run it.
Most open-source compute platforms today, such as Hadoop YARN, force a trade-off between security and the richness of the toolset users have at their disposal to empower their business. With Kubernetes, we can enable Foundry users to work with the tools of their choice without compromising the security posture of the platform or putting the security of our customers’ data at risk. Additionally, by using Kubernetes as the heart of Foundry’s compute infrastructure, we are able to obtain substantially more intelligent scheduling of user workloads, which gives rise to improvements in performance and cost efficiency of up to 50% for our customers.
Gourley: What have your experiences been like integrating Kubernetes into Palantir’s infrastructure?
DeArment: In 2017, we began integrating Kubernetes into our second-generation cloud platform, which we call Rubix. As we did this, a number of other design decisions impacted our experiences with Kubernetes, such as the decision to rebuild every host in our cloud platform every 48 hours. There are a number of reasons to do this, but the primary motivation was to improve our security posture against advanced persistent threats: it is no longer sufficient for an attacker to compromise a host once; they must do it over and over again in order to retain their foothold. Ephemeral infrastructure which is constantly in flux provides better security but has introduced a number of challenges with how Kubernetes and related services handle dynamic infrastructure. Kubernetes features like StatefulSets (a feature that is useful for applications which store state on disk, such as databases) must be used carefully when the underlying infrastructure is in a constant state of change. We also ran into a number of unexpected networking issues: side effects of frequent changes to the network and of services’ implicit dependency on long-lived network connections.
As with most advances in technology and approach, there are challenges as well. Kubernetes and the tech it relies on can be complicated for infrastructure engineers. The technology is still relatively new and requires regular upgrades of core infrastructure services in order to address issues we come across.
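Editor’s note: the 48-hour host rebuild policy described above can be illustrated with a minimal sketch. This is not Palantir’s implementation; the `Host` type, the `hosts_due_for_rebuild` function and the host names are hypothetical, and only the 48-hour limit comes from the interview. The sketch simply selects hosts whose age has reached the limit so that they can be replaced, oldest first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# The 48-hour limit comes from the interview; every name below is hypothetical.
MAX_HOST_AGE = timedelta(hours=48)


@dataclass(frozen=True)
class Host:
    name: str
    created_at: datetime  # when the host was last built


def hosts_due_for_rebuild(hosts, now, max_age=MAX_HOST_AGE):
    """Return the hosts whose age has reached max_age, oldest first."""
    expired = [h for h in hosts if now - h.created_at >= max_age]
    return sorted(expired, key=lambda h: h.created_at)


if __name__ == "__main__":
    now = datetime(2018, 1, 3, 12, 0, tzinfo=timezone.utc)
    fleet = [
        Host("node-a", now - timedelta(hours=50)),
        Host("node-b", now - timedelta(hours=12)),
        Host("node-c", now - timedelta(hours=49)),
    ]
    for host in hosts_due_for_rebuild(fleet, now):
        print(f"rebuild {host.name}")
```

A real rotation would presumably also need to drain workloads from a host before replacing it, which is where the StatefulSet and long-lived connection issues mentioned above would surface.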
Gourley: Is there any advice you would offer to other enterprises who are considering a move to Kubernetes?
DeArment: It’s important to keep in mind that modern technology often requires a modern operating system to run on top of. Palantir has historically run Foundry on Red Hat’s CentOS Linux. However, the latest version of CentOS Linux relies on a version of the Linux kernel (the heart of the operating system) that is 5+ years old and is missing important features we wanted to use. We recently switched Linux distributions in order to take advantage of more performant Docker filesystem drivers and certain networking features. Many enterprises may find that the operating systems they have historically relied on aren’t modern enough and will struggle to support containers and Kubernetes in a performant way.
Kubernetes is a great platform but complex to configure and operate. Administrators should be sure to understand what their engineers and users need access to and how to secure the system accordingly.
Gourley: We started this discussion by talking about how the rise of public clouds has shaped your views of infrastructure. Now I want to ask the opposite question: how has your infrastructure shaped your view of public clouds?
DeArment: Using Kubernetes as the new substrate for our infrastructure will make it easier for us to diversify our cloud providers to include Microsoft Azure and continue to realize our engineering investments for our customers who aren’t able to operate in public cloud due to complicated regulatory or security requirements.
Gourley: Kubernetes is open-source. Do you participate in the community by contributing code back to the project?
DeArment: We’ve contributed a number of fixes back to Kubernetes and other related open-source projects for bugs that we’ve encountered over the past year of using them in production environments (including a fix for one of the StatefulSet bugs mentioned earlier). Additionally, Palantir co-authored the Apache Foundation proposal for Spark on Kubernetes and was a major contributor to the implementation after it was accepted as an official Apache project. We’ve also built up a lot of tooling and infrastructure we found necessary to run Kubernetes in production and are looking to start open-sourcing much of that in the future.
Gourley: Thanks, Greg, for the time and context. Very much appreciated.
DeArment: My pleasure.