Although the roles of the SRE and the platform engineer share some similarities and are at times conflated, they’re still distinct.
Platform engineers are responsible for designing, developing, and maintaining the underlying platform that the application runs on, including the infrastructure, operating systems, databases, and other components that enable the application to function. SREs, on the other hand, focus on the reliability, scalability, and performance of the application itself.
“The self-serviceability aspect comes under the realm of a platform engineering team that is trying to provide self-service capabilities for product teams to consume,” Gartner’s Betts said. “SRE is going to be involved in looking at some of the tools that are used to help with that, but their focus is very much on removal of repeatable manual tasks that could potentially go wrong.”
However, SREs can be placed within platform engineering teams to help with some of these tasks.
“As the SRE teams mature, they get into the platform side of the business where they’re actually calling out gaps in the self-service capabilities so the development teams and the product teams can fix it and benefit from it,” Red Hat’s Raghavan said.
While large organizations keep the two roles separate, more resource-constrained ones might have the same person performing both, according to Ellis.
Gear up your SRE
Here are some of the tools to help gear the SRE up for battle, as listed in Forrester’s report “Role Profile: Site Reliability Engineer”:
- Automation: SREs will need to use scripting, code, or orchestration tools to manage a system or environment. This can include tools like Ansible, CircleCI, GitLab, Jenkins, and Google Cloud Build.
- App modernization: This involves migrating legacy applications to newer ones by revising the code base or rewriting the code, using tools such as Docker, Git, Google Cloud Run, Kubernetes, and more.
- Chaos engineering: SREs can use this method to find weaknesses in a system by injecting specific faults in a testing or production environment using Chaos Machine, Chaos Mesh, Chaos Monkey, Chaos Toolkit, and more.
- Networking: This is all about analyzing the communication process among various computing devices or computer systems using Nagios, Netdata, SolarWinds, Terraform, and more.
- Observability: SREs need to manage observability to monitor and generate insights about a platform, site, or environment under management using Datadog, Dynatrace, Google Error Reporting, New Relic, and a host of others.
- Security: SREs also take part in safeguarding an environment through strategies, policies, processes, and technology at every stage of the life cycle using tools like Chef InSpec, Google Cloud Audit Logs, Sysdig, and VirusTotal.
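To make the chaos engineering item above more concrete, here is a minimal Python sketch of fault injection. It is an illustration only and does not reflect how Chaos Monkey, Chaos Mesh, or any of the other listed tools work internally. Every name in it (`FaultInjector`, `fetch_price`, `fetch_price_with_fallback`, `run_experiment`) is hypothetical, and the downstream service is a stand-in function rather than a real dependency.

```python
import random


class FaultInjector:
    """Wraps a callable and makes a fraction of calls fail on purpose.

    Hypothetical helper for illustration; not part of any listed tool.
    """

    def __init__(self, failure_rate, seed=None):
        if not 0.0 <= failure_rate <= 1.0:
            raise ValueError("failure_rate must be between 0 and 1")
        self.failure_rate = failure_rate
        # A private, seedable generator keeps experiments repeatable.
        self._rng = random.Random(seed)

    def call(self, func, *args, **kwargs):
        # Inject a fault instead of calling the real function.
        if self._rng.random() < self.failure_rate:
            raise ConnectionError("injected fault")
        return func(*args, **kwargs)


def fetch_price(item):
    """Stand-in for a downstream dependency (assumed, not a real service)."""
    return {"widget": 9.99}.get(item, 0.0)


def fetch_price_with_fallback(injector, item, retries=3, fallback=None):
    """Code under test: retry a few times, then return the fallback value."""
    for _ in range(retries):
        try:
            return injector.call(fetch_price, item)
        except ConnectionError:
            continue
    return fallback


def run_experiment(failure_rate, trials=1000, seed=42):
    """Return the fraction of requests served despite injected faults."""
    injector = FaultInjector(failure_rate, seed=seed)
    results = [fetch_price_with_fallback(injector, "widget") for _ in range(trials)]
    served = sum(1 for r in results if r is not None)
    return served / trials


if __name__ == "__main__":
    for rate in (0.0, 0.3, 0.6):
        print(f"failure rate {rate:.1f}: served {run_experiment(rate):.1%}")
```

The idea is that the SRE states a hypothesis up front (for example, "with three retries, most requests still succeed when 30% of calls fail") and then checks the measured fraction against it. A real experiment would inject faults at the network or infrastructure level instead of wrapping a function call.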