Senior Site Reliability Engineer
Notarize
Software Engineering
Remote
Posted on Dec 5, 2024
Proof is the world's first identity-assured transaction management platform, and we are on a mission to digitize trust for all of life’s most critical transactions. Developed by the same market leaders and experts who brought notarization online with Notarize℠, Proof offers trust in a digital world by verifying identities and securing transactions to protect businesses and their customers. Since 2015, we’ve completed many of the world’s first digital commerce transactions, including the first online real estate closing, online mortgage closing, online auto sale, and online will, and we're still just getting started!
This role focuses on improving the reliability of our ECS-based production systems. If you're ready to shift away from Kubernetes management to a robust ECS platform, this might be the role for you (though some Kubernetes experience is valued).
At Proof, we have two distinct environments:
- Production: Runs on AWS ECS, emphasizing high availability and performance.
- Test: Runs on AWS EKS, providing a flexible, containerized platform for development and testing.
What you’ll do as a Senior SRE at Proof:
- Ensure high availability of our ECS-based production environments, improving our 99.9...% uptime
- Oversee Kubernetes test clusters, with a focus on simplicity and high uptime for Jenkins and test environments
- Support and enhance our AWS ECS production environments, ensuring smooth operation and scalability
- Design and manage active-active cross-regional deployments for robust disaster recovery
- Collaborate with developers on Azure Data Explorer (ADX) dashboards and build actionable Grafana alerts and playbooks
- Streamline CI/CD processes
- Participate in incident response and the on-call rotation; thanks to our architectural choices, we’ve kept on-call notifications very low
- Implement security and DevOps best practices across teams
What we’re looking for:
- Containerization expertise: Hands-on experience running containerized workloads in production
- Kubernetes experience is valued, but this role emphasizes AWS ECS for production workloads
- Proven track record with active-active cross-regional architectures
- Proficiency with Terraform and/or other IaC tools
- Experience with observability tools (e.g., Prometheus, Grafana) and log management solutions
- Expertise in optimizing CI/CD pipelines
- Strong scripting and automation skills
- Collaborative, self-directed, and proactive problem-solving mindset
- Comfortable setting priorities and taking initiative
Preferred Qualifications:
- Experience with cost optimization in cloud environments
- Knowledge of networking and security best practices in cloud-native environments
- Familiarity with auto-scaling strategies for ECS
- Experience with PostgreSQL in a multi-region production environment
Our stack:
- React: Front-end monolith
- Ruby on Rails: Backend monolith
- Java: A handful of backend services
- PostgreSQL, Redis, S3: State storage
- HSM/PKI/CA infrastructure
- Jenkins with Groovy and Python
- AWS Lambda
Proof is committed to building an inclusive environment for people of all backgrounds and everyone is encouraged to apply. We are an equal opportunity employer and do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We'd love to hear from you.