• Site Reliability Engineer

    Job Locations US-WI-Madison
    Req No.
    Regular Full-Time
  • Summary of Major Responsibilities

    This position is focused on providing strategic direction on, and execution of, infrastructure, security, continuous integration, deployment, and IT operations practices, scaling, and metrics, as well as running day-to-day operations of production and development infrastructure for cloud-based hosted platforms.


    The Site Reliability Engineer will work with other Software Engineers, Database Engineers, and Product Managers to analyze system and network loads to address stability and performance challenges, and collaborate with others to operate various systems. The Site Reliability Engineer performs ongoing application support by diagnosing and resolving issues, maintaining applications, and evaluating and recommending options for improving performance, maintainability, and operability. This also includes streamlining processes to increase system scalability and reliability, improve efficiency, and minimize errors.

    Essential Duties and Responsibilities

    • Ability to work with and use Amazon Web Services or other Cloud technology platforms, leveraging CloudFormation, Terraform, Packer, Ansible, Python, Bash, Java, JavaScript, Linux
    • Understanding of security and encryption best practices
    • Responsible for designing, building, maintaining, and scaling production services and server farms across multiple data centers for complex and data-intensive cloud services
    • Design and enhance software architecture to improve scalability, service reliability, capacity, and performance
    • Write automation code for provisioning and operating infrastructure at massive scale. You are not an operator; you are an experienced software engineer focused on operations
    • Work with development teams to make sure applications fit well within the infrastructure and that scalability and reliability are designed and implemented from the ground up. You will work with QA on building pipelines and automation for delivering and deploying applications to production
    • On-call rotation supporting the infrastructure
    • Roll up your sleeves to troubleshoot incidents, formulate and test hypotheses, and narrow down possibilities to find the root cause
    • Participate in postmortem reviews and recommend remediations
    • Identify bad trends before they become problems; respond to automated system alerts, troubleshoot system errors effectively, and work incidents to return systems to normal operating conditions
    • Author and update high-quality documentation of all relevant specifications, systems and procedures
    • Other duties as assigned


    Minimum Requirements

    • College diploma in CS/Engineering/Sciences or equivalent experience
    • 1+ years of experience with design capabilities using modern technologies
    • Track record in successfully addressing performance, scalability and latency challenges
    • Experience in developing systems architecture


    We are an equal employment opportunity employer. All qualified applicants will receive consideration for employment without regard to age, color, creed, disability, gender identity, national origin, protected veteran status, race, religion, sex, sexual orientation, and any other status protected by applicable local, state or federal law. Applicable portions of the Company’s affirmative action program are available to any applicant or employee for inspection upon request.

