Fyusion is a leading machine learning and computer vision company focused on automotive inspections and related applications. Our patented 3D format enables anyone to capture and display interactive 3D images using their smartphone, and adds significant functionality through deep visual understanding and machine learning-driven analysis.

Founded in 2014, Fyusion is now part of the Cox Automotive family. Our team includes some of the world's top researchers and developers in light field imaging and AI, who continue to push boundaries and innovate at the highest level from our San Francisco research center.

Fyusion is seeking an awesome DevOps Reliability Engineer (at the intersection of DevOps and SRE) to join our Web and Cloud Infrastructure team. We are a close-knit team that enjoys challenges and solving real-world problems. You will have a key role in solving those problems, helping to shape our core automation, data processing, and deployment practices. You will leverage deep knowledge of Amazon Web Services, as well as automated build and orchestration tools such as Terraform and Kubernetes, to develop and maintain a wide range of infrastructure components, including web stacks, database systems, security tools, and networking/cloud environment configurations. Further, you will proactively seek out system weaknesses and fix them before they cause production issues, using monitoring data, trend analysis, and Chaos Engineering.

We understand this is a complex role, and we do not expect you to be an expert in every tool we use. However, we do expect you to be motivated and open to continual self-improvement, adapting to new tools and overcoming new challenges as they come. If you are looking to be challenged, enjoy wearing multiple hats, and thrive in a fast-paced, agile environment, we think you’ll love this role!
Here's what you will be doing:
- Actively troubleshoot any issues that arise during testing and production, catching and solving them before launch
- Automate work including infrastructure needs, testing, failover solutions, failure mitigation, and much more
- Monitor and troubleshoot highly scalable, distributed server clusters that perform a variety of functions, from web servers to machine learning processing
- Participate in SRE activities (chaos engineering game days, disaster recovery scenarios, etc.)
- Manage code deployments, fixes, updates, and related processes
- Work with a close-knit team and brainstorm the best ways to tackle complex problems in infrastructure, security, and monitoring
- Provide technical guidance and educate team members and coworkers on monitoring and logging (have an interesting idea or solution? Present it!)
- Automate any software maintenance processes that previously required a manual procedure

Here's what we are looking for:
- 3+ years of experience in software engineering, software development, or system operations in highly available, high-traffic environments
- Strong experience with Linux-based infrastructure, Linux/Unix administration, and AWS
- Experience with databases such as MySQL (or other SQL-based databases), Elasticsearch, and Redis
- Experience administering Linux servers as well as Docker-based infrastructure (such as Kubernetes, EKS, etc.) in a highly available environment
- Experience with scripting languages such as Python and Bash
- Experience with message broker/queue technologies such as RabbitMQ
- Experience with modern monitoring, logging, and observability tools in complex distributed systems, such as Grafana, New Relic, Splunk, the Elastic Stack, Datadog, Prometheus, etc.
- Practical experience with infrastructure-as-code tools such as Terraform, Chef, Ansible, etc.
- Good understanding of cybersecurity fundamentals and best practices
- Containerizing and clustering (Dockerfiles, docker-compose, Helm, Kubernetes, etc.)
- Stellar problem-solving and troubleshooting skills, with the ability to spot issues before they become problems
- Excellent oral and written communication skills
- Process-oriented, with great documentation skills
- Solid team player!

Here's what we can offer you:
Competitive compensation; health, vision, and dental benefits with premiums paid by Fyusion; a generous PTO plan; company holidays (including your birthday); and the chance to be part of a pioneering technology team! We offer some amazing perks for those working from our SF HQ: commuter benefits, company-catered lunches, a fully stocked snack pantry, tons of company off-sites, and a pup-friendly workplace.

If you read this job description and saw your name all over it, apply! If you read it and think you might need some help hitting all of the points, please apply anyway! We have an entire team that is happy to help and share our knowledge with you.

The benefits do not apply to contract or internship positions.