Senior Software Engineer - Data Infrastructure
[As of June 2020, this position can be performed remotely from anywhere in the world, regardless of any location that might be specified above.]
The vast majority of human knowledge is still not on the internet. Most of it is trapped in the form of experience in people's heads, or buried in books and papers that only experts can access. More than a billion people use the internet, yet only a tiny fraction contribute their knowledge to it. We want to democratize access to knowledge of all kinds — from politics to painting, cooking to coding, etymology to experiences — so if someone out there knows something, anyone else can learn it. Our mission is to share and grow the world's knowledge, and we're building a world-class team to help us achieve this mission.
About the Team: Our small engineering team works on challenging problems every day. We have a culture that's rooted in constantly learning and improving, and our engineers are encouraged to think big and experiment with new ideas. Using continuous deployment, we quickly see our changes in the product and make fast iterations. Our engineers focus on creating polished products and writing high-quality code by designing APIs and abstractions that are extensible and maintainable. Everyone on the engineering team has a huge impact on our product and our company.
About the Role: Our data infrastructure team maintains, operates, and expands the data ecosystem at Quora, which includes data warehousing, streaming infrastructure, a distributed cluster-computing framework, distributed query engines, messaging systems, data pipelines, and automation tools. In this role, you will be responsible for contributing to different aspects of data pipeline development and the operational stability of our production big data systems. We leverage existing open source technologies like Spark, Flink, Kafka, HDFS, HBase, Hive, Presto, and Airflow, and also build our own systems for experimentation & time-series analysis. As a member of our team, you would spend time designing and scaling our distributed data systems, working closely with other teams to identify and execute on new use cases, and evangelizing the correct use of data at the company. We are looking for someone who will be excited by the prospect of optimizing, enhancing, or even redesigning our company’s data architecture and pipelines to support our next-generation data initiatives.
- Design, implement, maintain and optimize data pipelines, architectures and data sets.
- Collaborate with data scientists, platform engineers, and business partners to understand data needs and drive key data infrastructure decisions.
- Bring your expertise to help model structured & unstructured data. Own these data models at a high level & be a data consultant for partner teams.
- Own the data definitions & lineage across different data platforms and maintain systems of record for operational and non-operational data stores.
- Engineer reusable capabilities, abstractions & resilience in data pipelines for DML, DDL, ETL & Data flows which can be leveraged across teams.
- Be a data mentor & a team player with strong communication, prioritization, and adaptability skills.
- Ability to be available for meetings and impromptu communication during Quora's "coordination hours" (Mon-Fri: 9am-3pm Pacific Time). Members of our Infrastructure Engineering team are not required to work the full coordination hours, but should anticipate that they will need to be available Mon-Fri from either 11am-2pm or noon-3pm Pacific Time at minimum.
- Proficiency in one or more of the programming languages Python, Java, and Scala & strong query-authoring skills in SQL.
- Must have 5+ years of experience building data pipelines, including data ingestion, cleaning, processing, transforming, staging & loading.
- Proficiency with big data processing frameworks: Spark, Flink, Hive, Hadoop, Kafka, EMR, Presto.
- Operational mindset with the ability to handle problem diagnosis, root cause analysis, SLA compliance, performance tuning, and incident management in data infrastructure.
- Experience building data-intensive applications (high velocity/high volume).
- Experience with SQL/NoSQL data store & data lake operations.
- Flexible and positive team player with outstanding interpersonal skills.
- Passion for Quora's mission and goals.
- Hands-on experience with AWS technologies like S3, Redshift, EMR/EC2, and Athena, or with Snowflake.
- Familiarity with designing and operating a streaming platform (e.g., Kafka, Flink, Spark).
- Data wrangling & data tooling ability.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.