Cloud Data Engineer 🔥
Neurons Lab is famous for its Cloud Data Engineers (DEs), who combine deep knowledge of Data Engineering & Analytics with a Development Operations (DevOps) mindset. Our experts help Customers not only understand the value their data hides now but also establish scalable, performant, and cost-efficient processes to uncover it continuously. With 4+ years of hands-on experience with various types of data (geospatial, medical, ...) and verticals (GIS, HCLS, ...), Neurons Lab DEs are here to help Customers achieve their business goals.
About project
The customer is developing a comprehensive, first-in-class Geospatial Data Warehouse SaaS. Geospatial data is traditionally hard to work with and requires years of experience with highly specialized tools. These tools are highly fragmented: each solves only a little piece of the puzzle, uses different (often incompatible) formats for data, and exposes it through interfaces not supported by standard analytical tools. The customer aims to unite this fractured world and give its Customers a powerful, convenient instrument to collect, process, store, analyze, and visualize their geospatial data. All to help them uncover the extra value hidden in their data without the undifferentiated heavy lifting of running their own infrastructure or studying rocket science.
Challenges & Responsibilities
- Architecting, building, and maintaining cost-efficient, scalable cloud environments for customers.
- Understanding the customer's business objectives and creating cloud-based solutions to facilitate those objectives.
- Conducting Well-Architected Reviews and auditing customer AWS Accounts.
- Participating in the SOW creation process along with the Sales team, Delivery Manager, and engineers.
- Conducting customer-facing architecture assessment meetings to gather business and technical requirements, aligning them with cloud best practices, architecting a solution, and writing Epics and User Stories in collaboration with the Team.
- Participating in internal and external project Stand-Ups as a Product Owner from the project's inception to completion.
- Supporting quality development practices and pursuing new and better ways to build and deploy software and ML/AI models.
Foundational / must-have skills:
- [ ] DataOps
  - [ ] data management: AWS Lambda
  - [ ] data pipelines: Amazon EMR, Dask@Yarn
  - [ ] data lakes: AWS Lake Formation, Glue Data Catalog / Apache Hive Metastore; Amazon S3, Athena; Apache Iceberg, Parquet
  - [ ] data warehouses: Amazon Redshift, Aurora for PostgreSQL
  - [ ] data sources and destinations: Amazon RDS / Aurora for PostgreSQL
- [ ] DevOps: AWS CDK | Amazon VPC, API Gateway; REST APIs | Git | AWS IAM, KMS, Secrets Manager
- [ ] Data Analytics: Python | Pandas, NumPy; AWS SDK for Python aka boto3, AWS SDK for Pandas aka awswrangler | SQL: Trino / Presto / Amazon Athena SQL; Amazon Redshift SQL; PostgreSQL
Advanced / nice-to-have skills:
- [ ] DataOps
  - [ ] data management: AWS Lambda for Python | Amazon EventBridge | AWS Step Functions; Apache Airflow, Astronomer
  - [ ] data lakes: Apache Atlas | Apache GeoParquet, Arrow, GeoArrow
  - [ ] data lake houses: Amazon Redshift Spectrum
  - [ ] data sources and destinations: Amazon RDS / Aurora for MySQL, DynamoDB, DocumentDB; not-in-AWS PostgreSQL, not-in-AWS MySQL, OGC 2.0 (WFS / Web Feature Service), OpenGIS (WMS / Web Map Service), OpenGIS (WMTS / Web Map Tile Service), not-in-AWS MongoDB, Esri Geodatabase / ArcGIS
  - [ ] data formats: GeoJSON, GeoTIFF, Shapefile, netCDF, HDF5 | GML, GPX, KML, WKT, WKB, IMG, CSV, or other GDAL-supported vector and raster formats
- [ ] DevOps: AWS CDK for Python | Amazon CloudWatch, SNS | AWS Marketplace, Application Cost Profiler | AWS X-Ray, Lambda Powertools; Amazon OpenSearch Service; Sentry, Datadog | AWS Backup | AWS Transit Gateway, PrivateLink | GitLab; AWS CodeBuild, CodeArtifact, CodeDeploy, CodePipeline | Amazon Cognito, Cognito External Identity Providers
- [ ] Data Analytics: GeoPandas, dask-geopandas; GDAL / Fiona, GEOS / Shapely; Xarray; Rasterio, rioxarray; SciPy; Numba; Pandera; jsonschema; great_expectations | SQL: PartiQL; Amazon Redshift UDFs in Python and AWS Lambda; PostGIS for PostgreSQL | NoSQL: MongoDB (MongoDB Atlas on AWS) | Spark (Scala) / PySpark (Python)
- [ ] Collaborative, consultative, and committed to continuous improvement and learning in a team-based environment
- [ ] Excellent interpersonal communication skills to explain complex technical topics in an easily comprehensible manner
- [ ] Customer-focused, analytical, detail-oriented, with a positive "can-do" attitude
- One or more AWS certifications preferred: AWS Certified Solutions Architect - Professional, AWS Certified Data Analytics - Specialty, or AWS Certified Machine Learning - Specialty
- 4+ years of commercial, hands-on experience in many domains (Data Analytics, Engineering, DevOps, Machine Learning)
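A feel for the vector formats listed above (GeoJSON, WKT, etc.) is central to the role. As a minimal, stdlib-only illustration of the kind of conversion that libraries such as Shapely and GDAL handle at scale, here is a hypothetical `geojson_to_wkt` helper (not part of any listed library) that renders a GeoJSON Point or Polygon geometry as WKT:

```python
import json

def geojson_to_wkt(geometry: dict) -> str:
    """Render a GeoJSON Point or Polygon geometry as a WKT string."""
    gtype = geometry["type"]
    if gtype == "Point":
        x, y = geometry["coordinates"]
        return f"POINT ({x} {y})"
    if gtype == "Polygon":
        # A GeoJSON Polygon is a list of linear rings (exterior first).
        rings = [
            "(" + ", ".join(f"{x} {y}" for x, y in ring) + ")"
            for ring in geometry["coordinates"]
        ]
        return f"POLYGON ({', '.join(rings)})"
    raise NotImplementedError(f"Unsupported geometry type: {gtype}")

point = json.loads('{"type": "Point", "coordinates": [30.5, 50.4]}')
print(geojson_to_wkt(point))  # POINT (30.5 50.4)
```

In production, Shapely's `shape()` and the geometry's `wkt` property cover all GeoJSON types; the sketch only shows why the listed formats are straightforward to bridge once the coordinate model is understood.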
Allocation: 0.5 FTE (part-time)
Stock options: no
Time zone: any
Candidate's Location: remote