About Us:
We are a rapidly growing startup developing a satellite imagery analysis platform that delivers actionable insights to the agriculture, commodity trading, and insurance sectors. Our platform uses real-time satellite data and machine learning models to deliver crop-health predictions, yield forecasts, and risk assessments.
We are seeking a talented Data Engineer to design, build, and maintain data pipelines for ingesting, processing, and transforming large geospatial datasets. You will play a critical role in ensuring that our platform can handle vast amounts of satellite data efficiently, making it accessible for data scientists, GIS experts, and end users.
Responsibilities:
As a Data Engineer, you will be responsible for designing and managing the infrastructure that handles the large-scale geospatial data used in our platform. Your primary responsibilities will include:
- Design and develop scalable data pipelines to ingest, process, and transform satellite imagery and geospatial datasets from multiple sources (e.g., Sentinel-2, Landsat, commercial providers).
- Implement ETL (Extract, Transform, Load) processes to clean and organize raw satellite data for downstream processing by data scientists and machine learning models.
- Ensure the reliability, efficiency, and scalability of data pipelines, handling real-time and historical satellite data while minimizing latency and maximizing throughput.
- Collaborate with data scientists, GIS specialists, and software engineers to ensure the availability of data for model training, analysis, and end-user applications.
- Work with cloud infrastructure (e.g., AWS, Google Cloud, Azure) to manage storage and compute resources effectively for processing large datasets.
- Implement data quality checks, monitoring systems, and error-handling mechanisms to ensure the accuracy and consistency of ingested data.
- Optimize database performance for geospatial data storage and retrieval, leveraging tools like PostGIS, BigQuery, or Elasticsearch to manage large-scale spatial data.
- Work with containerization and orchestration tools (e.g., Docker, Kubernetes) to deploy scalable data processing solutions in a cloud environment.
- Assist in the management of geospatial data lakes and other storage solutions, ensuring data is accessible and easy to query for both internal and external users.
- Stay up to date with new technologies and trends in data engineering, big data processing, and geospatial analytics to continuously improve our data infrastructure.
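To give a flavor of the transform steps these pipelines perform, here is a minimal, hypothetical sketch that computes NDVI (a standard crop-health index) from red and near-infrared reflectance bands. The tiny synthetic arrays stand in for decoded satellite imagery bands; this is an illustration of the kind of work, not our actual pipeline code.

```python
import numpy as np

def compute_ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Values near +1 suggest dense, healthy vegetation; values near 0
    suggest bare soil; negative values typically indicate water.
    """
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    denom = nir + red
    # Avoid division by zero where both bands are 0 (e.g., nodata pixels).
    return np.divide(nir - red, denom,
                     out=np.zeros_like(denom), where=denom != 0)

# Synthetic 2x2 reflectance tiles standing in for real imagery bands.
red_band = np.array([[0.2, 0.4], [0.1, 0.0]])
nir_band = np.array([[0.6, 0.4], [0.5, 0.0]])
ndvi = compute_ndvi(red_band, nir_band)
```

In production this computation would typically run per tile over arrays read with a raster library such as Rasterio, with results written back to cloud storage for downstream models.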
Requirements:
- 3+ years of experience as a Data Engineer, with a focus on building data pipelines and managing large-scale datasets.
- Proficiency in programming languages such as Python, Java, or Scala for data processing and pipeline development.
- Strong experience with ETL processes and tools like Apache Airflow, Luigi, NiFi, or similar workflow orchestration frameworks.
- Hands-on experience working with cloud platforms (AWS, Google Cloud, Azure), specifically in setting up data pipelines, storage, and compute resources.
- Knowledge of geospatial data processing tools and libraries such as GDAL, Rasterio, Fiona, GeoPandas, or similar for working with satellite imagery and spatial datasets.
- Experience working with SQL and NoSQL databases (e.g., PostGIS, BigQuery, MongoDB, Elasticsearch) to store and query large datasets.
- Familiarity with distributed computing frameworks (e.g., Apache Spark, Dask, or Hadoop) for processing large geospatial datasets.
- Strong understanding of data security and privacy best practices, particularly in managing sensitive satellite and geospatial data.
- Experience with containerization (Docker) and orchestration tools (Kubernetes) for deploying scalable data infrastructure.
- Ability to work effectively in a remote, collaborative environment and communicate complex technical concepts to cross-functional teams.
Bonus:
- Experience working with satellite data and integrating it into data pipelines.
- Familiarity with machine learning pipelines and integration of geospatial data into ML workflows.
- Experience with real-time data streaming technologies (e.g., Kafka, Kinesis).
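The data-quality checks mentioned in the responsibilities can start as simple range validation before loading. A minimal, hypothetical sketch follows; the record shape and the [0, 1] reflectance threshold are illustrative assumptions, not our actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class QualityReport:
    total: int = 0
    passed: int = 0
    errors: list = field(default_factory=list)

def validate_reflectance(records):
    """Flag records whose surface-reflectance value falls outside [0, 1].

    Illustrative check only; a real pipeline would also validate CRS,
    timestamps, cloud cover, and tile completeness before loading.
    """
    report = QualityReport(total=len(records))
    for i, rec in enumerate(records):
        value = rec.get("reflectance")
        if value is None:
            report.errors.append((i, "missing 'reflectance' field"))
        elif not 0.0 <= value <= 1.0:
            report.errors.append((i, f"reflectance {value} outside [0, 1]"))
        else:
            report.passed += 1
    return report

report = validate_reflectance([
    {"reflectance": 0.42},  # within physical range -> passes
    {"reflectance": 1.7},   # out of range -> flagged
    {},                     # missing field -> flagged
])
```

Checks like this are typically wired into an orchestration framework such as Airflow, so a failing batch halts the load step and raises an alert rather than polluting downstream tables.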
Benefits:
- Competitive salary with remote working flexibility.
- Work on a mission-driven project that leverages satellite imagery to address key challenges in agriculture, commodities, and insurance.
- Collaborate with a global team of data scientists, GIS experts, and engineers on a platform that uses cutting-edge data engineering and machine learning technologies.
- Access to state-of-the-art cloud infrastructure and geospatial data processing tools.
- Opportunities for career growth and professional development within a fast-growing startup.
How to Apply:
To apply, please submit your resume, a cover letter, and any relevant links to projects, repositories, or work demonstrating your experience with data engineering, particularly in managing large-scale geospatial data or satellite imagery.
We are committed to building a diverse and inclusive team and encourage applications from all backgrounds. Join us in building the infrastructure that will power the future of satellite imagery analysis and data-driven decision-making!