About Us:
We are a rapidly growing startup developing a satellite imagery analysis platform that delivers actionable insights to the agriculture, commodity trading, and insurance sectors. Our platform uses real-time satellite data and machine learning models to deliver crop-health predictions, yield forecasts, and risk assessments.
We are seeking a talented Data Engineer to design, build, and maintain data pipelines for ingesting, processing, and transforming large geospatial datasets. You will play a critical role in ensuring that our platform can handle vast amounts of satellite data efficiently, making it accessible for data scientists, GIS experts, and end users.
Responsibilities:
As a Data Engineer, you will be responsible for designing and managing the infrastructure that handles the large-scale geospatial data used in our platform. Your primary responsibilities will include:
- Design and develop scalable data pipelines to ingest, process, and transform satellite imagery and geospatial datasets from multiple sources (e.g., Sentinel-2, Landsat, commercial providers).
- Implement ETL (Extract, Transform, Load) processes to clean and organize raw satellite data for downstream processing by data scientists and machine learning models.
- Ensure the reliability, efficiency, and scalability of data pipelines, handling real-time and historical satellite data while minimizing latency and maximizing throughput.
- Collaborate with data scientists, GIS specialists, and software engineers to ensure the availability of data for model training, analysis, and end-user applications.
- Work with cloud infrastructure (e.g., AWS, Google Cloud, Azure) to manage storage and compute resources effectively for processing large datasets.
- Implement data quality checks, monitoring systems, and error-handling mechanisms to ensure the accuracy and consistency of ingested data.
- Optimize database performance for geospatial data storage and retrieval, leveraging tools like PostGIS, BigQuery, or Elasticsearch to manage large-scale spatial data.
- Work with containerization and orchestration tools (e.g., Docker, Kubernetes) to deploy scalable data processing solutions in a cloud environment.
- Assist in the management of geospatial data lakes and other storage solutions, ensuring data is accessible and easy to query for both internal and external users.
- Stay up to date with new technologies and trends in data engineering, big data processing, and geospatial analytics to continuously improve our data infrastructure.
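To give a flavor of the transform steps these pipelines perform, here is a minimal, hypothetical sketch that computes NDVI (a standard crop-health index) from red and near-infrared reflectance bands. The tiny synthetic arrays stand in for decoded satellite imagery bands; this is an illustration of the kind of work, not our actual pipeline code.

```python
import numpy as np

def compute_ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Values near +1 suggest dense, healthy vegetation; values near 0
    suggest bare soil; negative values typically indicate water.
    """
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    denom = nir + red
    # Avoid division by zero where both bands are 0 (e.g., nodata pixels).
    return np.divide(nir - red, denom,
                     out=np.zeros_like(denom), where=denom != 0)

# Synthetic 2x2 reflectance tiles standing in for real imagery bands.
red_band = np.array([[0.2, 0.4], [0.1, 0.0]])
nir_band = np.array([[0.6, 0.4], [0.5, 0.0]])
ndvi = compute_ndvi(red_band, nir_band)
```

In production this computation would typically run per tile over arrays read with a raster library such as Rasterio, with results written back to cloud storage for downstream models.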
Requirements:
- 3+ years of experience as a Data Engineer, with a focus on building data pipelines and managing large-scale datasets.
- Proficiency in programming languages such as Python, Java, or Scala for data processing and pipeline development.
- Strong experience with ETL processes and tools like Apache Airflow, Luigi, NiFi, or similar workflow orchestration frameworks.
- Hands-on experience working with cloud platforms (AWS, Google Cloud, Azure), specifically in setting up data pipelines, storage, and compute resources.
- Knowledge of geospatial data processing tools and libraries such as GDAL, Rasterio, Fiona, GeoPandas, or similar for working with satellite imagery and spatial datasets.
- Experience working with SQL and NoSQL databases (e.g., PostGIS, BigQuery, MongoDB, Elasticsearch) to store and query large datasets.
- Familiarity with distributed computing frameworks (e.g., Apache Spark, Dask, or Hadoop) for processing large geospatial datasets.
- Strong understanding of data security and privacy best practices, particularly in managing sensitive satellite and geospatial data.
- Experience with containerization (Docker) and orchestration tools (Kubernetes) for deploying scalable data infrastructure.
- Ability to work effectively in a remote, collaborative environment and communicate complex technical concepts to cross-functional teams.
Bonus:
- Experience working with satellite data and integrating it into data pipelines.
- Familiarity with machine learning pipelines and integration of geospatial data into ML workflows.
- Experience with real-time data streaming technologies (e.g., Kafka, Kinesis).
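The data-quality checks mentioned in the responsibilities can start as simple range validation before loading. A minimal, hypothetical sketch follows; the record shape and the [0, 1] reflectance threshold are illustrative assumptions, not our actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class QualityReport:
    total: int = 0
    passed: int = 0
    errors: list = field(default_factory=list)

def validate_reflectance(records):
    """Flag records whose surface-reflectance value falls outside [0, 1].

    Illustrative check only; a real pipeline would also validate CRS,
    timestamps, cloud cover, and tile completeness before loading.
    """
    report = QualityReport(total=len(records))
    for i, rec in enumerate(records):
        value = rec.get("reflectance")
        if value is None:
            report.errors.append((i, "missing 'reflectance' field"))
        elif not 0.0 <= value <= 1.0:
            report.errors.append((i, f"reflectance {value} outside [0, 1]"))
        else:
            report.passed += 1
    return report

report = validate_reflectance([
    {"reflectance": 0.42},  # within physical range -> passes
    {"reflectance": 1.7},   # out of range -> flagged
    {},                     # missing field -> flagged
])
```

Checks like this are typically wired into an orchestration framework such as Airflow, so a failing batch halts the load step and raises an alert rather than polluting downstream tables.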
Benefits:
- Competitive salary with remote working flexibility.
- Work on a mission-driven project that leverages satellite imagery to address key challenges in agriculture, commodities, and insurance.
- Collaborate with a global team of data scientists, GIS experts, and engineers on a platform that uses cutting-edge data engineering and machine learning technologies.
- Access to state-of-the-art cloud infrastructure and geospatial data processing tools.
- Opportunities for career growth and professional development within a fast-growing startup.
How to Apply:
To apply, please submit your resume, a cover letter, and any relevant links to projects, repositories, or work demonstrating your experience with data engineering, particularly in managing large-scale geospatial data or satellite imagery.
We are committed to building a diverse and inclusive team and encourage applications from all backgrounds. Join us in building the infrastructure that will power the future of satellite imagery analysis and data-driven decision-making!