Data Engineer

Job Category: Data Engineer
Job Type: Part Time
Job Location: Remote

About Us:

We are a rapidly growing startup developing a satellite imagery analysis platform that delivers actionable insights to the agriculture, commodity trading, and insurance sectors. Our platform combines real-time satellite data with machine learning models to assess crop health, forecast yields, and evaluate risk.

We are seeking a talented Data Engineer to design, build, and maintain data pipelines for ingesting, processing, and transforming large geospatial datasets. You will play a critical role in ensuring that our platform can handle vast amounts of satellite data efficiently, making it accessible for data scientists, GIS experts, and end users.


Responsibilities:

As a Data Engineer, you will design and manage the infrastructure that handles the large-scale geospatial data used in our platform. In this role, you will:

  • Design and develop scalable data pipelines to ingest, process, and transform satellite imagery and geospatial datasets from multiple sources (e.g., Sentinel-2, Landsat, commercial providers).
  • Implement ETL (Extract, Transform, Load) processes to clean and organize raw satellite data for downstream processing by data scientists and machine learning models.
  • Ensure the reliability, efficiency, and scalability of data pipelines, handling real-time and historical satellite data while minimizing latency and maximizing throughput.
  • Collaborate with data scientists, GIS specialists, and software engineers to ensure the availability of data for model training, analysis, and end-user applications.
  • Work with cloud infrastructure (e.g., AWS, Google Cloud, Azure) to manage storage and compute resources effectively for processing large datasets.
  • Implement data quality checks, monitoring systems, and error-handling mechanisms to ensure the accuracy and consistency of ingested data.
  • Optimize database performance for geospatial data storage and retrieval, leveraging tools like PostGIS, BigQuery, or Elasticsearch to manage large-scale spatial data.
  • Work with containerization and orchestration tools (e.g., Docker, Kubernetes) to deploy scalable data processing solutions in a cloud environment.
  • Assist in the management of geospatial data lakes and other storage solutions, ensuring data is accessible and easy to query for both internal and external users.
  • Stay up to date with new technologies and trends in data engineering, big data processing, and geospatial analytics to continuously improve our data infrastructure.

Requirements:

  • 3+ years of experience as a Data Engineer, with a focus on building data pipelines and managing large-scale datasets.
  • Proficiency in programming languages such as Python, Java, or Scala for data processing and pipeline development.
  • Strong experience with ETL processes and tools like Apache Airflow, Luigi, NiFi, or similar workflow orchestration frameworks.
  • Hands-on experience working with cloud platforms (AWS, Google Cloud, Azure), specifically in setting up data pipelines, storage, and compute resources.
  • Knowledge of geospatial data processing tools and libraries such as GDAL, Rasterio, Fiona, or GeoPandas for working with satellite imagery and spatial datasets.
  • Experience working with SQL and NoSQL databases (e.g., PostGIS, BigQuery, MongoDB, Elasticsearch) to store and query large datasets.
  • Familiarity with distributed computing frameworks (e.g., Apache Spark, Dask, or Hadoop) for processing large geospatial datasets.
  • Strong understanding of data security and privacy best practices, particularly in managing sensitive satellite and geospatial data.
  • Experience with containerization (Docker) and orchestration tools (Kubernetes) for deploying scalable data infrastructure.
  • Ability to work effectively in a remote, collaborative environment and communicate complex technical concepts to cross-functional teams.
  • Bonus:
    • Experience working with satellite data and integrating it into data pipelines.
    • Familiarity with machine learning pipelines and integration of geospatial data into ML workflows.
    • Experience with real-time data streaming technologies (e.g., Kafka, Kinesis).

Benefits:

  • Competitive salary with remote working flexibility.
  • Work on a mission-driven project that leverages satellite imagery to address key challenges in agriculture, commodities, and insurance.
  • Collaborate with a global team of data scientists, GIS experts, and engineers on a platform that uses cutting-edge data engineering and machine learning technologies.
  • Access to state-of-the-art cloud infrastructure and geospatial data processing tools.
  • Opportunities for career growth and professional development within a fast-growing startup.

How to Apply:

To apply, please submit your resume, a cover letter, and any relevant links to projects, repositories, or work demonstrating your experience with data engineering, particularly in managing large-scale geospatial data or satellite imagery.


We are committed to building a diverse and inclusive team and encourage applications from all backgrounds. Join us in building the infrastructure that will power the future of satellite imagery analysis and data-driven decision-making!
