{"id":1962,"date":"2024-10-22T04:10:46","date_gmt":"2024-10-22T04:10:46","guid":{"rendered":"https:\/\/groundinsightanalytics.com\/?post_type=awsm_job_openings&#038;p=1962"},"modified":"2024-10-22T04:10:47","modified_gmt":"2024-10-22T04:10:47","slug":"data-engineer","status":"publish","type":"awsm_job_openings","link":"https:\/\/groundinsightanalytics.com\/?awsm_job_openings=data-engineer","title":{"rendered":"Data Engineer"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><strong>About Us:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We are a rapidly growing startup developing a <strong>satellite imagery analysis platform<\/strong> that delivers actionable insights to the <strong>agriculture<\/strong>, <strong>commodity trading<\/strong>, and <strong>insurance<\/strong> sectors. Our platform uses <strong>real-time satellite data<\/strong> and <strong>machine learning models<\/strong> to provide predictions for <strong>crop health<\/strong>, <strong>yield forecasting<\/strong>, and <strong>risk assessment<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We are seeking a talented <strong>Data Engineer<\/strong> to design, build, and maintain data pipelines for ingesting, processing, and transforming large geospatial datasets. You will play a critical role in ensuring that our platform can handle vast amounts of satellite data efficiently, making it accessible for <strong>data scientists<\/strong>, <strong>GIS experts<\/strong>, and <strong>end users<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Responsibilities:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As a <strong>Data Engineer<\/strong>, you will be responsible for designing and managing the infrastructure that handles the <strong>large-scale geospatial data<\/strong> used in our platform. 
Your primary responsibilities will include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Design and develop scalable data pipelines<\/strong> to ingest, process, and transform satellite imagery and geospatial datasets from multiple sources (e.g., <strong>Sentinel-2<\/strong>, <strong>Landsat<\/strong>, <strong>commercial providers<\/strong>).<\/li>\n\n\n\n<li>Implement <strong>ETL (Extract, Transform, Load)<\/strong> processes to clean and organize raw satellite data for downstream processing by data scientists and machine learning models.<\/li>\n\n\n\n<li>Ensure the <strong>reliability, efficiency, and scalability<\/strong> of data pipelines, handling real-time and historical satellite data while minimizing latency and maximizing throughput.<\/li>\n\n\n\n<li>Collaborate with <strong>data scientists<\/strong>, <strong>GIS specialists<\/strong>, and <strong>software engineers<\/strong> to ensure the availability of data for model training, analysis, and end-user applications.<\/li>\n\n\n\n<li>Work with cloud infrastructure (e.g., <strong>AWS<\/strong>, <strong>Google Cloud<\/strong>, <strong>Azure<\/strong>) to manage storage and compute resources effectively for processing large datasets.<\/li>\n\n\n\n<li>Implement <strong>data quality checks<\/strong>, monitoring systems, and error-handling mechanisms to ensure the accuracy and consistency of ingested data.<\/li>\n\n\n\n<li>Optimize database performance for geospatial data storage and retrieval, leveraging tools like <strong>PostGIS<\/strong>, <strong>BigQuery<\/strong>, or <strong>Elasticsearch<\/strong> to manage large-scale spatial data.<\/li>\n\n\n\n<li>Work with <strong>containerization and orchestration tools<\/strong> (e.g., <strong>Docker<\/strong>, <strong>Kubernetes<\/strong>) to deploy scalable data processing solutions in a cloud environment.<\/li>\n\n\n\n<li>Assist in the management of <strong>geospatial data lakes<\/strong> and other storage solutions, ensuring data is accessible and easy to 
query for both internal and external users.<\/li>\n\n\n\n<li>Stay up to date with new technologies and trends in <strong>data engineering<\/strong>, <strong>big data processing<\/strong>, and <strong>geospatial analytics<\/strong> to continuously improve our data infrastructure.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Requirements:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>3+ years of experience<\/strong> as a Data Engineer, with a focus on building <strong>data pipelines<\/strong> and managing large-scale datasets.<\/li>\n\n\n\n<li>Proficiency in programming languages such as <strong>Python<\/strong>, <strong>Java<\/strong>, or <strong>Scala<\/strong> for data processing and pipeline development.<\/li>\n\n\n\n<li>Strong experience with <strong>ETL processes<\/strong> and tools like <strong>Apache Airflow<\/strong>, <strong>Luigi<\/strong>, <strong>NiFi<\/strong>, or similar workflow orchestration frameworks.<\/li>\n\n\n\n<li>Hands-on experience working with <strong>cloud platforms<\/strong> (AWS, Google Cloud, Azure), specifically in setting up data pipelines, storage, and compute resources.<\/li>\n\n\n\n<li>Knowledge of <strong>geospatial data processing<\/strong> tools and libraries such as <strong>GDAL<\/strong>, <strong>Rasterio<\/strong>, <strong>Fiona<\/strong>, <strong>GeoPandas<\/strong>, or similar for working with satellite imagery and spatial datasets.<\/li>\n\n\n\n<li>Experience working with <strong>SQL<\/strong> and <strong>NoSQL databases<\/strong> (e.g., <strong>PostGIS<\/strong>, <strong>BigQuery<\/strong>, <strong>MongoDB<\/strong>, <strong>Elasticsearch<\/strong>) to store and query large datasets.<\/li>\n\n\n\n<li>Familiarity with <strong>distributed computing frameworks<\/strong> (e.g., <strong>Apache Spark<\/strong>, <strong>Dask<\/strong>, or <strong>Hadoop<\/strong>) for processing large geospatial datasets.<\/li>\n\n\n\n<li>Strong 
understanding of <strong>data security<\/strong> and <strong>privacy best practices<\/strong>, particularly in managing sensitive satellite and geospatial data.<\/li>\n\n\n\n<li>Experience with <strong>containerization<\/strong> (Docker) and <strong>orchestration<\/strong> tools (Kubernetes) for deploying scalable data infrastructure.<\/li>\n\n\n\n<li>Ability to work effectively in a <strong>remote, collaborative environment<\/strong> and communicate complex technical concepts to cross-functional teams.<\/li>\n\n\n\n<li><strong>Bonus<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Experience working with <strong>satellite data<\/strong> and integrating it into <strong>data pipelines<\/strong>.<\/li>\n\n\n\n<li>Familiarity with <strong>machine learning pipelines<\/strong> and integration of geospatial data into ML workflows.<\/li>\n\n\n\n<li>Experience with <strong>real-time data streaming technologies<\/strong> (e.g., <strong>Kafka<\/strong>, <strong>Kinesis<\/strong>).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Benefits:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Competitive salary with <strong>remote working flexibility<\/strong>.<\/li>\n\n\n\n<li>Work on a <strong>mission-driven project<\/strong> that leverages satellite imagery to address key challenges in agriculture, commodities, and insurance.<\/li>\n\n\n\n<li>Collaborate with a global team of <strong>data scientists<\/strong>, <strong>GIS experts<\/strong>, and <strong>engineers<\/strong> on a platform that uses cutting-edge <strong>data engineering<\/strong> and <strong>machine learning<\/strong> technologies.<\/li>\n\n\n\n<li>Access to state-of-the-art <strong>cloud infrastructure<\/strong> and <strong>geospatial data processing tools<\/strong>.<\/li>\n\n\n\n<li>Opportunities for <strong>career growth<\/strong> and professional development within a fast-growing 
startup.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>How to Apply:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To apply, please submit your <strong>resume<\/strong>, a <strong>cover letter<\/strong>, and any relevant links to projects, repositories, or work demonstrating your experience with <strong>data engineering<\/strong>, particularly in managing large-scale geospatial data or satellite imagery.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">We are committed to building a diverse and inclusive team and encourage applications from all backgrounds. Join us in building the infrastructure that will power the future of <strong>satellite imagery analysis<\/strong> and <strong>data-driven decision-making<\/strong>!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>About Us: We are a rapidly growing startup developing a satellite imagery analysis platform that delivers actionable insights to the agriculture, commodity trading, and insurance sectors. Our platform uses real-time satellite data and machine learning models to provide predictions for crop health, yield forecasting, and risk assessment. 
We are seeking a talented Data Engineer to [&hellip;]<\/p>\n","protected":false},"author":1,"template":"","meta":{"iawp_total_views":68},"class_list":["post-1962","awsm_job_openings","type-awsm_job_openings","status-publish","hentry","job-category-data-engineer","job-type-part-time","job-location-remote"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/groundinsightanalytics.com\/index.php?rest_route=\/wp\/v2\/awsm_job_openings\/1962","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/groundinsightanalytics.com\/index.php?rest_route=\/wp\/v2\/awsm_job_openings"}],"about":[{"href":"https:\/\/groundinsightanalytics.com\/index.php?rest_route=\/wp\/v2\/types\/awsm_job_openings"}],"author":[{"embeddable":true,"href":"https:\/\/groundinsightanalytics.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"wp:attachment":[{"href":"https:\/\/groundinsightanalytics.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1962"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}