Join Sepal AI, where we craft the most challenging tests for AI, grounded in real-world software systems! We are looking for a skilled Data Engineer with 3+ years of experience and a strong systems mindset to help build evaluation environments for AI in dynamic log analysis contexts.
What You'll Do:
• Design and implement analytical schemas and pipelines utilizing tools such as BigQuery, ClickHouse, Snowflake, Redshift, and other high-performance columnar databases.
• Author and debug complex, distributed queries across large log and telemetry datasets.
• Create and manage synthetic datasets that reflect real-world DevOps, observability, or cloud infrastructure logs.
• Tune and optimize distributed query execution plans to prevent timeouts and minimize over-scanning.
Who You Are:
• 3+ years of experience in data engineering or backend systems roles.
• Deep expertise in analytical databases and OLAP engines with a specialization in large-scale query optimization, schema design, and performance tuning.
• Experienced with log ingestion pipelines such as Fluent Bit, Logstash, or Vector, and skilled in schema design for observability systems.
• Strong SQL skills: able to diagnose performance issues and identify inefficient query patterns.
• Bonus: Experience with Python, Docker, or synthetic data generation.
Pay: $50-$85/hr, based on experience
Remote, flexible hours
Project timeline: 5-6 weeks