We are a fast-growing SaaS company in the EduTech space, operating across Singapore, Vietnam, and Malaysia. Almost 2,000 schools use our platform, and our mission is to transform early childhood education through data-driven technology.
As we scale, building a reliable and high-quality data foundation is core to our product and business. We are expanding our data team and seeking a skilled Data Engineer to design, build, and maintain our data warehouse and pipelines.
⸻
About the Role
We are looking for an experienced Data Engineer to lead the development of our enterprise data warehouse and data pipelines. You will work with AWS Glue (Spark), Athena, and various internal/external data sources (CRMs, product analytics tools, operational databases) to build a scalable, consistent, and trusted data platform.
This role is critical for enabling analytics, reporting, AI insights, and cross-product data visibility.
⸻
Key Responsibilities
- Design, build, and manage a scalable data warehouse that supports analytics, reporting, and product features.
- Develop efficient ETL/ELT pipelines using AWS Glue (PySpark), Python, and SQL (a short sketch of such a step follows this list).
- Integrate data from multiple sources such as CRMs, product analytics tools, internal systems, and third-party platforms.
- Build data models (star/snowflake schemas) optimised for BI tools such as Power BI (see the second sketch below).
- Ensure high data quality through validation, monitoring, logging, and data profiling.
- Implement best practices for data versioning, lineage, and governance.
- Work closely with product, engineering, analytics, and operations teams to translate business needs into data solutions.
- Optimise Glue jobs and Spark transformations for performance and cost efficiency.
- Maintain and improve the overall data architecture and storage strategy (S3, Glue Catalog, Athena).
- Document data pipelines, schemas, and operational procedures clearly.
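To give candidates a concrete feel for the work, here is a minimal sketch of the kind of pipeline step described above. It is written as plain PySpark (so it also runs outside Glue), and every bucket path, table, and column name is a hypothetical placeholder, not our actual schema.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("enrolments_etl").getOrCreate()

# Extract: raw CSV exports landed in S3 by an upstream integration.
# (Paths and columns are illustrative placeholders.)
raw = spark.read.option("header", True).csv("s3://example-raw/enrolments/")

# Transform: normalise types, enforce basic validity, deduplicate.
clean = (
    raw.withColumn("enrolled_at", F.to_timestamp("enrolled_at"))
       .filter(F.col("school_id").isNotNull())
       .dropDuplicates(["enrolment_id"])
)

# Simple data-quality gate: surface dropped rows rather than silently loading bad data.
rejected = raw.count() - clean.count()
if rejected > 0:
    print(f"Dropped {rejected} invalid rows")  # in practice: structured logging and alerting

# Load: partitioned Parquet in S3; registering it in the Glue Catalog
# makes it queryable from Athena and BI tools.
(clean.write.mode("overwrite")
      .partitionBy("country")
      .parquet("s3://example-curated/enrolments/"))
```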
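And a toy version of the dimensional modelling mentioned above: a tiny star schema (one fact table, one dimension) plus the shape of aggregate query a BI tool such as Power BI would push down to Athena. All names and values are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("star_schema_demo").getOrCreate()

# Tiny illustrative dimension and fact tables (names and values hypothetical).
dim_school = spark.createDataFrame(
    [(1, "Sunrise Kindergarten", "SG"), (2, "Hoa Mai Preschool", "VN")],
    ["school_key", "school_name", "country"],
)
fact_usage = spark.createDataFrame(
    [(1, "2024-01-08", 420), (2, "2024-01-08", 310), (1, "2024-01-15", 500)],
    ["school_key", "week_start", "active_minutes"],
)
dim_school.createOrReplaceTempView("dim_school")
fact_usage.createOrReplaceTempView("fact_usage")

# The kind of roll-up a dashboard would request from the model.
spark.sql("""
    SELECT s.country, f.week_start, SUM(f.active_minutes) AS total_minutes
    FROM fact_usage f
    JOIN dim_school s ON f.school_key = s.school_key
    GROUP BY s.country, f.week_start
    ORDER BY f.week_start, s.country
""").show()
```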
⸻
Requirements
- Bachelor's degree in Computer Science, Information Systems, or a related field.
- Strong experience in data engineering, data warehousing, and building ETL/ELT pipelines.
- Hands-on experience with AWS Glue (Spark) and AWS Athena.
- Strong SQL skills (preferably PostgreSQL).
- Proficiency in Python and common data libraries (e.g., PySpark).
- Experience designing dimensional data models and working with large datasets.
- Solid understanding of distributed computing concepts and data pipeline performance tuning.
- Experience working with APIs or webhook-based integrations.
- Strong problem-solving skills with high attention to data quality and reliability.
- Good communication and documentation skills.
⸻
Good to Have
- Experience with SaaS products, education systems, or operational data (school/CRM/finance).
- Familiarity with BI tools such as Power BI or Looker.
- Experience with event/analytics platforms (e.g., Mixpanel, Segment).
- Knowledge of data governance, compliance, and privacy frameworks (PDPA, GDPR).
- Experience with cloud platforms (AWS preferred; GCP or Azure is a plus).