Hello! I'm Denis Moura, a Senior Data Engineer with over 4 years of experience building scalable, resilient data pipelines and data platforms. My passion lies in working with data and using modern technologies to solve complex data problems. I have extensive experience with data modeling, SQL, and AWS, and a deep love for Python. You can find me on LinkedIn and explore my projects here on GitHub, although I haven't been very active on my public projects lately.
- Programming Languages: Python, SQL, JavaScript
- Big Data Technologies: Hive, S3, Google Cloud Storage, Presto, Athena, BigQuery, Spark
- Data Warehousing & ETL: Snowflake, Airflow, AWS Glue, AWS Step Functions, AWS Lambda, Kafka
- Cloud Platforms: AWS, GCP
- Data Modeling & Quality: Data Lakes, Delta Lake, Data Governance, Data Validation
- Tools: Terraform, Docker, Kubernetes, Git, GitHub Actions
- Data Visualization: Sigma Computing, Power BI, Looker Studio, Metabase
- Methodologies: Agile/Scrum
- Developed a data migration platform with custom, reusable operators in Airflow for large-scale ETL batch processes, reducing AWS costs significantly.
- Managed multiple data projects migrating data from on-premises solutions and third-party APIs to Snowflake, handling up to 10 million records per day using Kafka for batch and streaming pipelines.
- Created reports and dashboards using Spotfire, Sigma Computing, and Power BI.
Technologies: Python, Airflow, Snowflake, AWS, Git, GitHub Actions, Terraform, Docker, Kubernetes, Kafka
- Led a data migration project to create a data lake for 70 terabytes of genomic data on AWS, employing S3, Glue, Athena, Lake Formation, and EMR to build a custom Delta Lake structure.
- Developed and maintained numerous reports and dashboards for internal clients, streamlining genomics pipeline monitoring and the analysis of end-user results.
Technologies: Python, AWS Glue, AWS Step Functions, AWS Athena, Terraform, Docker, Git, PySpark
- Developed and maintained a microscopy solution, automating robot movements and enhancing camera focus using Python and C libraries.
- Spearheaded an international data science project using Python for COVID-19 network analysis, and led an on-premises-to-cloud data lake migration project using Airflow and AWS.
Technologies: Python, Airflow, AWS, Network Science, Kubernetes, Deep Learning, Computer Vision
- Ph.D. in Applied Biology (Bioinformatics), Universidade Federal de Pernambuco, 2022
- M.Sc. in Applied Biology (Neuroscience & Bioinformatics), Universidade Federal de Pernambuco, 2018
- B.Sc. in Biology, Universidade Federal de Pernambuco, 2015
- Data Migration Platform: Developed a reusable data migration platform in Airflow, optimizing cost and performance for a global client.
- Genomic Data Lake: Led the creation of a genomic data lake, enhancing data governance and compliance with legislation.
- On-premises to AWS Data Lake: Led the creation of a data lake in AWS, moving daily and near-real-time data from on-premises systems to AWS using Airflow.
- Microscopy Automation: Built and maintained an automated microscopy solution, contributing to advanced research capabilities.
Feel free to reach out via email or connect with me on LinkedIn.
Thanks for visiting my GitHub profile! Explore my repositories and feel free to contribute or reach out if you have any questions or collaboration ideas.