AWS Data Engineer · Mumbai

Building scalable data systems on the cloud.

2+ years crafting high-performance data pipelines, data mesh architectures, and cloud-native ETL solutions at Capgemini — with a focus on AWS, PySpark, and Infrastructure as Code.

About

Who I am

I'm Sarthak, an AWS Data Engineer based in Mumbai with a B.E. in Computer Science (9.46 CGPA) from St. Francis Institute of Technology.

At Capgemini, I've designed and deployed scalable data mesh infrastructure, migrated legacy ETL pipelines to modern AWS architectures, and helped teams ship data products reliably — with a focus on performance, cost, and governance.

I thrive at the intersection of cloud engineering, data architecture, and automation — always looking for ways to cut complexity and ship faster.

2+ · Years of professional experience
20+ · ETL pipelines migrated
40% · Data processing speed improvement
99.5% · Pipeline uptime maintained

Skills

What I work with

☁️

AWS Cloud Services

S3 · Lambda · Glue · Athena · EMR · Redshift · Kinesis · DynamoDB · Step Functions
⚙️

Data Engineering

ETL Pipelines · Data Mesh · Data Warehousing · Apache Spark · Real-time Processing · Streaming Analytics
💻

Programming

Python · PySpark · SQL
🔧

DevOps & IaC

AWS CDK · CloudFormation · GitLab CI/CD · Serverless · CodePipeline
🗄️

Databases

MySQL · NoSQL · Data Lakes · Redshift
🤝

Process & Tools

Agile · Jira · ServiceNow · Stakeholder Comms

Experience

Where I've worked

Aug 2024 – Jul 2025
Capgemini

Associate Consultant

  • Architected AWS data mesh (SDLF v2) across business units, improving governance compliance by 40%
  • Migrated 20+ legacy ETL pipelines to the new data mesh, with 100+ more in progress
  • Optimized PySpark Glue ETL jobs — achieved 30% CPU reduction and monthly cost savings
  • Automated IaC deployment using AWS CDK & CloudFormation, reducing deployment time by 20%
  • Established GitLab CI/CD pipelines for dev, staging, and production environments

Sep 2022 – Aug 2024
Capgemini

Senior Analyst

  • Built ETL pipelines using Lambda, Glue, and Athena, improving data processing speed by 40%
  • Migrated 20+ table prep flows to AWS SDLF with comprehensive QA and error handling
  • Automated JSON-to-PySpark conversion scripts, cutting migration time by 60%
  • Implemented Amazon Redshift data warehousing for BI and analytics requirements

Projects

What I've built

🧠
Jupyter Notebook

NLP Mini Project

Natural Language Processing project exploring text analysis and machine learning techniques.

🗣️
Jupyter Notebook

Speech / English to SQL

Converts natural language or speech input into SQL queries — bridging human language and databases.

🎮
Python

Memory Game

A Python-based memory card game — a fun personal project exploring game logic and UI.

🤖
Jupyter Notebook

LLMs from Scratch

Step-by-step implementation of a ChatGPT-like LLM in PyTorch — exploring the internals of large language models.

Certifications

Credentials & achievements

🔷

AZ-900

Microsoft Azure Fundamentals

☁️

GCDL

Google Cloud Digital Leader

🔄

Agile

Agile Software Development

Star Performer

Service Delivery Award — 2025

Let's work together

I'm open to Data Engineering roles, cloud architecture projects, and interesting collaborations. Feel free to reach out — I typically respond within a day.

Send an email →