• Responsible for implementation and ongoing administration of Hadoop infrastructure.
• Responsible for Cluster maintenance, trouble shooting, Monitoring and followed proper backup & Recovery strategies.
• Provisioning and managing the life cycle of multiple clusters like EMR & EKS. Infrastructure monitoring, logging & alerting with Prometheus/Grafana/Splunk
• Spark Coding ( Intermediate )
• SQL Performance Tuning
• Hadoop Queue allocation/distribution in Hadoop/Cloudera environments
• Performance tuning of Hadoop clusters and Hadoop workloads and capacity planning at application/queue level. Responsible for Memory management, Queue allocation, distribution experience in Hadoop/Cloud era environments.
• Should be able to scale clusters in production and have experience with 18/5 or 24/5 production environments. Monitor Hadoop cluster connectivity and security, File system (HDFS) management and monitoring.
• Investigates and analyzes new technical possibilities, tools, and techniques that reduce complexity, create a more efficient and productive delivery process, or create better technical solutions that increase business value. Involved in fixing issues, RCA, suggesting solutions for infrastructure/service components.
• Responsible for meeting Service Level Agreement (SLA) targets, and collaboratively ensuring team targets are met.
• Ensure all changes to the Production systems are planned and approved in accordance with the Change Management process.
• Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
• Maintain central dashboards for all System, Data, Utilization, and availability metrics