From Experience to Infrastructure Engineering: A Tale of Kubernetes Certifications

Photo by Ian Taylor on Unsplash

Learning Kubernetes the hard way.

Joining PayU Credit has been an incredible journey, where I have the honour of leading a team of exceptional engineers aligned with our business vision to create cutting-edge applications. As the Director of Engineering overseeing customer-facing web and mobile apps, my role extends beyond crafting seamless user experiences to ensuring our team’s continuous growth and productivity across various technological fronts.

Our approach to app development is characterized by innovation and efficiency. Our flagship LazyPay and Paysense Android apps, for example, are both built with Jetpack Compose, with Kotlin Coroutines at their core. Similarly, our LazyPay iOS app leverages SwiftUI and Combine to deliver highly responsive user interactions.

But our commitment to excellence doesn’t stop at the front-end. Across both mobile and web applications, we’ve prioritized modularization, laying the groundwork for a scalable architecture that fosters rapid development. By embracing modular frameworks, libraries, micro-apps, and micro-frontends, our team has not only streamlined development processes but also significantly reduced time-to-market for new features and enhancements.

Behind the scenes, our Mobile Backend Service, aptly named Sauron, forms the backbone of our app ecosystem. With a focus on performance, scalability, and availability, the team ensures that Sauron operates at peak efficiency to support our applications’ demands. My interest in Kubernetes (K8s) stemmed from my direct involvement in optimizing Sauron’s infrastructure costs, a journey that has deepened my appreciation for scalable and cost-effective solutions.

As I delve deeper into the actual problem and its solution, I will also go through the journey of why and how I earned the CKA and CKAD K8s certifications in detail. This journey reflects my ongoing commitment to mastering new technologies and driving continuous improvement. It’s a testament to the belief that growth lies beyond our comfort zones, and I strive to inspire others to embrace this philosophy fearlessly.

The more that you read, the more things you will know. The more that you learn, the more places you will go. — Dr Seuss

The Trigger: Sauron Excessive Logging Issue 📈

Our Mobile Backend Service, Sauron, functions as a Backend-for-Frontend (BFF) layer, orchestrating interactions with our internal microservices. While devoid of any business logic, Sauron optimizes caching and transforms service responses to align with client requirements. Despite its pivotal role, Sauron carried a heavy logging burden: it generated approximately 600GB of logs daily, which surged to 1.1TB during peak repayment cycles.

To manage this influx, we had initially over-provisioned infrastructure resources, including production clusters and Elasticsearch, to accommodate the heightened demand. However, the escalating costs prompted us to reassess our approach. Recognizing the need to curb infrastructure expenses, our team embarked on a mission to reduce log ingestion.

Reducing the log ingestion by ~90% 🚀

Our efforts to optimize log ingestion paid off, letting us scale the infrastructure down and cut costs significantly. We implemented several key initiatives to achieve these results:

  1. Eliminated aspect logging from API calls to reduce unnecessary verbosity.

  2. Trimmed down cache hit-and-miss logs, focusing only on essential data.

  3. Reduced excessive response logging to minimize log size.

  4. Implemented smarter logging levels (Verbose, Debug, Warning, Error) to prioritize critical information.

  5. Identified and removed redundant API calls and their corresponding logs.

  6. Used a local in-memory cache to reduce the need for multiple internal service calls.

  7. Established a meticulous and automated code review process to identify and address logging issues proactively.

  8. Tweaked the Filebeat configuration, using the drop_fields processor to strip fields we did not need from each log event.
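To make a few of these ideas concrete, here is a minimal sketch in Python of how a local in-memory cache and level-based logging can work together. It is not Sauron's actual code: the `TTLCache` class, the `bff.cache` logger name, and the `loader` callback (which stands in for an internal service call) are all hypothetical. Cache hits and misses are logged at DEBUG, so they disappear from the output once the production log level is set to WARNING.

```python
import logging
import time

logger = logging.getLogger("bff.cache")  # hypothetical logger name


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            # Hit: logged at DEBUG, so it is suppressed when the level is WARNING.
            logger.debug("cache hit key=%s", key)
            return entry[1]
        logger.debug("cache miss key=%s", key)
        value = loader(key)  # stands in for a call to an internal service
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    # At WARNING level, none of the hit/miss lines above are emitted.
    logging.basicConfig(level=logging.WARNING)
    cache = TTLCache(ttl_seconds=30)
    cache.get_or_load("user-1", lambda key: {"id": key})
    cache.get_or_load("user-1", lambda key: {"id": key})  # served from memory
```

The second lookup never reaches the loader, which is the effect item 6 is after: fewer internal calls and, as a side effect, fewer log lines for those calls.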

These initiatives were implemented gradually and methodically, ensuring minimal disruption to production environments and preserving our debugging capabilities. Over the past six months, daily log volume has fallen from roughly 600GB to 50GB on regular days, and from 1.1TB to 100GB on repayment cycle days. Moreover, we achieved a 60% reduction in infrastructure usage, resulting in substantial cost savings for the organization.
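As a quick sanity check on the "~90%" figure, the reduction can be worked out from the volumes quoted in this post (600GB down to 50GB on regular days, 1.1TB down to 100GB on repayment cycle days). The sketch below assumes 1.1TB means 1,100GB in decimal units; the helper name `reduction_pct` is purely illustrative.

```python
def reduction_pct(before_gb, after_gb):
    """Percentage reduction going from before_gb to after_gb."""
    return (before_gb - after_gb) / before_gb * 100


regular = reduction_pct(600, 50)
peak = reduction_pct(1100, 100)  # assumption: 1.1TB == 1,100GB

print(f"regular days: {regular:.1f}%")          # regular days: 91.7%
print(f"repayment cycle days: {peak:.1f}%")     # repayment cycle days: 90.9%
```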

Infra and Cost Optimisation on Sauron (Mobile Backend Service)

Unforeseen Downtime: Navigating Challenges in Optimization ⚙️

However, our journey wasn’t without its challenges. Despite our best efforts, one particular initiative led to an unexpected hiccup. Following the implementation of various optimizations, we conducted a scale-down activity to assess performance and scalability. Unfortunately, this decision resulted in an unforeseen production downtime for a couple of hours.

As we investigated the root cause, we discovered that our assumptions about the behaviour of the Horizontal Pod Autoscaler (HPA) were flawed. Each pod had a preset limit on the number of requests it could handle, and the pods hit that limit while performance metrics such as CPU and memory stayed within acceptable thresholds. With those metrics looking healthy, the HPA never triggered a scale-up, and the pods couldn’t process the incoming requests as expected. This revelation was a pivotal ‘Aha!’ moment for me, prompting a deep dive into Kubernetes and a pursuit of relevant certifications.

We concluded that the HPA wasn’t suited to our use case. We needed triggers beyond standard performance metrics, such as the number of requests or queue lengths, to auto-scale our deployments effectively. To prevent similar incidents in the future, we adopted KEDA, an event-driven autoscaler that can adjust scaling dynamically based on multiple factors and attributes.
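The gap is easier to see with numbers. The Kubernetes documentation describes the HPA's core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The Python sketch below applies that formula to two metrics. All figures are invented for illustration and are not taken from the incident, and the sketch leaves out the real controller's minReplicas/maxReplicas clamping and stabilization behaviour.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """HPA-style replica calculation for a single averaged metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    return math.ceil(current_replicas * ratio)


# Hypothetical situation: 4 pods, CPU looks healthy, but each pod is saturated
# with twice the number of requests it is configured to handle.
by_cpu = desired_replicas(4, current_value=45, target_value=70)         # CPU %
by_requests = desired_replicas(4, current_value=200, target_value=100)  # requests per pod

print(by_cpu)       # 3
print(by_requests)  # 8
```

With CPU as the only signal, the calculation suggests fewer pods, while a request-based signal asks for double. That is the kind of trigger an event-driven autoscaler such as KEDA lets you scale on.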

Why Kubernetes Matters: A Personal Journey 💡

  • Cloud-native apps: As mobile apps increasingly rely on cloud-based services, understanding K8s can help me to design and deploy cloud-native applications that scale and perform well.

  • Backend Service: Since we faced an issue during a scale-down activity on our Sauron service, knowing Kubernetes from the ground up helps me communicate effectively with our backend team as we optimise the services further.

  • Containerization: K8s is built around containerisation (e.g. Docker), so understanding it better lets me package an app’s code and dependencies into a single lightweight container, making it easier to manage and deploy.

  • Microservices architecture: K8s is well suited to the microservices architecture that is increasingly popular in front-facing apps. Knowing it better helps in designing and deploying scalable distributed systems made up of multiple services communicating with each other.

  • DevOps collaboration: K8s is a key tool for DevOps teams. By learning it, I can now collaborate more effectively with my DevOps colleagues and improve the overall efficiency of our team.

  • Career Growth: Knowing K8s can open up new learning and career opportunities in cloud computing, DevOps, and distributed systems. While I don’t aim to pursue a career in DevOps, advanced knowledge of the infrastructure and its internal workings can open doors to many untouched realms of engineering.

  • Better communication with Stakeholders: With K8s knowledge, I can discuss the technical aspects of the backend service more effectively with stakeholders, including product managers, customers, and executives. I can now debate infra capabilities and concepts clearly and concisely.

  • Improved debugging skills: Understanding K8s can help in debugging production issues much faster and more efficiently, as I will have a deeper understanding of the underlying infrastructure and services. This can in turn help improve the overall quality and reliability of our mobile and web apps.

  • Future-proofing: K8s is a rapidly evolving field, and learning it can help me future-proof my skills and stay ahead of the curve in the industry. Combining mobile engineering expertise with K8s knowledge can help me take advantage of new technologies and trends as they emerge.

My Certification Journey 🎓

Learning Kubernetes isn’t just a journey — it’s an adventure filled with challenges, discoveries, and moments of triumph. As someone entrenched in the world of experience engineering, diving into Kubernetes meant immersing myself in every layer of its infrastructure. From the fundamental building blocks to the intricate inner workings, I embarked on a journey to unravel the mysteries of Kubernetes and unlock its full potential.

But learning Kubernetes wasn’t just about acquiring knowledge — it was about pushing boundaries and expanding horizons. As I delved deeper into the intricacies of container orchestration, I realized the immense value of formalizing my expertise through certification. It wasn’t just about validating my skills — it was about embracing a new chapter in my journey as a technologist, driven by a passion for continuous learning and growth.

Road to CKAD: Certified Kubernetes Application Developer

I dedicated my off-hours and any available leisure time outside of my regular work schedule to diligent practice. This commitment bore fruit when I successfully passed the CKAD certification exam on my first attempt, achieving an impressive score of 88 out of 100. 🥳

Earners of this designation demonstrated the skills, knowledge and competencies to perform the responsibilities of a Kubernetes Application Developer. Earners are able to define application resources and use core primitives to build, monitor, and troubleshoot scalable applications and tools in Kubernetes. The skills and knowledge demonstrated by earners include Core Concepts, Configuration, Multi-Container Pods, Observability, Pod Design, Services & Networking, State Persistence.

Road to CKA: Certified Kubernetes Administrator

Achieving my CKAD certification was a moment of genuine satisfaction, considering I ventured into a domain completely unfamiliar to me. As I delved deeper into Kubernetes concepts, my interest in mastering the intricacies of Kubernetes intensified. This curiosity led me to set my sights on obtaining the CKA: Certified Kubernetes Administrator Certification, a challenging yet rewarding pursuit.

Transitioning from CKAD to CKA was a natural progression, considering my solid foundation in Kubernetes Application Development. The CKA exam, with its emphasis on both administrative and development aspects of Kubernetes, posed a new challenge. This certification requires a comprehensive understanding of security, roles, cluster debugging, high availability, networking, and more. After diligently preparing for several months following my CKAD certification, I undertook the CKA exam and was thrilled to achieve a score of 99 out of 100, exceeding even my own expectations. 🥳

Earners of this designation demonstrated the skills, knowledge and competencies to perform the responsibilities of a Kubernetes Administrator. Earners demonstrated proficiency in Application Lifecycle Management, Installation, Configuration & Validation, Core Concepts, Networking, Scheduling, Security, Cluster Maintenance, Logging / Monitoring, Storage, and Troubleshooting.

Unlocking New Horizons: Embracing the Kubernetes Journey 🌱

As I reflect on my journey from being a mobile engineer to earning CKA and CKAD certifications, I realize that Kubernetes has become an integral part of my skillset. The knowledge and experience I gained have not only helped me in my current role but have also opened up new opportunities for growth and collaboration. If you’re a developer looking to expand your horizons, whatever your domain expertise, I encourage you to embark on the Kubernetes journey. It may seem daunting at first, but with persistence and dedication, you’ll find that the benefits far outweigh the challenges. Here are some key takeaways from my experience:

  • Kubernetes is not just for ops teams; it’s for anyone who wants to build scalable, reliable, and efficient systems.

  • Learning Kubernetes requires practice, patience, and persistence.

  • The Kubernetes community is vast and supportive; don’t be afraid to ask questions or seek help.

  • Kubernetes is a constantly evolving field; stay curious and keep learning.

My journey with Kubernetes has been a transformative experience that has helped me grow both professionally and personally. I hope that my story will inspire you to take the leap and join the Kubernetes community. Embrace the journey, and get ready to unlock new possibilities in your career! 🚀

During the recent LazyPay app overhaul, we strategically re-architected our API design, adopting a hybrid approach that seamlessly blends data and user interface elements. This shift enhances scalability and fosters extensibility, ensuring our app remains agile and adaptable to evolving user needs.

Our dedicated team meticulously crafted the LazyPay Android and iOS apps, leveraging the latest technology stacks to deliver a seamless and immersive user experience. Check them out today on the respective app stores and share your valuable feedback! ❤️