With a strong skill set in architecting and implementing event-driven streaming, ETL systems, and data modeling in cloud environments, I have a proven track record of delivering high-performance solutions across diverse business domains. Having worked at several product-led companies, I thrive on tackling complex challenges and optimizing systems for high availability and maintainability. One of my strengths is architecting data pipelines end to end while optimizing system performance and reducing costs: for example, by experimenting with and fine-tuning state descriptors and object sizes, I cut cloud computation costs by 40%. I have also developed best practices for designing Spark jobs that reduce both operational costs and execution time. I'm well-versed in cloud technologies such as GCP and AWS (using Dataproc/EMR), Airflow, database technologies (row- vs. column-oriented), Terraform, and Kubernetes. I have extensive experience building efficient data pipelines and integrating third-party data sources, leveraging Python, Kafka, and Spark to ensure maintainability and streamlined operations. As an experienced developer and data engineer, I have done data modeling and built data pipelines on cloud platforms, as well as real-time streaming applications using modern frameworks such as Flink.

As a mentor, I'm dedicated to sharing my expertise and supporting others in their professional growth. I help beginners enter the world of data engineering and can provide valuable guidance in architecting robust systems, optimizing data pipelines, implementing best practices, and overcoming data-related challenges. I have a strong passion for continuous improvement and can help mentees navigate the ever-evolving landscape of data engineering.

My Mentoring Topics

  • Data engineering
  • Apache Spark
  • Apache Airflow
  • Cassandra
  • Programming (Python/Java)
  • Database
  • Cloud/Infrastructure
  • Design Review
  • Project management
  • Data modelling
  • Designing data pipelines in cloud
  • Career Change

J.
1.August 2023

Vivek is very knowledgeable and has a lot of experience, which helped him guide our conversation. He was happy to share his insights and answer all my questions.

G.
31.July 2023

I needed to clear up a few technical/architectural questions, so I set up a session with Vivek. Thanks for the time, Vivek. I admire the technical depth in your answers.

I.
30.July 2023

Vivek's mentoring was good! His expertise in architecting ETL systems and cloud-based data modeling is exceptional. With his guidance, I understood how to optimize pipelines, cut costs, and improve performance. As a mentor, he's patient, supportive, and dedicated. I would recommend him to anyone who wants to venture into data engineering!

B.
30.July 2023

Mentoring with Vivek was an absolute game-changer! His exceptional skill set in architecting event-driven streaming, ETL systems, and data modeling in the cloud left a lasting impact. With his guidance, I optimized data pipelines, achieving significant cost reductions and improved performance. Vivek's expertise in Spark jobs and cloud technologies like GCP/AWS proved invaluable. As a mentor, he was patient, supportive, and deeply committed to continuous improvement. I wholeheartedly recommend learning from him to elevate your data engineering skills and excel in this dynamic field!

N.
8.July 2023

I had a great session. He cleared all my doubts. He was super helpful and humble. Vivek is the perfect mentor. He provided all the steps for my preparation. I am short of words to say thank you. Thank you so much, sir, for all the help and guidance. Looking forward to meeting you soon :)

Designing Data-Intensive Applications - The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Martin Kleppmann

Key Facts and Insights

  • The book explores the underlying principles of data systems and how they are used to build reliable, scalable, and maintainable applications.
  • It outlines the importance of distributed systems in handling data-intensive applications and how to deal with the challenges associated with them.
  • The book emphasizes the trade-offs involved in choosing particular data structures, algorithms, and architectures for data-intensive applications.
  • It provides a detailed explanation of the three main components of data systems: storage, retrieval, and processing.
  • It presents an in-depth understanding of consistency and consensus in the context of distributed systems.
  • The book discusses various data models, including relational, document, graph, and more, along with their suitable use cases.
  • It also examines stream processing and batch processing, their differences, and when to use each.
  • It underlines the significance of maintaining data integrity and the techniques to ensure it.
  • It offers comprehensive coverage of replication and partitioning strategies in distributed systems.
  • The book provides a balanced view of various system design approaches, explaining their strengths and weaknesses.
  • Lastly, the book does not recommend one-size-fits-all solutions; instead, it equips the reader with principles and tools to make informed decisions depending on the requirements of their projects.

In-Depth Analysis of the Book

"Designing Data-Intensive Applications" by Martin Kleppmann is a comprehensive guide to understanding the fundamental principles of data systems and their effective application in designing reliable, scalable, and maintainable systems. It provides an exhaustive account of the paradigms and strategies used in data management and their practical implications.

Understanding Data Systems
The book begins by introducing the basics of data systems, explaining their role in managing and processing large volumes of data. It delves into the three main components of data systems: storage, retrieval, and processing. Each component is explored in detail, giving the reader a clear understanding of its functionality and importance in a data system.

Data Models and Query Languages
The book covers the various data models used in data-intensive applications, such as relational, document, and graph models. It provides a comparative analysis of these models, highlighting their strengths and weaknesses and the specific use cases they are best suited for. Additionally, it discusses the role of query languages in data interaction, explaining how they facilitate communication between the user and the data system.

Storage and Retrieval
The book explains the techniques and data structures used for efficiently storing and retrieving data. It underlines the trade-offs involved in choosing a particular approach, emphasizing the importance of taking the specific requirements of the application into account.

Distributed Data
The book delves into the complexities of distributed data. It outlines the significance of distributed systems in handling data-intensive applications and discusses the challenges associated with them, such as data replication, consistency, and consensus. It also provides solutions to these challenges, equipping the reader with strategies to effectively manage distributed data.

Data Integrity
The book underscores the significance of maintaining data integrity. It provides an in-depth understanding of the concept and discusses techniques to ensure it, such as the ACID properties (atomicity, consistency, isolation, and durability) and the weaker BASE properties.

Stream Processing and Batch Processing
The book examines stream processing and batch processing. It discusses their differences, the challenges associated with each, and the scenarios where one would be preferred over the other.

Conclusion
In conclusion, "Designing Data-Intensive Applications" is a comprehensive guide that gives readers a deep understanding of data systems. It equips them with the knowledge to make informed decisions when designing data-intensive applications, based on the specific requirements of their projects. The book's strength lies in its balanced view of various system design approaches, offering a holistic understanding of the dynamics involved in managing data. It is an essential read for anyone seeking to delve into the world of data systems.
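
To make the ACID discussion concrete, here is a minimal sketch of atomicity using Python's standard-library sqlite3 module. The accounts table, the balances, and the simulated failure are hypothetical, purely for illustration; this is not an example from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    # "with conn" opens a transaction that commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 100 "
                     "WHERE name = 'alice'")
        raise RuntimeError("simulated crash between the two writes")
        conn.execute("UPDATE accounts SET balance = balance + 100 "
                     "WHERE name = 'bob'")  # never reached
except RuntimeError:
    pass

# Atomicity: the half-finished transfer was rolled back,
# so neither write is visible.
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# -> [('alice', 100), ('bob', 0)]
```

Because the failed transaction rolls back as a unit, readers never observe money leaving one account without arriving in the other, which is exactly the guarantee the book's atomicity discussion is about.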

Domain-driven Design - Tackling Complexity in the Heart of Software
Eric Evans

Key Facts and Insights from the Book

  • Domain-Driven Design (DDD) is a software development approach that focuses on the core domain and domain logic, rather than the technology used in implementing systems.
  • DDD uses a model-driven design where the model encapsulates complex business rules and processes. This model becomes an essential part of the language used by both the team and the business experts.
  • Ubiquitous Language is a key concept in DDD: a common language developed by the team for describing system functionality. It bridges the gap between the technical team and the business experts.
  • DDD promotes Bounded Contexts, which define the boundaries within which a model is applicable and where the Ubiquitous Language is valid.
  • DDD uses strategic design tools like Context Mapping and Distillation to manage complexity and focus on the core domain.
  • Entities, Value Objects, Aggregates, and Services are fundamental building blocks in DDD for modeling the domain.
  • DDD advocates a collaborative and iterative process involving domain experts, which leads to a deep understanding of the domain and a model that accurately reflects it.
  • Repositories are used in DDD to provide the illusion of a collection of all objects of a certain type.

An In-Depth Analysis of the Book

In his book, Eric Evans provides a comprehensive guide to tackling complex software projects using Domain-Driven Design (DDD). The book is divided into four major parts: Putting the Domain Model to Work, The Building Blocks of a Model-Driven Design, Refactoring Toward Deeper Insight, and Strategic Design.

In Putting the Domain Model to Work, Evans introduces the concept of a Domain Model, an abstraction that represents the knowledge and activities that govern the business domain. He emphasizes the importance of the model being a collaboration between technical and domain experts, not just a schema for data. This part also introduces the concept of Ubiquitous Language, a common, rigorous language between developers and domain experts. This language, used in diagrams, writing, and conversation, reduces misunderstandings and improves communication.

The Building Blocks of a Model-Driven Design is where Evans lays out the elements used to construct a model: Entities, Value Objects, Services, Modules, Aggregates, and Repositories. Entities are objects defined by their identity rather than their attributes. Value Objects, on the other hand, are described by their attributes and don't have an identity. Services are operations that don't naturally belong to an object, and Repositories provide a way to access Entities and Value Objects.

Refactoring Toward Deeper Insight delves into the iterative nature of DDD. It discusses how to incorporate new insights into the model and refine the model so that it reflects the domain with greater clarity and depth. One of the key techniques mentioned here is Model-Driven Design.

The last part, Strategic Design, discusses managing the complexity of large systems. It introduces the concept of a Bounded Context, which defines the applicability of a model within specific boundaries. Context Mapping is then used to understand the relationships between different bounded contexts. The book also discusses the concept of Distillation, where the most valuable concepts in a model are identified and isolated to ensure they don't get lost in the complexity.

Evans' book provides a comprehensive methodology for tackling complex domains. By focusing on the core domain, modeling it accurately, and continuously refining the model, software developers can create systems that provide real business value and are adaptable to changing business needs. Domain-Driven Design is not just a technical approach but a way of thinking, a mindset that puts the domain and its complexity at the heart of software development.
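
To make the distinction between Entities and Value Objects concrete, here is a minimal Python sketch; the Customer and Money classes are hypothetical names of my own, not examples taken from the book.

```python
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Money:
    """Value Object: immutable and defined entirely by its attributes."""
    amount: int
    currency: str

@dataclass(eq=False)
class Customer:
    """Entity: defined by a stable identity, even if attributes change."""
    name: str
    credit: Money
    id: uuid.UUID = field(default_factory=uuid.uuid4)

    def __eq__(self, other):
        # Two Customer objects are the same entity iff their ids match.
        return isinstance(other, Customer) and self.id == other.id

# Two Money values with equal attributes are interchangeable...
assert Money(10, "EUR") == Money(10, "EUR")

# ...but two customers with identical attributes are distinct entities.
assert Customer("Ada", Money(10, "EUR")) != Customer("Ada", Money(10, "EUR"))
```

An Aggregate would then group such objects under one Entity acting as the root, and a Repository would hand out Customer objects as if they lived in an in-memory collection.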

Stream Processing with Apache Flink - Fundamentals, Implementation, and Operation of Streaming Applications
Fabian Hueske, Vasiliki Kalavri

Key Facts and Insights

  • The book provides an in-depth understanding of stream processing, a real-time data processing technique.
  • It introduces Apache Flink, a powerful open-source platform for stream processing.
  • The book covers the fundamentals, implementation, and operation of streaming applications using Apache Flink.
  • It offers a thorough understanding of the architecture and components of Apache Flink.
  • The authors provide detailed explanations of the basic principles of stream processing, such as event-time processing and fault tolerance.
  • It offers practical examples and case studies to show the application of the theoretical concepts discussed.
  • The book includes code samples for different use cases and scenarios to provide hands-on experience.
  • It discusses various deployment scenarios for Apache Flink, including in the cloud, on-premises, and hybrid.
  • The book covers advanced topics such as state management and dynamic scaling.
  • The authors provide advice on how to troubleshoot and optimize Apache Flink applications.
  • The book also covers the future development and potential of stream processing and Apache Flink.

Detailed Analysis

"Stream Processing with Apache Flink" by Fabian Hueske and Vasiliki Kalavri is an exhaustive guide to stream processing using Apache Flink. The authors are both experienced contributors to the Apache Flink project, making this book a reliable source of information and technical insight.

Stream processing is a technique used to process real-time data, and Apache Flink is a powerful, open-source platform that allows for efficient stream processing. The book starts with an introduction to stream processing, explaining its importance in today's rapidly evolving digital landscape. The authors do an excellent job of explaining the challenges of dealing with real-time data and the role of stream processing in addressing these challenges.

The authors introduce Apache Flink as an effective solution for stream processing tasks. They offer a comprehensive overview of the architecture and components of Apache Flink, making it easy for readers, even those without prior knowledge of Apache Flink, to understand the platform.

A significant part of the book focuses on the fundamentals of stream processing. The authors discuss crucial concepts such as event-time processing, windowing, and fault tolerance, providing detailed explanations that enhance the reader's understanding of stream processing.

The book also includes code samples for different use cases and scenarios. These practical examples help readers apply the theoretical concepts discussed in the book, providing them with hands-on experience.

The authors also discuss the deployment scenarios of Apache Flink, covering different options including cloud, on-premises, and hybrid environments. This part of the book is particularly useful for professionals looking to implement Apache Flink in their organizations.

Advanced topics such as state management and dynamic scaling are also covered; these are essential for building robust and scalable stream processing applications. The authors provide advice on how to troubleshoot and optimize Apache Flink applications, which can be invaluable for developers working with Apache Flink.

The book concludes with a discussion of the future development and potential of stream processing and Apache Flink. The authors share their insights on future trends in stream processing, making the book relevant for those interested in the future of data processing. In conclusion, "Stream Processing with Apache Flink" is an essential resource for anyone interested in stream processing. It offers a comprehensive guide to understanding and implementing Apache Flink, making it a valuable addition to the library of any data professional.
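
To give a flavour of what the book's code samples look like in practice, here is a minimal keyed-stream sketch using PyFlink, Flink's Python API. The sensor readings are made up for illustration; a real job would consume from a source such as Kafka rather than a static collection.

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Hypothetical (sensor_id, temperature) readings standing in for a real source.
readings = env.from_collection([
    ("sensor-1", 4.0),
    ("sensor-2", 7.5),
    ("sensor-1", 6.0),
])

# Partition the stream by sensor id and keep a running maximum per key.
# Flink manages the per-key state and restores it after failures,
# which is the fault-tolerance machinery the book explains in depth.
(readings
    .key_by(lambda reading: reading[0])
    .reduce(lambda a, b: (a[0], max(a[1], b[1])))
    .print())

env.execute("max-reading-per-sensor")
```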

Spark: The Definitive Guide - Big Data Processing Made Simple
Bill Chambers, Matei Zaharia

Key Facts and Insights from "Spark: The Definitive Guide - Big Data Processing Made Simple" Introduction to Apache Spark: The book offers a comprehensive introduction to Apache Spark, its architecture, and its components including Spark SQL, Spark Streaming, MLlib, and GraphX. Data Processing: It delves into the concept of distributed data processing, explaining how Spark can handle large amounts of data efficiently. Programming in Spark: The authors provide a thorough understanding of programming in Spark using both Python and Scala, with practical examples and use cases. DataFrames and Datasets: This book describes how DataFrames and Datasets can be used for structured data processing in Spark. MLlib: It provides an in-depth understanding of MLlib, the machine learning library in Spark, and how to use it for creating machine learning models. Spark Streaming: There is a comprehensive guide to Spark Streaming, explaining how to perform real-time data processing. Performance Tuning: The book provides effective strategies for tuning Spark applications for maximum performance. Spark Deployment: Readers will learn about deployment options for Spark applications, including standalone, Mesos, and YARN. Spark SQL: The book gives a thorough coverage of Spark SQL, including data manipulation and querying. GraphX: The book offers insights into GraphX, a graph processing framework in Spark. Future of Spark: The final part of the book discusses the future of Spark and big data processing. In-depth Summary and Analysis "Spark: The Definitive Guide" by Bill Chambers and Matei Zaharia is a comprehensive resource for anyone interested in learning about Apache Spark, a powerful open-source unified analytics engine for large-scale data processing. The authors begin by introducing Apache Spark, explaining its architecture, and the various components, such as Spark SQL, Spark Streaming, MLlib, and GraphX. They explain how Spark allows for distributed data processing, emphasizing its ability to handle large amounts of data swiftly and efficiently. This sets the stage for understanding the importance of Spark in the world of big data. The book then dives into programming in Spark, using both Python and Scala. The authors provide practical examples and use cases, which make the concepts clear and easy to understand. They discuss the use of RDD (Resilient Distributed Dataset), which is the fundamental data structure of Spark. The authors then explain the concept of DataFrames and Datasets, which simplify structured data processing in Spark. They provide detailed examples and use cases, demonstrating how these structures can be used to manipulate and process data. One of the most valuable sections of the book is the one on MLlib. The authors delve into the machine learning library in Spark, explaining how to utilize it for creating machine learning models. They discuss various algorithms available in MLlib, and how to implement them. The book also provides a comprehensive guide to Spark Streaming, which allows for real-time data processing. The authors discuss how to use DStream API and Structured Streaming API to process live data streams. As performance is a key aspect of any application, the book provides effective strategies for tuning Spark applications for maximum performance. It also discusses various deployment options for Spark applications, such as standalone, Mesos, and YARN, helping readers understand the pros and cons of each. 
The book provides a thorough coverage of Spark SQL, including data manipulation and querying. It explains how Spark SQL integrates with DataFrames and Datasets, providing a unified interface for structured data processing. The authors also offer insights into GraphX, a graph processing framework in Spark. They discuss how to use GraphX to process and analyze graph data, providing practical examples. The final part of the book discusses the future of Spark and big data processing, giving an outlook on upcoming features and improvements in Spark. In conclusion, "Spark: The Definitive Guide" is a comprehensive resource that covers all aspects of Apache Spark. It is a must-read for anyone interested in big data processing, providing insights, practical examples, and strategies for effectively using Spark. It not only equips readers with the knowledge to use Spark but also inspires them to explore further and make their contributions to this exciting field.
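
To illustrate how the DataFrame API and Spark SQL offer two routes to the same engine, here is a minimal PySpark sketch; the sales rows and names are hypothetical, purely for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sales-demo").getOrCreate()

# Hypothetical sales data as a DataFrame with named columns.
sales = spark.createDataFrame(
    [("books", 12.0), ("games", 30.0), ("books", 8.0)],
    ["category", "amount"],
)

# DataFrame API: aggregate revenue per category.
sales.groupBy("category").sum("amount").show()

# The same query via Spark SQL; both compile to the same execution plan.
sales.createOrReplaceTempView("sales")
spark.sql(
    "SELECT category, SUM(amount) AS revenue FROM sales GROUP BY category"
).show()

spark.stop()
```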

Mastering Linux Shell Scripting - A practical guide to Linux command-line, Bash scripting, and Shell programming, 2nd Edition
Mokhtar Ebrahim, Andrew Mallett

Key Facts and Insights

  • Bash scripting basics: The book begins with the basics of Bash scripting, a powerful tool for automating tasks in Unix and Unix-like operating systems.
  • Command-line techniques: The reader is introduced to practical techniques for using the Linux command-line interface, an essential skill for Linux administrators and developers.
  • Shell programming: In-depth coverage is given to shell programming, a method of using the shell's abilities to automate tasks.
  • Script debugging and testing: The book provides valuable insights into script debugging and testing, which are crucial for ensuring the correct operation of scripts.
  • Advanced topics: Advanced topics such as process signaling and management, regular expressions, and sed and awk programming are also covered.
  • Real-world examples: The book includes numerous real-world examples and exercises to reinforce the concepts and techniques discussed.
  • Updated and expanded: This second edition has been updated and expanded to cover the latest trends and techniques in Linux shell scripting and command-line use.

Analysis and Conclusions

Mastering Linux Shell Scripting, second edition, by Mokhtar Ebrahim and Andrew Mallett is a comprehensive and practical guide to the world of the Linux command line, Bash scripting, and shell programming. Drawing on their combined expertise and experience, the authors have created a resource that is both informative for beginners and invaluable for seasoned professionals.

The book commences with an introduction to the basics of Bash scripting. Bash, or the Bourne Again SHell, is the default shell for most Linux distributions and a powerful tool for automating tasks. The authors skillfully guide the reader through the process of creating and executing Bash scripts, laying a solid foundation for the more advanced topics discussed later in the book.

From here, the book delves into practical techniques for using the Linux command-line interface. The command line is an essential tool for Linux administrators and developers, offering a level of control and flexibility not found in graphical user interfaces. The book covers a wide range of command-line techniques, from navigating directories and manipulating files to managing processes and troubleshooting system issues.

The next section of the book is dedicated to shell programming, a method of automating tasks using the shell's capabilities. The authors provide an in-depth look at its various aspects, including variables, control structures, functions, and arrays.

A critical aspect of scripting that the book addresses is script debugging and testing. Debugging is the process of identifying and resolving issues in a script, while testing ensures that the script behaves as expected under various conditions. The book provides valuable insights and techniques for both of these areas, which are crucial for ensuring the correct operation of scripts.

The book also covers advanced topics such as process signaling and management, regular expressions, and sed and awk programming. These topics are essential for those who wish to delve deeper into the world of Linux shell scripting and command-line use.

One of the book's strengths is its inclusion of numerous real-world examples and exercises. These practical applications reinforce the concepts and techniques discussed, aiding the reader's understanding and mastery of the material.

The second edition has been updated and expanded to reflect the latest trends and techniques in the field, making it an essential resource for anyone looking to keep their skills and knowledge up to date. In conclusion, Mastering Linux Shell Scripting is a comprehensive and practical guide to the Linux command line, Bash scripting, and shell programming. Whether you are a beginner looking to learn the basics or a seasoned professional seeking to enhance your skills, this book is a valuable resource that will aid you on your journey.
