[Due to some personal matters, I am not available for new mentoring sessions until further notice]
Since I started working in 2007, I have worked in various IT roles at telecom companies in the Middle East and now at AWS. My past and current roles include Unix/Linux System Administrator and Engineer, Virtualisation and Cloud Engineer, Incident Management Engineer, DevOps/SRE and Cloud Operations Engineer, and Team Lead.
Besides my broad technical knowledge of IT infrastructure, I have also gained solid experience in IT operations, support and service management. I enjoy sharing what I have learnt with others, and I always try my best to simplify things and use common sense when I discuss topics. I always try to raise the bar in the way I communicate with people.
My Mentoring Topics
- Linux Administration.
- Virtualisation and Cloud Computing.
- Teamwork and communication.
- Site Reliability Engineering.
- IT Operational Excellence.
- IT support and service management.
- Interview preparation.
Networking for Systems Administrators
Michael W Lucas
Key Facts and Insights:
- Networking Basics: Understanding the fundamentals of networking is crucial for systems administrators. This includes understanding the OSI model, IP addressing, subnets, and routing.
- TCP/IP: The book goes into detail about TCP/IP, explaining how it works and why it's important for systems administrators to understand.
- Switches and Routers: The book explains the role of switches and routers in a network, including how they work and how to configure them.
- Firewalls and VPNs: The book discusses the importance of firewalls and VPNs in network security, including how to set them up and manage them.
- Network Troubleshooting: The book provides practical advice on troubleshooting network issues, including common problems and their solutions.
- Network Monitoring: The book emphasizes the importance of monitoring network performance and provides guidance on how to do it effectively.
- Network Security: The book provides an overview of network security, including how to protect a network from threats and how to respond if a breach occurs.
- IPv6: The book covers the basics of IPv6, including its benefits and how to implement it.
- Wireless Networking: The book discusses the unique challenges and solutions associated with wireless networking.
- Virtualization: The book discusses the impact of virtualization on networking, including its benefits and challenges.
In-Depth Analysis:
"Networking for Systems Administrators" by Michael W Lucas is a comprehensive guide that provides systems administrators with a deep understanding of the intricacies of networking. It covers a range of topics, from the basics of networking to more complex subjects like network security and virtualization.
The first few chapters of the book are dedicated to the fundamentals of networking. Lucas does an excellent job of explaining complex topics like the OSI model, IP addressing, subnets, and routing in a way that is easy for readers to understand.
These foundational concepts are crucial for systems administrators, as they form the basis for more advanced topics covered later in the book.
One of the strengths of this book is its detailed explanation of TCP/IP, which is arguably the most important protocol suite for systems administrators to understand. Lucas explains how TCP/IP works, why it is important, and how to troubleshoot common TCP/IP issues.
The book also provides valuable information on the role of switches and routers in a network. Lucas explains how these devices work, how to configure them, and how they interact with other network components. This knowledge is essential for systems administrators, as switches and routers are key pieces of any network.
Another highlight of the book is its coverage of firewalls and VPNs. Lucas discusses the importance of these tools in network security and provides practical advice on how to set them up and manage them. This is particularly relevant in today's cybersecurity landscape, where threats are becoming increasingly sophisticated and frequent.
Network troubleshooting is another key topic covered in the book. Lucas provides practical advice on how to diagnose and fix common network issues, which is invaluable for systems administrators who are often called upon to resolve network problems.
The importance of network monitoring is emphasized throughout the book. Lucas provides guidance on how to effectively monitor network performance, including what metrics to track and how to interpret them. This information is critical for systems administrators, as it allows them to proactively address potential network issues before they become major problems.
The book also provides an overview of network security, including how to protect a network from threats and how to respond if a breach occurs. Lucas covers a range of topics, from securing network devices to implementing intrusion detection systems.
This information is particularly relevant in today's cybersecurity landscape, where threats are becoming increasingly sophisticated and frequent.
The book covers the basics of IPv6, including its benefits and how to implement it. This is a timely topic, as the internet is gradually transitioning from IPv4 to IPv6 due to the exhaustion of IPv4 addresses.
The unique challenges and solutions associated with wireless networking are discussed in the book. Lucas provides practical advice on how to set up and manage a wireless network, which is particularly useful for systems administrators who are responsible for managing wireless networks in addition to wired networks.
The book concludes with a discussion on the impact of virtualization on networking. Lucas explains the benefits and challenges of virtualization, and provides guidance on how to effectively manage a virtualized network. This is a relevant topic, as virtualization is becoming increasingly prevalent in today's IT environments.
In conclusion, "Networking for Systems Administrators" by Michael W Lucas is a comprehensive guide that provides systems administrators with a deep understanding of the intricacies of networking. It covers a range of topics, from the basics of networking to more complex subjects like network security and virtualization. The book is well-organized and written in a way that is easy for readers to understand, making it a valuable resource for systems administrators of all skill levels.
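As a small illustration of the IP addressing and subnetting fundamentals mentioned in this summary, here is a minimal Python sketch using the standard library `ipaddress` module. It is not taken from the book; the `192.168.10.0/24` network and the host address are made-up values chosen only for the example.

```python
import ipaddress

# Hypothetical example network; the addresses are illustrative only.
network = ipaddress.ip_network("192.168.10.0/24")

# Split the /24 into four equally sized /26 subnets.
subnets = list(network.subnets(new_prefix=26))

for subnet in subnets:
    # In a classic IPv4 subnet, the network and broadcast addresses
    # are not assigned to hosts, hence the "- 2".
    usable_hosts = subnet.num_addresses - 2
    print(subnet, "usable hosts:", usable_hosts)

# Find which subnet a given host address falls into.
host = ipaddress.ip_address("192.168.10.77")
owner = next(s for s in subnets if host in s)
print(host, "belongs to", owner)
```

Working through a split like this by hand and then checking it with a script is one simple way to practise subnet arithmetic.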
Site Reliability Engineering - How Google Runs Production Systems
Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff
Key Insights from the Book:
- Site Reliability Engineering (SRE) is Google's innovative approach to IT operations, aiming to keep systems up and running while allowing for constant updates and improvements.
- At its core, SRE is about balancing risk: the risk of system instability against the risk of stifling innovation.
- The concept of an error budget is introduced as a means of measuring system reliability and guiding decisions about when to push new changes.
- The 'Four Golden Signals' (Latency, Traffic, Errors, and Saturation) are key metrics in monitoring system health.
- SRE emphasizes automation to eliminate toil and improve system resilience and scalability.
- Incident management and postmortems are critical in learning from system failures and improving reliability.
- Adopting SRE requires a cultural shift towards treating operations as a software problem.
- Capacity planning and demand forecasting are essential for effective resource management.
- Service Level Objectives (SLOs) and Service Level Agreements (SLAs) are key tools in defining and communicating system reliability expectations.
- The importance of designing for scale and embracing the inevitability of failure is also highlighted.
In-depth Analysis:
The book begins by introducing Site Reliability Engineering, a novel discipline that Google pioneered to handle the challenges of running large-scale, mission-critical systems. This approach represents a significant departure from traditional IT operations, treating operations as a software problem and leveraging software engineering principles to solve operational issues.
SRE seeks to strike a balance between the need for system stability and the drive for rapid innovation. This is accomplished through the concept of an 'error budget', which quantifies the acceptable level of risk and guides decisions on when to push new changes.
In essence, if a service is not consuming its error budget, it is considered more reliable than it needs to be, which indicates that more risk can be taken in launching new features or changes.
A key strength of the SRE approach is its emphasis on measurement and monitoring. The book introduces the 'Four Golden Signals' (Latency, Traffic, Errors, and Saturation) as the fundamental metrics for system health. These signals provide a comprehensive view of system performance and can guide proactive measures to prevent system degradation or failure.
Automation is another major theme in the book. SREs are encouraged to spend time on projects that automate manual, repetitive tasks and eliminate what is termed 'toil'. This not only improves efficiency but also contributes to system resilience and scalability.
Incident management and conducting effective postmortems are presented as critical practices in SRE. These processes aim to learn from system failures and turn them into opportunities for improving system reliability.
The book also highlights the need for a cultural shift when adopting SRE, particularly in how organizations view failure. Instead of viewing failure as an exception, SRE treats it as an inevitable part of running systems at scale. This mindset shift leads to designing and building systems that are fault-tolerant and resilient.
The importance of capacity planning and demand forecasting is also covered. Effective resource management is crucial to maintain system performance while minimizing costs.
The book also introduces Service Level Objectives (SLOs) and Service Level Agreements (SLAs) as key tools for defining and communicating system reliability expectations. SLOs state the level of service that is expected, while SLAs also set out what will happen if the service level falls below the agreed threshold.
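The error budget idea described above can be sketched with a little arithmetic. The following Python example is a simplified, request-based illustration rather than the book's own implementation: the function name `error_budget_report`, the 99.9% SLO target, the request counts, and the `can_release` rule are all assumptions made for the example.

```python
def error_budget_report(slo_target: float, total_requests: int,
                        failed_requests: int) -> dict:
    """Summarise error budget usage for a request-based availability SLO.

    slo_target is the fraction of requests that should succeed, e.g. 0.999.
    This is an illustrative model, not an official formula.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The budget is the number of requests allowed to fail within the SLO.
    allowed_failures = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures

    return {
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,          # 1.0 means the budget is used up
        "budget_remaining": 1.0 - consumed,
        # Hypothetical policy: releases continue while budget remains.
        "can_release": consumed < 1.0,
    }


# Example with made-up numbers: 99.9% SLO, one million requests, 250 failures.
report = error_budget_report(slo_target=0.999,
                             total_requests=1_000_000,
                             failed_requests=250)
print(report)
```

With these made-up numbers, roughly a quarter of the budget has been used, which under the assumed policy leaves room for further releases. Real policies usually also consider the time window and how quickly the budget is being burned.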
In conclusion, "Site Reliability Engineering - How Google Runs Production Systems" provides a comprehensive overview of Google's innovative approach to IT operations. It offers valuable insights and practical guidance for organizations seeking to improve their systems' reliability and efficiency. The book's focus on balancing risk, automating toil, embracing failure, and measuring everything offers a refreshing perspective on operations in the era of cloud computing and DevOps.