Why you should stop blaming others right away? | Tharun Shiv | SRE Mindset | Become a better Site Reliability Engineering
Subscribe to the podcast to get the latest episodes
1. SRE is all about the right Mindset
a. No blame game
b. Thirst to solve
As SREs, we deal with multiple components and act as a bridge between the users and the application. Even when the application is well written, a bigger responsibility falls on SREs to keep the application and the services it uses up and running. In this process, there may be situations where an SRE makes a mistake that causes a disruption or even an outage. When this happens, the first reaction shouldn't be to blame anyone for the outage. Instead, do the following:
i. Fix the issue
ii. Write an RCA (Root Cause Analysis) that explains why the issue occurred in the first place. Names can be kept anonymous.
iii. Mention the first aid and the fix for the issue
iv. Discuss how the issue can be prevented the next time
v. Set an ETA for the fix
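The checklist above can be sketched as a small record type that makes it easy to see which parts of an RCA are still unwritten. This is only an illustrative sketch: the class name `RCA`, its field names, and the `missing_sections` helper are hypothetical and not something described in the episode. Note that the record has no field for who made the mistake, which keeps it blameless by construction.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class RCA:
    """Blameless root cause analysis record (hypothetical field names)."""
    summary: str        # what happened, in one or two sentences
    root_cause: str     # why the issue occurred; describe systems, not people
    first_aid: str      # immediate mitigation that restored service
    permanent_fix: str  # the lasting fix for the issue
    prevention: List[str] = field(default_factory=list)  # how to avoid a repeat
    fix_eta: Optional[date] = None  # when the permanent fix is expected

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are still empty."""
        text_fields = ("summary", "root_cause", "first_aid", "permanent_fix")
        missing = [name for name in text_fields if not getattr(self, name).strip()]
        if not self.prevention:
            missing.append("prevention")
        if self.fix_eta is None:
            missing.append("fix_eta")
        return missing


# Example: a draft RCA written right after the incident was mitigated.
draft = RCA(
    summary="Checkout API returned errors after a config change.",
    root_cause="A config change was applied without validation.",
    first_aid="Rolled back the config change.",
    permanent_fix="",
)
print(draft.missing_sections())
```

A check like this could run before an RCA is circulated, so that the prevention steps and the ETA are not forgotten once the immediate pressure of the outage is gone.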
Another aspect is having the right mindset to solve problems. As an SRE, you are responsible for optimizing the infrastructure, fixing issues, building automation and monitoring tools, and more, all of which requires strong problem-solving skills. Without the thirst to solve problems, you will only feel more stressed out or, even worse, cause issues yourself.
2. Communication
a. Overcommunication is not a problem
b. Be kind and show empathy
Are you performing a production activity, or even a stage change, that could affect other teams? Have you made progress on the project that you are working on? Make sure to always keep the necessary stakeholders in sync. Write emails and send Slack messages well in advance of the production activity, just before it, and after it. It might sound like over-communication, but trust me, as the company scales, you need to keep everyone relevant to the component that you are working on in sync. This way, if they have to take any action on their side, they can do it, and if they face any issues after the activity, they'll know who the right person to get in touch with is.
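The "well in advance, just before, and after" pattern can be sketched as a small helper that works out when each message should go out. This is a minimal sketch under stated assumptions: the function name `notification_times`, the message labels, and the default lead times (one day and fifteen minutes) are all hypothetical choices, not values from the episode.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def notification_times(
    start: datetime,
    end: datetime,
    advance: timedelta = timedelta(days=1),       # assumed "well in advance" lead time
    reminder: timedelta = timedelta(minutes=15),  # assumed "just before" lead time
) -> List[Tuple[str, datetime]]:
    """Return (label, send_time) pairs for a planned production activity."""
    if end < start:
        raise ValueError("activity cannot end before it starts")
    return [
        ("advance notice", start - advance),
        ("starting soon", start - reminder),
        ("completed", end),
    ]


# Example: a one-hour maintenance window.
window_start = datetime(2024, 1, 10, 22, 0)
window_end = datetime(2024, 1, 10, 23, 0)
for label, send_at in notification_times(window_start, window_end):
    print(f"{send_at:%Y-%m-%d %H:%M}  {label}")
```

The actual sending, whether by email or Slack, is left out on purpose. The point is only that the three messages are planned together with the activity instead of being remembered at the last minute.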
One other important characteristic to have as a human being is kindness and empathy. This applies at every level of engineering and on either side of the conversation, period. Whether someone asks a silly question, makes a mistake, or behaves rudely with you, you should never mirror that behavior.
3. Stay synced with the team
a. Do not miss team meetings
b. Prevent duplication of work
c. Do not compete, but contribute
In this work-from-home (WFH) period, the only opportunity you have to speak to your teammates is during a team meeting. What makes it special is that you get to stay synced with your team on what everyone is working on, whether they are blocked on any tasks, and how you can contribute to their tasks. You can also use this opportunity to share what you are working on and get help if necessary. This also prevents duplication of work.
4. Shadow teammates on tasks and issues
The best way to learn is by doing it hands-on, and the best way to begin is by watching how it is done. I also believe that the best way to retain what you have learned is to perform it repeatedly. This includes watching your teammates perform the activities. Having several eyes on an activity also helps ensure that it is done without mistakes.
5. No Spoon-feeding, do homework
Do not expect your teammates and seniors to teach you every detail. Read the documentation, watch tutorials, read engineering blogs, practice on your own, and suggest improvements. Even a well-built system can have more efficient solutions that you can propose.