Use site reliability engineering to address cloud instability
