Post-mortem: Extended Deployment time on August 28, 2017

During deployment, we ran into some issues and build.opensuse.org was not accessible for a couple of minutes.

This sucks and that's why we want to give you some insight into what happened.

Problems/Timeline

2017-08-23

We updated Rails to 5.1.3 and dropped one of our initializers which is no longer needed. (See #3659)

2017-08-28

09:21 UTC – We installed the newest OBS packages from our Unstable project. During the installation process, our Apache server was restarted. After the restart, the Rails application crashed in the initializer we had dropped as described above.

09:22 UTC – We waited for the packages to be completely installed, including all service restarts, but this didn't fix the problem.

09:24 UTC – We manually restarted our Apache server. The problem was gone afterward.

Analyzing what went wrong

The package installation triggered a restart of the Apache server while old source code files were still present in the file system. As a result, the Rails initialization still loaded the dropped initializer, which crashed the application.

Improvements

We're considering restarting the Apache server at a later point during package installation (%posttrans) and documenting this effect in our deployment guide.
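To illustrate why the timing of the restart matters, here is a minimal Python sketch of a package upgrade. It is a simplified model, not our actual packaging: the file names are hypothetical, and it only assumes the usual RPM upgrade order, in which the new package's %post scriptlet runs before files owned only by the old package are removed, while %posttrans runs at the end of the transaction.

```python
# Simplified model of an RPM upgrade transaction.
# All file names are hypothetical and only serve the illustration.
OLD_FILES = {"config/initializers/dropped.rb", "config/application.rb"}
NEW_FILES = {"config/application.rb"}


def initializers(on_disk):
    """Return the initializer files a restarting Rails app would find."""
    return sorted(f for f in on_disk if "/initializers/" in f)


def upgrade(restart_in):
    """Simulate an upgrade; restart_in is "post" or "posttrans".

    Returns the initializers present at the moment Apache is restarted.
    """
    on_disk = set(OLD_FILES)
    loaded = None

    # 1. Files of the new package are unpacked next to the old ones.
    on_disk |= NEW_FILES

    # 2. %post of the new package runs; old-only files still exist here.
    if restart_in == "post":
        loaded = initializers(on_disk)

    # 3. Files owned only by the old package are removed.
    on_disk -= OLD_FILES - NEW_FILES

    # 4. %posttrans runs at the end of the transaction.
    if restart_in == "posttrans":
        loaded = initializers(on_disk)

    return loaded


print(upgrade("post"))       # the dropped initializer is still on disk
print(upgrade("posttrans"))  # the dropped initializer is gone
```

In this model, a restart in %post still sees the stale initializer, while a restart in %posttrans only sees the files of the new package.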

Resolution

We apologize for the downtime we caused. The server is back again. We've recorded what went wrong and will do better next time.
