Over the past two weeks Revit Cloud Worksharing has experienced multiple outages. Below is a summary explaining the causes of these incidents and actions we are taking to improve the resiliency of the service going forward.
Although the incidents were disruptive and had different root causes, they have given the team new insight into how the service operates under unexpected conditions. As part of the rigorous Incident to Improvement (I2I) process used by our teams, team members must discuss and propose changes to the service that address the issues surfaced by incidents. Improvements identified during each I2I become the topmost priority for the team.
As a result of the investigations into the incidents, and particularly the incident on Wednesday, October 24, we have identified one preexisting bug that may have contributed to one or more of the outages and that has only come to light as the service has grown. A fix for this bug has been submitted and is undergoing testing.
The team’s investigation has also reinforced the need for a project that is already on the team’s immediate roadmap and which is intended to reduce the impact of similar conditions. The project will help the service shed load during times of delayed responses from core services. Because Revit automatically retries operations when it cannot immediately connect to the server, to you this will appear as increased operation times until the core service degradation is resolved. This project is the team’s top priority for further development work.
These recent outages are not the experiences we want you to have with our services. We are doing everything in our power to prevent future disruptions, and we sincerely apologize for any interruptions these incidents may have caused to your work. Thank you for your patience this week. We greatly appreciate your continued use of Autodesk services.
Thanks for the detailed writeup, Sasha.
Will customers be offered a partial refund for the work interruptions?
Thanks for clarifying. We have 4 sites with over 200 users and we are still experiencing interruptions even today, 10/26/18. How are we to be compensated for this downtime? There is no way to calculate our exact losses, but when adding up the hours + the services we provide to our clients... it's substantial.