Jun 13, 2007 04:55 PM|simonmu
As I promised in a previous post
At the start of this year we decided the time had come to refresh the ASP.NET Web site. The design was tired, the software was old, and the hardware was underpowered. The ASP.NET community deserved better, so we set about improving the site.
For the design we partnered with Goldman Design
On the software side the ASP.NET Web site comprises three main applications (ASP.NET Web
On the hardware side, the setup at the start of the year comprised six web servers (some in a load-balanced configuration) and two database servers. These servers were too few and too old, so we purchased four new web servers and two new database servers,
plus additional RAM for some of the existing servers.
At 2:00am EDT on Wednesday, May 16 we set out to launch the refreshed ASP.NET Web site. We disabled the current site and went through a lengthy deployment process that had been planned weeks in advance. One by one we enabled the three main applications.
The ASP.NET Weblogs application faced some configuration errors that were quickly fixed. The ASP.NET Forums application launched with no significant problems. At approximately 5:30am we enabled the new ASP.NET Web application and immediately faced a severe
problem: the application soaked up the CPU and memory resources on its two load-balanced web servers. Worse, this affected the ASP.NET Forums application, which had been running fine on its own but shared the same web servers.
We focused the team on isolating and fixing the issue, but within a few hours it was evident that the problem was not a superficial one that could be fixed easily, but a deep-rooted one that was causing a memory leak. We isolated specific parts of the site
that might have been causing a memory leak, surveyed portions of code that might not be thread safe, and made a number of changes to the live application, but the problem remained too elusive and too deep to be fixed on a live application.
At 1:00pm we decided that the ASP.NET Web application needed to be rolled back, which presented a new problem. During the time we worked on the application, the community had been creating new content on the ASP.NET Weblogs and the ASP.NET Forums. If we
rolled back all three applications in tandem, we risked losing this new content. This was unacceptable, so we performed a partial rollback: reverting to the original ASP.NET Web application while staying with the new ASP.NET Weblogs and ASP.NET Forums applications.
Today, with the benefit of hindsight, we can see that we made two errors, both under the umbrella of testing. We did not properly test the new ASP.NET Web application by subjecting it to a high enough test load before deploying, and we did not properly test
the new design by opening it to community review.
Having learned the first lesson (or relearned, since we all know the importance of testing), Telligent created and ran high-load automated tests to identify the source of the memory leak. It was ultimately found in a function that loaded globalization data
from an XML file. There was a problem with the code that forced it to perform the I/O operation on each and every page request rather than fetch the data from cache. Why did we not catch this problem? Because the code looked okay during review and performed
fine during manual testing. But automated load testing showed that once the site hit 25 concurrent users, memory usage would skyrocket and the processor would peg at 100% usage. After fixing this issue, the new site performs very well under load (at least
in testing :) ).
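To illustrate the kind of fix described above, here is a minimal sketch of loading data from an XML file once and serving later requests from an in-memory cache. It is not the site's actual code (which is ASP.NET and is not shown in this post); it is written in Python for brevity, and the class name, file layout, and `demo` helper are all hypothetical. The demo drives the cache from 25 worker threads, roughly mirroring the concurrency level at which the problem appeared, and counts how many times the file is actually read.

```python
import os
import tempfile
import threading
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


class GlobalizationCache:
    """Hypothetical helper: parse an XML file once, then serve lookups from memory."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._data = None
        self.load_count = 0  # how many times the file was actually read

    def _load(self):
        # The expensive I/O step: read and parse the XML file.
        self.load_count += 1
        root = ET.parse(self._path).getroot()
        return {elem.get("key"): elem.text for elem in root.iter("string")}

    def get(self, key):
        # Check, lock, check again: only the first caller pays for the load,
        # even when many requests arrive at the same time.
        if self._data is None:
            with self._lock:
                if self._data is None:
                    self._data = self._load()
        return self._data[key]


def demo(requests=500, workers=25):
    """Simulate concurrent page requests against a small, made-up XML file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "strings.xml")
        with open(path, "w", encoding="utf-8") as f:
            f.write('<strings><string key="greeting">Hello</string></strings>')
        cache = GlobalizationCache(path)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(lambda _: cache.get("greeting"), range(requests)))
        return results, cache.load_count


if __name__ == "__main__":
    results, loads = demo()
    print(f"{len(results)} requests served, file read {loads} time(s)")
```

A check like `load_count` is also the sort of thing an automated load test can assert on: code that re-reads the file on every request looks fine with one manual tester, but the count grows with every request as soon as many users hit it at once.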
The second lesson is that a new Web site design should be opened to review by the community. Since we had watched the site slowly evolve, we had lost the objectivity needed to properly judge the site design and feature changes. Once the migration to CS2007
and the new look and feel were applied to the live ASP.NET Forums and live ASP.NET Weblogs, we realized from the vocal feedback from the community (thank you!) that many popular features had been lost in the new software and the new design. Over the past few
weeks we have worked to re-implement all of those features and make changes to the design to address the key concerns. That’s not to say we are finished, but that we are working in partnership with you, the community, to make the site better and more usable.
Where to from here?
Because we rolled back the ASP.NET Web application, we have taken the opportunity to implement some "should have" features that were cut from the version we attempted to launch a few weeks ago. Once we have finished, we will place the new ASP.NET Web site
on a sub-domain and open it to community review. While we must accept that design is subjective and a new design will never meet everyone's approval, we can make sure that everyone's favorite feature is still there and still working. Once that is complete, we
will finish the rollout and turn off the old design.
In the meantime, we will continue to bring the new servers online and upgrade the RAM of existing servers. There may be a few occasions when we need to shut down a part of the site while it is transitioned to new servers (you may have seen this
happen already), but we will try to minimize the number and duration of these outages. We will also give you advance warning if an outage proves necessary.
Once the new ASP.NET Web application has been opened to review and then deployed, and once all the new servers are in production, we will have arrived at where we want to be: a new community-focused Web site running on many fast servers.
Thanks, and I hope this provides more context on what we were doing, what happened, and where we are going with the site.
Jun 13, 2007 09:51 PM|JoshStodola
Thanks, Simon. It helps a lot.
I will be looking forward to the final rollout, with hopes that you guys can then focus on less-crucial issues like the color scheme.
Keep up the great work, the community is most definitely improving!!