Reclaiming space on stage.mozilla.org
Posted 3 years, 1 month ago at 22:30. 4 comments
For those who want to skip to my specific proposals — there are 6 — for reclaiming space on stage.mozilla.org, please skip ahead to “Redux”, but if you’re going to comment, please read the whole thing.
Every day we produce up to 17G of new nightly builds for Firefox across all branches. This includes all opt+debug builds in 75+ locales, each on 4 operating systems (OSes), each on 9 different project branches. We do reclaim much of this 17G as we retire l10n builds older than 1 week, but we are still creating 1.3G of new nightly en-US builds that we need to store (essentially) indefinitely. nthomas did some cleanup recently as part of bug 562261 and that has bought us some more time, but this inexorable increase will eventually overrun our disk capacity on the staging server. The problem is magnified once we add the disk usage of nightlies from other products, which may have worse nightly hygiene habits, and the expected nightly increase in space requirements as we add 4 new OSes (Linux 64bit, Windows 64bit, OSX10.6 64bit and Android).
To date, our solution to this problem has been to do periodic cleanups, usually under duress like bug 562261 (which can be error-prone), or to simply buy more disk. As Justin notes, while disk space may be “cheap”, it is not an infinite resource, either in terms of upfront or management cost. We need an actual policy to govern how long we keep nightlies. We can then use that policy as a baseline to frame further discussions about keeping things for longer periods in special cases.
The good news: there *is* space that can be reclaimed. There are two types of wins to be had here:
- One-time space recovery by deleting or archiving material that is no longer useful, or that can be kept offline and spun up as required.
- Codified policy changes for automatically expiring old content.
We can tackle #1 (one-time space recovery) for Firefox by:
- moving no-longer-supported releases, i.e. anything prior to 3.0 (including firebird), to a true archive. This will free up about 125G of space, and will have the added benefit of not housing unsupported builds next to supported builds, making them a little more difficult for people to stumble upon.
- deleting nightly builds older than a certain date. The Firefox nightly directories are conveniently broken down by year, which makes it easy to see how much disk space we could reclaim by deleting old nightlies:
2004 16G
2005 53G
2006 191G
2007 129G
2008 177G
2009 236G
2010 216G (so far)
At the risk of making others’ arguments for them, both previous times we attempted to come up with a stage cleanup policy (2006 & 2008) the major concern raised was a need to keep builds in perpetuity to allow regression detection via binary search.
So I say this: I’d like to hear from developers who have actually had to perform binary searches of nightlies to let me know exactly how far back in time they have had to go. In the absence of any other data, I’m going to suggest we remove all nightlies prior to 2007 simply because that corresponds with the start of the hg era (March 2007).
Please note that when I say “remove” in the context of this discussion, I am advocating for “delete” but understand that I may need to settle for “archive” in whatever form makes sense to IT (slow disk/tape/???).
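As a quick check on the per-year Firefox figures above, here is a minimal Python sketch that totals the space held by nightlies older than a given cutoff year. The sizes are copied from the listing; the function and variable names are made up for illustration.

```python
# Firefox nightly directory sizes in GB, copied from the per-year listing above.
FIREFOX_NIGHTLY_GB = {
    2004: 16, 2005: 53, 2006: 191, 2007: 129,
    2008: 177, 2009: 236, 2010: 216,  # 2010 is a partial year ("so far")
}


def reclaimable_gb(sizes_by_year, cutoff_year):
    """Total size of all nightlies from years strictly before cutoff_year."""
    return sum(size for year, size in sizes_by_year.items() if year < cutoff_year)


if __name__ == "__main__":
    for cutoff in (2006, 2007, 2008):
        print(cutoff, reclaimable_gb(FIREFOX_NIGHTLY_GB, cutoff))
    # With a 2007 cutoff the total is 16 + 53 + 191 = 260 GB.
```

Moving the cutoff by a single year changes the total a lot, which is why real data on how far back regression hunts actually go matters here.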
Other non-Firefox projects will need to make their own decisions as to how/when to archive older releases, but those projects could also reclaim a lot of space by deleting their older nightlies. Here are the aggregate disk usage numbers for nightly builds of Calendar (both Sunbird & Lightning), Camino, Mozilla Suite (not even built since 2007), and XulRunner, broken down by year:
2001 472M
2002 6.9G
2003 6.7G
2004 23.2G
2005 57.6G
2006 79G
2007 106G
2008 114G
Here are the nightly usage numbers for Thunderbird:
2003 108M
2004 17G
2005 47G
2006 87G
2007 88G
2008 60G
2009 73G
2010 159G (so far)
Again, there are big space recovery wins to be had here, depending on how far back we want our accessible, online repository of nightly builds to go.
In terms of policy changes to curb accumulation, I propose implementing three reforms, the first two of which come out of nthomas’ work in bug 562261.
First, we’ll script the automatic expiry of mar files for nightly builds older than 1 month. Only the most recent complete and partial MARs are required, so this gives us a more-than-adequate buffer to detect and fix problems with the nightly update system. We have proven steps from bug 562261 that can be easily cron-ed to run weekly on weekends or other periods when the staging server is (relatively) idle. This can be done across all products.
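To make the idea concrete, here is a rough Python sketch of what such an expiry job could look like. It is not the procedure from bug 562261. It assumes MAR files can be recognised by a `.mar` extension, that file modification time is a fair proxy for build age, and that "1 month" can be approximated as 30 days; the function names are hypothetical.

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 30  # assumption: "older than 1 month" approximated as 30 days


def expired_mars(nightly_root, max_age_days=MAX_AGE_DAYS, now=None):
    """Return .mar files under nightly_root whose mtime is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        path for path in Path(nightly_root).rglob("*.mar")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def expire_mars(nightly_root, dry_run=True, **kwargs):
    """Delete expired MAR files; with dry_run=True only report them."""
    victims = expired_mars(nightly_root, **kwargs)
    for path in victims:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return victims
```

Defaulting to a dry run means a weekly cron job could log what it would remove for a few cycles before anyone lets it actually delete anything.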
Second, the RelEng team will start purging the contents of old candidates directories as part of our release process. For those who are unaware, the candidates directory lives under the nightly directory and holds all the various release files (builds, source, signatures, logs) until we green-light the release. Once the release is official, the important contents are sync-ed over to the releases/ subdir and the candidates dir becomes mostly redundant, modulo a few important logging artifacts. We’ll delete the builds for all but the two most recent candidate dirs, but will preserve the text files/logs that tell us important things like # of builds/changesets/build IDs.
Note that this won’t be an automated procedure: the release engineer responsible for the current release will need to go in and look at the candidates directories involved and make a judgment call as to what to delete. Sometimes the release procedure goes awry or we try something new, and it’s important to be able to keep those examples around until we’ve learned what we can from them.
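Even with a human making the final call, a small report of what is eligible for deletion could help. The Python sketch below only lists files and never deletes anything. It assumes the release engineer passes in the candidates directories ordered oldest first, and that the files worth preserving can be recognised by `.txt` and `.log` extensions; both are illustrative assumptions, as are the names used.

```python
from pathlib import Path

KEEP_SUFFIXES = {".txt", ".log"}  # assumed stand-ins for "text files/logs"
KEEP_RECENT = 2                   # the two most recent candidates dirs stay intact


def purge_candidates_report(candidate_dirs, keep_recent=KEEP_RECENT,
                            keep_suffixes=KEEP_SUFFIXES):
    """List (path, size) pairs a release engineer might delete.

    candidate_dirs must be ordered oldest first; the caller decides the order.
    Nothing is removed here: the output is meant for human review.
    """
    older = candidate_dirs[:-keep_recent] if keep_recent else list(candidate_dirs)
    report = []
    for directory in older:
        for path in sorted(Path(directory).rglob("*")):
            if path.is_file() and path.suffix.lower() not in keep_suffixes:
                report.append((path, path.stat().st_size))
    return report
```

Summing the sizes in the report gives a rough estimate of the space a purge would free before anything is touched.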
Other projects that currently use a candidates directory for releases should also consider making this change.
Third, we should agree to revisit nightly storage on a yearly basis. Specifically, we should commit to taking the oldest year’s worth of nightly builds offline in January of each year, e.g. if we’re comfortable with a 3-year online repository of nightly builds, in January 2011 we would take the nightly builds for 2007 offline. As you can see from the YTD numbers for 2010, this is unlikely to actually keep up with increasing storage needs, but it is better than nothing.
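The yearly rotation reduces to simple arithmetic. A minimal Python sketch follows, reusing the Firefox per-year sizes listed earlier; the function name is hypothetical, and the rule it encodes is just the example from the paragraph above.

```python
# Firefox nightly sizes in GB for the relevant years, from the listing earlier.
FIREFOX_NIGHTLY_GB = {2007: 129, 2008: 177, 2009: 236, 2010: 216}


def year_to_take_offline(current_year, years_online):
    """Year of nightlies to retire in January of current_year.

    Matches the example in the text: with a 3-year online history,
    January 2011 retires the nightlies from 2007.
    """
    if years_online < 1:
        raise ValueError("years_online must be at least 1")
    return current_year - years_online - 1


if __name__ == "__main__":
    retired = year_to_take_offline(2011, 3)
    freed = FIREFOX_NIGHTLY_GB[retired]
    added = FIREFOX_NIGHTLY_GB[2010]  # partial year, so a lower bound
    print(retired, freed, added - freed)
```

For January 2011 this retires 2007 and frees 129G, while 2010 had already added 216G at the time of writing, so usage would still grow by at least 87G.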
Redux:
Here’s a brief summary of my proposals for reclaiming space on stage to make it easier for people to respond *AFTER* reading the above:
1) Move Firefox releases that are no longer supported (< 3.0, including firebird) to separate storage.
2) Remove Firefox nightlies prior to 2007, freeing 260G. These can be deleted if they're not going to be used, or archived if we think they might.
3) Remove nightlies for products other than Firefox prior to 2007, freeing 174G. Again, "remove" can mean either deletion or archiving.
4) Automate the deletion of nightly MAR files older than one month. Only the most recent MAR files are required. This would be done across all products.
5) Delete builds from older candidates directories after official release. This will reclaim up to 13G per build attempt per release. This will be a manual process.
6) For every new year going forward, remove the oldest remaining year of nightlies, e.g. for a 3-year history of nightly builds, remove nightly builds from 2007 in January 2011. This will be a manual process.
Feedback is appreciated, either in the newsgroups or in bug 342972.






As recently as six months ago, I had to go back to builds in 2005 to track down a regression on a security bug. I’d cite the bug, but I can no longer see them. I know others have had to track down regression ranges even further back in the last year. I’d prefer archiving to deleting.
I’d also be opposed to deleting (as opposed to archiving) candidate builds because I know QA has often needed to reference them when verifying bugs in later releases.
You failed to mention Thunderbird, was that included in the list of other products?
In any case, yes we’d like to help clean up too, but it’ll be about a week before we start discussion on it (due to 3.1 rc1 pressures).
Maybe instead of deleting all nightly builds that are older than a certain date, you could delete all but one build per time period. E.g. leave one build per week or one per month. This would still enable binary searches to a decent degree of precision.
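One way the thinning idea could work, as a minimal Python sketch: group builds by ISO week and keep the earliest build in each. Keeping the earliest rather than, say, the latest is an arbitrary choice made here for illustration, and the function name is hypothetical.

```python
from datetime import date, timedelta


def thin_to_one_per_week(build_dates):
    """Keep the earliest build in each ISO week; return (keep, drop) lists."""
    keep, drop, seen = [], [], set()
    for day in sorted(build_dates):
        iso = day.isocalendar()
        week = (iso[0], iso[1])  # ISO year and ISO week number
        if week in seen:
            drop.append(day)
        else:
            seen.add(week)
            keep.append(day)
    return keep, drop


if __name__ == "__main__":
    # Two full weeks of daily builds, Monday 2010-06-07 to Sunday 2010-06-20.
    nightlies = [date(2010, 6, 7) + timedelta(days=i) for i in range(14)]
    keep, drop = thin_to_one_per_week(nightlies)
    print(len(keep), len(drop))
```

With one build kept per week, a binary search narrows a regression to a one-week window, which would then have to be bisected by building from source.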
We hashed out a revised plan on the newsgroups:
http://groups.google.com/group/mozilla.dev.planning/msg/3ec37a1f5275d502?
It boils down to the following:
1) Keep all releases for all products online and available. There’s no need to remove them.
2) Keep all en-US nightly builds for all products online and available. There’s no need to remove them.
3) Delete nightly artifacts for all products that are not useful in regression hunting. Specifically, this means deleting installer files (linux and windows) and xpis from 2009 and earlier. This could represent a one-time space recovery of almost 900GB. Individual products can opt out of this cleanup with sufficient cause.
4) Automate the deletion of nightly MAR files older than one month. Only the most recent MAR files are required. This would be done across all products. (unchanged)
5) Delete builds from older candidates directories after official release. This will reclaim up to 13G per build attempt per release. This will be a manual process. (unchanged)
6) Automate the removal of nightly artifacts older than 6 months for all products that are not useful in regression hunting.