Posted 1 month, 3 weeks ago at 16:25. 1 comment
If you’ve recently submitted patches to the Mozilla Try server, you may have been dismayed by the turnaround time for your test results. Indeed, last week we had reports from some developers that they were waiting more than 24 hours to get results for a single Try push in the face of backlogs caused by tree closures.
The chief culprit here was Mountain Lion, or OS X 10.8, which has our smallest pool of test machines (99). It was not uncommon for there to be over 2,000 pending test jobs for Mountain Lion at any given time last week. Once the pending count gets that high, we cannot make headway until the weekend, when check-in volume drops substantially.
In the face of these delays, developers started landing some patches on mozilla-inbound before the corresponding jobs had finished on Try and, worse still, not killing the obsolete pending jobs on Try. That’s just bad hygiene and practice. Sheriffs had to actively look for the duplicate jobs and kill them off to help decrease load.
We cannot easily increase the size of the Mountain Lion pool. Apple does not allow you to install older OS X versions on new hardware, so our pool size here is capped at the number of machines we bought when 10.8 was released over 2 years ago, plus whatever we can scrounge from resellers.
To improve the situation, we made the decision this week to disable 10.8 testing by default on Try. Developers must now select 10.8 explicitly from the “Restrict tests to platform(s)” list on TryChooser if they want to run Mountain Lion tests. If you have an existing Mac Try build that you’d like to back-fill with 10.8 results, please ping the sheriff on duty (sheriffduty) in #developers or #releng and they can help you out *without* incurring another full Try run.
Please note that we do plan to stand up Yosemite (10.10) testing as a replacement for Mountain Lion early in 2015. This is a stop-gap measure until we’re able to do so.
Posted 7 months, 3 weeks ago at 17:11. 1 comment
Releng has been much more diligent during our current team week about preparing presentations and, more importantly, recording sessions for posterity.
Sessions are still ongoing, but the list of presentations is in the wiki. We will continue to add links there.
Special thanks to Armen for helping remoties get dialed-in and for getting everything recorded.
Posted 8 months, 1 week ago at 15:16. 4 comments
Armen has a blog post up about the cost savings Mozilla has been able to realize in its continuous integration infrastructure in Amazon over just the last 3 months. This has been a bit of a sea change for release engineering, who have historically been conservative with regard to changing core infrastructure and practices. We’re all coming to grips with the new world order, but I’m quite excited about the possibilities.
Some quick back-of-the-envelope calculations based on other recent numbers from Armen:
- starting with a low-ball estimate of 7,000 pushes/month, if we project the rate of spending from December ($19/push) over an entire year, we end up with $1,596,000.
- at the new rate ($6/push), a year of AWS time will cost only $504,000.
- that’s a yearly savings of $1,092,000.
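The arithmetic above is easy to re-run with different assumptions, for example a higher push volume to see how much growth eats into the savings. Here is a minimal Python sketch using only the figures quoted above; the function and constant names are our own, not anything from releng tooling.

```python
# Back-of-the-envelope projection of yearly AWS CI spend, using the
# low-ball push volume and per-push costs quoted above.
PUSHES_PER_MONTH = 7_000
MONTHS_PER_YEAR = 12


def yearly_cost(cost_per_push: int, pushes_per_month: int = PUSHES_PER_MONTH) -> int:
    """Project a per-push dollar cost over a full year of pushes."""
    return cost_per_push * pushes_per_month * MONTHS_PER_YEAR


december_rate = yearly_cost(19)  # $19/push, the December rate
new_rate = yearly_cost(6)        # $6/push, the new rate
savings = december_rate - new_rate

print(f"December rate: ${december_rate:,}")
print(f"New rate:      ${new_rate:,}")
print(f"Savings:       ${savings:,}")
```

Passing a larger `pushes_per_month` to `yearly_cost` shows how the projection scales if push volume keeps growing.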
If history has taught us anything, continued growth will eat into at least part of that savings, but think of what Mozilla could do with an extra million dollars. Depending on where we hire them, that money could easily buy 5-10 more engineers to continue driving the mission forward.
Posted 1 year ago at 19:47. 1 comment
Mozilla is graciously giving its staff a two-week break over the holidays. What does this mean for service groups like release engineering?
We too will be away spending time with our families, but we will also continue to monitor the general health of the continuous integration infrastructure. If something happens that knocks an entire platform or datacenter offline, we will stand things back up, but we won’t be worrying about regular day-to-day issues like rebooting individual slaves or fulfilling loan requests.
Barring a chemspill, we also won’t be producing releases over the two-week span, with the exception of Nightly and Aurora daily releases which should continue to happily hum along.
Mozilla staff know how to get in touch with release engineering should it become necessary. For other Mozillians, your best bet is to contact us in the #releng channel on IRC. Ping coop (that’s me), catlee, or hwine if necessary, but please be patient. There may be eggnog involved.
Posted 1 year ago at 00:56. 2 comments
John has posted his farewell over on his blog.
John’s been managing me now for 6.5 years. I’ve had other friends/mentors/managers before, but it was under John that I became a manager myself.
Everything I’ve learned about people-first, no-nonsense management has come from him. Everything I’ve learned about cross-group co-ordination and asking the right questions has come from him. Everything I’ve learned about Irish whiskey has come from him.
On the technical and pure getting-shit-done side, the Google Tech Talk he gave on “Release Engineering as a Force Multiplier” is probably the only resume he’ll ever need, even if he doesn’t actually need it.
Where does John’s departure leave Mozilla, and more specifically, our release engineering team, and even more specifically, me?
I don’t know, but I’m optimistic, mostly because of the imprint left by John on all of us.
Posted 1 year ago at 11:55. 0 comments
All good things must come to an end.
Posted 1 year ago at 18:34. 0 comments
Thursday was a planned outing day, with all of releng decamping to the New England Aquarium for the morning. As a trained marine biologist, I enjoyed being able to drop some science on my Mozilla colleagues.
Posted 1 year, 1 month ago at 17:54. 0 comments
In addition to starting to train some of the new hires in the ancient arts of buildduty and releaseduty, we had 3 great sessions today:
Posted 1 year, 1 month ago at 10:30. 0 comments
With everyone now arrived, John gave his “State of Releng” address in the morning. Here’s a rough transcription, with apologies to Mos Def:
Posted 1 year, 1 month ago at 12:00. 1 comment
People from releng were still arriving on Monday, so we left topics that affect the whole group for later in the week. We did still manage to get a bunch done. Monday had 3 major themes: