Vegas, Baby!

Remember the AD Disaster Recovery Workshop from DEC 2006? Guido and I (and the other MVPs who helped out) apparently didn't suffer enough, because I managed to convince Guido that we should do the workshop again at Windows Connection in Las Vegas this past week. This time, however, we handled the hardware end of the workshop right by not handling it at all. Guido navigated the trade show teams at HP so adroitly that not only did they agree to provide the server infrastructure, they also agreed to provide 50 workstations for people to do the labs on. Brilliant! No homemade servers and no screwed-up hotel wireless to mess with!

I had one server shipped to NetPro a few weeks before the show to test it out. It was a DL385 with dual Opterons, 16GB of memory, and six 10K RPM UltraSCSI RAID drives. Nice! When I powered the thing up, though, I thought I was standing on an aircraft carrier deck during flight ops. Man, that thing was loud! I had to lock the server up in another room. I loaded the old VMs from the DEC workshop on the server, and they basically seemed to work. (More on that later.)

The plan was for me to show up in Las Vegas on Saturday and start building out the other 5 servers from scratch. Since all I had to do was run the HP SmartStart utilities and install WS 2003 R2 x64, Virtual Server x64, and Monad, I didn't think getting the servers ready to go would be a big deal. And with Brent Harman from NetPro helping out, it wasn't. We had the servers pretty much ready to go by Saturday night.

With all the work I had to do for TechX World, I didn't have much of a chance to look at the lab VMs. I knew we had to do a few things to them, but I didn't think it was a whole lot. Likewise, Guido was booked with customer visits and also didn't get a chance to check out the VMs. So we went off to Vegas with a sincere hope that the base VM images from DEC in April would be in pretty workable shape. We sure didn't want to pull another couple of all-nighters like we did at DEC.

On Sunday I started poking around the lab VMs and noticed that they weren't replicating. A little more poking around uncovered some significant errors in the AD event logs, and brought about the sudden and unhappy realization that Nov 4 minus March 23 was greater than the 180-day tombstone lifetime. The AD on the images was effectively toast. So there we were at 4 in the afternoon, the day before the workshop, with an utterly broken AD. I considered what it would take to make the DC images start replicating again (not that much really), but I didn't have any confidence that, after patching them up, there wouldn't be some other problems to deal with. So we agreed the best thing to do was to rebuild the AD from scratch by demoting and repromoting the DCs. I took the opportunity to simplify the DNS (a single DNS server in the root), and Guido loaded the directory with data (except for machine accounts for some reason, which caused a little difficulty in the workshop), so the whole process went pretty smoothly. We got a bite to eat around 7, and got the VMs nailed down around 11 or so. Guido was at the end of his string with jet lag, and Brent was pretty fried too, so they both hit the sack a little after midnight.
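For anyone who wants to check the date math, here is a minimal Python sketch of the comparison. The year (2006) is my assumption based on the DEC 2006 workshop, the 180-day figure is the tombstone lifetime quoted above, and the function names are made up for illustration. This is not how AD itself evaluates tombstones, just the arithmetic.

```python
from datetime import date

# Tombstone lifetime quoted in the post; other forests may use a different value.
TOMBSTONE_LIFETIME_DAYS = 180


def days_offline(last_replicated: date, today: date) -> int:
    """Number of days between the last replication and today."""
    return (today - last_replicated).days


def past_tombstone_lifetime(
    last_replicated: date,
    today: date,
    lifetime_days: int = TOMBSTONE_LIFETIME_DAYS,
) -> bool:
    """True when the gap is strictly greater than the tombstone lifetime."""
    return days_offline(last_replicated, today) > lifetime_days


# Dates from the post; the year 2006 is an assumption.
image_date = date(2006, 3, 23)
workshop_eve = date(2006, 11, 4)

gap = days_offline(image_date, workshop_eve)
print(gap, past_tombstone_lifetime(image_date, workshop_eve))  # prints: 226 True
```

So the images had been sitting for 226 days, a good 46 days past the limit.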

I stayed to build the 192 VMs from the base images. I had written a command-line utility called VSCONTROL that automates the creation and configuration of VMs, and that combined with some Monad scripts made the build process pretty quick. The longest task was copying the base images to the different servers (about 20 minutes per server). I finished up around 3am, Brent and Guido came back around 6:30am or so, and I came back around 7:15am to manually reconfigure the IP addresses of all the VMs and to get the room ready for the workshop attendees at 9.

The workshop had sold out (100 people) a month or so before the conference, so we knew we would have a packed room. Guido and I did our presenter thing at the front of the room, and Brent, Sean Deuby from Intel, and Dan Holme from Intelliem helped with the proctoring responsibilities. We got started on time, the hardware worked fine (except for one client machine with a broken keyboard port), and the VMs worked fine too, with one exception. One of the base VM images on one server was corrupted somehow... probably during the copy. That caused 8 of the 192 VMs to misbehave. Brent managed to get it working again during the session, but we never did figure out exactly what had gone wrong with it. Possibly I copied it while it was still running or shutting down? I dunno. Something to watch out for next time.
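One way to watch out for it, sketched in Python: hash the base image before and after the copy and compare the two. This is not something we did at the workshop, and the function names here are hypothetical. It also assumes the VM is fully shut down first, since an image that is still being written to will never hash consistently.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large VM images fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(source_path, copy_path):
    """True when the copied image is byte-for-byte identical to the source."""
    return file_digest(source_path) == file_digest(copy_path)
```

Hashing a multi-gigabyte image adds a few minutes per server, which is cheap next to chasing down eight misbehaving VMs in front of a full room.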

Nobody left the workshop until after about 3:30, which isn't bad considering we were, well, in Vegas. And there were people still grinding through the labs at 5pm. We got several rave reviews both during the workshop and later at our respective sessions. So I guess it all came off pretty well. The Windows Connection guys asked us to get ready to do it again next year, so that says something.

My thanks to Brent, Sean, and Dan for helping out on this endeavor, and especially to Guido. Most of the technical content of the workshop comes from the research he did into the guts of AD data recovery. It's been a great learning experience for me personally, and Guido's been a great partner producing these workshops.

NEXT time we'll be more organized!