Darrick finally posted his version of the pictures from our September road trip. Specifically, everything from the Grand Canyon onward is stuff I haven't gotten around to posting here yet. There are also others from earlier in the trip that aren't posted here, because I was far more ruthless in deleting photos that weren't so great. Plus, he posted all the panoramas. I haven't done that yet, either. One of these days I'll get to it... maybe... Anyhow, click here to see them.
Click below for the photo gallery from the Zion Narrows hike. Possibly the best pictures from the entire trip. I want to go back. As I've said before, the Panoramas will come later. (Though, most of the panoramas from this particular hike are of the vertical variety as opposed to the horizontal due to the landscape.)
This is a small gallery, just a few photos from Las Vegas at night. Most of the photos that turned out from this night are parts of panoramas which will be posted later. Most of the rest just flat out didn't turn out.
After checking into the hotel room around 2:00 PM, we went straight to sleep... woke up... went to dinner... and got back out just before midnight to take these pictures. We were hoping for some of the water show at the Bellagio, but just as we were walking up, the last show of the night (midnight) was wrapping up.
... but not all of them. I know, the headline is evil and misleading. :-) I've been working on some post-processing of the pictures we took on the trip, which involves a bit of learning how to manipulate .CR2 files and bracketed images to correct exposure settings. Here's a teaser of what I've done so far. (Click on either picture to open a REALLY BIG version in a new window.)
Death Valley at Sunrise:
(This is a blended HDR image from three bracketed shots, IMG_6887, IMG_6888, and IMG_6889, taken on a tripod at various exposure levels.)
This is a modified single CR2 image with Fill Light added, the Black level tweaked slightly, and some other modifications in Aperture. I think it's a little bit too washed out still, so I may try working on this one some more. The original is a bit too dark, so... compromises, I guess.
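For the curious, here's roughly what "blending bracketed shots" means, as a minimal sketch. This is not what my HDR software actually does; it's a simple well-exposedness weighting, and it assumes the frames are already aligned (tripod) and loaded as float arrays scaled to [0, 1]. The function name `fuse_exposures` and the `sigma` parameter are made up for illustration.

```python
import numpy as np

def fuse_exposures(frames, sigma=0.2):
    """Blend aligned bracketed frames (float arrays in [0, 1]).

    Assumption: each pixel is weighted by how close it sits to
    mid-gray (0.5), so blown-out or crushed pixels count for less.
    """
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    # Gaussian "well-exposedness" weight, always > 0 so no divide-by-zero.
    weights = np.exp(-((stack - 0.5) ** 2) / (2 * sigma ** 2))
    # Normalize so the weights for each pixel sum to 1 across frames.
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * stack).sum(axis=0)

# Hypothetical 2x2 test frames: under-, normally, and over-exposed.
under = np.full((2, 2), 0.1)
normal = np.full((2, 2), 0.5)
over = np.full((2, 2), 0.9)
blended = fuse_exposures([under, normal, over])
```

Because the result is a weighted average, every output pixel stays between the darkest and brightest input value for that pixel. Real tools also handle alignment, RAW decoding, and tone mapping, none of which is shown here.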
I'm getting ready to get a new pair of glasses, and I'm trying to decide what to get. Here are two options that I have come across so far:
Option 1: Full Frame
Option 2: Rimless
Please tell me which of the two styles you like better. The Rimless style is shown in black -- I would probably get a different color, similar to the color of the full frame glasses in the first picture. Black frames do not match my skin or hair colors well, and I'm not a particular fan of pink or gold, either.
After three weeks and $550, the computer which hosts my web site has been returned from the shop. Except that when they put it back together, the genius who hooked all the cables back up neglected to fully connect the SAS connector that pigtails off for all my internal SATA drive bays. This, of course, led to several drive bays not working properly until I tore the thing apart to make sure all the cords were connected.
Anyhow, this is a bit of a long-winded way of saying that all my old photo galleries are now back online. It also means that I can start the process of uploading the stuff from my most recent vacation. Look for it soon(ish.)
We kicked off today at 5:30 AM, when some "strange noises" coming from the room next door woke me up. You know what I'm talking about... enough said. Later in the morning, I had the pleasure of discovering that the plumbing in our hotel room is terrible, both on the supply side (majorly inconsistent water pressure) and the drain side (our shower more or less doesn't drain.)
With the issues of the morning out of the way, we made our way two blocks down the street to the Springdale Loop shuttle stop to catch the free shuttle that goes to the visitor's center at Zion National Park. From there, we paid our entrance fee ($12 per person or $25 per vehicle) and walked through the park gate and over to the Canyon Shuttle stop. We rode the Canyon Shuttle to the end of the line and hiked for a mile on the paved river walk trail to get to the start of the Zion Narrows hike.
For those of you unfamiliar with the Zion Narrows, it's essentially a hike through the Virgin River (upstream in our case to start, downstream on the way back) where water shoes and a walking stick are required. More than 90% of the hike is through the river itself -- we were frequently wading in knee-high water. In a couple of places, it got up to my mid-thigh. We hiked upstream for about 2 hours and 15 minutes when our pre-set alarm went off and told us it was time to head back. The return trip took a bit over an hour because we didn't stop for pictures and because going with the current downstream is significantly easier than going against it. Courtesy of Darrick, here's a panorama of one of the less-wet portions of the day. That would be me standing there holding our hiking sticks.
The Narrows; Click image for larger size.
After returning to the trailhead for the paved trail, we stopped for a bit to dry off our feet and change back into our hiking boots (which thankfully hadn't been stolen, since we left them there with other people's shoes instead of taking them in the river with us) and walked back down to the shuttle stop. After that, we returned to Springdale and had dinner at a Mexican Cafe. Now, we've returned to our room to look at our pictures for the day. They look good.
I'm writing this entry from the passenger seat of the rental car. We're on our way out of Vegas at the moment, headed toward Zion National Park. Darrick is driving.
Today, we stuck to the plan: check out of the hotel at 11am, and then head to Hoover Dam. We arrived at the Dam at about 1:00 PM and went on the Dam Tour at 2:30. We left Hoover Dam at 4:00 PM, and pit-stopped in Las Vegas for fuel at about 5:00 PM. We're now headed toward Zion, and expect to get there around 9:30 PM local time (factoring in the loss of a time zone and stopping for dinner.)
Update: We got in at about 9:30 PM. Funny how that works, isn't it?
Our road trip began in San Diego. We departed from my apartment at about 9:30 PM, headed toward Racetrack Playa in Death Valley National Park.
Just before midnight, we stopped in Hesperia to top off the gas tank and grab a bite to eat prior to getting on US-395. Though the food at In-N-Out is quite normal at midnight, the same can't be said of some of their customers. There was one man in the store who appeared to order two items -- a strawberry shake, and a water cup. He proceeded to fill the water cup about 1/3 full... with ketchup from the ketchup pump. Remember... he didn't have any food. Just a strawberry shake. We left before seeing what became of the ketchup.
At 3:30 AM, we encountered our first major problem of the trip: Neither of us checked to make sure that the roads we intended to traverse were passable. This isn't a problem when all the roads are open and in good working order, but when coming across a "Road Closed" sign on an unmaintained desert road that goes over a mountain in the middle of the night... that's a problem. We spent 25 minutes debating what to do, and ultimately decided to modify our route and attempt morning photography at the Sand Dunes near Furnace Creek, since it was close enough to be reachable by sunrise.
Around 4:45 AM, we saw an FJ Cruiser in a turnout with its hazard blinkers on, and a man with a head lamp on waving his arms as we drove by. Since this was the only vehicle we had seen in hours and it just seemed like maybe the guy was in some sort of trouble, I turned around to make sure he was OK. He wasn't. At about 10:00 PM as he was taking a curve to the right in the road, the entire front passenger wheel came completely off his vehicle. By some sort of miracle, he skidded directly into a turnout where he came to a stop, instead of going the other way off the cliff. Five of the six lug bolts on his axle had completely sheared off, rendering his vehicle completely immobile.
Apparently, we were the 9th car to pass by in the almost 7 hours he was stranded there, and the first to so much as slow down. We collected an assortment of information from him (his roadside assistance number, name, VIN, exact GPS location, and the closest mile marker) and called CHP and a tow truck for him as soon as we got to a pay phone, which we found at a park entrance station about 20 miles down the road.
The delays we encountered for doing our good deed put our arrival time at the sand dunes about 10 minutes before sunrise. Unfortunately, this eliminated our planned "nap time" and cut into the volume of photography we were able to do... and forced us to rush things just a bit. We'll have to make another attempt at sunrise photography later in the trip. While we were there, we decided to go play around on the dunes a bit to get the blood flowing, and then got back in the car at about 7am, planning to get to Las Vegas as soon as possible. We needed sleep.
We arrived at our hotel in Vegas at about 11:00 AM, and "pre-registered" for our room. We were told to come back after 1:00 PM, because our room would not be ready until then. With three hours to kill and both of us being exhausted and hungry, we went to the brunch buffet at Paris. Honestly, I was so tired I didn't care what the food tasted like -- but I'm pretty sure it was delicious. I had a large plate of "breakfast" food followed by a large plate of "lunch" food, followed by a plate of desserts. We finished eating with even more time to spare... and being the geeks we are, we went to Fry's Electronics to kill the rest of the time. I bought some blank CDs to burn MP3's on (Our 4Runner has an MP3-CD capable stereo, but the AUX jack for an iPod sucks) and some cough drops. Darrick bought a Slinky.
We got keys to our hotel room at 2:30, and immediately went to sleep. We went to dinner at a Chinese restaurant recommended by Walter's mom around 8:30, and took a stroll around the strip just after midnight.
Epic Failure: Road Closed.
Here is the modified trip map. (EDIT: The map is wrong. I need to fix it.) Marker "B" is where we encountered the closed road and had to turn back. Marker "C" is approximately where we stopped for sunrise and took some time to go climb on a bunch of sand dunes.
Last night, I registered for MacWorld 2009. It's going to be held January 6-9, 2009 in San Francisco.
At this particular juncture in time, I am not 100% certain that I will be able to attend. I plan to go, but there may be outside circumstances that prevent me from attending. I'm already pretty busy at the beginning of the year, so we'll see. If I can make it work though, I do plan to be in the Bay Area in early January. Just as a heads-up for anyone who cares.
With just days left before departing on the road trip mentioned in a prior post, a few updates:
(1) After careful deliberation, it was decided that the cost-benefit analysis for the question of "What type of car should we rent?" had tilted to favor a "Standard SUV" instead of a Prius. As the economy has faltered and the price of gas has risen, the cost of renting an SUV has fallen dramatically -- to the point where renting the SUV and paying for the additional gas it will consume on our voyage is virtually identical to the higher rental cost (but lower fuel cost) of the Prius. The added benefits of the larger vehicle include more cargo space, more "nap" space, and 4-wheel drive... which enables us to trek on some of the higher clearance, less-traveled roads that have in the past provided some of the most breathtaking views we've seen.
(2) Sushi has been moved from Mr. Sushi in Encinitas to Nobu in Solana Beach. I've never been there before, but hear it's delicious.
(3) The final night of the trip is still up in the air. Once we depart from the Grand Canyon, we have several options on how to return to San Diego. The original plan calls for spending a night in Phoenix, but we still have not booked a hotel room there yet as a means of keeping our options open. There are four different "major" routes we can take to get back to San Diego from the Grand Canyon -- only two of which go through Phoenix or the surrounding area. Option 1 takes us directly from the Grand Canyon to Barstow on I-40, where we meet up with I-15. Option 2 takes us on I-40 to US-95 and some State Highways to I-10 near Joshua Tree and through Riverside County, where we'd meet up with I-15. Option 3 takes us East on I-40 to I-17 and down to Phoenix, where we would meet up with I-10. Option 4 takes us on the same route, but further South from Phoenix, where we meet up with I-8 and head back to San Diego along the Mexican border. Additionally, there are a number of State and US highways as well as Forest Service roads we may try to navigate, now that we have a larger vehicle in which to do so. Picking a route may prove to be difficult -- all of the options are "new" to us. We planned originally to stop in Julian on our way back to San Diego, but depending on the route we take and the time of day we're near the area, the Pie shops may not be open. We'd just have to make a quick trek out there the next morning. :-)
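For anyone wondering how the SUV-vs-Prius math in item (1) shakes out, here's a quick sketch of the comparison. Every number below is a made-up placeholder (daily rates, mileage, trip length, gas price), not what we were actually quoted; the point is only the shape of the calculation: rental cost plus fuel cost.

```python
def trip_cost(daily_rate, days, miles, mpg, gas_price):
    """Total cost of a rental: the rental fee plus the gas burned."""
    rental = daily_rate * days
    fuel = miles / mpg * gas_price
    return rental + fuel

# Hypothetical figures, for illustration only.
DAYS = 8
MILES = 2000
GAS_PRICE = 3.75  # dollars per gallon (assumed)

prius = trip_cost(daily_rate=60, days=DAYS, miles=MILES, mpg=45, gas_price=GAS_PRICE)
suv = trip_cost(daily_rate=35, days=DAYS, miles=MILES, mpg=18, gas_price=GAS_PRICE)
difference = suv - prius
```

With placeholder numbers like these, the gap between the two is small compared to the total, which is the situation described above: the cheaper rental roughly cancels out the thirstier engine.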
The computer responsible for hosting all of my photo galleries failed spectacularly this afternoon. The problem is most likely some form of massive hardware failure. Things may be unavailable for the next week or so.
A couple of weeks ago, I bought new tires for my car and had them installed. Because of the type of rims my car came with, I of course have to buy the super expensive low profile tire variety. A set of four cost me about $700 out the door. Here's what I bought:
Michelin Primacy MXV4
These tires replaced the Bridgestone Potenzas that came with my car. I opted to go for the Michelins because the Bridgestones cost significantly more *and* come with horrible reviews on the internet. Personally, I never had a problem with the other tires, but after 4.5 years and 31,000 miles, the sidewalls were cracking pretty badly. The Michelins came with a cheaper price tag, better reviews, and a substantially longer warranty. They're considered all season "Luxury Performance Touring" tires, which basically means they're designed to be quiet and comfortable while delivering decent grip in all road conditions. In limited drive testing, they seem to have much better grip than the old tires, and handled remarkably well on dry pavement considering that the installer's balancing machine was mis-calibrated when they were installed. With that problem fixed, I can now drive on the freeway without the car feeling like it's going to shake apart.
Today, I made my first serious (a.k.a. permanent) modification to my new apartment. You may be asking, "Why would you do this?" to which I would respond, "Because being forced to open your front door to see who is standing at it is potentially unsafe, and also annoying." I would, of course, be correct.
The solution to this problem: install a door viewer. Apparently, "door viewer" has fewer negative connotations than "peep hole" and is thus the preferred name for such things. Since this modification involves drilling a half-inch diameter hole through my front door (which is composed of aluminum and foam) I decided that I should (a) be careful and (b) match the finish of the rest of the door hardware. I also chose to (c) install the thing at a "standard" height that most people can use instead of where I'd prefer it to be.
So, I bought something very similar to this:
Satin Nickel Door Viewer
They look pretty horrible when they're not actually installed in a door. But as it turns out, once you install the thing it looks decent. Installation was pretty quick and painless, too. Literally all you have to do is measure, drill, and then screw the thing into the hole.
UPDATE: Several other apartments in my building have asked the Maintenance folks to install one of these in their doors. From walking around, it looks to me as if all the ones installed by the complex are Stainless Steel (which does not match the Satin Nickel door hardware) but are otherwise identical to mine. The two varieties even cost the same from Home Depot. Mine is better!
Note: This is part 8 of an 8 part series. Read them in order, it'll make more sense. Part 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
SAN Nightmare: Conclusions
In summary, xSan 2.0 sucks. Here's my general list of complaints:
- Undocumented steps in the upgrade process from 1.4.2 to 2.0 cause confusion and panic when users can't get their volumes to mount properly.
- Upgrade process introduced errors on one of our volumes that led to its eventual failure.
- Under 2.0, fsm process crashes randomly and far too often when folders on SAN volumes are re-shared over AFP/SMB and/or backed up with Retrospect.
- Under 2.1, fsm process segfaults in a similar manner to the crashes in 2.0. This can be easily reproduced by setting ACLs on an AFP/SMB shared volume and propagating permissions to all folders/subfolders under the top level of the share. Every time I try this, it crashes within 3 minutes.
- Under all versions, you cannot copy .mpkg files to an xSan volume over AFP. The volume crashes.
- Some programs do not allow you to open files directly on the server and edit them. Notable examples are EndNote and several Adobe apps. Instead, you have to copy the files to a local disk, edit them, and then copy them back to the server. This is annoying for users who keep their files on the server for safekeeping.
- Once an xSan volume crashes or becomes unstable, a computer reboot is often required to clear the memory and start fresh. If the volumes are mounted uncleanly, the OS will still think files are open and try to close them before restarting. Since the volume is not mounted, it is unable to do this. This causes a hang on restart that prevents the system from being rebooted/shut down gracefully. A force reboot is required. Forcing a power cycle through the rack PDU works quite well, but is not good for the server.
- fsm crashes typically force reboots of the metadata controller and/or the client hosting them. When the client is the file server, this causes issues for connected clients. When the metadata controller is affected, all other volumes are forced to failover while the controller reboots.
- Retrospect takes an incredibly long time to scan volumes for files and to determine whether files have been changed or not. Similarly, the actual backups of files themselves are slow. This seems to be the case no matter how fast your metadata controller is, but is significantly more pronounced when using older/slower computers as the metadata controller.
- Retrospect is unable to define sub-volumes of an xSan volume as backup targets because of the way the filesystem handles directory ID information. This forces Retrospect to scan the entire volume for a backup. On a 1.6 TB volume with 1TB of used space and 500,000 files, this can routinely take up to 20 hours to scan. On a normal HFS+ volume, this process takes mere minutes. This problem is compounded by the 4,000,000 file "limit" for Retrospect backup sets. Files are often marked as changed when they weren't and are re-backed up. That problem, combined with normal change/modify operations, means that a backup set can approach the 4,000,000 file limit easily within the course of its normal incremental backups before tape rotation.
- File copy and general file operations that require access to filesystem metadata are noticeably slower under xSan 2.0 compared to 1.4.2.
- The xSan Admin GUI for 2.0 is completely different from 1.4.2, and takes some re-learning to get used to. In version 2.0 of the GUI, it is also impossible to change a computer's role in the SAN from a Controller to a Client or vice-versa. Whatever the computer is added to the SAN as is what it must remain. I hear they fixed this in 2.1, but still... this is a very common thing that people do and it somehow got overlooked.
- If you open the xSAN Admin GUI on more than one computer, you occasionally get differing/conflicting information. This is most notable in the actual name of the SAN (inconsequential) but also shows up in places it should never report false information -- like where it tells you which metadata controller is currently controlling a specific volume. The cvadmin command line utility is so much better for most tasks, it's not even funny.
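To put the Retrospect file "limit" complaint above in perspective, here's a back-of-the-envelope sketch. The 4,000,000 limit and the 500,000-file volume come from the list; the per-session count is a number I'm inventing for illustration, since I never measured how many files got falsely flagged as changed. The function name is hypothetical, too.

```python
def sessions_until_limit(initial_files, files_per_session, limit=4_000_000):
    """How many incremental sessions fit before a backup set hits the limit.

    initial_files: files written by the first full backup.
    files_per_session: files added per incremental session, counting both
        real changes and files wrongly marked as changed (assumed constant).
    """
    if files_per_session <= 0:
        raise ValueError("files_per_session must be positive")
    remaining = limit - initial_files
    if remaining < 0:
        return 0
    return remaining // files_per_session

# Assumed: 150,000 files re-backed up per session on a 500,000-file volume.
busy = sessions_until_limit(500_000, 150_000)
# Assumed: a well-behaved volume where only 10,000 files change per session.
quiet = sessions_until_limit(500_000, 10_000)
```

Under those assumed rates, the noisy case runs out of room in a few weeks of nightly incrementals, while the quiet case lasts most of a year. That's why spurious "changed" flags hurt so much.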
Note: This is part 7 of an 8 part series. Read them in order, it'll make more sense. Part 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
SAN Nightmare, Part 7: Eliminating the SAN
Tuesday, June 17
A plan for migration off of xSan and back to regularly formatted disks is born in the weekly IRT meeting. We decide to do the entire thing at once, over the weekend... and to split up our files onto two physical servers instead of one. The end result should be two file servers, each with approximately half of the data and half of the total users... and a total fill ratio of about 36% across all disks. Since we couldn't get an exact split of 50-50, we opted to give a slightly higher number of users to the computer that is faster and has more disk space.
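We worked out the split by hand, but the idea can be sketched in a few lines. This is an illustration, not our actual procedure: the lab names, sizes, and server capacities below are invented, and the greedy rule (put each volume, largest first, on whichever server ends up least full) is just one reasonable way to do it.

```python
def split_by_capacity(volumes, capacities):
    """Greedily assign volumes to servers, keeping fill ratios balanced.

    volumes: dict of name -> size (same unit as capacities).
    capacities: list of server capacities.
    Returns (assignment dict of name -> server index, list of fill ratios).
    """
    used = [0.0] * len(capacities)
    assignment = {}
    # Place the largest volumes first; they are the hardest to fit evenly.
    for name, size in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
        target = min(
            range(len(capacities)),
            key=lambda i: (used[i] + size) / capacities[i],
        )
        used[target] += size
        assignment[name] = target
    fill = [u / c for u, c in zip(used, capacities)]
    return assignment, fill

# Hypothetical lab volumes (GB) and two servers of unequal size (GB).
labs = {"lab_a": 600, "lab_b": 500, "lab_c": 400, "lab_d": 300}
assignment, fill = split_by_capacity(labs, [3000, 2000])
overall = sum(labs.values()) / (3000 + 2000)
```

Note that the bigger server ends up holding more data even though its fill ratio is lower, which is the same trade-off we made when the 50-50 split didn't work out.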
Saturday, June 21
The migration began at 8:00 PM on Friday night. I also received my very own corporate credit card on Friday. During the course of the day Saturday, files were copied to the new disks. I spent as much time as I could at home, sitting out on my patio reading a book and getting tan. By some stroke of random, the temperature in La Jolla got up to 97 around 3pm. I got a little bit pink, but just the proper amount. It's already turned into a nice tan.
Just before bed, my phone starts chirping with SMS messages from one of our drive arrays. It appears as though Drive "Utica-6" has developed read/write errors and is no longer reliable. Thankfully, this is a RAID5 system... the data is protected and still intact. I go to bed and deal with it in the morning.
Sunday, June 22
Overnight, Utica-6 produced many, many more errors. It was time to replace it. I had exactly one spare drive on hand, which was used for this purpose. Time to put the new corporate card to use -- We needed to get more spare drives to have on-hand. Of course, the particular variant of the Hitachi DeskStar 7K500 drive that I needed to have a matched set is no longer manufactured... which posed a bit of a problem for finding an identical replacement drive. After a couple hours scouring Google and the internet at large, I was able to find ONE of these drives still in its new, unopened condition... in a warehouse in Canada. I also found a site that sells refurbished drives of this exact model. I ended up buying the single new drive along with two refurbs to replenish the stock of spare drives, and I bought some new tape for our P-Touch labeler while I was at it because I ran out over the weekend.
Monday, June 23
By Monday evening, we've gone one whole day without a single server or disk crash. This should not be big news, but given the recent string of events I'd say it's pretty impressive. Things seem stable, and for the most part speedy. The Utica array is still rebuilding and conditioning itself after the failure of drive 6, so files on that particular disk are a tiny bit slower than normal. Other than that, everything is fine. Our former SAN was made up of one file server, one metadata controller, and two dedicated backup target computers. We have since unwound that and turned it into three independent file servers. The fourth computer is four years old and has a bad FireWire controller, so I think its days of usefulness are over. Later this week, I plan to strip it for parts and upgrade our admin file server.
Note: This is part 6 of an 8 part series. Read them in order, it'll make more sense. Part 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Part 6: More Suffering
Monday, June 9
My boss announces that we are no longer allowed to use his corporate credit card to make purchases, because his assistant has transferred to another department and will no longer be reconciling the bill for him. Later this afternoon, someone decided to try to copy 200+ GB to their lab volume, which is hosted on one of the external drives. The drive fills to 100% capacity, and suddenly we are in need of an additional FreeAgent Pro. This minor crisis is the justification for having Accounting assign me my own corporate credit card.
Thursday, June 12
The 'san4' volume starts crashing repeatedly mid-day, much in the same way 'san5' did on May 27. This particular volume was the exclusive host of the file server data for the company we share the building with, and is therefore a critical part of our network. It becomes necessary for us to take this volume offline and move the data on it to a new drive as soon as possible. With no more FreeAgent Pro drives available, I had to use a lesser FireWire drive. The lack of eSATA slowed things down considerably. Meanwhile, we placed a rush order for another FreeAgent Pro drive, another eSATA card, and two internal hard drives for the server that we're turning into a dedicated server for this group.
Friday, June 13
The data copy was complete by about 1:00 PM. We had decided to take this chance to move this particular set of files to a dedicated server to avoid future problems like this, and because it makes things much cleaner for us from an administration standpoint. I began the migration to the new server at 6PM.
At 4:00 PM, the same instability issues previously experienced in the san5 and san4 volumes have spread to san6. We are forced to take two more labs offline to move them to the new FreeAgent Pro drive purchased the day before.
Both copy operations are successful and wrap up faster than anticipated, leaving some spare time over the weekend to come up with a more comprehensive plan for final removal of xSan from our network.
Note: This is part 5 of an 8 part series. Read them in order, it'll make more sense. Part 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Part 5: Relative Stability
Thursday, May 29
A pre-planned trip to Sacramento couldn't have come at a better time. I thought I was going to be unable to go, but as it turns out, things are pretty stable for the two days before I leave. For the most part, they remain that way while I am gone. I think there was one instance where the server had to be rebooted.
Over the course of the weekend, Apple continues to work on finding a way to recover files from the original "san1" volume that crashed. They're still convinced there may be hope of recovering those files... which would go a long way toward making people not hate me so much. My understanding is that there was a fair amount of priceless research that was lost from that one week gap in the backups.
Thursday, June 5
Much of the early portion of the week is consumed by maintenance requests: finding specific files and folders that didn't restore properly, fixing file permission issues, etc. Just before noon, Apple declares our san1 volume to be officially unrecoverable. At this point, we are free to re-format the disks and start moving data back to them. We opt to use the space to do a little additional testing, first.
Note: This is part 4 of an 8 part series. Read them in order, it'll make more sense. Part 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Part 4: Recovery
Wednesday, May 21
Restore operations have been ongoing since Saturday night, and the last batch of lab restores completed by the end of the day. A full week of downtime for some labs... and their files are being restored from the night of May 8. Ouch. The IRT and Public volumes are still offline. Restore operations at this point have taken a back seat to making sure the backup operations run properly.
Friday, May 23
After working with Apple over the course of the week, it was determined that a software patch was necessary to prevent recurring issues. In addition to the catastrophic failure of our "san1" volume, we had been experiencing other problems throughout the week with our other SAN volumes. Specifically, they would crash on a very regular basis, usually at least once a day. These crashes required frequent server reboots to address the issues. We were provided with a patch that contained a known fix to the known bug we were experiencing, which I installed in the evening while the server was offline for scheduled maintenance.
Almost immediately after installing this patch, I confirmed that the previous problem we had (random fsm crashes) seemed to be fixed... but that a new, much more serious problem had been introduced: random segmentation faults that corrupted the entire operating system. Oops. After running a few more diagnostics, we reverted the software back to the original less-buggy version and went from there.
Tuesday, May 27
The long holiday weekend gave me a chance to get the last volumes restored from tapes and back online. At this point, I finally get access to my files again -- I had been without them for 12 days, and work was piling up. By noon, another one of our SAN volumes (san5) had become very unstable -- three crashes in under an hour. Each time it crashed, it hung the file service processes as well -- file servers get angry when you disconnect, without warning, the disks that their hosted files live on. We had to take the four labs living on this volume offline for the remainder of the day. By 9:00 PM, all data on the san5 volume had been moved to another Seagate FreeAgent Pro disk.
Note: This is part 3 of an 8 part series. Read them in order, it'll make more sense. Part 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Part 3: Disaster Strikes
Thursday, May 15, 2008
At 10:00 AM, the "san1" volume, which is our largest volume with 5TB in total capacity and 11 of our 17 research divisions' (and IT) data, crashes and refuses to re-mount on either server. After a series of reboots and continued failures attempting to get the thing to mount, another call to Apple was in order. By noon, Apple determines that we should probably start restoring the data from backup tapes, since the prognosis does not look good. I start running some diagnostic tools for Apple while Michael loads the most current tapes in the tape library to begin the restore operation. At this time, we discover two things: (1) The volume "san1" is the only one that has not properly backed up in the last week (since the upgrade to xSan 2.0) due to the backup process crashing while running, and (2) The robotic arm on our tape loader has picked this exact moment, of all possible times, to fail. So, we have a drive that won't mount because it's corrupt, backups of that drive that are one week stale, and no way to read the stale tapes anyways because the robotics on the tape loader have failed. The first order of business, then, is getting the tape library working so we can read the tapes. We put in a call to Quantum support about getting the library repaired and were told that since the device is out of warranty, they won't even talk to us until we purchase a support contract. The support contract department is out of the office for the day, so we should expect a call back tomorrow.
Feeling completely helpless, I decide to go home by 4:00 PM to get some dinner and some rest, because I have to be back in the office at 8:45 PM to start taking equipment offline for a large planned power outage. The impeccable timing of this disaster plus the planned outage kept me at the office until 2:45am.
Friday, May 16, 2008
By 8:00 AM, I'm back at the office running on a little less than 4 hours' sleep. Efforts to contact the folks at Quantum are unsuccessful all morning, so I leave to run other errands while Michael continues to try to talk to the contract folks. I head to Fry's to buy two 1TB Seagate FreeAgent Pro drives so we have somewhere to put the data once we start restoring it. (Side note: The FreeAgent Pro drives are eSATA/USB2/1394 and are awesome. I highly recommend them.) Also on the way back, a trip to Costco was in oder to pick up beer and desserts for the IRT-hosted Happy Hour that was scheduled for 4:00 PM. We had already booked and paid for the catering, so we couldn't cancel the thing. Another case of impeccable timing.
Shortly after returning to the office, we finally manage to get in touch with Quantum regarding the support contract. We ended up paying a bit over $3000 for the "Gold" maintenance contract which entitles us to 24/7 on-site support. They diagnose the problem as a bad picker hand and schedule a courier to deliver the part by 4pm, and a technician to install the part by 6pm. Convenient: The IRT Happy Hour ran from 4-6pm.
The Quantum tech shows up, installs the new picker hand incorrectly, and continues to get the same error message as before on the Library. Then, he tweaks something and manages to run over the umbilical cord that connects the hand/picker to the rest of the library's electronics at about 8:30pm. Since it shot sparks, Quantum decided to send another umbilical out to us. I've been in the office for quite a while at this point, so I send the service tech home with instructions to come back in the morning. Quantum delivers the part to my apartment at about 11pm, just as I'm finishing up watching the season finale of The Office on my DVR.
Saturday, May 17, 2008
9:00 AM: Arrive at Office.
10:00 AM: Replacement umbilical cord installed. Same error.
10:30 AM: Service determines the problem is in the (already replaced) picker.
10:35 AM: Closest picker is in Irvine, it is being sent by courier to arrive at 2pm.
11:00 AM: Lunch.
2:30 PM: Picker arrives late due to bad traffic in Oceanside/Del Mar.
3:30 PM: Same error. All field-serviceable parts have been serviced. Quantum replacing entire chassis.
3:45 PM: Closest chassis is in downtown LA. Estimated arrival: 8:00 PM.
3:46 PM: I send the service tech home. I don't trust him anymore. I'll swap the chassis myself.
7:01 PM: Courier must have broken all sorts of speed laws to get chassis to us by 7pm.
10:30 PM: Restore operations begin. I go home.
Sunday, May 18, 2008
9:00 AM: Service tech returns to pick up the bad unit. The new one is working fine, thanks.
Note: This is part 2 of an 8 part series. Read them in order, it'll make more sense. Part 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Part 2: Imminent Failure
Monday, May 12, 2008
In the morning, our LDAP server at work manages to get its internal account database corrupted. This server issue has absolutely no bearing on the xSan project other than its timing -- I ended up spending most of the day on Monday running around trying to fix other systems that were affected by the LDAP outage instead of paying attention to the backup scripts I'd started on Saturday afternoon.
Tuesday, May 13, 2008
Our weekly IRT meeting focused mainly on the LDAP failure from Monday, and how to better communicate things like downtime in the future. A brief wrap-up from the SAN migration over the weekend was presented, with the verdict that things looked good to this point. After the meeting, I checked on the backups and noticed the first problem: instead of shortening the backup window, xSan seemed to be lengthening it dramatically. We had six SAN volumes, each of which is supposed to back up nightly. We had one dedicated computer to run backups and the ability to add a second if necessary, which would give us two simultaneous backup processes at most. The "san5" volume was single-handedly taking about 27 hours to run an incremental backup... on its own. As a result, our other volumes are being skipped over for backups because the process is taking so long. I make a few changes and set up the "san1" volume to start a backup operation.
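For anyone who wants to see why that math can't work, here's a rough sketch of the scheduling problem in Python. Only the 27-hour figure for "san5" and the limit of two backup processes come from what I described above. The names "san2", "san3", "san4" and "san6" and every other duration are made-up placeholders, and the greedy longest-job-first scheduling is just an illustration, not how the backup software actually queues things.

```python
import heapq

def makespan_hours(job_hours, workers):
    """Greedy longest-job-first schedule.

    Returns the number of hours until the last backup job finishes
    when at most `workers` jobs can run at the same time.
    """
    finish_times = [0.0] * workers  # min-heap: when each backup process is next free
    heapq.heapify(finish_times)
    for hours in sorted(job_hours, reverse=True):
        earliest_free = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest_free + hours)
    return max(finish_times)

# Only san5's 27 hours is a real number; everything else is a placeholder.
volumes = {
    "san1": 6.0,
    "san2": 3.0,
    "san3": 3.0,
    "san4": 2.0,
    "san5": 27.0,
    "san6": 2.0,
}

NIGHTLY_WINDOW_HOURS = 24.0

for workers in (1, 2):
    total = makespan_hours(volumes.values(), workers)
    fits = total <= NIGHTLY_WINDOW_HOURS
    print(f"{workers} backup process(es): {total:.0f} hours, fits nightly window: {fits}")
```

No matter what numbers you plug in for the other volumes, a single 27-hour job can never fit inside a 24-hour cycle, even with the second backup computer added.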
Later in the afternoon, we start having odd problems with some of our share points on the server. It turns out that people in specific labs aren't able to connect to their files, because the volume their data resides on has un-mounted itself from the file server. The odd thing was that I couldn't get it to re-mount on that computer -- but, it would mount on the "spare" server just fine. Over the course of the afternoon, I re-configured all the server share points onto the new hardware and moved the DNS records over. This allowed everyone to connect to the new box using the same server names. Everything seemed happy.
Note: This is part 1 of an 8 part series. Read them in order, it'll make more sense. Part 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Part 1: The Upgrade
Friday, May 9, 2008
On the evening of Friday, May 9, I scheduled downtime on the file server to upgrade our version of xSan from 1.4.2 to 2.0. I had hoped the new version would fix some random, nagging problems we'd been having with the software such as occasional unannounced server reboots and problems with certain types of files. The random reboot thing was happening more or less on a weekly basis and seemed to coincide with some larger backup operations we were doing. The upgrade was also (hopefully) going to help our backup server more effectively back up the data on the SAN by improving read/copy speeds.
I downloaded the migration guides and read them over prior to starting the upgrade. The guide mentioned the need to wait a period of time (sometimes a few hours) for the volumes to update their metadata to the new 2.0 format before they would be available. I installed the software and noticed the volumes were showing up in the GUI admin tool as being available after a few minutes, and didn't think much of it until I went to try to start/mount some of them and began to receive errors. After a bunch of times where I froze up the GUI by trying to start a volume and had to force reboot the server, I decided to call Apple. The support rep on the phone mentioned something that was not noted anywhere in the migration guide at all: you have to just let the upgrade run its course before trying to start the volumes. Problem: there is no progress bar that tells you (a) whether the update has started, (b) whether it is running, or (c) when it is done. This update is all done silently in the background, and can take "hours" depending on what exactly you're storing there. To determine whether the RPL update is done you have to go hunting through the system logs for a very specific (undocumented) file and search for the string that denotes entries related to the upgrade. Thanks for documenting that, Apple.
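The log hunt itself is nothing fancy. A minimal Python sketch of the idea is below. To be clear, the marker string and the sample log lines are placeholders I made up for illustration. I'm not reproducing the real file name or the real string here, so substitute whatever the support rep gives you.

```python
import os
import tempfile

def find_marker_lines(log_path, marker):
    """Return every line in the file at log_path that contains marker."""
    matches = []
    with open(log_path, "r", errors="replace") as handle:
        for line in handle:
            if marker in line:
                matches.append(line.rstrip("\n"))
    return matches

# Hypothetical marker and log contents, for demonstration only.
MARKER = "EXAMPLE-UPGRADE-MARKER"
sample_log = (
    "May 10 01:00:00 server exampled: EXAMPLE-UPGRADE-MARKER started\n"
    "May 10 01:05:00 server exampled: unrelated entry\n"
    "May 10 04:30:00 server exampled: EXAMPLE-UPGRADE-MARKER finished\n"
)

# Write the sample to a temporary file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write(sample_log)
    sample_path = tmp.name

try:
    upgrade_lines = find_marker_lines(sample_path, MARKER)
    for entry in upgrade_lines:
        print(entry)
finally:
    os.remove(sample_path)
```

The last matching line is the one to look at, since it should be the most recent entry about the upgrade.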
Saturday, May 10, 2008
After the support call with Apple from the previous night, I decided to just let the "RPL update" run its course overnight, and come back in the morning to see how things looked. With all the RPL updates done and all our volumes mounting properly, things seemed to be in good shape. I brought the server back online at about 2pm, re-configured the backup routines, and told them to start backing up the server.
Note: This is part 0 of an 8 part series. Read them in order, it'll make more sense. Part 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Part 0: Introduction
This series of entries lays out a timeline of select events that happened between May 9 and June 23, 2008. The purpose of this series of entries is primarily so I can remember how bad the last two months have really been. In the process, maybe someone will stumble upon this and decide that xSAN has the potential to have as much of a "distaste for your environment" (Apple's words) as it did for mine.
A bit of background: We had been running xSAN 1.4.2 software since late November, 2007. Due to some issues we had with it (explained later on) we decided to upgrade to the new version in hopes of a fix. When we originally implemented the xSAN solution, we did so because we were intrigued by the idea of allowing multiple servers to share a single large pool of disk space. This would, in theory, allow us to do things like sharing a single public folder across several servers or move groups of people from one server to another for load-balancing reasons without having to move their data. Furthermore, it allowed us to set up a model in which a computer on the SAN was dedicated specifically to being the computer that Retrospect sent all its backup requests through. Structuring backups in this way freed up a considerable amount of CPU time on the file server itself to do things such as serve files in a timely fashion.
As promised in the previous post, pictures have been posted of the apartment with all the furniture where it now lives. Additionally, photos and art have been installed on (most of) the walls. Things are looking better, but still need some work.
Also included are a few quick snapshots of the garden I planted last weekend. Enjoy!
After a very long delay, I have finally gotten around to uploading pictures of my new apartment with actual, real furniture in it. Between moving, unpacking, decorating, a crazy two months at work, and trying to actually be social, I haven't had time to bother getting these things uploaded until now.
Most of this gallery consists of photos taken during the "unpacking and settling in" phase, so a lot of the furniture is no longer arranged as shown in the photos. Another set of pictures will follow with the "final" furniture layout.
For being such a great customer, Cox gave me a stack of tickets to the Padres' Friday Night game of Opening Week against the Dodgers. In the Cox suite. These are some pretty nice digs for a baseball game. If you've never been to Petco Park or sat in the Garden Level suites... they are right behind home plate and at a perfect height to see the entire field (and catch fly balls.)
View from our seats.
After the unfortunate (blowout) game, there was a special fireworks show for what the Padres were calling "Military Opening Night." Some of the photos that I took of that ended up coming out alright, as well.
Postgame fireworks display
In the end, a good time was had by all. Hopefully, I'll get tickets to another game soon!
I went over to the new building at Crossroads on opening day, a week before I was supposed to get my keys and move into my new place. The maintenance staff hadn't completed inspecting and finishing up my apartment... and accidentally left it unlocked. I took the opportunity to go in with the camera and tape measure and take lots of pictures (and lots of measurements.)
I took a nice Easter drive out to Anza-Borrego today to do some wildflower viewing. Also did some off-paved-roading in my 2-wheel-drive low-clearance car and almost got stuck in some sand. Saw a Mustang that did get stuck in some sand. Despite the fact there were several vehicles that stopped to help, one of which had towing ropes... the ladies who were driving the Mustang refused help of any sort and insisted on waiting for a tow truck to come from an hour away to help them.
Update 24-Apr-2008: Pictures are online! Click on the pic below for the link to the gallery.
Time Warner updated the operating software on all of their cable boxes in my area this week. It happened to the standard-def box I have in my bedroom first, and I was quite annoyed when it happened because it managed to delete (seemingly randomly) about 2/3 of the scheduled series recordings I had set up.
This morning, I noticed my HD box in the living room had been updated. Same annoying problem. But, a new behavior out of the cable box: on non-HD channels, instead of putting up stupid, ugly gray vertical bars on either side of the 4:3 picture to make it fit in the 16:9 window, it now outputs solid black bars. I might actually be able to watch non-HD programming on that expensive TV without wanting to throw things at it. Like I said, it's the small victories...
On April 7, I get the keys to my new apartment. This is important for several reasons. In no particular order:
- I will be freed from the mold-ridden, flood-prone craphole that is my current apartment.
- I will no longer be living on the first floor, and hence less prone to water issues.
- I will be moving into a 1-bedroom apartment, and therefore once again living alone.
- The new apartment is in a brand new building. I will be the first person to ever live in my unit.
- The new apartment is big at 862 square feet, and cheap for comparable units in the area.
- I will still be close to work. Only about 2.5 miles each way.
Here's an official floor plan, borrowed directly from their website:
Official Floor Plan
Get a good look at that? Good, because that's not at all what my unit will look like. I'm going to have a mirror-image flipped version of what you see above. I took the liberty of flipping the image horizontally so you can see what my unit will really look like. Note that the text in the image isn't so kind as to magically stay put.
My Floor Plan
Now that this is settled, I can start furniture shopping. I've only been putting that off for, oh... 3 years. Should be painful and expensive.
On my last day in Vegas, I decided it would be a better use of my time to head out to Hoover Dam and take a tour, as opposed to spending another crowded, sweaty day inside the show halls at CES. Plus, it would be very un-like me to rent a car and put less than 50 miles on it... I average a lot closer to 2,000 miles per rental, after all.
The drive is about 20-30 minutes or so, even when going the speed limit in my bright Orange cop-magnet of a rental car. I managed to walk right up to the Visitor's Center at 10:50 on a Wednesday morning, and immediately get onto the 11:00 Dam Tour. Normally, people have to wait all day to get on one of those things. Ah, the advantages of traveling alone...
The gallery contains about 100 pictures from inside and outside of the Dam. Enjoy!
Inside the gallery are a few pictures from inside CES 2008. Unfortunately, none of the pictures I was able to snap off adequately convey just how insanely packed the Convention Center was -- I am fairly certain I have never seen so many people in one place in my entire life. And with over 3.2 Million square feet of space inside the convention center, stuff was still spilling out into the parking lot and into other hotels! Pretty insane.
This gallery contains all the pictures I took when I was in Vegas for CES... but not actually on the show floor at CES because it was nighttime. The first night I was in town, I took a stroll on foot starting at my hotel and working my way through the Hooters Hotel and Casino, followed by the Tropicana and MGM Grand. From there, I took the Monorail to Bally's and walked through that and Paris. I then went across the street to the Planet Hollywood Hotel and Casino and the attached Miracle Mile shops. After all that walking, I was beat and headed back to the hotel early to get some rest for the long day ahead of me on the show floors at CES on Tuesday.
On Christmas Eve, Mom, Nicole, and I went for a drive to go looking at Christmas lights. There's a court nearby in Citrus Heights that goes pretty overboard with them, so of course we had to go there. This gallery contains the pictures that came out OK.
On December 22, 2007 I went on a hike with John, Kristine, and John's parents at Iron Mountain. Other people were supposed to come too, but they chickened out... or something. I forget why exactly they didn't make it.