Thank goodness for useful snapshots

Got this message from Amazon today:

Hello.
This is a notification that your volume vol-XXXXXXXX experienced a failure due to multiple failures of
the underlying hardware components and was unable to be recovered. We recommend recovering from your
most recent snapshot.
We regret the loss and inconvenience.
Sincerely,
XXXXXXX XXXXXXXXX

Fortunately, this was on a MySQL slave, and it was easy to recover from a previous snapshot. Do you plan for failure of the systems you run on Amazon EC2?

Adding a Facebook “Like” block to your Drupal site

I just added the newly announced “Like” button to my blog. It’s very easy to do on Drupal.

First, enable PHP input format…
In Drupal 6, the PHP input format is not available by default; you turn it on by enabling an additional module. Go to admin/build/modules and enable:
PHP filter 6.8 Allows embedded PHP code/snippets to be evaluated.

Next, create your new block…
Go to #Administer, #Site building, #Blocks
Click on the “add blocks” tab
Enter a Block description and put the following in the Block body:


Change the “Input format” to “PHP Code”

Save it and you’re done!

My first facebook phisher

I just received my first Facebook phishing email.

It looks pretty legit. The obvious thing that stood out to me was that the link points to www.facebook.com.xxxasqwz.eu/globaldirectory/LoginFacebook.php. Note that the domain is xxxasqwz.eu, not facebook.
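Here is a rough Python sketch of that hostname check. The `http://` scheme on the phishing link and the "real" login URL are my own assumptions for illustration, and `is_facebook_host` is a hypothetical helper, not anything Facebook provides.

```python
from urllib.parse import urlparse


def is_facebook_host(url):
    """Return True only if the URL's host is facebook.com or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return host == "facebook.com" or host.endswith(".facebook.com")


# Link from the phishing email (scheme added so urlparse can find the host).
phish = "http://www.facebook.com.xxxasqwz.eu/globaldirectory/LoginFacebook.php"
# Illustrative legitimate URL (an assumption, not taken from the email).
real = "https://www.facebook.com/login.php"

print(is_facebook_host(phish))  # False: the registered domain is xxxasqwz.eu
print(is_facebook_host(real))   # True
```

The point is that the part of the hostname that matters is the end, not the beginning. Anything can be put in front of the real domain.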

Here is a snapshot of the page on xxxasqwz.eu:

If you log in here, you’ll be giving a phisher your username and password.

If you receive email that appears to be from any site you consider important (banks, social networks, etc.), use caution when clicking on the links within. The best practice is to open a browser and type in the site name (or use your bookmark) to go to the site, then log in and look for alerts there. If Facebook were really going to make the login changes mentioned in the email, you’d see something about it once you’re logged in (in your Facebook inbox, in your notifications, or in an announcement at the top of the page).

Don’t trust links sent in email. Go to the site like you would normally, log in, and then look for any announcements or alerts from there.

Amazon offers Reserved Instances

Amazon announced today the availability of Reserved Instances. Basically, you pay a one-time fee and get a significant discount on the hourly rate for that instance.
Their rates are available here.
After doing some calculations, I figured out the break-even point between the old “On-Demand Instances” and the new “Reserved Instances.”
For the current 1 year term, the break-even point is 193 days. For the 3 year term, the break-even point is 298 days. So basically, if you’re going to be running a particular instance for much more than a year, you should probably sign up for the 3 year term.

Here’s a brief comparison of the year by year costs for a small instance. I’ve only listed the small instance type since the costs for the other instance types are all proportional (e.g. large is 4x and xlarge is 8x.)

              1st year costs   2nd year costs   3rd year costs   3 year Total
On Demand     $876.00          $876.00          $876.00          $2,628.00
1 year term   $587.80          $587.80          $587.80          $1,763.40
3 year term   $762.80          $262.80          $262.80          $1,288.40

(I’m not sure why the spacing is so funky on this post with a table in it.)
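For anyone who wants to check the break-even math, here is a minimal Python sketch. The hourly rates and one-time fees are not quoted from Amazon; I backed them out of the table above (for example $876.00 / 8760 hours = $0.10 per hour, and $587.80 - $262.80 = $325 for the 1 year fee), so treat them as assumptions.

```python
HOURS_PER_YEAR = 24 * 365

# Rates implied by the table in this post (derived, not quoted from Amazon).
ON_DEMAND_HOURLY = 0.10   # $876.00 per year / 8760 hours
RESERVED_HOURLY = 0.03    # $262.80 per year / 8760 hours
ONE_TIME_FEE = {"1 year term": 325.00, "3 year term": 500.00}


def break_even_days(fee):
    """Days of continuous use until the one-time fee is paid back by the cheaper hourly rate."""
    daily_saving = (ON_DEMAND_HOURLY - RESERVED_HOURLY) * 24
    return round(fee / daily_saving)


for term, fee in ONE_TIME_FEE.items():
    print(term, "breaks even after about", break_even_days(fee), "days")
```

This assumes the instance runs around the clock. If it only runs part of the day, the break-even point moves out accordingly.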

RebuildStarted event detected on md device /dev/md0

Logging into one of my systems, I saw the following in /proc/mdstat this morning.
Personalities : [raid1]
md0 : active raid1 sdc1[0] sdb1[1]
976559104 blocks [2/2] [UU]
[=========>...........] resync = 45.7% (446333440/976559104) finish=489.8min speed=18037K/sec

Since I wasn’t aware of any power failure or disk failure (these are pretty new disks), I did some digging. Looking through /var/log/daemon.log, I found these entries:
Mar 1 01:06:02 chewbacca mdadm: RebuildStarted event detected on md device /dev/md0
Mar 1 04:49:03 chewbacca mdadm: Rebuild20 event detected on md device /dev/md0
Mar 1 08:43:03 chewbacca mdadm: Rebuild40 event detected on md device /dev/md0

After some googling around, I found that on the first Sunday of every month at 1:06am, an array check (/usr/share/mdadm/checkarray) is run on Debian and Ubuntu systems. (see /etc/cron.d/mdadm)

For some reason it shows up as a resync in /proc/mdstat and as a rebuild in the mdadm log, even though it is really a read-only operation to check the health of the array. Whew! Now I can go get ready for church.
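If you want to tell a scheduled check apart from a real rebuild, the kernel exposes the current action in sysfs. Here is a small Python sketch that reads it. The `sync_action` and `describe` helper names are my own, the default device is the md0 from this post, and the one-line descriptions are my paraphrase of the Linux md documentation. As far as I know, checkarray starts its run by writing "check" to this same file.

```python
from pathlib import Path

# Rough meaning of each md sync_action value (paraphrased, not official wording).
MEANINGS = {
    "idle": "no sync activity",
    "check": "read-only consistency check",
    "repair": "check that also rewrites mismatched blocks",
    "resync": "resync after an unclean shutdown or array creation",
    "recover": "rebuild onto a replaced or spare disk",
}


def sync_action(device="md0", sysfs_root="/sys/block"):
    """Return what the md array is doing right now, e.g. 'idle' or 'check'."""
    return Path(sysfs_root, device, "md", "sync_action").read_text().strip()


def describe(action):
    """Turn a sync_action value into a short human-readable description."""
    return MEANINGS.get(action, "unknown action: " + action)


if __name__ == "__main__":
    try:
        action = sync_action("md0")
        print("md0:", action, "-", describe(action))
    except OSError as err:
        # No md0 on this machine, or sysfs is not readable.
        print("could not read sync_action:", err)
```

A value of "check" means nothing is wrong with the array. A value of "recover" is the one that would have ruined my morning.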

Upgrading php5-memcache on Ubuntu (Intrepid)

We were having a little problem on our webservers with php5-memcache when we tried to use multiple memcached servers. The error was:
ALERT – canary mismatch on efree() – heap overflow detected (attacker ’10.X.X.X’, file ‘/path/to/file/index.php’)
It didn’t come up too frequently in the logs, but it was frequent enough to cause some concern.

As part of troubleshooting, I was tasked with upgrading php5-memcache on one of our test systems so that we could see if this resolved the issue. This turned out to be much easier than I thought. Here’s what I did (as root):

apt-get install php5-dev
cp /usr/lib/php5/20060613+lfs/memcache.so /usr/lib/php5/20060613+lfs/memcache.so.bak
pecl install memcache-3.0.3
/etc/init.d/apache2 restart

That was it. We’re running on 3.0.3 now, and I’m crossing my fingers that it resolves the issue we’re seeing.

Getting Yammer to work with IM on Google Apps for your domain

Our company uses Yammer so everyone can stay in touch and post status updates easily. One problem we’ve had is that the IM component wouldn’t work with our Google Talk accounts (we use Google Apps for our domain). I discovered here that all we needed were some additional entries in DNS.
After following that doc, our settings for that service are as follows (trimmed for readability):

$ dig SRV _xmpp-server._tcp.familylink.com.
...
;; ANSWER SECTION:
_xmpp-server._tcp.familylink.com. 600 IN SRV    5 0 5269 xmpp-server.l.google.com.
_xmpp-server._tcp.familylink.com. 600 IN SRV    20 0 5269 xmpp-server1.l.google.com.
_xmpp-server._tcp.familylink.com. 600 IN SRV    20 0 5269 xmpp-server2.l.google.com.
_xmpp-server._tcp.familylink.com. 600 IN SRV    20 0 5269 xmpp-server3.l.google.com.
_xmpp-server._tcp.familylink.com. 600 IN SRV    20 0 5269 xmpp-server4.l.google.com.
...

$ dig SRV _jabber._tcp.familylink.com.
...
;; ANSWER SECTION:
_jabber._tcp.familylink.com. 600 IN     SRV     5 0 5269 xmpp-server.l.google.com.
_jabber._tcp.familylink.com. 600 IN     SRV     20 0 5269 xmpp-server1.l.google.com.
_jabber._tcp.familylink.com. 600 IN     SRV     20 0 5269 xmpp-server2.l.google.com.
_jabber._tcp.familylink.com. 600 IN     SRV     20 0 5269 xmpp-server3.l.google.com.
_jabber._tcp.familylink.com. 600 IN     SRV     20 0 5269 xmpp-server4.l.google.com.
...
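In case the SRV columns look cryptic: after the name, TTL, class and type, each answer line carries priority, weight, port and target, and lower priority values are tried first. Here is a short Python sketch that parses a couple of the lines above and sorts them the way a client would prefer them. `SrvRecord` and `parse_srv` are names I made up for illustration.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line):
    """Parse one line from the ANSWER SECTION of `dig SRV` output."""
    name, ttl, _cls, _rtype, priority, weight, port, target = line.split()
    return SrvRecord(name, int(ttl), int(priority), int(weight), int(port), target)


# Two of the answer lines from the dig output in this post.
answer = """\
_xmpp-server._tcp.familylink.com. 600 IN SRV    20 0 5269 xmpp-server1.l.google.com.
_xmpp-server._tcp.familylink.com. 600 IN SRV    5 0 5269 xmpp-server.l.google.com.
"""

# Lower priority value means the server is preferred.
records = sorted((parse_srv(l) for l in answer.splitlines()), key=lambda r: r.priority)
for r in records:
    print(r.priority, r.port, r.target)
```

So xmpp-server.l.google.com (priority 5) is the primary, and the numbered servers (priority 20) are the fallbacks.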

After making those changes, I was able to go into my Yammer account and add my Google Talk account like normal. I assume the same thing will work with Twitter if you use a Google Apps account for that (assuming IM is working on Twitter, which it hasn’t been for me for months).

It’s an easy fix for Yammer with your Google Apps account. I’m also in the process of setting up Yammer for my family’s domain so we can stay in touch more easily. I think the thing I love most about Yammer is that I can send/receive updates from my phone.

Keep an eye out for some upcoming blog posts about ways to stay connected with your family (hint: Yammer is one of them).

AWS Security Whitepaper

Amazon just posted a security whitepaper which describes the security measures they have in place to protect their customers. It’s a short read (9 pages) and I strongly recommend it if you are using EC2, S3 or SimpleDB for anything. They explain their security measures and make recommendations for further protection their customers can put in place to protect their data.

One major concern I had was addressed by the whitepaper, “The AWS proprietary disk virtualization layer automatically wipes every block of storage used by the customer, and guarantees that one customer’s data is never exposed to another.” I was always curious about those disk devices on EC2 and what data might be lingering on them but never had the time to investigate.

Here’s an interesting snippet regarding their physical security:
“Amazon has many years of experience in designing, constructing, and operating large-scale data centers. This experience has been applied to the AWS platform and infrastructure. AWS data centers are housed in nondescript facilities, and critical facilities have extensive setback and military grade perimeter control berms as well as other natural boundary protection. Physical access is strictly controlled both at the perimeter and at building ingress points by professional security staff utilizing video surveillance, state of the art intrusion detection systems, and other electronic means. Authorized staff must pass two-factor authentication no fewer than three times to access data center floors. All visitors and contractors are required to present identification and are signed in and continually escorted by authorized staff.”

Performance increase with Amazon’s EBS (persistent storage)

At familylink.com, we have 4 MySQL database systems on EC2 that run our Facebook app, various other social network apps and various websites. I recently switched our disk storage for those instances from the standard EC2 instance disks to EBS (Amazon’s persistent storage for EC2) and wanted to share some brief numbers with you regarding performance.

I’m using a simple (yet quite complex) metric to measure the performance increase: load. System load is a number that shows how many processes are contending for system resources (usually CPU). For a more detailed description of load, read this article.

Enough of the talk, here’s what I saw when I switched the 4 databases over to EBS:
—Database server #1—
Purpose: 2 moderately used databases
Disk change: 2 striped local disks (raid0) to single EBS volume
Peak Load change: 2.5 to 1
Estimated disk performance increase: 5x

—Database server #2—
Purpose: 2 lightly used databases
Disk change: 2 striped local disks (raid0) to single EBS volume
Peak Load change: 1.5 to 0.5
Estimated disk performance increase: 6x

—Database server #3—
Purpose: 9 lightly used databases
Disk change: 2 striped local disks (raid0) to single EBS volume
Peak Load change: 1 to 1 (no noticeable change)
Estimated disk performance increase: 2x

—Database server #4—
Purpose: 1 heavily used database
Disk change: 4 striped local disks (raid0) to 4 striped EBS volumes (raid0)
Peak Load change: 3 to 1.5
Estimated disk performance increase: 2x

Keep in mind that in theory 2 striped disks are almost twice as fast as a single disk. That’s why I say there’s a disk performance increase of 2x on database server #3 even though there was no noticeable performance increase (we went from using 2 disks to 1 disk.)
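To spell out how those estimates fit together, here is a back-of-the-envelope Python sketch. The formula (load ratio multiplied by disk-count ratio) is my reading of the numbers above rather than a rigorous benchmark, and `estimated_disk_speedup` is just an illustrative name.

```python
def estimated_disk_speedup(load_before, load_after, disks_before, disks_after):
    """Load improvement, scaled by how many striped disks we gave up (assumed linear)."""
    return (load_before / load_after) * (disks_before / disks_after)


# (peak load before, peak load after, disks before, volumes after) per server above.
servers = {
    "Database server #1": (2.5, 1, 2, 1),
    "Database server #2": (1.5, 0.5, 2, 1),
    "Database server #3": (1, 1, 2, 1),
    "Database server #4": (3, 1.5, 4, 4),
}

for name, numbers in servers.items():
    print(name, "estimated disk performance increase:", estimated_disk_speedup(*numbers), "x")
```

It treats striping as a perfect linear speedup, which is optimistic, so read the results as rough estimates.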

There you go, real-world numbers from real-world sites and servers. In summary, it’s safe to say you’ll see a significant disk performance increase if you switch over to using EBS with your EC2 instances. In addition to the performance increase, it’s a no-brainer that you want persistent storage for your databases. One other huge benefit is snapshots. You can quickly and easily snapshot your database for backup purposes or for testing/reporting you may want to run against your most recent production data. See Amazon’s site for more details.

If you haven’t yet tested EBS with your systems on EC2, now is the time.

Red Bull gives you wings…or a big headache

Last week, I was at the Facebook developers conference. It was a pretty good conference and I learned a lot. Here are a few things I learned:

  • Some companies still operate with their blinders on: One of the sessions I was most excited about was “Made for Mobile.” I was hoping for some insight into developing apps for mobile phones and maybe some new “stuff” from Facebook. I was sorely disappointed. This session should have been named “Made for iPhone.” Instead of ranting in this post, I think I’ll do a dedicated post to the blinders concept.
  • Facebook is on the cutting edge when it comes to social networks: They announced Facebook Connect. If you’re a digg/citysearch/six apart user, you can see its effects already. It’s a new and easy way to put a social network twist on any site (using Facebook of course.)
  • Red Bull gave me a headache: The hardest stuff I regularly drink is Mountain Dew. Red Bull was a sponsor at this conference and as a result, the stuff was given out. I decided to give it a shot. The taste wasn’t very good. I’m a Guaraná Antarctica fan and so I’m a little picky when it comes to guarana. The taste of this took guarana and made it disgusting. To top it all off, within about 10 minutes of drinking it, I got the worst headache I’ve had in a long time. Needless to say, it didn’t give me wings and I don’t think I’ll be trying it again.
  • Facebook does an awesome job at scaling: This is the stuff I really love. In one session, they explained how they handle the high load demand due to their feeds. The feeds are what display all of your friends’ activity/actions on the main page when you’re logged in. If you think about it, that’s a lot of data. Just to generate your custom feed they have to go out and get all of the recent feed items from all your friends, filter and prioritize them, and then display them on a nice pretty page for you to see. And they do it all in around 60 milliseconds. Man, that’s fast!