AWS customer service

I must admit I’m a bit surprised. This morning, I tweeted about a DNS issue I was seeing on some EC2 instances (which turned out to be a GoDaddy problem).
Amazon responded to my tweet by opening a support case:
I am sorry to hear that you are having issues with DNS Services, Can you please let us know what issues you are seeing so we can resolve this for you.
Best regards,
Amazon Web Services

I must admit, I’m surprised and impressed. Good job AWS!

Raspberry Pi case made from cardboard, cardstock, paper or plastic

A few months ago I designed a case for my Raspberry Pi that I could cut out of cardstock with a Silhouette Cameo.


(this case is cut out of an overhead projector slide)


It was inspired by the Punnet case, except I wanted something that didn’t require glue to assemble and was a little more compact.


Punnet case (bottom), rev1 (middle), current design (top)


The Silhouette file is downloadable from my raspberry-pi-case GitHub project.


I’d like to change the vent in the top to be a raspberry, but haven’t had the time to make that change yet.



How I installed Kubuntu 10.04 (Lucid Lynx) on an HP Mini 110 with full disk encryption

Get a USB drive and make a boot disk from the Kubuntu 10.04 CD image as detailed here:
Be sure to select persistent storage: “When starting up from this disk, documents and settings will be: ‘Stored in reserved extra space’”. I allocated approximately 1GB of space for this (you’ll need more than the default 128MB).

Use the ethernet connection on the netbook to connect it to the internet. Power it on, and as soon as you see the first screen, hit . Select your USB drive as the boot device.
Once booted, run:
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install cryptsetup lvm2

then reboot the netbook (from USB again)

Create the following partitions (I prefer cfdisk for partitioning):

  • /dev/sda1, 512MB, ext2 (primary partition, bootable)
  • /dev/sda2, remainder of space, (Pri/Log partition)
  • /dev/sda5, entire logical partition, (Logical partition in sda2)

Set up your encrypted volumes:
cryptsetup -y --cipher aes-xts-plain --key-size 512 luksFormat /dev/sda5
cryptsetup luksOpen /dev/sda5 pvcrypt
pvcreate /dev/mapper/pvcrypt
vgcreate laptop-vg /dev/mapper/pvcrypt
lvcreate -n swap -L 3G laptop-vg
lvcreate -n root -l 100%FREE laptop-vg
mkswap /dev/mapper/laptop--vg-swap
mkfs.ext3 /dev/mapper/laptop--vg-root

Start the installer (from the icon on the desktop) and choose to set up the partitions manually:

  • set /dev/sda1 to be /boot
  • set /dev/mapper/laptop--vg-root to be /
  • set /dev/mapper/laptop--vg-swap to be swap space

After the install is complete, do the following before rebooting:
mkdir /mnt/newroot
mount /dev/mapper/laptop--vg-root /mnt/newroot
mount /dev/sda1 /mnt/newroot/boot
mount --bind /dev /mnt/newroot/dev
chroot /mnt/newroot
mount -t proc proc /proc
mount -t sysfs sys /sys
apt-get update
apt-get install cryptsetup lvm2

Then edit /etc/crypttab and add the following line to the end of the file:
pvcrypt /dev/sda5 none luks,retry=1,lvm=laptop-vg

Next, edit /etc/initramfs-tools/modules and add the following lines:

Then run
update-initramfs -u

and reboot

To get my mic and external speakers working (headphones worked fine), I had to run:
apt-get install linux-backports-modules-alsa-lucid-generic

Why I prefer Ubuntu

Yet another draft post that I found recently.  Yes, I’m still a fan of Ubuntu.

As many of my colleagues know, I’m a huge Debian fan. I love the stability and ease of package management. The primary thing I don’t like about Debian is its lack of a release schedule. Their attitude is, “We’ll release it when it’s ready.”

I don’t disagree with that attitude; Debian is a very impressive community-driven development project. If I were programming for such a project, I’d probably prefer the “release it when it’s ready” method.
Enter Ubuntu. Ubuntu is based on Debian, which means it shares its ease of package management. Ubuntu also has a release schedule. They plan to release a new version every 6 months, supporting it for 18 months (security patches, fixes for critical bugs that could cause data loss, and extra translations). They also plan to have an Enterprise Release every 12 to 24 months (which will receive additional testing). These Enterprise releases, also known as LTS (long-term support) releases, are supported for a longer period of time. The current LTS version is supported until 2015.

OCR processing millions of images on Amazon’s EC2

Note: I wrote this post nearly 2 years ago and recently discovered it in my drafts; some of the info is outdated.

Recently, I was tasked with running OCR on a huge set of images (3.4 million). I’m going to post some brief details on how we processed these images in about a week.

Initially, we uploaded all of the images (~1.5 TB of data) to S3 from a colocated server we have locally, using s3sync. This took a long while.

Once the images were all stored in S3, I retrieved all of the metadata and stored it in a MySQL database which was running on a small EC2 instance. This host became the queue manager.
Since some images were in non-English languages, I went through and specified the language (if it wasn’t English) in the database.

I wrote a simple Perl script which would:

  • retrieve the next image to be processed from the queue manager
  • retrieve the corresponding image from S3
  • run OCR on the image (with language option if it wasn’t in English)
  • store the OCR output in the database on the queue manager and mark the image as processed
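
The OCR step in that loop could be sketched roughly like this. It is a minimal shell sketch, not the original Perl script: `ocr_command` is a hypothetical helper name, the file names are made up, and it assumes Tesseract's command-line form `tesseract IMAGE OUTPUTBASE [-l LANG]` with three-letter language codes such as `deu`. It only prints the command it would run, so it can be tried without Tesseract installed.

```shell
# Hypothetical helper: print the tesseract command for one image.
# $1 = image file, $2 = language code from the queue database (empty means English).
ocr_command() {
    image="$1"
    lang="$2"
    outbase="${image%.*}"    # strip the extension; tesseract appends .txt to this base name
    if [ -n "$lang" ] && [ "$lang" != "eng" ]; then
        echo "tesseract $image $outbase -l $lang"
    else
        echo "tesseract $image $outbase"
    fi
}

ocr_command page-0001.tif ""
ocr_command page-0002.tif deu
```

In a real worker, the printed command would be executed and the resulting text file written back to the queue database.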

For cost reasons (and because the OCR output was adequate) we used Tesseract to process the images. It did a good job (depending on the image quality) and handled foreign languages very well.

To ensure we were getting the most bang for our buck, I whipped up a hack-of-a-script to keep at least 8 processes running on each server. The OCR processing instances were High-CPU Extra Large instances.
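
One way to keep a fixed number of workers busy is sketched below. This is not the original script: it swaps in `xargs -P` (available in GNU and BSD xargs), which starts a new job whenever one finishes so that up to 8 run at once. `run_pool` and the `echo` stand-in for the real per-image work are hypothetical.

```shell
# Hypothetical sketch: read image keys on stdin, process up to 8 at a time.
# The echo stands in for the real work (fetch from S3, run OCR, store the output).
run_pool() {
    xargs -P 8 -n 1 sh -c 'echo "processed $1"' _
}

printf '%s\n' img-001.tif img-002.tif img-003.tif | run_pool
```

Because the jobs run in parallel, the output lines can appear in any order.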

From there, I dumped the contents of the database and handed them off to our indexing expert. Some of the content is currently posted on and the rest is in the works.

Lessons learned (things I would have done differently)

  • Find a way to send the hard drives which contained the images to Amazon, instead of uploading the entire 1.5TB from our datacenter. I’ve heard rumors of them doing this for large datasets, but have not verified.
  • Create a better script to manage running jobs (I’d probably use a multi-threaded Perl script)
  • Start processing images as soon as they were successfully uploaded. For simplicity’s sake, I uploaded all of the images, then processed them all at the same time.
  • Get increased allocations for resources ahead of time. I started out with a 100 instance limit on EC2 and quickly saturated that limit. Around the middle of the week, I was able to finally get that limit increased to 200 instances.
  • I’d consider using Amazon’s SimpleDB and Simple Queue Service in lieu of the MySQL database I used for the queue manager and for storing the OCR output.

Migrate from Drupal 6 to WordPress 3

Since I just made the switch back, here’s the script that made it easy…
Be sure to read the script and edit or comment out sections so that it meets the needs of your site. The only real problem I’m seeing so far is that URLs aren’t always coming up as links in the posts. I may have to do some manual editing.

It was less than 2 years ago that I switched from WordPress to Drupal. Why did I switch back?  I’m lazy:

  • I already have multiple websites that I host on this server that run WordPress.  It’s easy to throw it into the mix and keep it up to date.
  • I don’t anticipate ever doing anything much more than a blog here (that was one of my big reasons for switching to Drupal before, I was planning on doing a bunch of non-blog stuff on this site and others running Drupal)
  • WordPress “just works” for blog sites.  It’s brain-dead easy to set up compared to Drupal and I like the edit/compose interface.

“Too many open files” error on Ubuntu

So, you’ve run into a “Too many open files” error (why else would you be here?)
I ran into it on some Ubuntu Hardy systems where Tomcat was running and Java had been allocated a ton of memory (thus giving it plenty of space to run out of control).

It’s a simple fix.
Add increased limits to /etc/security/limits.conf. Here’s what I added to fix my problem:

* soft nofile 16384
* hard nofile 16384

… then edit /etc/pam.d/common-session and add:

session required

to the bottom of the file. Users will have to re-login to see the new limits.
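
To confirm the change after logging back in, the shell's built-in `ulimit` reports the limits for the current session. On Ubuntu, limits.conf is read by the pam_limits module at login, which is why a fresh login is needed. A small sketch follows (`show_nofile_limits` is just an illustrative name); with the settings above, both values should read 16384.

```shell
# Print the soft and hard open-file limits for the current shell session.
show_nofile_limits() {
    echo "soft nofile: $(ulimit -S -n)"
    echo "hard nofile: $(ulimit -H -n)"
}

show_nofile_limits
```

If the numbers still show the old values, the limits were not applied to that login session.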

Latest Facebook hoax going around

I just received this message in my Facebook inbox from a friend…

Sorry that I had to send this message. Since Facebook has recently become very popular, has become the many complaints that Facebook has become unacceptably slow. The report shows that the reason is that Facebook has a number of non-active members and, secondly, many new Facebook members. We want to send this message to see whether you're active members or not. If you're active, can you send this message to at least 15 users. Use the "Copy - Cut and Paste" to show that you are still active. Those who do not send this message within 2 weeks in, will be removed in order to get more space. Send this message to your friends to show me that you are still active, and do not want to be removed.

Facebook founder
Mark Zuckerberg

There are a few things about this that make it unbelievable:
1- Your data on Facebook is too valuable to them to delete, whether you’re active, inactive or new. Combine that with the fact that they’ve really nailed how to store a ton of data and how to keep Facebook fast and it comes down to this message being a lie.
2- If this were an official Facebook announcement from Mark Zuckerberg, he wouldn’t have people forward it around. It would be posted clearly on Facebook’s site.

I’m pretty sure that Facebook doesn’t ever delete anything, so I’m guessing they’ll be able to trace the root source of this hoax and act on the offender’s account appropriately.