Backblaze Storage Pod

Backblaze is a cloud backup service that needs cheap storage. Lots of it. They say a petabyte worth of raw drives runs under $100,000, but buying that much storage in products from major vendors easily costs over $1,000,000. So they built their own.

The result is a 4U rack-mounted Linux-based server that contains 67 terabytes at a material cost of $7,867, the bulk of which goes to purchase the drives themselves.
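
As a rough back-of-the-envelope check on those numbers, here’s a small shell sketch. It assumes a petabyte means 1,000 TB and that the 67 TB figure is raw capacity per pod; the variable names are made up for illustration.

```shell
#!/bin/sh
# Figures from the post; everything else is an assumption for illustration.
pod_cost=7867      # material cost per pod, in dollars
pod_tb=67          # terabytes per pod
petabyte_tb=1000   # assumed: 1 PB = 1,000 TB

# Whole pods needed to reach a petabyte (integer ceiling division).
pods=$(( (petabyte_tb + pod_tb - 1) / pod_tb ))

echo "dollars per TB: $(( pod_cost / pod_tb ))"   # prints 117 (rounded down)
echo "pods per PB: $pods"                         # prints 15
echo "cost per PB: $(( pods * pod_cost ))"        # prints 118005
```

That lands a little above the raw-drive figure and far below the big-vendor one.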

And best of all, they open sourced their hardware:

backblaze storage pod main components

Drobo: Sweet Storage, One Big Flaw

Drobo!

I’ve been a fan of Drobo since I got mine over a year ago. The little(-ish, and sweet-looking, for a stack of disks) device packs as many as four drives and automatically manages them to ensure the reliability of your data and easy expandability of the storage. However, Thomas Tomchak just pointed out one major flaw: if you overflow your Drobo with data, the entire device may give up and you’ll lose everything.

How do you overflow a Drobo? Most users only have a few terabytes of storage in their Drobo, but configure it to tell the computer it’s attached to that it can store eight or 16 TB of data. Doing that allows easy expansion when more or larger drives are added (the attached computer doesn’t need to reformat anything, it can simply save more stuff to the device), but it also opens the door to the Drobo overflow.

From Tomchak’s post:

While on my tech support call I asked the engineer how frequently he received calls about this particular problem. After a big sigh he admitted that it was nearly every day.

One commenter on the article suggested the Drobo could “just simulate that the uninstalled part is already full of simulated read-only data,” a suggestion that makes sense, but may require the Drobo to know more about the filesystem on it than it otherwise would.

I’ve been at 90% capacity on my Drobo for a while, I think it’s time I popped another disk in there.

(CC licensed photo by Pixelthing.)

The Bugs That Haunt Me

A few years ago I found an article pointing out how spammers had figured out how to abuse some code I wrote back in 2001 or so. I’d put it on the list to fix and even started a blog post so that I could take my lumps publicly.

Now I’ve rediscovered that draft post…and that I never fixed the bad code it had fingered. Worse, I’m no longer in a position to change the code.

Along similar lines, I’ve been told that a database-driven DHCP config file generator that I wrote back in the late 1990s is still in use, and still suffers bugs due to my failure to sanitize MAC addresses that, being entered by humans, sometimes have errors.
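
The kind of check that was missing can be sketched in a few lines of shell. This is a minimal illustration, not the original generator’s code: the function name normalize_mac is hypothetical, and it assumes the config file wants lowercase, colon-separated addresses.

```shell
#!/bin/sh
# normalize_mac ADDRESS
# Hypothetical helper: accepts a MAC address typed with colons, dashes,
# dots, or spaces as separators, and prints it as lowercase aa:bb:cc:dd:ee:ff.
# Returns non-zero (and prints nothing) if the input is not 12 hex digits.
normalize_mac() {
  # Drop separators and fold to lowercase ('-' is last so tr reads it literally).
  hex=$(printf '%s' "$1" | tr -d ': .-' | tr 'A-F' 'a-f')

  # Reject anything that still contains a non-hex character.
  case "$hex" in
    *[!0-9a-f]*) return 1 ;;
  esac

  # A MAC address is exactly 48 bits, i.e. 12 hex digits.
  [ "${#hex}" -eq 12 ] || return 1

  # Re-insert a colon after every pair, then trim the trailing one.
  printf '%s\n' "$hex" | sed 's/../&:/g; s/:$//'
}

# Usage:
#   normalize_mac '00-1A-2B-3C-4D-5E'   # prints 00:1a:2b:3c:4d:5e
#   normalize_mac '00:1A:2B:3C:4D'      # too short: prints nothing, returns 1
```

Rejecting bad input outright, rather than guessing at what the person meant, keeps a typo from ever reaching the generated config.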

I’ve written bad code since then and will write more bad code still, but as my participation in open source projects has increased, I’ve enjoyed the benefit of community examples and criticism. My work now is better for it.

SSH Tunneling Examples

Most of my work is available publicly, but some development is hosted on a private SVN that’s hidden behind a firewall. Unfortunately, my primary development server is on the wrong side of that particular firewall, so I use the following command to bridge the gap:

ssh -R 1980:svn_host:80 username@dev_server.com

That creates a reverse tunnel through my laptop to the SVN server and allows me to check out code on the development server using the following:

http://localhost:1980/path/to/trunk

I’m posting that because I lost my terminal command history and had to think for a moment about how to do this again.

Years ago I used to tunnel my outgoing email to an unauthenticated SMTP server that only accepted outgoing messages from hosts on the local network. That was fairly common back in 2000 or so, but obviously made life (or communication) difficult for people at home or on the road. The easy solution was to SSH to a machine on the mail server’s local network and forward emails through it.

ssh -L 1925:email_host:25 username@ssh_host

Doing that, I was able to configure my mail client to send outgoing emails using a server configuration like the following:

SMTP host: localhost
SMTP port: 1925
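
Since a lost command history is what prompted this post, one way to keep both tunnels handy is to wrap them in shell functions. The sketch below is illustrative only: the function names and the DRY_RUN switch are assumptions, the hostnames are the same placeholders used above, and -N (a standard ssh option that skips running a remote command) is an addition that keeps the session open purely for forwarding.

```shell
#!/bin/sh
# run CMD...: execute the command, or just print it when DRY_RUN=1.
# (Hypothetical helper, handy for checking the command before connecting.)
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Reverse tunnel: port 1980 on the dev server leads back through this
# machine to port 80 on the SVN host.
svn_tunnel() {
  run ssh -N -R 1980:svn_host:80 username@dev_server.com
}

# Local tunnel: port 1925 on this machine leads through ssh_host to
# port 25 on the mail server.
smtp_tunnel() {
  run ssh -N -L 1925:email_host:25 username@ssh_host
}

# Usage:
#   DRY_RUN=1 svn_tunnel    # prints the ssh command without connecting
#   smtp_tunnel             # opens the tunnel; Ctrl-C closes it
```

The difference between the two is just direction: -R opens the listening port on the remote end, -L opens it on the local end.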

Yelp: A Poster Child For Semantic Markup

Search Engine Land.com:

Yelp…is…essentially a poster-child for semantic markup. This spring, Google’s introduction of rich snippets has allowed Yelp’s listings in the SERPs to stand out more, attracting consumers to click more due to the “bling” decorating the listings in the form of the star ratings.

There are now some very good reasons why sites with ratings and reviews should be adopting microformats, and it’s not that hard to do! For a more detailed explanation, read my recap on the subject, Why Use Microformats?

iPhone’s Anti-Customer Config File

In March of this year Apple applied for a patent on technology that enables or disables features of a phone via a config file. The tech is already in use: it’s the carrier profiles we’ve been downloading recently. On the one hand this is just an extension of the parental controls that Apple has included in Mac OS X since the early days, but it also implies some rather anti-consumer thinking at the company.

One exemplar claim in the patent is that the config file can include a “blacklist of device resources to be restricted from access.”

AT&T used this technology to block MMS until recently, and uses it now to block tethering, but the description given in the patent application goes much further:

For example, a carrier may wish to provide an enhanced service which utilizes the global positioning system (GPS) functionality in a mobile device. Carrier may wish to charge a premium for this service, so it may configure carrier provisioning profile to disallow third party applications from accessing the GPS functionality in device, and instead only allow applications digitally signed by carrier (or another entity affiliated with carrier) to access the GPS services in device.

Readers may remember the Trusted Computing video by Lutz Vogel and Benjamin Stephan that spotlighted the growing interest within the computing industry in imposing new and artificial restrictions on the way we use the hardware and software we rely on daily.

Evil Evil klaomta.com

A quick Google search of klaomta.com reveals more than a few people wondering why it’s iframed on their websites. The answer is that their sites have been compromised.

Unfortunately for the fellow who asked me the question at WordCamp, solving the problem can be a bit of a chore. Keeping your WordPress installation up to date is important, as there are some known security flaws in older versions, but most of the attacks that crackers use are targeted elsewhere. Your passwords, all your server apps, the PHP config, your hosting control panel, and other users all must go under the microscope when trying to find security holes.

The WordPress Way

Plugin Development

Will Norris’ talk at WordCamp PDX introduces WordPress coding standards, common functions, and constants to would-be plugin developers (and smacks those who’ve already done it wrong). Also notable: functions, classes, variables, and constants in the WordPress trunk.

Custom Installations

Just as WordPress has a number of hooks and filters that plugins can use to modify and extend behavior, it also has a cool way to customize the installation process.

Extending The WYSIWYG Editor

TinyMCE, the WYSIWYG editor in WordPress, has a rich API to allow adding buttons and stuff, but the docs are hard to get into. We can get a jump on that by looking at how it’s implemented in other WP plugins. This code creates the buttons, while the function that responds to the button click and does the work is defined within the plugin. The TinyMCE plugins in core are also informative.

Hacking WordPress Login and Password Reset Processes For My University Environment

Any university worth the title is likely to have a very mixed identity environment. At Plymouth State University we’ve been pursuing a strategy of unifying identity and offering single sign-on to web services, but an inventory last year still revealed a great number of systems not integrated with either our single sign-on (AuthN) or authorization systems (AuthZ, see difference). And in addition to the many application/system specific stores of identity information (even for those systems integrated into our single sign-on environment), we also use both LDAP and AD (which we try to synchronize at the application level). Worst of all, the entire environment is provisioned solely from our MIS database, which is good if you want to make sure that students and faculty get user accounts, but bad if you want to provision an account for somebody who doesn’t fit into one of those roles.

The one-way relationship between our user accounts and the MIS database also makes it difficult to engage with new users online. If you can’t get an account until you become a student, how do you allow potential students to apply online if all your systems are integrated with single sign-on? And if you can’t authenticate the online identity of your users, how do you set initial passwords into your system? Or allow them to reset a forgotten password online?

Internet companies never struggled with this issue, as their customers could only approach them online, but most universities built systems around paper applications and have fond (and relatively recent) memories of offering their students their first internet experience. It’s still not unusual for universities to offer their students their campus computing account with a default password based on supposedly secret data shared between the user and the school. But your SSN, birth date, and mother’s name are no longer secret. A proposed change in FERPA policy (see the top of page 15586 in the NPRM) would have barred the use of “a common form user name (e.g., last name and first name initial) with date of birth or SSN, or a portion of the SSN, as an initial password to be changed upon first use of the system” in systems that store academic data. The final rule excluded that provision, much to the relief of those schools with more lobbying clout than brains.

Read more…

Pigeon Beats ADSL: Slow Networks Or Massive Storage Capacity?

Moving data by homing pigeon requires some planning, and pigeons

It was a tech story so apparently humorous that the popular media felt compelled to cover it: carrier pigeons delivered 4GB of data faster than an ADSL line. The BBC story’s subtitle read “broadband promised to unite the world with super-fast data delivery – but in South Africa it seems the web is still no faster than a humble pigeon,” and that’s how most stories played it. Unfortunately, they all got it wrong.

The race was run by The Unlimited Group, but the clearest telling of it comes from Wikipedia:

Inspired by RFC 2549, on 9 September 2009 the marketing team of The Unlimited, a regional company in South Africa, decided to host a tongue-in-cheek “Pigeon Race” between their pet pigeon “Winston” and local telecom company Telkom SA. The race is to send 4 gigabytes of data from Howick to Hillcrest, approximately 60 km apart. The pigeon carrying a microSD card (an avian variant of a sneakernet), versus a Telkom ADSL line. Winston beat the data transfer over Telkom’s ADSL line, with a total time of two hours, six minutes and 57 seconds from uploading data on the microSD card to completion of download from card. At the time of Winston’s victory, the 4GB ADSL transfer was just under 4% complete.
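
To put numbers on the race, the effective throughput of each side can be worked out from the figures quoted above. This is a rough sketch that assumes 4 GB means 4 × 10^9 bytes and treats “just under 4%” as exactly 4%, so the ADSL figure is an upper bound; the kbps helper is hypothetical.

```shell
#!/bin/sh
# kbps BYTES SECONDS: average throughput in kilobits per second (integer math).
kbps() {
  echo $(( $1 * 8 / $2 / 1000 ))
}

# 2 h 6 min 57 s, converted to seconds.
elapsed=$(( 2 * 3600 + 6 * 60 + 57 ))   # 7617

kbps 4000000000 "$elapsed"   # pigeon, full 4 GB: prints 4201
kbps 160000000 "$elapsed"    # ADSL, 4% of 4 GB: prints 168
```

So Winston averaged about 4.2 Mbit/s end to end, while the ADSL line managed at most about 168 kbit/s over the same window.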

Read more…