Friday, July 9, 2010

Adrive.com, the Transfer Remote File Feature and server backup

I've had a basic Adrive.com account, with 50GB of free backup space, sitting unused for several years. I had no reason to use it, mostly because the basic service only lets you upload through a crappy little Java applet, with a maximum of 2000 files at a time. For whatever reason, this did not cut it for me. Then one day I gave Adrive another try and ran across a link to Transfer Remote File. Very nice! It works much the way you'd expect: give it the URL of a file on the internet, and it will download the file and put it in the Adrive folder of your choosing.

There are stipulations, of course: only one file at a time, no queuing mechanism, oh, and no URLs with question marks (?) in them. Damn... no question marks? Adrive, have you any idea how many people use server-side scripting languages to deliver downloads these days? Anywho, I found out about the question mark issue a little late, well after I had devised a use for Adrive: as a server backup system. So, I had to adapt.

I have my own Linode VPS, just hanging out for projects and some light hosting. I also have some hefty folders full of files on that server, so I thought I would build a system for backing up whole folders by packaging each one into a single archive. The constraints were that I would have to zip or tar the folders I wanted, since the remote transfer only takes one file at a time and won't accept URLs with question marks. So I set to work. My plan was a PHP script plus Apache's mod_rewrite, so that if I typed in

http://dl.server.com/Folder I Want.tar/zip
the PHP script would kick in and deliver Folder I Want to Adrive as a tar/zip archive on the fly, which would be awesome.
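Since the whole trick depends on handing Adrive a URL with no question mark in it, here is a small sketch of building that URL for one folder. The function name backup_url() is hypothetical, and it assumes the dl.server.com host and the "folder name + .tar" naming scheme described above. Percent-encoding the name keeps it a single path segment, and rawurlencode() also encodes any "?" in a folder name as %3F, so no query string can sneak in.

```php
<?php
// Hypothetical helper: build the URL to paste into Adrive's Transfer Remote
// File box for one folder. Assumes the "http://host/Folder Name.tar" scheme.
function backup_url(string $host, string $folder): string
{
    // rawurlencode() turns spaces into %20 and also encodes "?", "&" and "/",
    // so the folder name stays one path segment with no query string.
    $url = 'http://' . $host . '/' . rawurlencode($folder) . '.tar';

    if (strpos($url, '?') !== false) {
        // Should be unreachable; fail loudly rather than give Adrive a bad URL.
        throw new InvalidArgumentException("URL contains a question mark: $url");
    }
    return $url;
}

echo backup_url('dl.server.com', 'Folder I Want'), "\n"; // http://dl.server.com/Folder%20I%20Want.tar
```

Browsers will usually encode the spaces for you if you type the name as-is; building the URL explicitly just makes the encoding visible.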

Why not just make the archives first and allow them to be downloaded? There are a few reasons. The first, and most important to me, is storage: putting 1-2GB of additional data onto disk, even temporarily, is not an option, since I'm strapped for space as it is. The second is obscurity: having no discernible file structure visible to the outside world is nice, and it means the folders don't need to live in the web root directory.

Implementing zip on the fly was my first task, because zip offers compression, which saves bandwidth. This failed miserably, however, because PHP offers no native zip streaming functions. There are a few libraries floating around out there; the most notable I found was ZipStream-PHP, written by Pablotron. When I looked at its documentation, though, it's not suited to recursively adding folders and it has a lot of overhead: it creates the archive twice, once to calculate the file size and a second time to stream it to the end user. Truth be told, I could probably make Pablotron's library work if I wanted to spend any time on it, but I don't, so I won't - for now.

At this point, I'm just hoping that streaming a tar archive is easier, as it ought to be, since a plain tar archive involves no compression. After some googling, I found a way to stream a tar archive from PHP: run the tar command with popen() and pass its output straight through to the client. Here is my PHP script, which lives in the web root folder for dl.server.com:

file.php

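<?php

// folder that holds the directories we want to back up
$dir = "/home/mike/torrent/";

// folder to archive; falls back to a default when none (or garbage) is given
$dirname = ( empty( $_GET['dir'] ) || !is_string( $_GET['dir'] ) ) ? "hiphopapotomus" : $_GET['dir'];

// only accept a plain folder name that exists inside $dir (no slashes, no . or ..),
// otherwise a request like dir=../../etc could tar up anything on the server.
// if there is no directory by that name then 404
if ( $dirname !== basename( $dirname ) || $dirname === "." || $dirname === ".."
        || strpos( $dirname, "\0" ) !== false || !is_dir( $dir . $dirname ) ){
    header('HTTP/1.0 404 Not Found');
    echo "<h1>404 File Not Found!</h1>";
    echo "<p>The file that you have requested could not be found.</p>";
    exit();
}

// remove script time limit to allow for download
set_time_limit(0);

// send browser file headers (use the validated name, not the raw $_GET value)
header('Content-Type: application/x-tar');
header('Content-Disposition: attachment; filename="' . $dirname . '.tar"');

// -C changes into $dir first, so the /home/mike/torrent/ prefix doesn't get
// included in your tarball extraction; -f - writes the archive to stdout.
// escapeshellarg() quotes both arguments so spaces and quotes are safe.
$cmd = "tar -cf - -C " . escapeshellarg( $dir ) . " " . escapeshellarg( $dirname );

// stream tar file
$fh = popen( $cmd, 'r' );
if ( $fh === false ){
    header('HTTP/1.0 500 Internal Server Error');
    exit();
}

while ( !feof( $fh ) ) {
    print fread( $fh, 8192 );
    flush(); // push each chunk out instead of buffering the whole archive
}

pclose( $fh );

// no closing ?> tag: stray whitespace after it would be appended to the tar stream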



MOD_REWRITE

Now that the script is running properly, let's get mod_rewrite working so that Adrive doesn't bitch about question marks. Installing and setting up mod_rewrite for Apache is beyond the scope of this post; however, I'm sure you're a resourceful person and can use Google.
As it stands now, a file request looks like this:

http://dl.server.com/file.php?dir=Folder I Want

We need to make it

http://dl.server.com/Folder I Want.tar

So, with a little .htaccess magic, we can make that happen.
Create a new .htaccess file in the web root folder:

Options +FollowSymlinks -Indexes
RewriteEngine on

RewriteRule ^(.*)\.tar$ file.php?dir=$1 [NC]

So, I bet you want to know what this does. The magic is in the RewriteRule line.

PART 1
RewriteRule ^(.*)\.tar$
  • ^ anchors the match to the beginning of the string, which here is everything after http://dl.server.com/ - in this case: Folder I Want.tar
  • () parentheses mean that we want to store whatever is matched inside them in a variable
  • . - matches any single character - in this case, the first character is 'F'
  • * - means that we want to match 0 or more of the thing before it

.* will match:

Folder
Folder I
Folder I Want
Folder I Want.tar
or any other run of characters, including an empty one

  • \. - since the symbol '.' means something special, if we actually want to find a literal period, we have to tell Apache that we just want the period. That means using the escape character '\', so \. means: find a period

So far:

^(.*)\. will match
Folder I Want.
where
Folder I Want
will be stored in a variable that we can use later.

Now for the magic:
  • tar$ matches the letters tar at the very end of the string ($ means end), so it will match 'Folder I Want.tar'
The full regular expression (regex) ^(.*)\.tar$ will match:
anythingyoucanconceiveof.tar

In plain English, it says: match any name that ends with .tar, and remember everything that comes before the .tar.
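Apache 2's mod_rewrite uses Perl-compatible regular expressions, and so does PHP's preg_match(), so the same pattern can be tried out from PHP before it goes anywhere near .htaccess. In this sketch, the helper name captured_folder() is made up for illustration, and the /i modifier stands in for mod_rewrite's [NC] flag.

```php
<?php
// The .htaccess pattern ^(.*)\.tar$ written as a PHP (PCRE) regex.
// The /i modifier does the job of mod_rewrite's [NC] flag.
function captured_folder(string $request): ?string
{
    if (preg_match('/^(.*)\.tar$/i', $request, $m)) {
        return $m[1]; // whatever (.*) matched, i.e. what $1 will hold
    }
    return null; // the pattern did not match, so no rewrite would happen
}

foreach (['Folder I Want.tar', 'Folder I Want.TaR', 'backup.tar.tar', 'archive.tar.gz'] as $request) {
    $folder = captured_folder($request);
    echo $request, ' => ', $folder === null ? '(no match)' : "'$folder'", "\n";
}
```

Note that backup.tar.tar captures 'backup.tar': .* is greedy, so it grabs as much as it can and only gives back the final .tar that \.tar$ needs. archive.tar.gz doesn't match at all, because $ insists that .tar be the very last thing in the name.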

PART 2
file.php?dir=$1 [NC]
The second part of this line tells Apache what to rewrite the request to when the pattern matches

  • file.php - is the name of the php script we want to handle this business.
  • ? - marks the start of the query string; everything after it is data passed to the script
  • dir - is the name of the php variable $_GET['dir'] in our script
  • = - is setting $_GET['dir'] equal to anything that comes after it
  • $1 - remember in the first part when 'Folder I Want' is saved to a variable? Well $1 is that variable
  • [NC] - means "no case", i.e. case-insensitive, so Folder I Want.TaR would also match
What does all this do?
http://dl.server.com/Folder I Want.tar
Becomes
http://dl.server.com/file.php?dir=Folder I Want
All of this happens in the background on the server side, so Adrive never notices. mod_rewrite doesn't make a second HTTP request; it rewrites the requested URL internally, and Apache hands the request to file.php instead, so the script is called whenever a URL ending in .tar is requested. Nice, huh?
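To see the whole round trip in one place, here is a rough simulation: preg_replace() plays the part of the RewriteRule, and parse_str() plays the part of PHP filling in $_GET from the rewritten query string. This is only an approximation built from PHP functions (the helper simulate_rewrite() is hypothetical), not what Apache literally runs.

```php
<?php
// Rough simulation of the rewrite: the .htaccess rule, then PHP's query parsing.
// Apache does this internally; preg_replace() and parse_str() just stand in for it.
function simulate_rewrite(string $path): ?array
{
    // Same pattern and substitution as the RewriteRule; /i stands in for [NC].
    $target = preg_replace('/^(.*)\.tar$/i', 'file.php?dir=$1', $path, -1, $count);
    if ($count === 0) {
        return null; // rule did not match, request is left alone
    }
    [$script, $query] = explode('?', $target, 2);
    parse_str($query, $get); // roughly how PHP turns the query string into $_GET
    return ['script' => $script, 'get' => $get];
}

print_r(simulate_rewrite('Folder I Want.tar'));
```

One thing the simulation makes visible: PHP decodes '+' in a query string as a space, and '&' starts a new parameter. A folder named Rock+Roll would therefore arrive in $_GET['dir'] as "Rock Roll". mod_rewrite's [B] flag, which escapes backreferences before they are substituted, is intended for exactly this case.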

NOTES:
  • You can request folder names with spaces in them by typing them exactly as they appear, e.g. Folder Name.tar (the client sends each space as %20, and Apache decodes it before the rewrite rule is applied).
  • Adrive will complain that it does not know the size of the file being downloaded. This is because the script never sends a Content-Length header, so Adrive has no way of knowing how large the archive will be when the stream begins. This is an inherent problem with the file.php script, which I will someday attempt to rectify.
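One possible way to rectify it would be to predict the archive size up front and send it as a Content-Length header. A tar archive has a simple layout: one 512-byte header per member, each file's contents padded to a multiple of 512 bytes, two zero blocks at the end, and the whole thing padded to GNU tar's default record size of 10240 bytes. The sketch below adds that up. The function names are hypothetical, and it rests on loud assumptions: GNU tar's default gnu format, only regular files, directories and symlinks with short targets, every member name under 100 bytes (longer names make GNU tar write extra header blocks), no hard links or sparse files, and nothing changing while the archive streams.

```php
<?php
// Hypothetical sketch: predict the byte size of `tar -cf - -C $base $name`.
// ASSUMES GNU tar defaults (gnu format, blocking factor 20), names < 100 bytes,
// no hard links or sparse files, symlinks with short targets, no changes mid-stream.
const TAR_BLOCK = 512;
const TAR_RECORD = 10240; // 20 blocks of 512 bytes

function tar_member_bytes(string $base, string $name): int
{
    $path = $base . '/' . $name;
    $isLink = is_link($path);
    $isDir = !$isLink && is_dir($path);

    // Directories are stored with a trailing slash; long names need extra blocks,
    // which this estimate does not handle (100 bytes is treated as too long).
    $stored = $isDir ? $name . '/' : $name;
    if (strlen($stored) >= 100) {
        throw new RuntimeException("name too long for this estimate: $stored");
    }

    $bytes = TAR_BLOCK; // one header block per member
    if ($isLink) {
        return $bytes; // symlink: header only, no data blocks
    }
    if ($isDir) {
        foreach (array_diff(scandir($path) ?: [], ['.', '..']) as $child) {
            $bytes += tar_member_bytes($base, $name . '/' . $child);
        }
        return $bytes;
    }
    // Regular file: contents rounded up to a whole number of 512-byte blocks.
    return $bytes + intdiv(filesize($path) + TAR_BLOCK - 1, TAR_BLOCK) * TAR_BLOCK;
}

function tar_stream_size(string $base, string $name): int
{
    // Two zero blocks mark the end of the archive...
    $total = tar_member_bytes(rtrim($base, '/'), $name) + 2 * TAR_BLOCK;
    // ...and tar pads the output to a whole number of records.
    return intdiv($total + TAR_RECORD - 1, TAR_RECORD) * TAR_RECORD;
}
```

In file.php this would go just before the Content-Type header, e.g. header('Content-Length: ' . tar_stream_size($dir, $dirname)); but it is a gamble: if the estimate is off, or a file changes mid-stream, a wrong Content-Length is worse than none, so it would be worth comparing the number against `tar -cf - ... | wc -c` on your own folders first.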
Troubleshooting:

I ran into a problem with mod_rewrite not working correctly (at all, actually). The problem stemmed from Apache's configuration: not httpd.conf itself, but the vhost file in /etc/apache2/sites-available/ (I'm running Ubuntu Server).

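My vhost file had the following line in its <Directory> declarations, twice:
AllowOverride None
With AllowOverride None, Apache ignores .htaccess files completely, so my rewrite rules were never even read. I changed both lines to
AllowOverride All
and ran apache2ctl graceful to reload the configuration.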

Voilà! It worked. Now, if you're using shared hosting, you probably won't be able to change these files, but then again, I would expect your hosting provider to allow overrides with .htaccess files from the start.

This concludes my Adrive mania. Oh, the steps we take to not pay money for a backup solution! I know it is unnecessarily long, but I am conditioned to adhere to mathematical rigor, even when no math is involved.