Here's what's been happening

It's been a rough week. I wanted to explain what has been happening recently with our CDN, and talk about all of the problems we've had with CDNs in general. If you can stomach a novel, you'll discover the good news that it's been resolved to the point where we don't foresee any further issues.

The quest

In June, we decided to move away from our home brew CDN and get a real one, because we were outgrowing it and it was becoming a real pain to manage amongst other things.

The main requirement was that we needed support for HTTPS with our own domain name. There are surprisingly few CDNs out there that offer this service without selling your soul and first born child. Most CDNs only let you use a generic sub-domain of their own domain to get HTTPS. This is fine if the assets on the CDN are only for your web site, but that obviously is not the case with us.

Literally the only two we could find that offered this feature at a reasonable price were CloudFlare and MaxCDN, so we decided to test these out. We also wanted to try one of the enterprise level ones, just to see the difference in performance. For this we chose the 800lb gorilla that is Akamai.

MaxCDN offers HTTPS for $99 setup + $99/month, on top of the normal bandwidth costs. Very reasonable. The service was perfectly fine, but they only have locations in the US and Europe. This is definitely a majority of our market but we wanted Asia too. Well, they do offer Asia, but you have to upgrade to their enterprise service, NetDNA, for considerably more money. It was still less than what we were paying for our home brew CDN though, so I decided to try it.

This was one of the worst days I've ever had. I didn't know when the transition was occurring, because I had to submit a ticket for it and then just wait. When they finished it, they let me know, but they messed up the configuration so HTTPS didn't work. (They forgot the chain file. If you know how certificates work, that's kind of important.) It was several hours before I realized this, however, because DNS hadn't propagated yet - I was still hitting their old servers for a while, which were still working fine. By the time I realized there was a problem, the damage had already been done to anyone who was tracking a secure site. Not to mention it completely broke our web site for our Pro+ members, since they get the HTTPS interface by default and none of the assets were loading for them. I immediately emailed them to get it fixed, and meanwhile pointed the domain back to our old CDN so HTTPS would keep working. But they never actually got it fixed. I don't know what the problem was. We had a lot of back and forth, but it was clear this was not going to work.

Next was Cloudflare. I'd met the founders at TechCrunch Disrupt the previous September, they're great. Thing is, they're not technically a pure CDN. You point your DNS to them, and then all of your site's traffic passes through their network. They automatically cache all of your static resources on their servers, and then "accelerate" your HTML / dynamic content. Accelerating means requests to your server pass through their network directly to speed them up, but they don't cache the actual HTML - it just gets to you faster because the route is optimized.

All in all it's a fantastic service, and I'd be all for it, but they didn't (and still don't) support wildcard DNS - which is another do-or-die feature for us because of our white label analytics service. But their rock star support guy, John, told me they could set up a special integration with us where we could just point a sub-domain to them to act as a traditional CDN. Well, it was worth trying because there weren't any other options at this price level, especially since HTTPS only costs $1/month on top of their normal pricing, and they have servers in Asia too. It seemed too good to be true really. How could they be doing this for such a great price and have such good support? I'm pretty sure John doesn't sleep. No matter what time I email him, I seem to have a reply in minutes.

Anyways, the service worked great. We had it live for a week or two. At some point there was a problem that caused us to move back to our home brew CDN, although I don't recall what it was exactly. But overall I was happy and planned to test it again in the future, but I still had Akamai to test.

Akamai is what the big boys use. Facebook, etc. I knew it was good, but also expensive. However, I figured it was worth it if the service was as good as I expected it to be. They literally have thousands of data centers, including in South America and Africa, which very few CDNs have, and my speed tests on their edge servers were off the charts. Using a tool that tests response time from over 50 locations worldwide, I could barely find a single location that had higher than 10ms response time. Ridiculous to say the least.

They gave us a 90 day no commitment trial to test their service, which was appreciated. Their sales and engineer team were great. Very professional, timely, and helpful. But man did I hate their control panel. It was nothing short of the most confusing interface I have ever laid eyes on. I had no idea how to do anything, and I'm usually the guy who figures that kind of thing out.

They walked me through a basic setup, but then the next thing I didn't like was discovered - any changes you want to make take 4 hours to deploy. What if you screw something up? That's gonna be a nail biting 4 hour ball of stress waiting for it to get fixed.

I never actually got to really test their service because I was just too scared of screwing it up. A few weeks had passed and I had forgotten how to configure anything. My patience was wearing thin, as our custom CDN continued to deteriorate and I was dealing with other junk too. There's always a thousand things going on around here.

John from Cloudflare continued to email me to ask how our testing was going with these other services. He was confident Cloudflare would meet our needs. I was pretty sure too, just hadn't made up my mind yet. But I decided to go back to them because I didn't have much other choice.

That was early August and, well, we've been with them ever since. No problems at all. Great service. Overall I have nothing but good things to say.

But then...

Well, it turns out there was a problem. A few weeks ago, our "pull" server (that they pull static files from) crashed, and at the same time our tracking code stopped being served. It was fixed quickly but... How could this be? They should be caching everything from this server, right?

I emailed them about it and they weren't sure how the server crashing would affect cached files being served. But unless the cache expired at the exact same time as the crash, something was definitely up.

I did some digging and finally ended up "watch"ing the ifconfig output on the pull server, which shows bandwidth usage amongst other things. We were pushing almost 3MB per second of data out of that thing. Hmm, that doesn't seem right.

I renamed the tracking code file as a quick test, and sure enough, suddenly Cloudflare wouldn't serve it. Put it back, bam, it worked.

Clearly this file was not being cached. But why? Well, it wasn't their fault. The problem was the rather strange URL for our tracking code: instead of ending in a file extension like .js, it doesn't have one. This is one of those "Why the hell did I ever do that" type things, but it's too late to change now with almost 400,000 sites already pointing to it.

I emailed them about this and only then discovered that they cache based on file extension, not mime type or cache headers, which we of course properly serve. I wish I knew this beforehand, but wish in one hand shit in the other, see which one fills up first.
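To make the difference concrete, here's a rough sketch of a cache that decides by file extension next to one that honors response headers. The extension list, the function names, and the example URLs are all made up for illustration. This is not Cloudflare's actual logic, just a guess at the general shape of why an extensionless URL slips through.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical list of extensions an extension-based cache might treat as static.
CACHEABLE_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".gif", ".ico"}


def cacheable_by_extension(url: str) -> bool:
    """Decide cacheability from the URL path alone, ignoring response headers."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in CACHEABLE_EXTENSIONS


def cacheable_by_headers(headers: dict) -> bool:
    """Decide cacheability from Cache-Control, as a header-driven cache might."""
    cache_control = headers.get("Cache-Control", "").lower()
    directives = {d.strip().split("=")[0] for d in cache_control.split(",")}
    if "no-store" in directives or "private" in directives:
        return False
    return "max-age" in directives or "public" in directives


# Example URLs are placeholders, not our real tracking code URL.
print(cacheable_by_extension("http://static.example.com/tracking.js"))
print(cacheable_by_extension("http://static.example.com/js"))
print(cacheable_by_headers({"Cache-Control": "public, max-age=3600"}))
```

Under the first rule, a path with no extension is never cached no matter what headers the origin sends, which matches what we were seeing.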

At this point I knew I needed to do something, since this single file was not being cached properly, it relied 100% on the single pull server being online at all times. I should have made it my #1 priority but with only a single 5 minute outage in 2 months, I somehow convinced myself I could think about it for a while. This was a big mistake on my part and I apologize profusely for it - it won't happen again. I could have spent a few grand with Dyn to get failover immediately to give us a safeguard until I found the right (affordable) solution, but I didn't (more on this in a second). I'm really sorry and I won't compromise our reliability like that again. Clearly it was not worth it.

So anyways, the same day I discovered this caching issue, the server crashed... again. I got it fixed quickly, and as a quick precaution I set up another server and set up round robin DNS to serve both IPs, so in case one crashed, there'd be a backup. There was no monitoring/failover on this config, but if DNS serves multiple IPs for a domain, theoretically the requester is supposed to fall back on the second one if the first one fails. I had never actually tested this scenario, but it was just an interim fingers-crossed fix until I got a real solution in place.

And then the server crashed again... and I discovered this did not work as I hoped (surprise).

Ok, so we need failover on this, like yesterday. This is now my #1 priority. Our DNS provider, Dyn, offers this feature, but what I hate about their implementation is the restrictions they place on the TTL (time to live), which is how long DNS will cache a query for. Obviously the TTL should be fairly short for maximum uptime, but the max they allow you to set with failover is 7.5 minutes. A shorter TTL means more queries hitting their servers, and with our level of traffic, this increases our bill by several thousand dollars a month, which is a bit steep for my liking. Not to mention the expensive monthly base fee just to have this feature enabled in the first place.

The plan

I finally came up with a plan though. I found another DNS provider that offers monitoring/failover for very reasonable pricing and no restrictions on TTL. I specifically emailed them about this like 4 times to confirm it would work exactly as I expected. However, I can't just transfer our domain to be hosted there, because we're in a contract with Dyn (sigh). So I was going to set up a different domain on their servers, and then, using CNAMEs, point Cloudflare to pull files from that domain instead of the sub-domain we were using.

That was yesterday. "Great!", I said to myself. "I'll set it up first thing tomorrow because it's almost midnight!"

And then this morning.......... that's right, the freaking server crashed again. My phone was on silent by accident and I slept in, so for almost 2 hours our tracking code was only being served for about 75% of requests (because DNS IP fallback does work some of the time, it seems). Hence, more problems this morning.

ARGH. I screamed at my computer and just about burned down my house I was so mad. I had come up with a plan that I knew would work and was going to implement it first thing the next day, but the server crashes in the meantime and here I am in bed, blissfully dreaming of puppies and unicorns, unaware of any problems because my STUPID PHONE IS ON SILENT. WHY. ME.

The fix

But the good news is, today, I got this all set up. Monitoring/failover is now live on our pull servers, and they are checked every 2 minutes - so if there is a problem with any of them, DNS will stop serving that IP to Cloudflare within 2 minutes at the most, and I verified it works properly by intentionally killing a server. And the TTL is only 5 minutes, so the absolute maximum amount of time there could potentially be a problem for any individual person is 7 minutes. And we added a third pull server, so at the most this would only affect about 1/3 of requests, and even then, for a maximum of 7 minutes.
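For anyone who wants to check the math, here's a tiny sketch of where the 7 minutes and the 1/3 come from. The function names are mine, and the even-split assumption (DNS handing out each pull server's IP equally often) is a simplification, not something I've measured.

```python
def worst_case_exposure_minutes(check_interval: float, ttl: float) -> float:
    """Longest a resolver could keep handing out a dead IP.

    The monitor may take up to one full check interval to notice the failure,
    and a resolver that cached the record just before it was pulled may keep
    using it for up to one full TTL after that.
    """
    return check_interval + ttl


def worst_case_affected_fraction(dead_servers: int, total_servers: int) -> float:
    """Share of lookups that could land on a dead server, assuming answers
    are spread evenly across all servers (an assumption)."""
    return dead_servers / total_servers


# Numbers from this post: 2 minute checks, 5 minute TTL, 3 pull servers.
print(worst_case_exposure_minutes(check_interval=2, ttl=5))
print(worst_case_affected_fraction(dead_servers=1, total_servers=3))
```

Plugging in Dyn's 7.5 minute cap instead of our 5 minute TTL would stretch the same worst case to 9.5 minutes with a 2 minute check.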

(Note: Above I was complaining about Dyn's 7.5 minute max TTL, and here I am with a 5 minute one. Well, this one's a bit different because only Cloudflare's servers talk to it, so the total queries generated are quite small. The real issue is we're also going to be doing this same thing in order to "load balance the load balancers" (really?), because we're adding two more of them this week. Using failover on this is what would be really expensive, so we're avoiding that by using another DNS provider for it, and we figure we might as well do all of that monitoring and failover in one place. Load balancers are stable and reliable, so the TTL will be a bit higher - and even if not, their pricing is considerably cheaper than Dyn's, so it's all good).

On top of all that, Cloudflare desperately wants to "fix" this caching "problem" on their end too. (I say "problem" in quotes because their service is working exactly as they designed it to work, I just didn't know ahead of time that caching was based on file extension only). They are working on a solution that will allow us to rewrite URLs on their end so that their servers will see the tracking code file as something that ends with a .js file extension and hence cache it properly, without us having to make any changes on our end. Once that's live, even if all 3 of our pull servers were offline (knock on wood), it should have zero impact because that stupid legacy URL file will actually be cached!

In conclusion

So that, my friends, is as short a summary as I can write about everything we've been through with CDNs.

And on top of all this, we also made an update to the tracking code on Nov 1 that caused issues for some of you. This update has been reverted but that was the last thing we needed with the CDN also causing issues at the same time. [Update: And there was a small network hiccup at our data center on Nov 9 that caused a short outage. Worst week ever.]

So I don't really feel like we have earned your money this month (and to think, it's only the 8th...). If anyone wants a refund, send us an email and we'll happily refund you a full month of service.

No matter what, know that I value the quality of our service above anything else and will always do everything in my power to make sure it works flawlessly. This has been a horrible week, but as of now the CDN should not impact anyone.

Thanks for reading and (hopefully) understanding.
30 comments |   Nov 09 2011 1:48am

iframe tracking, copying dashboards, Google search encoding, etc

It's new feature Tuesday!

Better iframe support

A common problem we have is that people can only install the tracking code inside an iframe, but they want to track the parent document, not the iframe. Unfortunately, that setup only tracks the iframe itself. Now I know there are plenty of people who want to track the iframe specifically, but there are way more people in the other camp. So now, by default, our tracking code will detect if it's in an iframe and use the parent document's URL and title instead of the iframe's. This is already what most other services do by default.

There is a way to override it though if you are actually wanting to track the iframe on purpose, via the new clicky_custom.iframe property.

[Nov 3 update: This change may have caused general tracking problems for some sites. We tested it, as we always do, against all major browsers before deployment and it worked fine, but something with it is causing problems for some of you, so the change has been reverted. We will look to add it back in the future.]

Copying dashboards between sites

If you have a bunch of sites, you may have created the most awesome amazing customized dashboard ever. And then you have to recreate it for every site in your account. So fun!

Well, now when you go to your customize dashboard page, there will also be a list of all the dashboards you've created for your other sites. One click and bam, that dashboard is now copied into the new site. After it's copied you can edit it if you want, or just leave as is.

Google search encoding

Some change Google made to their URL structure is resulting in double URL encoding, and you might be seeing searches+like+this instead of searches like this. It's not just affecting us either, I checked my Google Analytics account (gasp!) and was seeing the same thing. As of today, we now just double URL decode all searches before storing them, so this problem is history. I imagine Google will fix it on their end eventually but patience is not one of my virtues.
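Here's a minimal sketch of what "double URL decode" means, using Python's standard library rather than our actual storage code. The function name is made up for the example.

```python
from urllib.parse import unquote_plus


def decode_search(raw: str) -> str:
    """Apply form-style URL decoding twice to undo double encoding."""
    return unquote_plus(unquote_plus(raw))


# A double-encoded search: the spaces became "+", then the "+" became "%2B".
raw = "searches%2Blike%2Bthis"
print(unquote_plus(raw))   # one pass still leaves the plus signs
print(decode_search(raw))  # two passes recover the spaces
```

One caveat with this approach: a search that was only encoded once and contains a literal plus sign (say, "c++" arriving as "c%2B%2B") would have its plus signs turned into spaces by the second pass, so it's a trade-off.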

Black nav bar

We know some of you don't like this but we feel it's important to have these links highly visible and easy to find. If you stick something in a footer, no one clicks the links because no one sees them. Two designs ago, when we actually had a footer, as soon as we moved a bunch of those links into the sidebar we added, the number of clicks each one was getting skyrocketed. I'm talking 10-20x as much activity. That's a good thing.

But anyways, today, I reduced the padding so it takes up a bit less space, and also removed the position:fixed style rule so it's not always on the screen, instead it's only visible when the page is scrolled up all the way. I hope that appeases some of you to a small degree at the very least.
6 comments |   Nov 01 2011 1:05pm

More ways to view hourly data... and more!

We just deployed a bunch of changes to hourly data, amongst other things:

  • Goals and Revenue now support hourly data, so you can more easily see your best converting and most profitable times of day. However, we just started doing this today, so earlier dates will not have hourly data.

    Anytime we add hourly support for something, it essentially requires 24x as much storage space, which is why we only do it for a few types of data (currently: visitors, actions, tweets, short URLs, goals, and revenue). If you saw how big our databases were already, you'd cry and then realize why this is necessary.

  • Hourly averages - there are three new options in the drop down menu for hourly graphs:
    • Same day of week average - example, this Monday vs the average of the last 4 Mondays
    • Weekday average - Today vs the average of all weekdays (Monday-Friday) from the last 4 weeks
    • Weekend average - Same as Weekday average but for Saturday/Sunday only

    These are all insanely useful, particularly the first one!

  • You can set "same day of week average" as your default trend comparison in your dashboard preferences, in which case your hourly graphs will also default to displaying this mode. We initially coded in support for weekday/weekend stuff too, but they generated WAY too many queries; there were up to 20 extra pieces of data that needed to be pulled from the database for each item in any given report, and it couldn't be optimized since there are "holes" in the date ranges.

  • Daily graphs default to 28 days instead of 30 days, to more cleanly fit week boundaries. We think you will find this especially useful when comparing vs "previous period". An example is shown on the right.

  • Compare menu fixes/additions - When viewing daily graphs, the "Compare..." menu has been broken for a while now. Not sure when it happened but we finally got it fixed. We also added some more options to it that were much needed (revenue, goals, campaigns, pages, and tweets, to name a few).
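To illustrate the "same day of week average" option from the list above, here's a rough sketch of the calculation. The data layout (a dict of date to 24 hourly totals) and the function names are assumptions for the example, not how our database actually stores things.

```python
from datetime import date, timedelta
from statistics import mean


def same_weekday_dates(day: date, weeks: int = 4) -> list[date]:
    """Return the same weekday from each of the previous `weeks` weeks."""
    return [day - timedelta(weeks=w) for w in range(1, weeks + 1)]


def same_weekday_hourly_average(hourly_counts: dict, day: date, weeks: int = 4) -> list:
    """Average each hour of the day across the previous same weekdays.

    `hourly_counts` is a hypothetical mapping of date -> list of 24 hourly
    totals. Dates with no data are skipped.
    """
    history = [
        hourly_counts[d] for d in same_weekday_dates(day, weeks) if d in hourly_counts
    ]
    if not history:
        return [0.0] * 24
    # zip(*history) groups hour 0 of every day together, then hour 1, and so on.
    return [mean(hours) for hours in zip(*history)]
```

Because the four dates are exactly 7, 14, 21, and 28 days back, they're easy to look up one by one. The weekday/weekend averages span many more dates, which hints at why they generate so many extra queries.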

14 comments |   Oct 27 2011 4:46pm

24-hour time formatting and smarter defaults for new sites

We have finally added a 24 hour time formatting option, so you'll see e.g. "15:30" instead of "3:30pm". This change should affect everywhere you see time within a site's reports, but if we missed something, let us know. You can change this setting in your site preferences.
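As a quick illustration of the two formats, here's a small sketch. The function name and the choice to zero-pad the hour in 24-hour mode are assumptions for the example, not a description of our report code.

```python
from datetime import time


def format_time(t: time, use_24_hour: bool) -> str:
    """Format a time as "15:30" (24-hour) or "3:30pm" (12-hour)."""
    if use_24_hour:
        return f"{t.hour:02d}:{t.minute:02d}"
    hour_12 = t.hour % 12 or 12  # hours 0 and 12 both display as 12
    suffix = "am" if t.hour < 12 else "pm"
    return f"{hour_12}:{t.minute:02d}{suffix}"


print(format_time(time(15, 30), use_24_hour=True))
print(format_time(time(15, 30), use_24_hour=False))
```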

The defaults when registering a new site also just got a lot smarter. Those of you with lots of sites probably get annoyed with how many preferences you have to change every time you register a new site - particularly if you are not on the west coast of the US. Now, any time you register a new site, we'll grab the following preferences from the last site you registered and make them the default for the new one (which you can of course change if desired):

  • Time format (12/24 hour)
  • Time zone
  • Daylight savings
  • Anonymous IP logging
  • Hide hostnames in visitors list
  • Hide ISPs

3 comments |   Oct 21 2011 2:42pm

utm_custom: a new URL parameter to attach custom data to visitors

    One feature that gets requested a lot is to be able to set a variable in the URL that could then be attached to the visitor as custom data. This would be particularly useful for things like email newsletters, so when someone clicks through, they can be identified automatically.

    The variable name needed to be generic because of our white label program, and since we pictured this being used with "campaign activity" more than anything else, we decided to call the variable utm_custom (related to Google/Urchin's utm_campaign etc variables).

    You can see full documentation here. Because custom data requires a Pro or higher account (upgrade), this variable will also only be processed if you have a Pro or higher account.

    utm_custom is an associative array so you can set multiple key/value pairs on a single page. (It must be an array with at least one key/value pair, or it will be ignored). For example, if you sent a visitor to this page:[username]=Bob+Jones&utm_custom[email]

    You would see this in your visitor's list:

    And this when viewing visitor/session details:

    We've had requests for this countless times over the years so we know many of you will find it quite useful :D
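If you're curious how key/value pairs like these can be pulled out of a URL, here's a minimal sketch using Python's standard library. It is not our actual tracking code, and the example URL and email address are placeholders I made up.

```python
import re
from urllib.parse import parse_qsl, urlparse

# Matches query keys of the form utm_custom[name]
_UTM_CUSTOM_KEY = re.compile(r"^utm_custom\[([^\]]+)\]$")


def parse_utm_custom(url: str) -> dict[str, str]:
    """Collect utm_custom[key]=value pairs from a URL's query string."""
    custom = {}
    for key, value in parse_qsl(urlparse(url).query):
        match = _UTM_CUSTOM_KEY.match(key)
        if match:
            custom[match.group(1)] = value
    return custom


# Placeholder URL for illustration only.
url = "http://example.com/?utm_custom[username]=Bob+Jones&utm_custom[email]=bob%40example.com"
print(parse_utm_custom(url))
```

A bare utm_custom=value with no [key] part matches nothing here, in line with the rule that it must be an array with at least one key/value pair.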
    3 comments |   Oct 20 2011 7:41pm

Copyright © 2019, Roxr Software Ltd