Please do not place third party javascript in your HTML head.

There were a few bugs with our new heatmaps for people who had put our tracking code inside their HTML head tag, and an alarmingly large number of you were doing so. Those bugs are fixed now, but that's beside the point of this little post.

Web browsers download page elements in the order they are listed in your HTML - CSS, javascript, and images (mostly). If the domain hosting any of those items is offline or running slow, the browser will "hang" for up to 60 seconds waiting for that item to load before moving on to the next one. (Asynchronous javascript is an exception, but it's still not very common - although we do offer it as an option. And besides, if you're loading a script asynchronously, you're not depending on it being available for immediate execution, so what would be the point of putting it in your HTML head?)
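If you're curious, here's roughly what async loading looks like - a minimal sketch, with a placeholder script URL rather than our actual tracking code:

    // Minimal async-loading sketch. The URL is a placeholder, not our real
    // tracking script - swap in whatever third party code you use.
    (function () {
      var s = document.createElement('script');
      s.src = '//example.com/tracker.js'; // hypothetical third party script
      s.async = true;                     // don't block parsing or rendering
      var first = document.getElementsByTagName('script')[0];
      first.parentNode.insertBefore(s, first);
    })();

Because the script is injected with async set, the browser keeps parsing and rendering the page even if the third party server is slow or down.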

If you have third party javascript in your HTML head and that third party server goes offline, your web site is effectively dead because nothing else is going to load for up to 60 seconds. When I visit sites that hang like that, I immediately close them as I'm sure 99% of people do.

If the javascript was instead in the footer of your web site, the web browser would still hang trying to download the item, but most of your site would already be loaded in the visitor's web browser. In other words, your site would still be (mostly) usable while the engineers for the third party service go into panic mode.

Our CDN never really goes "offline" but there are obviously thousands of services out there that ask you to put their javascript on your web site. Any third party javascript code that's not critical to your web site working properly should never go in the HTML head. Tracking code, ad code, widgets, social plugins, etc - these are all non-critical and have no business up there. Help make the internet a better place and always put that code at the bottom of your web site, k?

The only valid exception is third party code hosted by a major provider such as Google that you rely on as the page loads (which we do for Google's Maps API, but that's it).

So... if you've got our code in your HTML head, please move it to the bottom of your HTML, right before the closing </body> tag, as has always been recommended on the page where you grab your tracking code from.

Bonus pro-tip: For first party elements, your CSS file should always be the very first item in your HTML head. That way your site will look "proper" as soon as possible, because the CSS will be the very first thing the browser downloads after the HTML itself.
3 comments |   Oct 17 2012 4:25pm

Heatmaps!

We've got two great new features launching today, after nearly 4 months of development: heatmaps, and on-site analytics.

Here's an example of a heatmap report, after the new tracking code has been live on our site for a couple of hours:




(Note: Everyone's user homepage is unique so some of the hot spots look weird here when conglomerated on a global scale. That's the nature of dynamic pages).

IMPORTANT! Heatmaps won't start logging anything until you specify your site's width and layout in your site preferences. You can see where to do that in this screenshot. And as with on-site analytics, you must have the newest tracking code in your browser, so clear your cache!

Over the years we've had a lot of requests for this feature and we're psyched to finally offer it to you. But once we started on it, the idea for on-site analytics was born soon thereafter, making this otherwise fairly-simple feature spiral out of control (in a good way). Hence nearly 4 months of development and testing!

With most services, to view heatmaps you go to their site (e.g. getclicky.com), open your heatmap report, click a button for the page you want to view, and then your site is opened in an iframe with a heatmap overlay loaded on top. That's not ideal for a lot of reasons, particularly the iframe (ugh). Loading a site in an iframe introduces a lot of potential problems, and a lot of sites have "frame busters" that would totally break this feature. So we took a different approach.

Ok, we won't deny you the "boring" way: go to your reports and click the appropriate link in the Content report:




What we do differently is that your page loads directly in a new window with a special parameter in the URL fragment that our tracking code recognizes, and a heatmap is dynamically loaded right on your site - without a frame, and without requiring you to be logged in to Clicky (i.e. you can share the URL). So that's good.

But we thought it would be so terrific if you could just be on your web site and say "I want a heatmap of this page, now" and just click a button on that page and see it immediately. So that's what we did, and hence made it part of the on-site analytics experience, as you can see to the right.

This was still not enough for us. Even though very few of our real-time competitors offer heatmaps, we didn't want to just make a "me too" feature. We believe most of the features we are releasing with our heatmaps are unlike anything that any other service offers.

First of all, beyond "per page" heatmaps, we also do "per session". For every single page view of every session we log for your site, we log a unique heatmap for that visitor. You can view session heatmaps by viewing a session, and next to each action, on the right-hand side, will be a heatmap icon to click if we have any heatmap data for that page/session combo. Click the icon to view all of their clicks on any given page:




Second, what would analytics be without segmentation? You can view "global" heatmaps (all clicks) for any page, or you can segment them by a number of criteria. For example, only view clicks on a page from people who completed a specific goal, or only people who arrived via a specific search, campaign, referring domain, or type of referrer (e.g. advertising), or any/every version of every split test you're running.

Here's an example of our homepage for users completing the "new user" goal. Unsurprisingly, the majority of the clicks are on the "register now" button:




The order of items in these reports is a bit different from what you see on our web site. We take the top 24 items ordered by "popularity" - e.g. the top 24 goals completed today - then re-order them alphabetically. We think you'll generally want to see the most popular segments, but you'll also want to find them quickly, so grabbing the top items and listing them alphabetically lets you narrow down what you're looking for in the interface.
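In sketch form (not our actual implementation), the ordering logic is simply:

    // Sketch of the ordering described above: take the most popular items,
    // then present that subset alphabetically. Not our actual implementation.
    function orderForMenu(items, limit) {
      return items
        .slice()                                                          // don't mutate the original list
        .sort(function (a, b) { return b.count - a.count; })              // by popularity
        .slice(0, limit || 24)                                            // keep the top 24
        .sort(function (a, b) { return a.name.localeCompare(b.name); });  // then A-Z
    }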

To generate segmented heatmaps, we needed to add a new filter type to quickly find visitors with heatmap data attached. So we made this a general filter as well. In the main visitors report on Clicky, when you click "add a filter", you will see a new option called "heatmaps". Click that to filter your visitors down to just the people with heatmap data attached. This also works with the analytics API, by specifying "heatmap=1" in combination with type=visitors or type=segmentation requests. (We also added a filter for "online now", since we needed to be able to internally filter visitors who were online in order to display them in the widget. Via the API, use "online=1".)
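For example, a request for visitors with heatmap data attached might look something like the sketch below. The site_id/sitekey credentials and output parameter are the usual analytics API parameters, and the base URL here is an assumption for illustration - check the API docs for the exact endpoint. heatmap=1 is the new filter:

    // Sketch of an analytics API request using the new heatmap filter.
    // Base URL and credential/output parameters are assumed for illustration.
    var url = 'https://api.getclicky.com/api/stats/4' +  // assumed endpoint
      '?site_id=YOUR_SITE_ID' +   // placeholder
      '&sitekey=YOUR_SITEKEY' +   // placeholder
      '&type=visitors' +
      '&heatmap=1' +              // only visitors with heatmap data attached
      '&output=json';

    fetch(url)
      .then(function (res) { return res.json(); })
      .then(function (data) { console.log(data); });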

When viewing your visitors report, anyone with heatmap data attached will have a new icon next to them:




Not every visitor will have heatmap data though... We anticipate this type of data will take up a lot of space, so for now we only log it for a random 50% of your visitors. We will likely adjust this in the future as we see what kind of impact it has on our resources. For example, it's quite likely we'll increase the sampling rate for low traffic sites, maybe even to "all visitors", and for higher traffic sites, go below 50%. But for now, 50% was an easy safety measure, and we'll analyze the resource impact as time goes on.

Most heatmap services use server-side image generation to create the overlay for your site, which takes up a lot of resources on the service providing the heatmap - and is also quite slow. Earlier this year we discovered the excellent open-source heatmap.js by Patrick Wied, and we knew it was just what we needed. It's a javascript library that generates the heatmap on the fly, using your web browser's resources, so it's fast and efficient. The library has lots of features that aren't yet documented, some of which we wanted to use (animated heatmaps, for example), but we didn't want to take the time to figure them out ourselves. We hope the library matures in the future and we'll take advantage of it when it does. But for now, for what we need, it does the job extremely well.
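To give a rough idea of why rendering in the browser is so cheap, here's a bare-bones sketch of the general technique - plain canvas, not the library's actual API: draw a soft radial gradient at each click point and let the overlaps accumulate into "heat".

    // Bare-bones illustration of client-side heatmap rendering. This is the
    // general idea, not heatmap.js's API. clicks = [{x: 120, y: 340}, ...]
    function drawHeat(canvas, clicks) {
      var ctx = canvas.getContext('2d');
      ctx.globalAlpha = 0.15; // overlapping circles build up into hot spots
      clicks.forEach(function (c) {
        var g = ctx.createRadialGradient(c.x, c.y, 0, c.x, c.y, 25);
        g.addColorStop(0, 'rgba(255, 0, 0, 1)');
        g.addColorStop(1, 'rgba(255, 0, 0, 0)');
        ctx.fillStyle = g;
        ctx.beginPath();
        ctx.arc(c.x, c.y, 25, 0, Math.PI * 2);
        ctx.fill();
      });
    }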

A difficult decision

This will be the first feature we offer that is not included in the standard Pro plan. The Pro plan has always had every feature we offer, so this was hard to do. But, the resources needed for this feature are quite significant, both in terms of bandwidth and storage, so we can't justify the standard Pro pricing for this feature.

We do want all of you to experience it first hand though, so for the first 4 weeks, all Pro or higher accounts will have this feature. After 4 weeks, a Pro Platinum or Custom plan will be required to keep the heatmap feature. On-site analytics will be standard with any Pro or higher account, though.

Data retention

As we said above, we anticipate this feature taking up a lot of storage space, so we built the database structure around monthly chunks that are auto-purged every 30 days. Initially, daily/session data will only be guaranteed for 30 days but may be available for up to 60 days, and monthly data will only be guaranteed for 2 months but may be available for up to 3 months. It's hard to anticipate the exact impact this will have on our resources, so we may (and hope to) expand these limits in the future, but for now, this is all we can guarantee.

Internet Explorer, may I count the ways in which I love thee?

Prior to IE8, IE has no native JSON support, and even IE8 only has it when your site runs in standards mode. IE causing developers headaches is as sure as the sun rising in the east; regardless, the point is that heatmap data will not be logged for IE users unless they are on at least version 8 and your site renders in standards mode. Good luck!
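The check itself is plain feature detection - something along these lines (a sketch, not our exact tracking code):

    // Sketch of the feature check, not our exact tracking code. IE only
    // exposes a native JSON object in IE8+ running in standards mode.
    var canLogHeatmaps = typeof window.JSON === 'object' &&
                         typeof window.JSON.stringify === 'function';

    if (canLogHeatmaps) {
      // serialize click coordinates and send them with the tracking request
    }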
49 comments |   Oct 14 2012 8:51pm

On-site analytics!

We've got two great new features launching today, after nearly 4 months of development: heatmaps, and on-site analytics.

On-site analytics is a new feature that automatically embeds a widget in the bottom right corner of your web site, letting you view information about the visitors who are on your site right now. Don't worry, only you (the site owner) can see it - but if you want to disable it, you can do so on your user preferences page.

This feature is available to all Pro or higher customers. Need to upgrade?

Before we get into the details, a few important notes for this to work properly:
- This requires the latest version of our tracking code, so clear your cache!
- Third party cookies must be enabled (or at least, have Clicky whitelisted)
- You need to check the "remember me" box when you log in to Clicky, so that a cookie is set to remember who you are.

There are three components to on-site analytics:

Visitors online

The "on site" number shows you how many visitors are currently on your web site. Click on it to view a list of said visitors! Up to 8 visitors are shown at a time, and you can page through them with the "next page" link in the top right corner. Click on the username (if you're using custom data tracking) or IP address to view their session on Clicky:




You can also view several summary reports about the visitors online, such as top searches, referring domains, traffic sources, goals, and more.




Visitors on this page

The "on page" number represents how many visitors are currently viewing the page that you are currently viewing. Clicking on it shows the exact same type of reports as the global "on site" report, except it's limited to visitors currently viewing the same page that you are.

The "on site" and "on page" numbers will automatically update once per minute for 5 minutes, then once every 5 minutes for an hour, while you are idle on a single page on your web site. But after an hour, the updating will stop.

Heatmaps

If you have enabled heatmap tracking in your site preferences, and there is heatmap data in the last 7 days for the page you are currently viewing, you will see the heatmap icon on the right hand side (large colorful pixels). Click that to view a heatmap report for the page you're viewing for the current date, along with segmentation options for the heatmaps.

There's a lot to talk about with heatmaps, so we wrote a separate post for that. Read about heatmaps here.


Other notes/features


  • Clicking on the "Clicky Web Analytics" link will open a new tab/window on getclicky.com with your site's dashboard.

  • On-site analytics requires jQuery. We automatically check if your site already has it, and if not, we side-load it from our CDN - after which we call jQuery.noConflict() to ensure we don't interfere with any other libraries you may have on your site that use the $() shortcut. In order for all features to work, we do require at least version 1.6, which is almost 30 months old - if you're on a version older than that, please upgrade ;)

  • To authenticate your access to on-site analytics, we had to cache your user cookie on the tracking servers. While we were at it, we went ahead and made your visits to your own web site automatically ignored, without you having to set an IP filter / filter cookie. If you need to test something on your own web site and see it in Clicky, simply log out of Clicky or use a different web browser. We know this will be a PITA for some of you, but the majority of users want their own traffic ignored, so it made sense for us to implement this as a convenience feature.

  • Calls to clicky.log() are now stored in a cookie queue (unless you have disabled cookies), to help ensure these calls aren't missed. Previously we recommended calling clicky.pause() manually after any clicky.log() or clicky.goal() event that resulted in a new page view, as otherwise the call would almost never be logged. This new queue system is designed to fix that. As long as the visitor's next page view is still on your own web site, our tracking code will see the cookie and send the queued events to our tracking servers when that page loads. The queue is processed every 5 seconds while idle on a page, and immediately upon a fresh page view. (A rough sketch of the idea is shown below.)
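    Here's a rough sketch of the cookie queue idea - conceptual only, with a hypothetical helper for actually transmitting the data; this is not our real tracking code:

        // Conceptual sketch of a cookie-backed event queue, not our actual
        // tracking code. Events are appended to a cookie so they survive
        // navigation; whichever page loads next flushes the queue.
        function readQueue() {
          var m = document.cookie.match(/(?:^|; )event_queue=([^;]*)/);
          return m ? JSON.parse(decodeURIComponent(m[1])) : [];
        }
        function writeQueue(q) {
          document.cookie = 'event_queue=' +
            encodeURIComponent(JSON.stringify(q)) + '; path=/';
        }
        function queueEvent(evt) {   // called instead of sending immediately
          var q = readQueue();
          q.push(evt);
          writeQueue(q);
        }
        function flushQueue() {
          var q = readQueue();
          if (q.length) {
            sendToTrackingServer(q); // hypothetical helper (image beacon or XHR)
            writeQueue([]);
          }
        }
        flushQueue();                  // immediately on each fresh page view
        setInterval(flushQueue, 5000); // and every 5 seconds while idle on a page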

    Hope you enjoy!
    22 comments |   Oct 14 2012 8:47pm
    History of Spy (also, ZeroMQ rocks)

    This post is written by Alexander, hacker extraordinaire, who rewrote the Spy backend from scratch. Next time you're in Portland, kindly buy him a beer.

    Clicky is almost 6 years old, which made it one of the first real-time web analytics platforms. Spy is an important part of that offering. Spy allows you to glimpse important information about active visitors on your website as that information flows out of their browsers and into our system.

    Spy has been a part of Clicky since the very beginning and has always been one of the most popular features. The name and functionality were both heavily influenced by "Digg Spy". Sean, our lead developer, says that when he saw Digg Spy circa 2005, he thought to himself "how sweet would it be if you had that kind of real time data stream, but for your own web site?" That was in fact one of his motivations to create Clicky in the first place.

    Since 2006, we've grown by leaps and bounds, and many parts of our service have had to absorb the shocks of scale. Recent tweets about our major data center move and infrastructure changes confirm this, but much more is afoot at a deeper level than meets the eye.

    Spy has gone through multiple complete rewrites to get to where it is today. We thought some of you might find it interesting.


    Initial implementation

    The first version of Spy was drafted in a few days. Back then, all infrastructure components for Clicky were located on a single machine (yikes).

    Incoming tracking data was simply copied to local storage in a special format, and retrieved anew when users viewed their Spy page. It was extremely simple, as most things are at small scale, but it worked.




    Spy, evolved

    By mid-2008, we had moved to a multi-server model: one web server, one tracking server, and multiple database servers. But the one tracking server was no longer cutting it. We needed to load balance the tracking, which meant Spy needed to change to support that.

    When load balancing was completed, any incoming tracking hit was sent to a random tracking server. This meant that each dedicated tracking server came to store a subset of Spy data. Spy, in turn, had to open multiple sockets per request to extract and aggregate viewing data. Given X tracking servers, Spy had to open X sockets, and perform X^2 joins and sorts before emitting the usage data to the user's web browser.

    Initially the Spy data files were served via Apache from all of the tracking servers, but that tied up resources needed for tracking. Soon after this setup, our first employee (no longer with us - hi Andy!) wrote a tiny python script that ran as a daemon on each tracking server and took over serving the Spy files to the PHP process on the web server that requested them. This helped free up precious HTTP processes for tracking, but a lot of resources were still being wasted because of the storage method we were using.

    Tracking servers had no mechanism for granularly controlling the memory consumed by their Spy data. Instead, each tracking server had a cron job that would indiscriminately trim its respective Spy data. Sometimes, however, the data would grow out of control between cron runs, so trims also happened randomly when users viewed their Spy page. These competing mechanisms and their implementations occasionally opened the door to data ordering issues.

    The number of websites we tracked continued rising, and we made several changes to this implementation to keep resource usage in check. Among them was the move to writing Spy data to shared memory (/dev/shm) instead of the hard drive. This helped a great deal initially, but as time wore on, it became clear that this was just not going to scale much further without a complete rethinking of Spy.




    Trimming the fat

    In the end, we decided to reimplement Spy's backend from the ground up. We devised a list of gripes that we had with Spy, and it looked like this:

    - N tracking servers meant N sockets opened/closed on every request for data
    - Data had to be further aggregated and then sorted on every request
    - Full list of Spy data for a site was transmitted on every request, which consumed massive bandwidth (over 5MB/sec per tracking server on internal network) and required lots of post-processing by PHP

    Our goals looked like this:

    - Improve throughput
    - Improve efficiency
    - Reduce network traffic
    - Reduce per-request and overall resource usage


    Enter Spyy

    The modern version of Spy, called "Spyy" to fit in with our internal naming conventions, is implemented in two custom daemons written in C. The first runs on each tracking server and dumbly forwards incoming Spy data to the second daemon, the Spy master. Data travels over a persistent socket established by ZeroMQ.

    The Spy master daemon stores the information efficiently for each site, using a circular buffer whose size is determined, and periodically reevaluated, by the site's traffic rate. This removes the need to manually trim a site's data, as we had been doing for years. Once the buffer for a site is full, old data is automatically purged as new data arrives. And when we read data from the new daemon, it gives us only the data we need (basically, everything since a given timestamp) instead of "all data for site X", drastically reducing network bandwidth and the post-processing PHP has to do.
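    The real daemon is written in C, but the core of the idea fits in a few lines of javascript-flavored pseudocode (an illustration, not the actual implementation): a fixed-size ring that overwrites its oldest entries, plus a read that only returns entries newer than the caller's last-seen timestamp.

        // Illustration of the per-site ring buffer idea - the real daemon is
        // written in C; this sketch is not the actual implementation.
        function SpyBuffer(capacity) {  // capacity derived from the site's traffic rate
          this.items = new Array(capacity);
          this.next = 0;                // index of the slot to overwrite next
        }
        SpyBuffer.prototype.push = function (item) {
          this.items[this.next] = item; // the oldest entry is silently overwritten
          this.next = (this.next + 1) % this.items.length;
        };
        // Return only entries newer than the caller's last-seen timestamp,
        // instead of "all data for the site".
        SpyBuffer.prototype.readSince = function (ts) {
          return this.items.filter(function (i) { return i && i.time > ts; });
        };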

    Like the previous version of Spy, all data is stored in RAM to ensure peak performance. Because of this, however, the data is still volatile. We have other mechanisms for permanently storing tracking data, and Spy data excludes much information from what is permanently stored. This means that when we implement new features or apply bugfixes to Spyy, it must be killed and started anew.

    To mitigate this blip in the availability of data, we have implemented a rudimentary transfer mechanism between the existing Spy program and a new one coming online to take its place. This mechanism also uses ZeroMQ and, basically, drains datums from the existing process to the new process. At completion, the old process shuts itself down and the new process claims its occupied network interfaces.




    Conspycuous benefits

    We met our goals, and then some. The overall project took about 8 weeks.

    Benchmarks at the start of the project demonstrated the old Spy implementation barely able to keep up at peak load, tapping out at about 5,000 reqs/sec. In comparison, the new implementation can handle upwards of 50,000 reqs/sec on the same hardware.

    In the old architecture, read performance decreased as the number of tracking servers increased. In the new architecture, read performance is unaffected by the number of tracking servers, and with N tracking servers it is roughly 2N times better than the old architecture. Write performance is constant in both cases.

    RAM usage for storing Spy data was approximately 4GB per tracking server under the old architecture. At the time we had 5 tracking servers which meant 20GB total (we have 7 tracking servers now). New Spy, on the other hand, only uses up 6GB total RAM on a single virtual machine, and it takes up so little CPU power that the same server hardware also hosts one tracking server and four database servers without issue.

    Bandwidth wise, the old Spy used over 20MB/sec of combined bandwidth for read requests from the tracking servers. New Spy? About 500KB/sec on average, reducing network footprint to barely 1% of what it was before.

    In the event of a Spy master server outage, our tracking servers simply drop Spyy-bound packets automatically and continue persisting other data to disk, with absolutely no performance impact.

    Because of the way that ZeroMQ is implemented, we can scale this architecture very readily and rapidly. It removed a lot of application complexity and let us focus on the implementation. With ZeroMQ, business logic itself drives the network topology.

    Additionally, because of ZeroMQ, we can easily segment sites out onto different Spy "masters" with little change to the rest of Clicky, should the need or desire arise. In fact, we already do this because of some legacy white label customers (which we, of course, thank for the challenge provided by their existence in this implementation).

    As stated above, redundancy is not a goal of ours for Spy, because of the volatile and transient nature of its data. But if we ever change our mind, we can simply set up a ZeroMQ device between the tracking servers and Spy master, and have each datum be split off to N master servers.

    Overall, we are extremely happy with the improvements made. We are also very impressed with ZeroMQ. It has been getting quite a bit of hype recently, and in our opinion it lives up to it.
    8 comments |   Sep 20 2012 2:13pm

    Jigsaw company API integration is dead and we're looking for alternatives

    About two weeks ago, our integration with Jigsaw broke. Jigsaw is (was) a crowdsourced database of information on businesses, mainly US ones but some in Europe and other places too. It let us provide you with information like this.

    We looked into it and discovered that our API key had been revoked. We did some digging and realized that Jigsaw had been bought out by Salesforce last year, and Salesforce has decided to kill free access to this API, which apparently became effective around September 1.

    The API is still available, if you're willing to pay. We are definitely willing to pay for this kind of data, but the price is $25,000/year. That works out to just over $2,000/month, which would make it our biggest monthly expense, other than payroll. Sorry, can't justify that.

    For now, we have modified the links that pulled in this data to instead open a Google search page with the organization name pre-filled. In some ways this is actually better, because Google will almost always find the company in question regardless of physical location, whereas with Jigsaw I only had maybe a 50% success rate when looking up info on a company. With Google you'll have to do a bit of work on your end to find the details that we were previously providing, but at least it's something.

    Alternatives?

    If you know of an alternative service that is reasonably priced and includes an API, please let us know. We haven't found anything worthwhile as of yet. A lot of these services seem more geared towards providing you with "leads" at these companies, rather than just information about the companies themselves, which is not what we're really interested in at the moment.
    11 comments |   Sep 17 2012 2:27pm
