On-site analytics!

We've got two great new features launching today, after nearly 4 months of development: heatmaps and on-site analytics.

On-site analytics is a new feature that automatically embeds a widget in the bottom right corner of your web site, showing information about the visitors who are on your site right now. Don't worry, only you (the site owner) can see it - but if you want to disable it, you can do so on your user preferences page.

This feature is available to all Pro or higher customers. Need to upgrade?

Before we get into the details, a few important notes for this to work properly:
- This requires the latest version of our tracking code, so clear your cache!
- Third party cookies must be enabled (or at least, have Clicky whitelisted)
- You need to check the "remember me" box when you log in to Clicky, so that a cookie is set to remember who you are.

There are three components to on-site analytics:

Visitors online

The "on site" number shows you how many visitors are currently on your web site. Click on it to view a list of said visitors! Up to 8 visitors are shown at a time, and you can page through them with the "next page" link in the top right corner. Click on the username (if you're using custom data tracking) or IP address to view their session on Clicky:

[Screenshot: the list of visitors currently on your site]

You can also view several summary reports about the visitors online, such as top searches, referring domains, traffic sources, goals, and more.

Visitors on this page

The "on page" number shows how many visitors are currently viewing the same page that you are. Clicking on it shows the exact same type of reports as the global "on site" report, except limited to visitors currently on that page.

The "on site" and "on page" numbers update automatically while you are idle on a single page of your web site: once per minute for the first 5 minutes, then once every 5 minutes. After an hour, updating stops.
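
For the curious, the schedule looks roughly like this. This is just an illustrative sketch of the backoff logic, not our actual widget code (refreshCounts is a stand-in for whatever re-fetches the numbers):

    // Poll every minute for the first 5 minutes, then every 5 minutes,
    // and stop entirely once an hour has passed on the same page.
    function scheduleUpdates(refreshCounts: () => void): void {
      const start = Date.now();
      const tick = () => {
        const elapsed = Date.now() - start;
        if (elapsed >= 60 * 60 * 1000) return;        // after an hour, stop
        refreshCounts();
        const interval = elapsed < 5 * 60 * 1000
          ? 60 * 1000                                 // first 5 minutes: every minute
          : 5 * 60 * 1000;                            // afterwards: every 5 minutes
        setTimeout(tick, interval);
      };
      setTimeout(tick, 60 * 1000);                    // first refresh one minute in
    }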

Heatmaps

If you have enabled heatmap tracking in your site preferences, and there is heatmap data in the last 7 days for the page you are currently viewing, you will see the heatmap icon on the right hand side (large colorful pixels). Click that to view a heatmap report for the page you're viewing for the current date, along with segmentation options for the heatmaps.

There's a lot to talk about with heatmaps, so we wrote a separate post for that. Read about heatmaps here.


Other notes/features


  • Clicking on the "Clicky Web Analytics" link will open a new tab/window on getclicky.com with your site's dashboard.

  • On-site analytics requires jQuery. We automatically check if your site already has it, and if not, we side-load it from our CDN - after which we call jQuery.noConflict() to ensure we don't interfere with any other libraries on your site that use the $() shortcut. For all features to work, we require at least version 1.6, which is almost 30 months old - if you're on a version older than that, please upgrade ;) (A rough sketch of this loading logic follows this list.)

  • To authenticate your access to on-site analytics, we had to cache your user cookie on the tracking servers. While we were at it, we went ahead and enabled ignoring your traffic to your own web site automatically, without you having to set an IP filter / filter cookie. If you need to test something on your own web site and see it in Clicky, simply log out of Clicky or use a different web browser. We know this will be a PITA for some of you, but the majority of users want to ignore their own traffic, so this makes sense to implement as a convenience feature.

  • Calls to clicky.log() are now stored in a cookie queue (unless you have disabled cookies), to help ensure these calls aren't missed. Previously we recommended calling clicky.pause() manually after any clicky.log() or clicky.goal() event that resulted in a new page view, as otherwise the call would almost never be logged. The new queue system is designed to fix this: as long as the visitor's next page view is still on your own web site, our tracking code will see the cookie, process the queued calls, and send them to our tracking servers. The queue is processed every 5 seconds while idle on a page, and immediately upon a fresh page view. (A rough sketch of the queue idea also follows this list.)
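
Regarding the jQuery note above, here's roughly the shape of that logic. This is a simplified sketch in TypeScript, not our actual tracking code, and the CDN URL is a placeholder:

    // Use the site's own jQuery if it's at least version 1.6; otherwise load a
    // copy and call noConflict() so the $() shortcut is handed back to whatever
    // other library was using it.
    function versionAtLeast(version: string, minMajor: number, minMinor: number): boolean {
      const [major, minor] = version.split(".").map(Number);
      return major > minMajor || (major === minMajor && minor >= minMinor);
    }

    function ensureJquery(onReady: (jq: any) => void): void {
      const existing = (window as any).jQuery;
      if (existing && versionAtLeast(existing.fn.jquery, 1, 6)) {
        onReady(existing);
        return;
      }
      const script = document.createElement("script");
      script.src = "https://cdn.example.com/jquery.min.js"; // placeholder, not our CDN
      script.onload = () => onReady((window as any).jQuery.noConflict());
      document.head.appendChild(script);
    }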
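
And here's the rough idea behind the new clicky.log() queue - again a simplified sketch, not the real tracking code (the cookie name and beacon endpoint are made up):

    // Cookie-backed event queue: events logged right before a navigation are
    // written to a cookie so they survive until the next page view on the same
    // site, where the queue is flushed to the tracking servers.
    const QUEUE_COOKIE = "example_event_queue";

    function readQueue(): string[] {
      const match = document.cookie.match(new RegExp(QUEUE_COOKIE + "=([^;]*)"));
      return match ? JSON.parse(decodeURIComponent(match[1])) : [];
    }

    function writeQueue(events: string[]): void {
      document.cookie =
        QUEUE_COOKIE + "=" + encodeURIComponent(JSON.stringify(events)) + "; path=/";
    }

    function queueEvent(href: string): void {
      writeQueue([...readQueue(), href]); // survives navigation to the next page
    }

    function flushQueue(): void {
      const events = readQueue();
      if (events.length === 0) return;
      writeQueue([]); // clear first so nothing is sent twice
      for (const href of events) {
        new Image().src = "/track?href=" + encodeURIComponent(href); // placeholder beacon
      }
    }

    // Flush immediately on a fresh page view, then every 5 seconds while idle.
    flushQueue();
    setInterval(flushQueue, 5000);
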

    Hope you enjoy!
    Oct 14 2012 8:47pm

    History of Spy (also, ZeroMQ rocks)

    This post is written by Alexander, hacker extraordinaire, who rewrote the Spy backend from scratch. Next time you're in Portland, kindly buy him a beer.

    Clicky is almost 6 years old, which made it one of the first real-time web analytics platforms. Spy is an important part of that offering. Spy lets you glimpse important information about active visitors on your website as that information flows out of your visitors' browsers and into our system.

    Spy has been a part of Clicky since the very beginning and has always been one of the most popular features. The name and functionality were both heavily influenced by "Digg Spy". Sean, our lead developer, says that when he saw Digg Spy circa 2005, he thought to himself "how sweet would it be if you had that kind of real time data stream, but for your own web site?" That was in fact one of his motivations to create Clicky in the first place.

    Since 2006, we've grown by leaps and bounds, and many parts of our service have had to absorb the shocks of scale. Recent tweets about our major data center move and infrastructure changes confirm this, but much more is afoot at a deeper level than meets the eye.

    Spy has gone through multiple complete rewrites to get to where it is today. We thought some of you might find it interesting.


    Initial implementation

    The first version of Spy was drafted in a few days. Back then, all infrastructure components for Clicky were located on a single machine (yikes).

    Incoming tracking data was simply copied to local storage in a special format, and retrieved anew when users viewed their Spy page. It was extremely simple, as most things are at small scale, but it worked.


    Spy, evolved

    By mid-2008, we had moved to a multi-server model: one web server, one tracking server, and multiple database servers. But the one tracking server was no longer cutting it. We needed to load balance the tracking, which meant Spy needed to change to support that.

    When load balancing was completed, any incoming tracking hit was sent to a random tracking server. This meant that each dedicated tracking server came to store a subset of Spy data. Spy, in turn, had to open multiple sockets per request to extract and aggregate viewing data. Given X tracking servers, Spy had to open X sockets, and perform X^2 joins and sorts before emitting the usage data to the user's web browser.

    Initially the Spy data files were served via Apache from all of the tracking servers, but that tied up resources that were needed for tracking. Soon after this setup, our first employee (no longer with us) (Hi Andy!) wrote a tiny python script that ran as a daemon on each tracking server and took over serving the Spy files to the PHP process on the web server that requested them. This helped free up precious HTTP processes for tracking, but a lot of resources were still being wasted because of the storage method we were using.

    Tracking servers had no mechanism for granularly controlling the memory consumed by their Spy data. Therefore, each tracking server had a cron job that would indiscriminately trim its respective Spy data. Sometimes, however, this wasn't enough to keep memory under control, so trims also happened randomly when users viewed their Spy page. These competing mechanisms occasionally caused consistency and ordering issues in the data.

    The number of websites we tracked continued rising, and we made several changes to this implementation to keep resource usage in check. Among them was the move to writing Spy data to shared memory (/dev/shm) instead of the hard drive. This helped a great deal initially, but as time wore on, it became clear that this was just not going to scale much further without a complete rethinking of Spy.

    Trimming the fat

    In the end, we decided to reimplement Spy's backend from the ground up. We devised a list of gripes that we had with Spy, and it looked like this:

    - N tracking servers meant N sockets opened/closed on every request for data
    - Data had to be further aggregated and then sorted on every request
    - The full list of Spy data for a site was transmitted on every request, which consumed massive bandwidth (over 5MB/sec per tracking server on the internal network) and required lots of post-processing by PHP

    Our goals looked like this:

    - Improve throughput
    - Improve efficiency
    - Reduce network traffic
    - Reduce per-request and overall resource usage


    Enter Spyy

    The modern version of Spy, called "Spyy" to fit in with our internal naming conventions, is implemented in two custom daemons written in C. The first runs on each tracking server and dumbly forwards incoming Spy data to the second daemon, the Spy master. Data travels over a persistent socket established by ZeroMQ.
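
    The real daemons are written in C, but the shape of the topology is easy to sketch. Here's an illustrative TypeScript version using the zeromq package - the hostnames, ports, and message format below are assumptions for the sake of the example, not our actual configuration:

        import * as zmq from "zeromq";

        // Sketch of the forward/collect topology: each tracking server runs a thin
        // forwarder that pushes incoming Spy data to the master over a persistent
        // ZeroMQ socket; the master pulls from all of them.
        async function runForwarder(): Promise<void> {
          const push = new zmq.Push();
          push.connect("tcp://spy-master.internal:5555"); // hostname/port are made up
          // In the real daemon this is fed continuously by the tracking pipeline.
          await push.send(JSON.stringify({ site: 32020, ts: Date.now(), action: "/" }));
        }

        async function runMaster(): Promise<void> {
          const pull = new zmq.Pull();
          await pull.bind("tcp://*:5555");
          for await (const [frame] of pull) {
            const datum = JSON.parse(frame.toString());
            // append `datum` to the ring buffer for datum.site (see the next sketch)
          }
        }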

    The Spy master daemon stores the information efficiently for each site, using a circular buffer whose size is determined, and periodically reevaluated, based on the site's traffic rate. This removes the need to manually trim a site's data, as we had been doing for years. Once a site's buffer is full, old data is automatically purged as new data arrives. When we read data from the new daemon, it gives us only the data we need (basically, "everything since timestamp X") instead of "all data for site X", drastically reducing network bandwidth and the post-processing PHP has to do.
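
    Conceptually, the per-site storage looks something like this - a plain TypeScript sketch of the idea, not the real C implementation (which also resizes buffers dynamically based on traffic):

        interface SpyDatum {
          ts: number;      // unix timestamp (ms)
          payload: string; // whatever the hit carried
        }

        // Fixed-capacity buffer: once full, each new datum pushes out the oldest.
        class SiteBuffer {
          private items: SpyDatum[] = [];
          constructor(private capacity: number) {}

          push(datum: SpyDatum): void {
            this.items.push(datum);
            if (this.items.length > this.capacity) this.items.shift(); // drop oldest
          }

          // Return only what the caller hasn't seen yet ("since timestamp X")
          // instead of the whole buffer, which keeps read traffic small.
          readSince(ts: number): SpyDatum[] {
            return this.items.filter((d) => d.ts > ts);
          }
        }

        // One buffer per site; capacity would be derived from the site's traffic rate.
        const buffers = new Map<number, SiteBuffer>();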

    Like the previous version of Spy, all data is stored in RAM to ensure peak performance. Because of this, however, the data is still volatile. We have other mechanisms for permanently storing tracking data, and Spy only keeps a subset of what is permanently stored. This means that when we implement new features or apply bugfixes to Spyy, it must be killed and started anew.

    To mitigate this blip in the availability of data, we have implemented a rudimentary transfer mechanism between the existing Spy process and the new one coming online to take its place. This mechanism also uses ZeroMQ and, basically, drains datums from the existing process to the new process. At completion, the old process shuts itself down and the new process takes over the network interfaces it had been occupying.
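
    The handover can be sketched the same way: the outgoing process drains everything it holds to the incoming one over a temporary ZeroMQ socket, then exits so the new process can claim the public endpoint. Again, this is illustrative TypeScript, not our C code, and the endpoint and "DONE" sentinel are made up:

        import * as zmq from "zeromq";

        // Outgoing master: push every stored datum to the new process, send a
        // sentinel so it knows the transfer is complete, then step aside.
        async function drainTo(drainAddr: string, everything: string[]): Promise<void> {
          const push = new zmq.Push();
          push.connect(drainAddr);
          for (const datum of everything) await push.send(datum);
          await push.send("DONE");
          process.exit(0);
        }

        // Incoming master: collect the drained data, then take over serving.
        async function receiveDrain(drainAddr: string): Promise<string[]> {
          const pull = new zmq.Pull();
          await pull.bind(drainAddr);
          const drained: string[] = [];
          for await (const [frame] of pull) {
            const msg = frame.toString();
            if (msg === "DONE") break; // safe to claim the public endpoint now
            drained.push(msg);
          }
          return drained;
        }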

    Conspycuous benefits

    We met our goals, and then some. The overall project took about 8 weeks.

    Benchmarks at the start of the project demonstrated the old Spy implementation barely able to keep up at peak load, tapping out at about 5,000 reqs/sec. In comparison, the new implementation can handle upwards of 50,000 reqs/sec on the same hardware.

    In the old architecture, read performance decreased as the number of tracking servers increased. In the new architecture, read performance is unaffected by the number of tracking servers, and with N tracking servers it is roughly 2N times better than the old architecture. Write performance in both cases is constant.

    RAM usage for storing Spy data was approximately 4GB per tracking server under the old architecture. At the time we had 5 tracking servers which meant 20GB total (we have 7 tracking servers now). New Spy, on the other hand, only uses up 6GB total RAM on a single virtual machine, and it takes up so little CPU power that the same server hardware also hosts one tracking server and four database servers without issue.

    Bandwidth wise, the old Spy used over 20MB/sec of combined bandwidth for read requests from the tracking servers. New Spy? About 500KB/sec on average, reducing network footprint to barely 1% of what it was before.

    In the event of a Spy master server outage, our tracking servers simply drop Spyy-bound packets automatically and continue persisting other data to disk, with absolutely no performance impact.

    Because of the way that ZeroMQ is implemented, we can scale this architecture very readily and rapidly. It removed a lot of application complexity and let us focus on the implementation. With ZeroMQ, business logic itself drives the network topology.

    Additionally, because of ZeroMQ, we can easily segment sites out onto different Spy "masters" with little change to the rest of Clicky, should the need or desire arise. In fact, we already do this because of some legacy white label customers (whom we, of course, thank for the extra challenge their existence provided to this implementation).

    As stated above, redundancy is not a goal of ours for Spy, because of the volatile and transient nature of its data. But if we ever change our mind, we can simply set up a ZeroMQ device between the tracking servers and Spy master, and have each datum be split off to N master servers.

    Overall, we are extremely happy with the improvements made. We are also very impressed with ZeroMQ. It has been getting quite a bit of hype recently, and in our opinion it lives up to it.
    Sep 20 2012 2:13pm

    Jigsaw company API integration is dead and we're looking for alternatives

    About two weeks ago, our integration with Jigsaw broke. Jigsaw is (was) a crowdsourced database of information on businesses, mainly US ones but some in Europe and other places too. It let us show you details about the organizations your visitors came from.

    We looked into it and discovered that our API key had been revoked. We did some digging and realized that Jigsaw had been bought out by Salesforce last year, and Salesforce has decided to kill free access to this API, apparently effective around September 1.

    The API is still available, if you're willing to pay. We are definitely willing to pay for this kind of data, but the price is $25,000/year. That works out to just over $2,000/month, which would make it our biggest monthly expense, other than payroll. Sorry, can't justify that.

    For now, we have modified the links that pulled in this data to instead open a Google search page with the organization name pre-filled. In some ways this is actually better, because Google will almost always find the company in question regardless of physical location, whereas with Jigsaw, looking up info on a company only succeeded maybe 50% of the time. With Google you'll have to do a bit of work on your end to find the details we were previously providing, but at least it's something.

    Alternatives?

    If you know of an alternative service that is reasonably priced and includes an API, please let us know. We haven't found anything worthwhile as of yet. A lot of these services seem more geared towards providing you with "leads" at these companies, rather than just information about the companies themselves, which is not what we're really interested in at the moment.
    Sep 17 2012 2:27pm

    We're moving!

    This Saturday, August 4, from approximately 2pm to 5pm PDT (GMT -0700), our web site will be offline while we move our servers to a new data center in downtown Portland.

    We have carefully planned this over the last 3 weeks to ensure that tracking will still be online during this time (no data will be lost and there will be no impact on your site's performance), and that the move itself will be as fast as humanly possible. The new data center is already pre-railed and pre-wired with power and ethernet, so de-racking and re-racking will be extremely fast. So why will it take ~3 hours? Well, the old data center is about 90 miles away... :(

    New machines are already set up at the new data center to support tracking during the move. When the database servers get plugged in there, they will automatically start parsing the ~3 hour backlog of traffic that will be waiting for them on the tracking servers. It will take a good 3-6 hours from that point for all servers to catch back up with real time.

    This is something we've wanted to do for a while, but as we grew to over 50 physical servers, it became unfeasible. However, thanks to the full virtualization we completed in June after many months of work, we are down to just 11 physical machines! Suddenly this dream became a real possibility, so we jumped at the opportunity to make it happen before we needed to add any more hardware to the rack.

    To say we're excited would be the understatement of the year. We've been with the same host for 6 years as we've grown to enormous bandwidth, so they've had to grow with us, and that has been the cause of most of our major outages. The new data center is enterprise class with internet connectivity across 8 unique providers, so connectivity problems should be near zero. Portland is a much more major hub than where we were before, so connectivity should also be significantly faster, especially for those of you outside the US. Last, being 15 minutes away from our data center instead of 90 will be a very welcome change when we need to take a trip there.

    A lot of time has been spent over the last 4 months on backend/sysadmin work like this, which has interrupted our regular flow of feature releases. We'll be back to that very soon, don't worry.
    Jul 30 2012 12:44pm

    New custom data report

    If you log custom data with Clicky, you're probably going to like this new set of features. If you're not logging custom data, you should - it's one of our best features.

    Up until yesterday, for our own reports on getclicky.com we had only been logging the usernames of those of you logged in to our site. This adds a lot of personality to the visitor reports. But a lot of you log many more types of data, such as shopping cart information, account status, things like that. I've been wanting to add summary reports for this custom data for quite a while, not only for you, but also because there were other types of data we were interested in seeing about who is using our service on a day to day basis. There wasn't much point before, though, since there was no way to see a summary of it. But now there is!

    If you log custom data, you will see a new item in the main tabs when viewing your site. And if you don't log custom data, this item doesn't show up. Here's what the main report looks like:

    [Screenshot: the main custom data report]

    As you can see, when someone is logged in to Clicky, we're now also tracking what type of account they have and how long they've been a member. And we added support for attaching goals to this data, so we can see what goals the different types of accounts are completing, as well as their revenue (hidden here).

    What I really love is that the sub-tabs for this Custom report are dynamically generated based on the different types of custom data you've been logging to your site. So you can click on any of those sub-tabs to see a report for just that family of data, or you can just click the 'more...' link at the bottom of any family, just like other family-style reports (browsers etc - speaking of which, some of those were broken because of horrible code, and have now been fixed as well as optimized to generate faster). (Note: internally, data types that have "parents" and "children" are called "families", in case you are confused.)

    You can click any of the items in this report to immediately see all visitors with that custom data attached to them, and you can also click any "parent", for example "account type", to see all visitors who have any "account type" data attached to them, no matter what its value is.

    Not done yet!

    The new goal report we released about 4 months ago has been a big hit. We thought it would be pretty great to see custom data in this report too, so we added it: (Screenshot has been slightly modified to make it smaller)

    [Screenshot: custom data in the goal report]

    We also created a dashboard module for custom data, and the sub-tabs are dynamically generated just like they are for the main custom report:

    [Screenshot: the custom data dashboard module]

    Last but not least, all of these items are graphable too. Just click the trend percentage next to any custom data in any report (main report, dashboard, goal report) to see its history over time. Of course, we've only been logging this data for our stats for about 24 hours so it's not terribly exciting yet:

    [Screenshot: graphing custom data over time]

    If you use custom data with Clicky, we think this will be a nice addition. And if you're not using it yet, you really ought to look into it. Full documentation is here.
    Jun 21 2012 1:35pm
