Is there potential for “ethical analytics”?


We have Piwik analytics on the site. But I use Ghostery so I actually block analytics like this for my own use.

As someone who is a big web fan, and used to work in client services, I understand the value of particular types of analytics for simple sites like ours:

  • Number of hits (required to understand popular/unpopular content and make financial decisions for hosting etc.)
  • Browser/browser size/operating system (required to know how to optimise for visitors)
  • Language spoken (required to work out how to best optimise for different languages)
  • Referring links (to track who is saying what about your site)

BUT, I think there are unnecessary metrics that can be on the invasive side for simple sites:

  • Internet Service Provider
  • Tracking of individuals

And some metrics I find misleading, as they’re based on pattern-matching and guessing:

  • Engagement time
  • Gender/Income Level/Age/Interests

Some of these are grey areas when it comes to tracking web apps, but I’d be interested to know where you all see boundaries? Are all analytics bad analytics? Is there room for “ethical analytics” that only tracks anonymous data with limited uses? Would there be a business in that?


This is a great idea (and question).

My interest is in how you define “tracking of individuals”: is this still invasive if it’s just an anonymised fingerprint? I mean, if you’re building any kind of ecommerce site then you’ll be interested in funnel progression, which means you need a way of identifying the same user. Storing their actual details (name, email, ID etc.) would be invasive IMO, but would storing some sort of unique GUID (based on IP/browser?) still qualify as invasive?
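To make the question concrete, here is a minimal sketch of the kind of identifier described above. It is only an illustration under stated assumptions: the function name `visitor_id`, the `SERVER_SECRET` value and the daily rotation are all hypothetical choices, not something any particular analytics product is known to do. Note that an ID like this is pseudonymous rather than anonymous, since anyone holding the secret can recompute it for a known IP and browser.

```python
import hashlib
import hmac
import secrets
from datetime import date

# Hypothetical secret kept only on the server. In practice it would be
# loaded from configuration rather than generated at import time.
SERVER_SECRET = secrets.token_bytes(32)


def visitor_id(ip: str, user_agent: str, day: date,
               secret: bytes = SERVER_SECRET) -> str:
    """Return a pseudonymous ID that is stable within a single day only."""
    # Mixing the date into the key gives the same visitor a new ID each day,
    # so funnels can be followed within a day but not across days.
    daily_key = hmac.new(secret, day.isoformat().encode(), hashlib.sha256).digest()
    message = f"{ip}|{user_agent}".encode()
    # Only a truncated keyed hash is returned; the IP itself is never stored.
    return hmac.new(daily_key, message, hashlib.sha256).hexdigest()[:16]
```

Within one day the same IP and browser map to the same ID, which is enough for funnel progression; the next day the ID changes, which limits long-term tracking of an individual.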

Laura’s Diary — Thursday, 30th July, 2015

Could you do this in a way that was stored in isolation (not accessible to any third parties)? Would that make it so bad?

What if a system could be designed so that even if you stored data that was potentially valuable if exposed/combined, it would be impossible to do so? I see this as being a kind of service that could carry an acceptance mark with it. Like knowing you’re interacting with a site over SSL, you’d know your data was safer than with everyday analytics…


I like Piwik and Sendy analytics, but for me the main point should be letting a creator know that someone cared, and helping them improve content for that audience. I use Ghostery too and whitelist sites where I feel happy to, purely because I want to tell the content creator that I’ve visited. If I sign up to a Sendy email, I would like the sender to be able to know whether I looked at the email and what content I liked (i.e. what I clicked). Anything beyond that normally becomes bad news.

I don’t think I give up privacy by showing my support via an open or a visit, but most of the web is not like that so I block.

For example, I see Aral has worked hard to make Sendy more private, but I’d like you to know I opened the email and what I clicked, as this will help build a better newsletter. Now, do you need to know it was me? Maybe not, but then again gender might play a role: maybe you want to increase the number of female readers, and some stats might help… Sure, they become pigeonholes, and there are better ways to do this, but… it gets complicated. Something like GoldieBlox was not made on a whim: they talked to people and so, with permission first, gained the insight to encourage girl engineers. Many things to consider.


Orde Saunders has just pointed me to Piwik’s Log Analytics, where you can import server logs and run the same analytics as you otherwise would, without needing the JavaScript tracker.

I have split feelings about this. YAY, one less JS tracker, and it supports Do Not Track and has IP anonymisation. BUT isn’t it really just a hidden tracker? At least you can see and block the JS trackers. Are server-side logs more restricted than client-side?
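For what it’s worth, the two safeguards mentioned here, IP anonymisation and respecting Do Not Track, are simple to sketch. The following is only an illustration, not Piwik’s implementation: the function names are made up, keeping a /24 prefix for IPv4 and a /48 prefix for IPv6 is an assumption, and it assumes the request headers are available as a plain dictionary (standard access log formats do not record the DNT header unless the server is configured to log it).

```python
import ipaddress


def anonymise_ip(ip: str) -> str:
    """Zero the host part of an address so it no longer points at one visitor."""
    addr = ipaddress.ip_address(ip)
    # Assumed prefix lengths: keep the first 24 bits of IPv4, 48 bits of IPv6.
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def should_record(headers: dict) -> bool:
    """Return False when the visitor sent the header 'DNT: 1'."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("dnt") != "1"
```

For example, `anonymise_ip("203.0.113.77")` gives `"203.0.113.0"`, and `should_record({"DNT": "1"})` gives `False`. Whether a truncated address is anonymous enough is a separate question, since it still narrows a visitor down to a small network.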

If any of you read last week’s roundup, you might’ve seen I linked to some thoughts by Charles Arthur on an “Adblocking Revolution.” I wonder if the natural response to this would be sites utilising (hidden) server-side logs…?


Yes, they are. This is also mentioned on the Piwik website.


So here’s an interesting take: The Intercept have created a custom system with to “anonymise” their analytics data:

I wonder how effective it will be for privacy (I imagine that it’s pretty good as it’s The Intercept, or is that naïve?). And I’ve asked on Twitter if they will allow other sites to follow suit.




I didn’t know there were so many trackers in email. Thanks for sharing @adamprocter

This again shows how urgently we need to start using indie tech and connecting to one another above the internet as individual nodes and not as exploitable tiny assets.


This is a really interesting discussion and one that is very important if we are to follow ethical design.

I am just in the process of launching a social enterprise startup and want to follow ethical design principles. Having analytics is crucial for us to understand our users’ behaviour so we can constantly improve and optimise the experience and delight we want to deliver. If I am tracking users via Mixpanel (for instance) and there is a UUID attached but no linkability to PII, is this still aligned with the principles?


Short (and unsatisfying) answer: no. But that’s based on Mixpanel and similar services. Here are my two reasons:

  1. Analytics services where your data is hosted on their servers or pinged back to their servers (anything that isn’t self-hosted) are probably monetising that data in some way. At worst, they already are; at best, they have the ability to do so in the future.

  2. Sharing personally identifiable information (PII) with a third party infringes on your visitors’ privacy (whether that be via analytics or otherwise) and is possibly a security risk. Any data described as non-PII is likely not truly non-PII.

I think if you’re self-hosting your analytics (using something like Piwik) then you’re one step better, but you still need to ensure you’re not collecting information you don’t absolutely need.

As a brief update on my initial post: I am no longer using Ghostery, as it has proved that its real customers are the ad industry. (I use Better, obviously!) And we took Piwik off the site, as the analytics were not providing us with any particular value. We could still technically get some visitor stats from server logs, but we haven’t done so yet.
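As a rough sketch of what getting visitor stats from server logs could involve: the example below assumes the server writes the widely used “combined” access log format, and `LOG_LINE` and `summarise` are hypothetical names chosen for illustration. It deliberately keeps only aggregate counts (hits per path and referring hosts) and never stores the IP address or user agent of any request.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed input: one request per line in the "combined" access log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarise(lines):
    """Count hits per path and referring hosts, discarding per-visitor data."""
    hits, referrers = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        hits[match["path"]] += 1
        # A missing referrer is logged as "-", which has no hostname.
        host = urlsplit(match["referrer"]).hostname
        if host:
            referrers[host] += 1
    return hits, referrers
```

Calling `summarise(open("access.log"))` (the file name is just an example) would return two counters that cover the “number of hits” and “referring links” metrics from the first post without recording anything about individuals.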


Thanks for the reply Laura. I get that minimal data should be collected.

Self-hosted Piwik is a good option ATM, and if data is securely isolated and can’t be linked then this is as close to the optimal solution as I can find. Building out custom privacy-by-design analytics tools is a stretch goal. Digital self-sovereignty is a fundamental human right and I want to build a platform that truly respects this. If the apps our end users will interact with are running on the SAFEnet, this goes a long way to ensuring privacy, security and data ownership. Advances in zero-knowledge proofs may enable what I am searching for, but this is computationally demanding and still some time off. Appreciate your time and advice.