GDPR – 8 things you can do

It’s not long now until the General Data Protection Regulation (GDPR) comes into effect on 25 May 2018.

Many businesses are still unsure exactly what they need to do (and in many cases there is still a lot of debate about what specifically needs to be done), but the one thing that is certain is that you should do something – doing nothing is simply not an option.

What is GDPR?

The GDPR is a new European data protection regulation that replaces the 1995 Data Protection Directive. It is designed to give greater protection and rights to individuals and to “harmonise” European data privacy laws. The GDPR means that organisations that handle personal data will have to make substantial changes.

The main source of information in the UK is the Information Commissioner’s Office https://ico.org.uk – this website should contain all the guidance you need, but it is complex.

Every organisation is a Data Controller (many are also Data Processors in addition to being Data Controllers). As such, you are responsible for keeping all the data you hold safe and secure, and for ensuring that you aren’t keeping more data than you need or are allowed to have. It may well be impossible for any organisation to be fully compliant, but your organisation still needs to try to be as compliant as it can be, and the best way to do this is to approach GDPR honestly and transparently.

What can you do?

Every organisation uses and stores personal data in a different way, so every business will need to make changes unique to that business, but here is a quick checklist of 8 steps to get started:

1. Get an overview

Set aside an afternoon and visit the ICO website’s GDPR section to get an overview https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/

2. Questionnaire 

Next complete this online questionnaire: https://ico.org.uk/for-organisations/resources-and-support/data-protection-self-assessment/data-controllers/ – this will give you an understanding of your current position and will help identify what you need to do.

3. Audit your data

You need to understand what data you have – this includes customer data and employee data https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/accountability-and-governance/documentation/

On this page there is an Excel template that you can use to audit your data: https://ico.org.uk/media/for-organisations/documents/2172937/gdpr-documentation-controller-template.xlsx (don’t worry, it includes examples).

4. Data Protection Impact Assessments (DPIAs)

https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/accountability-and-governance/data-protection-impact-assessments/

You don’t necessarily need to carry out Data Protection Impact Assessments, but you won’t do any harm if you do. DPIAs will help you understand and document the personal data you control https://ico.org.uk/media/about-the-ico/consultations/2258461/dpia-template-v04-post-comms-review-20180308.pdf

It is probably a good idea to complete impact assessments for internal systems and processes such as Payroll, HR, CRM systems etc. 

If you use other companies to process data (for example managing a website for you), you should ask them for DPIAs that detail what data they process for you and how they keep it safe and secure.

5. Written agreements

You are supposed to get written agreements from Data Processors (other companies that process data on your behalf, e.g. your accountant). These agreements state that they will only use the data you control for the purposes for which you have employed them, and that they will keep the data safe and secure. This may already be covered by existing contracts, but if not you should seek an additional agreement covering the new regulations. These agreements are known as ‘Controller Processor Agreements’ or ‘Data Processing Addendums’.

6. Security

An important aspect of GDPR is security. GDPR covers all kinds of records, both physical (e.g. paper) and electronic. Offices should be secure, filing cabinets kept locked, and so on. A good place to start in terms of electronic security is Cyber Essentials https://www.cyberessentials.ncsc.gov.uk/, a government-backed scheme to help organisations protect themselves from cyber attacks. The scheme will guide you through the basics, like making sure you have anti-virus software installed, and it is possible to get an official certification too.

7. Privacy Policy

GDPR is about transparency: letting people know what data about them is being used and why. For example, if you have a contact form on a website, explain that you will only use the data an individual submits to help with that individual’s enquiry, and that you will not share that information or use it for another purpose (and of course make sure that your organisation doesn’t use personal information in ways it does not have permission to).

If you are thorough and use plain English, you will end up with a privacy policy that is good for users and helps your organisation think about how it handles data.

8. Audit trail 

Keep a record of everything GDPR-related that you do. An important part of GDPR is the process of understanding and auditing your data, so it is important to record the process itself.


Please Feed the Spiders

I think we all forget just how amazing the Internet is. In the developed world instant access to information is almost ubiquitous and it is easy to take it for granted.

On 6 August 1991 Tim Berners-Lee made the World Wide Web publicly available, and since then it has transformed the way we live and work. But an ocean of information isn’t much use without a way to discover what is in it. Search engines like Google and Bing are the tools that have helped to tame this ocean and help us find the nuggets of information that we actually want.

Spiders and The Long Tail

Take a look at the log files of a web server and you’ll see vast numbers of requests made by spiders and bots as they traverse sites, following links and indexing the information that they find on each page.

Every website owner wants to drive visitors to their website, and the best way to ensure visitors is to make sure that your site is regularly indexed by search engines. If a page doesn’t get visited by a search engine spider and its content doesn’t get indexed, the page might as well not exist. If a tree falls in a forest and no one is around to hear it, does it make a sound?

Search engines make sense of the huge amount of unstructured content that makes up the majority of the internet. In an effort to extract and organise more information, standards like schema.org are becoming more and more important – it is in the interests of website owners to apply these standards on their websites because, at the end of the day, they want traffic.

Most people working with the web are aware of the idea, popularised by Chris Anderson in his 2006 book ‘The Long Tail’, that there are a lot of business opportunities serving the needs of people with minority interests or looking for niche products in the tail. About half the world’s population now have some access to the internet, so even a niche interest presents vast opportunities.

The problem is that in order to make use of the long tail, your information needs to be found – in the internet age the gatekeepers of knowledge are the big search engines like Google and Bing. Search engines provide their services to us, the end users, for free, but these services are not really free: they cost an enormous amount of money to operate and are ultimately funded by advertising.

Spidering and then indexing a page costs money. It takes time and it uses energy – in fact it’s estimated that in the US data centres account for approximately 2% of energy usage. The millions of servers owned by Google and Microsoft need to be built and managed, as do the servers of the websites being spidered. There is bandwidth to be paid for… the list goes on and on. It’s hard to get an exact cost but we can get glimpses:

In terms of greenhouse gases, one Google search is equivalent to about 0.2 grams of CO2. The current EU standard for tailpipe emissions calls for 140 grams of CO2 per kilometer driven, but most cars don’t reach that level yet. Thus, the average car driven for one kilometer (0.6 miles for those in the U.S.) produces as many greenhouse gases as a thousand Google searches.

Because searching and indexing cost money, not all content actually gets indexed – it is clearly in the interest of a search engine to focus on the most popular content, the content that will appear in the most searches and generate the most advertising revenue, so pages in the long tail are less likely to be indexed, or are indexed less frequently.

As an internet user I want searches that return exactly what I’m looking for. As somebody responsible for websites I want my end users to find exactly what they are looking for as easily as possible.

Spidering is a mechanism to make sense of large amounts of unstructured data, but it doesn’t work so well with highly structured data with lots of variations and filters.

The problem with filters

This is not a problem of having too little information; the problem is that there is too much to organise.

It is easy to build a navigation structure that will generate pages for as many variations and filters as necessary, and of course a dynamically generated web page doesn’t use any resources until it is requested. The trouble is that spiders will not index all your pages. For want of a better way to describe it, spiders get bored, and you run into esoteric issues like crawl budgets, faceted navigation and infinite spaces – all clever ways of acknowledging that indexing costs money. The practical manifestation of this is that filters often get ignored.

It’s easy to demonstrate just how quickly dynamically generated pages add up and therefore why this is a problem.

Imagine a website with a search facility that allows you to filter your results. A good example is a website that relates to the physical world and which contains information about real things with attributes or properties – the facilities at a leisure centre, or the type of food served in a restaurant.

You can choose to filter by any, all or none of these properties. Each of these different filters is a landing page and would ideally be the page that appears at the top of Google’s or Bing’s search results when somebody searches for something specific like ‘Vegan restaurant near Luton’, for example. At the moment, sometimes these landing pages will have been indexed and sometimes not – the experience can be a bit hit and miss. The point is, if an end user is looking for something specific (rather than just restaurants), then just providing a list of restaurants is an unsatisfying experience.

Of course it is easy enough to implement filtering on your own website and there are some great examples like the pub search facility on Useyourlocal.com, but how much better would it be to always have the opportunity to go directly to an exact landing page from the search engine results page?

Pascal’s Triangle and calculating combinations

Let’s think about how quickly combinations grow in size. If we have 2 properties to filter by [ a ] and [ b ] there are 4 possible combinations:

[  ] (no filters)

[ a ] (just a)

[ b ] (just b)

[ a b ] (a and b)

If we have 3 properties there are 8 possible combinations:

[  ]

[ a ]

[ b ]

[ c ]

[ a b ]

[ a c ]

[ b c ]

[ a b c ]

If we have 4 properties we get 16 possible combinations:

[  ]

[ a ]

[ b ]

[ c ]

[ d ]

[ a b ]

[ a c ]

[ a d ]

[ b c ]

[ b d ]

[ c d ]

[ a b c ]

[ a b d ]

[ a c d ]

[ b c d ]

[ a b c d ]

Each time we add another property, the number of possible combinations doubles: 6 properties give 64 possible combinations, 8 properties give 256, and so on.

Rather than typing out all the possible combinations you can use Pascal’s Triangle.

Item count →    0    1    2    3    4    5    6    7    8    Total
Row 0           1                                                 1
Row 1           1    1                                            2
Row 2           1    2    1                                       4
Row 3           1    3    3    1                                  8
Row 4           1    4    6    4    1                            16
Row 5           1    5   10   10    5    1                       32
Row 6           1    6   15   20   15    6    1                  64
Row 7           1    7   21   35   35   21    7    1            128
Row 8           1    8   28   56   70   56   28    8    1       256

To calculate the possible number of non-repeating combinations of items:

  • On the Y axis, go to the row equivalent to the number of items you have (the first row of the triangle is 0)
  • The X axis displays the number of ways you can combine these items – again the first number is 0

So confirming what we previously worked out, the number of unique combinations of 3 items:

1 + 3 + 3 + 1 = 8

1 combination of 0 items [ ]

3 combinations of 1 item [ a ] [ b ] [ c ]

3 combinations of 2 items [ a b ] [ a c ] [ b c ]

1 combination of 3 items [ a b c ]

As you can see the number of possible pages soon becomes vast.
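If you would rather not write the combinations out by hand, a few lines of Python make the same point. This is purely illustrative – the property names are the placeholders used above:

# Enumerate every possible combination of a set of filter properties.
from itertools import combinations

properties = ["a", "b", "c"]

all_combinations = [
    combo
    for r in range(len(properties) + 1)   # choose 0 filters up to all filters
    for combo in combinations(properties, r)
]

print(all_combinations)       # [(), ('a',), ('b',), ('c',), ('a', 'b'), ...]
print(len(all_combinations))  # 8, i.e. 2 ** 3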

Imagine you have a website for cafes and restaurants. There are approximately 45,000 towns and villages in the UK. For the sake of argument let’s say that 20,000 of those have a cafe or restaurant. Imagine that we store information about the following properties or facilities and allow users to filter by them:

[ a ] Free parking

[ b ] Air conditioning

[ c ] Vegan

[ d ] Organic

[ e ] Fair trade

[ f ] Gluten Free

For 6 filters, we need to look at row 6: 1 + 6 + 15 + 20 + 15 + 6 + 1 = 64

We have 64 possible combinations of filters.

We can easily create a system that allows us to filter by location and optionally by one or more properties (e.g. Free Parking). With 6 properties and 20,000 locations we now have 64 x 20,000 = 1,280,000 possible location filter pages (excluding pagination and the 20,000 potential detail pages themselves). It is unlikely that all these pages will ever be indexed, and if they are it certainly won’t be regularly.

If we decide to restrict ourselves to only a single filter, and look at row 6 again, we can see that the new number of possible combinations is 1 + 6 = 7 (1 combination with no filter chosen, and 6 possible combinations of just a single filter). This is much more manageable but still results in 7 x 20,000 = 140,000 landing pages.
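The arithmetic above is easy to check in Python. This is simply the same calculation again – the 20,000 locations figure is the assumption made earlier:

# Reproduce the page counts from the cafe and restaurant example.
from math import comb

properties = 6      # free parking, air conditioning, vegan, organic, fair trade, gluten free
locations = 20_000  # towns and villages assumed to have a cafe or restaurant

all_filter_combos = sum(comb(properties, r) for r in range(properties + 1))
print(all_filter_combos)                  # 64 (equivalently 2 ** 6)
print(all_filter_combos * locations)      # 1280000 possible filter pages

single_filter_combos = 1 + comb(properties, 1)   # no filter, or exactly one filter
print(single_filter_combos * locations)   # 140000 landing pages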

It is easy to say that this is a contrived example, and to an extent it is – but if that information is available, why shouldn’t it be possible to have that information appear directly in search results?

Even if a spider did index all 1.3 million pages, it would use a lot of resources (time and energy), and a lot of bandwidth would be needlessly consumed (and paid for) – the average web page today is comfortably over 2MB. On the other hand, you would be improving the experience of many, many users – individually each search result may only be of interest to a handful of people, but in the vast spaces of the long tail the small numbers add up and soon become huge.

A solution. Don’t pull, push

Instead of waiting for a spider to come along and slowly crawl millions of pages of highly structured data, it would be far more efficient to either be able to push the data to search engines or to supply it on demand in a single chunk of data.

The data for all 20,000 of our hypothetical cafes and restaurants, including addresses and descriptions, could probably, in a text format like JSON, CSV or XML, fit into a single 2MB file.

In order to implement this we need three things: firstly, our structured data; secondly, a schema for the data that defines how to construct the URL for any possible landing page; and thirdly, a way to get the data to the search engine.
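As a purely hypothetical sketch – the field names, domain and URL pattern below are invented for illustration and are not any search engine’s actual feed format – a single record in such a feed, together with a URL template for its landing pages, might look something like this:

# A hypothetical feed record plus a URL template for filtered landing pages.
import json

record = {
    "name": "The Corner Cafe",   # invented example data
    "town": "luton",
    "address": "1 High Street, Luton",
    "description": "Family-run cafe in the centre of town.",
    "facilities": ["vegan", "gluten-free", "free-parking"],
}

# The 'schema' part: how any filtered landing page URL can be constructed.
url_template = "https://www.example-cafes.co.uk/{town}/{facilities}"

print(json.dumps(record, indent=2))
print(url_template.format(town=record["town"],
                          facilities="+".join(sorted(record["facilities"]))))
# -> https://www.example-cafes.co.uk/luton/free-parking+gluten-free+vegan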

The necessary technology exists and there are already precedents for this push approach. From time to time we hear about deals like the 2015 deal between Google and Twitter that enabled tweets to appear instantly in Google’s search results. Far more common are examples like Google Product Feeds, which let shop owners upload or provide feeds of product data for Google Shopping searches.

Google search already makes use of structured data in searches (but the data is all harvested by Googlebot) and with the Sitelinks Searchbox can send searches directly to search results pages on websites. The OpenAPI / Swagger specification is used to define REST APIs, but could equally be used to define the filters and URL structure of a search results page.
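To make the OpenAPI idea concrete, here is a sketch of what a fragment of such a definition could look like, expressed as a Python dictionary ready to be serialised to JSON. The endpoint and parameter names are invented for illustration:

# A sketch of an OpenAPI-style description of a filterable search page.
# The path and parameters are hypothetical.
search_page_spec = {
    "openapi": "3.0.0",
    "info": {"title": "Cafe and restaurant search", "version": "1.0.0"},
    "paths": {
        "/search": {
            "get": {
                "summary": "Filterable list of cafes and restaurants",
                "parameters": [
                    {"name": "town", "in": "query", "schema": {"type": "string"}},
                    {"name": "vegan", "in": "query", "schema": {"type": "boolean"}},
                    {"name": "free_parking", "in": "query", "schema": {"type": "boolean"}},
                ],
            }
        }
    },
}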

So, when shall we get started?

This article was originally published at www.ab-uk.com


We all make mistakes

People make mistakes.

Anybody who claims that they never make a mistake is not being honest with you or themselves. Sometimes mistakes matter and can have devastating consequences – in a hospital, onboard an aeroplane, at your bank. Mistakes aren’t all bad though: we really do learn by making them. Sometimes a mistake is simply unimportant, and mistakes (or the willingness to make them) are an intrinsic part of creative thinking.

What we really want to do is to stop mistakes happening in the wrong place.

Mistakes happen for many reasons, and it is easy to blame individuals when they happen, but more often than not we shouldn’t – instead we should examine the processes that allowed the mistakes to happen. People often make mistakes doing boring, repetitive tasks – tasks that we are ill suited to from a psychological and evolutionary point of view.

The best way to avoid making a mistake is to design the mistake out of the process and change the situation or process so that the mistake cannot happen. You never hear about window cleaners falling off ladders these days because they don’t use ladders – they use telescopic brushes instead.

If you can’t change your process, then the next best thing is to automate a solution. Take people out of the loop and let them get on with something else. Fortunately, boring, repetitive tasks tend to be exactly the sort of tasks that can be automated.

Writing software, building applications and websites, managing databases, configuring servers – these are part of the life of a digital agency. They involve creativity and problem solving, they can be hard work, but they are also interesting and often fun. Unfortunately there is also a great deal of boring, frequently complex work that has to be done on a regular basis without any mistakes; good examples are building assets, deploying code and updating software.

But what about checklists?

Checklists also help avoid problems and mistakes and they have a really important role to play, but they aren’t a panacea – checklists can easily create a false sense of security. A checklist takes time to complete, and it’s also possible to make mistakes when completing a checklist. A good checklist for a non-trivial process is actually surprisingly hard to write, and checklists themselves need to be tested to ensure that they work. Perhaps worst of all, people lose respect for a bad checklist and can end up not completing it properly.

A good checklist should be short and focused; unfortunately, checklists often consist of checks like ensuring that Google Analytics code has been added or that favicons exist. Ideally these shouldn’t even be in a checklist – a better approach is to use build tools to do this basic setup and then use automated testing to confirm that your setup is correct, as in the sketch below.
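To make that concrete, here is a minimal sketch of such an automated check in Python, using the requests library. The site URL is hypothetical, and a real test suite would live in something like pytest and cover far more:

# A minimal post-deployment sanity check (hypothetical site URL).
import requests

SITE = "https://www.example.com"

def check_homepage_basics():
    home = requests.get(SITE, timeout=10)
    assert home.status_code == 200

    # The analytics snippet should be present in the rendered page.
    assert "google-analytics.com" in home.text or "googletagmanager.com" in home.text

    # The favicon should exist and be served successfully.
    favicon = requests.get(SITE + "/favicon.ico", timeout=10)
    assert favicon.status_code == 200

if __name__ == "__main__":
    check_homepage_basics()
    print("All basic checks passed")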

Luckily for us, these days there is a multitude of software solutions designed specifically to help us automate the vital but boring work away. Every business has different processes and workflows, but there will almost certainly be a set of tools that will work for it. Tools like Gulp, Grunt and Webpack allow us to build our front-end assets – concatenating, minifying and versioning our files (we like Gulp and Assetic). Vagrant allows us to easily create and share reproducible development environments (and gives us the ability to make these development environments as arbitrarily close to production as we wish). Puppet, Chef, Saltstack, Ansible and Fabric can all be used to configure servers and deploy code (we like Puppet and Fabric).
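As a flavour of what deployment automation can look like, here is a minimal sketch using Fabric’s Python API. The host, paths and commands are hypothetical, and a real deployment would do rather more (migrations, cache clearing, rollbacks and so on):

# fabfile.py – a minimal deployment sketch (hypothetical host, paths and services).
from fabric import task

@task
def deploy(c):
    # Pull the latest code on the server and rebuild the front-end assets.
    with c.cd("/var/www/example-site"):
        c.run("git pull origin master")
        c.run("npm install && gulp build")    # rebuild assets with Gulp
        c.run("sudo systemctl reload nginx")  # hypothetical service reload

# Run with: fab -H deploy@example.com deploy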

Automation isn’t free – it requires investment in skills and development/configuration time – but there is a clear business case, and it isn’t all or nothing: even a small amount of automation can have a positive impact, reducing time spent fixing mistakes and allowing your team to work more efficiently.

This article was originally published at www.ab-uk.com

If you gotta ask, you ain’t never gonna know

I headed to London for my Curiosity day and saw works by three quite different artists – the only link being that the art exhibitions fitted in with my schedule.

The three artists each asked questions about the nature of contemporary art and what it means to be an artist.

José Damasceno – Plot – Holborn Library, London

A site-specific installation at Holborn Library, this was interesting but slightly frustrating. The ‘art’ here came from the totality of the experience – sounds, images, objects, smells, even the locals just using the library.

Anselm Kiefer – Retrospective – Royal Academy, London

Whether or not you like the actual ‘art’, it’s hard to ignore a canvas the size of a house. Not one to shy away from serious themes, Anselm Kiefer grapples with Teutonic mythology and the legacy of Nazism. This show left me exhilarated and exhausted.

Richard Tuttle – I Don’t Know . The Weave of Textile Language – Tate Modern, London

Is it an aeroplane or a fabric spaceship? Maybe a giant fabric replica of an outboard engine? This was a beautiful, contemplative object that filled the Tate’s Turbine Hall and demanded that you just sit and look and think about light, space, colour, the stuff of the world – things we rarely actually notice.

Back in Exeter I gave my presentation and inevitably we started to talk about those thorny old questions: ‘What is art?’ and ‘but a child could have made this…’

So how do I know if it is art?

Allegedly, when Louis Armstrong was asked what jazz was, he replied: “If you gotta ask, you ain’t never going to know.” Whether or not it’s true, it’s a good anecdote that applies equally to art. Unfortunately, if you ‘gotta ask’, it’s not very helpful. Luckily for folks who still don’t ‘know’, Grayson Perry has provided an easy-to-use checklist in his Reith Lectures to help confirm whether you are in fact looking at art.

Check how many of the following simple tests the potential artwork passes:

  • Is it in a gallery or an art context?
  • Is it a boring version of something else?
  • Was it made by an artist?
  • If it is a photograph, is it bigger than two metres and is it priced higher than five figures?
  • Is it part of a limited edition?
  • Is it being looked at by ladies with expensive handbags or hipsters?
  • If the art was on a rubbish dump, would people say, what’s that artwork doing on that rubbish dump?
  • If it is computer art does it have the ‘grip of porn without the possibility of consummation or a happy ending’?

Be careful though, most art galleries have benches, but benches aren’t usually artworks despite being in art galleries.

http://downloads.bbc.co.uk/radio4/transcripts/reith-lecture2-liverpool.pdf

And what about your talented child? 

Can your child (or mine) make art? Yes, everybody can make art.

Is your child’s art any good? Maybe, probably not, but the important part is that you like it.

Is your child an artist? Perhaps. But just because you are an artist it doesn’t mean you are a good artist.

Can you get your child’s art into a gallery? Sure, take a leaf out of Banksy’s book and sneak it in when nobody is looking.

Don’t get too hung up on questions though, just enjoy it (whatever it is).