Algorithms and ghosts

Stuart Mills concludes his series on data policy theory, exploring simulacra, doppelgängers and how much we really exist in a virtual world.

In the final blogpost of this series, I’d like to talk about people.

Truth be told, I have struggled to write this blogpost. I was going to write something on cryptocurrency and simulacra, and how a false idea of money can become transformed into a very real value-system (e.g. bitcoin). To an extent, I still am, though I have decided to veer away from cryptocurrency and focus on what I think is more interesting – simulacra.

Simulacra is one of those fancy academic words which no one who is self-aware actually uses (evidently, I am not self-aware). The singular of simulacra is simulacrum, which the Cambridge University Dictionary defines as:

“something that looks like or represents something else.”

In my opinion, however, this definition misses something.

I first came across the term simulacra in Jean Baudrillard’s book Simulacra and Simulation (again, a self-aware person wouldn’t admit this), a book which heavily inspired the Matrix films. In the book, Baudrillard explores the idea of the sign, or how symbols such as brands take on substance in modern society. And for Baudrillard, simulacra do not just represent something else, they become the something else.

An example

Imagine a chair.

Now imagine a photograph of the same chair.

Now imagine showing that photograph to someone, pointing at it, and asking what it is.

Of course, the thing is a photograph, but your companion will probably tell you it’s a chair. When pressed, they will probably recognise the medium and tell you it’s a photograph, but in the moment, they stand a good chance of attributing to the simulacrum (i.e. the photograph) the chair’s reality.

This is a very simplified example – as we will see, recognising the simulacrum can be much more difficult.

You may be able to see how the idea of simulacra fits in with the premise of The Matrix, a movie where humans are deceived into believing the world they experience is real, when in fact it is a simulation which they have no control over. Humans in The Matrix see the simulated world (the simulacrum) and believe it to be the real world, rather than a (mis)representation of it.

I think there is a serious risk we do the same with personal data.

Image: a digital matrix by Gordon Johnson from Pixabay

The Doppelgänger

Another useful term to introduce may be that of doppelgänger. Doppelgänger is often taken to mean an identical copy of someone, but this is only a partial definition.

In a more complete sense, doppelgänger means an off or imperfect copy, or as the Merriam-Webster dictionary describes it, a “ghostly” counterpart.

In the modern digital age, I would contend we all have these ghostly counterparts, with each of us perhaps having multiple counterparts depending on the platform from which we are being presented (for instance, who is genuinely themselves on LinkedIn?).

For now, I want to consider only one doppelgänger – the ghostly counterpart which makes up our broad online persona. I also want to establish that I position the doppelgänger as a simulacrum of oneself, an idea I will expand on shortly.

I’m hardly the first to question the relationship between ourselves (whatever that means) and our digital counterpart (whatever that means).

Four dimensional?

The critic Laurence Scott considers this question in quite a poetic way in their book The Four-Dimensional Human. Scott’s general conclusion is not that this added dimensionality (i.e. the three-dimensional ‘real’ world plus the fourth dimension of the ‘digital’ world) is necessarily good or bad (which, from my experience, is the question many people jump to), but rather that the addition of a digital persona extends the realm of the person.

Yes, you may be chatting on Facebook, but if that Facebook message allows you to have a conversation which your 3D self could never possibly engage with, isn’t this extending and affecting your 3D self? Furthermore, isn’t it kind of arrogant for theorists like me to build an artificial barrier between the online and the offline, and draw conclusions about the former without ever considering that those conclusions could apply to the latter too? After all, isn’t it your fingers, commanded by the neurons in your brain, which type out and send that Facebook message?

For some balance, but not to dispute Scott’s perspective, consider the use of Photoshop on Instagram. It’s something of an open secret that many influencers on Instagram use Photoshop to adjust their appearance.

In this sense, their online persona is extending the realm of their 3D self – after all, they don’t look quite so shapely in real-life. But this is also a doppelgänger; only in the digital world does this person resemble the photoshopped character.

This is quickly veering into that discussion of good/bad which I wanted to avoid, so let me just say this: the person sending that Facebook message is also embracing a doppelgänger. They are not saying anything to anybody, merely typing onto a screen; nor is anything being said to them, as they are simply reading. Asked who they are talking to, they may give the name of their friend, but this is false – they are talking to no one.

Is it a photograph, or is it a chair…

To an extent, I think this discussion is moot. Technology changes the relations between humans, and that is considered true by many people, from avowed Marxists to cyberneticists to Silicon Valley CEOs (though, as Wendy Liu has pointed out, the interpretations of this idea can be very different).

Furthermore, it’s important to ask: who cares?

I have sympathy with this argument too, because at a certain point the conflict between saying you’re speaking to someone but actually typing can be considered semantic.

But often it’s not semantic, because – unlike what Scott’s conception would imply – our digital doppelgängers do not follow us around. They instead live on servers as data, and we find them whenever we log onto Facebook or watch a video on YouTube. This is important, because while we move around, our doppelgängers stay in the same place, which is in turn quite useful for those who want to understand us.

Or one dimensional?

A second text to consider, continuing the dimensionality theme, is Herbert Marcuse’s One-Dimensional Man.

In this book, Marcuse argues that both American style capitalist production and Soviet style socialist production ultimately reduced the human-being on the factory line to a mere cog in a machine. Whereas Scott sees the digital world as an extension of humanity, following Marcuse, we might also see the digital world as a reduction of humanity.

Consider targeted advertising.

Targeted advertising on a site such as Facebook works by running various data about ‘you’ through an algorithm to determine what advertisements you are most likely to be interested in (I acknowledge I’ve used loaded language here; Facebook’s appeal to your interests may not be the most correct way of describing what they do).

Often, things ‘you’ have ‘liked’ are used. Of course, there’s no denying you’ve liked certain brands, but only in the very specific sense that you’ve clicked a ‘like’ button on Facebook. How much you ‘like’ a brand compared to someone else becomes reduced to either a 1 (like) or a 0 (no like), and things move from there.

The point of this explanation is that it is not you, but Facebook’s assembled version of you (i.e. the doppelgänger) which they point at and say is you. The doppelgänger becomes a simulacrum.
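The reduction described above can be sketched in a few lines of Python. This is a toy illustration only, not a description of how Facebook or any real advertising system works: the people, the brands, the numbers, the threshold and the function names `to_likes` and `pick_advert` are all invented for the example. It shows two people who feel quite differently about the same brands collapsing into the same doppelgänger once their feelings are recorded as 1s and 0s.

```python
# Hypothetical illustration only: this is not any real platform's system.

# How strongly two invented people feel about some brands, from 0.0 to 1.0.
alice_feelings = {"BrandA": 0.95, "BrandB": 0.10, "BrandC": 0.55}
bob_feelings = {"BrandA": 0.51, "BrandB": 0.40, "BrandC": 0.90}


def to_likes(feelings, threshold=0.5):
    """Collapse graded feelings into a 1 (like) or a 0 (no like).

    The threshold is an assumption standing in for the moment
    a person decides to click the 'like' button.
    """
    return {brand: 1 if strength > threshold else 0
            for brand, strength in feelings.items()}


def pick_advert(likes, adverts):
    """Return the advert whose target brands overlap most with the likes."""
    def overlap(advert):
        return sum(likes.get(brand, 0) for brand in adverts[advert])
    return max(adverts, key=overlap)


# Invented adverts, each aimed at people who 'like' certain brands.
adverts = {
    "trainers": ["BrandA", "BrandC"],
    "coffee": ["BrandB"],
}

alice_doppelganger = to_likes(alice_feelings)
bob_doppelganger = to_likes(bob_feelings)

# The two people differ, but their recorded versions are identical,
# so a system that only sees the likes must treat them the same way.
print(alice_feelings == bob_feelings)
print(alice_doppelganger == bob_doppelganger)
print(pick_advert(alice_doppelganger, adverts))
print(pick_advert(bob_doppelganger, adverts))
```

Anything that operates only on the `to_likes` output cannot tell the two people apart, however different the feelings behind each click were.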

Pragmatism

I often describe myself as a pragmatist, in the philosophical sense. Sometimes things just need to get done, and while ontological (what is real) and epistemological (what is known) debate can be valid and interesting, if there’s a deadline sometimes you just have to work with what you’ve got. This is why I’m sympathetic with the temptation of describing a person’s digital persona as that person.

But pragmatism can also be very dangerous.

As Joy Buolamwini and Timnit Gebru have reported, algorithms which are only shown the doppelgänger can produce disastrous social results, with machine learning systems often reinforcing gender and racial biases. This is not wholly the doppelgänger’s fault – the person who collected the data was clearly unaware of the bias they were also embedding into the data. But this oversight reveals the pressing danger of a pragmatic view which says the doppelgänger is good enough – sometimes, it really isn’t.

Furthermore, consider a very recent study by Anastasia Kozyreva and colleagues. In their survey of user attitudes to personalisation algorithms (e.g. targeted advertising), users were generally accepting of these algorithms for services such as shopping or entertainment.

However, personalisation was not welcome in areas considered in the public domain, such as political advertising.

And if you will allow me some self-promotion briefly, in a paper I have under review, I have speculated that personalisation may be acceptable in individual-level decisions, but unacceptable when multiple people are involved in decision-making.

This reveals another problem with the doppelgänger;

they are always individuals.

But humans aren’t just individuals – we shape and are shaped by each other. Indeed, academics would call the problems I’ve described (those of social bias and social concern) social intelligence, a kind of intelligence which is (in my opinion) frequently missing from discussions of data-driven technologies such as machine learning and AI.

A final thought

To conclude, then, personal data often form into imperfect copies of ourselves – doppelgängers.

Those that use data often point to these doppelgängers and treat them as if they are accurate representations of us. In this sense, doppelgängers are also simulacra. This method can, from a pragmatic perspective, be useful, and from a social perspective, we should not be so quick to write off the ability of data to extend the realm of our own being. Equally, the doppelgänger is imperfect, and these imperfections can manifest as serious social harms.

I suppose, if there’s a lesson I wish to convey, it is this:

the person is always more than the data would suggest.

And a Note on The End

As I stated at the start of this blog post, this is the final one in this series. I personally try not to make too much fuss about endings, but I have been encouraged to write something. Furthermore, I can see the benefit to the reader of trying to tie a thread through this series together.

Human beings are not data. I think this blogpost makes that clear.

But the relationship between humans and data is difficult, and so long as some are inclined to think of data and the humans they affect as one and the same, there should probably be an ambition to do something about it.

Just as news websites manipulate algorithms by doing ‘top 10s’ or ‘5 amazing things you won’t believe’, I figure I may as well offer 3 points to think about.

1.

Firstly, data do not just exist.

Or, at least, social data do not just exist.

Climate temperature measures, half-life decay, information and quantum physics; there are data here which we could feasibly argue exist without human beings. But not convincingly.

All data suffer from the question of trees and forests and the absence of ears.

So the first point to make: human beings choose what to measure, and thus choose what becomes data. We, as a society, have agency over data and can exercise it, and where data are detrimental or unhelpful or problematic in some sense of the word, we can always choose differently.

2.

There are many human creations which, either through a lack of will or the stories we subject ourselves to, we believe we cannot change.

The idea of “it has always been like that” gives way to the cynical argument “it will always be like that” and even the dangerous argument “it should always be like that.”

The digital economy and our relations with data are one of those creations where there is a risk we convince ourselves there is nothing we can do. That despite people like me arguing that data are a product of human choices and that we can choose differently, we won’t.

But in this series, we have seen that alternative forms of ownership of data, and ultimately arrangements of human relations, are possible. I increasingly think fighting for these possibilities is one of our most pertinent challenges.

3.

Finally, with my public policy head on, I think we need to think seriously about our use of data in public life.

This could take multiple strands, but a couple I think are good to pick up on:

we need legislators who understand technology;
we need data scientists who understand the impact of what they’re doing.

Data are a part of the future of public policy, if for no other reason than that, if machine learning or artificial intelligence really can produce the benefits that are claimed, a government which does not have sufficient data resources will become the beggar to the tech company that does.

But assuming we can acquire those data resources, and have them more or less in the hands of the public, we must still recognise the importance of spatiality in control. The person at the computer, pressing the buttons to effect change in the country, for instance, has more control than the person at the ballot box, voting for the politician who hires the person at the controls.

Therefore, at each step we need to ensure competency, with politicians understanding how data are and could be used, data scientists understanding the same, and the public having the capacities to object to either.

When the internet started to take off 25 years ago, there was this notion that it could be whatever you made of it.

For the most part, this is still possible, if we want it to be.

About the author

Stuart Mills is a Fellow in the Department of Psychological and Behavioural Science at the London School of Economics. He has recently completed a PhD in personalised nudging and political decision-making. His research focuses on behavioural science, behavioural public policy, data ownership and digital economy.


Data Impact blog