
An interview with Francis Hunger by Cristina Cochior & Ruben van de Ven, conducted on 18 September 2019 and published on Plotting Data. As of late 2024 the interview is no longer available at its original website, so I have republished it here.
C: How did you start researching data systems?
In my artistic research and artistic work I have been looking at how technology and society intertwine.
My personal background is that I grew up in East Germany, so in a (supposedly) socialist country. The more research I did about that, the more I began to wonder why the technological development, and also the technological culture, in this socialist system was rather similar to that of the capitalist system. This was one of my starting points.
Another starting point is simply that while being an artist and writer, I’m also earning my living from programming databases. So I’m pretty deeply involved with them. And I have the feeling that I’m able to talk about it in a way that is technically informed, but also from a more politically and theoretically informed researcher perspective.
These have been the vantage points.
At some point I realized that many people talk about algorithms a lot. The algorithm is basically a rule describing how to calculate something. The algorithm might be important for, let’s say, Google, to be able to reply quickly to searches. Therefore they would want to optimize their search algorithm. But Google, and everyone else really, is not only concerned with the algorithm; there is more. A lot of work, for instance, goes into optimizing databases so that data is represented in a way that is easily searchable, or so that a query can be executed on it. It is because of my experience of working for actual clients and doing the actual database programming myself that I understood that the algorithm is not the most important concept, and that there are other relevant concepts such as data, the information model, or the query, to name a few.
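A minimal sketch of the kind of optimisation work meant here, using SQLite in Python (the table and column names are invented for illustration): an index changes how the data is represented for querying, quite apart from any search algorithm.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pages (url TEXT, title TEXT)")

# Representing data so that it is easily searchable: an index lets the
# database answer lookups on `title` without scanning every row --
# optimisation work that happens below the level of any search algorithm.
con.execute("CREATE INDEX idx_title ON pages (title)")
con.execute("SELECT url FROM pages WHERE title = 'databases'")
```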
R: In your text Epistemic Harvest you describe the database at great length and you identify some of the commonalities between a database and a dataset. Could you describe what constitutes a dataset for you, and how it relates to the database?
One thing that differentiates the two is that, as I understand it, a dataset is some kind of structure that holds data, while a database provides the software to work with this structure.
So basically the database provides the means to query the dataset.
Pretty much every dataset can be imported into a database system, although it may need some restructuring. That is often the work of people who deploy databases: they get an already existing dataset, for instance one or several Excel spreadsheets, then they look into how to restructure it to make it fit into a database system, which allows a different way of querying it and so a different way of generating new information compared to just the single, plain table.
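A minimal sketch of that restructuring work, using SQLite in Python; the spreadsheet rows, the two linked tables, and the query are invented for illustration.

```python
import sqlite3

# A flat, spreadsheet-like dataset: the member name is repeated on every row.
spreadsheet = [
    ("Anna", "Moby Dick"),
    ("Anna", "Ulysses"),
    ("Ben",  "Moby Dick"),
]

con = sqlite3.connect(":memory:")
# Restructured into two linked tables, so the data can be queried
# differently than the single, plain table allows.
con.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
con.execute("CREATE TABLE loans (member_id INTEGER REFERENCES members(id), book TEXT)")

for name, book in spreadsheet:
    con.execute("INSERT OR IGNORE INTO members (name) VALUES (?)", (name,))
    member_id = con.execute("SELECT id FROM members WHERE name = ?", (name,)).fetchone()[0]
    con.execute("INSERT INTO loans VALUES (?, ?)", (member_id, book))

# A query that generates new information from the restructured data:
# how many books each member has borrowed.
for row in con.execute("""
    SELECT m.name, COUNT(*) FROM loans l
    JOIN members m ON m.id = l.member_id
    GROUP BY m.name"""):
    print(row)  # e.g. ('Anna', 2) and ('Ben', 1)
```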
R: In your writing you speak of the trinity of model-data-algorithm. Could you explain how you see this trinity? Which role does ‘data’ play therein, and how does this role of data relate to the database and the dataset?
The ‘trinity’ of model-data-algorithm is something that I came up with, but it’s not new.
As I mentioned, I struggle with the insistence on the algorithm in media theory.
Take for instance any kind of programming handbook for MySQL (and there are hundreds, if not thousands of them): when you go through just the first 30 pages, you notice it doesn’t talk about the algorithm. It talks about the information model, or the data model. That is: how to organize the data in a way that it can be queried.
The question of how to organize data touches on the closed world assumption. The closed world assumption means that you assume that everything that is in the database is everything that you know, so you can make true or false statements. The contrary would be the open world assumption, under which the absence of an entry matching the selected criteria does not allow you to conclude that a statement is false; it is merely unknown. But the predicate logic of the relational calculus in relational databases needs a closed world. So the data model describes a closed world, a world that is in a sense already broken by what has been ignored and left out.
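A small sketch of the closed world assumption in practice (table and names invented): a query that matches nothing is answered with “false”, not with “unknown”.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (name TEXT)")
con.execute("INSERT INTO employees VALUES ('Magda')")

# Closed world: the database treats its table as everything there is to know.
exists = con.execute(
    "SELECT EXISTS (SELECT 1 FROM employees WHERE name = 'Jan')"
).fetchone()[0]
print(exists)  # 0 -- 'Jan is an employee' evaluates to false, not to 'unknown'
```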
So the ‘trinity’ of data, model and algorithm goes as follows: the algorithm optimises how data is queried, while the information model defines what that data is. The information model is, in my opinion, much more important than the algorithm, because the data model defines what becomes data. Everything that is not described in the data model does not become data.
So if you imagine the dataset as a table, the names of the columns create the data model: they describe what you include and what you exclude.
We can now define data as something that is described in the information model; everything that is not described there is non-data. It does not exist for the database, because it simply cannot be queried.
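A sketch of this point, borrowing the physical-experiment example that comes up below (the schema is invented): what the data model does not name cannot even be asked for.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The data model: only what is named here becomes data.
con.execute("CREATE TABLE samples (temperature REAL, density REAL)")
con.execute("INSERT INTO samples VALUES (21.5, 0.98)")

con.execute("SELECT temperature FROM samples")  # temperature is data
try:
    con.execute("SELECT colour FROM samples")   # colour was never modelled
except sqlite3.OperationalError as err:
    print(err)  # "no such column: colour" -- non-data cannot be queried
```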
R: Thank you. In your work you describe how each database creates its own subject, or dividual, through its structuring, through its columns. How do you see the way in which the dataset is created influencing the structuring of this dividual?
Well, the person who creates the dataset usually has some kind of question they want to answer, or some task they want to solve. This leading question dictates which part of reality the person includes in the dataset and which part not.
In this way the person makes up a mental information model of reality, because they decide what to include, and thereby what to measure. We also know this from physical experiments, for instance: I might be looking at the temperature and density of something while ignoring other properties, like the colour, because for my question the colour is unimportant.
Once we have created this mental model and included or excluded certain parts of reality, a decision has already been made about how to address the subjects – given that your data relates to subjects.
R: In your research you also speak of the expansion of data collection as the expansion of the production of transactional metadata, and you reframe the leading assumption that surveillance would be the primary drive for data collection. You even go so far as to call the focus on surveillance quite a self-centered perspective. If it’s not surveillance, or privacy, could you expand on what is then at stake for data critique?
Currently this only exists as an outline on my blog, but I hope to develop this question further. However, this kind of critique has not been developed by me alone: just recently Evgeny Morozov wrote a longer article critiquing Shoshana Zuboff’s notion of surveillance capitalism.
Also, back in 1994, Philip E. Agre – who is interesting because he was a computer scientist who then began to write on media-theoretical questions – developed the notion of capture instead of surveillance. Recently I came across an article by Till A. Heilmann (from 2015, in German) where he writes about data labour in capture capitalism and proposes to use the perspective of labour instead of surveillance.
The problem with the surveillance idea is that it always ends up in a kind of moralistic call towards companies: “Oh my god, I’m being surveilled, it’s so bad, I can do nothing. It’s a horror!” It actually destroys the idea that you can do something about it and that you can act upon it.
Whereas, if you see it from the perspective of how labour goes into the production of data, and how subjectivity is captured or scraped through data and then made into something that has value (for instance in advertising), then you might ask further questions that go more in the direction of how to regulate this – how to work through law (and not just through morals) to define a set of rules on how to behave or how to act within data space. And this means taking a political stance towards the issue, not just a moralistic one.
In the book Machine Learners (2017), Adrian Mackenzie explores this question precisely for the machine learning and artificial intelligence ecologies. He looks into how machine learning datasets are created and how much labour goes into them.
Artists – Adam Harvey for instance – also touch upon these subjects when they look into machine learning datasets like Labeled Faces in the Wild and investigate from whom the data is taken, under which circumstances, and how those people have been compensated for providing their data. Usually not at all.
C: Highlighting the labour aspect is a really interesting point, as it immediately brings in the question of agency, and reframing the discussion around labour might open the possibility for self-organisation. The workers could organise and make their own datasets, or make their own databases, as a means of self-representing and also of employing the entire economic structure that datasets pertain to.
The Data Workers Union by Manuel Beltrán is a union of the people who provide data. While it is an artistic project, they are talking about things like data ownership, digital colonialism, taxes, data basic income and data exploitation. I think this is the way to actually look into these things and get a critical grasp on them.
R: In your text Epistemic Harvest you describe how computational capital is only able to work “when humans produce expressions that can be made symbolic and processed”. You see the computable as that which is without meaning, so how does the interface interfere, or function, in this process of meaning-making?
Can you give an example?
C: For one workshop we were working with the Enron dataset. You may be familiar with it: it’s a natural language processing dataset that consists of thousands of emails between employees of the Enron corporation. Enron was an American energy company that went bankrupt in the early 2000s. We were interested in the influence that the dataset has had on the field of machine learning. There is a certain company culture present within this collection of emails, which struck us as a workaholic attitude. So we were querying, for example, emails that were sent after 3:00 a.m., or emails that talk about the physical state of the person who was writing the e-mail, in an attempt to create distinct narratives that might hint towards the larger ones at play within the entire collection. From there we created short theater plays in collaboration with another artist, which were enacted on the spot. So in that particular workshop you could say we used our own bodies as interfaces to re-enact the content of the dataset. And after the workshop, we made a digital interface to enable more scripts to be made.
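A rough sketch of the first kind of query described here, assuming a local copy of the Enron corpus in its maildir layout; the path and the cut-off hours are hypothetical.

```python
import email
import email.utils
from pathlib import Path

# Hypothetical path to a local copy of the Enron corpus (maildir layout).
MAILDIR = Path("enron/maildir")

def sent_in_small_hours(path, start=3, end=6):
    """True if the message's Date header falls between `start` and `end` o'clock."""
    with open(path, "rb") as f:
        msg = email.message_from_binary_file(f)
    if msg["Date"] is None:
        return False
    try:
        sent = email.utils.parsedate_to_datetime(msg["Date"])
    except (TypeError, ValueError):
        return False  # skip messages with an unparseable Date header
    return start <= sent.hour < end

late_night = [p for p in MAILDIR.rglob("*") if p.is_file() and sent_in_small_hours(p)]
print(len(late_night), "emails sent in the small hours")
```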
I think that’s a great project and a relevant approach to work artistically with these topics.
Since I cannot elaborate on the Enron dataset, let me discuss the question of the interface in a more abstract way, using the example of the table. To my knowledge, only a few designers have written about tables and how they create and organize meaning. Regarding the table, I only found a chapter or two in Edward Tufte’s writings, and then there is another designer, Stephen Few, who has published the book Show Me the Numbers and has written about tables. Apart from these, I haven’t seen much.
Tables are a cultural technique that allows us to bring information into a formation. They create meaning through spatial differences: you have a spatial relation, which creates meaning. This is how interfaces work, because they are able to integrate graphics, text and moving image. The positioning on the screen or on a printout then determines a certain order. Here the Foucauldian notion of The Order of Things comes in, and of course, with the order also comes a hierarchy.
What is important to understand about the table as an expression of – or an interface to – data is that, since it is first of all a construction, it can always be constructed differently. And it is the designer who decides how the construction is made and which qualities are inherent to it.
One important quality is the sorting, or order. In the European, modernist way of seeing, things that are on top are more important; they are of a higher order than objects placed lower. So when we sort a table differently, we create a different meaning with each sorting.
Another operation that is important for generating meaning in this interface is grouping. The ‘natural’ order usually refers to the order in which data has been recorded, which is in itself an unordered situation: it’s a logic based on the entrance of data. This could be completely random, or follow the order of some transactions that have been recorded. If you group information, you understand things differently than if you just have the ‘natural’ order of the table.
The third operation is filtering out certain rows. That is, you take out certain data because you decide these entities are not important to show with regard to the question you want to answer.
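In a relational database these three operations – sorting, grouping, filtering – map directly onto ORDER BY, GROUP BY and WHERE. A minimal sketch, with a table of street trees invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trees (district TEXT, species TEXT, height REAL)")
con.executemany("INSERT INTO trees VALUES (?, ?, ?)", [
    ("Mitte",     "linden", 12.0),
    ("Mitte",     "oak",     9.5),
    ("Kreuzberg", "linden",  7.0),
])

# Sorting: a different order creates a different meaning.
con.execute("SELECT * FROM trees ORDER BY height DESC")
# Grouping: understanding the data differently than in its 'natural' order.
con.execute("SELECT district, COUNT(*) FROM trees GROUP BY district")
# Filtering: rows judged unimportant to the question simply disappear.
con.execute("SELECT * FROM trees WHERE species = 'linden'")
```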
It all comes down to spatial logic. I have already used a few notions here. Firstly, one can operate on tables. Furthermore, a table also represents a process: the process of being filled with data. So if you have an empty cell, this empty cell shows one of two things: either that there is no value, which means the value is zero or ‘nothing’, or it says “please fill me in with data! Please, I need more data.”
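SQL itself builds in exactly this distinction between “the value is zero” and “no value yet”: NULL is different from 0. A small sketch (table invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cells (label TEXT, value REAL)")
con.execute("INSERT INTO cells VALUES ('recorded zero', 0)")
con.execute("INSERT INTO cells VALUES ('still empty', NULL)")  # "please fill me in"

# NULL marks absence, not zero: the two rows answer different queries.
print(con.execute("SELECT COUNT(*) FROM cells WHERE value = 0").fetchone()[0])      # 1
print(con.execute("SELECT COUNT(*) FROM cells WHERE value IS NULL").fetchone()[0])  # 1
```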
C: That’s a good point, it would be useful to distinguish between ways in which data is already interfaced, and interfaces that we would like to imagine. That applies both to methods of ordering data and methods of presenting it. For us, one way of making the dataset more legible was to create these sub-groupings based on the queries we mentioned before. The order wasn’t specifically emphasized, but the fact that they existed as a sub collection within a larger collection gave some clarity to the material.
It is really important to look into these processes of ordering and sorting because there are a lot of subconscious decisions happening that influence how you represent and how you view data. They are cultural techniques and we often forget that we are actually doing it; we take it as a given.
For tables, or for any kind of dataset, classification is a very important term because, again, what appears in datasets – what is taken in – has already been decided in advance. And often this is a classification decision.
There’s this beautiful book by Geoffrey Bowker and Susan Leigh Star, Sorting Things Out: Classification and Its Consequences (1999), in which they explore the notion of classification.
R: Could you expand a bit more on what kind of struggles you have and how this influences your art practice?
In media arts we have seen a lot of projects that take some sort of data and transfer it into another kind of interface. For example, you enter a room, some sensors record your movement and the light in this room changes. We have seen thousands of variations of this, some of them go by the notion of generative art.
I understand it is important to explore this interaction at some point – when you are studying media arts, or just when you enter the field, you have to understand these relations. It is fascinating to see how certain data can be brought into form aesthetically. However, I see limitations and a lot of repetition in this genre. The question is always, beyond the ritual spectacle, what can be experienced and learned from that?
This was the motivation to look for something different. I am not saying that I have found a real solution, not at all – I am very uncertain and still exploring. There is this notion by Bertolt Brecht, who said that “petroleum resists the five-act form”. He refers to the point that you have to make up characters who can have a story, to whom something happens which can be dramatised. I turned to narration and looked into how we can talk about datasets and databases and about what they do, how they order things and how they create a discourse.
For Deep Love Algorithm (2013), I invented two characters – Jan and Magda – who would undergo certain transformations in their life. There is a love story between the two. At the same time, they are researching the question of databases. Their love is unfulfilled and their explorations and the promises of databases are also unfulfilled in a certain way.
Another thing I did were the Database Dérives. With a group of ten interested people we walked through the city and tried to explore, just by looking around us, how we could tell that a database was involved.
For instance, when we walked down a street in Berlin, we saw that each tree has a small sign with a number printed on it. Almost certainly there is a database: all those trees have to be in some list, in a cadastre. Then we came across a yellow postbox where you can post your letters. One participant explained that inside this postbox there is a barcode; whenever the post is picked up and brought to the central hub, the person who picks it up scans the postbox’s barcode and the container’s barcode, so the two can be connected to each other. Again, there is a database involved. Of course we also came across an ATM, at which we have to identify ourselves by entering a PIN. Providing a login or a PIN and identifying ourselves in interfaces is also a very good indication that there may be a database behind it.
The Database Dérive was another kind of reaction – not so narrative, yet very communicative among the participants. It refers to Bowker and Star’s notion of infrastructural inversion: the idea of recording the visible ends of an infrastructure. I would see the database as an infrastructure that is hidden from plain sight. By looking at what is still visible, you try to recognize the existence of this infrastructure.
R: I really like the notion of infrastructural inversion as a way to describe that practice. The Data Walk initiated by Alison Powell is perhaps similar to what you described, only in her version you look for how data is being recorded. It inspired us to organise the Data Flaneur: we also walked the streets, but we took the position of dataset creators. In groups, we came up with matters of concern and then set out to gather our data through walks.
C: Do you know the work of Mimi Onuoha on missing datasets?
No.
C: You might find it interesting. There is a GitHub page where she expands on the concept, and one paragraph really stood out to me; I’m reading it now: “The word missing is inherently normative. It implies both a lack and an ought: something does not exist, but it should. That which should be somewhere is not in its expected place; an established system is disrupted by a distinct absence. But just because some type of data doesn’t exist doesn’t mean it’s missing, and the idea of missing data is inextricably tied to a more expansive climate of inevitable and routine data collection.” I really like that she phrases this missing-ness as something normative. As soon as a gap is made visible through an empty cell, there is a need to fill it, like you said before. She argues that even if something might be missing from a table, that doesn’t mean it’s missing completely or that it should be found.
Speaking from the practicalities of table making, empty datasets or ‘missing datasets’ start the process of generating new knowledge. They start the process of questioning: “Okay, why is that empty? Why isn’t there something? Let’s look into it and find new knowledge to fill it in.” At least in the classical table it has been so.
But yeah, maybe I’m missing the point of the problem of normativity.
C: This text exists in the context of a larger research project, of which her art piece The Library of Missing Datasets is also part. Here she lists titles of datasets that do not exist, for example ‘Poverty and employment statistics that include people who are behind bars’. What I get from her work is that data creates a framework through which you view the world. And with the entire framework missing…
…the question is missing.
C: Exactly!
This is the sadness of a whole area of engineering that almost always just looks into solving the specific task it was given, at the same time denying its responsibility for how it solves it, and also for looking further into what could be missing, or whether there are any other questions that need to be asked. So it is up to artists and designers to look into these voids. Although, in my opinion, engineers are in a much more powerful position than artists or designers to implement and to change things. So from this perspective I think the kind of work that Mimi Onuoha does is very important.
It looks like we as artists and designers have to do the work that engineers don’t do.