Siri, What Do You Think of Me?

Some questions to yell down the data mine

“You haven’t seen the new Viagra ads?! They’re ridiculous.”

I have a couple of friends over for dinner, and after a bottle of wine sinks in, our conversation turns to recent highlights in bizarre video advertising (ads: they kinda work!). I wake up my computer to correct their ignorance of the absurd woman-lustily-tossing-a-football-and-applying-perfume spot.

I instinctively open an Incognito tab in Chrome.

Why am I secretly viewing something I am about to show so publicly? Well … I don’t want Facebook to start serving me Viagra ads unbidden or Google Now to helpfully insert facts about impotence, its symptoms and treatment, on my phone’s home screen.

Where I’m going, I don’t want the algorithms to follow.

For me, and I suspect many others, going into the shadows like this has become second nature. I open up a private tab whenever I’m doing something a little off my personal brand, something I wouldn’t want Siri to throw back in my face. I hide from the algorithmic version of myself.¹

“Suppose you’re doing something online, and you don’t want your coworkers know about it. An example scenario would be looking for a new employer while at work! One option would be to do your work, and then clear the data that Firefox has stored for you, such as history, cookies, cache, …. But the problem is that this action will also remove the parts of your online activities data which you don’t want to hide, so the history that Firefox records can no longer be used to find a web site you had visited a month before. Private Browsing will help you here.”

With the unassuming explanation above, Ehsan Akhgari, an engineer working on then-dominant browser Firefox, introduced private browsing to the world in December 2008. It was three years after the founding of YouTube, yet the language reads today as Lassie-level quaint. He imagines a snooping coworker, an untrusting spouse — real live humans who might catch a detail you don’t want them to see.

Akhgari couldn’t have anticipated how much of our daily experience is delivered by today’s data-insatiable products. Seven years later, I use private browsing not to avoid prying human eyes, but to hold back data points from a machine. I use these features to stop the accretion of every term searched, every profile clicked, every video viewed, every blog followed to the platform’s representation of me.

I’m not hiding from my coworkers when I go Incognito; I just don’t want whatever silly thing I tap to change who Google thinks I am. I’m limply asserting the right to define how the machine sees me, not climbing the high horse of some Victorian notion of “privacy.”

ME: What’s with these weird Bluetooth-enabled bike shorts ads everywhere?
ALGO_ME: I clicked a Facebook ad for a smart workout shirt in late 2013! It was pretty neat, I read the marketing material for 4.6 minutes.
ME: Okay, but, maybe it was just because the guy looked fit and that was a shitty period in my personal exercise hab —
ALGO_ME: Great, I love exercise! Going to go ahead and up my score in “weight consciousness & personal wellness” to a 7.8/10
ME: Weight loss?? That’s so not my —
ALGO_ME: Look at this amazing ultra-high-protein nootropic appetite suppressant gel! Great design too. It’s kind of a hipper Soylent. I spent 66.8 minutes reading about Soylent in 2014, and even commented with 80%-likely positive sentiment about it on a friend’s post. It’s perfect.
ME: That was … opposition research … and sarcasm? Fine, fine, show me the gel.

Machine learning products are ubiquitous because they’re extremely useful. Google Now’s ravenous ingestion of all of my email in exchange for notifications of delayed flights and social engagements is … kinda great. More subtly, I would miss many births, promotions, illnesses, and other life events of all but my closest friends if Facebook didn’t have a keen sense that these milestones are why I look at News Feed. Not to mention Foursquare’s making me seem hipper than I am when choosing restaurants on vacation, thanks to its knowledge of every business I have set foot in for six years.

You would be forgiven for believing companies are willing to use as much data as they can get away with, adding your every click to the model of your algorithmic self. The origins of real algorithmic products are much more haphazard. I know this because I’ve had a hand in making several of them.

Startups and even massive platforms develop features so provisionally that there’s often no notion the data byproducts of a feature will ever be used in data mining. At Tumblr, the ♥ button existed for two years before we fed all those heart-ings into an algorithm to generate a (helpful!) list of “Blogs You Might Like.”

[Image caption] Don’t worry, this is here purely for visual interest. No linear algebra lesson today.

As an engineer setting out to create a feature like recommended blogs (a feature I think you’ll really like), I go on a fishing expedition through your actions. I pull in data about what you tapped when, who you follow, who the people you follow follow, how often you log in, and on and on. I assign numeric values to each of your acts and relationships.²

Your past and future life on the product becomes a many-many-dimensional matrix. This matrix is nameless. To me, it is a banal, temporary set of mathematical facts to be compared to others’ matrices to generate a useful recommendation: a blog to follow. But this matrix is you.
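For the curious, here is roughly what that comparison can look like in a few lines of Python. This is a minimal sketch, not the system Tumblr actually built: the action names, the weights in `ACTION_WEIGHTS`, and the blog names are all invented for illustration, and cosine similarity is just one common way to compare two users, chosen here as an assumption.

```python
from math import sqrt

# Hypothetical weights for each kind of action; the numbers are invented.
ACTION_WEIGHTS = {"heart": 1.0, "reblog": 2.0, "follow": 3.0}


def build_vector(events):
    """Turn a list of (action, blog) events into a {blog: score} vector."""
    vector = {}
    for action, blog in events:
        if action not in ACTION_WEIGHTS:
            continue  # ignore actions we have no weight for
        vector[blog] = vector.get(blog, 0.0) + ACTION_WEIGHTS[action]
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, others, top_n=3):
    """Suggest blogs the user has not touched, weighted by user similarity."""
    scores = {}
    for other in others:
        similarity = cosine_similarity(user, other)
        if similarity <= 0:
            continue  # nothing in common, so this person tells us nothing
        for blog, value in other.items():
            if blog not in user:
                scores[blog] = scores.get(blog, 0.0) + similarity * value
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda blog: (-scores[blog], blog))[:top_n]


me = build_vector([("heart", "cats"), ("follow", "bikes")])
stranger_a = build_vector([("follow", "bikes"), ("follow", "gels")])
stranger_b = build_vector([("heart", "knitting")])
print(recommend(me, [stranger_a, stranger_b]))  # prints ['gels']
```

Notice that nothing in `me` has a name or a face. It is a dictionary of numbers, and the gel shows up anyway.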

To my fellow technologists, anthropomorphizing the math this way may seem ridiculous.³ But these algorithm-friendly representations decide what millions of people read, buy, eat, and watch every day. The algorithmic self is only inert and abstract because we technologists choose to keep it hidden in the machine.

Could we name these abstractions and let you see them? We squirm. The sum total of your affinities won’t be flattering. The categories you float through will be abstract and hard to name. But this is a discomfort worth diving into headfirst.

We created a Frankenstein double of you; we could at least finish the job and animate him.

Facebook M Assistant Profile: MATTHEW L HACKETT (preferred name: MATT)

Probabilities (in %) based on volunteered data and past interactions

Demographics:

Male-identified (98%)
Single (81%)
Homosexual (91%)
Computer programmer (92%)
Moderately high HHI (48%)
Urban effete (66%)

Top monetizable interests:

NBA (sporadic, 68%)
Organic shit (84%)
Books made of actual trees with runs of fewer than 25k (97%)
Premium brands that connote more indie cred than LVMH ones but use the same psychological tactics and supply chain (73%)
“Ethnic” food (59%)
Booze with old-timey labels (76%)
Home delivery of basic necessities any competent human would have picked up at CVS himself (91%)

Waking hours: 6:30–7:28am (snooze-dependent) to 11pm ET
Mindless insomnia: 2:30 to 3:30am ET
Days without leaving the house (monthly): ~2.2

Ducking from algorithms by opening a “private” tab is the weakest of hedges. I get the distinct feeling it is like spelling the bad things L-E-T-T-E-R B-Y L-E-T-T-E-R in front of a precocious child: temporarily useful but ultimately futile.

With the rising wave of products with AI at their core, our algorithmic selves are soon to play an even larger role in our real selves’ lives. It would be nice to meet our other halves.

Why shouldn’t apps tell us what they really think of us?

Notes

¹ I wonder about going out in algorithmic drag, purposefully creating false versions of myself just for the machines. On Monday, hop on Twitter as a suburban working mom, getting served up ads for childcare marketplaces and minivans. Tuesday, hit Pinterest in the guise of a hedge fund manager remodeling his Hamptons estate, and see the infinity pool recommendations that come with it. Etc., etc.

² I took some creative license in describing the creation of such a feature here. See this talk by Adam Laiacano, who built the actual version of blog recommendations at Tumblr, to understand the complex technical work involved.

³ More importantly, my use of “algorithmic” as shorthand for a wide range of technology is reductive. At least I didn’t call it “Big Data,” okay?