How To Block Genius.com Annotations

Update, 2016.05.25: I’d recommend using Genius Blender, a simple JavaScript one-liner, over the methods described below. You can read more about the security issues surrounding Genius in my new article for the Verge.

Over the weekend I wrote a tool to break the annotation functionality of genius.com.

Slow down. You wrote a what to do what to who now?

Genius, formerly known as Rap Genius, is a web site that allows users to annotate blocks of text that appear on other sites. It’s very cool technology; you can just visit any page on the internet using a Genius redirect link, and it will show up with all sorts of additional information which has already been appended by other people. I wrote some code which lets site owners break the Genius annotations for their site, as well as a WordPress plugin which makes that code much easier to use.

If it’s cool, why do you want to break it?

There are two sides to that coin. The existence of the technology they’ve developed should be concerning to anybody who wants to put something on the internet. Not everything needs or deserves freeform annotation by users, and some things – some people – may be actively or disproportionately harmed by it. Genius has made special arrangements with some sites, such as the New York Times (which is also my employer), but hasn’t provided a way for smaller users to either opt in or opt out. This means they’re effectively forcing it on everyone.

I’m also firmly of the opinion that we’ll all be better off if functionality like this is handled by a standards body like the W3C, or a non-profit like the Wikimedia Foundation, or at least an open-source software project. Annotations are a pretty fundamental expression of the nonlinear ways we talk, write, and think, so I’m nervous about the possibility that the content and mechanisms could end up owned by a single for-profit tech startup.

Why did you do this now?

A few days ago Ella Dawson wrote a very upsetting blog post about how Genius was functionally equivalent to forcing crude, violent, or hateful user comments onto a web site she created as a safe space to write about the sensitive work she does. When she reached out to Genius for help, the solution they suggested was “don’t look at the annotations.” This bothered me, so I stayed up all night tinkering and figured out how to make a defensive tool.

Actually, you know what? Let me just quote Ella’s post, which you should really go read in its entirety:

A creator receives no notification if someone has annotated their content. Opening my post using Genius was like discovering graffiti over some of my most personal work. Annotations display more like passive aggressive Post-It notes, but for someone who has been gaslit by partners, diminished by journalists, and harassed by mobs online, Genius annotations are an invasive violation.

I am nervous to publish this post because I know it will be annotated, and not in good faith. I am afraid to talk about how Genius can be used for harassment and abuse because Genius’s code offers no way for me to protect myself from the harassment and abuse I will receive for writing about it. Considering one of the people who annotated my blog is a News Genius editor, I’m not confident we agree on what harassment and abuse even is.

News Genius was probably created as a way to speak truth to power, but it has incredible potential to punch down. I am not a highly paid journalist at a huge publication; I am a survivor with a blog. You can hate-read my content all you want—I know that is a risk of being a person who says things on the Internet. But when you create a tool that pastes commentary directly on top of my work without letting me opt-in and without providing a way for people to turn off the annotation on their pages, you are being irresponsible. You are ignoring the potential your tool has to be abused, and you are not anticipating the real harm your tool can do. News Genius adds one more way for people on the Internet to be made unsafe. The potential it has to intimidate and silence marginalized voices needs to be recognized. Snarky journalists are not who I am afraid of. A tool that allows my abusive ex-boyfriend to interact with me and my content is a tool that should not exist.

When I reached out to Genius and News Genius to ask how I could opt-out of annotations on my website, I was told to not use the Genius URL or its extensions. As I had not downloaded the extension in the first place, their advice was basically just don’t look at the annotations.

That response is completely unacceptable, both as a customer support exchange and as a general guiding corporate principle. Harassment is a huge problem on the internet; even massive tech companies like Facebook and Twitter still have trouble with it, and they built and control their own products. The annotation functionality Genius provides just piggybacks on other things that you and your friends build, write, and post, so the bar should be even higher. I hope the existence of this code will force them to step up and dedicate substantial resources to handling this issue.

This is great! You have saved us all from certain doom!

Not so fast. There are still some weaknesses that would make it easy for Genius to undermine this tool if they so desired. I think that would be a horribly fucked up thing for them to do, but it’s certainly within the realm of possibility. We’ll get to all that in a minute.

How do I use it?

Code and instructions for developers who want to implement the text filter are posted on GitHub, and it’s very easy to rebuild the same logic for additional programming languages. If you have a WordPress blog, you can also just upload the files and it should automatically appear as a plugin in your admin settings (although you can only do this if you have a self-hosted site, rather than one provided by the WordPress.com service). There’s also a simpler JavaScript version available for use in Node.js.

How does it work?

Many thanks to Genius CEO Tom Lehman for explaining precisely how to break Genius! A few months back, he gave a talk at their now-defunct networking event for software engineers in which he explained the text-matching algorithm. Its vulnerabilities became apparent with a little digging.

Here’s the talk:

But hey, you don’t actually have to watch it! There’s a lot of talk about the Bitap algorithm and diff-match-patch, but as I started reading up on both, I quickly came to the realization that it would be much easier to tear it all down than it was to build. I didn’t actually have to dive too deep once I realized that this all essentially depends on sequences of characters. When we screw with that, Genius falls apart immediately.

Sequences of characters?

Right, okay, maybe I should explain in greater detail what this code actually does. When you visit a Genius redirect link, well, there’s really nothing I can do for ya, because I can’t access their servers. What we can do, however, is make the material they’re trying to annotate incompatible with the way they do annotations.

In order to be useful, an annotation needs to have a reference point – in the talk above, Tom refers to these as “anchors.” These are the passages of text you click on to reveal the annotations; Genius displays them in yellow. Tom’s talk goes into some detail about how they connect annotations to anchors, which is a non-trivial problem. Those algorithms I mentioned are basically ways to teach a computer to keep recognizing text even after it changes – for example, if you edit a word in the middle of a sentence. They work by measuring character sequences. That is a very easy thing to destroy if you’re willing to be extremely reckless.

Uh oh.

Yeah.

So?

Buckle up, this is where it gets ugly.

The Genius text-matching algorithm works because it keeps track of characters that appear in succession, so even if some of the content changes, it still has an overall picture of what’s going on. It assigns weighted values to the character matches, and then calculates the most likely annotation anchor based on those values. All we need to do is prevent successive character matches.
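Here is a rough Python sketch of that idea. It is not Genius's implementation: it swaps in the standard library's difflib.SequenceMatcher for the Bitap and diff-match-patch machinery from the talk, and the function name best_anchor_match and the sample sentences are made up for illustration. It slides a window across the page and scores each position by how many characters still line up, in order, with the saved anchor.

```python
from difflib import SequenceMatcher
from typing import Tuple


def best_anchor_match(anchor: str, page_text: str) -> Tuple[int, float]:
    """Return (start_index, similarity) for the window of page_text
    that most closely resembles the anchor."""
    window = len(anchor)
    best_start, best_score = -1, 0.0
    for start in range(max(1, len(page_text) - window + 1)):
        candidate = page_text[start:start + window]
        # ratio() rewards runs of characters that appear in the same order.
        score = SequenceMatcher(None, anchor, candidate, autojunk=False).ratio()
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# The anchor was saved when the page said "brown"; the page now says "red".
anchor = "the quick brown fox jumps"
edited_page = "Yesterday the quick red fox jumps over the lazy dog."

start, score = best_anchor_match(anchor, edited_page)
print(start, round(score, 2))
print(repr(edited_page[start:start + len(anchor)]))
```

Even with a word swapped out, the surviving runs of characters on either side are enough to pin the anchor to roughly the right spot. Those runs are exactly what the scrambling goes after.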

Surely you don’t mean…

This code works by rewriting the original text using patterns generated randomly on the fly, injecting extra characters everywhere (I labeled them “wrenches” in the code) until the Genius servers decide it is completely incoherent and they don’t know where to put any of the annotations. But the original content is still readable by people, because all those extra characters are invisible!
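A minimal sketch of the scrambling step, in Python rather than the language of the actual tool. The specific invisible characters below (four zero-width Unicode code points) and the one-to-three count are assumptions for illustration; the real code on GitHub may choose differently. Only the name "wrenches" comes from the tool itself.

```python
import random
from typing import Optional

# Assumption: these zero-width code points stand in for the "wrenches".
# The real tool may use a different set.
WRENCHES = ["\u200b", "\u200c", "\u200d", "\u2060"]


def scramble(text: str, rng: Optional[random.Random] = None) -> str:
    """Follow every character with one to three randomly chosen invisible
    characters, so no two original characters stay adjacent."""
    rng = rng or random.Random()
    pieces = []
    for ch in text:
        pieces.append(ch)
        for _ in range(rng.randint(1, 3)):
            pieces.append(rng.choice(WRENCHES))
    return "".join(pieces)


original = "Hello, world"
scrambled = scramble(original)

# Dropping the wrenches gives back exactly what a human reader sees.
visible = "".join(ch for ch in scrambled if ch not in WRENCHES)
print(visible == original)
print(len(original), len(scrambled))
```

Because the pattern is drawn at random on every call, two renders of the same page will almost never produce the same character sequence. This sketch ignores markup entirely; anything applied to a real page would need to leave HTML tags and attributes alone and touch only the visible text.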

Oh my, that sounds completely unreasonable!

Totally, in several different ways. As a developer, I consider this tool a totally insane thing to have had to build in the first place, because it is completely counterproductive. The extra characters increase the size of the pages, so they take longer to serve over a network. It doesn’t work with screen readers, unfairly breaking access for the visually impaired. And obviously it should not be incumbent upon the entire internet to alter all existing content in order to work around Genius.
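A back-of-the-envelope check on the size cost, assuming a single zero-width space (U+200B, which takes three bytes in UTF-8) after every character. The real tool picks its own characters and counts, so treat the number as illustrative only; the function name utf8_growth is made up for this example.

```python
ZERO_WIDTH_SPACE = "\u200b"  # assumption: one possible invisible character


def utf8_growth(text: str, wrenches_per_char: int = 1) -> float:
    """Return scrambled size divided by original size, in UTF-8 bytes."""
    if not text:
        return 1.0
    scrambled = "".join(ch + ZERO_WIDTH_SPACE * wrenches_per_char for ch in text)
    return len(scrambled.encode("utf-8")) / len(text.encode("utf-8"))


# Five ASCII bytes become twenty: each letter gains a three-byte companion.
print(utf8_growth("hello"))  # 4.0
```

Compression on the wire would probably shrink that gap, though that isn't measured here.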

But hey, it’s something. Humans can still read the text. It’s still copy/paste friendly in most cases – works fine in both Microsoft Word and Google Docs, though you might see the extra characters if you copy and paste from an obscured passage on a web page into a code editor. It might even be possible to write a reverse implementation of this in JavaScript, so the page is rendered with scrambling and then unscrambled by the user’s browser UI; in that case, the only people seeing the scrambled content would be Genius and other would-be scrapers.
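The reverse step is simple in principle: delete the invisible characters. Here is a minimal sketch, written in Python for illustration rather than as the browser-side JavaScript described above. It assumes the wrenches are the four zero-width code points in the pattern, which is a guess and not something taken from the actual tool.

```python
import re

# Assumption: the scrambler only ever inserts these zero-width code points.
WRENCH_PATTERN = re.compile("[\u200b\u200c\u200d\u2060]")


def unscramble(text: str) -> str:
    """Remove every assumed wrench character, leaving the visible text."""
    return WRENCH_PATTERN.sub("", text)


scrambled = "H\u200be\u200c\u200dl\u2060l\u200bo"
print(unscramble(scrambled))  # Hello
```

One caveat with this approach: it would also strip any zero-width characters the author put in the text on purpose.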

And this all happens on the fly, at the last minute, just before the web page is rendered onto the screen, so the original text upstream can remain untouched. Hopefully eventually you will be able to switch back to just rendering that again, once Genius starts providing a more sensible way to opt out.

Can Genius do anything about this?

Yes, this is just a small token step in an arms race I’m probably destined to lose. They could disable this in a heartbeat simply by stripping out the invisible characters before performing their text-matching analyses. (Update: not true, per Genius engineering lead Mat Brown.) That would defeat this method, but it would also be a horrific moral failing on the company’s part. They’d be actively overruling users who have identified themselves as vulnerable and taken substantial measures to remove their material, simply because they feel entitled to co-opt the whole world’s content.

In other words, if at some point in the future you try to use this code and it no longer subverts the annotations, there’s a good chance you should be absolutely livid. That would likely mean that Genius has decided to specifically reverse this text scrambling method in order to force their product on, I don’t know, rape survivors and suicidal people and whoever else had previously opted out for reasons Genius surely will not have responsibly investigated. (Further reading: regarding “don’t read the comments”.)

To be clear: as far as I can tell this code breaks the primary algorithm Genius is built on, and they can only beat it by deliberately doing something despicable. Let’s see!

I tried it, but some Genius stuff still shows up in yellow?

Well, yeah, sort of. As I said, I can’t change anything about what happens when you visit a page loaded through the Genius server. This code breaks their ability to attach the annotations to text on the page, but Genius has apparently not considered the possibility that their fancy algorithm will completely fail to match, so they didn’t build a failsafe that suppresses the annotations entirely when that is the case. (This is a strange omission, because it isn’t just a weird edge case relevant to nerds writing code to deliberately scramble the content – what if someone completely swaps out the page text for something new?)

Because I suspect they did not consider the possibility of this catastrophic failure, one or more Genius annotations may still appear in your scrambled text, but they will highlight nonsensical passages which will change every time the page reloads. Maybe this is actually better, though, since a broken user-facing product makes the company look even worse than an outright block.
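For what it's worth, the kind of failsafe described above could be as small as a score cutoff. The sketch below is hypothetical throughout: it uses Python's difflib.SequenceMatcher as a stand-in for whatever Genius actually runs, the 0.6 threshold is an arbitrary number, and the scrambled page is a crude imitation with three zero-width spaces after every character.

```python
from difflib import SequenceMatcher
from typing import Optional

MIN_SCORE = 0.6  # hypothetical cutoff; any real threshold would need tuning


def locate_or_suppress(anchor: str, page_text: str,
                       min_score: float = MIN_SCORE) -> Optional[int]:
    """Return the start index of the best fuzzy match for the anchor,
    or None when even the best match scores below min_score."""
    window = len(anchor)
    best_start, best_score = None, 0.0
    for start in range(max(1, len(page_text) - window + 1)):
        candidate = page_text[start:start + window]
        score = SequenceMatcher(None, anchor, candidate, autojunk=False).ratio()
        if score > best_score:
            best_start, best_score = start, score
    return best_start if best_score >= min_score else None


anchor = "the quick brown fox jumps"
page = "Yesterday the quick brown fox jumps over the lazy dog."
# Crude stand-in for the scrambler: three zero-width spaces after every character.
scrambled_page = "".join(ch + "\u200b" * 3 for ch in page)

print(locate_or_suppress(anchor, page))            # 10
print(locate_or_suppress(anchor, scrambled_page))  # None
```

Without the cutoff, the function would still hand back its least-bad guess for the scrambled page, which is the same flavor of nonsense highlight described above.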

Is this really the best way to go about it?

Maybe not. It would also be entirely possible to block Genius using JavaScript – you could write a script to delete your own page content if viewed through their redirects, or try to strip out their special «genius_referent» tags or something. But those approaches would have to be implemented client-side, and they wouldn’t affect the ability to collect the annotations in the first place. I think this tool is doing something deeper by providing a way to make text fundamentally un-Genius-able, and maybe that will prove more useful in the long run.

What would you like Genius to do instead?

There are many ways to solve this problem and empower the users, so if Genius hasn’t implemented any of them, it’s a deliberate decision. I don’t really care how they get there. Seems like it would be easy enough for them to start respecting robots.txt, a simple, widely adopted convention that site owners use to tell search engines and other crawlers which parts of a site to include and which to ignore. Respecting robots.txt is voluntary, but it is one of the signs of a responsible technology company.
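As a sketch of how little work that would be, here is a robots.txt check using Python's standard urllib.robotparser module. The user-agent token "ExampleAnnotator" and the example.com URL are invented for illustration; nothing here implies that Genius publishes or honors such a token.

```python
from urllib.robotparser import RobotFileParser

# A site owner opting out of one hypothetical annotation crawler
# while leaving every other crawler alone.
ROBOTS_TXT = """\
User-agent: ExampleAnnotator
Disallow: /

User-agent: *
Allow: /
"""


def may_annotate(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/a-personal-post"
print(may_annotate(ROBOTS_TXT, "ExampleAnnotator", url))  # False
print(may_annotate(ROBOTS_TXT, "SomeSearchBot", url))     # True
```

A service that ran a check like this before loading a page through its redirect would give every site owner a two-line opt-out.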

What now?

The code is completely free and open without any licensing restrictions whatsoever, so any company or site owner can use it, fork it, re-implement it in other languages, and customize the logic. Anybody who customizes the logic creates a new scrambling pattern which Genius will have to specifically devote engineering resources to un-scrambling! I’m not going to win the arms race on my own, but maybe with the help of every interested site owner on the internet? This is why we open source things.

Look, I don’t think Genius annotations are inherently bad — they’re perfectly innocuous in most contexts, and cool in a small handful. But at their core, they scrape content, and thus are unquestionably parasitic. Unilaterally forcing that on the entire internet through a sin of omission is not a mature or reasonable way to treat the rest of the world.

More

Motherboard
Observer
Vox
Recode
Politico
The Daily Dot
New York
Fortune
Technically Brooklyn