ESPN, Race, and Presidents

Inspired by (and lifting large amounts of code from) Trey Causey’s investigation of the language that ESPN uses to discuss white and non-white quarterbacks, I similarly wondered about the language ESPN uses to discuss white and non-white Presidents. For instance, a common stereotype is that non-white Presidents assassinate their citizens using unmanned drones, while white Presidents assassinate their citizens using polonium-210. Do such stereotypes creep into sportswriting?

Toward that end, I used Scrapy to scrape all the articles from the ESPN website that matched searches for (president obama), (president bush), (president clinton), and so on. This gave me a total of 543 articles. Then, using Wikipedia, Mechanical Turk, and a proprietary deep learning model, I categorized each of these Presidents as either “white” or “non-white”.

Using NLTK, I tokenized each article into sentences and then identified each sentence as being about

  • one or more white Presidents
  • one or more non-white Presidents
  • both white and non-white Presidents
  • no presidents
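For the curious, the sentence-labeling step can be sketched roughly as follows. This is not the code from the repo, just a minimal illustration under stated assumptions: a regex split stands in for NLTK's sentence tokenizer, and `PRESIDENT_LABELS`, `split_sentences`, and `classify_sentence` are hypothetical names invented for this example.

```python
import re

# Hypothetical lookup table (an assumption for illustration); the real
# categorization came from the process described above and is not shown.
PRESIDENT_LABELS = {
    "obama": "non-white",
    "bush": "white",
    "clinton": "white",
}


def split_sentences(text):
    """Naive stand-in for NLTK's sentence tokenizer: split after ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify_sentence(sentence):
    """Return 'white', 'non-white', 'both', or 'none' for one sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    labels = {label for name, label in PRESIDENT_LABELS.items() if name in words}
    if labels == {"white", "non-white"}:
        return "both"
    if labels:
        return labels.pop()
    return "none"
```

Matching on bare surnames is about as rudimentary as it gets (it can't tell one Bush from another, or from shrubbery), which is in keeping with the spirit of the exercise.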

Curiously, while there were very few “non-white” Presidents, there were nonetheless about four times as many “non-white” sentences as “white” sentences. (This is itself an interesting phenomenon that’s probably worth investigating.)

I then split each sentence into words and counted how many times each word appeared in “white”, “non-white”, “both”, and “none” sentences. Like Trey, I followed the analysis here, similarly excluding stopwords and proper nouns, which I inferred based on capitalization patterns.
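A minimal sketch of that counting step might look like the following. The tiny `STOPWORDS` set, the `count_words` helper, and the exact capitalization rule (skip any capitalized word that isn't sentence-initial) are all assumptions made for illustration; the actual analysis presumably used NLTK's stopword list and may have treated capitalization differently.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption); NLTK's list is much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "his", "he", "was", "by"}


def count_words(labeled_sentences):
    """Count words per label.

    labeled_sentences: iterable of (label, sentence) pairs.
    Returns a dict mapping each label to a Counter of lowercase words.
    """
    counts = defaultdict(Counter)
    for label, sentence in labeled_sentences:
        tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)
        for position, token in enumerate(tokens):
            # Crude proper-noun heuristic: capitalized and not sentence-initial.
            if position > 0 and token[0].isupper():
                continue
            word = token.lower()
            if word in STOPWORDS:
                continue
            counts[label][word] += 1
    return counts
```

Calling `count_words` on pairs like `("white", "He was unable to grab the plaque.")` yields one Counter per sentence category, which is all the next step needs.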

Finally, for each word I computed a “white percentage” and “non-white percentage” by looking at how likely that word was to appear in a “white” sentence or a “non-white” sentence and adjusting for the different numbers of sentences.
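The exact adjustment isn't spelled out above, so here is one plausible version, offered strictly as a sketch: divide each word's count by the number of sentences in that category, then rescale the two rates so they sum to 100. The `adjusted_percentages` function and its argument names are hypothetical.

```python
def adjusted_percentages(word, word_counts, sentence_totals):
    """Return (white_pct, non_white_pct) for `word`, or None if it never appears.

    word_counts: {label: {word: count}}
    sentence_totals: {label: number of sentences with that label}
    Assumed normalization: per-sentence rates, rescaled to sum to 100.
    """
    rates = {}
    for label in ("white", "non-white"):
        rates[label] = word_counts[label].get(word, 0) / sentence_totals[label]
    total = rates["white"] + rates["non-white"]
    if total == 0:
        return None
    return (100 * rates["white"] / total, 100 * rates["non-white"] / total)
```

Under this scheme a word that shows up 2 times in 100 "white" sentences and 8 times in 400 "non-white" sentences comes out as an even split, since both categories use it at the same per-sentence rate.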

After all that, here are the words that were most likely to appear in sentences about “white” Presidents:

plaque 5
severed 4
grab 4
investigation 3
worn 3
unable 3
child 3
suppose 3
block 3
living 3
holders 3
pounds 3
ticket 3
blackout 3
thrown 3
exercise 3
scene 3
televised 3
upon 3
executives 3

Clearly this reads like something out of “CSI” or possibly “CSI: Miami”. If I were to make these words into a story, it would probably be something macabre like

The President grabbed the plaque he’d secretly made from a living child’s severed foot and worn sock. The investigation supposed a suspect weighing at least 200 pounds who could have thrown the victim down the block, not a feeble politician famous for his televised blackout when he tried to exercise but was unable to grab his toes.

In contrast, here are the words most likely to appear in sentences about “non-white” Presidents:

bracket 32
interview 21
trip 16
champions 16
fan 48 1
asked 35 1
carrier 11
celebrate 11
thinks 11
early 11
eight 11
personal 10
picks 10
appearance 10
far 9
hear 9
congratulating 9
given 9
troops 9
safety 9
fine 9
person 9

This story would have to be something uplifting like

The President promised to raise taxes on every bracket before ending the interview. As a huge water polo fan, he needed to catch a ride on an aircraft carrier for his trip to celebrate with the champions. “Sometimes I get asked,” he thinks, “whether it’s too early to eat a personal pan pizza with eight toppings. So far I always say that I hear it’s not.” His safety is a given, since he’s surrounded by troops who are always congratulating him for being a fine person with a fine appearance.

As you can see, it has a markedly different tone, but not in a way that obviously correlates with the stereotypes mentioned earlier. Whatever prejudices lurk at ESPN are exceedingly subtle.

Obviously, this is only the tip of the iceberg. The algorithm for identifying which sentences were about Presidents is pretty rudimentary, and the word-counting NLP techniques used here are pretty basic. Another obvious next step would be to pull in additional data sources like Yahoo! Sports or SI.com or FOX Sports.

If you’re interested in following up, the code is all up on my github, so have at it! And I’d love to hear your feedback.

Three Keys to Successful Parenting

Now that Madeline is two, it seems appropriate to declare myself a success as a parent. Which means it’s now appropriate for those of you with kids (as well as those of you thinking about having or abducting kids) to ask me, “Joel, what’s your secret?” Which means it’s now appropriate for me to say “I’m glad you asked,” and then write a blog post about it.

1. Improv

I’m sure many of you wondered why I took all those improv classes, and why I made you come watch my improvised musical where we could only use words that started with a letter suggested by the audience, and why I didn’t stop the guy in the second row from choosing ‘X’, and why my song “Xerox Xevious” sounded exactly like “Summer of ’69.”

Well, it turns out that improv is a very easy way to become a better parent. (And that all of my songs sound exactly like “Summer of ’69”.)

Before improv

“Daddy, can I have some more candy?”
“No. Go to bed.”

After improv

“Daddy, can I have some more candy?”
“Yes, and after your teeth rot and you become obese and get diabetes and have to have your foot amputated, then you should go to bed.”

Before improv

“Daddy, where do babies come from?”
“Go ask your mother.”

After improv

“Daddy, where do babies come from?”
[sits down on a plain black box, mimes that it's maybe some kind of pirate seat on some kind of pirate boat, and starts in a pirate accent] “Yarr, ye land lubbers always be asking me questions about babies … [10 minute monologue in a pirate voice about pirate-y things that cleverly reincorporates elements from earlier in the conversation] Arr, go ask the first mate!”

Before improv

“Daddy, I need to go to the bathroom.”
“Again? You just went!”

After improv

“Daddy, I need to go to the bathroom.”
“DING! Now in the style of Shakespeare.”
“Daddy, I need to go to the bathroom!”
“DING! Now in the style of film noir.”
“Daddy, I NEED to GO to the BATHROOM!”
“DING! Now in the style of a fetish video.”
“Daddy, I peed my pants.”
“And scene!”

2. Radical Libertarianism

Most books (with the notable exception of *Praxeological Parenting*) will tell you that moderate libertarianism is all you need to be a good parent. But there are a great many parenting problems that a belief in the night-watchman state does little to solve.

For instance, when your kid doesn’t want to go to school because it’s a brainwashing factory designed to grind young impressionable minds into submission by (among other things) forbidding them from leaving their seats or talking “out of turn” or using the restroom without first obtaining permission, the moderate libertarian answer is typically to offer them a voucher that covers the tuition to a different brainwashing factory. Your kid is unlikely to find this satisfying, for obvious reasons.

Similarly, when your kid wants to BitTorrent the Criterion Director’s Cut version of Dora the Explorer, the wishy-washy moderate libertarian “you wouldn’t download a Dora the Explorer handbag!” position on intellectual property is not going to make her particularly happy.

And what will you tell her when she asks (as all kids inevitably do) how granting a monopoly on violence could possibly be a good way to prevent monopolies and violence? Or why the dinosaurs on “Dinosaur Train” are able to peaceably resolve their various conflicts despite living approximately 66 million years before the invention of government? Or why it’s OK for the government to take pieces of paper out of daddy’s wallet just as long as they don’t take too many, while she gets punished for taking even one, and don’t try to give me any of that John Rawls “veil of ignorance” stuff, I might have bought that crap when I was an infant, but now that I’m TWO YEARS OLD the flaws in his “logic” are pretty glaringly obvious?

Whereas radical libertarianism easily sidesteps all these problems, making parenting a breeze (relatively speaking).

3. Trolling

Did you ever imagine that all those years you wasted trolling that idiot Marxist kid on LiveJournal debate would end up being useful? Because they are! Kids love being trolled! Love it! Here are a few of Madeline’s favorite trolls:

“My Hippo”

This one’s easy, you just pick up something that belongs to the kid (e.g. a stuffed hippo) and troll that it’s yours:

“Hey, my hippo.”
“No, MY hippo!”
“I’m pretty sure this is daddy’s hippo.”
“No, MY hippo!”
“Does it have your name on it?”
“MY hippo!”
“It was just lying on the floor and I homesteaded it.”
“MY hippo!”
“Have your protection agency call my protection agency and maybe we can work something out.”
“MY hippo!”
“Behind the veil of ignorance it could just as easily have been my hippo.”
“MY hippo!”
[ several hundred lines of dialogue removed due to space constraints ]
“Yeah, but what does it really mean to ‘own’ something?”
“MY hippo!”
“And scene!”

“Science Project”

Part of being a parent is helping your kids with science projects, so help them “demonstrate” something that isn’t real, like cold fusion, or quantum computing, or evolution. Chances are their teachers won’t know the difference, which makes it also work on another level.

“9/11 Trutherism”

Kids will believe just about anything, even that that third WTC 7 skyscraper would just collapse on its own despite not even being hit by a plane. Even so, it’s not very hard to convince them that the towers were brought down on 9/11 by controlled demolition using explosives secretly planted in advance by the government in order to create an excuse to invade Iraq and Afghanistan in order to pave the way for a new American hegemony. And then they’ll repeat this on the playground, and then you’ll get called in for a parent-teacher conference at which you can reveal that you’d assumed that she’d picked these theories from the playground, which means that if she didn’t then maybe she just came up with them on her own? And that if the official narrative is so shoddy that a 2-year-old can pick holes in it, then maybe Alex Jones is onto something!

“The Craigslist Experiment”

OK, so possibly there are some kinds of trolling kids don’t like.

Should you get a Ph.D.?

No.

Vegas with a Lap Infant

Madeline is about to turn two, which is the magical age at which kids transition from fly-for-free lap infants to requires-a-ticket-and-some-sort-of-kid-specific-restraint-and-did-I-mention-a-ticket seat toddlers. Which meant we needed to squeeze in one last vacation. And since Seattle weather kind of sucks, we wanted to go somewhere where the weather was nice. And since flying with a lap infant also kind of sucks, we wanted to go somewhere that wasn’t too far away. Hence Vegas.

You might think Vegas an unorthodox place to take a two-year-old. Now that I’ve finally been here, I’m inclined to agree with you. Nonetheless, with a few caveats, Vegas is an awesome place to bring a lap infant.

1. You have to like to walk

Really, you have to like to walk. I forgot to own a pedometer, but based on the amount of grime that has accumulated on my shoes and a fairly elaborate spreadsheet, I estimate that we’ve been walking somewhere between 3 and 5 miles a day. Generally speaking, we are not stroller people, we are “let Madeline walk when she wants to, and carry her the rest of the time” people. This works fine when you walk about a mile a day. This does not work fine when you walk five miles, and our first day here ended with severe backaches.

Naturally, we didn’t even bring a stroller, so on the second day I hoofed it another 1.5 miles to the nearest Target and bought their cheapest $20 stroller, which was pink. (Then I took a bus back and got yelled at for trying to bring a coffee on the bus, where do you think you are, Seattle, and got chatted up by a junkie who assured me that if he had kids he never would have started using.) Being a $20 stroller, it is a complete piece of junk, and so of course Madeline has grown completely attached to it, has named it (“Pink”, imaginatively), and will probably cry when I throw it into the dumpster behind the hotel at check out, as is my plan.

Anyway, just about everywhere on the Strip is at least a 30-minute walk from anywhere else on the Strip. There’s kind of no way around this. Say you want to support your Wazzou Cougs, who are playing basketball in the Pac-12 Tournament, which — in order to show that gambling on college sports is in no way acceptable — is being held at the MGM Grand. Aha, you think, to make things convenient I’ll just stay at the MGM Grand myself. What you failed to account for is that the MGM Grand is itself a 30-minute walk from the MGM Grand, past a Rainforest Cafe, several Joël Robuchon Ateliers, and about a gazillion slot machines with Gen-X enticing themes like “Ghostbusters” and “Ghostbusters II” and “On Our Own (Theme from Ghostbusters II)”.

Additionally, in most non-Strip parts of the world, if you can see something it is generally close by. However in Las Vegas all of the hotels are built at grotesquely unintuitive scale, so that if you can see (say) the Bellagio then it’s likely (but not certain) that you could probably walk there in less than an hour, although your walk — despite both starting and ending at street level — will involve a bewildering variety of elevation changes, most of which involve escalators that you will get yelled at by security for bringing a stroller on, requiring you to ride a bewildering variety of foul-smelling elevators with a bewildering variety of obese people riding a bewildering variety of rented mobility scooters.

2. You have to like to eat

Lap infants are not allowed to gamble, are not allowed near gambling, not even if you just want to sit in the Rockin’ Sensory Immersion Surround Sound Gaming Chair of the KISS slot machine one more time so that you can “UNLOCK THE STARCHILD”. Lap infants are not allowed to see PEEPSHOW, featuring Coco of E!’s “Ice Loves Coco”. Lap infants are not allowed into the bar at Cabo Wabo, Coyote Ugly, or the Tabú Ultra Lounge.

They are, however, allowed into buffets, which all have a “kids 3 and under eat free” policy, which makes them good places for your lap infant to practice eating with utensils, since even if she drops every spoonful of creme brulee on the floor or her lap you can just grab a few more ramekins and try again, and even if she pukes up an entire cheese omelet you can just get another one.

Suffice it to say that we ate a lot of buffets in Las Vegas. Here is how I would rank them:

1. The Bacchanal Buffet at Caesars Palace
2. The Wicked Spoon Buffet at the Cosmopolitan
3. just about every other buffet in Las Vegas
4. Le Buffet aux Paris Las Vegas

Supposedly there are also non-buffet places to eat in Vegas, many of them named after chefs who have appeared on television programs and/or have French-sounding names. I wouldn’t know anything about those.

3. You have to like to spend money

Vegas is not cheap. Sure, you could stay at Terrible’s, where I think they actually pay you to sleep and eat, and where the $9.99 Sunday Champagne Buffet Brunch is deservedly legendary. But it is a long, long walk from the Strip, past a variety of foul-smelling homeless people, and past the same three HOT ASS ESCORTS advertisement dispensers over and over and over again. (Also, the hipsters at Yelp are kind of down on the place.)

However, if you want to stay and eat at one of the casinos named after birds, or dead people, or capitals of France, it’s going to cost you. If you want to eat at one of the buffets where “angry” describes the mac and cheese and not the service, it’s going to cost you. If you want your frozen sex-on-the-moon grape-raspberry daiquiri in the 32-ounce souvenir neck-lanyard yard-tube container, it’s going to cost you. And then you look back and realize that all the money you saved not buying the baby a plane ticket you spent on a dessert named after Emeril Lagasse and on getting your picture taken with a weirdo dressed like SpongeBob SquarePants dressed like a showgirl.

4. You have to like kid-friendly activities

Surprisingly, there are a few kid-friendly activities in Vegas. Lap infants are kind of at that sweet spot where they like to look at flashing lights and captive flamingos and garish costumes, but where they are too young to ask awkward questions like “Daddy, what’s a ‘hot ass escort’?” and “Daddy, isn’t it cruel to clip flamingos’ wings and put them on display for a bunch of drunken gamblers?” and “Daddy, what does ‘Cabo Wabo’ mean?” at which point you have to have the talk about the unlistenable “Van Hagar” years.

The Circus Circus (“What kind of circus?” “A circus circus!”) has an “AdventureDome” that contains three rides suitable for lap infants (who ride free as long as their parent buys a $5 ticket), one of which is a terrifying school-bus-themed ride which helps prepare lap infants for their mind-numbing trips through the public education system.

The Mandalay Bay (“What kind of bay?” “A Mandalay bay!”) has a “Shark Reef” that is not actually a reef (due to aquarium acidification, I suppose) but does have a handful of sharks and a manta ray petting zoo that’s surprisingly fun to frighten lap infants with.

The Excalibur (“What kind of caliber?”) has a “Tournament of Kings”, which involves horses and swords and broasted chicken and pyrotechnics and a mediocre A/V system that makes it impossible to understand whether Merlin the Wizard is telling you that you’re supposed to tip your servers or that you’re not supposed to tip your servers.

The Bellagio has a pretty incredible fountain show where they play Lee Greenwood and shoot water around in patriotic patterns, and the Mirage has a pretty incredible volcano show, which is fun to explain to your lap infant as a manifestation of the gods’ anger, which can only be assuaged by throwing a lap infant into the volcano.

If your lap infant has reached the age of obsession with choo choo trains, then you can spend the day riding the Las Vegas Monorail (after a bewildering trek through one of the casinos using a bewildering variety of elevators to reach one of the stations), where she can happily yell out “choo choo train!” over and over again all the while watching a bunch of drunk bros putting their lamest moves on a group of amateurishly-tattooed girls from Canada (“whoa, you’re from Canada, that’s so awesome, eh!”).

There is also a supposedly-family-friendly “Tribute to Red Skelton” show, which Madeline refused to see for political reasons.

All that said, bringing a lap infant also means you can’t eat at one of the Joël Robuchon Ateliers or see the “Steve-O and Tom Green Stand-Up Comedy Extravaganza” or slap Kathy Griffin, not unless you’re willing to pawn your father’s watch in order to afford the services of a Vegas Babysitter, who is sort of like a nanny except infinitely more expensive. (And you would have already had to pawn your father’s watch in order to put a deposit down on your Joël Robuchon meal anyway.)

In conclusion, Vegas is sort of like Disneyland for lap infants, except

(a) Vegas is cheaper
(b) Vegas is more fun
(c) the Mickey Mouse impersonators in Vegas have crappier costumes
(d) Vegas is marginally less evil

Highly recommend!

Secrets of Fire Truck Society

Hi, I gave a talk at Ignite Strata on “Secrets of Fire Truck Society” and at the end I promised that for more information you could visit this blog. Unfortunately, I haven’t had time to write a blog post. Here are some links to tide you over until I do:

On On Leaving Academia

Several people in my influencesphere have linked to this essay by a CS prof who’s leaving academia to join Google in order to “make a positive difference in the world.” I am, of course, wholly supportive of such a program, if not of his precise rationale, which is a mish-mash of ranting about wicked Republicans and wild-eyed idealism about the Academy.

What interests me most about his essay is the section entitled “Mass Production Of Education”, which is misguided in all the ways you’d expect from someone steeped in the culture of “bespoke” education. It lists three “worries”:

First, I worry that mass-production here will have the same effect that it has had on manufacturing for over two centuries: administrators and regents, eager to save money, will push for ever larger remote classes and fewer faculty to teach them.

Said differently, technologies that allow fewer faculty to teach the same number of students will allow universities to operate with fewer faculty. Let’s call this worry “Luddism”. I love a good loom-smashing as much as the next guy, but it’s sort of hard to take seriously a preference for the 19th-century manufacturing regime.

It seems likely that in a hundred years our grandchildren and those of us who’ve successfully been cryonically revived will share a laugh about how “education” used to involve crowding people into a room and making them sit still while someone stood up front and lectured at them. And then someone will brain-cast a ludicrous hyper-essay about how 4-D printing is democratizing the singularity, pining for the good old days of 3-D printing. And so on.

Second, I suspect that the “winners win” cycle will distort academia the same way that it has industry and society. When freed of constraints of distance and tuition, why wouldn’t every student choose a Stanford or MIT education over, say, UNM?

Said differently (and with apologies to UNM, which I’m sure is a fine school), if every student has access to cheap, high-quality education, few of them will choose to pursue a low-quality education. It is easy to see how purveyors of low-quality education might worry about this, but it’s hard to imagine why anyone else should.

Are we approaching a day in which there is only one professor of computer science for the whole US?

Seems pretty unlikely, but if we were, that would be awesome, because it would free up all the other computer science Ph.D.s, many of whom are brilliant, to do other stuff (like building Groupon and Pinterest clones)! This would be sad for the ones who really, really, really want to be teachers, but on balance it would be a huge win for the world.

Third, and finally, this trend threatens to kill some of what is most valuable about the academic experience, to both students and teachers. At the most fundamental level, education happens between individuals — a personal connection, however long or short, between mentor and student.

I have no idea how to say this differently, so I won’t try. Having been a teacher, I agree that the most rewarding moments happened between individuals. (Particularly when one of the individuals was the cute goth freshman girl who aced all the quizzes but still came to office hours.) Were those the most valuable parts of the teaching experience? Less clear. What’s more clear is that what was/is most valuable about my experience as a student was/is learning stuff. And these days most of what I know that’s useful I’ve learned from books or doing or even Coursera, not from the academy. I’ve broadened my horizons by pleasure reading, by arguing on LiveJournal, by discussions with peers on geek hikes far more than I ever did through school. With very few exceptions, my most profound intellectual connections have been with people I met outside of the school system.

It resonates at levels far deeper than the mere conveyance of information — it teaches us how to be social together and sets role models of what it is to perform in a field, to think rigorously, to be professional, and to be intellectually mature.

I suspect you have to have spent your whole life in academia to seriously assert that “the human connection in education” is the only path to these things, or even the easiest path to these things. College taught me how to play the same juvenile bulshytt status games we played in high school but at a slightly higher level. College professors were (sometimes) great role models for how to behave if you ever became a college professor, but not for much else. The levels of professionality and intellectual maturity I experienced in the academy were certainly no greater than I’ve experienced in the real world. I will freely admit to learning rigor (some would say too much rigor) while studying mathematics, which primed me to recognize the lack of rigor in so many other fields.

I am terribly afraid that our efforts to democratize the process will kill this human connection and sterilize one of the most joyful facets of this thousand-year-old institution.

Said differently, “we fear change”. Hopefully at Google he’ll learn to stop saying “democratize”, and maybe he’ll even meet a Republican or two. There must be one or two Republicans at Google, right?

The Hardest Job There Is

One summer during college I was stringing together temp jobs in order to make money so that I could afford to go out with my friends at night and play “Star Trek” pinball. (I would have preferred, of course, to spend my summer developing my idea for a “group couponing” website, but as the summer in question predated widespread adoption of the Internet, the decision was out of my hands.)

These were super-boring temp jobs, involving things like data-entering anonymous “secret shopper” surveys for Jersey Subs, filing papers alphabetically, and going through medical bills with a red pen to make sure that the prices didn’t exceed prescribed rates. (The last was the worst, as their computer system ran on OS/2, which some genius decided should have chess rather than Minesweeper, which made it very difficult to blow off steam after decimating a particularly tough bill, which is why I originally took up amphetamines.)

At some point the temp work simply dried up, possibly because there were no more medical bills, possibly because no one was willing to eat at Jersey Subs anymore, possibly because of the amphetamines. And so my dad arranged it that I could work for a friend of his who owned a warehouse of surplus metal parts.

What were these metal parts? I have no idea. They were large and heavy and in bins on pallets, and it’s possible they were used to repair trains, or in air conditioning, or as weapons. They came in various shapes and sizes and weights (heavy *and* very heavy), and every day orders would pour into the warehouse that some company wanted 137 of the metal pieces from bin A17. My job, then, was to retrieve bin A17 (which involved a forklift, which was sort of cool, except that I never got the hang of rear-wheel steering and always ended up crashing into things) and get an empty pallet and then manually choose 137 of the least-rusty metal pieces from bin A17 and pile them onto the empty pallet, all the while counting (and then double-counting) to make sure that there were indeed exactly 137 of them. Then I’d put the bin back and move on to the next order of 94 metal pieces from bin C29, and so on, and so forth.

(To this day, it is tough for me to imagine a job that is a worse mismatch for my aptitudes and preferences, except possibly for building model histories of men’s shoes.)

At the end of each day I would collect my pay (which was itself in non-descript metal pieces) and go home and take painkillers and try to scrub all the fine metal grit off my skin and try to cough all the fine metal grit out of my lungs and then cry myself to sleep and have nightmares about counting metal pieces. All of which, quite obviously, left no time for “Star Trek” pinball.

And so after a week, over the vociferous objections of my parents, who insisted that the metal pieces I was earning were likely to represent the difference between success and failure in life, I quit. Accordingly, I have blamed the various subsequent failures in my life on the metal pieces that never were.

So it stood until this week, when Hilary Rosen (who, for reasons inexplicable to me, is still allowed to show her face in public after her stint running the RIAA) made some crack disparaging Mitt Romney’s wife for being a stay-at-home mom. Tactically this was moronic, as everyone knows plenty of admirable stay-at-home moms, and also everyone knows that the most fruitful line of attack on Mitt Romney’s wife is that she married Mitt Romney, and let’s see how her “the angel Moroni pointed a shotgun at us and said we had to” excuse plays in the court of public opinion.

Which means that everyone and his brother is rushing to throw Hilary Rosen under one of a variety of buses. Bill Donohue, for instance, wants to throw her under some sort of “lesbian parent” bus, which I’m pretty sure runs on biodiesel, and I would love to throw her under the “she ran the RIAA, which means that nothing she says should ever be listened to by anyone ever” bus, but most people are focusing on the old “parenting is the hardest job there is!” bus.

It turns out, though, that I’m a parent, and so I happen to know that PARENTING IS NOT EVEN CLOSE TO THE HARDEST JOB THERE IS. Metal piece warehouse was a harder job. Burger King was a harder job. Even MATH FREAKING GRAD SCHOOL was a harder job. (As some versions of the bus insist that only mothering is the hardest job, I double-checked with Ganga, and she agrees with my analysis.)

That’s not to say that parenting isn’t work. It is, and occasionally it’s even very unpleasant work, like when it’s 3am and the baby won’t sleep and will scream if you don’t rock her, and you still haven’t prepared your slides for your 8am meeting with Hilary Rosen to present your new plan for permanently ruining the lives of music-downloading teenagers, and all you want to do is sleep and use your dreams to figure out a way to pretend like you care about “artists”. Or when she poops on you. (The baby, not Hilary Rosen, although that also sucks.) Or when you’re trying to write a blog post making fun of Hilary Rosen and the baby won’t stop screaming in your ear and banging on your keywinevsoivdkdsvl

But parenting is also a lot of fun. It’s a huge joy when you finally teach your kid how to Chicken Dance, or when she learns to swear, or the first time she asks you “please can you read me one more chapter before bed, daddy?” of Atlas Shrugged. No metal part ever even asked me about The Fountainhead!

I recognize that it’s uncharacteristic of me to stake out the middle ground like this, but I guess having a kid has been a deeply moderating influence and has taught me the value of compromise. So can’t we all just agree that parenting is nowhere near as hard as sorting and lifting and counting metal parts, that Hilary Rosen has no place in polite society, and that babies love Atlas Shrugged?

Why Have You Not Signed Up For BIL Already?

I’m sure you’ve heard of TED, which is a really expensive, really exclusive annual conference at which famous and/or accomplished people give lectures to wealthy and/or lucky people. Surprisingly, despite my fame, accomplishments, wealth, and luck, I have never been invited to attend or lecture. (Actually, it’s not that surprising, given that they once gave their TED Prize to Karen Armstrong, my mortal enemy, and that they seem to like Nathan Myhrvold, my other mortal enemy.1)

Luckily for me, there is a non-union, Mexican equivalent (er, an open-source equivalent), the BIL conference, which costs only $50, and which is open to pretty much everyone. Three years ago they were kind enough to let me give my “Your Religion Is False” talk, and then two years ago they didn’t firm up the date until it was too late for me to make travel plans, and then last year they let me give my lukewarmly-received “How To Be Funny” talk.

This year I plan to outdo them all with my balanced discussion of intellectual property: “Hitler Loved Patents”. Although I have spent the majority of the past 10 years arguing on the Internet about intellectual property with various weirdos and libertarians and weirdo libertarians and libertarian weirdos, it has only recently become acceptable to express my views in public. And what better way than through a profanity-laden speed-talking Powerpoint presentation?

There will, of course, be a large number of other talks, many of which will be almost as entertaining and/or compelling as mine. There will also be, I’m told, a “sex-positive boiler room”[2] and some sort of lockpicking workshop, one or both of which certainly addresses your hesitations about attending.

If it’s anything like last year, there will also be interesting breaks between sessions, where BILders socialize and where crazy people grab the empty mics and perform spoken-word-poetry-ish rants about free energy and capitalism, all the while people chuckle nervously and wonder whether this is a scheduled part of the performance or simply the result of too little security. There might be coffee too.

There will certainly be a huge assortment of burners, transhumanists, futurists, cryonicists, libertarians, anti-libertarians, polyamorists, monoamorists[3], objectivists, subjectivists, artists, crossfitters, politicians, entertainers, hosts of invention-related television shows, hackers, humorists, Paul Grasshoffs, atheists, and doers and makers of all types. Many of them are my good friends, and many more will be by the time the weekend is over. (Also, many of them will be my enemies by the end of the conference, since you can’t exactly tell people that the industry they’ve dreamed of working in their whole lives is morally on par with the death gulags without alienating a few folks, but such is the price of progress.)

In addition, the whole event takes place on a boat, which has some sort of giggly significance that is lost on me but probably has something to do with some creepy anime that everyone except me downloads and watches illegally.

Anyway, Long Beach really isn’t that far from wherever you are, and $50 is less money than you’d spend buying a dozen Original Six Dollar Burger®s at Carl’s Junior, so why have you not signed up already? And in the event you need burgers that badly, Simone gave me this code for 20% off the registration, which will save you $10, which means you’ll still be able to buy two of those tasty, tasty Original Six Dollar Burger®s[4] and have the conference weekend of your life.

So I guess I’m not really sure what your objection is at this point. Sometimes I hear “Joel, you’re biased because the whole event is organized and produced by your friends,” and sometimes I hear “Joel, surely you’re on the take from the Long Beach Convention and Visitors Bureau and/or Carl’s Jr.,” and still other sometimes I hear “Joel, you recommended that I attend the Libertarian National Convention in Anaheim in 2000, and that really sucked,” to which I can only respond, “were you at the same Libertarian Convention I was at, because I guarantee you that that was the most fun that anyone’s ever had in Anaheim in the history of mankind.”

So can you just go ahead and sign up already?

1. I’m only ten and I already got two mortal enemies.
2. No, I have no idea what this is either, although I suspect it has something to do with high-pressure stock trading.
3. Monoamorists. It’s a word. Look it up.
4. Six dollars is what you put on your tax return, but the cash price is closer to $4.

Hacking Hacker News

Hacker News, if you don’t know it, is an aggregator / forum attached to Y Combinator. People submit links to news stories and blog posts, questions, examples, and so on. Other people vote them up or down, and still other people argue about them in the comments sections.

If you have unlimited time on your hands, it’s an excellent firehose for things related to hacking. If your time is more limited, it’s more challenging. People submit hundreds of stories every day, and even if you only pay attention to the ones that get enough votes to make it to the homepage, it’s still overwhelming to keep up.

What’s more, a lot of the stories are about topics that are boring, like OSX and iPads and group couponing. So for some time I’ve been thinking that what Hacker News really needs is some sort of filter for “only show me stories that Joel would find interesting”. Unfortunately, it has no such filter. So last weekend I decided I would try to build one.

Step 1 : Design

To keep things simple, I made a couple of design decisions.

First, I was only going to take into account static features of the stories. That meant I could consider their title, and their url, and who submitted them, but not how many comments they had or how many votes they had, since those would depend on when they were scraped.

In some ways this was a severe limitation, since HN itself uses the votes to decide which stories to show people. On the other hand, the whole point of the project was that “what Joel likes” and “what the HN community likes” are completely different things.

Second, I decided that I wasn’t going to follow the links to collect data. This would make the data collection easier, but the predicting harder, since the titles aren’t always indicative of what’s behind them.

So basically I would use the story title, the URL it linked to, and the submitter’s username. My goal was just to classify the story as interesting-to-Joel or not, which meant the simplest approach was probably to use a naive Bayes classifier, so that’s what I did.

Step 2 : Acquire Computing Resources

I have an AWS account, but for various reasons I find it kind of irritating. I’d heard some good things about Rackspace Cloud Hosting, so I signed up and launched one of their low-end $10/month virtual servers with (for no particular reason) Debian 6.0.

I also installed a recent Ruby (which is these days my preferred language for building things quickly) and mongoDB, which I’d been meaning to learn for a while.

Step 3 : Collect Data

First I needed some history. A site called Hacker News Daily archives the top 10 stories each day going back a couple of years, and it was pretty simple to write a script to download them all and stick them in the database.

Then I needed to collect the new stories going forward. At first I tried scraping them off the Hacker News “newest” page, but very quickly they blocked my scraping (which I didn’t think was particularly excessive). Googling this problem, I found the unofficial Hacker News API, which is totally cool with me scraping it, which I do once an hour. (Unfortunately, it seems to go down several times a day, but what can you do?)

Step 4 : Judging Stories

Now I’ve got an ever-growing database of stories. To build a model that classifies them, I need some training data with stories that are labeled interesting-to-Joel or not. So I wrote a script that pulls all the unlabeled stories from the database, one-at-a-time shows them to me and asks whether I’d like to click on the story or not, and then saves that judgment back to the database.

At first I was judging them most-recent-first, but then I realized I was biasing my training set toward SOPA and PIPA, and so I changed it to judge them randomly.
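
A minimal sketch of that judging loop, in Python rather than the Ruby the project was written in. The list of dicts standing in for the database, the `liked` field name, and the `judge_stories` helper are all assumptions for illustration, not the original script:

```python
import random

def judge_stories(stories, ask=input, rng=random):
    """Show unlabeled stories in random order and record a like/dislike for each."""
    # A story counts as unlabeled if it has no "liked" value yet (assumed field name).
    unlabeled = [s for s in stories if s.get("liked") is None]
    rng.shuffle(unlabeled)  # random order avoids over-sampling this week's hot topic
    for story in unlabeled:
        prompt = f"{story['title']} ({story['url']}) click? [y/n/q] "
        answer = ask(prompt).strip().lower()
        if answer == "q":          # quit early; remaining stories stay unlabeled
            break
        if answer in ("y", "n"):   # anything else skips the story
            story["liked"] = (answer == "y")
    return stories
```

Passing `ask` and `rng` as parameters is only there to make the loop easy to test; run interactively, `judge_stories(stories)` prompts on the command line.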

Step 5 : Turning Stories into Features

The naive Bayes model constructs probabilities based on features of the stories. This means we need to turn stories into features. I didn’t spend too much time on this, but I included the following features:

* contains_{word}
* contains_{bigram}
* domain_{domain of url}
* user_{username}
* domain_contains_user (a crude measure of someone submitting his own site)
* is_pdf (generally I don’t want to click on these links)
* is_question
* title_has_dollar_amount
* title_has_number_of_years
* title_references_specific_YC_class (e.g. “(YC W12) seeks blah blah”)
* title_is_in_quotes

For the words and bigrams, I removed a short list of stopwords, and I ran them all through a Porter stemmer. The others are all pretty self-explanatory.
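
To make the list above concrete, here is a rough Python sketch of a feature extractor (the project itself was in Ruby). The stopword list, the regular expressions, and the `story_features` name are guesses at reasonable definitions rather than the ones actually used, and `stem` is a do-nothing placeholder where a Porter stemmer (for example NLTK's `PorterStemmer`) would go:

```python
import re
from urllib.parse import urlparse

# Assumed stopword list; the post only says "a short list".
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}

def stem(word):
    # Placeholder: swap in a real Porter stemmer here.
    return word

def story_features(title, url, user):
    """Turn one story into a set of feature-name strings."""
    features = set()
    tokens = re.findall(r"[a-z0-9']+", title.lower())
    words = [stem(w) for w in tokens if w not in STOPWORDS]
    features.update(f"contains_{w}" for w in words)
    features.update(f"contains_{a}_{b}" for a, b in zip(words, words[1:]))

    parsed = urlparse(url)
    domain = parsed.netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    if domain:
        features.add(f"domain_{domain}")
    features.add(f"user_{user}")
    if domain and user.lower() in domain:
        features.add("domain_contains_user")
    if parsed.path.lower().endswith(".pdf"):
        features.add("is_pdf")
    if title.rstrip().endswith("?"):
        features.add("is_question")
    if re.search(r"\$\d", title):
        features.add("title_has_dollar_amount")
    if re.search(r"\b\d+\s+years?\b", title, re.IGNORECASE):
        features.add("title_has_number_of_years")
    if re.search(r"\(YC [WS]\d{2}\)", title):
        features.add("title_references_specific_YC_class")
    if re.match(r'^["“].*["”]$', title.strip()):
        features.add("title_is_in_quotes")
    return features
```

Representing each story as a set of strings keeps the later counting step simple: every feature is either present or absent.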

Step 6 : Training a Model

This part is surprisingly simple:

* Get all the judged stories from the database.
* Split them into a training set and a test set. (I’m using an 80/20 split.)
* Compute all the features of the stories in the training set, and for each feature count (# of occurrences in liked stories) and (# of occurrences in disliked stories).
* Throw out all features that don’t occur at least 3 times in the dataset.
* Smooth each remaining feature by adding an extra 2 likes and an extra 2 dislikes. (2 is on the large side for smoothing, but we have a pretty small dataset.)
* That’s it. We YAML-ize the feature counts and save them to a file.
* For good measure, we use the model to classify the held-out test data, and plot a Precision-Recall curve.
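
The counting, pruning, and smoothing steps above can be sketched in a few lines of Python (again, not the original Ruby). The `train` function, the model's dict layout, and counting occurrences over the training portion only are assumptions; saving to a file and the Precision-Recall plot are left out:

```python
import random
from collections import Counter

def train(judged, min_count=3, smoothing=2, test_fraction=0.2, seed=0):
    """Build naive Bayes feature counts from (features, liked) pairs.

    `judged` is a list of (set_of_feature_strings, bool) tuples.
    Returns (model, held_out_test_set).
    """
    judged = list(judged)
    random.Random(seed).shuffle(judged)        # shuffle before splitting
    n_test = int(len(judged) * test_fraction)  # 80/20 split by default
    test, training = judged[:n_test], judged[n_test:]

    likes, dislikes = Counter(), Counter()
    for features, liked in training:
        (likes if liked else dislikes).update(features)

    model = {
        "n_liked": sum(1 for _, liked in training if liked),
        "n_disliked": sum(1 for _, liked in training if not liked),
        "smoothing": smoothing,
        "features": {},
    }
    for feature in set(likes) | set(dislikes):
        # Drop features seen fewer than min_count times in total.
        if likes[feature] + dislikes[feature] >= min_count:
            # Store smoothed [likes, dislikes] counts for each surviving feature.
            model["features"][feature] = [likes[feature] + smoothing,
                                          dislikes[feature] + smoothing]
    return model, test
```

The resulting `model` is a plain dict of numbers and lists, so it serializes directly to YAML or JSON.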

Step 7 : Classifying the Data

A naive Bayes classifier is fast, so it only takes a few seconds to generate and save interesting-to-Joel probabilities for all the stories in the database.
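
For the curious, one way to turn saved feature counts into a probability is to add up log-odds, sketched here in Python. The model layout (story totals plus smoothed `[likes, dislikes]` counts per feature), the `probability_interesting` name, and the choice to look only at features present in the story are assumptions of this sketch, not necessarily what the real code does:

```python
import math

def probability_interesting(features, model):
    """Naive Bayes estimate of P(liked | features) from smoothed counts."""
    n_liked, n_disliked = model["n_liked"], model["n_disliked"]
    k = model["smoothing"]
    # Prior log-odds of liking a story (+1 on each side avoids log(0)).
    log_odds = math.log((n_liked + 1) / (n_disliked + 1))
    for feature in features:
        if feature not in model["features"]:
            continue  # features pruned or never seen in training carry no evidence
        likes, dislikes = model["features"][feature]  # already smoothed
        p_given_liked = likes / (n_liked + 2 * k)
        p_given_disliked = dislikes / (n_disliked + 2 * k)
        log_odds += math.log(p_given_liked) - math.log(p_given_disliked)
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))
```

Working in log space avoids multiplying many small probabilities together, and each story costs only a handful of dictionary lookups, which is why scoring the whole database is quick.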

Step 8 : Publishing the Data

This should have been the easiest step, but it caused me a surprising amount of grief. First I had to decide between

* publish every story, accompanied by its probability; or
* publish only stories that meet some threshold

In the long term I’d prefer the second, but while I’m getting things to work the first seems preferable.

My first attempt involved setting up a Twitter feed and using the Twitter Ruby gem to publish the new stories to it as I scored them. This worked, but it wasn’t a pleasant way to consume them, and anyway it quickly ran afoul of Twitter’s rate limits.

I decided a blog of batched stories would be better, and so then I spent several hours grappling with Ruby gems for WordPress, Tumblr, Blogger, Posterous, and even LiveJournal [!] without much luck. (Most of the authentication APIs were for more heavy-duty use than I cared about; I just wanted to post to a blog using a stored password.)

Finally I got Blogger to work, and after some experimenting I decided the best approach would be to post once an hour, all the new stories since the last time I posted. Eventually I realized that I should rank the stories by interesting-to-Joel-ness, so that the ones I’d most want to read would be at the top and the ones I’d least want to read would be at the bottom.
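
The batching and ranking amount to a filter and a sort. A small Python sketch, where the `scored_at` and `probability` field names and the optional `threshold` argument are hypothetical:

```python
def hourly_digest(stories, last_posted_at, threshold=None):
    """Return stories scored since the last post, most interesting first."""
    new = [s for s in stories if s["scored_at"] > last_posted_at]
    if threshold is not None:
        # Optional: publish only stories at or above some probability cutoff.
        new = [s for s in new if s["probability"] >= threshold]
    return sorted(new, key=lambda s: s["probability"], reverse=True)
```

Leaving `threshold` as `None` publishes everything with its score; setting it later switches to publishing only the stories that clear the bar.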

The blog itself is at

http://joelgrus-hackernews.blogspot.com/

Step 9 : Automate

This part was pretty easy with two cron jobs. The first, once an hour, goes to the Hacker News API and retrieves all new unknown stories (up to a limit of like 600, which should never be hit in practice). It then scores them with the last saved model and adds them to the database. In practice, the API isn’t working half the time.

The second, a few minutes later, takes all the new stories and posts them to the blog. The end result is a blog of hourly scored digests of new Hacker News posts.

Step 10 : Improve the Model

The model can only get better with more training data, which requires me to judge whether I like stories or not. I do this occasionally when there’s nothing interesting on Facebook. Right now this is just the above command-line tool, but maybe I’ll come up with something better in the future.

Step 11 : Profit

I’m still trying to figure this one out. If you’ve got any good ideas, the code is here.

Hyphen Class Post-Mortem

Last fall I signed up for two of the hyphen classes: the Machine Learning ml-class (Ng) and the Artificial Intelligence ai-class (Thrun and Norvig). Both were presented by Stanford professors but one of the conditions of taking the courses was that whenever I discuss them I am required to present the disclaimer that THEY WERE NOT ACTUALLY STANFORD COURSES and that I WAS NEVER ACTUALLY A STANFORD STUDENT and that furthermore I AM NOT FIT TO LICK THE BOOTS OF A STANFORD STUDENT and so on. (Caltech is better than Stanford anyway, even if whenever you tell people you’re in the economics department they always say, “we have one of those?!”)

My background is in math and economics, but I’ve taught myself quite a bit of computer science over the years, and I consider myself a decent programmer now, to the point where I could probably pass a “code on the chalkboard” job interview if that’s what I needed to do in order to support my family and/or drug habit.

I’d worked on some machine learning projects at previous jobs, so I’d picked up some of the basics, but I’d never taken any sort of course in machine learning. At my current job I’m the de facto subject matter expert, so I thought the courses might be a good idea.

The classes ended up being vastly different from one another. Here’s kind of a summary of each:

ml-class:

* Every week 5-10 recorded lectures, total 1-2 hours of lecture time. (There was an option to watch the lectures at 1.2x or even 1.5x speed, which I always used, so it might have been more like 3 hours in real time. This means that if I ever meet Ng in real life, he will appear to me to be speaking very, very slowly.)

* Most lectures had one or two (ungraded) integrated multiple choice quizzes with the sole purpose of “did you understand the material I just presented?”

* Each week had a set of “review questions” that were graded and were designed to make sure you understood the lectures as a whole. You could retake the review if you missed any (or if you didn’t) and they were programmed to slightly vary each time (so that a “which of the following are true” might be replaced with a “which of the following are false” with slightly different choices but covering the same material).

* Each week also had a programming assignment in Octave, for which they provided the bulk of the code, and you just had to code in some functions or algorithms. I probably spent 2-3 hours a week on these, a fair amount of that chasing down syntax-error bugs in my code and/or yelling at Octave for crashing all the time.

* Machine learning is a pretty broad topic, and this course mostly focused on what I’d call “machine learning using gradient descent.” There was some amount of calculus involved (although you could probably get by without it) and a *lot* of linear algebra. If you weren’t comfortable with linear algebra, the class would have been very hard, and the programming assignments probably would have taken a lot longer than they took me.

* The material was a nice mix of theoretical and practical. I’ve already used some of what I learned in my work, and if there was a continuation of the class I would definitely take it. As it stands I’m right now signed up for the nlp-class and the pgm-class, which should be starting soon, both of which are relevant to what I do.

* The workload, and the corresponding amount I learned, were substantially less than they would have been in an actual 10-week on-campus university course. This was great for me, since I also have a day job and a baby. If I were a full-time student being offered ml-class instead of a real machine learning class, I might feel a little cheated. (I saw a blog post by some Stanford student whining about this, but he was mostly upset that the hyphen classes were devaluing his degree. Someone should have reminded him about the disclaimer.)

* The class was very solidly prepared. The lectures were smooth and well thought out. The review questions did a good job of making sure you’d learned the right things from the lectures. The programming assignments were good in their focus on the algorithms, although that did insulate you from the real-world messiness of getting programs set up correctly.

* It certainly seemed like Ng really enjoyed teaching, and at the end of the last lecture he thanked everyone in a very heartfelt way for taking the class.

ai-class:

* Every week dozens of lectures, each a couple of minutes long, interspersed with little multiple choice quizzes. This was my first point of frustration, in that the quizzes were frequently about parts of the lecture that hadn’t happened yet. Furthermore, they often asked ambiguous questions, or questions that were unanswerable based on the material presented so far.

* Each week had a final quiz that you submitted answers for one time only. Then you waited until the deadline passed to find out if your answers were correct (and then you waited another day, because the site always went down on quiz submission day, and so they always extended the deadline by 24 hours). These quizzes were also ambiguous, which meant that if you wanted to get them correct you had to pester for clarifications (and sometimes for clarifications of the clarifications).

* This resulted in the feeling that the grading in the class was stochastic, and that your final score was more reflective of “can I guess what the quiz-writer really meant” than “did I really understand the material”. Although I didn’t particularly care about my grade in the class, I was still frustrated and disheartened by the feeling that the quizzes were more interested in *tricking* me than in helping me learn.

* What’s more, the quizzes often seemed to focus on what seemed to me tangential or inconsequential parts of the lesson, like making sure that I really, deeply understood step 3 of a 5-step process, but not whether I understood the other four steps or the process itself.

* The material also seemed very grab-bag, almost like an “artificial intelligence for non-majors” survey course.

* Anyway, partly on account of my finding the class frustrating, partly on account of time pressures, and partly because I didn’t feel like I was learning a whole lot, I dropped the ai-class after about four weeks.

* There were no programming assignments, but there was a midterm and a final exam, both after I quit the course. From what I could tell, they were longer versions of the quizzes, with the same problems of clarity and ambiguity. (I never unfollowed the @aiclass twitter, and during exam time it was a steady stream of clarifications and allowed assumptions.)

* Compared to the tightly-planned ml-class, the ai-class felt very haphazard. In addition, I found the ml-class platform more pleasant to use than the ai-class platform.

* I quit long before the last lecture, so I have no idea how heartfelt it was.


One thing about both classes: I *hate* lectures. I learn much better reading than I do being lectured at, and I found the lecture aspect of *both* classes frustrating. I have complained about this in many venues, but my prejudice is that if you’re using the internet to make me watch *lectures*, you’re not really reinventing education, because I still have to watch lectures, and I hate lectures. Did I mention that I hate lectures?

By way of comparison, I have also been doing Codecademy’s Code Year. It is currently below my level (I am plenty familiar with variables and if-then statements and for loops), but I don’t know much JavaScript, and the current pace makes me hopeful that it will get interesting for me after another month or two.

If you don’t know that platform, it gives you a task (“create a variable called myName, assign your name to it, and print it to the console”) and a little code window to do it in. Then you click “run” and it runs and tells you if you got it right or not. There is a pre-canned hint for each problem.

What I really like about Codecademy is that I can do it at my own pace. The lessons are wildly variable in quality, but I’m glad not to have to sit through hours of lectures every week. They also do “badges”, which I find more satisfying than I wish I did. That said, I suspect someone with no experience debugging code would find the experience impenetrable and waste hours tracking down simple syntax errors, and indeed I saw on Hacker News a post to this effect a few weeks ago.

In the end, despite all this, the way I learn best is through a combination of reading books and writing actual code. I’ve had to learn F# over the last month, which I’ve done by reading a couple of (quite nice) books and writing a lot of actual code. It’s hard for me to imagine the course that would have done me any better (or any faster).

Similarly, if I wanted to learn Rails (which some days I think I do and other days I think I don’t), I have trouble imagining a course that would do better for me than just working through the Rails Tutorial (which I have skimmed, which has convinced me that I could learn well from it).

Similarly similarly, I suspect that the right Machine Learning book (and some quality time with e.g. Kaggle) would have been much more effective for me than the ml-class was. But if such a book exists, I haven’t found it yet.