Category Archives: Data

T-Shirts, Feminism, Parenting, and Data Science, Part 1: Colors

Before I was a parent I never gave much thought to children’s clothing, other than to covet a few of the baby shirts at T-Shirt Hell. Now that I have a two-year-old daughter, I have trouble thinking of anything but children’s clothing. (Don’t tell my boss!)

What I have discovered over the last couple of years is that clothing intended for boys is fun, whereas clothing intended for girls kind of sucks. There’s nothing inherently two-year-old-boy-ish about dinosaurs, surfing ninjas, skateboarding globes, or “become-a-robot” solicitations, just as there’s nothing inherently two-year-old-girl-ish about pastel-colored balloons, or cats wearing bows, or dogs wearing bows, or ruffles. Forget about gender: I want Madeline to grow up to be a “surfing ninja” kind of kid, not a “cats wearing bows” kind of kid. An “angry skateboarding dog” kind of kid, not a “shoes with pretty ribbons” kind of kid.

Accordingly, I have taken to buying all of Madeline’s shirts in the boys section, the result of which (combined with her boy-ish haircut) is that half the time people refer to her as “he”. This doesn’t bother me terribly, especially if she ends up getting the gender wage premium that people are always yammering about on Facebook, but it does make me wonder why there’s such a stark divide between toddler boy shirts and toddler girl shirts. And, of course, it makes me wonder whether the divide is so stark that I can build a model to predict it!

The Dataset

I downloaded images of every “toddler boys” and “toddler girls” t-shirt from Carters, Children’s Place, Crazy 8, Gap Kids, Gymboree, Old Navy, and Target. Because each retailer kept its shirts at a different (seemingly random) location on its website, I decided that using an Image Downloader Chrome extension would be quicker and easier than writing a scraping script that worked with all the different sites.

I ended up with 616 images of boys shirts and 446 images of girls shirts. My lawyer has advised me against redistributing the dataset, although I might if you ask nicely.

Attempt #1: Colors

(As always, the code is on my GitHub.)

A quick glance at the shirts revealed that boys shirts tend toward boy-ish colors, girls shirts toward girl-ish colors. So a simple model could just take into account the colors in the image. I’ve never done much image processing before, so the Pillow Python library seemed like a friendly place to start. (In retrospect, a library that made at least a half-hearted attempt at documentation would probably have been friendlier.)

The PIL library has a getcolors function that returns a list of

(# of pixels, (red, green, blue))

for each rgb color in the image. This gives 256 * 256 * 256 = almost 17 million possible colors, which is probably too many, so I quantized the colors by bucketing each of red, green, and blue into one of [0,85), [85,170), or [170,255]. This gives 3 * 3 * 3 = 27 possible colors.

To make things even simpler, I only cared about whether an image contained at least one pixel of a given color [bucket] or whether it contained none. This allowed me to convert each image into an array of length 27 consisting only of 0s and 1s.
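
In code, that featurization is just a few lines with Pillow. A minimal sketch (not the exact code from the repo, but the same idea):

```python
from PIL import Image

def color_features(path):
    """Return a list of 27 0/1 flags, one per quantized (red, green, blue) bucket."""
    img = Image.open(path).convert("RGB")
    width, height = img.size
    # getcolors returns [(count, (r, g, b)), ...]; maxcolors has to be at least
    # the number of distinct colors in the image, or it returns None.
    colors = img.getcolors(maxcolors=width * height)

    present = [0] * 27
    for _count, (r, g, b) in colors:
        # quantize each channel into [0,85), [85,170), [170,255] -> 0, 1, 2
        bucket = 9 * min(r // 85, 2) + 3 * min(g // 85, 2) + min(b // 85, 2)
        present[bucket] = 1
    return present
```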

Finally, I trained a logistic regression model to predict, based solely on the presence or absence of the 27 colors, whether a shirt was a boys shirt or a girls shirt. Without getting too mathematical, we end up with a weight (positive or negative) for each of the 27 colors. Then for any shirt, we add up the weights for all the colors in the shirt, and if that total is positive, we predict “boys shirt”, and if that total is negative, we predict “girls shirt”.

I trained the model on 80% of the data and measured its performance on the other 20%. This (pretty stupid) model predicted correctly about 77% of the time.
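
For reference, the training step is only a few lines with scikit-learn. A sketch (again, not the repo’s exact code), where X is the list of 0/1 color arrays from above and y is 1 for boys shirts and 0 for girls shirts:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X: 0/1 color-presence arrays (one per shirt), y: 1 for "boys", 0 for "girls"
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))
weights = model.coef_[0]   # one weight per color bucket, positive = boyish
```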

Plotted below is the number of boys shirts (blue) and girls shirts (pink) in the test set by the score assigned them in the model. Without getting into gory details, a score of 0 means the model thinks it’s equally likely to be a boys shirt or a girls shirt, with more positive scores meaning more likely boys shirt and more negative scores meaning more likely girls shirt. You can see that while there’s a muddled middle, when the model is really confident (in either direction), it’s always right.

[Figure: shirts_by_score]

If we dig into precision and recall, we see

P(is actually girl shirt | prediction is “girl shirt”) = 75%
P(is actually boy shirt | prediction is “boy shirt”) = 77%
P(prediction is “girl shirt” | is actually girl shirt) = 63%
P(prediction is “boy shirt” | is actually boy shirt) = 86%

One way of interpreting the recall discrepancy is that it’s much more likely for girls shirts to have “boy colors” than for boys shirts to have “girl colors”, which indeed appears to be the case.
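
If you want to compute those four numbers yourself, they’re just ratios of counts on the held-out set. A sketch, continuing from the scikit-learn snippet above:

```python
predictions = model.predict(X_test)
pairs = list(zip(predictions, y_test))   # (predicted, actual), 1 = boy, 0 = girl

def fraction(pairs, condition, given):
    """P(condition | given), estimated over the (predicted, actual) pairs."""
    relevant = [p for p in pairs if given(p)]
    return sum(1 for p in relevant if condition(p)) / len(relevant)

girl_precision = fraction(pairs, lambda p: p[1] == 0, lambda p: p[0] == 0)
boy_precision  = fraction(pairs, lambda p: p[1] == 1, lambda p: p[0] == 1)
girl_recall    = fraction(pairs, lambda p: p[0] == 0, lambda p: p[1] == 0)
boy_recall     = fraction(pairs, lambda p: p[0] == 1, lambda p: p[1] == 1)
```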

Superlatives

Given this model, we can identify

The Girliest Girls Shirt (no argument from me):

[Image: girliest_girl_shirt]

The Boyiest Girls Shirt (must be the black-and-white and lack of color?):

[Image: boyiest_girl_shirt]

The Girliest Boys Shirt (I can see that if you just look at colors):

[Image: girliest_boy_shirt]

The Boyiest Boys Shirt (a slightly odd choice, but I guess those are all boy-ish colors?):

[Image: boyiest_boy_shirt]

The Most Androgynous Shirt (this one is most likely some kind of image-compression artifact: the main colors are boyish, but the image also has some girlish purple pixels that cancel them out):

[Image: most_androgynous]

The Blandest Shirt (for sure!):

[Image: blandest]

The Most Colorful Shirt (no argument with this one either!):

[Image: coloriest]

Scores for Colors

By looking at the coefficients of the model, we can see precisely which colors are the most “boyish” and which are the most “girlish”. The results are not wholly unexpected:

[Table: the logistic regression coefficient for each of the 27 color buckets, shown as color swatches and sorted from most boyish (+151.71) down to most girlish (−224.74)]

In Conclusion

In conclusion, by looking only at which of 27 colors are present in a toddler t-shirt, we can do a surprisingly good job of predicting whether it’s a boys shirt or a girls shirt. And that pretty clearly involves throwing away lots of information. What if we were to take more of the actual image into account?

Coming soon, Part 2: EIGENSHIRTS

ESPN, Race, and Presidents

Inspired by (and lifting large amounts of code from) Trey Causey’s investigation of the language that ESPN uses to discuss white and non-white quarterbacks, I similarly wondered about the language ESPN uses to discuss white and non-white Presidents. For instance, a common stereotype is that non-white Presidents assassinate their citizens using unmanned drones, while white Presidents assassinate their citizens using polonium-210. Do such stereotypes creep into sportswriting?

Toward that end, I used Scrapy to scrape all the articles from the ESPN website that matched searches for (president obama), (president bush), (president clinton), and so on. This gave me a total of 543 articles. Then, using Wikipedia, Mechanical Turk, and a proprietary deep learning model, I categorized each of these Presidents as either “white” or “non-white”.
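
The spider itself was tiny. Here’s a rough sketch of the idea; the search URL and CSS selectors below are placeholders, not ESPN’s actual endpoints or markup:

```python
import scrapy

class PresidentSpider(scrapy.Spider):
    name = "espn_presidents"
    # Hypothetical search URL; ESPN's real search endpoint and parameters differ.
    start_urls = ["http://search.espn.go.com/results?q=president+obama"]

    def parse(self, response):
        # The CSS selectors are illustrative placeholders, not ESPN's actual markup.
        for href in response.css("a.result-link::attr(href)").extract():
            yield scrapy.Request(response.urljoin(href), callback=self.parse_article)

    def parse_article(self, response):
        yield {
            "url": response.url,
            "text": " ".join(response.css("p::text").extract()),
        }
```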

Using NLTK, I tokenized each article into sentences and then identified each sentence (a sketch of this step follows the list) as being about

  • one or more white Presidents
  • one or more non-white Presidents
  • both white and non-white Presidents
  • no presidents
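
Here’s a stripped-down sketch of that step; the president name lists and the matching rule are simplified stand-ins for what the real code has to handle:

```python
import nltk

WHITE = ["bush", "clinton", "reagan", "carter", "ford", "nixon"]
NON_WHITE = ["obama"]

def categorize(article_text):
    """Yield (category, sentence) pairs: 'white', 'non-white', 'both', or 'none'."""
    for sentence in nltk.sent_tokenize(article_text):
        lowered = sentence.lower()
        mentions_white = any(name in lowered for name in WHITE)
        mentions_non_white = any(name in lowered for name in NON_WHITE)
        if mentions_white and mentions_non_white:
            category = "both"
        elif mentions_white:
            category = "white"
        elif mentions_non_white:
            category = "non-white"
        else:
            category = "none"
        yield category, sentence
```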

Curiously, while there were very few “non-white” Presidents, there were nonetheless about four times as many “non-white” sentences as “white” sentences. (This is itself an interesting phenomenon that’s probably worth investigating.)

I then split each sentence into words and counted how many times each word appeared in “white”, “non-white”, “both”, and “none” sentences. Like Trey, I followed the analysis here, similarly excluding stopwords and proper nouns, which I inferred based on capitalization patterns.

Finally, for each word I computed a “white percentage” and “non-white percentage” by looking at how likely that word was to appear in a “white” sentence or a “non-white” sentence and adjusting for the different numbers of sentences.
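
Concretely, the adjustment just divides each word’s count by the number of sentences in its category, so the four-to-one imbalance in sentence counts doesn’t swamp everything. A sketch, where categorized_sentences, stopwords, and is_proper_noun are stand-ins for the pieces described above:

```python
from collections import Counter, defaultdict

word_counts = defaultdict(Counter)   # category -> Counter of word occurrences
sentence_counts = Counter()          # category -> number of sentences

for category, sentence in categorized_sentences:   # e.g. output of categorize() above
    sentence_counts[category] += 1
    for word in set(sentence.lower().split()):
        # stopwords / is_proper_noun stand in for the filtering described above
        if word not in stopwords and not is_proper_noun(word, sentence):
            word_counts[category][word] += 1

def category_percentage(word, category):
    """How likely a sentence in this category is to contain the word."""
    return word_counts[category][word] / sentence_counts[category]

# rank words by, e.g., category_percentage(w, "non-white") - category_percentage(w, "white")
```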

After all that, here are the words that were most likely to appear in sentences about “white” Presidents:

plaque 5
severed 4
grab 4
investigation 3
worn 3
unable 3
child 3
suppose 3
block 3
living 3
holders 3
pounds 3
ticket 3
blackout 3
thrown 3
exercise 3
scene 3
televised 3
upon 3
executives 3

Clearly this reads like something out of “CSI” or possibly “CSI: Miami”. If I were to make these words into a story, it would probably be something macabre like

The President grabbed the plaque he’d secretly made from a living child’s severed foot and worn sock. The investigation supposed a suspect weighing at least 200 pounds who could have thrown the victim down the block, not a feeble politician famous for his televised blackout when he tried to exercise but was unable to grab his toes.

In contrast, here are the words most likely to appear in sentences about “non-white” Presidents:

bracket 32
interview 21
trip 16
champions 16
fan 48 1
asked 35 1
carrier 11
celebrate 11
thinks 11
early 11
eight 11
personal 10
picks 10
appearance 10
far 9
hear 9
congratulating 9
given 9
troops 9
safety 9
fine 9
person 9

This story would have to be something uplifting like

The President promised to raise taxes on every bracket before ending the interview. As a huge water polo fan, he needed to catch a ride on an aircraft carrier for his trip to celebrate with the champions. “Sometimes I get asked,” he thinks, “whether it’s too early to eat a personal pan pizza with eight toppings. So far I always say that I hear it’s not.” His safety is a given, since he’s surrounded by troops who are always congratulating him for being a fine person with a fine appearance.

As you can see, it has a markedly different tone, but not in a way that obviously correlates with the stereotypes mentioned earlier. Whatever prejudices lurk at ESPN are exceedingly subtle.

Obviously, this is only the tip of the iceberg. The algorithm for identifying which sentences were about Presidents is pretty rudimentary, and the word-counting NLP techniques used here are pretty basic. Another obvious next step would be to pull in additional data sources like Yahoo! Sports or SI.com or FOX Sports.

If you’re interested in following up, the code is all up on my GitHub, so have at it! And I’d love to hear your feedback.

Secrets of Fire Truck Society

Hi, I gave a talk at Ignite Strata on “Secrets of Fire Truck Society” and at the end I promised that for more information you could visit this blog. Unfortunately, I haven’t had time to write a blog post. Here are some links to tide you over until I do:

Hacking Hacker News

Hacker News, if you don’t know it, is an aggregator / forum attached to Y Combinator. People submit links to news stories and blog posts, questions, examples, and so on. Other people vote them up or down, and still other people argue about them in the comments sections.

If you have unlimited time on your hands, it’s an excellent firehose for things related to hacking. If your time is more limited, it’s more challenging. People submit hundreds of stories every day, and even if you only pay attention to the ones that get enough votes to make it to the homepage, it’s still overwhelming to keep up.

What’s more, a lot of the stories are about topics that are boring, like OSX and iPads and group couponing. So for some time I’ve been thinking that what Hacker News really needs is some sort of filter for “only show me stories that Joel would find interesting”. Unfortunately, it has no such filter. So last weekend I decided I would try to build one.

Step 1 : Design

To keep things manageable, I made a couple of simplifying design decisions.

First, I was only going to take into account static features of the stories. That meant I could consider their title, and their URL, and who submitted them, but not how many comments they had or how many votes they had, since those would depend on when they were scraped.

In some ways this was a severe limitation, since HN itself uses the votes to decide which stories to show people. On the other hand, the whole point of the project was that “what Joel likes” and “what the HN community likes” are completely different things.

Second, I decided that I wasn’t going to follow the links to collect data. This would make the data collection easier, but the predicting harder, since the titles aren’t always indicative of what’s behind them.

So basically I would use the story title, the URL it linked to, and the submitter’s username. My goal was just to classify the story as interesting-to-Joel or not, which meant the simplest approach was probably to use a naive Bayes classifier, so that’s what I did.

Step 2 : Acquire Computing Resources

I have an AWS account, but for various reasons I find it kind of irritating. I’d heard some good things about Rackspace Cloud Hosting, so I signed up and launched one of their low-end $10/month virtual servers with (for no particular reason) Debian 6.0.

I also installed a recent Ruby (which is these days my preferred language for building things quickly) and MongoDB, which I’d been meaning to learn for a while.

Step 3 : Collect Data

First I needed some history. A site called Hacker News Daily archives the top 10 stories each day going back a couple of years, and it was pretty simple to write a script to download them all and stick them in the database.

Then I needed to collect the new stories going forward. At first I tried scraping them off the Hacker News “newest” page, but very quickly they blocked my scraping (which I didn’t think was particularly excessive). Googling this problem, I found the unofficial Hacker News API, which is totally cool with me scraping it, which I do once an hour. (Unfortunately, it seems to go down several times a day, but what can you do?)

Step 4 : Judging Stories

Now I’ve got an ever-growing database of stories. To build a model that classifies them, I need some training data with stories that are labeled interesting-to-Joel or not. So I wrote a script that pulls all the unlabeled stories from the database, one-at-a-time shows them to me and asks whether I’d like to click on the story or not, and then saves that judgment back to the database.

At first I was judging them most-recent-first, but then I realized I was biasing my training set toward SOPA and PIPA, and so I changed it to judge them randomly.

Step 5 : Turning Stories into Features

The naive Bayes model constructs probabilities based on features of the stories. This means we need to turn stories into features. I didn’t spend too much time on this, but I included the following features (sketched in code after the list):

* contains_{word}
* contains_{bigram}
* domain_{domain of url}
* user_{username}
* domain_contains_user (a crude measure of someone submitting his own site)
* is_pdf (generally I don’t want to click on these links)
* is_question
* title_has_dollar_amount
* title_has_number_of_years
* title_references_specific_YC_class (e.g. “(YC W12) seeks blah blah”)
* title_is_in_quotes
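
My actual code is Ruby, but in Python-flavored pseudocode the featurization looks something like the sketch below, where STOPWORDS and stem are stand-ins for the stopword list and Porter stemmer mentioned just after:

```python
import re
from urllib.parse import urlparse

def features(story):
    """Turn a story (title, url, user) into a set of feature names."""
    feats = set()
    title, url, user = story["title"], story["url"], story["user"]
    domain = urlparse(url).netloc.lower()

    # STOPWORDS and stem() are stand-ins for the stopword list and Porter stemmer
    words = [stem(w) for w in re.findall(r"[a-z0-9']+", title.lower())
             if w not in STOPWORDS]
    feats.update("contains_" + w for w in words)
    feats.update("contains_" + " ".join(bg) for bg in zip(words, words[1:]))

    feats.add("domain_" + domain)
    feats.add("user_" + user)
    if user.lower() in domain:
        feats.add("domain_contains_user")
    if url.lower().endswith(".pdf"):
        feats.add("is_pdf")
    if title.strip().endswith("?"):
        feats.add("is_question")
    if re.search(r"\$\d", title):
        feats.add("title_has_dollar_amount")
    if re.search(r"\d+ years?", title.lower()):
        feats.add("title_has_number_of_years")
    if re.search(r"YC [WS]\d\d", title):
        feats.add("title_references_specific_YC_class")
    if re.match(r'^".*"$', title.strip()):
        feats.add("title_is_in_quotes")
    return feats
```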

For the words and bigrams, I removed a short list of stopwords, and I ran them all through a Porter stemmer. The others are all pretty self-explanatory.

Step 6 : Training a Model

This part is surprisingly simple (there’s a rough sketch in code after the list):

* Get all the judged stories from the database.
* Split them into a training set and a test set. (I’m using an 80/20 split.)
* Compute all the features of the stories in the training set, and for each feature count (# of occurrences in liked stories) and (# of occurrences in disliked stories).
* Throw out all features that don’t occur at least 3 times in the dataset.
* Smooth each remaining feature by adding an extra 2 likes and an extra 2 dislikes. (2 is on the large side for smoothing, but we have a pretty small dataset.)
* That’s it. We YAML-ize the feature counts and save them to a file.
* For good measure, we use the model to classify the held-out test data, and plot a precision-recall curve.
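
Again my real code is Ruby, but here’s a Python sketch of the counting and smoothing, where judged_stories is a list of (story, liked) pairs and features() is the extraction sketched earlier:

```python
import random
from collections import Counter

def train(judged_stories, min_count=3, smoothing=2):
    """judged_stories: list of (story, liked) pairs, liked being True or False."""
    random.shuffle(judged_stories)
    split = int(0.8 * len(judged_stories))
    training, test = judged_stories[:split], judged_stories[split:]

    like_counts, dislike_counts = Counter(), Counter()
    num_likes = sum(1 for _, liked in training if liked)
    num_dislikes = len(training) - num_likes
    for story, liked in training:
        for feat in features(story):                        # features() as sketched earlier
            (like_counts if liked else dislike_counts)[feat] += 1

    model = {}
    for feat in set(like_counts) | set(dislike_counts):
        likes, dislikes = like_counts[feat], dislike_counts[feat]
        if likes + dislikes < min_count:
            continue                                        # throw out rare features
        # smooth by pretending the feature showed up in 2 extra likes and 2 extra dislikes
        model[feat] = (likes + smoothing, dislikes + smoothing)
    return model, num_likes + smoothing, num_dislikes + smoothing, test
```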

Step 7 : Classifying the Data

The naive Bayes classifier is fast, so it only takes a few seconds to generate and save interesting-to-Joel probabilities for all the stories in the database.
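
Scoring is the textbook naive Bayes calculation, adding up log-probabilities so nothing underflows. A sketch, using the model, num_likes, and num_dislikes returned by the training sketch above:

```python
import math

def p_interesting(story, model, num_likes, num_dislikes):
    """Naive Bayes estimate of P(interesting-to-Joel | the story's features)."""
    log_like = math.log(num_likes / (num_likes + num_dislikes))
    log_dislike = math.log(num_dislikes / (num_likes + num_dislikes))
    for feat in features(story):
        if feat in model:
            likes, dislikes = model[feat]
            log_like += math.log(likes / num_likes)
            log_dislike += math.log(dislikes / num_dislikes)
    return 1 / (1 + math.exp(log_dislike - log_like))
```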

Step 8 : Publishing the Data

This should have been the easiest step, but it caused me a surprising amount of grief. First I had to decide between

* publish every story, accompanied by its probability; or
* publish only stories that meet some threshold

In the long term I’d prefer the second, but while I’m getting things to work the first seems like the better choice.

My first attempt involved setting up a Twitter feed and using the Twitter Ruby gem to publish the new stories to it as I scored them. This worked, but it wasn’t a pleasant way to consume them, and anyway it quickly ran afoul of Twitter’s rate limits.

I decided a blog of batched stories would be better, and so then I spent several hours grappling with Ruby gems for WordPress, Tumblr, Blogger, Posterous, and even LiveJournal [!] without much luck. (Most of the authentication APIs were for more heavy-duty use than I cared about — I just wanted to post to a blog using a stored password.)

Finally I got Blogger to work, and after some experimenting I decided the best approach would be to post once an hour, all the new stories since the last time I posted. Eventually I realized that I should rank the stories by interesting-to-Joel-ness, so that the ones I’d most want to read would be at the top and the ones I’d least want to read would be at the bottom.

The blog itself is at

http://joelgrus-hackernews.blogspot.com/

Step 9 : Automate

This part was pretty easy with two cron jobs. The first, once an hour, goes to the Hacker News API and retrieves all new unknown stories (up to a limit of like 600, which should never be hit in practice). It then scores them with the last saved model and adds them to the database. In practice, the API isn’t working half the time.

The second, a few minutes later, takes all the new stories and posts them to the blog. The end result is a blog of hourly scored digests of new Hacker News posts.

Step 10 : Improve the Model

The model can only get better with more training data, which requires me to judge whether I like stories or not. I do this occasionally when there’s nothing interesting on Facebook. Right now this is just the above command-line tool, but maybe I’ll come up with something better in the future.

Step 11 : Profit

I’m still trying to figure this one out. If you’ve got any good ideas, the code is here.

Hyphen Class Post-Mortem

Last fall I signed up for two of the hyphen classes: the Machine Learning ml-class (Ng) and the Artificial Intelligence ai-class (Thrun and Norvig). Both were presented by Stanford professors but one of the conditions of taking the courses was that whenever I discuss them I am required to present the disclaimer that THEY WERE NOT ACTUALLY STANFORD COURSES and that I WAS NEVER ACTUALLY A STANFORD STUDENT and that furthermore I AM NOT FIT TO LICK THE BOOTS OF A STANFORD STUDENT and so on. (Caltech is better than Stanford anyway, even if whenever you tell people you’re in the economics department they always say, “we have one of those?!”)

My background is in math and economics, but I’ve taught myself quite a bit of computer science over the years, and I consider myself a decent programmer now, to the point where I could probably pass a “code on the chalkboard” job interview if that’s what I needed to do in order to support my family and/or drug habit.

I’d worked on some machine learning projects at previous jobs, so I’d picked up some of the basics, but I’d never taken any sort of course in machine learning. At my current job I’m the de facto subject matter expert, so I thought the courses might be a good idea.

The classes ended up being vastly different from one another. Here’s kind of a summary of each:

ml-class:

* Every week 5-10 recorded lectures, total 1-2 hours of lecture time. (There was an option to watch the lectures at 1.2x or even 1.5x speed, which I always used, so it might have been more like 3 hours in real-time. This means that if I ever meet Ng in real-life, he will appear to me to be speaking very, very slowly.)

* Most lectures had one or two (ungraded) integrated multiple choice quizzes with the sole purpose of “did you understand the material I just presented?”

* Each week had a set of “review questions” that were graded and were designed to make sure you understood the lectures as a whole. You could retake the review if you missed any (or if you didn’t) and they were programmed to slightly vary each time (so that a “which of the following are true” might be replaced with a “which of the following are false” with slightly different choices but covering the same material).

* Each week also had a programming assignment in Octave, for which they provided the bulk of the code, and you just had to code in some functions or algorithms. I probably spent 2-3 hours a week on these, a fair amount of that chasing down syntax-error bugs in my code and/or yelling at Octave for crashing all the time.

* Machine learning is a pretty broad topic, and this course mostly focused on what I’d call “machine learning using gradient descent.” There was some amount of calculus involved (although you could probably get by without it) and a *lot* of linear algebra. If you weren’t comfortable with linear algebra, the class would have been very hard, and the programming assignments probably would have taken a lot longer than they took me.

* The material was a nice mix of theoretical and practical. I’ve already used some of what I learned in my work, and if there was a continuation of the class I would definitely take it. As it stands I’m right now signed up for the nlp-class and the pgm-class, which should be starting soon, both of which are relevant to what I do.

* The workload, and the corresponding amount I learned, were substantially less than they would have been in an actual 10-week on-campus university course. This was great for me, since I also have a day job and a baby. If I were a full-time student being offered ml-class instead of a real machine learning class, I might feel a little cheated. (I saw a blog post by some Stanford student whining about this, but he was mostly upset that the hyphen classes were devaluing his degree. Someone should have reminded him about the disclaimer.)

* The class was very solidly prepared. The lectures were smooth and well thought out. The review questions did a good job of making sure you’d learned the right things from the lectures. The programming assignments were good in their focus on the algorithms, although that did insulate you from the real-world messiness of getting programs set up correctly.

* It certainly seemed like Ng really enjoyed teaching, and at the end of the last lecture he thanked everyone in a very heartfelt way for taking the class.

ai-class:

* Every week dozens of lectures, each a couple of minutes long, interspersed with little multiple choice quizzes. This was my first point of frustration, in that the quizzes were frequently about parts of the lecture that hadn’t happened yet. Furthermore, they often asked ambiguous questions, or questions that were unanswerable based on the material presented so far.

* Each week had a final quiz that you submitted answers for one time only. Then you waited until the deadline passed to find out if your answers were correct (and then you waited another day, because the site always went down on quiz submission day, and so they always extended the deadline by 24 hours). These quizzes were also ambiguous, which meant that if you wanted to get them correct you had to pester for clarifications (and sometimes for clarifications of the clarifications).

* This resulted in the feeling that the grading in the class was stochastic, and that your final score was more reflective of “can I guess what the quiz-writer really meant” than “did I really understand the material”. Although I didn’t particularly care about my grade in the class, I was still frustrated and disheartened by the feeling that the quizzes were more interested in *tricking* me than in helping me learn.

* What’s more, the quizzes often seemed to focus on what seemed to me tangential or inconsequential parts of the lesson, like making sure that I really, deeply understood step 3 of a 5-step process, but not whether I understood the other four steps or the process itself.

* The material also seemed very grab-bag, almost like an “artificial intelligence for non-majors” survey course.

* Anyway, partly on account of my finding the class frustrating, partly on account of time pressures, and partly because I didn’t feel like I was learning a whole lot, I dropped the ai-class after about four weeks.

* There were no programming assignments, but there was a midterm and a final exam, both after I quit the course. From what I could tell, they were longer versions of the quizzes, with the same problems of clarity and ambiguity. (I never unfollowed the @aiclass twitter, and during exam time it was a steady stream of clarifications and allowed assumptions.)

* Compared to the tightly-planned ml-class, the ai-class felt very haphazard. In addition, I found the ml-class platform more pleasant to use than the ai-class platform.

* I quit long before the last lecture, so I have no idea how heartfelt it was.


One thing about both classes: I *hate* lectures. I learn much better reading than I do being lectured at, and I found the lecture aspect of *both* classes frustrating. I have complained about this in many venues, but my prejudice is that if you’re using the internet to make me watch *lectures*, you’re not really reinventing education, because I still have to watch lectures, and I hate lectures. Did I mention that I hate lectures?

By way of comparison, I have also been doing CodeYear. It is currently below my level (I am plenty familiar with variables and if-then statements and for loops), but I don’t know much Javascript, and the current pace makes me hopeful that it will get interesting for me after another month or two.

If you don’t know that platform, it gives you a task (“create a variable called myName, assign your name to it, and print it to the console”) and a little code window to do it in. Then you click “run” and it runs and tells you if you got it right or not. There is a pre-canned hint for each problem.

What I really like about Codecademy is that I can do it at my own pace. The lessons are wildly variable in quality, but I’m glad not to have to sit through hours of lectures every week. They also do “badges”, which I find more satisfying than I wish I did. That said, I suspect someone with no experience debugging code would find the experience impenetrable and waste hours tracking down simple syntax errors, and indeed I saw on Hacker News a post to this effect a few weeks ago.

In the end, despite all this, the way I learn best is through a combination of reading books and writing actual code. I’ve had to learn F# over the last month, which I’ve done by reading a couple of (quite nice) books and writing a lot of actual code. It’s hard for me to imagine the course that would have done me any better (or any faster).

Similarly, if I wanted to learn Rails (which some days I think I do and other days I think I don’t), I have trouble imagining a course that would do better for me than just working through the Rails Tutorial (which I have skimmed, which has convinced me that I could learn well from it).

Similarly similarly, I suspect that the right Machine Learning book (and some quality time with e.g. Kaggle) would have been much more effective for me than the ml-class was. But if such a book exists, I haven’t found it yet.

Endogeneity, or “The Skill of the Brewmeister”

The latest OkCupid blog post is one of their more interesting ones:

No matter their gender or orientation, beer-lovers are 60% more likely to be okay with sleeping with someone they’ve just met. Sadly, this is the only question with a meaningful correlation for women.

Of course, once every dude starts asking this question on the first date and every girl figures out that her answer is a signal of how easy she is, the correlation will almost surely vanish. Even if a woman is willing to put out on the first date, that doesn’t mean she wants to advertise the fact early on. (It’s at least possible that I am out of touch with the youth of America and am wrong about this.)

Accordingly, I predict a brief surge in beer-lover questions (and guys trying to bring their first dates to Bierstubes, Bierhausen, Bierhalls, and the like), followed by the evolution of non-committal, correlation-breaking answers to “do you like the taste of beer?” like

  • Only if it comes from a big keg,
  • It depends on the skill of the brewmeister, and
  • I do like the taste of beer, but not on the first date.

If OkCupid were evil (which they probably are, now that they’re a division of IAC), they’d instead sell a limited number of subscriptions to this sort of information, so that dudes could use these questions without having to worry that the dating pool had been overfished.

If they were really evil (which they probably are, now that they’re a division of IAC), they’d report false correlations and then laugh at people who tried to put them into practice. However, I assume that if they’d done this they would have chosen a funnier question than “Do you like the taste of beer?”

I’ll explore this further in my next post, “The Only Question That Correlates With Whether Women Put Out Is ‘Did you ever find Bugs Bunny attractive when he put on a dress and played girl bunny?’”

New Coffee Plan Is Bad Karma

People throw away a lot of coffee cups. That was the motivation behind the BetaCup Challenge, a competition to create a more “sustainable” alternative to the disposable coffee cup.

If I’d found out about the competition while it was going on, I might have entered my “dig paper cups out of the trash and re-use them” idea, but I guess it’s kind of late for that, since they’ve announced the winner:

The Karma plan: A chalkboard at the coffee shop will chart each person who uses a reusable mug. The tenth person to order a drink with a reusable cup will receive his or her drink free. By turning a freebie program into a communal challenge, Karma Cup would create incentives for everyone to bring reusable mugs. (After all, the more people participate, the more free stuff is given away and the more likely you are to get something free.) That, in turn, would eliminate rather than simply redesign the nefarious disposable cup.

It’s pretty clear that the inventors of this idea (as well as the judges) have never worked retail, or they’d know the Hobbesian depths that consumers will sink to in order to get free stuff. The “Karma” plan is a recipe not only for barista-customer arguments, but also for chalkboard-stalking and line-cutting and fistfights. A better name might be the “There Will Be Blood” plan.

More interesting is the claim in the Fast Company article: “After all, the more people participate, the more free stuff is given away and the more likely you are to get something free.” It’s interesting because it doesn’t make any sense. The likelihood of getting something for free depends on two factors: whether you bring a reusable cup, and where the chalkboard count stands.

The first doesn’t depend on how many people participate — the decision to bring a reusable cup is (presumably) yours alone. The curious thing is that (in the absence of fistfights and line-cutting) the second doesn’t either. The relevant criterion is the “previous mug count,” which runs from 0 up to 9. If it’s 0 when you arrive, you move it to 1. If it’s 1, you move it to 2. And so on, and then if it’s 9 (which makes you the tenth customer), you get your coffee for free and it resets back to 0.

Now, as long as there are no fistfights and line-cutting, there’s no reason why the mug count should favor any number over the other. When you go to Burger King and they give you a number on your ticket, you should be just as likely to get a number ending in 0 as a number ending in 5, no matter how many customers are going through the place. So you should get a number ending in 0 about 10% of the time.

If we ignore strategic (i.e. fistfight) considerations, the same should be true of the mug count. It should be 9 about 10% of the time, which means that (if you bring a mug) you should get your coffee free about 10% of the time. This is true whether one customer out of 20 brings a mug or every customer brings a mug. (Obviously, if no one brings a mug the count will be stuck at 0 all day, but even if you’re the only customer who brings a mug, you should still get every 10th cup free.)
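
If you don’t trust the hand-waving, a quick simulation makes the point. This is a sketch with made-up parameters: it assumes no line-cutting and just counts how often a mug-bringer happens to be the tenth mug:

```python
import random

def free_coffee_rate(mug_probability, customers=100000):
    """Fraction of mug-bringing customers who get a free coffee,
    assuming no chalkboard-stalking or line-cutting."""
    count = 0                # the chalkboard's "previous mug count", 0 through 9
    mugs = freebies = 0
    for _ in range(customers):
        if random.random() < mug_probability:    # this customer brought a mug
            mugs += 1
            if count == 9:                       # they're the tenth mug: free coffee
                freebies += 1
                count = 0
            else:
                count += 1
    return freebies / mugs

for p in [0.05, 0.25, 1.0]:
    print(p, round(free_coffee_rate(p), 3))      # all come out right around 0.10
```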

The big caveat again is that there are no fistfights. Probably, though, what happens is that you get lots of chalkboard-stalkers who hide in the restroom until the count reaches 9 and then cut in line to get a free coffee. In this case it’s easy to imagine that, unless you are one of the line-cutters, the chalkboard count will almost never be at 9 when it’s your turn to purchase.

However, the basic idea is good — psychologically a 10% chance of saving $2 is more exciting than for sure saving 20 cents. Accordingly, a superior implementation (from a fistfight perspective) might involve a random 10% chance of getting your coffee free when you bring your own mug. This would get rid of the incentives to cut in line, hide in the bathroom, and punch people. Plus, that way they could let you make your own mark on the chalkboard, which is better for both baristas (who won’t have chalky hands while they’re preparing your coffee) and customers (who like writing on the chalkboard because it reminds them of Coach Wickman’s geometry class).

It just needs a catchy, Eastern-religious name. I’m thinking “Samsara.”

Dear Facebook, I am skeptical of your claims.

Don’t get me wrong, I like data mining as much as the next guy, and without your help I might never have realized that “People who like Reason magazine also like Rand Paul” and that 13 of my friends like something called “Burn Notice.”

Nonetheless, for some reason your latest claim (that people who like Whole Foods also like Sarah Palin) doesn’t ring true to me.

Here is what I suspect happened. About a year ago there was a brouhaha involving a pro-market op-ed piece the Whole Foods CEO wrote. Some people on the left called for a boycott of the stores, and in response a number of people “liked” Whole Foods on Facebook to show support.

My guess is that the Facebook “supporters” of Whole Foods are disproportionately those who liked the op-ed piece, and that Sarah Palin is indeed popular among that crowd. In this case the people who “like” Whole Foods on Facebook seem to be not representative of those who like Whole Foods in real life.