bookblog.net

 


August 15, 2003

The Gender Genie

After Andy brought this New York Times Magazine article about gender and word choice to our attention, we (me) here at BookBlog headquarters (my bedroom) decided to test the algorithm by hand-scoring a few passages. We chose a few books off the top of three piles conveniently located right behind us, and conducted our own unscientific survey. We spent an hour typing and counting and adding and subtracting, and discovered that the algorithm correctly predicted the author's gender for only 50% of our sample of 10 books.

Then, taking a cue from Rich and borrowing his idea for a textual gender predictor, we decided to create a little application of our own:

The Gender Genie

Despite Koppel and Argamon’s claim that their algorithm is 80% accurate, our application manages only about 50%, just as our hand-scoring did.

Why would we bother to announce a gender-predicting program that’s right only half of the time? Well, we find it entertaining. Plus it amuses us when we put in passages written by a man and discover that he writes like a girl. And it’s pretty.



comments
Trackback Excerpt: Over at BookBlog [dot net] (very interesting site so far) a "Gender Genie" has been developed to determine if a passage was written by a male or a female. Just type in your passage and submit. Thankfully, I *do* write...

Trackback Excerpt: Does your writing reveal your gender?...

Trackback Excerpt: Yesterday Ogged admitted to being a woman. It's now official: according to the Gender Genie's analysis of that very post (minus the quotes), he's female. (Thanks to jill/txt for the pointer to the Genie, which resides at Bookblog.)......

According to this, I write like a girl.

So does Rich. But Mary writes like a boy.

Somehow, that makes me laugh.

Actually, it's funny. Stylistically, the algorithm rewards specificity and qualification as male, and connectivity and negation as female.

Therefore, "The two balls" is male. "Not with balls" is female.

I'm divorced, and Mary's got a pair, metaphorical though they be. Works for me. :-D

Rich is onto something there. It rewards specificity by adding points while punishing connectivity and negation by taking away points. Because of the formula being written in this way, I've had writings by some women come up with a negative score. I find it very interesting how it makes femaleness seem almost debilitating when it comes to writing.

And, Rich, that's probably the nicest compliment you've ever given me. Keep that thought in mind as I finally get around to ripping apart the last comment you left on one of the threads below. I've been sidetracked by several mini-projects recently and have been neglecting you.

The above comment, by the way, scores female.
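For anyone who wants to see the scoring idea from the comments above in action, here is a minimal sketch. It assumes a single running total in which "male" keywords add points and "female" keywords subtract them, as described above. The word list and the weights are made up for illustration; they are not the values from Koppel and Argamon's research or from the Genie itself.

```python
import re

# Hypothetical keyword weights, for illustration only.
# Positive = specificity ("male"), negative = connectivity/negation ("female").
WEIGHTS = {
    "the": 1, "a": 1, "two": 1,
    "not": -1, "with": -1, "and": -1,
}

def score(text):
    """Sum the weights of every keyword found in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(WEIGHTS.get(word, 0) for word in words)

def guess(text):
    """Turn the signed total into a label."""
    total = score(text)
    if total > 0:
        return "male"
    if total < 0:
        return "female"
    return "undecided"

print(guess("The two balls"))   # two "specific" words, no negation
print(guess("Not with balls"))  # one negation, one connective
```

With these made-up weights the first phrase totals +2 and the second totals -2, which is how a passage can end up with a negative score.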

Has anyone been able to get "female" for writing in passive voice?

Some friends of mine fooled around with this and are getting 100-percent male results for paragraphs in the passive voice.

"She was the person with whom he spent time."

Passive, and female.

I put in a few passages from my site, my blog and a friend's blog. His stuff came up female and mine (I'm a "girl") came up male about 7 out of 9 times. I'm sorry I didn't keep track of the true frequency.

Interesting little diversion you've made there.

We were having a discussion about the "Gender Genie" at FadetoBlack.com/discussions when I noticed the disappointing results your so-called "Genie" achieves:

quote:
------------------------------
Am I right?
yes 9989 (45.82%)
no 11812 (54.18%)
------------------------------

That's WORSE than you'd expect from random selection! Have there been any adjustments made to improve the accuracy of the Gender Genie? Are any modifications planned? As it is, it doesn't seem any more than guesswork.

If you don't plan any changes, I'd suggest you replace the algorithm with random selection. Such a system could be considerably quicker, conserve resources and improve accuracy.

I also read the nature.com article. It mentioned that the test is 98% accurate in distinguishing non-fiction from fiction. However, the 2000-word article of my own that I submitted was non-fiction, but the Gender Genie calculated that I am a female. I understand the "female" outcome correlates with fiction. Are there any plans for an on-line Genre Genie that distinguishes fiction from non-fiction?

Julia, thanks for trying it. We think it’s fun.

Aussieintn, thanks for your suggestions but I didn’t write the algorithm, just the program that runs it. Until Koppel and Argamon update their research, we’re going to leave it alone since we already knew (see the post above) that it would run at only about 50% accuracy. If you pay attention to the words it searches for and how the score is calculated, you’ll see that it is a bit more than guesswork even if the tally makes it look like flipping a coin. In addition, the stats are entirely user-dependent, so I wouldn’t trust them implicitly.

No plans so far for a Genre Genie, so feel free to jump on that idea if you feel like putting in the time to build it.

And you know the bit above in which we say it makes us laugh when we find out a guy writes like a girl? Bwahahahahahahahaha!

Since it seems to be correct only about 50% of the time, you might as well just install a script that randomly guesses the gender of the author. It would be as accurate.

Not impressed. It even said Mary was a male based on this news post.

On my blog I posted a few comments about why I think the design of the form may be encouraging more people to hit "no" rather than "yes"... that might explain why the program seems to be guessing (or worse). Here's part of what I wrote:

As it is now, you paste in your text, push a button, read the results, and then either push the "go back button", erase your old text and paste in new text; or, you tell the Genie whether it was right or wrong, view a popup with the results, close the popup, push "go back," erase the old text, and paste in new text.

When I give feedback to the Genie, I am committing myself to some extra (boring and unnecessary) steps. When the Genie is right, I am not particularly motivated to tell it so; when the Genie is wrong, I am more motivated to tell it. Thus, it may be the case that the feedback form is attracting more negative than positive responses.
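The feedback argument above can be put into rough numbers. This is only a sketch: the reply rates are hypothetical, since nobody has measured how often people actually click "yes" or "no". It takes the 9989 "yes" and 11812 "no" votes quoted earlier in the comments and asks how lopsided the reply rates would have to be for a Genie that is right exactly half the time to show that tally.

```python
def observed_yes_share(accuracy, reply_if_right, reply_if_wrong):
    """Share of "yes" votes in the tally, given the true accuracy and the
    (hypothetical) chance that a user bothers to reply in each case."""
    yes = accuracy * reply_if_right
    no = (1.0 - accuracy) * reply_if_wrong
    return yes / (yes + no)

# Tally quoted in the comments.
yes_votes, no_votes = 9989, 11812
tally_share = yes_votes / (yes_votes + no_votes)

# If the Genie is right 50% of the time, this is how much more likely
# a user must be to reply when it is wrong than when it is right.
bias = no_votes / yes_votes

print(f"yes share in the tally: {tally_share:.2%}")
print(f"reply-rate ratio needed at 50% accuracy: {bias:.2f}")
print(f"simulated yes share: {observed_yes_share(0.5, 0.10, 0.10 * bias):.2%}")
```

Under these assumptions, people only need to be about 18% more likely to report a miss than a hit for a coin-flip Genie to look worse than chance.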

I guess I write like a girl half of the time.

I wonder if the size of the sample has any bearing. I tried six samples of 1 to 6 paragraph length. Does anyone know if this thing does any better with longer samples?

Correct me if I'm wrong, but doesn't this favor third person omniscient as male, something most non-fiction is written in, and first person as female? I'm only guessing on this, but it seems that a large number of personal pronouns and possessives are used in first person fiction work, as opposed to the more "detached" form of non-fiction work.

Josh, I'm not impressed by the fact that you didn't bother to read the previous comments and didn't ascertain that both points you mentioned have already been covered.

I love the predilection of people who have no clue where an algorithm comes from to post critiques of the people who wrote the page. "Not impressed," indeed.

I'm thinking some clarification of the algorithm might be in order... Who knows whether the NYT simplified the algorithm, or if it's a "parlor trick" version of a more elaborate system Koppel and Argamon used to get the 80% they claimed.

Has anyone found a way to contact these guys?

As it is now, you paste in your text, push a button, read the results, and then either push the "go back button", erase your old text and paste in new text; or, you tell the Genie whether it was right or wrong, view a popup with the results, close the popup, push "go back," erase the old text, and paste in new text.

When I give feedback to the Genie, I am committing myself to some extra (boring and unnecessary) steps. When the Genie is right, I am not particularly motivated to tell it so; when the Genie is wrong, I am more motivated to tell it. Thus, it may be the case that the feedback form is attracting more negative than positive responses.

Pffffft.

Dennis, you’re absolutely right in that the way the feedback question is set up, it doesn’t give you much motivation to tell it it’s right. When the program was in beta testing, Andy came up with the idea to have it monitor its accuracy. I made it simple on purpose, but at the time had no idea it would become so popular. I’ve been looking into adding some more code to make it track more variables (like how many women have been misidentified as male and vice versa), but I doubt it’ll offer anyone any more motivation to tell it when it’s correct. Polling is entirely user-dependent, so I’d take such figures with a grain of salt.

Roy, in my experience with it so far, I’ve discovered that it works much better with larger blocks of text. When the researchers came up with the algorithm, the texts they used were all over 40,000 words. Longer texts mean more keywords, so word count probably has something to do with its accuracy.

Ben, I’m not sure the perspective of the text has anything to do with it. The theory is that men write about things while women write about connections. If that’s the case, then a man writing in the first person should also be using a lot of the masculine keywords.

Rich, I tried slogging through their paper to see if it offered more insight, but it put me to sleep. If you manage to read it and find anything that would help make the application more functional, let me know.

Some other things skewing your results:

People are probably putting in passages from their blogs a majority of the time. Blogs talk about people more than things, a lot of the time, and it's "personal" writing. That makes it fairly "feminine" in scope.

Also, I agree that people are probably clicking no more than yes, and not always clicking "yes" when they get a truly accurate read. This would be especially prevalent in men who put in a blog passage, get a "female" result, and click "no" because they think the tool is poking fun at them. :)

Other observations:

When I input passages by two or more authors, my results are not very accurate, regardless of the genders: I think this is just too skewed to get good results.

When I put in fiction that I or other women wrote, I get female. When I put in fiction that many of my male friends wrote, I get male. I clicked "yes" for you several times! I don't find that fiction or nonfiction skews the results that much at all!

Lastly, when I checked out a passage I wrote in first person from the point of view of a male, I got a "male" result. This, to me, is what I find most valuable about the tool--its accuracy regarding my gender impersonation. I will probably use it on occasion just for that reason.

Amanda, thanks very much for your feedback and kind words.

Mary, I'll look over the paper and get back to you.

Amanda, your observations are spot-on. The application has several limitations, which we were aware of before releasing it. However, I went ahead with finishing it because I had no idea it was going to get picked up by thousands of people on the Internet.

Other limitations include: not picking up on s-apostrophes, numbers in the middle or the end of a word, and not knowing how to distinguish when apostrophe-s is a contraction or a possessive. One of these days I’ll get around to making it smarter, but I’ve got several projects in front of it right now.
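As a rough idea of what "smarter" could look like, here is a minimal sketch of a word splitter that keeps s-apostrophes and digits attached to their words. It is not the Genie's actual code; the pattern and the `base_word` helper are assumptions made up for this example. It also does not solve the contraction-versus-possessive problem: "it's" and "Mary's" are both reduced to their base word, so the "is" hiding in a contraction is still lost.

```python
import re

# Letters and digits, optional internal apostrophe parts (it's, mary's),
# and an optional trailing apostrophe (writers').
TOKEN = re.compile(r"[a-z0-9]+(?:'[a-z]+)*'?")

def tokenize(text):
    """Lowercase the text, normalize curly apostrophes, and split into words."""
    text = text.lower().replace("\u2019", "'")
    return TOKEN.findall(text)

def base_word(token):
    """Strip a trailing 's or s-apostrophe so the keyword lookup sees the base word.
    This cannot tell a contraction from a possessive."""
    if token.endswith("'s"):
        return token[:-2]
    return token.rstrip("'")

tokens = tokenize("The writers\u2019 books, it's Mary's mp3")
print(tokens)
print([base_word(t) for t in tokens])
```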

Your idea for using The Gender Genie to test the "voice" of a text is a good one. Several days ago, I was contacted by a well-known author saying that he used it on a work-in-progress for the very same purpose.

"Josh, I'm not impressed by the fact that you didn't bother to read the previous comments and didn't ascertain that both points you mentioned have already been covered."
Heh. Perhaps I did read them. Then, having ascertained that both of the points I mentioned had been covered, came to the conclusion that they needed to be further reiterated to drive home the fact that this script is utterly inane. A point you apparently knew from the start, yet plodded forward anyway for some perceived noble cause. Way to go.

 
