je désire le monde

Old enough to know better, young enough to do it anyway.

Urbanely urban nerd. Apple fanatic. Technophile. Audiophile. Protolinguist.



Tricky Linguistics

The novel-sentence argument is one of the classic arguments used in favour of a generative approach to language, seen also in Chomsky’s sentence Colourless green ideas sleep furiously. Since I can at any moment produce a sentence that no one’s ever heard before, and moreover since I can produce and understand a sentence that is completely nonsensical, language can’t be simply repeating memorized phrases but must involve the ability to generate new phrases of my own and understand phrases based on their structure and not just lists of words. 

The really neat thing is that it doesn’t even require friendly milk or colourless green ideas in order to create a novel sentence. We’re doing it all the time. Chances are, the last sentence that you typed that wasn’t a formulaic greeting (“how are you?” and the like) and was longer than 8 or 10 words will actually get zero hits if you google it with quotation marks.

I actually just tested two phrases from the first paragraph and they both passed: “The novel-sentence argument is one of the classic arguments used in favour of a generative approach to language” and “Since I can at any moment produce a sentence that no one’s ever heard before”, although of course they’ll stop working soon after I publish this post.
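To make the combinatorics concrete, here is a minimal sketch of a toy generative grammar in Python. The rules and the word lists are invented for illustration (they are not anyone's actual model of English); the point is only that a handful of rules plus a dozen words already yields hundreds of distinct sentences, most of which nobody has ever said.

```python
from itertools import product

# Toy grammar (an assumption for illustration): each symbol maps to a list
# of possible expansions. Anything without an entry is treated as a word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Adj", "Adj", "N"]],
    "VP": [["V", "Adv"]],
    "Adj": [["colourless"], ["green"], ["friendly"], ["furious"]],
    "N": [["ideas"], ["milk"], ["linguists"]],
    "V": [["sleep"], ["dream"], ["argue"]],
    "Adv": [["furiously"], ["quietly"]],
}


def expand(symbol):
    """Return every word sequence that the given symbol can produce."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # a plain word expands to itself
    results = []
    for rule in GRAMMAR[symbol]:
        # Combine every way of expanding each part of the rule.
        for parts in product(*(expand(s) for s in rule)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
print(len(sentences), "sentences from", len(GRAMMAR), "rule sets")
print("colourless green ideas sleep furiously" in sentences)
```

Adding one more adjective or noun multiplies the total, and a recursive rule (say, letting an NP contain another NP) would make the set unbounded, which is the heart of the argument.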

(via qc-linguisticsclub)

Learners of foreign languages are prisoners of their own phonologies.
Bruce Hayes (via silvanasickofyourshit)

SAYS YOU! *tries hardest to produce arabic ع*

(via estifi)

(via didyoudrinkmygingerale)

  • Studying math in kindergarten: 2+2 = 4
  • Studying language in kindergarten: "A, B, C, D, E, F, G"
  • Me: 😄
  • Studying math in middle school: 4x-3 = 2x+7
  • Studying language in middle school: "The quick brown fox jumps over the lazy dog."
  • Me: 😀
  • Studying math in high school: ∫5x⁴ dx = x⁵+c
  • Studying language in high school: "It was the best of times, it was the worst of times."
  • Me: 😐
  • Studying math in college: |v₁ × v₂| = |v₁||v₂|sin θ
  • Studying language in college: S → NP VP; VP → V NP; NP → Det N
  • Me: 😧
  • Studying math in grad school: □P → ¬◇¬P
  • Studying language in grad school: □P → ¬◇¬P
  • Me: 😱


In case you hadn’t heard, J.K. Rowling was recently discovered to be writing crime novels under the pseudonym Robert Galbraith. Language Log has a guest post by Patrick Juola, one of the computational/forensic linguists who found that Galbraith’s novel was statistically more similar to Rowling’s The Casual Vacancy than to several other crime novels.

Language is a set of choices, and speakers and writers tend to fall into habitual, or at least common, choices. Some choices come from dialect (the reason an Englishman drives a lorry but an American a truck), some from social pressure (if I need to impress someone with my vocabulary, I can utilize a polysyllabic lexicon instead of just using big words), and some just seem to come. An example of the latter category is in the use of many function words. If you ask yourself where the salad fork is relative to the plate, you quickly realize that it’s usually to the left of the plate. Or is it? It’s just as likely to be “on” the left of the plate, “at” the left of the plate, or perhaps “to” the left SIDE of the plate. Same fork, same position, and at least four different choices for how to describe it, none of which correspond to any sociolinguistic or cognitive variable with which I’m familiar.

But what we do know is that much of this apparently free variation is actually rather static at least at an individual level. So by studying examples of documents a person has written, we can build a model of the kind of choices that person makes.[…] Mosteller and Wallace studied the writing styles of The Federalist Papers in the mid-60s and showed, for example, that Alexander Hamilton never used the word “whilst” but that James Madison never used the word “while.” More interestingly, they both used the word “by,” but Madison consistently used it twice as often.


I was approached by a reporter, Cal Flyn, from the Sunday Times, to assess this kind of variation in the writings of “Robert Galbraith,” a first-time novelist and author of The Cuckoo’s Calling. (I learned later from the papers that the paper had received an anonymous tip via Twitter that Galbraith was the pen name of J.K. Rowling. And in retrospect there were a lot of other clues as well. For example, Galbraith apparently was surprisingly good at describing women’s clothing, possibly suggesting a female author.) Would I be willing to look into this? I said yes, of course, but with a couple of conditions. First, I needed clean (machine readable) copies of Cuckoo, and clean samples of something comparable undisputedly by Rowling herself. Secondly, I needed other comparable samples from other writers (distractor authors, to use the common term) to assess the degree of variation.

For the past ten years or so, I’ve been working on a software project to assess stylistic similarity automatically, and at the same time, test different stylistic features to see how well they distinguish authors. […] First, most people are average in word length, just as most people are average in height. Very few people actually write using loads of very long words, and few write with very small words, either. Second, you learn that average word length isn’t necessarily stable for a given author. Writing a letter to your cousin will have a different vocabulary than a professional article to be published in Nature. So it works, but not necessarily well. A better approach is not to use average word length, but to look at the overall distribution of word lengths. Still better is to use other measures, such as the frequency of specific words or word stems (e.g., how often did Madison use “by”?), and better yet is to use a combination of features and analyses, essentially analyzing the same data with different methods and seeing what the most consistent findings are. That’s the approach I took. (Read the rest at Language Log)

Juola stresses that this type of analysis can only show that different types of writing are more or less similar to each other, not that a certain person was definitely the author of a particular text, but evidently it was enough evidence to convince Rowling to admit to the pseudonym. 

This type of analysis reminds me of a news story a while back showing that people who get along with each other are more likely to use similar frequencies of function words, and that this relationship holds both for famous correspondents like Freud and Jung and for modern couples on speed-dates.
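For the curious, here is a rough Python sketch of two of the features mentioned above: function-word frequencies and the distribution of word lengths. This is not Juola's software, whose internals the post doesn't describe; the word list, the sample texts, and the choice of cosine similarity as the comparison are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny list of function words; a real study would use many more.
FUNCTION_WORDS = ["the", "of", "to", "by", "on", "at", "while", "whilst"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_profile(text):
    """Relative frequency of each function word in the text."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def word_length_distribution(text, max_len=10):
    """Share of words of each length; lengths above max_len are pooled."""
    words = tokenize(text)
    total = len(words) or 1
    lengths = Counter(min(len(w), max_len) for w in words)
    return [lengths[n] / total for n in range(1, max_len + 1)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Invented sample texts, far too short for a real analysis.
disputed = "The fork sits to the left of the plate while we wait."
candidate_a = "The spoon goes to the right of the bowl while they eat."
candidate_b = "Whilst at table he put his fork on his plate by habit."

target = function_word_profile(disputed)
sim_a = cosine_similarity(target, function_word_profile(candidate_a))
sim_b = cosine_similarity(target, function_word_profile(candidate_b))
print("similarity to A:", sim_a)
print("similarity to B:", sim_b)
print("word lengths:", word_length_distribution(disputed))
```

As Juola says, a higher score only means the texts are more alike on these features, and a serious analysis would combine several features over much longer samples and compare against distractor authors.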


Linguist puns. Ya gotta love ‘em!

(via o-eheu)




Unnecessary “fillers” in our speech. I’d rather have “like” than up-talking, though (if we had to choose one, that is). Ewwww, up-talking. Then again, a combination of the two would render me a homicidal maniac.

yes, colloquial speech is stupid

discourse particles are stupid

quotative particles are stupid

fillers are stupid

lower registers of speech = stupid!!!!!!woah aaa/

Wow fuck you if you’re really that stuck on yourself that you can’t accept other people speak differently than you, including “like”, up talk and vocal fry/creaky voice.

(via politicsoflanguage)


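  • Franciscus puer qui homo est. (Franciscus, the boy who is a human.)
  • Quomodo meum nomen paene scivisti?! (How did you almost know my name?!)
  • Proximam scientiam multorum habeo. (I have approximate knowledge of many things.)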


My education in a nutshell


The Lonely Island - SEMICOLON (feat. Solange)

Posted due to punctuation and Solange Knowles - both relevant to our interests.


i don’t just like reduplication, i LIKE IT like it

(via officialmidwest)