Sunday, August 28, 2011


Well, here in Philadelphia, we've just braved Hurricane Irene. From what I've heard, damage here was relatively minimal, and we haven't lost power. My friends further north in NYC are in my thoughts, because it looks like they got really hammered.

The silver lining here for me is that I was able to go collect data from the Weather Underground station about six blocks away from where I live. Here are the numbers.

We got 5.68 inches of rain, which fell most steadily between 6PM and midnight last night.

Barometric pressure, on the other hand, hit the floor at 6AM today.

As for wind speeds, there are two measures from the weather station. Speed is, I believe, average wind speed over the reporting time bin (which varies between 1 and 7 minutes...), and Gust is, I believe, the maximum speed during that time bin. Either way, our max wind speeds were around 11PM last night, and they've stayed pretty high into this afternoon.

Tuesday, August 23, 2011

Earthquake: Do your part for data collection!

An earthquake just happened on the East Coast, my first! It turns out the US Geological Survey has an online survey for earthquakes called "Did You Feel It?" and the data is freely available! So

Go take the survey!

As of now, it looks like survey response has really petered out.

But you can download the data and some graphs here, in the downloads tab.

I whipped up this quick visualization of the responses.

Look at that big depressing gap in the response data, right where the epicenter was! And all across Pennsylvania.

If you're from those areas, you really ought to go take the survey!


Well, I feel a little stupid. It looks like there are two locations on the USGS site for this earthquake, and the one I was looking at is not up-to-date... Maybe I don't feel so stupid, it's not the best kind of design.

The real data to download is here. I've already updated the links above.

And here are the real visualizations. Here's the raw data:

And here are mean values across a grid.
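For anyone curious what "mean values across a grid" amounts to, here is a minimal sketch in Python. It is not the code behind the plot. The `(latitude, longitude, intensity)` tuple layout, the function name `grid_means`, and the sample numbers are all assumptions made up for illustration; the real USGS download has its own column layout.

```python
from collections import defaultdict
from math import floor
from statistics import fmean


def grid_means(responses, cell_size):
    """Average the intensity of all responses that fall in the same grid cell.

    `responses` is an iterable of (latitude, longitude, intensity) tuples
    (a hypothetical layout). `cell_size` is the cell width in degrees.
    Returns a dict mapping (row, column) cell indices to the mean intensity.
    """
    cells = defaultdict(list)
    for lat, lon, intensity in responses:
        # floor() keeps negative longitudes in the correct cell.
        key = (floor(lat / cell_size), floor(lon / cell_size))
        cells[key].append(intensity)
    return {key: fmean(values) for key, values in cells.items()}


# Made-up responses, not real survey data.
sample_responses = [
    (38.1, -77.9, 5),
    (38.4, -77.6, 7),
    (40.2, -75.1, 3),
]
cell_averages = grid_means(sample_responses, cell_size=1.0)
for cell, mean_intensity in sorted(cell_averages.items()):
    print(cell, mean_intensity)
```

Cells with no responses simply do not appear in the result, which is how a gap in the response data would show up.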

Wednesday, August 17, 2011

Does blogging do me any good? A quantitative analysis.

I've been wondering if blogging does me any good. I don't mean for the heart and soul. I enjoy blogging and am going to keep it up (except for those end-of-semester hiatuses). But I've been wondering if blogging does me any good professionally, or whatever. Obviously, "a professional or whatever good" is hard to define, so I'll define it according to the data that I have.

I maintain, along with this blog, an academic website where I have all of my more serious research stuff. I've got Google Analytics set up on both my blog and my academic site, keeping track of page views. So, if I can detect that page views of my blog drive some page views to my academic website, then I'll conclude that blogging is doing me some professional good. This makes a certain kind of sense, since what matters to me at this particular stage of my professional life is getting my ideas out there, and my ideas are catalogued on my academic site.

The raw data

Here is one year's worth of traffic to Val Systems. Those two huge spikes are thanks to Mark Liberman, who reblogged my post about Britney Spears' tongue, and to the Car Talk Guys, who linked to my post about their short-a system on the Car Talk site for a bit Sociological Images, where I guest posted about a "grammar" book.

Now here is the traffic from my academic site, and my research page on that site from the same time period.

As you can see, my academic site gets far fewer page views than my blog. Prospects are not very bright.


My first step of analysis was to figure out how correlated page views of each site were within each site. That is, how correlated are page views on my blog with page views from one day later on my blog, or two days later, etc. To calculate this, I used the acf() function in R. Here's the autocorrelation function from my blog. The x-axis represents how many days into the future you're comparing page views, and the y-axis represents the correlation between page views separated by that many days.

It looks like page views on my blog are pretty well correlated with the page views from one day before (0.45). After that, there is a correlation drop-off, which I'll interpret as new-post decay. It seems like the influence that a single new post has on my blog traffic is fairly minimal after five days.
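To make the autocorrelation idea concrete, here is a minimal sketch in Python. It is not the R `acf()` call used for the plots, just the usual sample autocorrelation formula written out by hand. The function name `autocorrelation` and the page view numbers are made up for illustration.

```python
from statistics import fmean


def autocorrelation(series, max_lag):
    """Sample autocorrelation of `series` at lags 0..max_lag.

    Each value is the sum of products of deviations `lag` days apart,
    divided by the total sum of squared deviations, so lag 0 is always 1.
    """
    n = len(series)
    mean = fmean(series)
    deviations = [x - mean for x in series]
    total = sum(d * d for d in deviations)
    result = []
    for lag in range(max_lag + 1):
        products = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
        result.append(products / total)
    return result


# Hypothetical daily page views (not the blog's real traffic).
page_views = [120, 135, 90, 80, 75, 70, 210, 190, 140, 100, 85, 80, 78, 76]
acf_values = autocorrelation(page_views, max_lag=5)
for lag, value in enumerate(acf_values):
    print(f"lag {lag}: {value:+.2f}")
```

Reading the output the same way as the plot: the value at lag 1 says how well today's page views predict tomorrow's, the value at lag 2 the day after, and so on.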

Here's the autocorrelation function for my academic site.

As you can see, the overall size of the correlations is much smaller than for the blog. This is most likely because each new post is a new event that happens on my blog, which can have an effect which lasts for a few days, whereas nothing happens on my academic site in the same way. However, there is an apparently cyclic pattern, where page views are most positively correlated at 7 day intervals, and most negatively correlated at 3 to 4 day intervals.

Duh! Who does work on the weekends?

To factor out this cyclic pattern, I fit a linear regression of page views for my academic site and research page with weekday as a categorical predictor. I'll use the residuals from these regressions for doing the cross-correlation.
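When weekday is the only predictor and it is categorical, the fitted value for each day is just the mean for that weekday, so the residuals are page views minus their weekday mean. Here is a minimal Python sketch of that step. It is a stand-in for the regression, not the code actually used, and the function name and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def weekday_residuals(views, weekdays):
    """Subtract each weekday's mean page views from that day's page views.

    This matches the residuals of a linear regression whose only predictor
    is weekday treated as a categorical variable.
    """
    by_day = defaultdict(list)
    for count, day in zip(views, weekdays):
        by_day[day].append(count)
    day_means = {day: fmean(counts) for day, counts in by_day.items()}
    return [count - day_means[day] for count, day in zip(views, weekdays)]


# Two made-up weeks of page views with a weekend dip.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"] * 2
views = [30, 32, 35, 31, 28, 10, 8, 34, 30, 33, 35, 26, 12, 6]
residuals = weekday_residuals(views, days)
print(residuals)
```

After this step the weekend dip is gone from the series, and what is left is how much each day differed from a typical day of that kind.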


Next, I checked the cross-correlation of (residualized) page views. This checks to see how correlated page views are between any two of the sites at different time lags. First, here's the cross correlation of my main academic site and my research page. I knew these would have to be highly correlated, since my research page is the most clicked link on my main page.

Correlations with negative lag indicate that visits to my research page were correlated with visits to my main academic site a few days later. Correlations with positive lag indicate that visits to my main academic site were correlated with visits to my research page a few days later. The correlation at 0 indicates how correlated visits to my main academic site and my research page were on the same day.
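The lag convention described here can be sketched in Python as follows. This is an illustration under stated assumptions, not the function used for the plots: I define the value at lag k as the correlation between the first series on day t and the second series on day t + k, which matches the description above but may not match the sign convention of any particular library. The function name and the data are made up.

```python
from math import sqrt
from statistics import fmean


def cross_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag].

    Positive lag: x leads y by `lag` days. Negative lag: y leads x.
    Normalized by the full-series sums of squares, as in the usual
    sample cross-correlation.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    scale = sqrt(sum(d * d for d in dev_x) * sum(d * d for d in dev_y))
    if lag >= 0:
        pairs = zip(dev_x[: n - lag], dev_y[lag:])
    else:
        pairs = zip(dev_x[-lag:], dev_y[: n + lag])
    return sum(a * b for a, b in pairs) / scale


# Made-up series: the second repeats the first two days later.
main_site = [0, 0, 5, 0, 0, 0, 9, 0, 0, 0, 0, 0]
research_page = [0, 0, 0, 0, 5, 0, 0, 0, 9, 0, 0, 0]
by_lag = {k: cross_correlation(main_site, research_page, k) for k in range(-3, 4)}
best_lag = max(by_lag, key=by_lag.get)
print(best_lag, round(by_lag[best_lag], 2))
```

In this toy example the strongest correlation sits at a positive lag, because the second series trails the first. In the real data the interesting question is whether anything stands out away from lag 0.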

Unsurprisingly, the only strong correlation between visits to my main academic site and my research page is on the same day. That spike around 10 days makes no sense, so it's probably just noise.

So, drum-roll please, how correlated are visits to my blog and my main academic site?

I would analyze this as bupkis. Likewise for my research page.

To sum up

It looks like blogging is just a fun diversion for me right now. Even though it would have been a lot of fun to come to my advisor or department chair with strong results that blogging is professionally fruitful, I'm fine with the way things turned out.

However, I shouldn't have been surprised. If I was trying to use blogging as a platform for promoting my professional work, I wasn't doing it very well. If you're looking at my blog now (vs an RSS subscription), you may notice that I've added some links to the right, which lead to my academic site, and to my github site. Why not try to make blogging work for me a little bit?

Sunday, August 14, 2011

Max Weber on why there is no decision process for research

In the process of moving, I've come across a bunch of books from my undergrad Sociology minor days, including a book of collected works by Max Weber. You may know him best for the notion of the Protestant work ethic.

At any rate, the volume includes text from a lecture called Science as a Vocation (available free online here), which I've decided to read through because of its personal relevancy, and I've come across this wonderful paragraph.
"Nowadays in circles of youth there is a widespread notion that science has become a problem in calculation, fabricated in laboratories or statistical filing systems just as 'in a factory,' a calculation involving only the cool intellect and not one's 'heart and soul.' First of all, one must say that such comments lack all clarity about what goes on in a factory or in a laboratory. In both, some idea has to occur to someone's mind, and it has to be a correct idea, if one is to accomplish anything worthwhile. And such intuition cannot be forced. It has nothing to do with any cold calculation. Certainly calculation is also an indispensable prerequisite. No sociologist, for instance, should think himself too good, even in his old age, to make tens of thousands of quite trivial computations in his head and perhaps for months at a time. One cannot with impunity try to transfer this task entirely to mechanical assistants if one wishes to figure something, even though the final result is often small indeed. But if no 'idea' occurs to his mind about the direction of his computations and, during his computations, about the bearing of the emergent single results, then even this small result will not be yielded."

This seems to me to be a nice enough refutation, 90 years prescient, of that strange Wired article from a few years ago which claimed that big data is going to kill the scientific method.

It also resonates with an issue near and dear to my heart: promoting statistical literacy within linguistics. And that takes a two-pronged approach. The first is developing statistical competency to be able to run and analyze your own statistics, without relying on semi-automated techniques, like stepwise regression, or put slightly differently, transferring the task entirely to mechanical assistants. The second is to be sure to treat statistical methods as tools for investigation, not to reify them as the objects of inquiry themselves, nor their results as god's truth, spoken by its R-acle.

Tuesday, August 9, 2011

Miraculous Thought Transference

I've already blogged about what I didn't like about Mark Pagel's TED talk. I'm not going to beat up on it more, specifically. Rather, I'd like to problematize the meme that he kicked it off with.
"Each of you possesses the most powerful, dangerous and subversive trait that natural selection has ever devised. It's a piece of neural audio technology for rewiring other people's minds. I'm talking about your language, of course, because it allows you to implant a thought from your mind directly into someone else's mind, and they can attempt to do the same to you, without either of you having to perform surgery." [emphasis added]
Hopefully by now, you've caught on to my own subversive juxtaposition. Briefly, I think this meme is cuter than it is true.

I call it a meme, because I seem to recall it showing up in Steven Pinker's The Language Instinct, and I'm sure it's popped up other places too. Obviously, this meme brushes right up against other issues regarding language and thought. For instance, is language the structure of thought, and does language somehow constrain our thoughts? I'm not well versed enough in these issues to comment, and I only mention them here in order to say that I won't be saying anything about them, except for what I have already said.

Did that make sense? If so, I have succeeded in externalized telepathy. If not, that's sort of my point. Unsuccessful thought implants are a pervasive fact. Just ask the customer and the project leader, or the teacher and the student. If it were so easy to implant thoughts in others' minds, would schooling really take so long? Perhaps thought implant rejection can be blamed on external factors, like inattention on the hearer's part, or the complexity of the thought being transmitted, but I'd be surprised if that was all there was to it.

I'd guess, and this is where I enter into purest speculation, that successful communication between a speaker and hearer has a lot more to do with the fact that people are willing to attribute minds and intentional stances to just about anything, including other people, than with the design specifications of language.

In fact, the ability to implant (false) beliefs in someone else's mind is most definitely not only possible within the domain of language. Just ask Marcel Marceau.

Or, puzzle over this interesting item.

Perhaps language is better than other natural forms of communication at transmitting propositional content, but it's certainly not ideal for it either. If it were, then there wouldn't have been any need to develop formal logic, or propositional calculus.

So there is the problem that I want to create for this meme. Language does not really "implant a thought from your mind directly into someone else's mind," and insofar as it does, it doesn't do so uniquely above all other forms of communication. It's a pretty meme though, sort of like a poem about linguistics, and it's attention grabbing. But if it matters whether it's true and accurate, I don't think it stands up.

Wednesday, August 3, 2011

Language, Communication, and iPhone

I'm a bit of a caffeine junkie. Every day, regardless of where I am, I need to get my fix. I've also been very lucky to do some international traveling, which has put me in the situation where I need a coffee, but I don't speak the local language. And you know what? I've always successfully ordered and paid for my coffee, and even gotten what I intended to order.

Ok, enough speaking in parables. My point is that communication is not the same thing as language, and even complex economic transactions can be successfully carried out with only communication and no language.

And that's why I'm not a big fan of this TED Talk by Mark Pagel, called How language transformed humanity.

I think his introduction is far too simplistic, especially with regard to his passing comments about language acquisition. He says
"Just imagine the sense of wonder in a baby when it first discovers that merely by uttering a sound, it can get objects to move across a room, as if by magic, and maybe into its mouth."
It is obvious that there must be more to the secret sauce of language acquisition than that. Even Nim Chimpsky was able to work out that by merely waving his hands around, he could get things into his mouth. Just read his quotations: Wikipedia/Nim Chimpsky/Quotations. But Nim never acquired language.

There's also something strangely self-defeating about his entire evolutionary argument. He seems to say that humans evolved language as a means to the end of creating large, modern societies. I'm sure he doesn't really think it worked like that. Evolution isn't goal oriented, and he's a biologist. Anyway, the last part of his talk is devoted to the "problem" of language diversity, and how we use it to build barriers between populations. The whole talk, laid out in one sentence, becomes:
Humans evolved language in order to encourage cooperation and to build large societies, but then, we actually used it to build divisions between population groups, and that's a problem because of globalization.
How on earth could language be failing at the very goal for which it was apparently evolved?

Now, I'm not saying the world would be exactly the same if there was no language. We probably wouldn't have an iPhone, as Pagel playfully illustrated in his talk. But how much language do we really need to achieve the goal of a large society, and arrive at iPhone? Does language really need to be recursive? If we couldn't say
  • I know [that you hate me].
could we still have arrived at iPhone? Who really needs relative clauses anyway? On the flip side, what if language were more "permissive," and we could say
  • What_i did you see the man who bought t_i?

These are technical properties of language I'm talking about. They may seem like little details, but they're actually fundamental to the very nature of language. And it's almost impossible to connect them directly to the evolutionary story Mark Pagel is telling. All that story needs is some means of communication, but it says nothing about why we have the specific system of language that we do, out of all the possible systems that could have existed.

Needless to say, linguists never concern themselves with questions like "is the evolutionary consequence of high applicatives an iPhone?" and good thing too.

* * *

One thing that I did like was that he said "Tower of B[ei]bel." That's the way I say it.


Apparently Pagel has a habit of saying strange things in public places: LanguageLog/Scrabble tips for time travelers?. Hat tip to Charles Yang.
