Monday, July 11, 2011

Communication Density and Dialect Boundaries

One linguistics topic which non-specialists are almost always interested in is dialect geography, and I don't think that's strictly due to their desire to have regional biases confirmed. It seems like almost everybody has a genuine interest in where and how people speak differently from themselves. Granted, once you move away from fairly shallow lexical differences into phonetic and phonological ones, a lot of people's eyes glaze over.

When it comes to explaining why dialect boundaries are in one place rather than another, dialect geographers tend to have two answers. First, different regions have different historical settlement patterns. Bill Labov frequently points out that the current phonological boundary between the North and the Midland in the United States coincides with the boundary between where log cabins were built versus A-frame houses, which itself coincides with two different immigration streams with different points of origin on the East Coast.

Second, there are differential rates of communication between regions. Language appears to be transferred crucially by face-to-face communication. If two regions have stronger ties of communication between themselves than with other regions, then we think they're probably going to have more similar dialects. This was essentially Keelan Evanini's argument for why Erie, PA has a Western Pennsylvania dialect, even though it had historically been part of the North.

Given this second hypothesis about why dialect boundaries exist where they do, I was pretty excited to see these results coming out of the Senseable City Lab, which, in collaboration with AT&T and IBM Research, has produced maps illustrating how US counties cluster together in terms of cell phone traffic and SMS traffic.
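
I don't know exactly how the lab built its clusters, but one of the comments below points to an AT&T page saying a modularity algorithm was used. As a rough illustration of what that could mean, here is a minimal sketch of the standard Newman modularity score for a weighted network. Everything in it is an assumption for illustration: the function name, the node labels, and the traffic weights are made up, and this is not the lab's code or data.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of a partition of a weighted, undirected graph.

    edges: iterable of (node_a, node_b, weight), each pair listed once
    community: dict mapping node -> community label
    """
    total = 0.0                    # total edge weight in the network
    internal = defaultdict(float)  # edge weight falling inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for a, b, w in edges:
        total += w
        degree[community[a]] += w
        degree[community[b]] += w
        if community[a] == community[b]:
            internal[community[a]] += w
    # Observed share of within-community traffic, minus the share expected
    # if the same traffic were spread at random given each node's degree.
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2
               for c in degree)

# Hypothetical toy network: two tightly connected groups of "counties"
# joined by one weak tie. The weights are invented, not real call volumes.
edges = [
    ("a1", "a2", 10), ("a2", "a3", 10), ("a1", "a3", 10),
    ("b1", "b2", 10), ("b2", "b3", 10), ("b1", "b3", 10),
    ("a3", "b1", 1),
]
two_clusters = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}
one_cluster = {node: 0 for node in two_clusters}

print(modularity(edges, two_clusters))  # clearly positive
print(modularity(edges, one_cluster))   # zero: lumping everything together scores nothing
```

A modularity-based clustering method searches for the partition that makes this score as large as possible. The relevant point for dialect geography is that the score rewards groups whose members talk to each other more than chance would predict, which is exactly the communication density idea.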

The lines between communication clusters are exactly those that I would expect to define dialect boundaries. So I took the call and SMS community maps and superimposed the major dialect boundaries from the Atlas of North American English. Here are the results.

Communication Clustering by Calls

Communication Clustering by SMS

Honestly, I'm a little disappointed with the outcome. I expected that very large dialect regions, like the West and the South, would contain many different communication clusters, so that's fine. Where both a dialect boundary and a communication boundary line up with a state boundary, I don't think it should be counted as an alignment. If there's any tendency for people to be more likely to move within state lines than across state lines, then this alignment along state lines is probably better explained by the first factor, settlement history, than by communication density.

The crucial place to look for an alignment between communication and dialects seems to be the Ohio, West Virginia, Pennsylvania trifecta. In neither map does it look like communication density lines up quite right. Certainly, Pennsylvania is cut in half into a Western and an Eastern region, but it seems like the Western PA dialect extends further east, almost to the threshold of Philadelphia.

Ohio doesn't seem to be sliced up quite right either. In the calls data, Cleveland clusters with the rest of the state, while with the SMS data, it clusters with Western PA. Dialectally, Cleveland is neither like the rest of Ohio nor Western PA. Rather, it is more similar to Toledo and Detroit to the West, and Buffalo to the East.

There are other unfortunate non-alignments, like how Baltimore is clustered with Virginia, while dialectally it's more similar to Philadelphia, and New England isn't chopped up communicationally the way it is dialectally.

I'll conclude by saying that first, pat answers to explain natural phenomena don't always work out, and second, these communication clusters make some dialect boundaries pretty mysterious. If everyone in Ohio is clustered together into a cell phone calling community, then why don't they all talk the same? The answer to this probably has to do with a third factor: meaningful social divisions which are distinct from communication divisions, but remember what I said about pat answers?

Comments:

  1. Could you elaborate more on the cluster methodology? I have been exposed to it before but I am not sure exactly how it works.

  2. Steve, do you mean their cluster methodology, or the dialect methodology? The dialect geography approach is pretty simple minded, and basically done by hand. For many different dialect features, you draw a line on a map based on some threshold and see who's in and who's out. When a whole bunch of lines for different features bundle together (and they do), you draw a dialect boundary there.

    What the MIT people did, I dunno, but I imagine it was not done by hand.

  3. Thanks for the shout out, Joe! At least both maps clearly group Erie with Western PA.

    I saw the cell phone map in the Times, and was also very excited by it. Unfortunately, I haven't yet been able to figure out anything about their methodology or if their data is available...

  4. This page from AT&T provides a few details about their methodology. I'm not sure what it means, but they say the clustering was done using a modularity algorithm.

  5. Hey, I believe in modularity! I like it.
