Thursday, August 20, 2009

This is a blog readable post

My university is installing and testing a new, campus-wide alarm system, which they are calling the Siren Outdoor System (SOS). Is it overkill? The whole campus and surrounding areas are already under constant video surveillance, and they have a pretty good emergency texting program. I don't know what this says about the student body (or more importantly, their parents) if they can't feel safe without a giant alarm system safety net, but if the university has got the time and money to burn, god bless. That's not the point of my post.

To quote one of the notification e-mails about the system:
The purpose of the SOS is to transmit voice intelligible emergency messages and alert tones to the outdoor campus environment during crisis events.

I do believe that a "voice intelligible" emergency message is one which is intelligible to voices. At best, it might be a message which is intelligible as a voice. I don't think it can really mean what they want it to mean: an intelligible voice emergency message. See: "human readable".

Perhaps I'm suffering from a bracketing error, and they mean it's a

[ voice [ intelligible [ emergency message ] ] ]

but based on the rest of the e-mail, it seems pretty clear that they mean it's a

[ [voice [intelligible] ] [ emergency message] ]

It just can't mean what they want it to mean.

Sunday, March 15, 2009

Data Loss and Human Knowledge

I've been thinking a lot about data generation and maintenance. I generate a lot of data when I'm just sitting around doing nothing, and even more when I'm working. Having a good system for data organization is key, but I'm becoming increasingly concerned about my data's lifespan.

For a while now, I've been trying to save as much of my spreadsheet data in plain text format as possible. It's the natural thing for me to do as I've gotten more proficient with R and now Python, but it also saves my data from getting tied up in proprietary formats that I'm not sure the proprietors will keep maintaining. With everything else, I've also been trying to strip it down to the simplest appropriate standard.

But the fact is that no matter what I do, I still always feel as if my data is at risk. Call me data-paranoid, I guess. Maybe I just lack a reliable system for maintaining multiple complete backups of my necessities, but no matter how many copies I have in however many places, hard drives crash and servers fail. Losing an ephemeral data file seems infinitely more likely to me than the destruction of the equivalent hard copy.

It occurs to me that an unimaginable amount of data is lost on a daily basis world-wide, but its impact seems negligible. I wonder how this global data loss compares to other historical losses.

The most catastrophic data loss for western civilization that I know of would be the fire at the Library at Alexandria. According to my layman's understanding, the Library at Alexandria was the largest collection of human knowledge at the time, the destruction of which may have contributed to the West's descent into the Dark Ages. This is the system crash to end all system crashes. So how much was lost?

I've tried to do some ballpark figures here. According to Wikipedia, the library's collection was between 500,000 and 1,000,000 scrolls. Who knows how much information was on all of these scrolls, but if they were all Torahs, they'd each have 304,805 characters. According to my friend Kyle, I should figure about 1.8 bits per character (his citation was Brown et al. 1992, I'll update with the full citation when I can). Multiply scrolls by characters per scroll by bits per character, divide by 8 to get bytes, and here's the final equation:

  • 500,000 ~ 1,000,000 undecorated scrolls ≈ 32 ~ 64 gigabytes.
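For anyone who wants to check the arithmetic, here is a minimal sketch of it in Python. It only uses the numbers quoted in this post (the scroll counts, 304,805 characters per scroll, 1.8 bits per character); the variable and function names are my own, and I'm assuming "gigabyte" means 2^30 bytes, since that's the convention under which the figures come out to 32 ~ 64.

```python
# Ballpark estimate of the Library at Alexandria in gigabytes.
# All inputs are the figures quoted in the post; the names are illustrative.

CHARS_PER_SCROLL = 304_805   # characters in a Torah, the stand-in for one scroll
BITS_PER_CHAR = 1.8          # Kyle's figure for bits per character
BYTES_PER_GIB = 2 ** 30      # assumption: "gigabyte" means 2^30 bytes

bytes_per_scroll = CHARS_PER_SCROLL * BITS_PER_CHAR / 8


def scrolls_to_gib(n_scrolls):
    """Size of n_scrolls Torah-length scrolls, in binary gigabytes."""
    return n_scrolls * bytes_per_scroll / BYTES_PER_GIB


def gib_to_scrolls(gib):
    """Number of Torah-length scrolls that fit in the given size."""
    return gib * BYTES_PER_GIB / bytes_per_scroll


gib_low = scrolls_to_gib(500_000)
gib_high = scrolls_to_gib(1_000_000)
print(f"{gib_low:.1f} ~ {gib_high:.1f} gigabytes")
```

The same conversion run backwards (`gib_to_scrolls`) turns a database size into a scroll count.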


That's a lot of data, but is also about equivalent to one really bad laptop crash! How many Alexandrias happen every day? Of course, an equivalent information catastrophe today would involve a proportional loss of information, and I'm not sure how to begin calculating the size of current human knowledge.

One more interesting tidbit: By character count, Wikipedia would be about 11,500 scrolls. The Wikipedia database size as of October 2006 (from here) was 4.4 GB, which translates to about 68,900 scrolls.

Friday, January 23, 2009

Vowel Plotting

I recently wrote a script to plot the data published with the Atlas of North American English in an interesting way. The script and documentation are on my website here. I'm working on version 1.1 now, which should make it a much smarter and more flexible script.

The output of the script looks like this:

It produces a plot like this for every dialect region and for every speech community in that dialect region.

I think the greatest strength of this way of plotting subsystems is the representation of the short vowel system. You can very clearly see the Northern Cities Shift, the Canadian Shift, and the Pittsburgh Shift (apparently not isolated to Pittsburgh).

Other relationships aren't as immediately clear, though. My goals for the future are to work out some plotting mechanisms to make other relationships of interest clearer, such as the relationship of tense-lax pairs, or perhaps highlighting mergers, splits, and oppositions of particular interest.

I'd also like to work out how to incorporate information about other dimensions of vowel data, such as duration. The best idea I've had about incorporating vowel duration data in F2xF1 plots involved utilizing gray scales (inspired by Visual Display of Quantitative Information), but there are a lot more details to work out on that.
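To make the gray scale idea a little more concrete, here is a rough sketch in Python. This is not part of the script mentioned above; the function name, the 0.8 ceiling, and the example durations are all made up for illustration. The idea is simply to rescale each vowel's duration onto a gray level between 0.0 (black) and some lightest gray, so that longer vowels plot darker.

```python
def duration_to_gray(durations, lightest=0.8):
    """Map durations to gray levels: longest -> 0.0 (black), shortest -> lightest.

    `lightest` is capped below 1.0 (white) so the shortest vowel still shows
    up on a white background. Both the name and the default are assumptions.
    """
    shortest, longest = min(durations), max(durations)
    if longest == shortest:
        # No variation in duration: give everything the same mid gray.
        return [lightest / 2 for _ in durations]
    span = longest - shortest
    return [lightest * (1 - (d - shortest) / span) for d in durations]


# Hypothetical durations in milliseconds for three vowel tokens.
example_durations = [100, 200, 300]
grays = duration_to_gray(example_durations)
print(grays)
```

Each gray level could then be handed to whatever plotting tool draws the F2xF1 point as that point's color. Whether a linear scale is the right one for duration is one of the details still to be worked out.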

Wednesday, January 7, 2009

It's a Wonderful Life

Today, a friend of mine and I were in a coffee shop, and we happened to eavesdrop on a conversation a customer was having with the barista. I only caught this bit of what the customer was saying:
...No, not "merry." I'm married

My friend turned to me and said, "Somewhere, a linguist is getting their wings."

Saturday, January 3, 2009

"How's the dissertation going?"

One of these days, I'll write a dissertation. It'll probably take me a little while, and well-intentioned people who don't know any better will probably ask me how the process is going.

In preparation for that day, I've already prepared a response that captures the same anxiety, induces the same guilt, and perhaps calls the same pain to mind.
Well meaner: "So how's your dissertation going?"
Sufferer: "Oh, it's going good! How about you? How's your dental health doing?"
