Tuesday, July 10, 2012

Visualizing Graphical Models

I'm anticipating presenting research of mine based on Bayesian graphical models to an audience that might not be familiar with them. When presenting ordinary regression results, you already face statistical sniper questions along the lines of "What if the effect is actually being driven by this other correlate?" and "That effect might result from assumptions a, b, and c of the test." Sometimes these questions are useful, but sometimes they seem to detract from the substantive issues at hand. And I frequently see talks get bogged down in anticipating questions like these, cramming in so much statistical detail that there isn't enough time left to do justice to the theoretical importance of the results.

Add to this the customizability of graphical models, the number of possible distributions and parameter settings, and the notion that "Bayesian" = "subjective", and I'm really feeling stressed out by the presentational task ahead of me.

So, I'm trying to figure out a good way to present the model I've built: one that makes it fully available and accessible to someone who can't read JAGS code, has a little bit of presentational pizzazz, and also lets me focus on the parameters of specific interest. I started off trying to use Graphviz to produce directed graphs, and wound up with this (an actual level in the model I'm hoping to present).

It's a ton of spaghetti, it's difficult to highlight the particular parameters of interest, and it doesn't represent some important distinctions (like stochastic vs. deterministic nodes).
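One way the stochastic/deterministic distinction and the highlighting could be encoded in Graphviz is through node styling. Below is a minimal sketch, not my actual model: the node names (`beta`, `mu`, `tau`, `y`, `x`) are hypothetical, and the styling follows one convention some authors use (single ellipse for stochastic nodes, double outline for deterministic ones, boxes for constants), with a fill color marking the parameters of interest. It just emits DOT source as a string.

```python
# Minimal sketch: emit Graphviz DOT source for a small, hypothetical model.
# The node names and edges below are illustrative, not the real model.

def model_to_dot(nodes, edges, highlight=()):
    """Build DOT text for a directed graphical model.

    nodes: dict mapping node name -> "stochastic", "deterministic", or "constant"
    edges: list of (parent, child) pairs
    highlight: names of the parameters of interest, drawn with a fill color
    """
    lines = ["digraph model {", "  rankdir=TB;"]
    for name, kind in nodes.items():
        if kind == "stochastic":
            attrs = ["shape=ellipse"]
        elif kind == "deterministic":
            # A double outline marks a deterministic (logical) node.
            attrs = ["shape=ellipse", "peripheries=2"]
        else:
            # Constants and covariates are drawn as plain boxes.
            attrs = ["shape=box"]
        if name in highlight:
            attrs += ["style=filled", 'fillcolor="lightblue"']
        lines.append(f'  "{name}" [{", ".join(attrs)}];')
    for parent, child in edges:
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical JAGS-style fragment: y ~ dnorm(mu, tau); mu <- beta * x
example_nodes = {
    "beta": "stochastic",
    "mu": "deterministic",
    "tau": "stochastic",
    "y": "stochastic",
    "x": "constant",
}
example_edges = [("beta", "mu"), ("x", "mu"), ("mu", "y"), ("tau", "y")]
print(model_to_dot(example_nodes, example_edges, highlight={"beta"}))
```

Piping the printed output through `dot -Tpng -o model.png` should render it. Generating the DOT from a small table like this also makes it easy to re-render with a different set of highlighted parameters for each slide.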

I've moved on from Graphviz to trying to build an interactive tree diagram using the JavaScript InfoVis Toolkit. It's been slow going, since I don't know any JavaScript and am still trying to sort out which functions are basic and which ones are defined by the toolkit.
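The InfoVis Toolkit's tree visualizations, as I understand its examples, load nested JSON in which each node has `id`, `name`, `data`, and `children` fields. As a hypothetical sketch, the model's parent/child structure could be converted to that shape on the Python side and handed to the JavaScript. The node names and the `kind` entry inside `data` are my own illustrative choices, not something the toolkit requires, and the code assumes the model graph is acyclic.

```python
import json


def build_tree(node, parents, kinds, path=""):
    """Expand `node` into a nested dict with id/name/data/children keys.

    Children are the node's parents in the model DAG, so the observed
    variable sits at the root. A node shared by several branches is
    repeated, and its id is prefixed with its path to stay unique.
    Assumes the graph has no cycles (true for a DAG).
    """
    node_id = f"{path}/{node}" if path else node
    return {
        "id": node_id,
        "name": node,
        "data": {"kind": kinds.get(node, "constant")},
        "children": [
            build_tree(p, parents, kinds, node_id) for p in parents.get(node, [])
        ],
    }


# Hypothetical token-level fragment (names are illustrative only).
example_parents = {"y": ["mu", "tau"], "mu": ["beta", "x"]}
example_kinds = {"y": "stochastic", "mu": "deterministic",
                 "tau": "stochastic", "beta": "stochastic"}
tree = build_tree("y", example_parents, example_kinds)
print(json.dumps(tree, indent=2))
```

Repeating shared ancestors is the price of forcing a DAG into a tree; prefixing ids with the path keeps them unique, which tree widgets generally need for expanding and collapsing nodes.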

It's getting there, but I'm not convinced yet that it'll make the whole model digestible. For one, I'm modeling effects at a few different levels. The token level is represented in this visualization, but I'm also looking at speaker-level effects (treating the linguistic context as a within-speaker variable) and at word-level effects. The way I'm setting things up now, that's going to call for two more trees like this one.
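An alternative to separate trees might be a single graph with nodes grouped by level. Graphviz draws any subgraph whose name starts with `cluster` as a labeled box, which resembles the plate notation often used for hierarchical models. A minimal sketch, assuming each node has been tagged with a hypothetical level label (`token`, `speaker`, `word`); labels should be identifier-safe (no spaces) since they become part of the cluster name.

```python
from collections import defaultdict


def leveled_dot(levels, edges):
    """Group nodes into one Graphviz cluster per modeling level.

    levels: dict mapping node name -> level label (e.g. "token")
    edges: list of (parent, child) pairs; edges may cross clusters
    """
    groups = defaultdict(list)
    for node, level in levels.items():
        groups[level].append(node)
    lines = ["digraph model {"]
    for level, members in groups.items():
        # Subgraphs whose names start with "cluster" are drawn as boxes.
        lines.append(f"  subgraph cluster_{level} {{")
        lines.append(f'    label="{level} level";')
        for node in members:
            lines.append(f'    "{node}";')
        lines.append("  }")
    for parent, child in edges:
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical three-level layout; node names are made up for illustration.
example_levels = {
    "speaker_effect": "speaker",
    "word_effect": "word",
    "mu": "token",
    "y": "token",
}
level_edges = [("speaker_effect", "mu"), ("word_effect", "mu"), ("mu", "y")]
print(leveled_dot(example_levels, level_edges))
```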

Maybe the lesson here is that I should just fit and present a simpler model, but remember those sniper questions? I'm worried that if I leave out someone's favorite correlate, 1) I'll have to deal with it in the questions and 2) they'll leave unconvinced, or rather, they'll leave convinced that it was their favorite correlate doing the work all along. But these are really research anxieties that no visualization toolkit on earth could assuage.

7 comments:

  1. I think Dave Blei's students typically use OmniGraffle to write up graphical models. It doesn't come close to giving the level of detail you're looking for, but it has the strength of being standard practice.

  2. If you are looking at JavaScript toolkits, you should definitely check out mxGraph.

    We use it at InsightMaker.com to allow people to build compartmental ODE models.

  3. Yeah, but that would also require laying down $100 to buy OmniGraffle.

  4. I think the style John Kruschke uses in Doing Bayesian Data Analysis is pretty clear. He says he draws them by hand.
    http://doingbayesiandataanalysis.blogspot.com/2012/05/graphical-model-diagrams-in-doing.html

  5. Gephi (gephi.org, aka "Photoshop for graphs") might also be useful for you. It's free but in beta, so make sure to save often. Anyway, it's fairly flexible: it lets you manipulate things manually (when that kind of freedom is desirable) and also load data sets programmatically or via flat file.

  6. I've also struggled with visualizing graphical models. I've had some luck with Graphviz as long as the number of variables isn't too high, but once you pass a certain number of variables, it's difficult to visualize the model in any package. A few things I've tried:

    - I often produce visualizations of Bayes nets or SEM models (the types of models I'm generally visualizing) by hand in a system dynamics modeling package called Vensim, simply because I like the curvy arrows it produces.
    - I like some of the layout algorithms in yEd, but the final graphs don't look clean enough for me, so I end up laying out the graph manually, using the yEd layout as a reference.
    - There is also a plugin (GraphVizio) for using Graphviz as the layout engine in Visio, which I've occasionally found useful.
    - In addition to leaving out variables to make visualizations easier (as you mentioned doing), I've also sometimes "clustered" or "summarized" groups of variables. This doesn't change the underlying model; it gives an overview visualization built from the summarized groups, and then, as you go deeper, you can show the actual interactions between all of the variables.


    But I've never found an automated visualization tool that I've been completely happy with for large models... so thanks for sharing the things you're experimenting with. It's really helpful to see what others are trying.

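The "summarized groups" idea in the last comment can be sketched as collapsing the node-level graph into a group-level overview: the full model stays intact for the detailed view, but the overview draws one node per group and one edge wherever any member of one group feeds a member of another. The group names and variables below are hypothetical.

```python
def summarize(edges, group_of):
    """Collapse node-level edges into an overview graph between groups.

    edges: list of (parent, child) pairs from the full model
    group_of: dict mapping node -> group name; unmapped nodes stay on their own
    Returns sorted, de-duplicated (group, group) edges. Edges that stay
    inside one group are left out of the overview.
    """
    summary = set()
    for parent, child in edges:
        g_parent = group_of.get(parent, parent)
        g_child = group_of.get(child, child)
        if g_parent != g_child:
            summary.add((g_parent, g_child))
    return sorted(summary)


# Hypothetical model: a hyperprior "tau_b" feeding two coefficients.
overview_edges = [("tau_b", "b1"), ("tau_b", "b2"), ("b1", "mu"),
                  ("b2", "mu"), ("mu", "y"), ("tau", "y")]
overview_groups = {"b1": "coefficients", "b2": "coefficients",
                   "mu": "likelihood", "tau": "likelihood", "y": "likelihood"}
print(summarize(overview_edges, overview_groups))
```

Here six node-level edges reduce to two group-level ones, and the resulting pairs could be fed to any of the tools mentioned above for the overview layer.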