Monday, June 30, 2014

This clickbait-headlined Bloomberg article claims that "The Villages", a sprawling retirement community in Florida, is the fastest-growing metropolitan area in the US. Something seems vaguely cyberpunk about the place, possibly because it is obscenely large and chock-full of people at the tail ends of their rich lives. That's the traditional "overbearing dystopian future society" part, at least.

So how is The Villages different from a university of equivalent size? Ohio State (er, The Ohio State University) has about 57,000 students on one campus, making it one of the most populous universities in the nation. It's not quite the 110,000 headcount claimed for The Villages, but I'm assuming that number includes resident employees as well. If we include OSU's non-student employees, their total "resident" population is closer to 87,000, not counting commuter students. 

This is my point: the population size isn't what intrigues me about The Villages. It's not the monolithic overlord problem, either. It's having that many affluent, elderly people in one place. It's a recipe for highly-concentrated success.


Thursday, June 26, 2014

Unlabeled, but not forgotten.

Today I learned about Positive-Unlabeled learning, a semi-supervised approach to machine learning. This is the general problem: if you want a machine learning method to do binary classification, you need to start with examples of items which fit into one class or the other. This is much easier and more efficient when you can safely say that everything in Column A is not in Column B and vice-versa. That isn't the case with some data. Instead, each item is either labeled (Column A) or unlabeled (maybe Column B, or maybe Column A that just hasn't been labeled).
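Here's a toy sketch of one common way to get started with data like this, the "two-step" idea: treat the unlabeled items that look least like the labeled positives as reliable negatives, then train on those. Everything here is an assumption for illustration (the made-up 2-D features, the `reliable_negatives` helper, the distance-to-centroid rule, the arbitrary `fraction` cutoff). It isn't the method from any particular paper.

```python
import math

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def reliable_negatives(positives, unlabeled, fraction=0.5):
    """Return the unlabeled points farthest from the positive centroid.

    Hypothetical helper: 'fraction' is an arbitrary cutoff, not a tuned value.
    """
    center = centroid(positives)
    ranked = sorted(unlabeled, key=lambda p: math.dist(p, center), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

# Made-up features. Column A: labeled positives.
positives = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9)]
# Unlabeled: hidden positives (near 1, 1) mixed with true negatives (near 5, 5).
unlabeled = [(0.9, 1.1), (5.0, 5.2), (4.8, 5.1), (1.2, 1.0)]

negatives = reliable_negatives(positives, unlabeled)
print(negatives)  # the two points near (5, 5)
```

The two unlabeled points sitting near the positives are left out of the negative set, which is the whole point: "unlabeled" is not the same thing as "negative".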

PU learning can be used to define negative examples for protein function prediction. Citation below:
Youngs N, Penfold-Brown D, Bonneau R, Shasha D (2014) Negative Example Selection for Protein Function Prediction: The NoGO Database. PLoS Comput Biol 10(6): e1003644. doi:10.1371/journal.pcbi.1003644.

I had a brief look at Google's Material Design guide today after seeing it linked by Andy Baio. It's essentially a series of guidelines about how to make an interface or product look Google-y. Much of it is good design advice for other projects. The color palettes are one such example.

When future historians want to know what 2014 looked like, this will be a fairly accurate record. I hope we (or just Google, at least) can improve on a few things; their cascading menus have always looked messy and non-intuitive to me. They just pop up all over the screen.

Wednesday, June 25, 2014

If I had to choose the most insulting sentence currently available in the English language, it would likely be the following:
Those people are mistaken, for reasons I explained in a series of tweets.
The source is here. For more immediate context, it's in reference to people who are in favor of the Oxford comma. The referenced Twitter arguments are of the usual Twitter caliber: glib, poorly organized, and myopic. The source organization, Poynter, is intimately concerned with journalism so I can forgive their editor's concern over generally trivial grammatical squabbles. I have more difficulty with the general concept of "you're wrong because of arguments I've previously made in the worst possible format."

Tuesday, June 24, 2014

I got married this past Saturday! Married to a lady. Photographs available on request. There's also a video on the way. Stay tuned for that.*

In other news, mice can eat a diet of bacteriophage T7 with few ill effects. Who says negative data never gets published?

*"Stay tuned", beyond being a culturally antiquated idiom, is an interesting bit of skeuomorphic language. It's a relic of a time when viewers could be asked to stay on a particular radio or TV station. Most modern radios and TVs aren't manually tuned. The audience also can and will change stations of their own volition. Asking them to remain in place is like asking them to wear the same pants for a week.

Friday, June 13, 2014

Wednesday, June 11, 2014

I've been trying out the Android beta of Swell Radio - it's really quite nice. The basic idea is that it treats podcasts as radio shows, so you can just start the app up and it'll just keep on streaming. I had tried it on iOS a year or so ago and was disappointed to see that Android development was lagging. It looks like they're making progress now.

Its recommendation engine still needs regular maintenance. A typical user experience goes like this:

  1. Swell plays hourly NPR news update
  2. Swell plays a few more NPR clips
  3. Swell plays a Wall Street Journal business update, or at least all of its ads 
  4. User skips to next item in playlist out of irritation
  5. Swell plays another NPR clip
  6. Swell plays a set of Comedy Central standup clips

The issue is transitions. It wouldn't hurt for the app to announce the next item in the list so I'm not left wondering why Paul F. Tompkins is on Morning Edition. I mean, the guy loves NPR so it's not too far-fetched.

Monday, June 09, 2014

Slate apparently hasn't heard of metonymy.

In related links, @SavedYouAClick is just great. It's instant answers to all those unnecessary questions in internet news headlines. I wonder if the process could be automated.

I've been working with some very large protein-protein interaction networks lately. Many of them come out like this in Cytoscape:

An artistic re-interpretation. Not actual data. As if there were anything to find in there, anyway.

They're ominously big but still appear to be scale-free. Strict adherence to power law distributions makes me a bit suspicious even though it shows up everywhere.
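For the curious, a rough first-pass check for "does this look scale-free" is to tally node degrees and fit a line to log(count) against log(degree). A minimal sketch follows, using a made-up toy edge list rather than real interaction data; the function names are mine. A straight-ish line on a log-log plot is weak evidence at best, so this is a sanity check and not a proper statistical test.

```python
import math
from collections import Counter

def degree_counts(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())

def loglog_slope(counts):
    """Least-squares slope of log(count) against log(degree)."""
    xs = [math.log(k) for k in counts]
    ys = [math.log(c) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Made-up toy network: one hub ("h") plus a few stragglers.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b"), ("d", "e")]
counts = degree_counts(edges)
slope = loglog_slope(counts)
print(dict(counts), slope)
```

A negative slope just means high-degree nodes are rarer than low-degree ones, which is true of plenty of networks that aren't scale-free. That's part of why the suspicion is warranted.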

Friday, June 06, 2014

Today I learned about the Girvan–Newman algorithm. It's a fairly simple way to break up networks of interacting components into clustered communities. It doesn't join components in any way they aren't already connected. It just repeatedly removes the connection with the highest betweenness (the one that the most shortest paths run through), recalculating after each removal. If the starting network is a hairball, the final product is either distinct clusters or just a bunch of smaller hairballs (I seem to be getting the latter, for the most part, but my current networks are several thousand nodes with more than 40,000 edges).
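The procedure as it's usually described is: compute edge betweenness, remove the top-scoring edge, recompute, and repeat until the network falls apart. Below is a minimal pure-Python sketch of one round of that on a made-up six-node graph (two triangles joined by a bridge). The function names are my own and it assumes an unweighted graph with no self-loops. It's not how any Cytoscape plugin implements it, and recomputing betweenness after every removal is the expensive part on big networks.

```python
from collections import deque

def edge_betweenness(adj):
    """Brandes-style shortest-path betweenness for every undirected edge.

    Each node pair is counted from both ends, so scores are doubled;
    that doesn't change which edge ranks highest.
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search, counting shortest paths to each node.
        order, preds = [], {n: [] for n in adj}
        paths = {n: 0 for n in adj}
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
                if dist[nbr] == dist[node] + 1:
                    paths[nbr] += paths[node]
                    preds[nbr].append(node)
        # Walk back from the farthest nodes, crediting edges along the way.
        credit = {n: 0.0 for n in adj}
        for node in reversed(order):
            for p in preds[node]:
                share = paths[p] / paths[node] * (1.0 + credit[node])
                score[frozenset((p, node))] += share
                credit[p] += share
    return score

def components(adj):
    """Connected components as a list of node sets."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups

def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the component count rises."""
    adj = {n: set(nbrs) for n, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        score = edge_betweenness(adj)
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Made-up graph: triangle a-b-c, triangle d-e-f, bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

communities = girvan_newman_split(adj)
print(communities)
```

On this toy graph the bridge goes first and the two triangles come out as separate communities. On a real hairball there's no such obvious bridge, which squares with getting smaller hairballs back.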

The clusterMaker2 plugin for Cytoscape has an implementation of this kind of community clustering. It uses GLay. The plugin hasn't been quite as fast for me as the GLay authors report but I'm content to blame that on my old* Core2 Quad 2.4GHz and/or the massive changes in Cytoscape over the last few years.


*This one came with the lab and was originally used for working with sequencing data. I'm going to use it until it dies and/or catches on fire.

Thursday, June 05, 2014

I read this Nature news piece about phage therapy today. It's great that phage might finally be taken seriously as an alternative to carpet-bombing bacterial infections with antibiotics.

This sentence caught my eye: "Nature provides an almost inexhaustible supply: no two identical phages have ever been found." That claim isn't entirely true. While bacteriophage genomes are staggeringly diverse, many contain very similar conserved sequences. If we only consider two phages identical when their genomes match nucleotide for nucleotide, then it's very close to true. We could, of course, pick two phage out of the same culture and have a good chance of their genomes being 100% identical. Depending on the phage, we may only be talking about a few thousand nucleotides (ΦX174 is only about 5 thousand base pairs, for example!).

Let's not even start on the issue of how frequently we isolate genetically identical bacteria. We'd like to think that a culture of any one bacterial species is a monoculture, but how many cells in that culture are completely genetically identical at any one time? How about their transcriptomes? I suppose that the original idea remains true: bacteria can produce great genetic variation, so bacteriophage can do so just as well.

Bar none

"Kick the bar chart habit", they say. "Use box plots instead and you can show a distribution of your data points", they claim. They're right, of course. I suspect that most people, whether they're scientists or just members of the 24-hour-TV-news audience (there's some overlap between the two groups, of course) prefer bar charts because they're so straightforward. Higher bars are greater in value. What could be easier?

Besides scientific papers and TV news, the biggest influence on chart type is likely Excel. Hell, the newer versions let users create tiny, almost-unreadable sparkline bar charts. The only worse option may be pie charts. My introductory statistics professor was always wary of using the wrong chart at the wrong time, but he constantly warned us that there was never a right time for a pie chart. I'm usually inclined to agree.

So, bottom line: avoid using the default chart options in Excel and/or chart types which sound like desserts.
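To make the box plot argument concrete, here's a small sketch with two made-up data sets that would produce identical bars (same mean) but very different boxes. The numbers and the `five_number_summary` name are assumptions for illustration; the quartile method is one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Minimum, quartiles, and maximum: the numbers a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Made-up measurements: same mean, very different spread.
tight = [9, 10, 10, 10, 11]
wide = [1, 5, 10, 15, 19]

print(statistics.mean(tight), statistics.mean(wide))  # both bars: 10
print(five_number_summary(tight))  # (9, 10.0, 10.0, 10.0, 11)
print(five_number_summary(wide))   # (1, 5.0, 10.0, 15.0, 19)
```

A bar chart of the means shows two bars of the same height. The summaries show that one group barely varies and the other is all over the place.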

Wednesday, June 04, 2014

Nothing feels less productive than fighting with Git. The official documentation is the real problem: it's roughly as arcane as Norse runes.* I managed to find a few tutorials today, at least:

  • Atlassian Git tutorial - just a basic rundown of the common commands.
  • Try Git - I find that the GitHub documentation is just as bad as the Git docs: it makes the basics sound easy but doesn't provide enough detail to know what to do when things go totally wrong. This does the same thing in an interactive way. It's great for the basics but not much more than that.

*They all read like this.

Tuesday, June 03, 2014

An overly-compressed Sierpinski gasket. Say that fast enough and you sound like an irate plumber.

Fractals generally make good desktop wallpaper but viewing them as compressed bitmaps is a bit existentially disappointing. It's like going to the zoo: you know the animals you're seeing are just stand-ins for the authentic complexity of nature.

Monday, June 02, 2014

I've been awfully busy lately between planning for a wedding (well, my own wedding), trying to finish one paper in lab and analyzing the results for another, and a show I'm in this week (see below).

I get to make funny voices on-stage. It's convenient since I'd be making funny voices anyway.

It's still fun to sacrifice life-snippets to the Blog Gods (blods?) so I'll start posting some short bits more frequently. Here's one such item.

A rich stew.

This is what happens when you store rich nutrient agar in a recycling bin for a week or so. It was meant to be a temporary location for some poorly-melted yeast growth media but was forgotten. Now we have a survey of the organisms present in laboratory waste receptacles (at least, we would have if I hadn't just scraped it all out into a biohaz bag).