Tuesday, October 18, 2011

An inspiring talk

I recently went to a seminar talk by Michael Nielsen that I really enjoyed, so here are a couple of notes and thoughts on it. Nielsen spoke mainly of what he calls "extreme open science", and touched on many of the topics he covers in this essay. This is an area I've been "working" in for the last few months, in quotes because the work is mainly thinking and, as a result, there aren't many artifacts to point to. Nielsen emphasized that the system by which scholarly work is currently evaluated does not incentivize open sharing of results, code, data, etc. He also brought up the example I always use: the NIH's requirement that genomic data resulting from work it funded be deposited, a case in which a funding agency mandate creates an incentive for sharing and, consequently, for open science. Nielsen is in a position that allows him to be quite inclusive of all relevant good ideas, and vague about whether he's advocating any specific approach in particular. In a sense his attitude seems rather descriptive, so I'm finding what he's written informative and quite comprehensive.

At the same time I was trying to find a way to justify my own first stab at an approach in this field, in a way that was consistent with his framework. By necessity, anything I have to say about this has to be a lot more focused, and it has to be translatable to artifacts, code, methods, what have you. That has led me to think of the situation in terms of the following diagram:


The idea here is that the odds of a scientist sharing or not sharing the raw fruits of his or her labor (as opposed to just publishing something *about* those raw fruits) are a function of the incentive to share them and the ease of doing so. For the purposes of this diagram, then, I'm assuming that, even if there is no incentive, if it is very easy to share, a scientist will choose to do so (i.e. the x-intercept is somewhere to the right of 0). Conversely, even if there is some incentive, enough difficulty in sharing will overwhelm it (i.e. the y-intercept is also upward of 0). You can set up this diagram differently by making different assumptions about the intercepts, and of course you can argue about whether the line separating YES (inclined to share) from NO (disinclined to share) should be straight, curved, or whatever. I'm neither a psychologist nor a game theorist, so I'm sure there's a more sophisticated version of this somewhere out there, but I don't know where to look for it.

In any case, the gist of the point I'm trying to make is this: there seem to be two strategies for shifting behavior from NO to YES. The first, shown by the red arrow, is to increase the incentive to share. This is what Nielsen focused on in his talk, making references to the scientific patronage system of yesteryear, and alluding to the possibility of something similar emerging today (e.g. innocentive.com). The second strategy, shown by the blue arrow, is to increase the ease (i.e. decrease the difficulty) of sharing. As a technologist, this is the strategy I've been thinking and writing about.

While I love the idea of institutions like NIH and innocentive.com changing the incentive landscape, I, personally, have no influence over them and no voice with which to advocate the wider adoption of such policies. But as a technologist, someone involved with designing open source, next generation system-ware, I do have a say about the environment that scientists conduct science in. My ideas, then, center around making that environment such that it increases the ease of sharing simply by virtue of what it is.
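
For the programmers in the audience, here is a minimal sketch of the diagram's logic in Python. It assumes the straight-line version of the YES/NO boundary with ease on the x-axis and incentive on the y-axis. The function name, the 0-to-1 scales and the two intercept values are made up purely for illustration; nothing here is measured.

```python
def inclined_to_share(ease, incentive, ease_intercept=0.8, incentive_intercept=0.6):
    """Return True (YES) if the point lies above a straight boundary line.

    The boundary crosses the x-axis (ease) at ease_intercept and the
    y-axis (incentive) at incentive_intercept. Both intercepts are
    hypothetical numbers chosen only to make the example concrete.
    """
    return ease / ease_intercept + incentive / incentive_intercept > 1.0


# A scientist who finds sharing hard and sees little reward: NO.
print(inclined_to_share(ease=0.2, incentive=0.2))   # False

# Red arrow: raise the incentive, leave the ease alone.
print(inclined_to_share(ease=0.2, incentive=0.5))   # True

# Blue arrow: raise the ease, leave the incentive alone.
print(inclined_to_share(ease=0.6, incentive=0.2))   # True
```

Either arrow crosses the line on its own, which is the whole point: you don't need control over the incentives to move someone from NO to YES.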

More about the specifics of that coming soon...

Wednesday, October 12, 2011

Steve Yegge's rant

I absolutely *loved* Steve Yegge's rant about Google and platforms. His point is, incidentally, what I tried to hammer into our Capstone students last Spring Quarter. At least one project (snuffle.us) totally got it.

Tuesday, November 9, 2010

OKMO tech report

Back in 2003 I did some work for my mom. Yep, my mom. She's awesome! She's a doctor and she needed a hardware/software solution for allowing her patients to record "subjective appetite related sensations", namely how hungry, full, etc. they were feeling at particular times. I wrote a piece of software for PalmOS 4, built entirely on open source, that allowed them to do just that. The remarkable thing is just how much research that little piece of code enabled. The tech report describing the design considerations and research enabled by this little program, OKMO, is here.

Thursday, March 4, 2010

A bugreport for XOAD

I realize that XOAD is a dead project and all, but we still have a website around that uses it, and which, in turn, is used by a fair number of people. This website recently broke, probably due to an upgrade in the Apache PHP module (to v. 5.2). This cost me a few sleepless nights, but I also learned a bunch in the process.

The symptom we were seeing was an unserialization error. Basically, XOAD has its own code for serializing JavaScript objects into PHP format. The server-side then calls PHP's unserialize() on them. Some of those serialized objects were not unserializing properly. Specifically, objects with deeply nested __meta fields were failing to unserialize. It took days to figure out what the hell was going on, mainly because I'd never looked at AJAX before in my life, having spent the last 8 years hacking numerical code in C. In the end it was instrumentation of the offending XOAD code with error_log()s that solved it. And it turned out that the bug was due to human error, a human error that's gone undetected for FIVE years!

The error is in classes/Client.class.php in the XOAD_Client::register() function. This function builds objects that are then passed on to the serializer. The line that reads:
$attachMeta[$key] = $valueType;
should be:
$attachMeta[$key] = $value; 

That's it. Isn't it reminiscent of those sentences that completely change meaning when you move around their commas?

So why did this bug show up now? I only have a guess as to the answer. The effect of this bug was that in objects with deep (as in 2 levels down, don't imagine anything extreme) __meta attributes, some of the fields that should have been filled with the values of variables were filled with their types (duh). From a semantic point of view, this did not affect our application, because we never looked into those deep levels of the objects in question. But what about from the point of view of serialization? When the field for which we get the type instead of contents is __size, and __size is then used during serialization in a meaningful way, then there's the potential for problems. Consider this fragment of a correctly serialized object:
"__meta";O:6:"string":7:{s:1:"0";N;s:1:"1";N;s:1:"2";N;s:1:"3";N;s:1:"4";N;s:1:"5";N;s:1:"6";N;}
vs the same fragment serialized incorrectly, with the size field (derived from the __size attribute of the object being serialized) replaced with "int":
"__meta";O:6:"string":int:{s:1:"0";N;s:1:"1";N;s:1:"2";N;s:1:"3";N;s:1:"4";N;s:1:"5";N;s:1:"6";N;}

I suspect that our previous version of the Apache PHP module contained an unserialize() function that unserialized on the basis of curly brackets only. Based on curly bracket matching, the second fragment will unserialize just fine. But if the newer Apache PHP module actually looks at the size field, which precedes the contents of the array in curly brackets, then the first fragment will unserialize, whereas the second won't ("int" not being a valid array size). Is the second, stricter unserializing method the correct one? Absolutely! I don't know that this is what happened, but if I'm right, I can deduce that PHP has improved. I can get behind that.
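
To make the guess concrete, here is a toy sketch of the two behaviors in Python. To be clear, this is not PHP's actual unserialize() logic in either version, which I haven't read; lenient_check, strict_check and OBJECT_HEADER are names I made up, and the bracket counting is deliberately naive (it doesn't skip over string contents).

```python
import re

# Header of a serialized object: O:<name length>:"<class name>":<count>:{
# The count group accepts anything up to the next colon so that a bad
# value such as "int" is captured rather than silently skipped.
OBJECT_HEADER = re.compile(r'O:(\d+):"([^"]*)":([^:]*):\{')


def lenient_check(fragment):
    """Accept the fragment if its curly brackets balance; ignore the count."""
    depth = 0
    for ch in fragment:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def strict_check(fragment):
    """Additionally require every object header to carry an integer count."""
    if not lenient_check(fragment):
        return False
    return all(
        re.fullmatch(r"[0-9]+", count) is not None
        for _, _, count in OBJECT_HEADER.findall(fragment)
    )


body = '{s:1:"0";N;s:1:"1";N;s:1:"2";N;s:1:"3";N;s:1:"4";N;s:1:"5";N;s:1:"6";N;}'
good = '"__meta";O:6:"string":7:' + body
bad = '"__meta";O:6:"string":int:' + body

print(lenient_check(good), lenient_check(bad))  # True True
print(strict_check(good), strict_check(bad))    # True False
```

Under the lenient check both fragments pass, which would explain five years of silence; under the strict check only the correctly serialized one does.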

Tuesday, January 19, 2010

Playing catchup

In the last two months I've moved cities and jobs, returning to my beloved Santa Barbara and UCSB. That explains why blogging activity came to an abrupt halt. As things start to settle into a new routine, I think that it's important to do a debrief of sorts regarding things I learned and ideas I generated in my last postdoc. So you can expect a couple posts on single-cell fluorescence microscopy and quantitative measurements of intracellular protein networks. I also have some ideas about how I think a big impact could be made in High Performance Scientific Computing, the field I used to be in and that I'm transitioning into once again. Stay tuned...

Monday, October 5, 2009

Idea of the day: systematic specification of cmdline parameters

You know how now there are fantastic ways to generate documentation from within source code by using special annotations? I'm referring to things like perldoc and javadoc. They probably call it "self-documenting code" or something equally cheesy, but it's really just a simple markup strategy + a convention for what information is considered useful. Well, my idea today is not exactly analogous, but it's close. Why isn't there a way to systematically specify the format, syntax and semantics of commandline parameters to UNIX programs? For instance, for every binary in a software distribution there could be an XML file (following an as-yet-undefined schema, whose definition is really the technical challenge here) specifying all the ways in which that binary can be run. A lot of effort has gone into standardizing the format of "flags", the little -single_char or --blah_blah bits preceding arguments in commandlines. Ditto for --help output. In fact, it's now possible to automagically generate man pages from well-written --help output with help2man. Would it really be such a leap to start including XML specifications for commandline inputs? The potential uses for automation are limitless!
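
Here is a rough sketch of what one such use might look like, in Python. The XML layout below (the program, option and positional elements and their attributes) is entirely invented for this example, since the real schema is exactly the part that doesn't exist yet; "resize" is a made-up program, and parser_from_spec is a hypothetical helper. The only real machinery is Python's standard argparse and xml.etree.ElementTree modules.

```python
import argparse
import xml.etree.ElementTree as ET

# Hypothetical spec for a hypothetical binary; the schema is an assumption.
SPEC = """
<program name="resize">
  <option flag="--width" type="int" required="true" help="target width in pixels"/>
  <option flag="--verbose" type="flag" help="print progress"/>
  <positional name="input" help="input image"/>
</program>
"""

# Map the spec's type names onto Python converters.
TYPES = {"int": int, "float": float, "string": str}


def parser_from_spec(xml_text):
    """Build an argparse parser that accepts what the XML spec describes."""
    root = ET.fromstring(xml_text)
    parser = argparse.ArgumentParser(prog=root.get("name"))
    for node in root:
        if node.tag == "option" and node.get("type") == "flag":
            # Boolean switch that takes no value.
            parser.add_argument(node.get("flag"), action="store_true",
                                help=node.get("help"))
        elif node.tag == "option":
            # Option that takes one typed value.
            parser.add_argument(node.get("flag"),
                                type=TYPES[node.get("type", "string")],
                                required=node.get("required") == "true",
                                help=node.get("help"))
        elif node.tag == "positional":
            parser.add_argument(node.get("name"), help=node.get("help"))
    return parser


# Check a candidate commandline against the spec without running the binary.
args = parser_from_spec(SPEC).parse_args(["--width", "640", "photo.png"])
print(args.width, args.verbose, args.input)
```

The same spec file could just as well drive a GUI form, shell completion, or a workflow tool that needs to know what a binary accepts before invoking it.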

Wednesday, September 23, 2009

How I learned to stop worrying and love twitter

I couldn't resist using this title, but the point of this post is more nuanced. Well, at least what's in my head is more nuanced; now to what extent I'm going to succeed in conveying it... we'll see.

Once upon a time, several months ago, I decided to check out twitter. I was curious, and I overcame my inertia when I heard that there was a vibrant community of climbers breaking new ground in social media. So I started an account under the username "slampoud". Normal, right? I mostly used it to communicate with climbing tweeps, to post announcements of my blogposts on "Little did I know..." (my climbing blog) and "If pressed" (my review blog), and to ruminate. All fine, all within the normal use parameters of twitter.

Except that I found that because of all the indexing, republishing and referencing websites out there, the page rank of my twitter posts was fairly high, and they came to dominate the google search results for "slampoud". Since I'm a UNIX geek, my username is really a professional handle for me. Yet, during that time, if you did a search for "slampoud" you'd get results that included quips about climbing, compliments and gripes about products, and random ruminations about cats and traffic. I found that such search results were disorienting and detracted from my identifiability as a practicing geek.

There was an additional problem. Social media is the new playground for sophisticated folks in product evangelism and promotion. There are companies that do it right, companies that do it wrong, people who do it right and people who do it wrong, as in any field. But by being plugged into the twitter climbing community, by playing by its rules (and there's a fairly delicate, emergent web of RTs) I found that I was being exposed to a lot of product promotion, way too much of it. Conversely, of course, when it was information about a product that I needed, that information was readily available. Then again, its location next to so much promotion made even useful information suspect.

In a way, I was proud to have slowly gotten the hang of how the thing worked, of who was what to whom, who was worth listening to and who needed to be filtered out, and what the bots and spammers were attracted by. But, also, the situation was becoming slightly ludicrous: if there was someone I felt I needed to filter out, then why not simply drop them from my feed (or "unfollow" them, in twitter-speak)? 9 times out of 10 I felt the obligation not to, sometimes because this was clearly a newb learning the ropes, sometimes because the offender and I were embedded in a network of relationships with others, a network whose balance I didn't want to upset -- both real world reasons applied to an electronic social network of people I essentially didn't know!

So, because I was alarmed at the fact that my professional handle was being overwhelmed by gibberish, and because I was vaguely nauseated by the amount of product promotion that was passing before my eyes, and, finally, because I was put off by the fact that I was applying real world social mores to my twitter-verse, I deleted that account (or thought I had -- turns out twitter keeps them around in case you change your mind, which raises the question: how does one make a fresh start on twitter?). I believe my final tweet was "I am so burned out on this twitter nonsense. Buh bye." I probably used less punctuation, though.

But I'm not a luddite. In fact, more than anything, I find myself mesmerized by twitter, and especially by marketing on twitter, in the same way I'm mesmerized by stock exchange data. You have to admit that to a geek it's fascinating. There are patterns in the thing, and it plugs into or mirrors the real world in interesting ways, yet defies its rules in more interesting ways. It definitely has a pulse.

So I'm back, after a fashion. I'm now "dubid0", the username portion of my throw-away yahoo email. I'm rebuilding my network of climbing friends, who I missed like hell. I'm excluding some sources of mostly noise -- and some good, I have to say there wasn't anyone who was all noise -- that I felt too guilty to exclude last time around. I'm lurking more and RT'ing less, though it's hard to kick the habit of RT'ing my favorite article or blogpost of every morning. And, in the meantime, I'm debating the utility of bringing back the "slampoud" account in a professional capacity. Yeah, I guess I'll do it.