Bayesian networks have proven extremely useful for classifying events and documents, for reliability analysis, and in many other fields. Essentially, wherever a well-defined chain of causation exists among many pieces of data, a Bayes net can help provide probabilities for the “hidden variables” of a system: in the cases above, for example, the category a document belongs to or the probability that a system will fail if a certain component fails.
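To make the reliability case concrete, here is a minimal sketch of the smallest possible network: one component whose state influences whether the system fails. All of the probabilities are made-up numbers chosen for illustration, and the variable names are hypothetical. It applies Bayes' rule to infer the hidden variable (did the component fail?) from the observed one (the system failed).

```ruby
# Illustrative two-node network: component -> system.
# Every number below is an assumption for the example, not real data.
p_component_fails = 0.1          # prior: P(component fails)
p_system_fails_given = {
  true  => 0.9,                  # P(system fails | component fails)
  false => 0.05                  # P(system fails | component is fine)
}

# Marginal probability that the system fails (law of total probability).
p_system_fails =
  p_component_fails * p_system_fails_given[true] +
  (1 - p_component_fails) * p_system_fails_given[false]

# Bayes' rule: probability the component failed, given that the system failed.
p_component_given_system_failure =
  p_component_fails * p_system_fails_given[true] / p_system_fails

puts p_component_given_system_failure.round(3)  # prints 0.667
```

A real network chains many such conditional tables together, which is why working out the structure by hand stops being practical as the number of variables grows.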
As part of another project here at Highgroove, I’ve been developing a gem called glymour that learns a Bayes net’s structure automatically, which matters when it becomes impractical to define causal relationships by hand (e.g. when taking dozens of different variables into account). Working on an open-source gem while relating it to a larger project has helped me understand one of the many benefits of open source: writing open source code is a kind of constrained writing, in which the constraint is to be as general-purpose as possible (within reason).
Writing a piece of open source software (especially, of course, a gem or some other kind of package or plugin) forces the coder to write in a highly modular and iterative way. Beyond the normal considerations of reusable code, DRY, etc., all possible use cases must be considered. For example, though I intend to use glymour mainly with ActiveRecord objects as stores for sample data, working on it as a gem quickly made me realize that others might be reading their data from a file, from user input, or from some other source entirely. Thus, glymour uses a user-defined block for retrieving data, which allows for any of these possibilities. In turn, this made testing much easier, since a simple array can be filled with hashes of test data.
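The block-based approach can be sketched roughly as follows. To be clear, this is not glymour's actual API: `SampleCounter`, its `frequency` method, and the block's `(row, variable)` signature are hypothetical names invented for this example. The point is only that the class never assumes where its rows come from; the caller's block decides how to read a value out of each one.

```ruby
# Hypothetical illustration only: this is NOT glymour's real interface.
class SampleCounter
  # rows:   any enumerable of sample records (hashes, model objects, parsed lines, ...)
  # reader: a block that extracts the value of `variable` from a single row
  def initialize(rows, &reader)
    raise ArgumentError, "a reader block is required" unless reader
    @rows = rows
    @reader = reader
  end

  # Fraction of rows in which `variable` equals `value` (0.0 for an empty data set).
  def frequency(variable, value)
    total = 0
    matches = 0
    @rows.each do |row|
      total += 1
      matches += 1 if @reader.call(row, variable) == value
    end
    total.zero? ? 0.0 : matches.to_f / total
  end
end

# In tests, a plain array of hashes stands in for the real data store.
rows = [
  { rain: true,  sprinkler: false },
  { rain: false, sprinkler: true  },
  { rain: true,  sprinkler: true  },
  { rain: false, sprinkler: false }
]

counter = SampleCounter.new(rows) { |row, variable| row[variable] }
puts counter.frequency(:rain, true)  # prints 0.5
```

With model objects instead of hashes, only the block would change, for instance to `{ |row, variable| row.public_send(variable) }`, while the counting code stays untouched.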
It’s been rewarding to work on something intended for the public; we try to work in open source as much as we can, and really digging into an open project has helped me understand why.
Does writing open source software benefit your style? How?