Writing about this for a mixed audience is hairy. I thought of this line from The Hudsucker Proxy: Norville Barnes is explaining his idea for the (yet to be named) hula hoop. To him, it's a simple, beautiful, fully conceived idea; to everyone else he sounds like either a crazy person, an idiot, or both. Hopefully I can do better.
Fair warning: this post far exceeds my target length of ~750 words. Couldn't be helped.
Don't Call Us, We'll Call You
First, Ancho is a framework. What does that mean? In a traditional ("imperative" or "procedural") computer program, your program has control, and calls other libraries of functions to do some of its work. A framework, on the other hand, is a skeleton that already implements most of the program, and leaves you to write just the parts that are unique to your application. The framework is in control most of the time, and it only "calls" your program code when it needs to -- thus, "don't call us, we'll call you." The computer science term is inversion of control.
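To make the contrast concrete, here is a minimal Python sketch. Nothing in it is Ancho's actual API; `TinyFramework`, `on_frame` and `run` are names I've invented purely for illustration.

```python
import math


# Library style: your program is in control and calls the library.
def imperative_program():
    return [math.sqrt(n) for n in range(3)]


# Framework style: the framework owns the loop and calls your code.
class TinyFramework:
    """Hypothetical framework, not Ancho's real API."""

    def __init__(self):
        self._callbacks = []

    def on_frame(self, func):
        # Register a piece of user code; usable as a decorator.
        self._callbacks.append(func)
        return func

    def run(self, frames):
        # The framework decides when (and how often) user code runs.
        results = []
        for frame in range(frames):
            for callback in self._callbacks:
                results.append(callback(frame))
        return results


framework = TinyFramework()


@framework.on_frame
def my_model_step(frame):
    # The only part the application author has to write.
    return frame * 2


framework_results = framework.run(frames=3)
```

In the first half, your code calls `math.sqrt`. In the second half, you hand `my_model_step` to the framework and it calls you back once per frame.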
If that's a little too abstract, let's look at a physical example. Most farm implements use a standard "interface" to connect to a tractor: a power takeoff (PTO) shaft, a three-point hitch, and hydraulic hoses. You can mix and match lots of different devices (hay rakes and balers, seed drills, fertilizer spreaders, etc.) to many different tractors because they all implement these same interfaces. The tractor is in control; it supplies the implement with power through the PTO, and control impulses through the hydraulic hoses. The implement builds on the foundation of the tractor to provide its specific function.
A framework must provide all the generic functions needed by the various applications that will build on it. It must also define the application programming interface, or API, that governs how applications interact with the framework.
A key part of that definition is explaining to the application programmer how their code will be called. I mention that last part because I've seen it done badly in certain frameworks such as Apple's WebObjects. To continue the tractor analogy, it failed to adequately describe to the implement builder what the tractor was going to do.
A software application framework must clearly define how your code can call on, or be called by, code written by others. This requires carefully specified conventions for other applications to discover the services provided by your application. An example that we're all familiar with might be the cut/copy/paste feature in desktop and mobile operating systems. The Django web framework, in particular, does this poorly.
Finally, a good framework must take all this abstraction, complexity and thoughtful design and hide it behind an API that seems simple and natural. The importance of this cannot be overstated; if using the framework is more painful than starting from scratch, no one will use it.
You Are Here
So, what are the extension points that Ancho will define? What is our equivalent to the three-point hitch, power takeoff and hydraulic couplings? Let's start by defining Ancho's surroundings.
Everything you need to do is inside the dotted line. Your model sits on top of Ancho, which relies on various other underlying parts. Your model code tells Ancho what to do.
Terms & Levels of Analysis
It's going to be easier to discuss what Ancho does and expects if I first define a few terms.
- Model. A particular system to be modeled and simulated, as specified by code that you write or explicitly include in the definition. The definition will consist mostly of Variables, Functions and State objects as defined below.
- Run. A single execution of your Model by the framework, which will result in many Sequences, and may be executed in parallel on many Nodes. You can think of a Run as "what the framework does when you hit the button that says Go."
- Sequence. A single time series within a Run, consisting of several Frames. A single Sequence will always run on a single Node, because otherwise there's too much data to move around from one machine to another. Several Sequences could be running in parallel on several Nodes, but this should not present a problem. If you're a nerd like me, you could think of each Sequence as a simulated parallel universe. A Sequence in Ancho's terminology is like a "trial" in most Monte Carlo literature. I'm using the different term because most Monte Carlo simulations don't offer the ability to do a time series, and Ancho does.
- Node. A single computer in a cluster of computers that are all running identical copies of the Ancho framework and your Model code.
- Frame. A single "time slice", "turn" or "interval" in a time series. I chose the term from the film and video world; most people have an intuitive understanding of a single frame of film. Ancho is explicitly designed to handle dynamic simulations where the values and behavior in one Frame can depend on the system's values from any previous one.
- State. State as in "the state of things," not "the State of Missouri." A Sequence always has a State that is passed from one Frame to the next. If being able to access variables from earlier Frames is "the past," then the Sequence's State is "the present." The State has its own namespace separate from the variable namespace, and it can hold complex objects with their own behavior.
- Variable. A Variable is a namespaced value in your model. Variables can be defined in any one of several ways, but what makes them alike is that they all live in the same namespace, and they all have (potentially) different values in each Frame. Randomized variables are assigned values for each Frame by the framework, based on a probability distribution and other parameters that you specify. This sounds hard but the framework should make it easy. (I'll cover Ancho's randomization services in more detail in a future post.)
- Function. Functions are not themselves randomized, but they can perform any kind of logic, arithmetic or other operation that you can dream up: anything from simple arithmetic, neural networks or procedural algorithms to calling external binaries, web services or remote procedures running on other machines. The inputs to a Function are the current State, and any "history" of the system in the form of past Frames -- in other words, the present and the past. A Function can also use the values of other Variables in the current Frame. In addition to returning a value like a normal Variable, a Function can also:
- Have external effects on a Sequence's State. That sentence will make the Computer Science guys cringe, because a "pure" function is supposed to return a value without having any external effects.
- Make decisions about whether or not to continue running the simulation. For instance, your function might compute that in this Frame of this Sequence, your business plan resulted in bankruptcy, or you outlived your retirement savings, or whatever.
Computer science people may be worried that this is a lot to pack into a single method call, and I agree. I had thought that an Ancho Function would actually be a class definition using the command pattern. Non-computer science people: don't sweat it. These aren't the droids you're looking for.
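For the curious, here is a rough sketch of what a command-pattern Function might look like in Python. This is an assumption about one possible design, not Ancho's API: `FrameContext`, `execute`, `stop` and `CashBalance` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FrameContext:
    """Hypothetical bundle of everything a Function may read or change."""
    state: dict        # "the present": the Sequence's State
    history: list      # "the past": Variable values from earlier Frames
    variables: dict    # Variable values in the current Frame
    stop_requested: bool = False

    def stop(self):
        # Ask the framework to end this Sequence after the current Frame.
        self.stop_requested = True


class CashBalance:
    """A Function written as a command object with a single execute() method."""
    name = "cash_balance"

    def execute(self, ctx):
        balance = (ctx.state["cash"]
                   + ctx.variables["revenue"]
                   - ctx.variables["costs"])
        ctx.state["cash"] = balance   # external effect on the State
        if balance < 0:
            ctx.stop()                # bankruptcy: no point in continuing
        return balance                # the value, like a normal Variable


ctx = FrameContext(
    state={"cash": 100.0},
    history=[],
    variables={"revenue": 20.0, "costs": 150.0},
)
value = CashBalance().execute(ctx)
```

Wrapping the Function in a class gives the return value, the State change and the stop decision each an obvious home, rather than cramming all three into one return value.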
What Ancho Does
So, having said all that, what does Ancho do? It loads your Model, spawns one or more Nodes, and executes a Run. That Run consists of potentially thousands of Sequences, each of which has potentially thousands of Frames.
In each Sequence, the State is initialized based on your Model code. As each Frame is executed in turn, Ancho does the following things in order:
- Creates randomized values for all of your (randomized) Variables.
- Computes the values of all of your Functions, starting with the ones that are specially marked as starting points, based on the current State, past Frames and other Variables.
- Updates the State of the Sequence for the next Frame, based on the actions of the Functions.
This continues until a set number of Frames has executed, or one of the Functions tells Ancho to stop. At that point, the data for this Sequence is stored for later analysis. Ancho continues until the specified number of Sequences has been run, or some other criterion has been satisfied.
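The per-Frame steps and stopping rules above can be sketched as a pair of loops. This is a simplified, single-machine illustration under my own assumptions: the names `run_sequence` and `run`, the tuple each Function returns, and evaluating Functions in list order (rather than from marked starting points) are all invented here.

```python
import random


def run_sequence(initial_state, randomized, functions, max_frames, rng):
    """Run one Sequence and return its list of Frames.

    randomized: {name: draw(rng) -> value}
    functions:  ordered list of (name, func), where
                func(state, history, variables) -> (value, state_updates, stop)
    """
    state = dict(initial_state)   # each Sequence starts from a fresh State
    history = []
    for _ in range(max_frames):
        # 1. Create values for the randomized Variables.
        variables = {name: draw(rng) for name, draw in randomized.items()}
        # 2. Compute the Functions (here simply in list order).
        pending, stop = {}, False
        for name, func in functions:
            value, updates, wants_stop = func(state, history, variables)
            variables[name] = value
            pending.update(updates)
            stop = stop or wants_stop
        # 3. Update the State for the next Frame.
        state.update(pending)
        history.append(variables)
        if stop:
            break
    return history


def run(initial_state, randomized, functions, max_frames, sequences, seed=0):
    """Run many independent Sequences; a real Run would spread them over Nodes."""
    rng = random.Random(seed)
    return [run_sequence(initial_state, randomized, functions, max_frames, rng)
            for _ in range(sequences)]


def savings(state, history, variables):
    # Hypothetical model Function: grow the nest egg, then withdraw 40.
    balance = state["savings"] * (1 + variables["growth"]) - 40.0
    return balance, {"savings": balance}, balance <= 0


results = run(
    initial_state={"savings": 100.0},
    randomized={"growth": lambda rng: rng.gauss(0.05, 0.1)},
    functions=[("savings_balance", savings)],
    max_frames=30,
    sequences=5,
)
```

Each entry in `results` is one Sequence: a list of Frames, cut short whenever `savings` reports that the money has run out.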
After all of the Sequences have run, the results of the entire Run are aggregated for analysis. This analysis will ultimately result in a data package that can be explored using a separate application that is either desktop, mobile or web-based.
Sounds simple, right?