CTOs and analysts of all stripes know that understanding the problem they are trying to solve should precede selecting analytical tools. Yet this is far more difficult than it seems, particularly as the variety of problems we come across multiplies. One of the major stumbling blocks for analysis in particular is terminological fuzziness. We often (wrongly) treat terms like "uncertain," "difficult," and "complex" as interchangeable -- when, as economist Scott Page points out, the differences between them are instructive. Page draws a useful distinction among these terms that can help us match tool to problem. And by going through his schema, I hope to convince you that there may be more possible dimensions to analytics than the tools fielded by current market leaders imply.
Say we have a problem that involves trying to find the best solution from a range of alternatives. Are we able to consider each solution simultaneously and rank them? If so, analysis should be simple -- with one gigantic caveat. Generally, the problem here is uncertainty about the solutions themselves. We have a state of belief about what to do, but are we sure we correctly understand the nature of the solutions? In this situation, the best thing to do is to gather more information about the problem and potential solutions. Tools that help reveal patterns, connections, and inferences make the choices we face seem much less daunting, particularly when they can ingest and analyze large and/or unstructured data and guide our intuition. And contrary to Chris Anderson's famous column, it's here where computational tools most approximate the traditional idea of the scientific method. We have a defined question, we want to test the potential explanations/solutions, and we use tools to investigate. When we get the information we need, we're that much closer to making the right choice. We can't completely eliminate uncertainty about explanations/alternatives, but we can at least reduce it.
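To make "reducing but not eliminating uncertainty" concrete, here is a minimal sketch of one way to formalize it: a Beta-Bernoulli belief update. This is an illustration of my own rather than anything from Page's schema, and the scenario, the survey numbers, and the function names (`beta_update`, `beta_mean_var`) are all assumptions made up for the example.

```python
def beta_update(alpha, beta, successes, failures):
    """Conjugate update of a Beta(alpha, beta) belief after yes/no observations."""
    return alpha + successes, beta + failures


def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = (alpha * beta) / (total ** 2 * (total + 1))
    return mean, var


# Hypothetical question: how likely is a customer in a candidate market to buy?
prior = (1, 1)  # Beta(1, 1): every purchase rate is considered equally plausible
prior_mean, prior_var = beta_mean_var(*prior)

# Assumed survey result (invented for illustration): 30 of 40 respondents would buy.
posterior = beta_update(*prior, successes=30, failures=10)
post_mean, post_var = beta_mean_var(*posterior)

print(f"prior:     mean={prior_mean:.3f}, variance={prior_var:.4f}")
print(f"posterior: mean={post_mean:.3f}, variance={post_var:.4f}")
```

The variance of the belief shrinks as observations come in but never reaches zero, which is the sense in which information-gathering tools narrow the choice without settling it outright.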
What if the problem is that there are too many potential solutions to evaluate at once? Then it becomes a question of difficulty. Here we move more to the algorithmic and artificial intelligence-centric aspect of analytical tools. With a defined problem, an objective function to optimize, and a set of problem constraints, we search a solution space in a structured manner with algorithms that will find the best solution and prune away ones we don't want. When tractable, these problems take the form of things like the classic Knight's Tour problem solved with divide-and-conquer algorithms. Often we'll have to rely on problem-specific heuristics to either find the best solution reliably (admissible heuristics) or at least find it some of the time (inadmissible heuristics). Many real-world problems outside of computer science textbooks require algorithms that make very few assumptions about the nature of the problem. These are usually combinations of random search of the solution space and iterative hill-climbing/improvement towards better solutions. Some academics within my school (George Mason University) have developed useful tools for deploying these metaheuristic algorithms to solve problems.
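The combination of random search and hill-climbing can be sketched in a few lines. The example below is a generic random-restart hill climber, not the GMU tools mentioned above; the toy landscape (`score`, with a low peak at 20 and a higher one at 75) and all function names are assumptions chosen for illustration.

```python
import random


def hill_climb(score, start, neighbors, max_steps=1000):
    """Greedy local search: move to the best neighbor until none improves."""
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current  # no neighbor is better: a local optimum
        current = best
    return current


def random_restart_search(score, random_start, neighbors, restarts=20, seed=0):
    """Run hill_climb from several random starting points and keep the best result."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        candidate = hill_climb(score, random_start(rng), neighbors)
        if best is None or score(candidate) > score(best):
            best = candidate
    return best


# Toy fitness landscape over the integers 0..99 (an assumption for illustration):
# a small peak at x=20 (height 10) and a taller peak at x=75 (height 30).
def score(x):
    return max(10 - abs(x - 20), 30 - abs(x - 75))


def neighbors(x):
    """Adjacent integers, clipped to the range 0..99."""
    return [n for n in (x - 1, x + 1) if 0 <= n <= 99]


local_only = hill_climb(score, 0, neighbors)  # climbs the nearest peak and stops
best = random_restart_search(score, lambda rng: rng.randrange(100), neighbors)

print("single climb from 0:", local_only, "score", score(local_only))
print("random restarts:    ", best, "score", score(best))
```

A single climb from 0 gets stuck on the small peak at 20. Restarting from random points gives the search repeated chances to land on the slope of the taller peak, which is the basic idea behind most metaheuristics: make few assumptions about the landscape and let randomness plus local improvement do the work.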
Before I continue, I'd like to make the point that both of these formulations make one big assumption about the problem: the fitness landscape of potential solutions is fixed. Peaks and valleys don't really change too much. For example, say a company is uncertain about what market to choose, as in the first example. It has some rough ideas to evaluate, and it uses data science tools to gather information on the problem. The decision is made, and then it's on to the next problem. Or perhaps a government agency is having a difficult time finding the best way to deal with a job shop scheduling problem. It uses Ant Colony Optimization to find an acceptable answer. Sure, the problem itself may change, requiring either information-gathering or solution search to be redone. But we can find optimal or at least satisfactory solutions for discrete instances of each problem as they come.
The last kind of problem, however, is very different. Here the problem isn't really that we have a lack of information or that moving through the range of options is very difficult. Consider that every year, you have to organize a family gathering. Different groups of your extended family have varying preferences about where to go. You have to find a destination that satisfies them all. Unfortunately, each previous vacation (and their level of satisfaction with it) influences their preferences in the current planning round. If your cousin Frank didn't like the last choice, he'll be that much more likely to dislike your current idea. To make matters worse, maybe there was something about the last vacation that put Frank and your wife's sister Paula at loggerheads, souring both on the previous vacation simply because it caused a flare-up. Whether or not you make a good choice is dependent on whether or not you can predict something the family aggregate will like.
There are so many dimensions to the problem, and what qualifies as a good solution evolves over time. There's also a high degree of randomness involved -- there are obviously qualitative patterns that recur in your vacation satisfaction outcome, but sometimes there can be instances where the outcome is a complete surprise. And because each variable in the problem is so highly interconnected, there may be unintended consequences to seemingly benign choices. You have a feeling that there are a couple of key dynamics driving the problem. It could be, for example, that your in-laws and your side of the family have a finite tolerance for how long they interact with each other, and that a resort/cruise vacation creates a situation that pushes them beyond their maximum tolerance level. You may not be able to increase their tolerance for each other, but you could design a vacation that doesn't, more often than not, push them past a critical tolerance value. Don't put them on a cruise ship; maybe give them more free time so they won't see each other as much. But how would you find that out?
I spend a lot of time on this precisely because this is a kind of problem that standard conceptions of analytic decision tools often neglect. What you really need is a way to experiment on the problem. You could collect a lot of data and discover some interesting patterns that would tell you what kinds of outcomes you want to explain. But what you need is a way to experiment to figure out which combinations of variables are causing the patterns and how altering some parameters might change the way the problem evolves. And since there isn't necessarily a hard dependent variable --> independent variable setup, you have to think about multicausal dynamics and relationships. You could conduct some experiments, but the family fights that might result could leave you sleeping on the couch for the foreseeable future.
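A simulated experiment costs nothing in family goodwill. Below is a minimal sketch of the tolerance-threshold idea as a toy model. Everything in it is an assumption made for illustration: the parameter values, the "grudge" rule that lets a bad year lower the next year's tolerance, and the function names `simulate_vacation` and `flare_up_rate`. It is not calibrated to any data.

```python
import random


def simulate_vacation(contact_hours_per_day, days, tolerance_mean, rng, grudge=0.0):
    """Return True if cumulative contact between the two sides exceeds their tolerance."""
    # This year's tolerance is noisy and reduced by any grudge carried over.
    tolerance = rng.gauss(tolerance_mean, 5.0) - grudge
    contact = 0.0
    for _ in range(days):
        contact += max(0.0, rng.gauss(contact_hours_per_day, 1.0))
        if contact > tolerance:
            return True  # flare-up: the critical tolerance value was crossed
    return False


def flare_up_rate(contact_hours_per_day, years=2000, seed=0):
    """Fraction of simulated years that end in a flare-up for a given vacation design."""
    rng = random.Random(seed)
    grudge = 0.0
    flare_ups = 0
    for _ in range(years):
        flared = simulate_vacation(
            contact_hours_per_day, days=7, tolerance_mean=30.0, rng=rng, grudge=grudge
        )
        flare_ups += flared
        # Path dependence: a bad year raises the grudge (capped at 15),
        # a good year lets it fade back toward zero.
        grudge = min(grudge + 5.0, 15.0) if flared else max(grudge - 5.0, 0.0)
    return flare_ups / years


cruise_rate = flare_up_rate(8.0)     # cruise ship: lots of forced time together
free_time_rate = flare_up_rate(3.0)  # looser itinerary: far less shared time

print(f"cruise flare-up rate:    {cruise_rate:.2f}")
print(f"free-time flare-up rate: {free_time_rate:.2f}")
```

The point is not the specific numbers but the workflow: change one setting (hours of contact per day), rerun the model many times, and see how often the family crosses its threshold. The grudge term also shows why past outcomes matter, since one bad year makes the next one more likely to go wrong.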
CTOs and analysts should look more at tools like NetLogo and other lightweight simulation environments that can easily and intuitively demonstrate dynamics of problems. These tools can be calibrated to existing data and validated against the observed patterns (like, for example, the aggregate vacation satisfaction levels). These tools, however, require some programming knowledge and lack the point-and-click usability of data science tools produced by companies like the many we track here at CTOvision (see our special report on analytical tools). What would make a difference, as my GMU colleague David Masad argues, would be modeling tools that could ingest large quantities of data, run algorithms, and automatically generate a model environment and detailed behavior specifications. That way, model behavior would be automatically calibrated to data and decision makers could just play around with model settings and see what happens.
The academic and research communities currently have the tools to deal with complex problems. But while the analyst community has tools to deal with uncertainty and difficulty, integrated and user-friendly tools to deal with complexity are few and far between. The company that produces tools capable of combining all three problem-solving methods -- reducing uncertainty with knowledge discovery, finding optimal model behaviors and settings by cutting through difficulty, and allowing decision makers to explore and experiment on complexity -- could very well produce the analyst community's killer app.