Questions and Answers


Copyright 1995, Mustafa Uzumeri and David Nembhard

We expect that this brief overview will raise as many questions as it answers. Since our conceptualization of this approach is still evolving, a number of these questions remain unresolved. However, discussions with colleagues have raised several points that deserve at least speculative discussion:

Doesn't this method require huge amounts of raw data?

Yes, but this is becoming less of a concern. As companies of all sizes continue to automate, they are being flooded with raw data. Indeed, most large companies already collect data on at least some of their important process signals. For example, if a manufacturer uses a modern operations management system (e.g., SAP), bar-codes its products, or has installed a plant-floor data acquisition system, it is already collecting data on many important process signals. If a service operation uses a workflow management or document management system, it is probably making similarly relevant measurements. The real problem is that there is currently no effective way to simplify and visualize the dynamic behavior that this wealth of data reflects.

How many signals can be measured this way?

At present, this approach is too complicated to be used on every measured variable. However, it seems likely that many organizations will have three or four key signals that are worth the effort to track and are not well served by the organization's other information gathering methods.

Isn't this what researchers do when they build models to explain various aspects of operations and organizational behavior?

The mechanics of the analog approach are similar to conventional model-building. However, it is important to understand that the proposed approach is intended to be "descriptive" rather than "explanatory". The equation selected to describe the data does not need a theoretical justification. It only needs to be malleable enough to conform to the underlying data. Of course, if two functions are equally malleable and one is more intuitive, it would be convenient to use the one that is more easily understood.

We envision that explanatory model-building will occur after the analog signals have been described. For example, the "cloud" of points in Figure 3 has a distinctive structure that suggests something other than chance is at work. Faced with such a pattern, one might examine the circumstances behind the learning curves at different locations in the cloud to see if there are any systematic reasons for the pattern. The analog approach should make the analyst's job much easier by eliminating the random process noise that is typically so confusing. Thus, the analyst will be able to focus on the far more important variations in the shapes of analog signals.
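
To make this concrete, the sketch below (written in Python purely for illustration; it is not the tooling used in the original work) shows how each worker's noisy production record might be compressed into the parameters of a flexible, purely descriptive curve. The three-parameter function, the simulated data, and the describe_signal helper are all illustrative assumptions rather than the actual model behind Figure 3; the point is only that each fitted signal collapses to a single point in a low-dimensional parameter "cloud".

    # Minimal sketch (not the authors' code): compress each worker's noisy
    # production record into the parameters of a flexible descriptive curve.
    # The collection of fitted parameters forms the kind of "cloud" referred
    # to in Figure 3.
    import numpy as np
    from scipy.optimize import curve_fit

    def learning_curve(x, k, p, r):
        """Illustrative saturating form: productivity rises from an initial
        level toward an asymptote k as cumulative output x grows."""
        return k * (x + p) / (x + p + r)

    def describe_signal(cum_output, productivity):
        """Compress one noisy time series into three fitted parameters."""
        params, _ = curve_fit(learning_curve, cum_output, productivity,
                              p0=[1.0, 1.0, 10.0], maxfev=10000)
        return params  # (k, p, r)

    # Hypothetical data: 50 workers, each observed over 100 production periods.
    rng = np.random.default_rng(0)
    cloud = []
    for _ in range(50):
        x = np.arange(1, 101, dtype=float)
        true_k, true_r = rng.uniform(0.8, 1.2), rng.uniform(5, 40)
        y = learning_curve(x, true_k, 1.0, true_r) + rng.normal(0, 0.03, x.size)
        k, p, r = describe_signal(x, y)
        cloud.append((k, r))  # one point per worker in the parameter "cloud"

    print(cloud[:3])

Each fitted pair then stands in for hundreds of raw observations, and it is the spread of those pairs, rather than the raw process noise, that the analyst would go on to examine.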

Do current data mining tools adopt the analog approach?

The answer seems to be no. Most current data mining tools try to ferret out relationships between discrete chunks of data. They make quantitative comparisons, they look for hierarchical relationships, they try to induce decision rules, and they search for repeating patterns in text and data. All of these approaches deal with discrete data "cells" or tables. Thus, we believe that viewing data in terms of analog signals is different from, but highly complementary to, current approaches.

Why not just use linear regression?

If the underlying signal is linear, a linear model is appropriate, and linear models are much easier to apply. However, many real-world phenomena are non-linear, and a linear model would cause too much valuable information to be lost in the compression process. In addition, now that we have the necessary computational horsepower and high-quality statistical tools, there is no longer a performance reason to avoid fitting the more complex shapes.
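
As a small, self-contained illustration (ours, not part of the original study), the fragment below fits both a straight line and a two-parameter saturating curve to the same simulated, plateauing signal. The linear fit leaves large, systematic residuals, so its two stored coefficients discard much of the signal's shape, while a nonlinear summary of the same size does not.

    # Illustration: linear vs. nonlinear compression of a saturating signal.
    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import linregress

    x = np.arange(1, 101, dtype=float)
    rng = np.random.default_rng(1)
    signal = 1.0 * x / (x + 15.0) + rng.normal(0, 0.02, x.size)  # plateaus

    # Linear compression: 2 parameters (slope, intercept).
    lin = linregress(x, signal)
    lin_resid = signal - (lin.intercept + lin.slope * x)

    # Nonlinear compression: also 2 parameters (asymptote, rate).
    def saturating(x, k, r):
        return k * x / (x + r)

    (k, r), _ = curve_fit(saturating, x, signal, p0=[1.0, 10.0])
    nl_resid = signal - saturating(x, k, r)

    print("linear RMS residual:   ", np.sqrt(np.mean(lin_resid ** 2)))
    print("nonlinear RMS residual:", np.sqrt(np.mean(nl_resid ** 2)))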

What mathematical function will fit all of the different "signal patterns" that are likely to be found in practice?

There is none, unless the equation is very complex and has a large number of parameters, in which case the advantages of working with 2- or 3-dimensional data would be lost. Instead, it may be preferable to build a "toolkit" of equations that work well on specific classes of problems. This may not be as limiting as it sounds. Time-series data can only behave in a limited number of ways: it can go up, go down, approach a limit, achieve a peak or trough, or oscillate. Pareto's law suggests that a small number of signal patterns will probably describe a large number of phenomena.
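
The fragment below sketches what such a toolkit might look like: a handful of two-parameter candidate shapes (rise to a limit, decay, peak then decline, linear trend) and a routine that picks the best-fitting one by residual error. The specific functions and starting values are illustrative assumptions, not a recommended library.

    # Sketch of a small "toolkit" of descriptive shapes and a selection rule.
    import numpy as np
    from scipy.optimize import curve_fit

    TOOLKIT = {
        "rise to a limit":   (lambda x, k, r: k * x / (x + r),         [1.0, 10.0]),
        "exponential decay": (lambda x, a, b: a * np.exp(-b * x),      [1.0, 0.05]),
        "peak then decline": (lambda x, a, b: a * x * np.exp(-b * x),  [0.1, 0.05]),
        "linear trend":      (lambda x, a, b: a + b * x,               [0.0, 0.01]),
    }

    def best_shape(x, y):
        """Fit each candidate shape and return the name and parameters of
        the one with the smallest sum of squared residuals."""
        results = []
        for name, (f, p0) in TOOLKIT.items():
            try:
                params, _ = curve_fit(f, x, y, p0=p0, maxfev=10000)
            except RuntimeError:
                continue  # this candidate failed to converge; skip it
            sse = np.sum((y - f(x, *params)) ** 2)
            results.append((sse, name, params))
        results.sort(key=lambda t: t[0])
        return results[0][1], results[0][2]

    # Hypothetical example: a series that approaches a limit.
    x = np.arange(1, 101, dtype=float)
    y = 0.9 * x / (x + 20.0) + np.random.default_rng(2).normal(0, 0.02, x.size)
    print(best_shape(x, y))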

Is this likely to be packaged and sold as a software product?

Yes and no. It should be fairly straightforward (not to say easy) to develop a database add-on that would fit supplied mathematical functions to the time-series "signals". The challenge, however, will be to build a reliable toolkit of mathematical functions and meaningful analyses. This argues that, while the analytical engine and user interface may be "shrink-wrapped", the portfolio of functions is likely to evolve and improve over time, quite possibly as a collaboration between industry and universities. SAS Institute is a possible model for this type of business.

Do we have to develop special software to apply the analog approach?

No. We have been able to explore this with conventional statistical analysis tools (notably SAS). Dedicated software could eventually be easier to use, but effective prototypes can be developed with the software and hardware that are available now.

How far have you gotten in developing this approach?

Work is well underway, but there is still a long way to go. We have applied the approach to several simple forms of "signals", including the learning curves, with promising results. However, there are many residual uncertainties and mechanical details that remain to be explored. We anticipate that, while this approach can be applied to a number of simple problems immediately, it will continue to evolve and improve over time.


This research is being conducted jointly with David Nembhard, a colleague at Auburn's College of Business.
