Tuesday 24 July 2018

Linear and exponential regression using MicroStrategy

MicroStrategy Data Mining Services has been evolving to include more data mining algorithms and functionality. One key feature is MicroStrategy Developer’s Training Metric Wizard. The Training Metric Wizard can be used to create several different types of predictive models including linear and exponential regression, logistic regression, decision tree, cluster, time series, and association rules.


Linear and exponential regression

The linear regression data mining technique should be familiar to you if you have ever tried to extrapolate or interpolate data, tried to find the line that best fits a series of data points, or used Microsoft Excel’s LINEST or LOGEST functions.
Regression analyzes the relationship between several predictive inputs, or independent variables, and a dependent variable that is to be predicted. Regression finds the line that best fits the data, with a minimum of error.
For example, suppose you have a dataset report with just two variables, X and Y, which are plotted as in the following chart:


Using the regression technique, it is relatively simple to find the straight line that best fits this data, as shown below. The line is represented by a linear equation in the classic y = mx + b format, where m is the slope and b is the y-intercept.
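To make the "best fit" idea concrete, here is a minimal sketch of an ordinary least-squares fit for y = mx + b in plain Python. The function name fit_line and the sample values are hypothetical and are not the data from the chart; this illustrates the general technique, not how MicroStrategy computes it internally.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


# Hypothetical sample data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.2, 6.8, 9.1, 11.0]
m, b = fit_line(xs, ys)
print(f"y = {m:.3f}x + {b:.3f}")
```

The slope and intercept returned here play the same role as the m and b values that Excel's LINEST returns for a single independent variable.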


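Alternatively, you can fit an exponential curve through this data, as shown in the following chart. This curve has an equation in the y = b * m^x format, where b is the y-intercept and m is the factor by which y is multiplied each time x increases by 1.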
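One common way to fit y = b * m^x is to take the logarithm of both sides, which gives ln(y) = ln(b) + x * ln(m), and then fit a straight line to the pairs (x, ln(y)). The sketch below assumes every y value is positive; the function name fit_exponential and the sample values are hypothetical, and MicroStrategy's own implementation may differ in its details.

```python
import math


def fit_exponential(xs, ys):
    """Return (b, m) for the curve y = b * m**x, fitted on the log scale."""
    if any(y <= 0 for y in ys):
        raise ValueError("the log transform requires every y to be positive")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    # Ordinary least squares on (x, ln y): the slope is ln(m), the intercept is ln(b).
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxl / sxx
    intercept = mean_log - slope * mean_x
    return math.exp(intercept), math.exp(slope)


# Hypothetical sample data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 8.3, 15.8, 32.5]
b, m = fit_exponential(xs, ys)
print(f"y = {b:.3f} * {m:.3f}^x")
```

Because the fit minimizes error in ln(y) rather than in y itself, it can weight the points differently than a direct nonlinear fit would.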


So, how can you tell which of the two has the better fit? Many statistics are used in the regression technique. One basic statistic is an indicator of the goodness-of-fit, meaning how well the fitted equation describes the relationship among the variables. This is also called the Coefficient of Determination, whose symbol is R². The higher R² is, the better the fit. The linear predictor has an R² of 0.7177 and the exponential predictor has an R² of 0.7459; therefore, the exponential predictor is a better fit statistically.
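The Coefficient of Determination is commonly defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below applies that textbook definition to the observed values and a model's predictions. The function name r_squared and all numbers are hypothetical and are not the values behind the 0.7177 and 0.7459 figures above. Some tools compute this statistic for an exponential model on the log-transformed values instead, so results can differ between tools.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after applying the model.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical observed values and predictions from two candidate models.
observed = [2.0, 4.1, 7.9, 16.2, 31.8]
linear_pred = [-2.0, 5.2, 12.4, 19.6, 26.8]
exponential_pred = [2.0, 4.0, 8.0, 16.0, 32.0]

r2_linear = r_squared(observed, linear_pred)
r2_exponential = r_squared(observed, exponential_pred)
best = "exponential" if r2_exponential > r2_linear else "linear"
print(f"linear R2 = {r2_linear:.4f}, exponential R2 = {r2_exponential:.4f}")
print(f"better fit: {best}")
```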
With just one independent variable, this example is considered a univariate regression model. In practice, the regression technique can work with any number of independent variables, but with only one dependent variable. While multivariate regression models are not as easy to visualize as the univariate model, the technique still generates statistics that let you determine the goodness-of-fit.
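As an illustration of the multivariate case, the following sketch fits y = b0 + b1*x1 + ... + bk*xk by solving the least-squares normal equations with Gauss-Jordan elimination. The function name fit_multivariate and the sample rows are hypothetical. This is a small teaching example under those assumptions, not MicroStrategy's implementation, and a numerical library would normally be preferable for real data.

```python
def fit_multivariate(rows, ys):
    """Return [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    # Design matrix: a leading 1.0 in each row models the intercept b0.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented normal equations [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(X, ys))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("inputs are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    # After elimination, the last column holds the coefficients.
    return [A[i][k] for i in range(k)]


# Hypothetical data with two independent variables (x1, x2) per row.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [9.1, 7.8, 19.2, 18.1, 25.9]
coefficients = fit_multivariate(rows, ys)
print([round(c, 3) for c in coefficients])
```

The predictions from these coefficients can be scored with the same R² statistic used for the single-variable models, which is how goodness-of-fit remains measurable even when the model cannot be drawn as a line on a chart.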