Friday, 3 August 2018

Logistic regression in MSTR


Logistic regression is a classification technique used to determine the most likely outcome from a finite set of values. The term comes from the logistic (sigmoid) function, σ(x) = 1 / (1 + e^(-x)):
Notice how this function tends to push values toward zero or one: as x becomes more negative, the logistic function approaches zero; as x becomes more positive, it approaches one. This is how regression, a technique originally used to calculate results across a continuous range of values, can be used to predict a finite number of possible outcomes.
The technique determines the probability that each outcome will occur. A regression equation is created for each outcome and comparisons are made to select a prediction based on the outcome with the highest likelihood.
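As a minimal sketch of this idea in Python, using scikit-learn (an assumption; the blog's own tooling is MicroStrategy) with hypothetical heart-risk data:

```python
# A sketch of multi-class logistic regression; the features ([age, cholesterol])
# and training data are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[35, 180], [42, 210], [55, 250], [61, 280], [47, 230], [30, 170]])
y = np.array(["Low", "Medium", "High", "High", "Medium", "Low"])

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# One regression is fit per outcome; predict_proba returns the probability of
# each class, and the prediction is the class with the highest likelihood.
probs = model.predict_proba([[50, 240]])
print(dict(zip(model.classes_, probs[0].round(3))))
print("Predicted:", model.predict([[50, 240]])[0])
```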

Evaluating logistic regression using a confusion matrix

As with other categorical predictors, the success of a logistic regression model can be evaluated using a confusion matrix. The confusion matrix highlights the instances of true positives, true negatives, false positives, and false negatives. For example, a model is used to predict the risk of heart disease as Low, Medium, or High. The confusion matrix would be generated as follows:
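The original matrix is not reproduced here; as a stand-in, the sketch below shows how such a matrix could be computed with scikit-learn (again an assumption outside MicroStrategy), using made-up actual and predicted labels:

```python
from sklearn.metrics import confusion_matrix

labels = ["Low", "Medium", "High"]
actual    = ["Low", "Low", "Medium", "High", "Medium", "High", "Low", "High"]
predicted = ["Low", "Medium", "Medium", "High", "High", "High", "Low", "Medium"]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(actual, predicted, labels=labels)
print(cm)
```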
Proper analysis of the matrix depends on the predictive situation. In the heart-disease scenario, a higher rate of false positives for Medium and High Risk is preferable to a higher rate of false negatives for High Risk: a false positive encourages preventative measures, whereas a false negative implies good health when, in fact, significant concern exists.

Tuesday, 24 July 2018

Linear and exponential regression using MicroStrategy

MicroStrategy Data Mining Services has been evolving to include more data mining algorithms and functionality. One key feature is MicroStrategy Developer’s Training Metric Wizard. The Training Metric Wizard can be used to create several different types of predictive models including linear and exponential regression, logistic regression, decision tree, cluster, time series, and association rules.


Linear and exponential regression

The linear regression data mining technique should be familiar to you if you have ever tried to extrapolate or interpolate data, tried to find the line that best fits a series of data points, or used Microsoft Excel’s LINEST or LOGEST functions.
Regression analyzes the relationship between several predictive inputs, or independent variables, and a dependent variable that is to be predicted. Regression finds the line that best fits the data, with a minimum of error.
For example, you have a dataset report with just two variables, X and Y, which are plotted as in the following chart:


Using the regression technique, it is relatively simple to find the straight line that best fits this data, as shown below. The line is represented by a linear equation in the classic y = mx + b format, where m is the slope and b is the y-intercept.


Alternatively, you can also fit an exponential line through this data, as shown in the following chart. This line has an equation in the y = b·m^x format.


So, how can you tell which line has the better fit? The regression technique produces many statistics; one basic statistic is an indicator of goodness-of-fit, meaning how well the line fits the relationship among the variables. This is called the coefficient of determination, whose symbol is R². The higher the R², the better the fit. The linear predictor has an R² of 0.7177 and the exponential predictor has an R² of 0.7459; therefore, the exponential predictor is the better fit statistically.
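As a rough sketch of how these two fits and their R² values can be computed outside MicroStrategy (here with NumPy; the data points are invented and will not reproduce the exact figures above):

```python
# Comparing a linear fit (y = mx + b) with an exponential fit
# (y = b * m**x, the form fitted by Excel's LOGEST) on invented data.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 4.2, 5.8, 8.4, 11.9, 17.5, 24.8])

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Linear fit: least squares on y directly.
m, b = np.polyfit(x, y, 1)
r2_linear = r_squared(y, m * x + b)

# Exponential fit: least squares on log(y), then transformed back,
# with R² evaluated on the original scale.
log_m, log_b = np.polyfit(x, np.log(y), 1)
r2_exp = r_squared(y, np.exp(log_b) * np.exp(log_m) ** x)

print(f"linear R^2 = {r2_linear:.4f}, exponential R^2 = {r2_exp:.4f}")
```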
With just one independent variable, this example is considered a univariate regression model. In practice, the regression technique can work with any number of independent variables, but with only one dependent variable. While multivariate regression models are not as easy to visualize as the univariate model, the technique still generates statistics so you can determine the goodness-of-fit.

Thursday, 21 June 2018

Freeform SQL in MSTR

The Freeform SQL functionality adds great flexibility to MicroStrategy’s query and reporting capabilities. Traditionally, you use the MicroStrategy Engine to generate SQL to run against one specific relational database for a report. In addition to generating reports in the traditional way, you can use your own customized SQL statements to generate reports from operational systems included in a MicroStrategy project. This capability can save time since you do not need to place the data into a data mart or data warehouse first.


The Freeform SQL feature allows you to use your own SQL statements to access data from various ODBC data sources, including relational databases, Excel files, and flat files, as long as they are included in the MicroStrategy environment. Since you write your own SQL statements to create reports with Freeform SQL, a strong knowledge of how to create and use SQL statements is essential.
Details on how to create Freeform SQL reports from these ODBC data sources are discussed in this section.
The Freeform SQL Editor, where you define the SQL statement for the report, provides different panes for different purposes.
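As a hedged sketch of the kind of hand-written statement a Freeform SQL report might use, here is a hypothetical query run against an ODBC source with pyodbc (the DSN, tables, and columns are all invented):

```python
# Hypothetical example: the kind of hand-written SQL a Freeform SQL report
# might use, executed here against an ODBC source via pyodbc for illustration.
import pyodbc

FREEFORM_SQL = """
SELECT   r.region_name,
         SUM(s.sales_amount) AS total_sales
FROM     fact_sales s
JOIN     lu_region r ON r.region_id = s.region_id
GROUP BY r.region_name
"""

conn = pyodbc.connect("DSN=MyWarehouse")  # hypothetical DSN
for region, total in conn.execute(FREEFORM_SQL):
    print(region, total)
conn.close()
```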

Saturday, 9 June 2018

ACL in MSTR

An access control list (ACL) is a list of users and groups and the access permission that each one has to objects in a MicroStrategy project. Different users may have different permissions on the same object.
When you use existing objects (including project objects) in Freeform SQL column mapping, the ACLs of those objects are used. However, new attributes and metrics created in Freeform SQL reports inherit the default ACL defined in the Project Configuration Editor. You can modify the default ACL in MicroStrategy Developer by right-clicking a project and selecting Project Configuration. In the Project Configuration window, point to Project definition, then Security, and then, for Set Freeform SQL and MDX objects default security, select Modify. The Properties [XDA Objects] dialog box is displayed. The Permissions list has the following default settings:

User            Children
Administrator   Full Control
Everyone        View
Public/Guest    View

The user who creates new attributes and metrics with Freeform SQL is automatically given the Full Control permission on the new objects.

Changed settings affect only the attributes and metrics created in Freeform SQL reports after the change, not objects created before it.

Thursday, 31 May 2018

All about security filters

A security filter is an object that you assign to users or groups, which limits the result set when users execute reports or browse elements. Security filters enable you to control, at the MicroStrategy level, what warehouse data users can see. This functionality is similar to database views and row-level security.
For example, two regional managers each have a different security filter assigned to their MicroStrategy user account, based on their respective regions – one in the Northeast and the other in the Southwest. If these two users with different security filters run the same report, they may get different results based on the criteria defined in their security filters.
A security filter has these parts:
  • Filter expression: Specifies the subset of the data that a user can analyze.
  • Top range attribute: Specifies the highest level of detail that the security filter allows the user to view. If this is not specified, the user can view every level higher than the specified bottom range attribute. For details on levels and dimensionality, see the Metrics chapter in the MicroStrategy Advanced Reporting Guide.
  • Bottom range attribute: Specifies the lowest level of detail that the security filter allows the user to view. If this is not specified, the user can view every level lower than the specified top range attribute. For details on levels and dimensionality, see the Metrics chapter in the MicroStrategy Advanced Reporting Guide.
Top and bottom range attributes place aggregation ceilings and floors on metrics used on a report. If neither a top nor a bottom range attribute is specified, the security filter is applied to every level of analysis.
Within a project, a security filter may be specified for any user. Whenever a user submits a report execution request, the security filter is applied. It can also be applied to the user's element requests (when the user browses attribute elements within a project), if that feature is enabled.
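Conceptually, a security filter behaves like an extra predicate merged into the SQL that a report generates. The sketch below illustrates only that idea; the user names, filter expressions, and columns are invented:

```python
# Illustration only: a security filter acts like an extra WHERE predicate
# merged into whatever SQL the report generates. All names are hypothetical.
REPORT_SQL = "SELECT order_id, revenue FROM fact_sales WHERE year = 2018"

security_filters = {
    "ne_manager": "region = 'Northeast'",
    "sw_manager": "region = 'Southwest'",
}

def apply_security_filter(sql, user):
    predicate = security_filters.get(user)
    return f"{sql} AND {predicate}" if predicate else sql

# The same report yields different SQL (and different results) per user.
print(apply_security_filter(REPORT_SQL, "ne_manager"))
print(apply_security_filter(REPORT_SQL, "sw_manager"))
```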

Prerequisites

  • The security filter you want to assign to a user or group must already exist in MicroStrategy Developer.
  • To assign security filters, you must have the necessary privileges.

To assign a security filter to a user or group

You assign a security filter to a single user or group within a single project by using the Group or User Editor, as described below. To assign security filters to multiple users or groups, across multiple projects or within a single project, you or another administrator must use MicroStrategy Developer.

Wednesday, 18 April 2018

How joins work on data blending datasets inside Dossiers or Documents in MicroStrategy

MicroStrategy documents and dossiers include a join behavior setting on datasets. This setting controls joins between datasets, but only in specific scenarios, which can lead to confusion and to results other than what is expected.


The main misunderstanding centers on reporting scenarios that lack metrics. The MicroStrategy product was built from the ground up to focus on metrics and the analysis of aggregated and calculated values. For join behavior, this means the setting applies to metric joins between datasets: the product creates a metric dataset for the join between the datasets and then outer joins that result to the attributes. With no metric dataset, the attributes are simply outer joined to each other. This leads to situations where the join behavior is set to secondary (inner join) between datasets, yet an outer join is still seen because no metric from the datasets is used.

This is best illustrated with an example. Below there are two datasets, Join1 and Join2. Join1 has data for 2014 and 2015, while Join2 has data for 2015 and 2016. When just the attributes from both datasets are added (note that at least one attribute from each dataset is sourceable only from that dataset), a full outer join is seen between the data, despite both datasets being set to inner join for their join behavior.


Now, if the metrics from each dataset are added to the grid, the data can be seen to perform the inner join specified in the join behavior.
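The same behavior can be mimicked outside the product; the sketch below uses pandas (an assumption, not MicroStrategy internals) with data shaped like Join1 and Join2:

```python
# A sketch (pandas, not MicroStrategy internals) of the observed behavior
# with datasets shaped like Join1 and Join2 in the example above.
import pandas as pd

join1 = pd.DataFrame({"Year": [2014, 2015], "Metric1": [100, 110]})
join2 = pd.DataFrame({"Year": [2015, 2016], "Metric2": [200, 220]})

# Attributes only: the attribute elements are effectively outer joined, so all
# three years appear even though both datasets are set to inner join.
print(pd.merge(join1[["Year"]], join2[["Year"]], on="Year", how="outer"))

# With a metric from each dataset on the grid, the join behavior setting
# applies and the inner join leaves only the overlapping year, 2015.
print(pd.merge(join1, join2, on="Year", how="inner"))
```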

This means that in scenarios where an inner join is desired between datasets, it is key to make sure the grid uses metrics from each dataset. This can be done by placing the metrics directly on the grid, by placing them on the grid as hidden metrics, or by building view filters based on the metrics.

Sunday, 5 November 2017

Snowflake or star? Which one is the better approach for designing a project from scratch?

It's a very old question, and certainly a very important one, before starting a new project from scratch. Though there are a lot of factors that must be considered before coming to a final decision, here are some insights.

MicroStrategy prefers a snowflake schema over a star schema. Parent-child relationships are one of the most important parts of schema design, and they are always easier to model when each dimension has its own lookup table. Denormalized dimensions work without issues and avoid a lot of joins, but if you have prompts in your reports, it's better to stick with snowflake.

Lets take an example- suppose you have a prompt on month. Now in star schema you have date and month in the same table but in snowflake you'll have a month table separately. Now in the month table you have 365 rows and in month table you have 12 only. The prompt basically gets the values by firing a select distinct in the lookup table. You can understand the performance impact of distinct query when the number of rows is more. It may not effect much in this example but when your lookup table has thousands of records then it's huge. Hope this helps