In this blog, I'll dive into Xpress Insight’s mirror database in relation to Tableau, providing insights into how best to configure it. This topic was first introduced in my previous blog, Visualizing your data in Xpress Insight.
Within an Insight application the input and result data can vary from kilobytes up to a gigabyte. To reduce the data footprint, it is stored in a highly compressed format within the Insight repository. Although this is good from a data footprint point of view, it is a format that Tableau is not able to easily consume. To resolve this, the mirror database is used with the sole purpose of exposing only key entities of an application in an uncompressed format so they can be consumed by Tableau.
The entities to mirror are defined through the application companion file for the application. At its simplest, the user must define the prefix to use for each of the tables and each of the mirror tables to appear in the mirror database. For example, the FlowShop example provided with Xpress Insight uses such a configuration:
When deciding on which entities to add to a mirror table, you need to keep the following rules in mind, a mirror table can only have either:
- One or more array entities with the same index set.
- Looking to the FlowShop example above, startj and DUR both use the same index set of MACH and JOBS.
- A set.
- Looking to the FlowShop example above, a table could be added to hold the STATES set as follows:
- One or more scalars by repeating the entity row for each scalar specifying the scalar in the name attribute for the entity.
- Looking to the FlowShop example above, a table could be added to hold the AvgMakespan and Tardiness scalars as follows:
- All parameters by entering parameters in the name attribute for the entity.
- For example:
When an application is uploaded to Xpress Insight, the mirror database tables defined in the application companion file are created but will contain no data. Loading a scenario will update the tables that have entities associated to them, which are input data. Executing a scenario will update the tables that have entities associated to them, these entities are result data. When a scenario is executed, the scenario data (for that scenario) is deleted from the mirror database during the completing phase of the scenario execution. This is the final stage of the scenario execution and is also when the data from the execution is stored within the Xpress Insight repository.
Using the default mirror strategy known as partial mirroring, a mirror table will only be populated if the total number of rows to be inserted into that table for the scenario is less than 50,000. Otherwise, the table is populated when a user opens a Tableau view within Insight that uses this table.
This approach balances Tableau view responsiveness, database volumes, and execution time; it should be suitable for most apps. Although this is not relevant for the simple FlowShop example, which uses and generates a small amount of data, it becomes far more critical when an application is using/generating large amounts of data.
With large amounts of data, there may be Tableau views which are infrequently viewed by the users that use large amounts of data. Where this is the case, it makes sense to update the mirror database only when the data is required, rather than slow down the overall scenario execution time (this can be overridden and will be discussed later in this post). To provide this capability internally, Xpress Insight holds a mapping of Tableau views to mirror tables, which is refreshed on user log in or when an application update is initiated. Therefore, it is important that Tableau workbooks are published via Xpress Insight rather than directly to Tableau server using Tableau Desktop when partial mirroring is used.
Although multiple users can access the Tableau views, the scenario data is only written once to the database. Each time a user accesses a Tableau view within the Xpress Insight client, the Xpress Insight server ensures that the data is up to date. It will also write an entry into the insight_security table to indicate that the user is authorized to access this data. If you're interested in more detail on this subject, a future post will discuss how to best to secure the data used by a Tableau view within Xpress Insight, so keep an eye on this community blog.
Overriding the default settings
Typically, the database mirror is defined in the application companion file as follows:
As the sync-after-execution attribute has not been included in the definition, it is defaulted to "auto" and Xpress Insight will use the default mirror strategy known as partial mirroring. If you wish to populate all of the mirror database tables regardless of the data size, then the sync-after-execution attribute should be included and set to a value of "true." Typically, this is used where execution time is not important but availability of the data for the Tableau views is.
It could also be used when developing new or modifying existing Tableau views where you will want all of the data available at the time of development. In this case, it is recommended that the application companion file is not altered, but partial mirroring is disabled by unchecking the Partial mirroring enabled checkbox located on the Tableau page of the Xpress Insight Web Admin client.
The sync-after-execution attribute can also be applied at the mirror-table level. For example:
<mirror-table name="interval" sync-after-execution="true">
This is particularly useful in cases where you know your tables have more than 50,000 rows per scenario that are commonly used in the Tableau views. In this situation, you may want to mirror the data at execution time to speed up the load time of those views.
Finally, it is possible to manually control when a mirror table is populated by specifying the Tableau workbooks or workbook views that use it. For example:
<exclude workbook="Forecast" view="Summary"/>
It is recommended not to use this, as it will be deprecated in the future. Instead, partial mirroring should be used.
- For most applications, the out of the box configuration can be used.
- Where large amounts of data will be written to the mirror database tables, you should start to consider what is more important – scenario execution times or the speed in which the Tableau views display prioritized by how often these Tableau views are viewed.
- With this information you should be able to make an informed decision whether you need to mirror any of the larger tables on execution.
Check out the FICO Xpress Optimization Community for more resources and information.