Choosing the Right Data Model
Customers often run into the question of which type of data model they should use, ElastiCube (see Introducing ElastiCubes), Live (see Introducing Live Models), or B2D (see Build to Destination). This topic covers some of the key aspects to consider when choosing between these options.
Note:
The common use cases listed below are generic guidelines. Sisense recommends that you discuss your needs and requirements with Sisense's technical team to help ensure your business expectations are met.
Common Use Cases
This section covers common use cases and some of the most relevant factors to consider when choosing between the Live, ElastiCube, B2D, and "hybrid" types of data models.
Live
Good use cases for Live include: trading and stock market analysis, banking, ecommerce and high-volume sales analytics, and call center and customer service analytics.
Use Live when:
-
Data models have more than 1 billion records. (ElastiCube data models are limited to 1 billion records.)
-
Frequent data updates are required and can be supported by the pace of the customer ETL schedule.
-
Data is sourced from a cloud data warehouse (CDW), updated frequently and/or in real time, and the query cost is low, (examples: Redshift, BigQuery).
-
Data is already structured neatly and extensive transformation is not required, (there are no custom tables in Live).
-
Data source must be adjusted according to the user or group that is accessing the analytics. (Use the Dynamic ElastiCubes plugin to support this.)
-
There is a preference to work with the dialect / syntax of the underlying data source.
-
There are lots of tenant specific models. With Live you can leverage dynamic live connections, (see Managing Live Dynamic Connections) and do not need to worry about build scheduling.
-
System resources are constrained, (CPU, RAM).
ElastiCube
Good candidates for ElastiCube include analytics for the entertainment, healthcare, education, and marketing industries.
Use ElastiCube when:
-
There isn't a CDW and there are multiple data sources, (that is, a “data mashup” is required).
-
Data transformations and cleansing are required, but there isn’t already a data preparation solution in your data stack.
-
There is a CDW, but there are significant fees to the CDW vendor based on the volume of queries.
-
The dataset is growing, but it is currently less than 1 billion records per data model. When the amount of data grows to exceed 1 billion records for any single data model, then move to B2D or the Live model.
-
Data refresh cadence is infrequent, or tied to an ETL pipeline.
-
The table build option Append, or the table build option Accumulate By (incremental), is required. (See Defining ElastiCube Build Settings.)
Build to Destination
Benchmarking is a good use case for B2D.
Move to B2D when:
-
Upsert is required. For more information, see Build to Destination Benefits and Considerations.
-
Existing ElastiCube holds a very large amount of data.
-
Data preparation is required (by a Sisense end user) on top of an existing CDW.
-
Customer has access to Amazon S3 storage and a CDW (like Snowflake). Note that B2D works like a Live model, with all queries going through the CDW - which increases the costs associated with the CDW and Amazon S3 storage.
Hybrid
Use "Hybrid" (combination of ElastiCube and Live) when:
-
Dashboards must include both historical and current data. In many cases the ElastiCube data model is used to provide historical data while the Live model is used to provide the current data.
-
The ElastiCube is for a data set with varied slices (at a high level of granularity), while the Live data model supports visuals for deep-dive transactional level or historical (archived) insights.
-
A single dashboard includes both widgets that require real-time data, and other widgets that do not.
Important Considerations
In addition to the use cases found above, there are other relevant factors to consider. The following side-by-side summary table lists and describes many of the most important characteristics of the ElastiCube, Live, and B2D data model types.
ElastiCube | Live | B2D | Considerations | |
---|---|---|---|---|
Data Refresh RateDetermines the age of the data in your dashboards |
Determined by the frequency of the ElastiCube builds |
Determined by the customer update frequency |
Determined by the frequency of the B2D builds on the cloud data warehouse (CDW) |
How frequently do I need to pull new data from the source database? |
Data Transformation OptionsIncludes custom columns, custom tables, importing from multiple data sources, and custom code |
Extensive transformations are supported on the ElastiCube, (custom columns, custom tables, importing from multiple data sources, custom code) |
Limited - transformations must be run on the cloud database |
Extensive transformations are supported on the ElastiCube, (custom columns, custom tables, importing from multiple data sources, custom code) Custom transformations can also be performed using Sisense SQL or the destination SQL. |
Does my data require enriching or pre-conditioning? Can my data be pre-aggregated? Note: ElastiCubes should NOT be treated as an alternative to an ETL tool or a CDW, they are meant to be used in parallel. Operations should be completed in source, and imported in an ElastiCube when the data is ready for reporting. |
Load on the Cloud Data WarehouseAny application that increases the load on the CDW should be examined closely |
Not applicable during queries. During build only if the CDW is the data source. |
Constant queries to the CDW on an ongoing basis |
Constant queries to the CDW on an ongoing basis. Build should run on a small resource. |
Does the analytical system stress my CDW? Can the query load be avoided by using a database replica? |
Database AvailabilityThe level of availability is a critical factor for your analytical system |
Data sources must be connected to the ElastiCube during build |
Crucial – Live data model must be accessible at all times |
Crucial – Data model must be accessible at all times |
How frequently are the data sources offline? How critical is my analytical system? Is being offline (or showing out-of-date information) acceptable? |
Additional Vendor CostsVarious database vendors use different charging models |
Data is queried once from the data sources per ElastiCube build. No additional vendor costs for users to access the ElastiCube after the build. |
Database vendors may charge for data queries to the CDW |
Database vendors may charge for data queries to the CDW. Build can run on small resource to reduce costs. |
How many users are using my dashboards? What is my build frequency? Which data model will result in lower costs? What is the tipping point? Is it worthwhile to pay more for better performance? |
Database Size LimitationsSisense ElastiCubes and CDWs have different limitations |
Support for up to 1 billion records in a single ElastiCube |
Limitations are according to the CDW vendor |
Limitations are according to the CDW vendor |
What is the amount of data I need in my data model? How much historical data do I need to store? Can I reduce the amount of data? (For example, by trimming historical data, reducing the number of columns, and so on.) |
Query PerformanceDepends on the underlying work required to fetch and process data |
Query performance depends on the hardware and the complexity of the data. Good performance is provided by multiple layers of in-memory caching. |
Query performance depends on the CDW size |
Query performance depends on the CDW size |
How sensitive is the client to delays in receiving their query results? How much latency is acceptable? |
Available Connector Types |
Wide range of connectors, both native and generic JDBC (see CData) |
N/A |
Wide range of connectors, both native and generic JDBC (see CData) |
Sisense supports hundreds of data connectors, (see Data Connectors). |
Caching OptimizationThe caching options differ for ElastiCubes and CDWs |
Sisense supports results based caching and reusing subqueries caching (Recycler) |
Results based caching in Sisense |
Results based caching in Sisense. Optional use of Sisense Acceleration. |
Do I want to leverage Sisense's query caching? How long do I want the caching to store data? |
Dashboard Design Limitations |
Supports all widget types |
Supports all widget types |
Supports all widget types |
All widget types are supported, but not all formulas and functions, (such as mode, standard deviation, R formulas) |