Choosing the Right Data Model

Customers often run into the question of which type of data model they should use, ElastiCube (see Introducing ElastiCubes), Live (see Introducing Live Models), or B2D (see Build to Destination). This topic covers some of the key aspects to consider when choosing between these options.

Note:

The common use cases listed below are generic guidelines. Sisense recommends that you discuss your needs and requirements with Sisense's technical team to help ensure your business expectations are met.

Common Use Cases

This section covers common use cases and some of the most relevant factors to consider when choosing between the Live, ElastiCube, B2D, and "hybrid" types of data models.

Live

Good use cases for Live include: trading and stock market analysis, banking, ecommerce and high-volume sales analytics, and call center and customer service analytics.

Use Live when:

  • Data models have more than 1 billion records. (ElastiCube data models are limited to 1 billion records.)

  • Frequent data updates are required and can be supported by the pace of the customer ETL schedule.

  • Data is sourced from a cloud data warehouse (CDW), updated frequently and/or in real time, and the query cost is low, (examples: Redshift, BigQuery).

  • Data is already structured neatly and extensive transformation is not required, (there are no custom tables in Live).

  • Data source must be adjusted according to the user or group that is accessing the analytics. (Use the Dynamic ElastiCubes plugin to support this.)

  • There is a preference to work with the dialect / syntax of the underlying data source.

  • There are lots of tenant specific models. With Live you can leverage dynamic live connections, (see Managing Live Dynamic Connections) and do not need to worry about build scheduling.

  • System resources are constrained, (CPU, RAM).

ElastiCube

Good candidates for ElastiCube include analytics for the entertainment, healthcare, education, and marketing industries.

Use ElastiCube when:

  • There isn't a CDW and there are multiple data sources, (that is, a “data mashup” is required).

  • Data transformations and cleansing are required, but there isn’t already a data preparation solution in your data stack.

  • There is a CDW, but there are significant fees to the CDW vendor based on the volume of queries.

  • The dataset is growing, but it is currently less than 1 billion records per data model. When the amount of data grows to exceed 1 billion records for any single data model, then move to B2D or the Live model.

  • Data refresh cadence is infrequent, or tied to an ETL pipeline.

  • The table build option Append, or the table build option Accumulate By (incremental), is required. (See Defining ElastiCube Build Settings.)

Build to Destination

Benchmarking is a good use case for B2D.

Move to B2D when:

  • Upsert is required. For more information, see Build to Destination Benefits and Considerations.

  • Existing ElastiCube holds a very large amount of data.

  • Data preparation is required (by a Sisense end user) on top of an existing CDW.

  • Customer has access to Amazon S3 storage and a CDW (like Snowflake). Note that B2D works like a Live model, with all queries going through the CDW - which increases the costs associated with the CDW and Amazon S3 storage.

Hybrid

Use "Hybrid" (combination of ElastiCube and Live) when:

  • Dashboards must include both historical and current data. In many cases the ElastiCube data model is used to provide historical data while the Live model is used to provide the current data.

  • The ElastiCube is for a data set with varied slices (at a high level of granularity), while the Live data model supports visuals for deep-dive transactional level or historical (archived) insights.

  • A single dashboard includes both widgets that require real-time data, and other widgets that do not.

Important Considerations

In addition to the use cases found above, there are other relevant factors to consider. The following side-by-side summary table lists and describes many of the most important characteristics of the ElastiCube, Live, and B2D data model types.

  ElastiCube Live B2D Considerations
Data Refresh Rate

Determines the age of the data in your dashboards

Determined by the frequency of the ElastiCube builds

Determined by the customer update frequency

Determined by the frequency of the B2D builds on the cloud data warehouse (CDW)

How frequently do I need to pull new data from the source database?

Data Transformation Options

Includes custom columns, custom tables, importing from multiple data sources, and custom code

Extensive transformations are supported on the ElastiCube, (custom columns, custom tables, importing from multiple data sources, custom code)

Limited - transformations must be run on the cloud database

Extensive transformations are supported on the ElastiCube, (custom columns, custom tables, importing from multiple data sources, custom code)

Custom transformations can also be performed using Sisense SQL or the destination SQL.

Does my data require enriching or pre-conditioning?

Can my data be pre-aggregated?

Note: ElastiCubes should NOT be treated as an alternative to an ETL tool or a CDW, they are meant to be used in parallel. Operations should be completed in source, and imported in an ElastiCube when the data is ready for reporting.

Load on the Cloud Data Warehouse

Any application that increases the load on the CDW should be examined closely

Not applicable during queries.

During build only if the CDW is the data source.

Constant queries to the CDW on an ongoing basis

Constant queries to the CDW on an ongoing basis.

Build should run on a small resource.

Does the analytical system stress my CDW?

Can the query load be avoided by using a database replica?

Database Availability

The level of availability is a critical factor for your analytical system

Data sources must be connected to the ElastiCube during build

Crucial – Live data model must be accessible at all times

Crucial – Data model must be accessible at all times

How frequently are the data sources offline?

How critical is my analytical system? Is being offline (or showing out-of-date information) acceptable?

Additional Vendor Costs

Various database vendors use different charging models

Data is queried once from the data sources per ElastiCube build. No additional vendor costs for users to access the ElastiCube after the build.

Database vendors may charge for data queries to the CDW

Database vendors may charge for data queries to the CDW.

Build can run on small resource to reduce costs.

How many users are using my dashboards?

What is my build frequency?

Which data model will result in lower costs? What is the tipping point?

Is it worthwhile to pay more for better performance?

Database Size Limitations

Sisense ElastiCubes and CDWs have different limitations

Support for up to 1 billion records in a single ElastiCube

Limitations are according to the CDW vendor

Limitations are according to the CDW vendor

What is the amount of data I need in my data model?

How much historical data do I need to store?

Can I reduce the amount of data? (For example, by trimming historical data, reducing the number of columns, and so on.)

Query Performance

Depends on the underlying work required to fetch and process data

Query performance depends on the hardware and the complexity of the data.

Good performance is provided by multiple layers of in-memory caching.

Query performance depends on the CDW size

Query performance depends on the CDW size

How sensitive is the client to delays in receiving their query results?

How much latency is acceptable?

Available Connector Types

Wide range of connectors, both native and generic JDBC (see CData)

N/A

Wide range of connectors, both native and generic JDBC (see CData)

Sisense supports hundreds of data connectors, (see Data Connectors).

Caching Optimization

The caching options differ for ElastiCubes and CDWs

Sisense supports results based caching and reusing subqueries caching (Recycler)

Results based caching in Sisense

Results based caching in Sisense.

Optional use of Sisense Acceleration.

Do I want to leverage Sisense's query caching?

How long do I want the caching to store data?

Dashboard Design Limitations

Supports all widget types

Supports all widget types

Supports all widget types

All widget types are supported, but not all formulas and functions, (such as mode, standard deviation, R formulas)