Choosing a Data Strategy for Embedded Self-Service
Data management is the process of ingesting, storing, organizing, and maintaining the data created and collected by an organization. The data model used to manage this data:
- Connects effectively and securely to multiple data sources
- Handles advanced permission management, including data security rules and sharing management
- Enhances the original data in several ways (custom tables, custom columns, custom import queries, custom code, and calendar manipulation such as fiscal years) to prepare the data for analytics
The Sisense platform enables customers to create a semantic layer that is integrated into their automatic and manual data pipelines. OEM customers create and manage the data model as part of their data architecture, using dimensional modeling knowledge. The data model is fully integrated with the customer's data sources to support advanced analytics at scale and across multiple use cases. The model directly affects the dashboard designers' analytical capabilities and overall dashboard performance, so OEM customers invest considerable effort in making it efficient. Many of the dashboard designers' business requirements can be met more effectively through data model optimizations.
The list below first describes three ways to organize the data model across customers, and then the aspects that must be considered to determine the best data strategy:
- One model for all customers, where the data is separated logically
  - The data is mixed in the source and in the data model.
  - Data security rules are applied to restrict the data shown to each customer.
  - The OEM builds the data model and dashboards. The data model is usually large and complex.
  - The customers are only viewers.
  - It can be an ElastiCube or Live model, depending on the data volume.
  - Regulations may prohibit use of this model.
- One model per customer, where the data is separated physically by the OEM
  - The OEM separates the data by customer in the source via the schema.
  - Each customer has their own data model with the same schema.
  - The OEM builds the data model and dashboards. Dashboard maintenance is easy for the OEM as changes are made once and applied to all customer data models.
  - The OEM's customers are dashboard designers and viewers.
  - It can be an ElastiCube or Live model, depending on the data volume.
- One model per customer, where the data is stored by the customer
  - The data source belongs to the OEM's customer.
  - Each customer has their own data model with the same schema.
  - The OEM builds the data model and dashboards.
  - The OEM's customers are dashboard designers and viewers.
  - Live models are used.
  - The strictest regulations may require this model.
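The difference between logical and physical separation can be illustrated with a minimal Python sketch. It is not Sisense code: the `Row` type, the `customer_id` column, and both helper functions are hypothetical names chosen for illustration. Logical separation keeps all rows in one shared model and filters them per viewer (the role played by data security rules), while physical separation splits the rows into one dataset per customer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    customer_id: str  # hypothetical tenant column
    metric: float


def apply_security_rule(rows, viewer_customer_id):
    """Logical separation: one shared model, filtered per viewer."""
    return [r for r in rows if r.customer_id == viewer_customer_id]


def split_by_customer(rows):
    """Physical separation: one dataset (model) per customer."""
    per_customer = {}
    for r in rows:
        per_customer.setdefault(r.customer_id, []).append(r)
    return per_customer


shared_model = [Row("acme", 10.0), Row("globex", 7.5), Row("acme", 3.0)]

# A viewer from "acme" sees only acme rows, although all rows share one model.
acme_view = apply_security_rule(shared_model, "acme")

# With physical separation, each customer's rows live in their own dataset.
per_customer_models = split_by_customer(shared_model)
```

In the shared model, a missing or incorrect filter exposes another customer's rows, which is one reason regulations may rule out that option.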
- Refresh rate
One of the most fundamental aspects of determining your data model is your data's refresh rate. The refresh rate determines how old the data in your dashboards can be.
- For Live models (see Creating Live Models and Adding Live Connections), the data displayed on your dashboards is near real-time, as every query is passed directly to the customer's database. A good example of a Live model use case (due to refresh rate requirements) is a dashboard showing live stock prices.
- For ElastiCubes (see Building ElastiCubes), the data displayed on your dashboard is current as of the last successful build. Every widget query is handled internally in Sisense. A good example of an ElastiCube use case (due to refresh rate requirements) is a dashboard showing historical stock prices, where a daily ETL process provides results that are good enough.
To make a choice based on this factor, answer the following questions:
- How frequently do I need to pull new data from the database?
- Do all my widgets require the same data refresh frequency?
- How long does an entire ETL process take?
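The refresh-rate questions above can be turned into a rough calculation. The sketch below is illustrative only and makes two assumptions that are not stated in the text: builds start at a fixed interval, and each build captures the source data as of the moment it starts. Under those assumptions, the oldest data a viewer can see is one build interval plus one ETL duration old. The function name and parameters are hypothetical.

```python
from datetime import timedelta


def scheduled_build_meets_freshness(build_interval, etl_duration, max_acceptable_age):
    """Check whether a scheduled build keeps data fresh enough.

    Assumption: data is captured when a build starts. Just before the next
    build finishes, viewers still see data from the previous build's start,
    so the worst-case age is build_interval + etl_duration.
    """
    worst_case_age = build_interval + etl_duration
    return worst_case_age <= max_acceptable_age


# Historical stock prices: a daily build taking 2 hours, data may be 2 days old.
historical_ok = scheduled_build_meets_freshness(
    timedelta(hours=24), timedelta(hours=2), timedelta(days=2)
)

# Live stock prices: data must be at most 1 minute old, so a daily build fails.
live_ok = scheduled_build_meets_freshness(
    timedelta(hours=24), timedelta(hours=2), timedelta(minutes=1)
)
```

When the check fails even for the shortest build interval the ETL duration allows, that points toward a Live model for the affected widgets.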
- Operational database load
Your operational databases do more than just serve your analytical system. Any application that places load on the operational databases must be closely examined.
- For Live models, Sisense constantly queries information from the operational databases, and feeds the results into the dashboard widgets. This happens every time a user loads a dashboard or a refresh operation occurs.
- For ElastiCubes, Sisense places a heavy load on the operational databases during an ETL process, as it reads all tables.
To make a choice based on this factor, answer the following questions:
- Does the analytical system stress my operational database(s)?
- Can the query load be avoided by using a "database replica"?
- Operational database availability
Operational database availability is critical for collecting information for the analytical system.
- For Live models, all queries are redirected to your data sources. If the data source is not available, widgets will generate errors and not present any data.
- For ElastiCubes, data source availability is critical during the ETL process. If the data source is not available, your widgets still display data, but that data is not necessarily up to date.
To make a choice based on this factor, answer the following questions:
- How frequently are analytical data sources offline?
- How critical is my analytical system? Is being offline (showing out-of-date information) acceptable?
- Database size
The amount of user data Sisense can store in each ElastiCube is limited.
- For Live models, there is no limit, as only the data's schema is imported into Sisense, not the data itself.
- For ElastiCubes, the rule-of-thumb limit is 300M rows per ElastiCube (the practical limit depends on the number of columns and the data types).
To make a choice based on this factor, answer the following questions:
- How much data is needed in my data model?
- How much history must be stored?
- Can the amount of data be reduced (e.g., by trimming historical data, reducing the number of columns, etc.)?
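A back-of-the-envelope estimate can help answer these sizing questions. The sketch below compares an estimated row count against the 300M-row rule of thumb mentioned above. The helper names and the daily row volume are hypothetical, and the estimate ignores column count and data types, which also affect the practical limit.

```python
RULE_OF_THUMB_MAX_ROWS = 300_000_000  # rule of thumb from the text, not a hard limit


def estimated_rows(rows_per_day, history_days):
    """Rough row count for a table that grows by rows_per_day."""
    return rows_per_day * history_days


def max_history_days(rows_per_day, max_rows=RULE_OF_THUMB_MAX_ROWS):
    """How many days of history fit under the row budget."""
    return max_rows // rows_per_day


# Hypothetical workload: 500,000 new rows per day, 3 years of history wanted.
rows_needed = estimated_rows(500_000, 3 * 365)
fits_in_elasticube = rows_needed <= RULE_OF_THUMB_MAX_ROWS
days_that_fit = max_history_days(500_000)
```

In this hypothetical case the full history exceeds the rule of thumb, so the options are to trim history to roughly 600 days, reduce the data in other ways, or consider a Live model.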
- Latency
Query performance also depends on the performance of the network used to fetch the data. Although every widget generates a query, the underlying data model determines the work necessary to execute it.
- For Live models, queries may have to work across a network that introduces latency issues.
- For ElastiCubes, every query is handled inside Sisense and is therefore not subject to network latency issues.
To make a choice based on this factor, answer the following questions:
- How sensitive is the client to query result delays?
- When showing real-time data, is this extra latency acceptable?
- Connector availability
Sisense supports hundreds of data connectors. However, not all connectors are available for Live data models due to the performance of some connectors. A "slow" connector, or one that requires a significant amount of processing, may lead to a bad user experience when using Live models (i.e., widgets take a long time to load).
- For Live models, Sisense limits the available data connectors to a few high-performing connectors, which cover most data warehouses and high-performance databases.
- For ElastiCubes, Sisense allows the user to utilize all native or partner data connectors.
To make a choice based on this factor, answer the following questions:
- Does my data source's connector support both data model types?
- Should I consider moving my data to a different data source to allow live connectivity?
- Caching optimization
Sisense optimizes performance by caching query results. In other words, query results are stored in memory so that they can be retrieved quickly if the same query is executed again. This ability provides a great benefit and improves the end-user experience.
To make a choice based on this factor, answer the following questions:
- Do I want to leverage Sisense's query caching?
- How long do I want to cache data?
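To make the trade-off behind these questions concrete, here is a minimal sketch of a time-limited query result cache in Python. It illustrates the general technique only and does not represent how Sisense implements or configures caching. The class name, the `ttl_seconds` parameter, and the injectable `clock` are all hypothetical.

```python
import time


class QueryResultCache:
    """Keeps query results in memory for ttl_seconds (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # query text -> (stored_at, result)

    def get_or_run(self, query, run_query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: the data source is not queried
        result = run_query(query)  # cache miss or expired entry
        self._entries[query] = (now, result)
        return result


# Demonstration with a fake clock and a fake data source.
executed = []


def fake_run(query):
    executed.append(query)
    return f"result of {query}"


fake_now = [0.0]
cache = QueryResultCache(ttl_seconds=60, clock=lambda: fake_now[0])

first = cache.get_or_run("SELECT 1", fake_run)   # executed against the source
second = cache.get_or_run("SELECT 1", fake_run)  # served from the cache
fake_now[0] = 61.0
third = cache.get_or_run("SELECT 1", fake_run)   # entry expired, executed again
```

A longer cache lifetime means fewer queries reach the data source but users may see older results, which is why the caching duration should be weighed against the refresh rate requirements.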