JDBC Federator
JDBC Federator is a data connector that can combine multiple single-tenant databases or schemas into a single multi-tenant ElastiCube. This is useful, for example, for performing cross-analyses between different tenants. If you have multiple clients with their own database schemas, creating an ElastiCube for each will be difficult to manage, and writing a Union statement is not scalable. Using JDBC Federator solves these problems.
JDBC Federator is built on the JVM Connector framework and is conceptually similar to the Deploying a Custom Connector. For JDBC Federator to work, it requires a JDBC driver for the databases you want to connect to. You can download JDBC drivers from here.
This document contains the following main sections:
Setting Up JDBC Federator
Setting up the JDBC Federator requires several steps:
Editing the description.json File
To edit the description.json file:
-
In Sisense, click Admin, search for and select System Management, and click File Management.
Note:
The File Management feature must be enabled in the Feature Management section in order for it to appear here.
Note:
On your local Linux machine, this maps to
/opt/sisense/storage
.Important:
Do not add or remove any files or folders from the root directory.
-
Navigate to connectors > federator > description.json.
-
Append the following to the payload section:
"additionalClassPaths": ["/opt/sisense/storage/connectors/federator/jdbcdrivers/*"], "jvmParameters" : ["-Xmx8G"]
Note:
Set the RAM allocation (-Xmx8G ) in GB appropriately. For example, 4G, 8G, or 12G.
-
The description.json file is now as follows:
{
"fileVersion": "config://sisense/connector/federator#1",
"configurationType": "json",
"payload": {
"provider": "federator",
"displayName": "Federator",
"version": "1.1.0",
"category": "Database",
"jdbcClass": "com.sisense.connectors.jdbc.JayDBCQueryFederator",
"isAccumulativeSupported": true,
"isWebECMSupported": true,
"startAsUser": false,
"additionalClassPaths": ["/opt/sisense/storage/connectors/federator/jdbcdrivers/*"],
"jvmParameters" : ["-Xmx8G"]
}
}
Editing the usedConnectors.json File
To edit the usedConnectors.json file:
-
In Sisense , click Admin, search for and select System Management, and click File Management. Note that the File Management feature must be enabled in the Feature Management section in order for it to appear here.
-
Navigate to connectors > usedConnectors.json.
-
Append "federator" to both of the arrays:
"availableConnectors": [..., "federator"]
"displayConnectors": [..., "federator"]
The usedConnectors.json file should now look like this:
{
"customConnectors":true,
"availableConnectors":[
"athena",
"EC2EC",
"CSV",
"GoogleAnalytics",
"GenericJDBC",
"Google Spreadsheets",
"sql",
"Excel",
"GoogleAdwords",
"SalesforceJDBC"
"federator"
],
"displayConnectors":[
"athena",
"EC2EC",
"CSV",
"GoogleAnalytics",
"GenericJDBC",
"Google Spreadsheets",
"sql",
"Excel",
"GoogleAdwords",
"SalesforceJDBC"
"federator"
],
"externalConnectorsPath":"/opt/sisense/storage/connectors"
}
Setting Up the jdbcdrivers and tenants Folders
To set up the jdbcdrivers and tenants folders:
-
Download the following files:
-
Save the downloaded files to System Management > File Management > connectors > federator.
-
In that location, create two new folders:
- tenants
- jdbcdrivers
-
Add the relevant JDBC driver JAR file to the jdbcdrivers folder.
Restarting the Connector Pod
Restart the Kubernetes Connector Pod. Note that a restart is required when:
- You updated the descriptor.json file
- You add or update the jdbc driver JAR file
You do not have to restart the Connector Pod when you update the tenantdetails.csv and jdbc_query.properties files (next steps).
Configuring the tenantdetails.csv File
The tenantdetails.csv file lists the details for each tenant and is saved here:
/opt/sisense/storage/connectors/federator/tenantdetails.csv.
The CSV file must contain at least one tenant column, which typically maps to your database schemas. The tenant column must contain unique values.
If your tenants are located across multiple database servers with different usernames and passwords, you can add additional columns for each.
You can add as many columns as you like, based on the variables across tenants.
The following is an example of a tenantdetails.csv file:
Configuring the jdbc_queries.properties File
-
Click Admin, search for and select System Management, and click File Management > connectors > federator.
Note:
The File Management feature must be enabled in the Feature Management section in order for it to appear here.
-
Open the jdbc_query.properties file for editing.
-
In the table.names property, add a comma-delimited list of tables.
-
For each value in the table.names list, add a corresponding entry for its query. In the example, below, there's a table called
table.names=Customer,Invoice
and so there are corresponding query entries forCustomer.query
andInvoice.query
.Note:
- The table names are logical names and do not need to match the database names
- do not leave any spaces in the tables.names list
- Any value from the tenantdetails.csv file is reverenced using the <column name> format. For
example:
<tenant>
-
Make sure that the details for each of the properties are correct.
# tenant csv file location
tenant.config.file=/opt/sisense/storage/connectors/federator/tenantdetails.csv
# folder where the incremental state of the tables is stored
#tenant.incr.folder=/opt/sisense/storage/connectors/federator/tenants
# table names as a comma delimited list, do NOT leave spaces
table.names=Customer,Invoice
# sql queries for each table
Customer.query=SELECT '<tenant>' as "Tenant", * from <tenant>.Customer
Invoice.query=SELECT '<tenant>' as "Tenant", * from <tenant>.Invoice
# column names for incremental loads (optional)
#xxxx.increment.col=CreatedDate
#yyyy.increment.col=ID
# misc settings
incr.date.format=yyyy-MM-dd HH:mm:ss.SSS
fault.tolerance=Y
connection.test.query=SELECT 1
#For DB2
#connection.test.query=SELECT * from SYSIBM.SYSDUMMY1
#For PostGres
#record.fetch.size=100000
Setting Up Incremental Load
When you run a build, the ElastiCube reloads the entire data set. If you have a lot of data, it could take a long time to load. An incremental load, however, only loads new data. This configuration instructs JDBC Federator to look at specific columns (such as CustomerID or InvoiceDate) to determine what the latest data is, and to only load that data.
To set up incremental load:
-
Click Admin, search for and select System Management, and click File Management > connectors > federator.
Note:
The File Management feature must be enabled in the Feature Management section in order for it to appear here.
-
Open the jdbc_query.properties file for editing.
-
Uncomment the
tenant.incr.folder
property. -
Specify the incremental columns for each table you want to load incrementally. For example:
customer.increment.col=CustomerID
invoice.increment.col=InvoiceDate
-
In Sisense, in the ElastiCube , set the table build behavior to Append.
Setting the Federator Connection URL
Federator appears in the list as an available connector.
Set the connection URL for Federator as follows:
JDBC URL|ConfigFile=<path>/jdbc_query.properties
Example:
jdbc:redshift://redshift.aws.com/sales|ConfigFile=/opt/sisense/storage/connectors/federator/jdbc_query.properties
Note:
The connection URL can contain dynamic elements from the tenantdetails.csv file.
For example:
jdbc:redshift://redshift.aws.com/<tenant>|ConfigFile=/opt/sisense/storage/connectors/federator/jdbc_query.properties
Troubleshooting
Connection Error - Could Not Connect to DB. java.lang.ClassNotFoundException:
Check the following
- The JDBC driver JAR file for the database is uploaded to the jdbcdrivers folder
- additionalClassPaths entry is added to the descriptor.json file
- The connector pod was restarted after the settings were made
Can connect but do not see any tables.
Check the following:
- The ConfigFile property is appended to the JDBC URL
- The tenantdetails.csv file and the jdbc_query.properties file exist
- The SQL queries in jdbc_query.properties are valid.
- Detailed error message can be found on the logs for the connector pod
Viewing Logs
Logs are located in:
/var/log/sisense/sisense/<connector pod logs>
You can also check the logs within the connector pod:
kubectl -n sisense logs <connector pod name> --since=5m
FAQs
-
Is there a limitation on the number of data sources or schemas I can connect to?
There is no limit, the only limit is the total number of rows across all the tenants.
-
Where does the unioned data reside before creating the table inside the ElastiCube? Is it cached on the server?
There is no UNION in the Federator. The Federator runs individual queries in sequence and passes the data to the ElastiCube , row by row.
There is no caching or buffering in memory or on disc.
-
All my tenants are on the same database. Should I UNION the tables in the import query or use the Federator? What would be the advantage and disadvantage of using the Federator?
The Federator supports incremental loads at the tenant level. A single UNION query will not support this.
Management and setup is also much easier in the Federator than doing a UNION for each table. For example, if you add 20 schemas and then want to add the 21st schema, for the UNION method you will have to go to the ElastiCube , change every table, and rebuild the entire cube. In the Federator, you only need to add a single line in the tenantdetails.csv file.
-
Can I Federate data across different database vendors (e.g. SQL Server and Postgres)?
No, you cannot federate data across different vendors in a single table. You can have multiple federations in an ElastiCube . For example, Table 1 (multiple SQL Server databases) Table 2 (multiple Postgres databases).
For this you need to create multiple sets of property and CSV files. -
If my ElastiCubes get too large, can I split the Federations into multiple ElastiCubes?
Yes, you can. Create multiple sets of property and CSV files, and then use the Dynamic ElastiCube plugin from the Sisense Market Place to have your dashboards switch from one ElastiCube to another, based on the tenant.
-
Is the Federator data size-sensitive?
The Federator is only sensitive to data size as far as the data size limits of the ElastiCube are concerned.
-
What data sources can I use in the Federator?
The following databases (via their JDBC Drivers) are supported:
- "com.microsoft.sqlserver.jdbc.SQLServerDriver"
- "oracle.jdbc.driver.OracleDriver"
- "com.ibm.as400.access.AS400JDBCDriver"; "com.ibm.db2.jcc.DB2Driver"
- "com.simba.athena.jdbc.Driver"
- "org.postgresql.Driver"
- "com.mysql.jdbc.Driver"
- "com.amazon.redshift.jdbc42.Driver"
- "net.snowflake.client.jdbc.SnowflakeDriver"