Python data science tools for Qlik
Announcements
Version 8.0 has been released. Get it here or with Docker.
This release adds the capability to use pre-trained scikit-learn, Keras or REST API based models with Qlik. More on this here.
Table of Contents
- Introduction
- Note on the approach
- Docker Image
- Pre-requisites
- Installation
- Usage
- Qonnections 2019 Workshop
Introduction
Qlik’s advanced analytics integration provides a path to making modern data science algorithms accessible to the wider business audience. This project is an attempt to show what’s possible.
This repository provides a server side extension (SSE) for Qlik Sense built using Python. The intention is to provide a set of functions for data science that can be used as expressions in Qlik.
Sample Qlik Sense apps are included and explained so that the techniques shown here can be easily replicated.
The current implementation includes:
- Supervised Machine Learning: Implemented using scikit-learn, the go-to machine learning library for Python. This SSE implements the full machine learning flow from data preparation, model training and evaluation, to making predictions in Qlik.
- Unsupervised Machine Learning: Also implemented using scikit-learn. This provides capabilities for dimensionality reduction and clustering.
- Deep Learning: Implemented using Keras and TensorFlow. This SSE implements the full flow of setting up a neural network, training and evaluating it, and using it to make predictions. Deep Learning models can be used for sequence predictions and complex time series forecasting.
- Use of pre-trained ML models in Qlik: Pre-trained scikit-learn, Keras and REST API based models can be called from this SSE, allowing predictions to be exposed within the broader analysis and business context of a Qlik app. The implementation also allows for what-if analysis using the models.
- Named Entity Recognition: Implemented using spaCy, an excellent Natural Language Processing library that comes with pre-trained neural networks. This SSE allows you to use spaCy’s models for Named Entity Recognition or retrain them with your data for even better results.
- Association rules: Implemented using Efficient-Apriori. Association Rules Analysis is a data mining technique to uncover how items are associated with each other. This technique is best known for Market Basket Analysis, but can be used more generally for finding interesting associations between sets of items that occur together, for example, in a transaction, a paragraph, or a diagnosis.
- Clustering: Implemented using HDBSCAN, a high performance algorithm that is great for exploratory data analysis.
- Time series forecasting: Implemented using Facebook Prophet, a modern library for easily generating good quality forecasts. It now supports multiple regressors as input.
- Seasonality and holiday analysis: Also using Facebook Prophet.
- Linear correlations: Implemented using Pandas.
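To make the association rules item in the list above more concrete, the sketch below computes support and confidence for a single rule by hand in plain Python. It does not use Efficient-Apriori or this SSE; the transactions and the helper names support and confidence are invented purely for illustration.

```python
from typing import FrozenSet, List

# Hypothetical market baskets, invented for this example.
transactions: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """Of the transactions containing `lhs`, the fraction that also contain `rhs`."""
    return support(lhs | rhs) / support(lhs)


# Rule under test: {bread} -> {milk}
rule_support = support(frozenset({"bread", "milk"}))
rule_confidence = confidence(frozenset({"bread"}), frozenset({"milk"}))
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

An association rules library automates this counting across all candidate item sets and keeps only the rules above chosen support and confidence thresholds.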
Further information on these features is available through the Usage section below.
For more information on Qlik Server Side Extensions see qlik-oss.
Disclaimer: This project has been started by me in a personal capacity and is not supported by Qlik.
Demonstration Videos
Forecasting, Clustering & Supervised Machine Learning:
Deep Learning & Additional Regressors with Prophet:
Clustering COVID-19 Literature:
Note on the approach
In this project we have defined functions that expose open source algorithms to Qlik using the gRPC framework. Each function allows the user to define input data and parameters to control the underlying algorithm’s output.
While native Python script evaluation is possible in Qlik as demonstrated in the qlik-oss Python examples, I have disabled this functionality in this project.
I prefer this approach for two key reasons:
- Separation of the Python implementation from usage in Qlik: App authors in Qlik just need to be able to use the functions, and understand the algorithms at a high level. Any complexity such as handling missing values or scaling the data is abstracted to simple parameters passed in the Qlik expression.
- Security: This server side extension can not be used to execute arbitrary code from Qlik. Users are restricted to the algorithms exposed through this SSE. Security can be further enhanced by running the SSE on a separate, sandboxed machine, and securing communication with certificates.
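The first point above can be illustrated with a small sketch of how a single string argument from a Qlik expression might be turned into keyword arguments on the Python side. The key=value syntax, the parse_params helper and the parameter names are assumptions made for this example and are not necessarily the format this SSE uses; refer to the Usage documentation for the actual syntax.

```python
from typing import Dict, Union

Value = Union[int, float, bool, str]


def _coerce(text: str) -> Value:
    """Convert a raw string to bool, int or float where possible, else keep it as str."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def parse_params(arg_string: str) -> Dict[str, Value]:
    """Parse 'key=value, key=value' into a dict (hypothetical syntax)."""
    params: Dict[str, Value] = {}
    for pair in arg_string.split(","):
        if not pair.strip():
            continue  # skip empty segments such as a trailing comma
        key, _, value = pair.partition("=")
        params[key.strip()] = _coerce(value.strip())
    return params


print(parse_params("scaler=standard, n_clusters=3, debug=true"))
```

Nothing in this sketch is evaluated as code: values are only ever cast to a fixed set of basic types, which is in keeping with the security point above.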
Docker Image
A Docker image for qlik-py-tools is available on Docker Hub. If you are familiar with containerisation this is the simplest way to get this SSE running in your environment.
If you want to install this SSE locally on a Windows machine, you can jump to the Pre-requisites section.
To pull the image from Docker’s public registry use the command below:
docker pull nabeeloz/qlik-py-tools
The image uses port 50055 by default. You can add encryption using certificates as explained here.
docker run -p 50055:50055 -it nabeeloz/qlik-py-tools
Containers built with this image only retain data while they are running. This means that to persist trained models or log files you will need to add a volume or bind mount using Docker capabilities for managing data.
# Store predictive models to a Docker volume on the host machine
docker run -p 50055:50055 -it -v pytools-models:/qlik-py-tools/models nabeeloz/qlik-py-tools
# Store log files to a bind mount on the host machine
docker run -p 50055:50055 -it -v ~/Documents/logs:/qlik-py-tools/core/logs nabeeloz/qlik-py-tools
# Run a container in detached mode, storing predictive models on a volume and logs on a bind mount
docker run \
-p 50055:50055 \
-d \
-v pytools-models:/qlik-py-tools/models \
-v ~/Documents/logs:/qlik-py-tools/core/logs \
nabeeloz/qlik-py-tools
# Run a container in detached mode, storing predictive models on a volume and logs on a bind mount, and restart the container on reboot
docker run \
-p 50055:50055 \
-d \
--restart unless-stopped \
-v pytools-models:/qlik-py-tools/models \
-v ~/Documents/logs:/qlik-py-tools/core/logs \
nabeeloz/qlik-py-tools
# Run a container in detached mode, restart on reboot, store models and logs to bind mounts, and use certificates for secure communication
docker run \
-p 50055:50055 \
-d \
--restart unless-stopped \
--name qlik-py-tools \
-v ~/sse_PyTools_generated_certs/sse_PyTools_server_certs:/qlik-py-tools/pem-dir \
-v ~/Documents/models:/qlik-py-tools/models \
-v ~/Documents/logs:/qlik-py-tools/core/logs \
nabeeloz/qlik-py-tools python __main__.py --pem_dir=/qlik-py-tools/pem-dir
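Once a container is running, a quick way to confirm that something is listening on the published port is a plain TCP connection attempt. The sketch below only checks that the port accepts connections; it does not speak gRPC or verify that the listener is this SSE. The is_port_open helper is introduced here for illustration, and the host and port are assumptions based on the defaults mentioned above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# 50055 is the default port mentioned above; adjust if you mapped a different one.
print("SSE port reachable:", is_port_open("localhost", 50055))
```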
Pre-requisites
- Qlik Sense Enterprise or Qlik Sense Desktop
- Python >= 3.4 and <= 3.6.9. The recommended version is 3.6.8.
  - Note: The latest stable version of Python for this SSE is 3.6. The pystan library, which is required for fbprophet, is known to have issues with Python 3.7 on Windows.
- Microsoft Visual C++ Build Tools
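The version range above can be checked from the interpreter you plan to use. This is a small sketch; the is_supported helper and the two constants are introduced here for illustration and are not part of the project.

```python
import sys
from typing import Tuple

# Bounds taken from the pre-requisites list above.
MIN_VERSION = (3, 4, 0)
MAX_VERSION = (3, 6, 9)


def is_supported(version: Tuple[int, int, int]) -> bool:
    """True if `version` lies within the range listed in the pre-requisites."""
    return MIN_VERSION <= version <= MAX_VERSION


current = tuple(sys.version_info[:3])
print("Python", ".".join(map(str, current)), "supported:", is_supported(current))
```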
Installation
This installation requires Internet access. To install this SSE on a machine without Internet access refer to the offline installation guide.
- Get Python from here. Make sure you get the 64 bit version. Remember to select the option to add Python to your PATH environment variable.
- You’ll also need a recent C++ compiler as this is a requirement for the pystan library used by fbprophet. One option is to use Microsoft Visual C++ Build Tools. If you are having trouble finding the correct installer try this direct link. An alternative is to use the mingw-w64 compiler as described in the PyStan documentation.
  - If you’re using the Visual Studio installer, select the Visual C++ Build Tools workload in the installer and make sure you select the C++ compilers in the optional components.
- Download the latest release for this SSE and extract it to a location of your choice. The machine where you are placing this repository should have access to a local or remote Qlik Sense instance.
- Right click Qlik-Py-Init.bat and choose ‘Run as Administrator’. You can open this file in a text editor to review the commands that will be executed. If everything goes smoothly you will see a Python virtual environment being set up, project files being copied, some packages being installed and TCP Port 50055 being opened for inbound communication.
  - Note that the script always ends with an “All done” message and does not check for errors.
  - If you need to change the port you can do so in the file core\__main__.py by opening the file with a text editor, changing the value of the _DEFAULT_PORT variable, and then saving the file. You will also need to update Qlik-Py-Init.bat to use the same port in the netsh command. This command will only work if you run the batch file through an elevated command prompt (i.e. with administrator privileges).
  - Once the execution completes, do a quick scan of the log to confirm that everything installed correctly. The libraries imported are: grpcio, grpcio-tools, numpy, scipy, pandas, cython, joblib, pyyaml, pystan, fbprophet, scikit-learn, hdbscan, spacy, efficient-apriori, tensorflow, keras and their dependencies. Also, check that the core and generated directories have been copied successfully to the newly created qlik-py-env directory.
  - If the initialization fails for any reason, you can simply delete the qlik-py-env directory and re-run Qlik-Py-Init.bat.
- Now whenever you want to start this Python service you can run Qlik-Py-Start.bat.
- Now you need to set up an Analytics Connection in Qlik Sense Enterprise or update the Settings.ini file in Qlik Sense Desktop. If you are using the sample apps make sure you use PyTools as the name for the analytics connection, or alternatively, update all of the expressions to use the new name.
  - For Qlik Sense Desktop you need to update the settings.ini file. There may be two copies of this file; one at C:/Users/<User ID>/Documents/Qlik/Sense/ and another at C:/Users/AppData/Local/Programs/Qlik/Sense/Engine. Add the SSE settings to both files.
    SSEPlugin=PyTools,localhost:50055;
  - For Qlik Sense Enterprise you need to create an Analytics Connection through QMC.
  - The Analytics Connection can point to a different machine and can be secured with certificates.
- Finally restart the Qlik Sense engine service for Qlik Sense Enterprise or close and reopen Qlik Sense Desktop. This step may not be required if you are using Qlik Sense April 2018 and beyond.
If a connection between Python and Qlik is established you should see the capabilities listed in the terminal.
Capabilities may change as this is an ongoing project.
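For Qlik Sense Desktop, the SSEPlugin entry shown in the installation steps follows a name,host:port pattern. The sketch below builds that line from its parts, which can help when the host or port differs from the defaults. The sse_plugin_line helper is hypothetical and only reproduces the single-connection shape shown in this document; it does not read or write Settings.ini.

```python
def sse_plugin_line(name: str, host: str, port: int) -> str:
    """Build a Settings.ini entry in the same shape as the example in this document."""
    if not name or "," in name or ";" in name:
        raise ValueError("name must be non-empty and contain no ',' or ';'")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return f"SSEPlugin={name},{host}:{port};"


print(sse_plugin_line("PyTools", "localhost", 50055))
```

If you change the connection name here, remember that the sample apps expect PyTools, so their expressions would need to be updated to match.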
Usage
We go into the details of each capability in the sections below.
Sample Qlik Sense apps are provided, and each app demonstrates techniques for using this SSE’s capabilities in Qlik.
Most of the sample apps require the Dashboard Extension Bundle which was released with Qlik Sense November 2018.
Documentation | Sample App | Additional App Dependencies |
---|---|---|
Correlations | Correlations | None. |
Clustering | Clustering with HDBSCAN | None. |
Predictions with pretrained models | Predictions with scikit-learn and Keras | Follow the pre-requisites and steps in the documentation. If using Qlik Sense Desktop you will need to download the data source, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source file. |
Machine Learning | Train & Test, Predict, K-fold Cross Validation, Parameter Tuning, K-fold CV & Parameter Tuning, Complex Forecasting with scikit-learn | Make sure you reload the K-fold Cross Validation or Train & Test app before using the Predict app. If using Qlik Sense Desktop you will need to download the data source, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source file. The forecasting app is best understood together with the Deep Learning section below. Here we just use more traditional ML algorithms rather than Deep Learning for producing the forecast. Make sure you reload the app before using the final sheets to make predictions. The data source for this app can be found here. |
Deep Learning | Complex Forecasting with Keras | Make sure you reload the app before using the final two sheets to make predictions. If using Qlik Sense Desktop you will need to download the data source, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source file. |
Forecasting | Facebook Prophet (Detailed), Facebook Prophet (Simple), Facebook Prophet (Multiple regressors) | For the detailed app, use the bookmarks to step through the sheets with relevant selections. For calling Prophet through the load script refer to the simple app. If you want to reload the app using Qlik Sense Desktop you will need to download the data source, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source file. For the use of Prophet’s additional regressors capability refer to the multiple regressors app. The data for this app is found here. |
Named Entity Recognition | NER and Association Rules | If using Qlik Sense Desktop you will need to download the data sources, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source files. |
Association Rules / Market Basket Analysis | NER and Association Rules, Market Basket Analysis | If using Qlik Sense Desktop you will need to download the data sources, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source files. |
Qonnections 2019 Workshop
At Qonnections 2019 we ran hands-on workshops with PyTools and Qlik Sense. The content for these workshops, including the sample apps and exercise instructions, is available here.
The workshop exercises can be used as a tutorial for using this Server Side Extension with Qlik Sense Enterprise or Desktop.