Introduction to KNIME
KNIME Analytics Platform is the leading open-source solution for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures. KNIME allows users to visually create data flows (or pipelines), selectively execute some or all analysis steps, and inspect the results and models with interactive views. With more than a thousand modules, hundreds of ready-to-run examples, and a wide choice of advanced algorithms, KNIME Analytics Platform is a versatile toolbox for any data scientist. Additional plugins integrate methods for text mining, image mining, and time series analysis.
On the first day, participants will be introduced to the KNIME Workbench and will learn how to browse or search for nodes and how to create a workflow by connecting multiple nodes. Participants will get acquainted with the most common nodes (for example, loading data from Excel, performing column and row operations, grouping data based on categorical variables, and visualizing data in bar and line plots). They will learn how to generate descriptive summaries and perform statistical analysis, including hypothesis testing (parametric and non-parametric) and linear regression.
On the second day, participants will learn more advanced concepts, including machine learning and text analysis. They will learn various supervised learning techniques (for example, decision trees, naïve Bayes, support vector machines, and neural network models) using example datasets. The unsupervised techniques of k-means clustering and Apriori association rule mining will also be introduced. Finally, participants will learn how to perform text analytics in KNIME and will test their skills on live tweets that they will fetch within KNIME using the Twitter API.
- To introduce KNIME Analytics Platform to the participants
- To introduce the workflow-based approach to data analytics
- To apply machine learning techniques to data using KNIME
- To perform text analytics in KNIME
Who teaches the programme
The workshop will be led by Dr Sajid Siraj, Lecturer in Business Analytics and Decision Science at the Centre for Decision Research, Leeds University Business School. His research interests lie mainly in data analytics and decision making. He has applied his skills in various areas, including seismic data processing, the processing and profiling of telecom call detail records, and the development of decision support systems.
To get the most out of this workshop, you should:
- Know how to use Microsoft Excel (or similar spreadsheet software).
- Know the fundamentals of statistics (e.g. descriptive statistics and hypothesis testing).
- Have a Twitter account (if you’re interested in downloading tweets for text analytics).
The following will be beneficial, although not necessary:
- Computer programming and/or scripting skills.
- Basic knowledge of supervised and unsupervised machine learning.
University staff, public and charitable sector staff: £200
Private sector staff: £600
DAY 1: KNIME Basics
09:30 Introduction to the KNIME Workbench
10:00 Loading data from Excel
10:30 Table operations (column and row operations)
11:30 Grouping data based on categorical variables
12:00 Visualizing data in bar and line plots
12:30 Lunch break
13:30 Generating descriptive summaries
14:00 Parametric hypothesis testing
14:30 Non-parametric hypothesis testing
15:30 Linear regression
16:30 Closing day 1
DAY 2: KNIME for Machine Learning and Text Analysis
09:30 Quick Recap
10:00 Preparing train/test vectors
10:30 Decision tree learning and classification
11:30 Other supervised learning: naïve Bayes, SVM, neural networks
12:00 Unsupervised learning: k-means clustering
12:30 Lunch break
13:30 Introduction to text processing
14:30 Creating a Twitter account and getting API keys
15:30 Capturing live tweets using the Twitter API
16:30 Closing the workshop