Machine Learning and Artificial Intelligence
Elective Course for PhD Students at University of Ljubljana
This course is an introduction to data science for non-computer scientists. It covers data preparation, clustering, regression and classification, model evaluation, and embedding of unstructured data.
Type of course: Lectures + Homework Assignments
Course Code: 63834E (UL FRI)
ECTS: 5
Course name in Slovenian: Strojno učenje in umetna inteligenca
Semester: Fall 2024 (November and December)
Location for lectures: Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana
Time of the Lectures: The lectures are expected to take place in November and December. While we anticipate that they will be held in the evenings, likely between 5:00 PM and 7:00 PM, the exact schedule and room details will be confirmed in late October. This timing will be finalized once the undergraduate schedule at the University of Ljubljana is set and enrollment for this course is complete.
Prerequisites: No prior knowledge of the topics is assumed. This course will not use computer programming, and no prior statistics or data science knowledge is required.
Language: Lectures will be held in English, and all course materials will be in English.
Course Content
This is a machine learning and AI course intended for non-computer science students. We particularly encourage students from social sciences, humanities, natural sciences, engineering, and arts to enroll. No prior knowledge of statistics, computer science, or math is required. The course has a gentle learning curve, with additional video material and lecture notes available for all students. The course covers the following state-of-the-art topics:
Lecture 1: Exploring Data with Clustering
Lecturer: Janez Demšar
- Dive into the world of data with practical exploratory analysis techniques.
- Discover clustering techniques and methods for explaining discovered clusters.
- Case Study: understanding patterns in Slovenian surnames.
Lecture 2: Predictive Analytics with Linear Regression
Lecturer: Blaž Zupan
- Start with the simplest model of all: linear regression in the univariate case.
- Can machine learning predict with impossible accuracy? Expansion of feature space, overfitting, and regularization (yes, it is used both in the simplest linear regression and in the most complex neural network models).
- Case Study: dissecting body mass index.
Lecture 3: Demystifying Classification Techniques
Lecturer: Janez Demšar
- Learn about classification trees and random forests.
- Case Study: Predicting animal categories using zoo data.
- What is explainable AI: a case for logistic regression, naive Bayesian classifiers and nomograms.
Lecture 4: Tackling Overfitting in Machine Learning
Lecturer: Blaž Zupan
- Explore the concept of overfitting and why it’s a problem.
- Learn techniques to prevent overfitting and improve model accuracy.
- Model evaluation, cross-validation, and the right ways to perform feature selection.
Lecture 5: Model Scoring and Evaluation
Lecturer: Janez Demšar
- Delve into different ways to score and evaluate machine learning models.
- Understand the importance of precision, recall, and the ROC curve.
- Learn how to optimize models for better performance in real-world scenarios.
Lecture 6: Everything is Just Numbers - Embedding and Deep Models
Lecturer: Blaž Zupan
- Introduction to embedding techniques and their applications.
- Machine learning on images and text.
- Introduction to the world of deep learning, foundation models and generative AI.
Lecturers
Prof. Dr. Blaž Zupan teaches artificial intelligence and machine learning at the University of Ljubljana and Baylor College of Medicine. His research has focused on explainable AI and combinations of machine learning and data visualization techniques. He runs a twenty-member bioinformatics laboratory, which also develops Orange, a comprehensive open-source toolbox for machine learning.
Prof. Dr. Janez Demšar researches machine learning and data mining, with an emphasis on data visualization. He spends most of his time programming Orange, an open-source, component-based toolbox for machine learning and data mining. He also teaches courses in programming and in didactics of computer science.
Both Demšar and Zupan have received best teacher awards at the Faculty of Computer and Information Science, with Demšar receiving the award every year (except one :) ) since the students introduced it about 20 years ago. They have jointly received the Slovenian innovation award "Puh" for leading the development of Orange Data Mining, the software that will be used in the course.
Software Tool
In the course, we will be using Orange Data Mining, a free, popular open-source tool designed for data visualization and analysis in machine learning and artificial intelligence. Renowned for its ease of use and user-friendly interface, Orange employs a visual programming approach that allows users to create data analysis workflows through an intuitive drag-and-drop system. This makes it especially appealing for newcomers, as it offers a gentle learning curve while still providing robust capabilities for more advanced users. Orange's modular design and comprehensive library of widgets enable users to perform complex data manipulations, statistical analyses, and predictive modeling without needing extensive programming knowledge.
The screenshot below shows Orange in action: in the course, we will learn how to construct the workflows of components that read and process the data, build and evaluate predictive models, and visualize the data and results.
Enrollment Information
This is an elective course offered to all students at the University of Ljubljana. Students need to enroll at their own Faculty, which will then send their enrollment information to the Faculty of Computer and Information Science.
Course Materials
All course materials will be provided to students upon enrollment, at the start of the course, and will be available on the course's homepage (Moodle). The materials include lecture notes, short optional educational videos, and quizzes.
Homework Assignments and Grading
The course will include six practical homework assignments, each involving the use of Orange Data Mining software and requiring students to analyze chosen datasets. Assignments will be submitted through quiz-like questionnaires.
The final grade for the course will be computed based on scores from the homework assignments. There will be no final exam. Optional "bonus" assignments may be provided.
Course Attendance
This course is primarily organized for on-site attendance to facilitate interactive learning and engagement. However, we understand that circumstances may occasionally prevent students from attending in person. For those instances, comprehensive study materials (lecture notes, recorded videos, and essential literature) will be made available. These resources are designed to ensure that students can thoroughly understand the course content and successfully complete the required homework assignments, even if they miss an on-site session.
Contact Information, On-Line Support, and Further Announcements
Join our Discord server for any questions or further information about the course.