1
-
Organizational Information
- Page: 2
- Notes:
- Contact: klaus.kaiser@fh-dortmund.de
- Room: B.2.04
- Professor Klaus Kaiser has a background in data science across various industries.
-
Introduction to Data Science
- Page: 4-7
- Notes:
- Definition: Data science is about turning raw data into meaningful insights.
- Interdisciplinary field combining statistics, computing, and domain knowledge.
- Historical context of the term “data science” from 1962 to 2001.
-
What is Data Science?
- Page: 8-10
- Notes:
- Data science involves using methods and systems to extract knowledge from data.
- The intersection of math/statistics, computer science, and domain knowledge is crucial.
-
Practical Example of Data Science Project: Monkey Detection
- Page: 16-24
- Notes:
- Steps include understanding the problem, data collection, labeling, model training, and deployment.
-
Related Fields in Data Science
- Page: 14-15
- Notes:
- Data Engineering: Building systems for data collection and processing.
- Data Analysis: Inspecting and transforming data to inform decisions.
-
Tasks in Data Science
- Page: 16
- Notes:
- Overview of different tasks within classical machine learning.
-
Real-World Examples of Data Science Applications
- Page: 25-32
- Notes:
- Applications include autonomous driving, face recognition, predictive maintenance, fraud detection, recommendation systems, and cancer detection.
-
Overview of Lecture Content
- Page: 34-35
- Notes:
- Basic topics include data basics, statistics, presentation techniques, and machine learning.
-
Organizational: Schedule and Exam Information
- Page: 38-40
- Notes:
- Lecture and exercise schedules, language of instruction, and exam details (written exam with bonus points for data analytics).
-
Expectations from Students
- Page: 42-43
- Notes:
- Emphasis on respect, professionalism, and willingness to participate.
-
How to Continue in Data Science
- Page: 46-48
- Notes:
- Suggested literature for further reading and related courses available in the curriculum.
-
Summary & References
- Page: 51-55
- Notes:
- Key takeaways: ability to explain data science and recognize its applications.
- Important references for further study are provided.
2
-
Data Science Definition: Creating knowledge from data using math, statistics, and computer science.
-
Data Types:
-
Structured: Follows a predefined model (e.g., tables).
-
Unstructured: Lacks explicit structure (e.g., text, images).
-
-
Data Categories:
-
Discrete vs. Continuous
-
Nominal, Ordinal, Interval, Ratio
-
Qualitative vs. Quantitative
-
-
Data Interchange Formats: Common formats include CSV and JSON.
-
Data Trust: Importance of data quality dimensions: accuracy, completeness, consistency, timeliness, uniqueness, validity.
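The interchange formats and quality dimensions above can be illustrated with a minimal sketch. The sample records, column names, and the two helper functions are made up for illustration; they are one possible way to measure completeness and uniqueness, not a definition taken from the lecture.

```python
import csv
import io
import json

# Hypothetical sample data; the column names are made up for illustration.
CSV_TEXT = """id,name,age
1,Ada,36
2,Grace,
2,Grace,
3,Alan,41
"""

def load_rows(csv_text):
    """Parse CSV text into a list of dicts (one dict per row)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def completeness(rows, column):
    """Share of rows with a non-empty value in the given column."""
    filled = sum(1 for row in rows if row[column].strip() != "")
    return filled / len(rows)

def uniqueness(rows, column):
    """Share of distinct values among all values in the given column."""
    values = [row[column] for row in rows]
    return len(set(values)) / len(values)

rows = load_rows(CSV_TEXT)
as_json = json.dumps(rows, indent=2)  # the same records in JSON format

print(as_json)
print("completeness(age):", completeness(rows, "age"))
print("uniqueness(id):", uniqueness(rows, "id"))
```

Note that CSV carries no type information, so every value arrives as a string; the JSON output here therefore contains strings as well.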
3
-
Data Categories: Discrete, continuous, nominal, ordinal, interval, ratio, qualitative, and quantitative.
-
Data Interchange Formats: Common formats include CSV and JSON.
-
Data Quality Dimensions: Accuracy, completeness, consistency, timeliness, uniqueness, validity.
-
Data Types: Primary (real-time, specific) vs. secondary (past, economical).
-
Data Acquisition Methods: Capturing (sensors, surveys), retrieving (databases, APIs), collecting (web scraping).
-
FAIR and Open Data: Principles for sustainable data usage and importance in scientific reproducibility.
4
-
Primary vs. Secondary Data: Primary data is collected for a specific purpose, while secondary data is sourced from existing datasets.
-
Data Collection Techniques: Includes scraping, which extracts data from websites, and considerations for legality and data protection.
-
Data Protection: Emphasizes GDPR compliance, anonymization, and pseudonymization of personal data.
-
Statistics Basics: Introduces descriptive and inductive statistics, frequency distributions, and graphical representations like histograms and bar charts.
-
FAIR Principles: Focus on data findability, accessibility, interoperability, and reusability.
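The difference between anonymization and pseudonymization mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions: the record, the field names, and the secret key are hypothetical, and a keyed hash is only one of several possible pseudonymization techniques. Dropping direct identifiers alone does not guarantee anonymity if the remaining fields still allow re-identification.

```python
import hashlib
import hmac

# Assumption: the secret key is stored separately from the data set.
SECRET_KEY = b"replace-with-a-real-secret"

def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    Whoever holds the key can recompute the mapping, so the data
    remains personal data in the sense of pseudonymization.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def anonymize(record: dict, personal_fields: set) -> dict:
    """Drop the listed personal fields entirely; no mapping back is kept."""
    return {k: v for k, v in record.items() if k not in personal_fields}

# Hypothetical record for illustration only.
record = {"email": "jane@example.org", "age": 29, "city": "Dortmund"}

pseudo = {**record, "email": pseudonymize(record["email"])}
anon = anonymize(record, {"email"})

print(pseudo)
print(anon)
```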
5
-
Data Scraping: Extracts data from the human-readable output of another program (e.g., a website); should be a last resort.
-
Anonymization: Removes personal information so that individuals can no longer be identified; pseudonymization replaces identifiers but still allows re-identification with additional information.
-
Statistics Types: Descriptive, exploratory, and inductive statistics.
-
Frequencies: Absolute and relative frequencies; visualized through histograms, pie charts, and bar charts.
-
Central Tendencies: Mode, median, and mean; box plots visualize data distribution.
-
Statistical Dispersion: Measures spread of data; includes range, interquartile range, and empirical variance.
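The frequency, central tendency, and dispersion measures above can be computed with Python's standard library. This is a minimal sketch on a made-up sample. Whether the empirical variance divides by n or by n - 1 depends on the convention used in the course, so both variants are shown; quartile values also depend on the interpolation method chosen.

```python
import statistics
from collections import Counter

data = [2, 3, 3, 5, 7, 8, 9, 12]  # made-up sample for illustration

# Absolute and relative frequencies
abs_freq = Counter(data)
rel_freq = {value: count / len(data) for value, count in abs_freq.items()}

# Central tendencies
mode = statistics.mode(data)
median = statistics.median(data)
mean = statistics.mean(data)

# Dispersion
value_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles (default method)
iqr = q3 - q1                                 # interquartile range
var_n = statistics.pvariance(data)            # divides by n
var_n_minus_1 = statistics.variance(data)     # divides by n - 1

print("mode:", mode, "median:", median, "mean:", mean)
print("range:", value_range, "IQR:", iqr)
print("variance (1/n):", var_n, "variance (1/(n-1)):", var_n_minus_1)
```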
6
-
Histograms: Visual representation of frequency for continuous data.
-
Cumulative Frequency: Measures total frequency up to a certain value.
-
Statistical Dispersion: Includes empirical variance and standard deviation.
-
Bivariate Analysis: Examines relationships between two variables.
-
Correlation Coefficients: Quantify the strength and direction of relationships.
-
Contingency Tables: Display joint frequencies of two categorical variables.
-
Pearson Coefficient: Measures linear correlation between metric variables.
-
Ordinal Data: Can be analyzed using rank correlation methods.
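As a rough illustration of the two correlation measures above, the following sketch computes the Pearson coefficient directly from its definition and a rank correlation (Spearman) by applying Pearson to average ranks. The function names and sample values are chosen for this example; Spearman is assumed here as the rank method, since the notes do not name one.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Rank correlation: Pearson coefficient of the rank vectors."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotone but not linear in x

print("Pearson:", pearson(x, y))
print("Spearman:", spearman(x, y))
```

Because y increases monotonically with x, the rank correlation reaches its maximum while the Pearson coefficient stays below 1, which shows that Pearson captures only the linear part of the relationship.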
7
-
Correlation: Describes relationships between two variables using correlation coefficients based on variable types (nominal, ordinal, metric).
-
Contingency Tables: Used for two-dimensional frequency distributions; includes conditional frequencies and measures of association.
-
Probability Theory: Introduces random experiments, events, and Kolmogorov axioms; covers Laplace experiments and combinatorics.
-
Bayes’ Theorem: Explains conditional probability and its application in real-world scenarios, such as medical testing.
-
Outcome: Understanding of probability basics, combinatorial calculations, and Bayes’ theorem application.
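A small sketch of a Laplace experiment and of Bayes' theorem in the medical-testing setting mentioned above. The prevalence, sensitivity, and specificity are made-up numbers for illustration, not values from the lecture.

```python
from fractions import Fraction
from itertools import product

def laplace_probability(event, sample_space):
    """Laplace experiment: favorable outcomes / all equally likely outcomes."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity
    # Law of total probability for P(positive)
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / p_pos

# Two dice: probability that the sum equals 7 (6 of 36 outcomes)
two_dice = list(product(range(1, 7), repeat=2))
p_sum_7 = laplace_probability(lambda outcome: sum(outcome) == 7, two_dice)
print("P(sum = 7):", p_sum_7)

# Hypothetical test: 1% prevalence, 99% sensitivity, 95% specificity
p = posterior(prior=0.01, sensitivity=0.99, specificity=0.95)
print("P(disease | positive):", round(p, 3))
```

With these assumed numbers, only about one in six positive results belongs to a person who has the disease, because the condition is rare and false positives dominate.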
8
-
Random Experiment: Defined by well-defined conditions with unpredictable outcomes (e.g., dice throw).
-
Kolmogorov Axioms: Fundamental properties of probability measures.
-
Random Variables: Assign a number to each outcome; can be discrete (countable values) or continuous (any value in an interval).
-
Distributions: Includes discrete (e.g., binomial, uniform) and continuous (e.g., normal) distributions.
-
Expected Value & Variance: Key metrics for understanding random variables' behavior.
-
Applications: Used in statistical tests and linear regression.
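Expected value and variance of a discrete random variable can be sketched directly from their definitions. The dictionary representation of a probability mass function and the helper names below are choices made for this example.

```python
from fractions import Fraction
from math import comb

def expected_value(pmf):
    """E[X] = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var[X] = sum of (x - E[X])^2 * P(X = x) over all values x."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def binomial_pmf(n, p):
    """P(X = k) for k successes in n independent trials with probability p."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Discrete uniform distribution: a fair six-sided die
die = {face: Fraction(1, 6) for face in range(1, 7)}
print("die:  E =", expected_value(die), " Var =", variance(die))

# Binomial distribution: number of heads in 10 fair coin flips
coin = binomial_pmf(10, Fraction(1, 2))
print("coin: E =", expected_value(coin), " Var =", variance(coin))
```

Using `Fraction` keeps the results exact, so the binomial values can be compared directly with the closed forms n·p and n·p·(1 − p).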
9
-
Random Variables: Defined as functions mapping outcomes to real numbers.
-
Discrete vs. Continuous Distributions: Discrete has countable outcomes; continuous uses probability density functions.
-
Simple Linear Regression: Models the linear relationship between an independent variable (X) and a dependent variable (Y).
-
Key Concepts:
-
Residual Analysis: Evaluates fit of regression line.
-
Coefficient of Determination (R²): Indicates model fit; ranges from 0 to 1.
-
-
Estimation: Parameters (β0, β1) are estimated using the least squares method.
-
Applications: Used in various fields to predict outcomes based on correlations.
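The least squares estimates for β0 and β1, the residuals, and R² can be sketched in a few lines. The data points and function names are made up for illustration; the formulas are the standard closed-form solution for simple linear regression.

```python
def fit_line(x, y):
    """Least squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    beta1 = s_xy / s_xx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

def residuals(x, y, beta0, beta1):
    """Observed value minus fitted value for each data point."""
    return [b - (beta0 + beta1 * a) for a, b in zip(x, y)]

def r_squared(x, y, beta0, beta1):
    """Share of the variation in y explained by the regression line."""
    mean_y = sum(y) / len(y)
    ss_res = sum(e**2 for e in residuals(x, y, beta0, beta1))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Made-up data that roughly follows y = 2x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = fit_line(x, y)
r2 = r_squared(x, y, beta0, beta1)
print("beta0:", round(beta0, 3), "beta1:", round(beta1, 3), "R^2:", round(r2, 4))
```

Residuals that scatter randomly around zero suggest the linear model is adequate; a visible pattern in them would indicate that a straight line does not describe the data well.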