vault backup: 2024-12-05 23:59:54

2024-12-05 23:59:54 +01:00
parent 27b50b6b08
commit ea60f51b63
61 changed files with 8725 additions and 69 deletions

@@ -0,0 +1,185 @@
# 1
1. **Organizational Information**
    - **Page:** 2
    - **Notes:**
        - Contact: [klaus.kaiser@fh-dortmund.de](mailto:klaus.kaiser@fh-dortmund.de)
        - Room: B.2.04
        - Professor Klaus Kaiser has a background in data science across various industries.
2. **Introduction to Data Science**
    - **Page:** 4-7
    - **Notes:**
        - Definition: Data science is about turning raw data into meaningful insights.
        - Interdisciplinary field combining statistics, computing, and domain knowledge.
        - Historical context of the term “data science” from 1962 to 2001.
3. **What is Data Science?**
    - **Page:** 8-10
    - **Notes:**
        - Data science involves using methods and systems to extract knowledge from data.
        - The intersection of math/statistics, computer science, and domain knowledge is crucial.
4. **Practical Example of Data Science Project: Monkey Detection**
    - **Page:** 16-24
    - **Notes:**
        - Steps include understanding the problem, data collection, labeling, model training, and deployment.
5. **Related Fields in Data Science**
    - **Page:** 14-15
    - **Notes:**
        - Data Engineering: Building systems for data collection and processing.
        - Data Analysis: Inspecting and transforming data to inform decisions.
6. **Tasks in Data Science**
    - **Page:** 16
    - **Notes:**
        - Overview of different tasks within classical machine learning.
7. **Real-World Examples of Data Science Applications**
    - **Page:** 25-32
    - **Notes:**
        - Applications include autonomous driving, face recognition, predictive maintenance, fraud detection, recommendation systems, and cancer detection.
8. **Overview of Lecture Content**
    - **Page:** 34-35
    - **Notes:**
        - Basic topics include data basics, statistics, presentation techniques, and machine learning.
9. **Organizational: Schedule and Exam Information**
    - **Page:** 38-40
    - **Notes:**
        - Lecture and exercise schedules, language of instruction, and exam details (written exam with bonus points for data analytics).
10. **Expectations from Students**
    - **Page:** 42-43
    - **Notes:**
        - Emphasis on respect, professionalism, and willingness to participate.
11. **How to Continue in Data Science**
    - **Page:** 46-48
    - **Notes:**
        - Suggested literature for further reading and related courses available in the curriculum.
12. **Summary & References**
    - **Page:** 51-55
    - **Notes:**
        - Key takeaways: ability to explain data science and recognize its applications.
        - Important references for further study are provided.
# 2
- **Data Science Definition**: Creating knowledge from data using math, statistics, and computer science.
- **Data Types**:
    - **Structured**: Follows a predefined model (e.g., tables).
    - **Unstructured**: Lacks explicit structure (e.g., text, images).
- **Data Categories**:
    - Discrete vs. Continuous
    - Nominal, Ordinal, Interval, Ratio
    - Qualitative vs. Quantitative
- **Data Interchange Formats**: Common formats include CSV and JSON.
- **Data Trust**: Importance of data quality dimensions: accuracy, completeness, consistency, timeliness, uniqueness, validity.
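
The two interchange formats named above can be compared with a short round trip. This is a minimal sketch using only Python's standard library; the sample records and the helper names `to_csv` and `from_csv` are made up for illustration and do not come from the lecture.

```python
import csv
import io
import json

# Hypothetical sample records (structured data: every row has the same fields).
records = [
    {"id": 1, "species": "monkey", "weight_kg": 7.5},
    {"id": 2, "species": "lemur", "weight_kg": 2.2},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into dicts; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


csv_rows = from_csv(to_csv(records))       # types are lost: "7.5", not 7.5
json_rows = json.loads(json.dumps(records))  # numbers and strings survive
```

The design point is that CSV carries no type information, so numeric columns have to be converted again after reading, whereas JSON keeps the distinction between numbers and strings.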
# 3
- **Data Categories**: Discrete, continuous, nominal, ordinal, interval, ratio, qualitative, and quantitative.
- **Data Interchange Formats**: Common formats include CSV and JSON.
- **Data Quality Dimensions**: Accuracy, completeness, consistency, timeliness, uniqueness, validity.
- **Data Types**: Primary (collected first-hand for a specific purpose, up to date) vs. secondary (existing data collected earlier, cheaper to obtain).
- **Data Acquisition Methods**: Capturing (sensors, surveys), retrieving (databases, APIs), collecting (web scraping).
- **FAIR and Open Data**: Principles for sustainable data usage and importance in scientific reproducibility.
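
Some of the data quality dimensions listed above can be turned into simple ratios. The sketch below is one possible way to do that, not the lecture's definition: the records, the function names, and the validity rule (age must not be negative) are all assumptions made for illustration.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "email": "a@example.org"},
    {"id": 2, "age": None, "email": "b@example.org"},
    {"id": 2, "age": 29, "email": "b@example.org"},
    {"id": 3, "age": -5, "email": "c@example.org"},
]


def completeness(rows, field):
    """Share of rows in which `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, key):
    """Share of distinct key values among all rows (1.0 means no duplicates)."""
    return len({r[key] for r in rows}) / len(rows)


def validity(rows, field, is_valid):
    """Share of non-missing values that satisfy the rule `is_valid`."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(is_valid(v) for v in values) / len(values)


age_completeness = completeness(rows, "age")            # 3 of 4 rows have an age
id_uniqueness = uniqueness(rows, "id")                  # 3 distinct ids in 4 rows
age_validity = validity(rows, "age", lambda a: a >= 0)  # 2 of 3 ages are valid
```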
# 4
- **Primary vs. Secondary Data**: Primary data is collected for a specific purpose, while secondary data is sourced from existing datasets.
- **Data Collection Techniques**: Includes scraping, which extracts data from websites, and considerations for legality and data protection.
- **Data Protection**: Emphasizes GDPR compliance, anonymization, and pseudonymization of personal data.
- **Statistics Basics**: Introduces descriptive and inductive (inferential) statistics, frequency distributions, and graphical representations like histograms and bar charts.
- **FAIR Principles**: Focus on data findability, accessibility, interoperability, and reusability.
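
The difference between pseudonymization and anonymization can be sketched in a few lines. Keyed hashing (HMAC-SHA256) is used here as one common pseudonymization technique; it is an assumption, not necessarily the method shown in the lecture. The key, the record, and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key. It must be stored separately from the data,
# because whoever holds it can link pseudonyms back to known identifiers.
SECRET_KEY = b"replace-with-a-randomly-generated-key"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a stable keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def drop_personal_fields(record: dict, personal_fields: set) -> dict:
    """Remove directly identifying fields from a record."""
    return {k: v for k, v in record.items() if k not in personal_fields}


record = {"name": "Erika Mustermann", "email": "erika@example.org", "score": 87}

# Pseudonymized: the same person always maps to the same token.
pseudonymized = {
    "person": pseudonymize(record["email"]),
    "score": record["score"],
}

# Direct identifiers removed.
stripped = drop_personal_fields(record, {"name", "email"})
```

Dropping direct identifiers is only a first step towards anonymization: the remaining fields can still identify a person in combination, so whether a dataset counts as anonymous has to be assessed case by case.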
# 5
- **Data Scraping**: Extracts data from program outputs; should be a last resort.
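- **Anonymization**: Removes personal information so that individuals can no longer be identified; pseudonymization replaces identifiers so that re-identification is possible only with additional, separately kept information.
- **Statistics Types**: Descriptive, exploratory, and inductive statistics.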
- **Frequencies**: Absolute and relative frequencies; visualized through histograms, pie charts, and bar charts.
- **Central Tendencies**: Mode, median, and mean; box plots visualize data distribution.
- **Statistical Dispersion**: Measures the spread of data; includes range, interquartile range, and empirical variance.
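
The frequency, central tendency, and dispersion measures above can be computed with Python's standard library. This is a small sketch on made-up data. Whether the lecture defines the empirical variance with divisor n or n - 1 is not stated in these notes, so both are shown.

```python
from collections import Counter
import statistics

grades = [1, 2, 2, 3, 3, 3, 4, 5]  # hypothetical sample

# Absolute and relative frequencies.
absolute = Counter(grades)
relative = {value: count / len(grades) for value, count in absolute.items()}

# Central tendency.
mode = statistics.mode(grades)      # most frequent value
median = statistics.median(grades)  # middle of the sorted data
mean = statistics.mean(grades)      # arithmetic mean

# Dispersion.
value_range = max(grades) - min(grades)
q1, q2, q3 = statistics.quantiles(grades, n=4)  # quartile cut points
iqr = q3 - q1                                   # interquartile range
variance_n = statistics.pvariance(grades)       # divisor n
variance_n_minus_1 = statistics.variance(grades)  # divisor n - 1
```

The quartile values depend on the interpolation method; `statistics.quantiles` uses its `exclusive` method by default, and other tools may give slightly different cut points for small samples.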
# 6
- **Histograms**: Visual representation of frequency for continuous data.
- **Cumulative Frequency**: Measures total frequency up to a certain value.
- **Statistical Dispersion**: Includes empirical variance and standard deviation.
- **Bivariate Analysis**: Examines relationships between two variables.
- **Correlation Coefficients**: Quantify the strength and direction of relationships.
- **Contingency Tables**: Display joint frequencies of two categorical variables.
- **Pearson Coefficient**: Measures linear correlation between metric variables.
- **Ordinal Data**: Can be analyzed using rank correlation methods.
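
The Pearson coefficient and a rank correlation can be compared on the same data. The sketch below implements Pearson's r directly and uses Spearman's coefficient as the rank method, computed as Pearson's r on average ranks; the notes do not say which rank method the lecture uses, so Spearman is an assumption. The data and function names are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


hours = [1, 2, 3, 4, 5]     # hypothetical metric variable
score = [1, 4, 9, 16, 25]   # grows monotonically but not linearly

r_pearson = pearson(hours, score)
r_spearman = spearman(hours, score)
```

On this data the rank correlation is exactly 1 because the relationship is strictly monotonic, while Pearson's r is slightly below 1 because the relationship is not linear.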
# 7
- **Correlation**: Describes relationships between two variables using correlation coefficients based on variable types (nominal, ordinal, metric).
- **Contingency Tables**: Used for two-dimensional frequency distributions; includes conditional frequencies and measures of association.
- **Probability Theory**: Introduces random experiments, events, and Kolmogorov axioms; covers Laplace experiments and combinatorics.
- **Bayes' Theorem**: Explains conditional probability and its application in real-world scenarios, such as medical testing.
- **Outcome**: Understanding of probability basics, combinatorial calculations, and the application of Bayes' theorem.
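
The medical testing scenario can be worked through numerically. The sketch below applies Bayes' theorem, P(D | +) = P(+ | D) P(D) / P(+), with made-up numbers: the prevalence, sensitivity, and specificity are assumptions for illustration and are not taken from the lecture.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity
    # Law of total probability: overall chance of a positive test.
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / p_pos


# Hypothetical test: 1% prevalence, 95% sensitivity, 90% specificity.
p = posterior(prior=0.01, sensitivity=0.95, specificity=0.90)
print(f"P(disease | positive) = {p:.3f}")
```

With these numbers the posterior is below 9%, even though the test is right most of the time, because healthy people vastly outnumber sick ones and produce most of the positive results.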
# 8
- **Random Experiment**: Carried out under well-defined conditions, with an outcome that cannot be predicted in advance (e.g., throwing a die).
- **Kolmogorov Axioms**: Fundamental properties of probability measures.
- **Random Variables**: Assign a number to each outcome; can be discrete (countable values) or continuous (any value in an interval).
- **Distributions**: Includes discrete (e.g., binomial, uniform) and continuous (e.g., normal) distributions.
- **Expected Value & Variance**: Key metrics for understanding random variables' behavior.
- **Applications**: Used in statistical tests and linear regression.
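
Expected value and variance of a discrete random variable follow directly from its probability mass function. This is a minimal sketch using exact fractions; the fair die and the ten coin flips are illustrative choices, and the function names are not from the lecture.

```python
from fractions import Fraction
from math import comb


def expected_value(pmf):
    """E[X] = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var[X] = sum of (x - E[X])^2 * P(X = x) over all values x."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def binomial_pmf(n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k) for k = 0..n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


# Discrete uniform distribution: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = expected_value(die)   # 7/2
die_var = variance(die)          # 35/12

# Binomial distribution: number of heads in 10 fair coin flips.
flips = binomial_pmf(10, Fraction(1, 2))
flips_mean = expected_value(flips)  # n * p = 5
flips_var = variance(flips)         # n * p * (1 - p) = 5/2
```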
# 9
- **Random Variables**: Defined as functions mapping outcomes to real numbers.
- **Discrete vs. Continuous Distributions**: Discrete has countable outcomes; continuous uses probability density functions.
- **Simple Linear Regression**: Models a linear relationship between an independent variable (X) and a dependent variable (Y).
- **Key Concepts**:
    - **Residual Analysis**: Evaluates the fit of the regression line.
    - **Coefficient of Determination (R²)**: Indicates model fit; ranges from 0 to 1.
    - **Estimation**: Parameters (β0, β1) are estimated using the least squares method.
- **Applications**: Used in various fields to predict outcomes based on correlations.
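
The least squares estimates and R² can be sketched in plain Python. The closed-form estimates are β1 = Sxy / Sxx and β0 = ȳ - β1·x̄, and R² = 1 - SS_res / SS_tot. The data points and function names below are made up for illustration.

```python
def fit_line(xs, ys):
    """Least squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


def residuals(xs, ys, beta0, beta1):
    """Observed value minus fitted value for every data point."""
    return [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]


def r_squared(xs, ys, beta0, beta1):
    """Share of the variation in y explained by the regression line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(e**2 for e in residuals(xs, ys, beta0, beta1))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical measurements that lie close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = fit_line(xs, ys)
r2 = r_squared(xs, ys, beta0, beta1)
errors = residuals(xs, ys, beta0, beta1)
```

For a least squares fit with an intercept, the residuals sum to zero, which gives a quick sanity check before looking for patterns in a residual plot.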

Binary file not shown.

@@ -0,0 +1,11 @@
![[Pasted image 20241205160950.png]]
![[Pasted image 20241205161002.png]]
![[Pasted image 20241205161024.png]]
![[Pasted image 20241205161036.png]]
![[Pasted image 20241205161108.png]]
![[Pasted image 20241205161119.png]]
![[Pasted image 20241205161308.png]]
![[Pasted image 20241205161326.png]]
![[Pasted image 20241205161509.png]]
![[Pasted image 20241205161717.png]]
![[Pasted image 20241205161739.png]]

@@ -0,0 +1,14 @@
![[Pasted image 20241205234600.png]]
![[Pasted image 20241205234621.png]]
![[Pasted image 20241205234646.png]]
![[Pasted image 20241205234656.png]]
![[Pasted image 20241205234732.png]]
![[Pasted image 20241205234924.png]]
![[Pasted image 20241205234938.png]]
![[Pasted image 20241205234951.png]]
![[Pasted image 20241205235039.png]]
![[Pasted image 20241205235249.png]]
![[Pasted image 20241205235306.png]]
![[Pasted image 20241205235322.png]]
![[Pasted image 20241205235347.png]]
![[Pasted image 20241205235403.png]]

@@ -0,0 +1,6 @@
![[Pasted image 20241205235524.png]]
![[Pasted image 20241205235604.png]]
![[Pasted image 20241205235614.png]]
![[Pasted image 20241205235635.png]]
![[Pasted image 20241205235827.png]]
![[Pasted image 20241205235837.png]]