vault backup: 2024-12-05 23:59:54
185
WS2425/Data Science/VL/Zusammenfassung.md
Normal file
@@ -0,0 +1,185 @@
# 1

1. **Organizational Information**

    - **Page:** 2
    - **Notes:**
        - Contact: [klaus.kaiser@fh-dortmund.de](mailto:klaus.kaiser@fh-dortmund.de)
        - Room: B.2.04
        - Professor Klaus Kaiser has a background in data science across various industries.
2. **Introduction to Data Science**

    - **Page:** 4-7
    - **Notes:**
        - Definition: Data science is about turning raw data into meaningful insights.
        - Interdisciplinary field combining statistics, computing, and domain knowledge.
        - Historical context of the term “data science” from 1962 to 2001.
3. **What is Data Science?**

    - **Page:** 8-10
    - **Notes:**
        - Data science involves using methods and systems to extract knowledge from data.
        - The intersection of math/statistics, computer science, and domain knowledge is crucial.
4. **Practical Example of Data Science Project: Monkey Detection**

    - **Page:** 16-24
    - **Notes:**
        - Steps include understanding the problem, data collection, labeling, model training, and deployment.
5. **Related Fields in Data Science**

    - **Page:** 14-15
    - **Notes:**
        - Data Engineering: Building systems for data collection and processing.
        - Data Analysis: Inspecting and transforming data to inform decisions.
6. **Tasks in Data Science**

    - **Page:** 16
    - **Notes:**
        - Overview of different tasks within classical machine learning.
7. **Real-World Examples of Data Science Applications**

    - **Page:** 25-32
    - **Notes:**
        - Applications include autonomous driving, face recognition, predictive maintenance, fraud detection, recommendation systems, and cancer detection.
8. **Overview of Lecture Content**

    - **Page:** 34-35
    - **Notes:**
        - Basic topics include data basics, statistics, presentation techniques, and machine learning.
9. **Organizational: Schedule and Exam Information**

    - **Page:** 38-40
    - **Notes:**
        - Lecture and exercise schedules, language of instruction, and exam details (written exam with bonus points for data analytics).
10. **Expectations from Students**

    - **Page:** 42-43
    - **Notes:**
        - Emphasis on respect, professionalism, and willingness to participate.
11. **How to Continue in Data Science**

    - **Page:** 46-48
    - **Notes:**
        - Suggested literature for further reading and related courses available in the curriculum.
12. **Summary & References**

    - **Page:** 51-55
    - **Notes:**
        - Key takeaways: ability to explain data science and recognize its applications.
        - Important references for further study are provided.

# 2
- **Data Science Definition**: Creating knowledge from data using math, statistics, and computer science.

- **Data Types**:

    - **Structured**: Follows a predefined model (e.g., tables).

    - **Unstructured**: Lacks explicit structure (e.g., text, images).

- **Data Categories**:

    - Discrete vs. Continuous

    - Nominal, Ordinal, Interval, Ratio

    - Qualitative vs. Quantitative

- **Data Interchange Formats**: Common formats include CSV and JSON.

- **Data Trust**: Trust in data rests on its quality dimensions: accuracy, completeness, consistency, timeliness, uniqueness, and validity.

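The difference between the two interchange formats named above can be seen in a small round trip. The following is a minimal sketch using only Python's standard library; the sample records and field names are made up for illustration and do not come from the lecture.

```python
import csv
import io
import json

# Hypothetical sample records (not taken from the lecture).
records = [
    {"id": 1, "city": "Dortmund", "temperature_c": 7.5},
    {"id": 2, "city": "Bochum", "temperature_c": 6.0},
]

# CSV: flat, tabular text. Every value comes back as a string.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["id", "city", "temperature_c"])
writer.writeheader()
writer.writerows(records)
csv_text = csv_buffer.getvalue()
rows_from_csv = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: key-value text that can be nested. Numbers keep their type.
json_text = json.dumps(records)
rows_from_json = json.loads(json_text)

print(type(rows_from_csv[0]["temperature_c"]))   # str
print(type(rows_from_json[0]["temperature_c"]))  # float
```

CSV therefore needs a separate agreement on column types, while JSON carries basic types (numbers, strings, booleans, lists, objects) in the text itself.
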
# 3
- **Data Categories**: Discrete, continuous, nominal, ordinal, interval, ratio, qualitative, and quantitative.

- **Data Interchange Formats**: Common formats include CSV and JSON.

- **Data Quality Dimensions**: Accuracy, completeness, consistency, timeliness, uniqueness, validity.

- **Primary vs. Secondary Data**: Primary (real-time, collected for a specific purpose) vs. secondary (collected in the past, economical to obtain).

- **Data Acquisition Methods**: Capturing (sensors, surveys), retrieving (databases, APIs), collecting (web scraping).

- **FAIR and Open Data**: Principles for sustainable data usage and their importance for scientific reproducibility.

# 4
- **Primary vs. Secondary Data**: Primary data is collected for a specific purpose, while secondary data is sourced from existing datasets.

- **Data Collection Techniques**: Include scraping, which extracts data from websites; legality and data protection must be considered before scraping.

- **Data Protection**: Emphasizes GDPR compliance, anonymization, and pseudonymization of personal data.

- **Statistics Basics**: Introduces descriptive and inductive statistics, frequency distributions, and graphical representations such as histograms and bar charts.

- **FAIR Principles**: Data should be findable, accessible, interoperable, and reusable.

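The distinction between anonymization and pseudonymization mentioned above can be sketched in a few lines. This is an illustration under assumptions, not the lecture's method: the keyed hash (HMAC-SHA256), the key, the record, and the helper names `pseudonymize` and `anonymize` are all hypothetical. Dropping direct identifiers is also not always enough for real anonymization, because remaining fields can still identify a person in combination.

```python
import hashlib
import hmac

# Hypothetical secret; it would be stored separately from the data set.
SECRET_KEY = b"example-key-kept-elsewhere"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, shortened)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize(record: dict, personal_fields: set) -> dict:
    """Drop the personal fields entirely."""
    return {k: v for k, v in record.items() if k not in personal_fields}


# Hypothetical record for illustration only.
record = {"name": "Erika Mustermann", "email": "erika@example.org", "age": 34}

pseudonymized = {
    **record,
    "name": pseudonymize(record["name"]),
    "email": pseudonymize(record["email"]),
}
anonymized = anonymize(record, {"name", "email"})

print(pseudonymized)
print(anonymized)
```

The same input always maps to the same pseudonym, so records can still be linked; re-identification needs the separately kept key plus a list of candidate identifiers.
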
# 5
- **Data Scraping**: Extracts data from the human-readable output of another program; should be a last resort.

- **Anonymization**: Removes personal information so that a person can no longer be identified; pseudonymization replaces identifiers so that identification is only possible with additional information.

- **Statistics Types**: Descriptive, exploratory, and inductive statistics.

- **Frequencies**: Absolute and relative frequencies; visualized through histograms, pie charts, and bar charts.

- **Central Tendencies**: Mode, median, and mean; box plots visualize the data distribution.

- **Statistical Dispersion**: Measures the spread of data; includes range, interquartile range, and empirical variance.

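The descriptive measures listed above can be computed with Python's standard library. The following is a minimal sketch on a made-up sample. Two choices are assumptions, since the summary does not state which convention the lecture uses: the variance here divides by n (`statistics.pvariance`), and the quartiles use the default method of `statistics.quantiles`.

```python
from collections import Counter
import statistics

# Hypothetical sample (not taken from the lecture).
data = [2, 3, 3, 5, 7, 8, 8, 8, 10]

# Frequencies
absolute = Counter(data)                              # value -> count
relative = {v: c / len(data) for v, c in absolute.items()}

# Central tendency
mode = statistics.mode(data)
median = statistics.median(data)
mean = statistics.mean(data)

# Dispersion
value_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)          # quartiles
iqr = q3 - q1                                         # interquartile range
variance = statistics.pvariance(data)                 # divides by n (assumption)

print(mode, median, mean, value_range, iqr, variance)
```

Use `statistics.variance` instead if the variance should divide by n - 1.
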
# 6
- **Histograms**: Visual representation of the frequencies of continuous data grouped into classes.

- **Cumulative Frequency**: Total frequency of all observations up to a certain value.

- **Statistical Dispersion**: Includes empirical variance and standard deviation.

- **Bivariate Analysis**: Examines relationships between two variables.

- **Correlation Coefficients**: Quantify the strength and direction of a relationship.

- **Contingency Tables**: Display the joint frequencies of categorical variables.

- **Pearson Coefficient**: Measures linear correlation between metric variables.

- **Ordinal Data**: Can be analyzed using rank correlation methods.

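To make the difference between the Pearson coefficient and a rank correlation concrete, here is a minimal sketch in plain Python. Spearman's coefficient is used as the rank method; that choice is an assumption, since the summary does not name a specific one. The helper names and sample values are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

For this data the rank correlation is 1 because the order is preserved exactly, while the Pearson coefficient stays below 1 because the relationship is not linear.
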
# 7
- **Correlation**: Describes relationships between two variables using correlation coefficients based on variable types (nominal, ordinal, metric).

- **Contingency Tables**: Used for two-dimensional frequency distributions; includes conditional frequencies and measures of association.

- **Probability Theory**: Introduces random experiments, events, and Kolmogorov axioms; covers Laplace experiments and combinatorics.

- **Bayes’ Theorem**: Explains conditional probability and its application in real-world scenarios, such as medical testing.

- **Outcome**: Understanding of probability basics, combinatorial calculations, and Bayes’ theorem application.

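The medical-testing application of Bayes' theorem can be worked through numerically. The following is a minimal sketch; the prevalence, sensitivity, and specificity values are invented for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem.

    prior        P(disease)
    sensitivity  P(positive | disease)
    specificity  P(negative | healthy)
    """
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity
    # Law of total probability: P(positive)
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / p_pos


# Hypothetical numbers: 1% prevalence, 99% sensitivity, 95% specificity.
p = posterior(prior=0.01, sensitivity=0.99, specificity=0.95)
print(round(p, 4))  # 0.1667
```

Even with a fairly accurate test, only about one in six positive results belongs to a sick person here, because healthy people vastly outnumber sick ones.
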
# 8
- **Random Experiment**: Carried out under well-defined conditions, but with an unpredictable outcome (e.g., a dice throw).

- **Kolmogorov Axioms**: Fundamental properties of probability measures.

- **Random Variables**: Assign a number to each outcome; can be discrete (countable values) or continuous (any value in an interval).

- **Distributions**: Include discrete (e.g., binomial, uniform) and continuous (e.g., normal) distributions.

- **Expected Value & Variance**: Key metrics for understanding the behavior of random variables.

- **Applications**: Used in statistical tests and linear regression.

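Expected value and variance of a discrete random variable follow directly from its probability mass function. The following is a minimal sketch for a fair die (discrete uniform) and a binomial distribution; the helper names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import comb


def expected_value(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var[X] = sum of (x - E[X])^2 * P(X = x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def binomial_pmf(n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k) for k = 0..n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


# Fair die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Number of heads in 10 fair coin tosses.
coin = binomial_pmf(10, Fraction(1, 2))

print(expected_value(die), variance(die))    # 7/2 35/12
print(expected_value(coin), variance(coin))  # 5 5/2
```

The binomial results match the closed forms E[X] = n·p and Var[X] = n·p·(1 − p).
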
# 9
- **Random Variables**: Defined as functions mapping outcomes to real numbers.

- **Discrete vs. Continuous Distributions**: Discrete distributions have countable outcomes; continuous distributions are described by probability density functions.

- **Simple Linear Regression**: Models the linear relationship between an independent variable (X) and a dependent variable (Y).

- **Key Concepts**:

    - **Residual Analysis**: Evaluates the fit of the regression line.

    - **Coefficient of Determination (R²)**: Indicates model fit; ranges from 0 to 1.

- **Estimation**: The parameters (β0, β1) are estimated using the least squares method.

- **Applications**: Used in various fields to predict outcomes based on the fitted relationship.

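The least squares estimates and R² for simple linear regression can be computed directly from their closed-form expressions. The following is a minimal sketch in plain Python; the data points and function names are made up for illustration.

```python
def fit_line(x, y):
    """Least squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    beta1 = s_xy / s_xx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


def residuals(x, y, beta0, beta1):
    """Observed value minus fitted value for each data point."""
    return [b - (beta0 + beta1 * a) for a, b in zip(x, y)]


def r_squared(x, y, beta0, beta1):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum(e**2 for e in residuals(x, y, beta0, beta1))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical, nearly linear data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = fit_line(x, y)
r2 = r_squared(x, y, beta0, beta1)
print(round(beta0, 2), round(beta1, 2), round(r2, 4))
```

With an intercept in the model the residuals sum to zero, which is a quick sanity check before looking at residual plots.
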
BIN
WS2425/Data Science/VL/lecture_01.pdf
Normal file
Binary file not shown.
BIN
WS2425/Data Science/VL/lecture_02.pdf
Normal file
Binary file not shown.
BIN
WS2425/Data Science/VL/lecture_03.pdf
Normal file
Binary file not shown.
BIN
WS2425/Data Science/VL/lecture_04.pdf
Normal file
Binary file not shown.
BIN
WS2425/Data Science/VL/lecture_05.pdf
Normal file
Binary file not shown.
BIN
WS2425/Data Science/VL/lecture_06.pdf
Normal file
Binary file not shown.
BIN
WS2425/Data Science/VL/lecture_07.pdf
Normal file
Binary file not shown.
11
WS2425/Data Science/VL/lecture_07_notes.md
Normal file
@@ -0,0 +1,11 @@
![[Pasted image 20241205160950.png]]
![[Pasted image 20241205161002.png]]
![[Pasted image 20241205161024.png]]
![[Pasted image 20241205161036.png]]
![[Pasted image 20241205161108.png]]
![[Pasted image 20241205161119.png]]
![[Pasted image 20241205161308.png]]
![[Pasted image 20241205161326.png]]
![[Pasted image 20241205161509.png]]
![[Pasted image 20241205161717.png]]
![[Pasted image 20241205161739.png]]
BIN
WS2425/Data Science/VL/lecture_08.pdf
Normal file
Binary file not shown.
BIN
WS2425/Data Science/VL/lecture_08_neu.pdf
Normal file
Binary file not shown.
14
WS2425/Data Science/VL/lecture_08_notes.md
Normal file
@@ -0,0 +1,14 @@
![[Pasted image 20241205234600.png]]
![[Pasted image 20241205234621.png]]
![[Pasted image 20241205234646.png]]
![[Pasted image 20241205234656.png]]
![[Pasted image 20241205234732.png]]
![[Pasted image 20241205234924.png]]
![[Pasted image 20241205234938.png]]
![[Pasted image 20241205234951.png]]
![[Pasted image 20241205235039.png]]
![[Pasted image 20241205235249.png]]
![[Pasted image 20241205235306.png]]
![[Pasted image 20241205235322.png]]
![[Pasted image 20241205235347.png]]
![[Pasted image 20241205235403.png]]
BIN
WS2425/Data Science/VL/lecture_09.pdf
Normal file
Binary file not shown.
6
WS2425/Data Science/VL/lecture_09_notes.md
Normal file
@@ -0,0 +1,6 @@
![[Pasted image 20241205235524.png]]
![[Pasted image 20241205235604.png]]
![[Pasted image 20241205235614.png]]
![[Pasted image 20241205235635.png]]
![[Pasted image 20241205235827.png]]
![[Pasted image 20241205235837.png]]