vault backup: 2024-12-05 23:59:54

This commit is contained in:
2024-12-05 23:59:54 +01:00
parent 27b50b6b08
commit ea60f51b63
61 changed files with 8725 additions and 69 deletions

# 1
1. **Organizational Information**
   - **Page:** 2
   - **Notes:**
     - Contact: [klaus.kaiser@fh-dortmund.de](mailto:klaus.kaiser@fh-dortmund.de)
     - Room: B.2.04
     - Professor Klaus Kaiser has a background in data science across various industries.
2. **Introduction to Data Science**
   - **Page:** 4-7
   - **Notes:**
     - Definition: Data science is about turning raw data into meaningful insights.
     - Interdisciplinary field combining statistics, computing, and domain knowledge.
     - Historical context of the term “data science” from 1962 to 2001.
3. **What is Data Science?**
   - **Page:** 8-10
   - **Notes:**
     - Data science involves using methods and systems to extract knowledge from data.
     - The intersection of math/statistics, computer science, and domain knowledge is crucial.
4. **Practical Example of Data Science Project: Monkey Detection**
   - **Page:** 16-24
   - **Notes:**
     - Steps include understanding the problem, data collection, labeling, model training, and deployment.
5. **Related Fields in Data Science**
   - **Page:** 14-15
   - **Notes:**
     - Data Engineering: Building systems for data collection and processing.
     - Data Analysis: Inspecting and transforming data to inform decisions.
6. **Tasks in Data Science**
   - **Page:** 16
   - **Notes:**
     - Overview of different tasks within classical machine learning.
7. **Real-World Examples of Data Science Applications**
   - **Page:** 25-32
   - **Notes:**
     - Applications include autonomous driving, face recognition, predictive maintenance, fraud detection, recommendation systems, and cancer detection.
8. **Overview of Lecture Content**
   - **Page:** 34-35
   - **Notes:**
     - Basic topics include data basics, statistics, presentation techniques, and machine learning.
9. **Organizational: Schedule and Exam Information**
   - **Page:** 38-40
   - **Notes:**
     - Lecture and exercise schedules, language of instruction, and exam details (written exam with bonus points for data analytics).
10. **Expectations from Students**
    - **Page:** 42-43
    - **Notes:**
      - Emphasis on respect, professionalism, and willingness to participate.
11. **How to Continue in Data Science**
    - **Page:** 46-48
    - **Notes:**
      - Suggested literature for further reading and related courses available in the curriculum.
12. **Summary & References**
    - **Page:** 51-55
    - **Notes:**
      - Key takeaways: ability to explain data science and recognize its applications.
      - Important references for further study are provided.
# 2
- **Data Science Definition**: Creating knowledge from data using math, statistics, and computer science.
- **Data Types**:
  - **Structured**: Follows a predefined model (e.g., tables).
  - **Unstructured**: Lacks explicit structure (e.g., text, images).
- **Data Categories**:
  - Discrete vs. Continuous
  - Nominal, Ordinal, Interval, Ratio
  - Qualitative vs. Quantitative
- **Data Interchange Formats**: Common formats include CSV and JSON.
- **Data Trust**: Importance of data quality dimensions: accuracy, completeness, consistency, timeliness, uniqueness, validity.
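The two interchange formats can be illustrated with a small sketch. It parses a CSV table into records and serializes them as JSON, using only Python's standard `csv` and `json` modules. The sample table is hypothetical. CSV carries no type information, so every value is read as a string. JSON, by contrast, distinguishes strings from numbers.

```python
import csv
import io
import json

# Hypothetical structured data: a header row followed by two records.
CSV_TEXT = """id,name,age
1,Alice,34
2,Bob,29
"""


def csv_to_records(text: str) -> list[dict]:
    """Parse CSV text into one dict per row, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


def records_to_json(records: list[dict]) -> str:
    """Serialize the records as a JSON array of objects."""
    return json.dumps(records, indent=2)


records = csv_to_records(CSV_TEXT)
print(records_to_json(records))  # CSV values stay strings, e.g. "age": "34"
```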
# 3
- **Data Categories**: Discrete, continuous, nominal, ordinal, interval, ratio, qualitative, and quantitative.
- **Data Interchange Formats**: Common formats include CSV and JSON.
- **Data Quality Dimensions**: Accuracy, completeness, consistency, timeliness, uniqueness, validity.
- **Primary vs. Secondary Data**: Primary data (real-time, collected for a specific purpose) vs. secondary data (collected in the past, more economical to obtain).
- **Data Acquisition Methods**: Capturing (sensors, surveys), retrieving (databases, APIs), collecting (web scraping).
- **FAIR and Open Data**: Principles for sustainable data usage; important for scientific reproducibility.
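Some quality dimensions can be checked mechanically. The sketch below scores completeness, uniqueness, and validity as simple ratios between 0 and 1 over a hypothetical list of records. These ratio definitions and the age rule are illustrative assumptions, not the lecture's formal definitions. Accuracy, consistency, and timeliness usually need a reference source or timestamps, so they are not covered here.

```python
# Hypothetical records with deliberate quality problems:
# a missing email, a duplicated id, and two implausible ages.
rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 2, "email": "b@example.com", "age": -5},
    {"id": 3, "email": "c@example.com", "age": 200},
]


def completeness(rows, field):
    """Share of rows in which `field` is present and not None."""
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows)


def uniqueness(rows, field):
    """Share of distinct values of `field` among all rows."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)


def validity(rows, field, rule):
    """Share of non-missing values of `field` that satisfy `rule`."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return sum(1 for v in values if rule(v)) / len(values)


print(completeness(rows, "email"))                      # 0.75
print(uniqueness(rows, "id"))                           # 0.75
print(validity(rows, "age", lambda a: 0 <= a <= 130))   # 0.5
```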
# 4
- **Primary vs. Secondary Data**: Primary data is collected for a specific purpose, while secondary data is sourced from existing datasets.
- **Data Collection Techniques**: Includes scraping (extracting data from websites), along with its legal and data-protection considerations.
- **Data Protection**: Emphasizes GDPR compliance, anonymization, and pseudonymization of personal data.
- **Statistics Basics**: Introduces descriptive and inductive statistics, frequency distributions, and graphical representations like histograms and bar charts.
- **FAIR Principles**: Focus on data findability, accessibility, interoperability, and reusability.
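Pseudonymization and anonymization can be contrasted on hypothetical patient records.

- **Pseudonymization:** the sketch replaces the name with a random token. The token-to-name mapping goes into a separate key table, which is the additional information needed for re-identification.
- **Anonymization:** the sketch drops direct identifiers and coarsens age into 10-year bands.

This is a simplified illustration. The field names and the choice of dropped columns are assumptions. Under the GDPR, pseudonymized data still counts as personal data. Real anonymization must also consider re-identification through combinations of the remaining attributes.

```python
import secrets

# Hypothetical personal records.
people = [
    {"name": "Alice Meier", "age": 34, "city": "Dortmund", "diagnosis": "A"},
    {"name": "Bob Schulz", "age": 29, "city": "Bochum", "diagnosis": "B"},
]


def pseudonymize(records, id_field):
    """Replace `id_field` with a random token (stored under "pid").

    Returns the pseudonymized records and a key table mapping tokens back to
    the original identifiers; the key table must be stored separately.
    """
    key_table = {}
    out = []
    for r in records:
        token = secrets.token_hex(8)
        key_table[token] = r[id_field]
        rest = {k: v for k, v in r.items() if k != id_field}
        out.append({**rest, "pid": token})
    return out, key_table


def anonymize(records, drop=("name", "city")):
    """Drop direct identifiers and replace exact age by a 10-year band."""
    out = []
    for r in records:
        band = (r["age"] // 10) * 10
        kept = {k: v for k, v in r.items() if k not in drop and k != "age"}
        out.append({**kept, "age_band": f"{band}-{band + 9}"})
    return out


pseudo, key_table = pseudonymize(people, "name")
print(pseudo)             # names replaced by random tokens
print(anonymize(people))  # [{'diagnosis': 'A', 'age_band': '30-39'}, ...]
```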
# 5
- **Data Scraping**: Extracts data from the human-readable output of another program (e.g., a website); should be a last resort when no API or export is available.
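- **Anonymization**: Removes personal information so that individuals can no longer be identified; **pseudonymization** replaces identifiers so that re-identification is possible only with additional, separately stored information.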
- **Statistics Types**: Descriptive, explorative, and inductive statistics.
- **Frequencies**: Absolute and relative frequencies; visualized through histograms, pie charts, and bar charts.
- **Central Tendencies**: Mode, median, and mean; box plots visualize data distribution.
- **Statistical Dispersion**: Measures the spread of data; includes range, interquartile range, and empirical variance.
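Frequencies, central tendencies, and dispersion measures can be computed on a hypothetical sample with Python's standard `statistics` module. Two conventions differ between textbooks, lectures, and software, so the lecture's own definitions may differ:

- **Quartiles:** `statistics.quantiles` uses its "exclusive" method by default.
- **Empirical variance:** some sources divide by n and others by n − 1, so the sketch shows both.

The last lines print the five-number summary that a box plot draws.

```python
import statistics
from collections import Counter

data = [2, 3, 3, 5, 7, 7, 7, 9]  # hypothetical sample, n = 8

# Frequencies
abs_freq = Counter(data)                                     # absolute frequencies h(x)
rel_freq = {x: h / len(data) for x, h in abs_freq.items()}   # relative frequencies h(x) / n

# Central tendencies
mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # mean of the two middle values for even n
mean = statistics.mean(data)      # arithmetic mean

# Dispersion
value_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles ("exclusive" method by default)
iqr = q3 - q1                                # interquartile range
emp_var = statistics.pvariance(data)         # divides by n
sample_var = statistics.variance(data)       # divides by n - 1
std_dev = statistics.pstdev(data)            # square root of emp_var

# Five-number summary drawn by a box plot: min, Q1, median, Q3, max
print(min(data), q1, median, q3, max(data))  # 2 3.0 6.0 7.0 9
print(mode, mean, value_range, iqr, emp_var)  # 7 5.375 7 4.0 5.484375
```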
# 6
- **Histograms**: Visual representation of frequency for continuous data.
- **Cumulative Frequency**: Measures total frequency up to a certain value.
- **Statistical Dispersion**: Includes empirical variance and standard deviation.
- **Bivariate Analysis**: Examines relationships between two variables.
- **Correlation Coefficients**: Quantifies the strength and direction of relationships.
- **Contingency Tables**: Displays frequencies of categorical variables.
- **Pearson Coefficient**: Measures linear correlation between metric variables.
- **Ordinal Data**: Can be analyzed using rank correlation methods.
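Pearson's coefficient and a rank correlation can both be written directly from their definitions:

- **Pearson's r** is computed from centered sums.
- **Rank correlation** is obtained by applying Pearson's formula to the ranks, with tied values receiving their average rank. This is Spearman's coefficient.

That Spearman's coefficient is the lecture's rank method is an assumption, and the data are hypothetical. The example relationship is monotone but not linear, so the rank correlation is exactly 1 while Pearson's r is slightly below 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of equal values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(xs, ys):
    """Rank correlation: Pearson's formula applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = [1, 2, 3, 4]
y = [1, 4, 9, 16]  # y = x^2: monotone, but not linear
print(pearson(x, y))   # about 0.984
print(spearman(x, y))  # 1.0
```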
# 7
- **Correlation**: Describes relationships between two variables using correlation coefficients based on variable types (nominal, ordinal, metric).
- **Contingency Tables**: Used for two-dimensional frequency distributions; includes conditional frequencies and measures of association.
- **Probability Theory**: Introduces random experiments, events, and Kolmogorov axioms; covers Laplace experiments and combinatorics.
- **Bayes Theorem**: Explains conditional probability and its application in real-world scenarios, such as medical testing.
- **Outcome**: Understanding of probability basics, combinatorial calculations, and Bayes theorem application.
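Several of the ideas above can be tied together in one sketch, with all counts and test characteristics hypothetical:

- **Conditional relative frequency:** a two-dimensional contingency table is built from paired categorical observations, and the conditional relative frequency is read off it.
- **Bayes' theorem:** applied to a medical test, with the total probability of a positive result in the denominator.
- **Combinatorics:** a few counts use `math.comb` and `math.perm`.

The Bayes example shows why a positive result from a rare-disease test is often a false alarm.

```python
from collections import Counter
from math import comb, perm

# Hypothetical paired observations (group, outcome).
observations = (
    [("treated", "improved")] * 30 + [("treated", "not improved")] * 10
    + [("control", "improved")] * 15 + [("control", "not improved")] * 25
)
table = Counter(observations)                     # joint absolute frequencies h(x, y)
row_totals = Counter(x for x, _ in observations)  # marginal frequencies h(x)


def conditional_freq(y, given_x):
    """Conditional relative frequency f(y | x) = h(x, y) / h(x)."""
    return table[(given_x, y)] / row_totals[given_x]


def bayes_posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_given_d = sensitivity
    p_pos_given_not_d = 1 - specificity
    # law of total probability for P(positive)
    p_pos = p_pos_given_d * prior + p_pos_given_not_d * (1 - prior)
    return p_pos_given_d * prior / p_pos


print(conditional_freq("improved", "treated"))  # 0.75
print(conditional_freq("improved", "control"))  # 0.375

# Hypothetical test: 1% prevalence, 99% sensitivity, 95% specificity
print(bayes_posterior(0.01, 0.99, 0.95))  # about 0.167

# Combinatorics
print(comb(49, 6))  # 13983816: choose 6 of 49 without order or repetition
print(perm(5, 3))   # 60: ordered selections of 3 from 5 without repetition
print(6 ** 2)       # 36: ordered outcomes of two dice (with repetition)
```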
# 8
- **Random Experiment**: Carried out under well-defined conditions, but its outcome cannot be predicted in advance (e.g., a dice throw).
- **Kolmogorov Axioms**: Fundamental properties of probability measures.
- **Random Variables**: Assign a number to each outcome; can be discrete (countable values) or continuous (any value in an interval).
- **Distributions**: Includes discrete (e.g., binomial, uniform) and continuous (e.g., normal) distributions.
- **Expected Value & Variance**: The expected value describes the average (center) of a random variable; the variance describes how widely its values spread around that average.
- **Applications**: Used in statistical tests and linear regression.
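For a discrete random variable, the expected value and variance follow directly from its probability mass function. The sketch uses `fractions.Fraction` so that results are exact. It applies the definitions to three cases:

- a Laplace experiment with two fair dice, counting favorable over possible outcomes;
- a single fair die;
- a binomial distribution, whose results can be checked against the known formulas E = np and Var = np(1 − p).

The chosen parameters are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def dice_sum_prob(target):
    """Laplace probability that two fair dice sum to `target`."""
    outcomes = list(product(range(1, 7), repeat=2))
    favorable = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(favorable, len(outcomes))


def binomial_pmf(n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k) for k = 0..n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def expected_value(pmf):
    """E[X] = sum of k * P(X = k)."""
    return sum(k * pk for k, pk in pmf.items())


def variance(pmf):
    """Var[X] = sum of (k - E[X])^2 * P(X = k)."""
    mu = expected_value(pmf)
    return sum((k - mu) ** 2 * pk for k, pk in pmf.items())


print(dice_sum_prob(7))  # 1/6

die = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die, uniform distribution
print(expected_value(die), variance(die))  # 7/2 35/12

b = binomial_pmf(10, Fraction(1, 2))       # n = 10, p = 1/2
print(expected_value(b), variance(b))      # 5 5/2  (n*p and n*p*(1-p))
```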
# 9
- **Random Variables**: Defined as functions mapping outcomes to real numbers.
- **Discrete vs. Continuous Distributions**: Discrete has countable outcomes; continuous uses probability density functions.
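- **Simple Linear Regression**: Models the linear relationship between an independent variable (X) and a dependent variable (Y).
- **Key Concepts**:
  - **Residual Analysis**: Evaluates the fit of the regression line by examining the residuals (observed minus predicted Y).
  - **Coefficient of Determination (R²)**: Share of the variance of Y explained by the regression; ranges from 0 to 1, with values closer to 1 indicating a better fit.
  - **Estimation**: Parameters β0 (intercept) and β1 (slope) are estimated using the least squares method.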
- **Applications**: Used in various fields to predict outcomes based on correlations.
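Least squares estimation, residuals, and R² can be sketched in a few lines from the standard closed-form formulas:

- β1 = Sxy / Sxx
- β0 = ȳ − β1·x̄
- R² = 1 − SS_res / SS_tot

The data are hypothetical. The sketch assumes Y is not constant, since SS_tot would otherwise be zero. With an intercept in the model, the residuals sum to zero, which is a handy sanity check during residual analysis.

```python
def fit_simple_linear_regression(xs, ys):
    """Least squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = my - beta1 * mx
    return beta0, beta1


def residuals(xs, ys, beta0, beta1):
    """Observed minus predicted values."""
    return [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]


def r_squared(xs, ys, beta0, beta1):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum(e ** 2 for e in residuals(xs, ys, beta0, beta1))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical measurements
beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(beta0, beta1)                       # roughly 0.05 and 1.99
print(residuals(xs, ys, beta0, beta1))    # should sum to (almost) zero
print(r_squared(xs, ys, beta0, beta1))    # close to 1: the line fits well
```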