
Lecture 1

  1. Organizational Information

    • Page: 2
    • Notes:
  2. Introduction to Data Science

    • Page: 4-7
    • Notes:
      • Definition: Data science is about turning raw data into meaningful insights.
      • Interdisciplinary field combining statistics, computing, and domain knowledge.
      • Historical context of the term “data science” from 1962 to 2001.
  3. What is Data Science?

    • Page: 8-10
    • Notes:
      • Data science involves using methods and systems to extract knowledge from data.
      • The intersection of math/statistics, computer science, and domain knowledge is crucial.
  4. Practical Example of a Data Science Project: Monkey Detection

    • Page: 16-24
    • Notes:
      • Steps include understanding the problem, data collection, labeling, model training, and deployment.
  5. Related Fields in Data Science

    • Page: 14-15
    • Notes:
      • Data Engineering: Building systems for data collection and processing.
      • Data Analysis: Inspecting and transforming data to inform decisions.
  6. Tasks in Data Science

    • Page: 16
    • Notes:
      • Overview of different tasks within classical machine learning.
  7. Real-World Examples of Data Science Applications

    • Page: 25-32
    • Notes:
      • Applications include autonomous driving, face recognition, predictive maintenance, fraud detection, recommendation systems, and cancer detection.
  8. Overview of Lecture Content

    • Page: 34-35
    • Notes:
      • Basic topics include data basics, statistics, presentation techniques, and machine learning.
  9. Organizational: Schedule and Exam Information

    • Page: 38-40
    • Notes:
      • Lecture and exercise schedules, language of instruction, and exam details (written exam with bonus points for data analytics).
  10. Expectations from Students

    • Page: 42-43
    • Notes:
      • Emphasis on respect, professionalism, and willingness to participate.
  11. How to Continue in Data Science

    • Page: 46-48
    • Notes:
      • Suggested literature for further reading and related courses available in the curriculum.
  12. Summary & References

    • Page: 51-55
    • Notes:
      • Key takeaways: ability to explain data science and recognize its applications.
      • Important references for further study are provided.

Lecture 2

  • Data Science Definition: Creating knowledge from data using math, statistics, and computer science.

  • Data Types:

    • Structured: Follows a predefined model (e.g., tables).

    • Unstructured: Lacks explicit structure (e.g., text, images).

  • Data Categories:

    • Discrete vs. Continuous

    • Nominal, Ordinal, Interval, Ratio

    • Qualitative vs. Quantitative

  • Data Interchange Formats: Common formats include CSV and JSON (see the loading sketch after this list).

  • Data Trust: Depends on the data quality dimensions accuracy, completeness, consistency, timeliness, uniqueness, and validity.
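
A minimal sketch of the last two points, assuming pandas is available; the file names measurements.csv and measurements.json are hypothetical placeholders, not files from the lecture:

```python
import pandas as pd

# Load the same hypothetical dataset from the two interchange formats above.
df = pd.read_csv("measurements.csv")          # structured, tabular data
df_json = pd.read_json("measurements.json")   # records parsed from JSON

# Rough checks for two of the quality dimensions listed above:
completeness = 1 - df.isna().mean()           # share of non-missing values per column
uniqueness = 1 - df.duplicated().mean()       # share of rows that are not duplicates

print(completeness)
print(f"uniqueness: {uniqueness:.1%}")
```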

Lecture 3

  • Data Categories: Discrete, continuous, nominal, ordinal, interval, ratio, qualitative, and quantitative.

  • Data Interchange Formats: Common formats include CSV and JSON.

  • Data Quality Dimensions: Accuracy, completeness, consistency, timeliness, uniqueness, validity.

  • Data Types: Primary data (collected specifically for the current question, up to date) vs. secondary data (reused from existing sources, older but more economical).

  • Data Acquisition Methods: Capturing (sensors, surveys), retrieving (databases, APIs), collecting (web scraping); see the API sketch after this list.

  • FAIR and Open Data: Principles for sustainable data usage and their importance for scientific reproducibility.
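
To illustrate the "retrieving" acquisition method referenced above, a small sketch using the requests library; the endpoint URL is a made-up placeholder, not an API from the lecture:

```python
import requests

# Hypothetical REST endpoint; replace with an API you are actually allowed to query.
URL = "https://api.example.com/v1/measurements"

response = requests.get(URL, params={"limit": 100}, timeout=10)
response.raise_for_status()    # raise an exception on 4xx/5xx responses
records = response.json()      # parse the JSON body into Python objects

print(f"retrieved {len(records)} records")
```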

Lecture 4

  • Primary vs. Secondary Data: Primary data is collected for a specific purpose, while secondary data is sourced from existing datasets.

  • Data Collection Techniques: Includes scraping, which extracts data from websites, and considerations for legality and data protection.

  • Data Protection: Emphasizes GDPR compliance, anonymization, and pseudonymization of personal data (a pseudonymization sketch follows this list).

  • Statistics Basics: Introduces descriptive and inductive statistics, frequency distributions, and graphical representations like histograms and bar charts.

  • FAIR Principles: Focus on data findability, accessibility, interoperability, and reusability.
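
A minimal pseudonymization sketch for the data protection point above: identifiers are replaced by salted hashes, and the mapping is kept separately as the "additional information" needed for re-identification. The column names, salt, and sample values are illustrative assumptions:

```python
import hashlib
import pandas as pd

SALT = "replace-with-a-secret-salt"  # illustrative only; keep real salts out of source code

def pseudonymize(value: str) -> str:
    """Replace an identifier with a truncated, salted SHA-256 hash (a pseudonym)."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()[:16]

df = pd.DataFrame({"name": ["Alice", "Bob"], "heart_rate": [62, 71]})  # toy personal data

# Store the mapping separately; without it (and the salt), re-identification is not feasible.
mapping = {name: pseudonymize(name) for name in df["name"]}
df["name"] = df["name"].map(mapping)

print(df)
```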

Lecture 5

  • Data Scraping: Extracts data from program outputs; should be a last resort.

  • Anonymization: Removes personal information so individuals can no longer be identified; pseudonymization replaces identifiers so that re-identification is only possible with additional information.

  • Statistics Types: Descriptive, explorative, and inductive statistics.

  • Frequencies: Absolute and relative frequencies; visualized through histograms, pie charts, and bar charts.

  • Central Tendencies: Mode, median, and mean; box plots visualize data distribution.

  • Statistical Dispersion: Measures the spread of data; includes range, interquartile range, and empirical variance.
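
A short sketch computing the central tendency and dispersion measures from the last two items on a small made-up sample, using NumPy and the standard library:

```python
import numpy as np
from statistics import mode

x = np.array([2, 3, 3, 5, 7, 8, 12])  # made-up sample

print("mode:", mode(x.tolist()))
print("median:", np.median(x))
print("mean:", x.mean())

print("range:", x.max() - x.min())
q75, q25 = np.percentile(x, [75, 25])
print("interquartile range:", q75 - q25)
print("empirical variance:", x.var(ddof=1))  # ddof=1 normalizes by n-1 (sample variance)
```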

Lecture 6

  • Histograms: Visual representation of frequency for continuous data.

  • Cumulative Frequency: Measures total frequency up to a certain value.

  • Statistical Dispersion: Includes empirical variance and standard deviation.

  • Bivariate Analysis: Examines relationships between two variables.

  • Correlation Coefficients: Quantifies the strength and direction of relationships.

  • Contingency Tables: Display joint frequencies of two categorical variables.

  • Pearson Coefficient: Measures linear correlation between metric variables.

  • Ordinal Data: Can be analyzed using rank correlation methods.
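
A sketch for the last two items, assuming SciPy is installed; the two short series are made-up values:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up metric variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # roughly linear in x

pearson_r, _ = stats.pearsonr(x, y)       # linear correlation for metric variables
spearman_rho, _ = stats.spearmanr(x, y)   # rank correlation, also applicable to ordinal data

print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```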

Lecture 7

  • Correlation: Describes relationships between two variables using correlation coefficients based on variable types (nominal, ordinal, metric).

  • Contingency Tables: Used for two-dimensional frequency distributions; includes conditional frequencies and measures of association.

  • Probability Theory: Introduces random experiments, events, and Kolmogorov axioms; covers Laplace experiments and combinatorics.

  • Bayes Theorem: Explains conditional probability and its application in real-world scenarios, such as medical testing (worked through in the sketch after this list).

  • Outcome: Understanding of probability basics, combinatorial calculations, and Bayes theorem application.
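
A worked Bayes' theorem example for the medical testing scenario mentioned above; the prevalence, sensitivity, and specificity are made-up illustration values, not numbers from the lecture:

```python
# Assumed values: P(D) prevalence, P(+|D) sensitivity, P(-|no D) specificity
p_d = 0.01
sensitivity = 0.95
specificity = 0.90

# Law of total probability: P(+) = P(+|D)P(D) + P(+|no D)P(no D)
p_pos = sensitivity * p_d + (1 - specificity) * (1 - p_d)

# Bayes' theorem: P(D|+) = P(+|D) * P(D) / P(+)
p_d_given_pos = sensitivity * p_d / p_pos

print(f"P(disease | positive test) = {p_d_given_pos:.3f}")  # ~0.088 despite a seemingly good test
```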

Lecture 8

  • Random Experiment: Defined by well-defined conditions with unpredictable outcomes (e.g., dice throw).

  • Kolmogorov Axioms: Fundamental properties of probability measures.

  • Random Variables: Assign outcomes to numbers; can be discrete (countable values) or continuous (any value in an interval).

  • Distributions: Includes discrete (e.g., binomial, uniform) and continuous (e.g., normal) distributions; see the sketch after this list.

  • Expected Value & Variance: Key metrics for understanding random variables' behavior.

  • Applications: Used in statistical tests and linear regression.
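
A sketch contrasting a discrete and a continuous distribution with scipy.stats; the parameters (n = 10, p = 0.3, standard normal) are arbitrary examples:

```python
from scipy import stats

# Discrete: binomial distribution with arbitrary example parameters
binom = stats.binom(n=10, p=0.3)
print("P(X = 3) =", binom.pmf(3))
print("E[X] =", binom.mean(), " Var(X) =", binom.var())  # n*p = 3.0, n*p*(1-p) = 2.1

# Continuous: standard normal distribution
norm = stats.norm(loc=0, scale=1)
print("P(X <= 1.96) =", norm.cdf(1.96))                  # ~0.975
print("E[X] =", norm.mean(), " Var(X) =", norm.var())
```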

Lecture 9

  • Random Variables: Defined as functions mapping outcomes to real numbers.

  • Discrete vs. Continuous Distributions: Discrete has countable outcomes; continuous uses probability density functions.

  • Simple Linear Regression: Models the linear relationship between an independent variable (X) and a dependent variable (Y).

  • Key Concepts:

    • Residual Analysis: Evaluates fit of regression line.

    • Coefficient of Determination (R²): Indicates model fit; ranges from 0 to 1.

  • Estimation: Parameters (β₀, β₁) are estimated with the least squares method; see the sketch after this list.

  • Applications: Used in various fields to predict outcomes based on correlations.
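
A minimal least-squares sketch for the regression items above, using only NumPy on made-up data points:

```python
import numpy as np

# Made-up observations of an independent variable x and a dependent variable y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 2.8, 4.1, 4.9, 6.2, 6.8])

# Least-squares estimates of slope (beta1) and intercept (beta0)
beta1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0 = y.mean() - beta1 * x.mean()

# Residuals and coefficient of determination R^2
y_hat = beta0 + beta1 * x
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}, R^2 = {r_squared:.3f}")
```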