2024-12-08 18:19:24 +01:00
58 changed files with 3634 additions and 1 deletions


@@ -0,0 +1,487 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "753cfea7-6082-484d-a916-50554ca4cb9c",
"metadata": {},
"source": [
"# Crash Course Python\n",
"This session gives you a brief introduction to Python. Follow the instructions in this notebook step by step, and do not hesitate to ask questions! The instructions given are not complete, so try things on your own, play around with the code, and take a look at the official Python documentation:\n",
"https://docs.python.org/3.11/"
]
},
{
"cell_type": "markdown",
"id": "beb6f785-e6a0-48e8-b838-a0e5a0987f8e",
"metadata": {},
"source": [
"## Data types\n",
"Python has several built-in data types. A variable takes on the type of the value assigned to it. With the built-in function `type`, one can check the type of a given variable or value."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0f529564-bfee-4d55-8110-8be138863d75",
"metadata": {},
"outputs": [],
"source": [
"type(5)"
]
},
{
"cell_type": "markdown",
"id": "5f40d465-531f-420a-8442-6875e877bb5e",
"metadata": {},
"source": [
"### Boolean, Numbers & Strings\n",
"In Python, one can easily work with booleans, numbers, and strings.\n",
" - `True` and `False` are the Boolean constants. Logical operators are written out as words (`not False`, `True or False`, ...).\n",
" - Strings can be defined with \"...\" or '...', and with \"\"\"...\"\"\" for multi-line strings."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "962dfa3d-90b5-4ed8-b5f5-f32093f1daf7",
"metadata": {},
"outputs": [],
"source": [
"long_string = \"\"\"Hallo,\n",
"I'm a long string\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8592fb25-7882-456e-8418-b02ac9f0140c",
"metadata": {},
"outputs": [],
"source": [
"A = True\n",
"B = False\n",
"\n",
"not (A or B)"
]
},
{
"cell_type": "markdown",
"id": "3ac1a2e0-a7ec-43d2-90f6-220a717a3415",
"metadata": {},
"source": [
"**Task:** Play around with numbers, strings, and booleans. Concatenate some strings, define numbers, and perform some basic math."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db978452-f0d4-4511-9fba-e0d4231e3e6d",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "7142dffd-b4f0-40ea-8d0c-fee412705ab9",
"metadata": {},
"source": [
"Strings in particular have many built-in methods. For example, `upper()` converts a string to all uppercase letters."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1292afe-e088-4ca7-a32c-8026d68e37e5",
"metadata": {},
"outputs": [],
"source": [
"\"hallo\".upper()"
]
},
{
"cell_type": "markdown",
"id": "af78c85f-4e66-45f5-a114-4f8c992b8b07",
"metadata": {},
"source": [
"**Task:** Take a look at the documentation for strings and explore further methods that can be used directly."
]
},
{
"cell_type": "markdown",
"id": "b297a9b2-a72d-477d-81aa-97c4af6f4d39",
"metadata": {},
"source": [
"### Lists\n",
"One particularly useful data type is the list. Similar to an array, a list is an ordered sequence of values. In Python, a list is defined with `[]` and can store values of different data types."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "48d27fe8-609d-4f9a-9f50-03c221dad093",
"metadata": {},
"outputs": [],
"source": [
"[0, \"hallo\", False]"
]
},
{
"cell_type": "markdown",
"id": "6ce9c921-ff6a-4539-b0b8-b3c19f155bca",
"metadata": {},
"source": [
"A value in a list can be accessed by giving its position in brackets `[]`. In the following example, the second element (index 1) is requested. Elements can also be accessed from the end by using a negative index."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ed90d0e-9181-46ea-9d07-19cdc8d75538",
"metadata": {},
"outputs": [],
"source": [
"[0, \"hallo\", False][1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a3bea0f-1f45-4131-bb20-2fc30ce459a8",
"metadata": {},
"outputs": [],
"source": [
"[0, \"hallo\", False][-2]"
]
},
{
"cell_type": "markdown",
"id": "de6acc40-a65f-45cc-8cb6-8e77a5cdb79f",
"metadata": {},
"source": [
"One can also access a range of elements by using the notation `start:end` inside the brackets; the element at index `end` is not included. The result is again a list.\n",
"\n",
"**Task:** Take the list above and access its first two elements."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d1a5e9d-43a3-4793-9064-2b1f001923d4",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "3ebe85d9-5735-45c8-9aec-7e88732ce585",
"metadata": {},
"source": [
"Elements can be deleted from a list with the `del` statement."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cda33a05-45eb-4463-a6d8-ebad296719d9",
"metadata": {},
"outputs": [],
"source": [
"tmp_list = [0, \"hallo\", False]\n",
"del tmp_list[-2]\n",
"\n",
"tmp_list"
]
},
{
"cell_type": "markdown",
"id": "813ffc8e-9d2f-4cb1-bd2e-34b66a2541bc",
"metadata": {},
"source": [
"### Dictionaries\n",
"Besides lists (and tuples and sets, which are not covered here), Python offers dictionaries as an additional data type. A dictionary is a collection of key-value pairs, where a value is accessed by its key. A dictionary is defined by `{...}`; keys are typically strings, and values can be nearly anything."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74b09698-9fb5-4fe0-9402-cee7085f637d",
"metadata": {},
"outputs": [],
"source": [
"some_functions = {\n",
"    \"print\": print,\n",
"    \"input\": input,\n",
"}\n",
"\n",
"some_functions[\"print\"](\"dictionaries can store nearly everything\")"
]
},
{
"cell_type": "markdown",
"id": "b1306508-e1a2-43da-a995-4fb09acc5fba",
"metadata": {},
"source": [
"**Task:** Create a dictionary that stores a list of lectures for each semester."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d693c391-727f-4da9-83e5-a8efb0395d34",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "66f74d92-497a-433c-9026-267308114e40",
"metadata": {},
"source": [
"## If statement\n",
"The `if` statement has the following structure: `if CONDITION:`, where CONDITION is an expression that evaluates to `True` or `False`. After the `:`, an indented block begins, which is executed if CONDITION is `True`. Additional cases can be handled with `elif CONDITION:` (else if) and `else:`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb3755b6-109f-419b-8ed2-0102a89479b9",
"metadata": {},
"outputs": [],
"source": [
"tmp_number = 3\n",
"\n",
"if tmp_number > 4:\n",
"    print(f\"{tmp_number} is larger than 4\")\n",
"elif tmp_number < 4:\n",
"    print(f\"{tmp_number} is smaller than 4\")\n",
"else:\n",
"    print(f\"{tmp_number} equals 4\")"
]
},
{
"cell_type": "markdown",
"id": "c6ba6285-5228-4140-b03e-44733dddac05",
"metadata": {},
"source": [
"**Task:** Create an if statement which checks if a given number is even or odd. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a4d25b0-f476-4201-9f78-781a3d38b7c3",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "64fb2bdf-e2e2-428b-9ef1-bf79a296086d",
"metadata": {},
"source": [
"## For statement\n",
"The `for` statement has the following structure: `for VALUE in ITERATOR:`, where VALUE takes each value provided by ITERATOR in turn. After the `:`, an indented block begins, which is executed once per step. ITERATOR can be anything one can iterate over: functions such as `range` produce a sequence of numbers, but one can also iterate directly over a list."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a4c109c-0966-4343-b4f3-c4a692adf52a",
"metadata": {},
"outputs": [],
"source": [
"for i in range(10):\n",
"    print(i**2)"
]
},
{
"cell_type": "markdown",
"id": "5848b9ff-5a7d-4299-b9d2-fb0fc92fa455",
"metadata": {},
"source": [
"**Task:** Create a loop that iterates over every entry in a list of lectures and checks whether the lecture is called \"Data Science\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "debf84c2-79cf-4b2b-9420-28c1a9ededf1",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "7650dc1e-8521-4ae2-b63f-d6954805385d",
"metadata": {},
"source": [
"## Functions\n",
"A function has the following structure: `def NAME(VAR1, VAR2, VAR3=DEFAULTVAL3):`. After the `:`, an indented block starts, which is executed when the function is called. The parameters are a list of variables, each of which can have a default value. The returned value is given by a `return` statement."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d739bd2e-bfc7-4256-a9a5-947f7de0ada9",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f725e9f-9c99-417c-8911-439c96e33e24",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3eabab2-7e88-428f-b81f-2b910d61af74",
"metadata": {},
"outputs": [],
"source": [
"def my_add(x,y=2):\n",
"    return x+y\n",
"\n",
"print(my_add(2,4))\n",
"print(my_add(2))"
]
},
{
"cell_type": "markdown",
"id": "5f5fca24-7b2c-493c-bd0e-af649ac89a0e",
"metadata": {},
"source": [
"**Task:** Create a function that takes a list of numbers and returns a new list in which every number is multiplied by a factor."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f1d96ed-9af3-435c-8df1-d15c89a83e6c",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "e7b67ab6-b009-414b-8f35-da4b1ec09564",
"metadata": {},
"source": [
"## Classes\n",
"Take a look at the official Python tutorial to see how a class is defined in Python:\n",
"https://docs.python.org/3/tutorial/classes.html"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "430b950b-b9a8-4e42-8f71-9dd938959ac5",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "85f4f762-c825-4865-abab-74509038fbc2",
"metadata": {},
"source": [
"**Task:** Create a class that represents a lecture. Every lecture has a title and a list of students. Furthermore, add a method that adds students to a lecture and a method that returns the number of students."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1efbb8a3-938a-4ebe-b05c-88687a49a6d6",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "04d85fef-d5eb-4c9a-aea3-c4647777d58b",
"metadata": {},
"source": [
"## Modules\n",
"Python ships with many built-in modules, and a huge number of additional modules can be installed. A module is imported with the `import` statement and can be installed with the help of pip (a command-line tool)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8b0d5b5-ec05-4c29-948b-a9c09a7944c1",
"metadata": {},
"outputs": [],
"source": [
"import tqdm\n",
"\n",
"r = 0\n",
"for i in tqdm.tqdm(range(10000000)):\n",
"    r += i\n",
"\n",
"print(r)"
]
},
{
"cell_type": "markdown",
"id": "f5c40f75-f103-47c6-b5ca-a7fae135bd2f",
"metadata": {},
"source": [
"In Jupyter, a command-line command can be executed in a code cell by prefixing it with `!`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42c49615-29ef-40b3-9954-8cf589bd505b",
"metadata": {},
"outputs": [],
"source": [
"!cmd"
]
},
{
"cell_type": "markdown",
"id": "cbcc2c40-05c9-42cc-9281-51fd307d44f9",
"metadata": {},
"source": [
"**Task:** Take a look at PyPI and search for an interesting module. Install the module and try it out. Some example modules:\n",
" - tensorflow (deep learning)\n",
" - numpy (numerical methods)\n",
" - scikit-learn (machine learning; imported as `sklearn`)\n",
" - ..."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93cf0ef8-c5a6-4e6b-b7d8-9d55728e4ff4",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

Binary file not shown.

@@ -0,0 +1,185 @@
# 1
1. **Organizational Information**
   - **Page:** 2
   - **Notes:**
     - Contact: [klaus.kaiser@fh-dortmund.de](mailto:klaus.kaiser@fh-dortmund.de)
     - Room: B.2.04
     - Professor Klaus Kaiser has a background in data science across various industries.
2. **Introduction to Data Science**
   - **Page:** 4-7
   - **Notes:**
     - Definition: Data science is about turning raw data into meaningful insights.
     - Interdisciplinary field combining statistics, computing, and domain knowledge.
     - Historical context of the term “data science” from 1962 to 2001.
3. **What is Data Science?**
   - **Page:** 8-10
   - **Notes:**
     - Data science involves using methods and systems to extract knowledge from data.
     - The intersection of math/statistics, computer science, and domain knowledge is crucial.
4. **Practical Example of Data Science Project: Monkey Detection**
   - **Page:** 16-24
   - **Notes:**
     - Steps include understanding the problem, data collection, labeling, model training, and deployment.
5. **Related Fields in Data Science**
   - **Page:** 14-15
   - **Notes:**
     - Data Engineering: Building systems for data collection and processing.
     - Data Analysis: Inspecting and transforming data to inform decisions.
6. **Tasks in Data Science**
   - **Page:** 16
   - **Notes:**
     - Overview of different tasks within classical machine learning.
7. **Real-World Examples of Data Science Applications**
   - **Page:** 25-32
   - **Notes:**
     - Applications include autonomous driving, face recognition, predictive maintenance, fraud detection, recommendation systems, and cancer detection.
8. **Overview of Lecture Content**
   - **Page:** 34-35
   - **Notes:**
     - Basic topics include data basics, statistics, presentation techniques, and machine learning.
9. **Organizational: Schedule and Exam Information**
   - **Page:** 38-40
   - **Notes:**
     - Lecture and exercise schedules, language of instruction, and exam details (written exam with bonus points for data analytics).
10. **Expectations from Students**
    - **Page:** 42-43
    - **Notes:**
      - Emphasis on respect, professionalism, and willingness to participate.
11. **How to Continue in Data Science**
    - **Page:** 46-48
    - **Notes:**
      - Suggested literature for further reading and related courses available in the curriculum.
12. **Summary & References**
    - **Page:** 51-55
    - **Notes:**
      - Key takeaways: ability to explain data science and recognize its applications.
      - Important references for further study are provided.
# 2
- **Data Science Definition**: Creating knowledge from data using math, statistics, and computer science.
- **Data Types**:
  - **Structured**: Follows a predefined model (e.g., tables).
  - **Unstructured**: Lacks explicit structure (e.g., text, images).
- **Data Categories**:
  - Discrete vs. Continuous
  - Nominal, Ordinal, Interval, Ratio
  - Qualitative vs. Quantitative
- **Data Interchange Formats**: Common formats include CSV and JSON.
- **Data Trust**: Importance of data quality dimensions: accuracy, completeness, consistency, timeliness, uniqueness, validity.
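The difference between the two interchange formats can be seen in a small round trip. The following is a minimal sketch using only Python's standard library; the records and field names are made up for illustration and do not come from the lecture.

```python
import csv
import io
import json

# Hypothetical structured records (rows of a table); the values are made up.
records = [
    {"lecture": "Data Science", "semester": 3},
    {"lecture": "Statistics", "semester": 2},
]

# CSV: one header row, then one line per record.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["lecture", "semester"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

# Reading CSV back yields strings only; numeric types must be restored by hand.
rows_from_csv = list(csv.DictReader(io.StringIO(csv_text)))

# JSON keeps numbers as numbers and also supports nested structures.
json_text = json.dumps(records)
rows_from_json = json.loads(json_text)

print(type(rows_from_csv[0]["semester"]))   # str
print(type(rows_from_json[0]["semester"]))  # int
```

CSV is a good fit for flat tables, while JSON is usually the better choice when types or nesting need to survive the exchange.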
# 3
- **Data Categories**: Discrete, continuous, nominal, ordinal, interval, ratio, qualitative, and quantitative.
- **Data Interchange Formats**: Common formats include CSV and JSON.
- **Data Quality Dimensions**: Accuracy, completeness, consistency, timeliness, uniqueness, validity.
- **Data Types**: Primary (real-time, specific) vs. secondary (past, economical).
- **Data Acquisition Methods**: Capturing (sensors, surveys), retrieving (databases, APIs), collecting (web scraping).
- **FAIR and Open Data**: Principles for sustainable data usage and importance in scientific reproducibility.
# 4
- **Primary vs. Secondary Data**: Primary data is collected for a specific purpose, while secondary data is sourced from existing datasets.
- **Data Collection Techniques**: Includes scraping, which extracts data from websites, and considerations for legality and data protection.
- **Data Protection**: Emphasizes GDPR compliance, anonymization, and pseudonymization of personal data.
- **Statistics Basics**: Introduces descriptive and inductive statistics, frequency distributions, and graphical representations like histograms and bar charts.
- **FAIR Principles**: Focus on data findability, accessibility, interoperability, and reusability.
# 5
- **Data Scraping**: Extracts data from program outputs; should be a last resort.
- **Anonymization**: Removes personal info to protect identity; pseudonymization allows identification with additional info.
- **Statistics Types**: Descriptive, exploratory, and inductive statistics.
- **Frequencies**: Absolute and relative frequencies; visualized through histograms, pie charts, and bar charts.
- **Central Tendencies**: Mode, median, and mean; box plots visualize data distribution.
- **Statistical Dispersion**: Measures spread of data; includes range, interquartile range, and empirical variance.
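The measures of central tendency and dispersion listed above can be computed with Python's `statistics` module. This is a sketch on a made-up sample. Whether the lecture's empirical variance divides by n or by n - 1 is not stated in these notes, so both variants are shown, and quartile conventions may also differ from the one used here.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 9]  # made-up sample, not from the lecture

# Central tendency
mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle value of the sorted data
mean = statistics.mean(data)      # arithmetic mean

# Dispersion
data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1                                 # interquartile range
var_n = statistics.pvariance(data)            # variance dividing by n
var_n_minus_1 = statistics.variance(data)     # variance dividing by n - 1

print(mode, median, round(mean, 3))
print(data_range, iqr, round(var_n, 3), round(var_n_minus_1, 3))
```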
# 6
- **Histograms**: Visual representation of frequency for continuous data.
- **Cumulative Frequency**: Measures total frequency up to a certain value.
- **Statistical Dispersion**: Includes empirical variance and standard deviation.
- **Bivariate Analysis**: Examines relationships between two variables.
- **Correlation Coefficients**: Quantifies the strength and direction of relationships.
- **Contingency Tables**: Displays frequencies of categorical variables.
- **Pearson Coefficient**: Measures linear correlation between metric variables.
- **Ordinal Data**: Can be analyzed using rank correlation methods.
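To make the difference between the Pearson coefficient and a rank correlation concrete, here is a minimal sketch in plain Python. The function names and the data are hypothetical, and the rank helper assumes there are no tied values. The rank correlation shown is Spearman's, computed as the Pearson coefficient of the ranks.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Assign ranks 1..n by size (assumption: no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Rank correlation: Pearson coefficient applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(round(pearson(x, y), 3))   # below 1: the relationship is not perfectly linear
print(round(spearman(x, y), 3))  # 1.0: the ordering agrees perfectly
```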
# 7
- **Correlation**: Describes relationships between two variables using correlation coefficients based on variable types (nominal, ordinal, metric).
- **Contingency Tables**: Used for two-dimensional frequency distributions; includes conditional frequencies and measures of association.
- **Probability Theory**: Introduces random experiments, events, and Kolmogorov axioms; covers Laplace experiments and combinatorics.
- **Bayes Theorem**: Explains conditional probability and its application in real-world scenarios, such as medical testing.
- **Outcome**: Understanding of probability basics, combinatorial calculations, and Bayes theorem application.
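The medical-testing application of Bayes' theorem can be worked through numerically. The sketch below uses assumed numbers (1% prevalence, 99% sensitivity, 95% specificity) that are chosen for illustration and are not taken from the lecture.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity
    # Law of total probability: P(positive)
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / p_pos

# Assumed values for illustration only.
p = posterior(prior=0.01, sensitivity=0.99, specificity=0.95)
print(round(p, 4))  # about 0.1667
```

Even with a fairly accurate test, only about one in six positive results belongs to a sick person here, because the disease is rare and false positives from the large healthy group dominate.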
# 8
- **Random Experiment**: Defined by well-defined conditions with unpredictable outcomes (e.g., dice throw).
- **Kolmogorov Axioms**: Fundamental properties of probability measures.
- **Random Variables**: Assign outcomes to numbers; can be discrete (countable values) or continuous (any value in an interval).
- **Distributions**: Includes discrete (e.g., binomial, uniform) and continuous (e.g., normal) distributions.
- **Expected Value & Variance**: Key metrics for understanding random variables' behavior.
- **Applications**: Used in statistical tests and linear regression.
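Expected value and variance of a discrete random variable follow directly from its probability mass function. As a sketch, the following computes both for a binomial distribution with assumed parameters n = 10 and p = 0.3, and the results can be compared with the closed forms n·p and n·p·(1 - p).

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomially distributed X with parameters n and p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # assumed parameters for illustration
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# E[X] = sum of k * P(X = k); Var[X] = sum of (k - E[X])^2 * P(X = k)
expected = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(pmf))

print(round(sum(pmf), 6))   # probabilities sum to 1
print(round(expected, 6))   # matches n * p = 3
print(round(variance, 6))   # matches n * p * (1 - p) = 2.1
```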
# 9
- **Random Variables**: Defined as functions mapping outcomes to real numbers.
- **Discrete vs. Continuous Distributions**: Discrete has countable outcomes; continuous uses probability density functions.
- **Simple Linear Regression**: Models the linear relationship between an independent (X) and a dependent (Y) variable.
- **Key Concepts**:
  - **Residual Analysis**: Evaluates fit of regression line.
  - **Coefficient of Determination (R²)**: Indicates model fit; ranges from 0 to 1.
  - **Estimation**: Parameters (β0, β1) estimated using least squares method.
- **Applications**: Used in various fields to predict outcomes based on correlations.
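The least squares estimates and R² can be computed in a few lines of plain Python. This is a minimal sketch with made-up data; the function names `fit_line` and `r_squared` are hypothetical helpers, not part of any library.

```python
def fit_line(xs, ys):
    """Least squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

def r_squared(xs, ys, beta0, beta1):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up data lying close to the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = fit_line(xs, ys)
residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]

print(round(beta0, 3), round(beta1, 3))
print(round(r_squared(xs, ys, beta0, beta1), 4))
```

The residuals computed at the end are the starting point for a residual analysis: with an intercept in the model they sum to zero, and any visible pattern in them suggests that a straight line is not an adequate fit.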


@@ -0,0 +1,11 @@
![[Pasted image 20241205160950.png]]
![[Pasted image 20241205161002.png]]
![[Pasted image 20241205161024.png]]
![[Pasted image 20241205161036.png]]
![[Pasted image 20241205161108.png]]
![[Pasted image 20241205161119.png]]
![[Pasted image 20241205161308.png]]
![[Pasted image 20241205161326.png]]
![[Pasted image 20241205161509.png]]
![[Pasted image 20241205161717.png]]
![[Pasted image 20241205161739.png]]


@@ -0,0 +1,14 @@
![[Pasted image 20241205234600.png]]
![[Pasted image 20241205234621.png]]
![[Pasted image 20241205234646.png]]
![[Pasted image 20241205234656.png]]
![[Pasted image 20241205234732.png]]
![[Pasted image 20241205234924.png]]
![[Pasted image 20241205234938.png]]
![[Pasted image 20241205234951.png]]
![[Pasted image 20241205235039.png]]
![[Pasted image 20241205235249.png]]
![[Pasted image 20241205235306.png]]
![[Pasted image 20241205235322.png]]
![[Pasted image 20241205235347.png]]
![[Pasted image 20241205235403.png]]


@@ -0,0 +1,6 @@
![[Pasted image 20241205235524.png]]
![[Pasted image 20241205235604.png]]
![[Pasted image 20241205235614.png]]
![[Pasted image 20241205235635.png]]
![[Pasted image 20241205235827.png]]
![[Pasted image 20241205235837.png]]