To pseudonymize the "name" values in the given dataset using the specified hash function, let's follow these steps:
1. Convert each character in the name to its alphabetic position using the alph function, where A = 1, B = 2, C = 3, and so on, up to Z = 26.
2. Compute the hash value using the formula
   h(s) = \sum_{i} \text{alph}(s_i) \mod 13
   where ( s_i ) is the ( i )-th character in the string.
3. Add a salt to the hash value. The salt is a random value that is combined with the hash to make the pseudonyms harder to reverse.
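The steps above can be sketched in Python; the function names alph and h simply mirror the formula:

```python
def alph(c):
    """Alphabetic position of a letter: A = 1, B = 2, ..., Z = 26."""
    return ord(c.upper()) - ord("A") + 1

def h(s):
    """Sum of the alphabetic positions of s, modulo 13."""
    return sum(alph(c) for c in s) % 13

print(h("Franz"))  # → 0
```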
Let's apply these steps to each name in the dataset.
Step-by-Step Calculation
For each name, we'll calculate the hash value before adding a salt.
1. Name: Franz
- Positions: F = 6, R = 18, A = 1, N = 14, Z = 26
- Sum of positions: ( 6 + 18 + 1 + 14 + 26 = 65 )
- Hash: ( 65 \mod 13 = 0 )
2. Name: Antje
- Positions: A = 1, N = 14, T = 20, J = 10, E = 5
- Sum of positions: ( 1 + 14 + 20 + 10 + 5 = 50 )
- Hash: ( 50 \mod 13 = 11 )
3. Name: Alex
- Positions: A = 1, L = 12, E = 5, X = 24
- Sum of positions: ( 1 + 12 + 5 + 24 = 42 )
- Hash: ( 42 \mod 13 = 3 )
Pseudonymized Dataset with Hash Values
We can now add the calculated hash values to the dataset. We'll also mention that a salt should be added to these hash values for increased security in a real implementation:
| Original Name | Height | Shoe Size | Hash Value (no salt) |
|---|---|---|---|
| Franz | 165 | 40 | 0 |
| Antje | 170 | 39 | 11 |
| Alex | 174 | 42 | 3 |
To finalize the pseudonymization process, a random salt should be combined with these hash values, ensuring that even if two identical names are processed, they won't result in the same pseudonymized value.
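The exercise does not fix how the salt is combined with the hash; one plausible sketch is to prepend a random alphabetic salt to the name before hashing. The helper salted_hash and the 4-letter salt length are assumptions for illustration:

```python
import secrets
import string

def alph(c):
    # A = 1, B = 2, ..., Z = 26
    return ord(c.upper()) - ord("A") + 1

def salted_hash(name, salt=None):
    """Prepend a random alphabetic salt to the name before hashing.

    Prepending the salt to the input string is an assumed scheme, not
    prescribed by the exercise; the salt must be stored alongside the
    pseudonym so lookups remain possible.
    """
    if salt is None:
        salt = "".join(secrets.choice(string.ascii_uppercase) for _ in range(4))
    combined = salt + name
    return salt, sum(alph(c) for c in combined) % 13
```

With a fresh random salt per record, two identical names produce different pseudonymized values, as required.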
Now let's turn to the next exercise and go through its steps.
Given Data
The weights of the luggage are:
22, 44, 11, 19, 21, 17, 17, 11, 11, 19, 22, 17
Categories:
- Light: less than 15 kg
- Normal: between 15 and 20 kg (inclusive)
- Overweight: more than 20 kg
Part (a): Absolute and Relative Frequencies
Step 1: Categorize the weights
We will count how many weights fall into each category.
| Category | Weight Values | Frequency (Absolute) |
|---|---|---|
| Light | 11, 11, 11 | 3 |
| Normal | 19, 17, 17, 17, 19 | 5 |
| Overweight | 22, 44, 21, 22 | 4 |
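The categorization in the table can be reproduced with a short Python snippet (the helper name category is chosen here for illustration):

```python
from collections import Counter

weights = [22, 44, 11, 19, 21, 17, 17, 11, 11, 19, 22, 17]

def category(w):
    # Light: < 15 kg; Normal: 15-20 kg inclusive; Overweight: > 20 kg
    if w < 15:
        return "Light"
    elif w <= 20:
        return "Normal"
    else:
        return "Overweight"

counts = Counter(category(w) for w in weights)
print(counts)  # Light: 3, Normal: 5, Overweight: 4
```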
Step 2: Calculate Relative Frequencies
Relative frequency is calculated as:
\text{Relative Frequency} = \frac{\text{Absolute Frequency}}{\text{Total Number of Weights}}
Total number of weights = 12
| Category | Absolute Frequency | Relative Frequency |
|---|---|---|
| Light | 3 | ( \frac{3}{12} = 0.25 ) or 25% |
| Normal | 5 | ( \frac{5}{12} \approx 0.42 ) or 42% |
| Overweight | 4 | ( \frac{4}{12} \approx 0.33 ) or 33% |
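The relative frequencies follow by dividing each absolute count by the total of 12, which can be checked as follows:

```python
weights = [22, 44, 11, 19, 21, 17, 17, 11, 11, 19, 22, 17]

absolute = {"Light": sum(w < 15 for w in weights),
            "Normal": sum(15 <= w <= 20 for w in weights),
            "Overweight": sum(w > 20 for w in weights)}

# relative frequency = absolute frequency / total number of weights
relative = {cat: n / len(weights) for cat, n in absolute.items()}
print(relative)  # Light 0.25, Normal ≈ 0.42, Overweight ≈ 0.33
```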
Part (b): Empirical Distribution Function and Question
The empirical distribution function (EDF) represents the cumulative frequency of the dataset.
Let's arrange the weights in increasing order:
11, 11, 11, 17, 17, 17, 19, 19, 21, 22, 22, 44
The EDF for these weights can be expressed as:
- Less than or equal to 11 kg: 3/12 = 0.25
- Less than or equal to 17 kg: 6/12 = 0.5
- Less than or equal to 19 kg: 8/12 ≈ 0.67
- Less than or equal to 21 kg: 9/12 = 0.75
- Less than or equal to 22 kg: 11/12 ≈ 0.92
- Less than or equal to 44 kg: 12/12 = 1
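The EDF values above can be computed directly from the definition (share of observations less than or equal to x):

```python
weights = [22, 44, 11, 19, 21, 17, 17, 11, 11, 19, 22, 17]

def edf(data, x):
    """Empirical distribution function: F_n(x) = #{w <= x} / n."""
    return sum(1 for w in data if w <= x) / len(data)

for x in [11, 17, 19, 21, 22, 44]:
    print(x, round(edf(weights, x), 2))
```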
Question: What is the proportion of weights that are less than 18 kg or more than 23 kg?
- Weights less than 18 kg: 6 out of 12 = ( \frac{6}{12} = 0.5 ) or 50%
- Weights more than 23 kg: 1 out of 12 (only the 44 kg item) = ( \frac{1}{12} \approx 0.08 ) or 8%
Proportion of weights that are less than 18 kg or more than 23 kg:
\frac{6}{12} + \frac{1}{12} = \frac{7}{12} \approx 0.58 \text{ or 58%}
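Since the two events are disjoint, the proportion can also be counted in a single pass:

```python
weights = [22, 44, 11, 19, 21, 17, 17, 11, 11, 19, 22, 17]

# Count weights strictly below 18 kg or strictly above 23 kg
p = sum(1 for w in weights if w < 18 or w > 23) / len(weights)
print(round(p, 2))  # 7/12 ≈ 0.58
```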
Visualization: Bar Chart and Histogram
Finally, the data can be visualized with a bar chart of the absolute frequencies per category and a histogram of the raw weights.
These visualizations help in understanding how the weights are distributed across the categories.
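A minimal matplotlib sketch for both charts might look as follows; the 5 kg histogram bins and the output filename are choices made here, not given by the exercise:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display window
import matplotlib.pyplot as plt

weights = [22, 44, 11, 19, 21, 17, 17, 11, 11, 19, 22, 17]
categories = ["Light", "Normal", "Overweight"]
counts = [sum(w < 15 for w in weights),
          sum(15 <= w <= 20 for w in weights),
          sum(w > 20 for w in weights)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))
ax1.bar(categories, counts)
ax1.set_title("Absolute frequency per category")
ax1.set_ylabel("Count")
ax2.hist(weights, bins=range(10, 50, 5), edgecolor="black")
ax2.set_title("Histogram of luggage weights")
ax2.set_xlabel("Weight (kg)")
fig.tight_layout()
fig.savefig("luggage_weights.png")
```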