Simulate a sample of n = 100 random numbers in which 50 are drawn from a normal distribution N(5, 1) and the other 50 are drawn from a uniform distribution U(-100, 100). This represents a heavier "contamination" or "background noise" scenario compared to Project 7.
For this sample, calculate: (a) The sample mean. (b) The trimmed sample means, discarding 10%, 30%, and 60% of the data (from both ends). (c) The sample median.
Which estimate appears to be the most accurate estimate of the central tendency of the main (normal) distribution (i.e., closest to μ=5)?
Repeat this entire experiment 10,000 times. Compute the standard deviation for each of these five estimates based on the results from the 10,000 repetitions. The estimate with the smallest standard deviation (variance) is considered the best (most statistically efficient) under these heavy contamination conditions.
This project largely follows the methodology of Project 7, with adjustments to contamination levels and trimming percentages.
-
Parameters:
n: Total sample size = 100.n_normal: Number of normal samples = 50 (changed from 80).n_uniform: Number of uniform samples (contamination) = 50 (changed from 20).- Normal distribution: N(μ=5, σ=1) (same as Project 7).
- Uniform distribution: U(low=-100, high=100) (same as Project 7).
num_repetitions: Number of times to repeat the experiment = 10,000.- Trim proportions: 10% (0.1), 30% (0.3), and 60% (0.3 because
scipy.stats.trim_meantakes proportion from each end, so 0.3 from each end is 60% total. Note:proportiontocutfortrim_meanis for one side, but the problem asks for total. We must use 0.3 forproportiontocutto achieve 60% total discard if it means 30% from each end. If it means 60% from each end, that would discard 120% which is not possible. Assuming it means 30% from each end for a total of 60% discarded). Let's clarify the 60% trim. Thescipy.stats.trim_meanparameterproportiontocutis the proportion of observations to cut from each end of the sorted array. So, aproportiontocutof 0.3 means 30% from the low end and 30% from the high end, totaling 60% of data discarded. This seems correct for the problem statement.- Trim proportions for
scipy.stats.trim_mean: 0.1 (for 10% total), 0.3 (for 30% total), and for 60% total we need to be careful. Ifproportiontocutmeans total, then 0.6. If it means from each end, then 0.3. Thescipydoc says "Proportion of observations to cut from each end". So for 60% total discard, we need to discard 30% from each end. So,proportiontocutwill be 0.3 for this case. This means the 30% and 60% trimmed means might be the same if interpreted this way orproportiontocutneeds to be 0.05, 0.15, 0.3 for 10%, 30%, 60% total discarded. Let's re-read the README. "discarding 10%, 30%, and 60% of the data". This typically means total.scipy.stats.trim_mean'sproportiontocutis from each end. So, if we want to discard 10% total,proportiontocutshould be 0.05. For 30% total,proportiontocut= 0.15. For 60% total,proportiontocut= 0.3. - Corrected Trim proportions for
scipy.stats.trim_mean: 0.05 (for 10% total), 0.15 (for 30% total), 0.3 (for 60% total).
- Trim proportions for
-
Single Experiment Simulation (Identical to Project 7 logic):
- Generate
n_normal=50samples from N(5, 1). - Generate
n_uniform=50samples from U(-100, 100). - Combine to a single sample of
n=100. - Calculate statistics:
- Sample Mean.
- Trimmed Mean (10% total ->
proportiontocut=0.05). - Trimmed Mean (30% total ->
proportiontocut=0.15). - Trimmed Mean (60% total ->
proportiontocut=0.3). - Sample Median.
- Generate
-
Repeated Experiments (Identical to Project 7 logic):
- Repeat
num_repetitions(10,000) times. - Store the five statistics from each repetition.
- Repeat
-
Analysis (Identical to Project 7 logic, including plotting):
- Accuracy: Calculate average of each estimator, compare to μ=5.
- Precision: Calculate standard deviation of each estimator.
- Best Estimator: Smallest standard deviation.
- Visualization: Generate and save bar charts for average values, biases, and standard deviations of the estimators, similar to Project 7, naming files like
project_8_estimator_averages.png, etc.
-
Implementation Details:
- The code from
project_7.pycan be largely reused, adjusting parameters (especiallyN_NORMAL,N_UNIFORM, and the list ofTRIM_PROPORTIONSpassed tocalculate_estimatorsand used for naming inrun_simulationandanalyze_results). project_8.pywill be the new script.test_project_8.pyfor unit tests.
- The code from