Estimator Set
- class tailestim.estimators.estimator_set.TailEstimatorSet(data=None, output_file_path=None, number_of_bins=30, r_smooth=2, alpha=0.6, hsteps=200, bootstrap_flag=True, t_bootstrap=0.5, r_bootstrap=500, diagnostic_plots=False, eps_stop=1.0, theta1=0.01, theta2=0.99, verbose=False, noise_flag=True, p_noise=1, savedata=False, auto_plot=False, base_seed=None)
Bases: object
Class for running estimation with multiple estimator methods at once and creating plots that compare them.
- Parameters:
- data: np.ndarray, optional
The data to plot.
- output_file_path: str, optional
File path to which plots are saved. If None, the figure is not saved.
- number_of_bins: int, default=30
Number of logarithmic bins used for the degree distribution.
- r_smooth: int, default=2
Integer parameter controlling the width of the smoothing window. Typically a small value such as 2 or 3.
- alpha: float, default=0.6
Parameter controlling the amount of “smoothing” for the kernel-type estimator. Should be greater than 0.5.
- hsteps: int, default=200
Parameter controlling the number of bandwidth steps of the kernel-type estimator.
- bootstrap_flag: bool, default=True
Flag to switch the double-bootstrap procedure on or off.
- t_bootstrap: float, default=0.5
Parameter controlling the size of the 2nd bootstrap, defined by n2 = n * t_bootstrap.
- r_bootstrap: int, default=500
Number of bootstrap resamplings for the 1st and 2nd bootstraps.
- diagnostic_plots: bool, default=False
Flag to switch generation of AMSE diagnostic plots on or off.
- eps_stop: float, default=1.0
Parameter controlling the range of AMSE minimization, defined as the fraction of order statistics considered during the AMSE minimization step.
- theta1: float, default=0.01
Lower bound of the plotting range, defined as k_min = ceil(n^theta1). Overridden if the plots behave badly within this range.
- theta2: float, default=0.99
Upper bound of the plotting range, defined as k_max = floor(n^theta2). Overridden if the plots behave badly within this range.
- verbose: bool, default=False
Flag controlling bootstrap verbosity.
- noise_flag: bool, default=True
Switch on/off uniform noise in the range [-5*10^(-p), 5*10^(-p)], with p = p_noise, that is added to each data point. Used for integer-valued sequences.
- p_noise: int, default=1
Integer parameter p controlling the noise amplitude.
- savedata: bool, default=False
Flag to save data files in the directory with the plots.
- auto_plot: bool, default=False
Whether to create the plots immediately upon initialization.
- base_seed: None | SeedSequence | BitGenerator | Generator | RandomState, default=None
Base random seed for reproducible bootstrapping. Only used by methods that involve bootstrapping.
- fit(data)
Fit the estimators to the data.
- Parameters:
- data: np.ndarray
The data to fit the estimators to.
- Returns:
- self: TailEstimatorSet
The fitted estimator set.
- get_params()
Get the parameters used for plotting.
- Returns:
- Dict[str, Any]
Dictionary of parameters used for plotting.
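The formulas quoted above for theta1/theta2, t_bootstrap and noise_flag/p_noise can be checked with plain arithmetic. The sketch below is a standard-library illustration under stated assumptions, not tailestim's implementation: the function names are hypothetical, truncating n2 to an integer is an assumption, and Python's `random` stands in for whatever generator the library actually uses.

```python
from __future__ import annotations

import math
import random


def plotting_range(n: int, theta1: float = 0.01, theta2: float = 0.99) -> tuple[int, int]:
    """Hypothetical helper: k_min = ceil(n^theta1), k_max = floor(n^theta2)."""
    k_min = math.ceil(n ** theta1)
    k_max = math.floor(n ** theta2)
    return k_min, k_max


def second_bootstrap_size(n: int, t_bootstrap: float = 0.5) -> int:
    """Hypothetical helper: n2 = n * t_bootstrap, truncated to an int (an assumption)."""
    return int(n * t_bootstrap)


def add_uniform_noise(data: list[float], p_noise: int = 1, seed: int | None = None) -> list[float]:
    """Hypothetical helper: add uniform noise from [-5*10^(-p), 5*10^(-p)], p = p_noise."""
    rng = random.Random(seed)  # stdlib RNG, used here only for illustration
    half_width = 5 * 10 ** (-p_noise)
    return [x + rng.uniform(-half_width, half_width) for x in data]


if __name__ == "__main__":
    n = 10_000
    print(plotting_range(n))          # (2, 9120)
    print(second_bootstrap_size(n))   # 5000
    print(add_uniform_noise([1, 2, 2, 3], p_noise=1, seed=42))
```

With the default p_noise=1, every value moves by at most 0.5, which breaks ties between repeated integer values while keeping each perturbation within half a unit.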
Examples
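```python
import matplotlib.pyplot as plt
from tailestim import TailData, TailEstimatorSet

# Load the 'CAIDA_KONECT' example dataset
data = TailData(name='CAIDA_KONECT').data

# Fit all estimators with the default parameters
estim_set = TailEstimatorSet()
estim_set.fit(data)

# Compare the estimators in one figure
estim_set.plot()
plt.show()

# Show the diagnostic plots
estim_set.plot_diagnostics()
plt.show()
```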
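The number_of_bins parameter sets how many logarithmic bins are used for the degree distribution. A minimal standard-library sketch of log-binning follows, assuming bins evenly spaced in log10 between the smallest and largest positive value. The function names are hypothetical, and tailestim's own binning may choose edges or normalization differently.

```python
from __future__ import annotations

import bisect
import math


def log_bin_edges(values: list[float], number_of_bins: int = 30) -> list[float]:
    """Hypothetical helper: number_of_bins + 1 edges evenly spaced in log10."""
    positive = [v for v in values if v > 0]
    if not positive or min(positive) == max(positive):
        raise ValueError("need at least two distinct positive values")
    lo, hi = math.log10(min(positive)), math.log10(max(positive))
    step = (hi - lo) / number_of_bins
    edges = [10 ** (lo + i * step) for i in range(number_of_bins + 1)]
    # Pin the outer edges to the exact data extremes to avoid floating-point drift.
    edges[0], edges[-1] = min(positive), max(positive)
    return edges


def log_bin_counts(values: list[float], number_of_bins: int = 30) -> list[int]:
    """Hypothetical helper: count positive values per bin [e_i, e_{i+1}); last bin is closed."""
    edges = log_bin_edges(values, number_of_bins)
    counts = [0] * number_of_bins
    for v in values:
        if v <= 0:
            continue  # non-positive values cannot be placed on a log axis
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(idx, number_of_bins - 1)] += 1  # the maximum lands in the last bin
    return counts
```

Spacing bins evenly on a log scale keeps the number of bins per decade constant, which suits heavy-tailed data spanning several orders of magnitude.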