Estimator Set

class tailestim.estimators.estimator_set.TailEstimatorSet(data=None, output_file_path=None, number_of_bins=30, r_smooth=2, alpha=0.6, hsteps=200, bootstrap_flag=True, t_bootstrap=0.5, r_bootstrap=500, diagnostic_plots=False, eps_stop=1.0, theta1=0.01, theta2=0.99, verbose=False, noise_flag=True, p_noise=1, savedata=False, auto_plot=False, base_seed=None)

Bases: object

Class for running several estimator methods on the same data at once and creating a plot that compares their results.

Parameters:
data : np.ndarray

The data to plot.

output_file_path : str, optional

File path to which plots should be saved. If None, the figure is not saved.

number_of_bins : int, default=30

Number of logarithmic bins for the degree distribution.

r_smooth : int, default=2

Integer controlling the width of the smoothing window. Typically a small value such as 2 or 3.

alpha : float, default=0.6

Parameter controlling the amount of “smoothing” for the kernel-type estimator. Should be greater than 0.5.

hsteps : int, default=200

Parameter controlling the number of bandwidth steps of the kernel-type estimator.

bootstrap_flag : bool, default=True

Flag to switch the double-bootstrap procedure on or off.

t_bootstrap : float, default=0.5

Parameter controlling the size of the second bootstrap, defined by n2 = n*(t_bootstrap).

r_bootstrap : int, default=500

Number of bootstrap resamplings for the first and second bootstraps.

diagnostic_plots : bool, default=False

Flag to switch the generation of AMSE diagnostic plots on or off.

eps_stop : float, default=1.0

Parameter controlling the range of the AMSE minimization, defined as the fraction of order statistics to consider during the AMSE minimization step.

theta1 : float, default=0.01

Lower bound of the plotting range, defined as k_min = ceil(n^theta1). Overridden if the plots behave badly within the range.

theta2 : float, default=0.99

Upper bound of the plotting range, defined as k_max = floor(n^theta2). Overridden if the plots behave badly within the range.

verbose : bool, default=False

Flag controlling bootstrap verbosity.

noise_flag : bool, default=True

Flag to switch on or off the uniform noise in the range [-5*10^(-p), 5*10^(-p)], with p = p_noise, that is added to each data point. Used for integer-valued sequences.

p_noise : int, default=1

Integer parameter controlling noise amplitude.

savedata : bool, default=False

Flag to save data files in the same directory as the plots.

auto_plot : bool, default=False

Whether to create the plots immediately upon initialization.

base_seed : None | SeedSequence | BitGenerator | Generator | RandomState, default=None

Base random seed for reproducibility of the bootstrap. Only used by methods that perform a bootstrap.
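The formulas quoted in the parameter descriptions above can be illustrated with a small standalone sketch. It computes the plotting range k_min = ceil(n^theta1), k_max = floor(n^theta2) and adds uniform noise in [-5*10^(-p), 5*10^(-p)]. The helper names plotting_range and add_uniform_noise are hypothetical and not part of tailestim; the library's own implementation may differ, for example when it overrides the plotting range.

```python
import math
import random


def plotting_range(n, theta1=0.01, theta2=0.99):
    """Return (k_min, k_max) for a sample of size n.

    Hypothetical helper: mirrors the documented definitions
    k_min = ceil(n^theta1) and k_max = floor(n^theta2).
    """
    k_min = math.ceil(n ** theta1)
    k_max = math.floor(n ** theta2)
    return k_min, k_max


def add_uniform_noise(values, p_noise=1, seed=None):
    """Add uniform noise in [-5*10^(-p), 5*10^(-p)] to each value.

    Hypothetical helper: illustrates the documented noise range only.
    """
    rng = random.Random(seed)
    half_width = 5 * 10 ** (-p_noise)
    return [v + rng.uniform(-half_width, half_width) for v in values]


if __name__ == "__main__":
    k_min, k_max = plotting_range(10_000)
    print("plotting range:", k_min, k_max)
    print("noisy degrees:", add_uniform_noise([1, 2, 3], p_noise=1, seed=0))
```

With the default p_noise=1 the noise half-width is 0.5, so rounding a perturbed integer recovers the original value in all but boundary cases; larger p_noise values shrink the perturbation.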

__call__()

Return the figure and axes when the object is called.

__repr__()

Return a string representation of the object.

fit(data)

Fit the estimators to the data.

Parameters:
data : np.ndarray

The data to fit the estimators to.

Returns:
self : TailEstimatorSet

The fitted estimator set.

get_params()

Get the parameters used for plotting.

Returns:
Dict[str, Any]

Dictionary of parameters used for plotting.

plot()

Create and return the plots.

Returns:
fig : matplotlib.figure.Figure

The figure object.

axes : numpy.ndarray

Array of axes objects.

plot_diagnostics()

Create and return the diagnostic plots.

Returns:
fig_d : matplotlib.figure.Figure

The diagnostic figure object.

axes_d : numpy.ndarray

Array of diagnostic axes objects.

Raises:
ValueError

If no data has been fitted or if bootstrap is not enabled.
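Because plot_diagnostics() is documented to raise ValueError when no data has been fitted or bootstrap is not enabled, callers may want to guard the call. The following is a minimal sketch; the helper name try_plot_diagnostics is hypothetical, and it accepts any object that exposes a plot_diagnostics() method, such as a TailEstimatorSet.

```python
def try_plot_diagnostics(estim_set):
    """Return (fig_d, axes_d), or None if diagnostics are unavailable.

    Hypothetical convenience wrapper, not part of tailestim.
    """
    try:
        return estim_set.plot_diagnostics()
    except ValueError as err:
        # Documented to be raised when no data has been fitted
        # or when bootstrap is not enabled.
        print(f"Diagnostic plots unavailable: {err}")
        return None
```

Under the documented behavior, passing an estimator set created with bootstrap_flag=False, or one on which fit() has not been called, would be expected to yield None rather than an unhandled exception.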

Examples

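from tailestim import TailData, TailEstimatorSet
import matplotlib.pyplot as plt

# Load the example dataset named 'CAIDA_KONECT'
data = TailData(name='CAIDA_KONECT').data

# Fit all estimators using the default parameters
estim_set = TailEstimatorSet()
estim_set.fit(data)

# Comparison plots
estim_set.plot()
plt.show()

# Diagnostic plots (need fitted data and bootstrap enabled)
estim_set.plot_diagnostics()
plt.show()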