Usage Guide

tailestim provides various methods for estimating tail parameters of heavy-tailed distributions, which is useful for analyzing power-law behavior in complex networks.

Installation

This package is available from PyPI and conda-forge.

pip install tailestim
conda install conda-forge::tailestim

Quick Start

Using Built-in Datasets

from tailestim import TailData
from tailestim import HillEstimator, KernelTypeEstimator, MomentsEstimator

# Load a sample dataset
data = TailData(name='CAIDA_KONECT').data

# Initialize and fit the Hill estimator
estimator = HillEstimator()
estimator.fit(data)

# Get estimated values
result = estimator.get_result()
gamma = result.gamma_

# Print full results
print(result)

Using a Degree Sequence from networkx Graphs

import networkx as nx
from tailestim import HillEstimator, KernelTypeEstimator, MomentsEstimator

# Create or load your network
G = nx.barabasi_albert_graph(10000, 2)
degree = list(dict(G.degree()).values()) # Degree sequence

# Initialize and fit the Hill estimator
estimator = HillEstimator()
estimator.fit(degree)

# Get estimated values
result = estimator.get_result()
gamma = result.gamma_

# Print full results
print(result)

Available Estimators

The package provides several estimators for tail estimation. For details on each estimator, refer to the respective class API reference.

Hill Estimator (HillEstimator)
- Classical Hill estimator with double-bootstrap for optimal threshold selection
- Generally recommended for power-law analysis

Moments Estimator (MomentsEstimator)
- Moments-based estimation with double-bootstrap
- More robust to certain types of deviations from a pure power law

Kernel-type Estimator (KernelTypeEstimator)
- Kernel-based estimation with double-bootstrap and bandwidth selection

Pickands Estimator (PickandsEstimator)
- Pickands-based estimation (no bootstrap)
- Provides arrays of estimates across different thresholds

Smooth Hill Estimator (SmoothHillEstimator)
- Smoothed version of the Hill estimator (no bootstrap)
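
For intuition about what the Hill-type estimators compute, here is a minimal sketch of the classical Hill estimate at a fixed number of upper order statistics k. It is not tailestim's implementation: the function name hill_estimate and the synthetic sample are assumptions made for illustration, and k is chosen by hand instead of by double-bootstrap.

```python
import math


def hill_estimate(values, k):
    """Classical Hill estimate of the tail index xi from the k largest values.

    Illustrative helper (not part of tailestim). Assumes all values are
    positive and that k is fixed by the caller.
    """
    ordered = sorted(values, reverse=True)  # X_(1) >= X_(2) >= ...
    if not 1 <= k < len(ordered):
        raise ValueError("k must satisfy 1 <= k < len(values)")
    log_threshold = math.log(ordered[k])  # log of the (k+1)-th largest value
    return sum(math.log(x) - log_threshold for x in ordered[:k]) / k


# Synthetic sample: evenly spaced Pareto quantiles with gamma = 2.5,
# which corresponds to xi = 1 / (gamma - 1) = 2/3.
true_xi = 1 / (2.5 - 1)
n = 10_000
sample = [(i / (n + 1)) ** (-true_xi) for i in range(1, n + 1)]

xi_hat = hill_estimate(sample, k=500)
gamma_hat = 1 + 1 / xi_hat
print(f"xi ~ {xi_hat:.3f}, gamma ~ {gamma_hat:.3f}")
```

On real data the estimate can vary strongly with k, which is why the package's bootstrap-based estimators select the threshold automatically.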

Results

The full result can be obtained by result = estimator.get_result(). You can either print the result, or access individual attributes (e.g., result.gamma_). The output will include values such as:

gamma_: Power law exponent (γ = 1 + 1/ξ)
xi_star_: Tail index (ξ)
k_star_: Optimal order statistic
Bootstrap results (when applicable):
- First and second bootstrap AMSE values
- Optimal bandwidths or minimum AMSE fractions
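
The relation between gamma_ and xi_star_ can be checked by hand. The sketch below only restates γ = 1 + 1/ξ and its inverse; the helper names xi_to_gamma and gamma_to_xi are hypothetical and not part of tailestim, and the positivity checks are an assumption about when the conversion is meaningful.

```python
def xi_to_gamma(xi):
    """Power-law exponent from the tail index: gamma = 1 + 1/xi."""
    if xi <= 0:
        # Assumption: the power-law reading only applies for xi > 0.
        raise ValueError("xi must be positive")
    return 1 + 1 / xi


def gamma_to_xi(gamma):
    """Tail index from the power-law exponent: xi = 1 / (gamma - 1)."""
    if gamma <= 1:
        raise ValueError("gamma must be greater than 1")
    return 1 / (gamma - 1)


# xi as printed (rounded) in the example output of this guide
gamma = xi_to_gamma(0.5942)
print(f"gamma ~ {gamma:.3f}")

# A pure power law with gamma = 2.5 corresponds to xi = 2/3
print(f"xi ~ {gamma_to_xi(2.5):.4f}")
```

Because the printed ξ is rounded to four decimals, the value computed here agrees with the reported γ only up to that rounding.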

Example Output

When you call print(result) after fitting, you will get output similar to the following:

--------------------------------------------------
Result
--------------------------------------------------
Order statistics: Array of shape (200,) [1.0000, 1.0000, 1.0000, ...]
Tail index estimates: Array of shape (200,) [1614487461647431761920.0000, 1249994621547387551744.0000, 967791073562264862720.0000, ...]
Optimal order statistic (k*): 25153
Tail index (ξ): 0.5942
Power law exponent (γ): 2.6828
Bootstrap Results:
First Bootstrap:
   Fraction of order statistics: None
   AMSE values: None
   H Min: 0.9059
   Maximum index: None
Second Bootstrap:
   Fraction of order statistics: None
   AMSE values: None
   H Min: 0.9090
   Maximum index: None

Built-in Datasets and Custom Data

The package includes several example datasets:

CAIDA_KONECT
Libimseti_in_KONECT
Pareto (Follows power-law with γ=2.5)

Load any example dataset using:

from tailestim import TailData
data = TailData(name='dataset_name').data

You can also load your own custom datasets by providing a path:

from tailestim import TailData
data = TailData(path='path/to/my/data.dat').data

The custom data file should follow the same format as the built-in datasets: a plain text file where each line contains two values separated by a space:
- The first value (k) is the degree or value
- The second value (n(k)) is the count or frequency of that value

For example:

10 3
20 2
30 1

This means there are 3 instances of value 10, 2 instances of value 20, and 1 instance of value 30.
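
If your data is a flat sequence of values (for example a degree sequence), it first needs to be aggregated into this "k n(k)" layout. The following is a minimal sketch using only the standard library; the helper names write_degree_counts and read_degree_counts are hypothetical, and integer values are assumed.

```python
import os
import tempfile
from collections import Counter


def write_degree_counts(values, path):
    """Write one 'k n(k)' line per distinct value, sorted by k."""
    counts = Counter(values)
    with open(path, "w") as f:
        for k in sorted(counts):
            f.write(f"{k} {counts[k]}\n")


def read_degree_counts(path):
    """Expand a 'k n(k)' file back into a flat list of integer values."""
    values = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            k, n_k = line.split()
            values.extend([int(k)] * int(n_k))
    return values


values = [10, 10, 10, 20, 20, 30]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.dat")
    write_degree_counts(values, path)
    with open(path) as f:
        content = f.read()
    restored = read_degree_counts(path)

print(content)
```

A file written this way has the layout described above, so it should be loadable with TailData(path=...).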