Usage Guide#
tailestim provides various methods for estimating tail parameters of heavy-tailed distributions, which is useful for analyzing power-law behavior in complex networks.
Installation#
This package is available from PyPI and conda-forge.
pip install tailestim
conda install conda-forge::tailestim
Quick Start#
Using Built-in Datasets#
from tailestim import TailData
from tailestim import HillEstimator, KernelTypeEstimator, MomentsEstimator
# Load a sample dataset
data = TailData(name='CAIDA_KONECT').data
# Initialize and fit the Hill estimator
estimator = HillEstimator()
estimator.fit(data)
# Get estimated values
result = estimator.get_result()
gamma = result.gamma_
# Print full results
print(result)
Using degree sequence from networkx graphs#
import networkx as nx
from tailestim import HillEstimator, KernelTypeEstimator, MomentsEstimator
# Create or load your network
G = nx.barabasi_albert_graph(10000, 2)
degree = list(dict(G.degree()).values()) # Degree sequence
# Initialize and fit the Hill estimator
estimator = HillEstimator()
estimator.fit(degree)
# Get estimated values
result = estimator.get_result()
gamma = result.gamma_
# Print full results
print(result)
Available Estimators#
The package provides several estimators for tail estimation. For details on each estimator, refer to the respective class API reference.
Hill Estimator (
HillEstimator) - Classical Hill estimator with double-bootstrap for optimal threshold selection - Generally recommended for power law analysisMoments Estimator (
MomentsEstimator) - Moments-based estimation with double-bootstrap - More robust to certain types of deviations from pure power lawKernel-type Estimator (
KernelEstimator) - Kernel-based estimation with double-bootstrap and bandwidth selectionPickands Estimator (
PickandsEstimator) - Pickands-based estimation (no bootstrap) - Provides arrays of estimates across different thresholdsSmooth Hill Estimator (
SmoothHillEstimator) - Smoothed version of the Hill estimator (no bootstrap)
Results#
The full result can be obtained by result = estimator.get_result(). You can either print the result, or access individual attributes (e.g., result.gamma_). The output will include values such as:
gamma_: Power law exponent (γ = 1 + 1/ξ)xi_star_: Tail index (ξ)k_star_: Optimal order statisticBootstrap results (when applicable): - First and second bootstrap AMSE values - Optimal bandwidths or minimum AMSE fractions
Example Output#
When you print(result) after fitting, you will get the following output:
--------------------------------------------------
Result
--------------------------------------------------
Order statistics: Array of shape (200,) [1.0000, 1.0000, 1.0000, ...]
Tail index estimates: Array of shape (200,) [1614487461647431761920.0000, 1249994621547387551744.0000, 967791073562264862720.0000, ...]
Optimal order statistic (k*): 25153
Tail index (ξ): 0.5942
Power law exponent (γ): 2.6828
Bootstrap Results:
First Bootstrap:
Fraction of order statistics: None
AMSE values: None
H Min: 0.9059
Maximum index: None
Second Bootstrap:
Fraction of order statistics: None
AMSE values: None
H Min: 0.9090
Maximum index: None
Built-in Datasets and Custom Data#
The package includes several example datasets:
CAIDA_KONECTLibimseti_in_KONECTPareto(Follows power-law with γ=2.5)
Load any example dataset using:
from tailestim import TailData
data = TailData(name='dataset_name').data
You can also load your own custom datasets by providing a path:
from tailestim import TailData
data = TailData(path='path/to/my/data.dat').data
The custom data file should follow the same format as the built-in datasets: a plain text file where each line contains two values separated by a space: - The first value (k) is the degree or value - The second value (n(k)) is the count or frequency of that value
For example:
`
10 3
20 2
30 1
`
This represents that there are 3 instances of value 10, 2 instances of value 20, and 1 instance of value 30.