Usage Guide
===========

tailestim provides several methods for estimating the tail parameters of heavy-tailed distributions, which is useful for analyzing power-law behavior in complex networks.

Installation
------------

This package is available from PyPI and conda-forge.

.. code-block:: bash

   pip install tailestim
   conda install conda-forge::tailestim

Quick Start
-----------

Using Built-in Datasets
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from tailestim import TailData
   from tailestim import HillEstimator, KernelTypeEstimator, MomentsEstimator

   # Load a sample dataset
   data = TailData(name='CAIDA_KONECT').data

   # Initialize and fit the Hill estimator
   estimator = HillEstimator()
   estimator.fit(data)

   # Get estimated values
   result = estimator.get_result()
   gamma = result.gamma_

   # Print full results
   print(result)

Using degree sequence from networkx graphs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import networkx as nx
   from tailestim import HillEstimator, KernelTypeEstimator, MomentsEstimator

   # Create or load your network
   G = nx.barabasi_albert_graph(10000, 2)
   degree = list(dict(G.degree()).values())  # Degree sequence

   # Initialize and fit the Hill estimator
   estimator = HillEstimator()
   estimator.fit(degree)

   # Get estimated values
   result = estimator.get_result()
   gamma = result.gamma_

   # Print full results
   print(result)

Available Estimators
--------------------

The package provides several estimators for tail estimation. For details on each estimator, refer to the respective class :doc:`API reference `.

1. **Hill Estimator** (``HillEstimator``)

   - Classical Hill estimator with double-bootstrap for optimal threshold selection
   - Generally recommended for power law analysis

2. **Moments Estimator** (``MomentsEstimator``)

   - Moments-based estimation with double-bootstrap
   - More robust to certain types of deviations from pure power law

3. **Kernel-type Estimator** (``KernelTypeEstimator``)

   - Kernel-based estimation with double-bootstrap and bandwidth selection

4. **Pickands Estimator** (``PickandsEstimator``)

   - Pickands-based estimation (no bootstrap)
   - Provides arrays of estimates across different thresholds

5. **Smooth Hill Estimator** (``SmoothHillEstimator``)

   - Smoothed version of the Hill estimator (no bootstrap)

Results
-------

The full result is obtained with ``result = estimator.get_result()``. You can either print the result or access individual attributes (e.g., ``result.gamma_``). The output includes values such as:

- ``gamma_``: Power law exponent (γ = 1 + 1/ξ)
- ``xi_star_``: Tail index (ξ)
- ``k_star_``: Optimal order statistic
- Bootstrap results (when applicable):

  - First and second bootstrap AMSE values
  - Optimal bandwidths or minimum AMSE fractions

Example Output
--------------

Calling ``print(result)`` after fitting produces output like the following:

.. code-block:: text

   --------------------------------------------------
   Result
   --------------------------------------------------
   Order statistics: Array of shape (200,) [1.0000, 1.0000, 1.0000, ...]
   Tail index estimates: Array of shape (200,) [1614487461647431761920.0000, 1249994621547387551744.0000, 967791073562264862720.0000, ...]
   Optimal order statistic (k*): 25153
   Tail index (ξ): 0.5942
   Power law exponent (γ): 2.6828
   Bootstrap Results:
     First Bootstrap:
       Fraction of order statistics: None
       AMSE values: None
       H Min: 0.9059
       Maximum index: None
     Second Bootstrap:
       Fraction of order statistics: None
       AMSE values: None
       H Min: 0.9090
       Maximum index: None

Built-in Datasets and Custom Data
---------------------------------

The package includes several example datasets:

- ``CAIDA_KONECT``
- ``Libimseti_in_KONECT``
- ``Pareto`` (follows a power law with γ=2.5)

Load any example dataset using:

.. code-block:: python

   from tailestim import TailData

   data = TailData(name='dataset_name').data

You can also load your own custom datasets by providing a path:

.. code-block:: python

   from tailestim import TailData

   data = TailData(path='path/to/my/data.dat').data

The custom data file should follow the same format as the built-in datasets: a plain text file where each line contains two values separated by a space:

- The first value (k) is the degree or value
- The second value (n(k)) is the count or frequency of that value

For example:

.. code-block:: text

   10 3
   20 2
   30 1

This means there are 3 instances of value 10, 2 instances of value 20, and 1 instance of value 30.
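
If your data starts out as a flat sequence of values (for example, a degree sequence), a file in this ``k n(k)`` format can be produced with the standard library alone. The following is a minimal sketch, not part of tailestim: the helper names ``write_count_file`` and ``read_count_file`` and the file name ``my_data.dat`` are hypothetical, and integer values are assumed.

```python
from collections import Counter
from pathlib import Path
import tempfile


def write_count_file(values, path):
    """Write `values` as lines of 'k n(k)', sorted by k (hypothetical helper)."""
    counts = Counter(values)
    lines = [f"{k} {n}" for k, n in sorted(counts.items())]
    Path(path).write_text("\n".join(lines) + "\n")


def read_count_file(path):
    """Expand a 'k n(k)' file back into a flat list of integer values."""
    values = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        k, n = line.split()
        values.extend([int(k)] * int(n))
    return values


# The example from the text: three 10s, two 20s, one 30.
values = [10, 10, 10, 20, 20, 30]
path = Path(tempfile.mkdtemp()) / "my_data.dat"  # hypothetical file name
write_count_file(values, path)

file_text = path.read_text()      # contents in 'k n(k)' format
expanded = read_count_file(path)  # flat sequence recovered from the file
```

The resulting file path can then be passed to ``TailData(path=...)`` as shown above.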