Revised paper: "Towards a psychology of individuals: The ergodicity information index and a bottom-up approach for finding generalizations"

Revised paper: "Towards a psychology of individuals: The ergodicity information index and a bottom-up approach for finding generalizations", by Golino, Christensen, and Nesselroade (under review). Keep reading for a detailed overview. Link to the pre-print in the comments.

This study introduces the Ergodicity Information Index (EII), a novel metric quantifying information loss when aggregating individual data into population structures. Key innovations:

EII: Measures structural ergodicity using network science & information theory

DynEGA: Estimates intra- and inter-individual network structures and represents individual and population networks as multilayer networks

Super-weak ergodicity: A new, less restrictive ergodicity concept

Information-theoretic clustering: Identifies subgroups of individuals using a bottom-up approach.

The Ergodicity Information Index: The EII characterizes the relative algorithmic complexity of the population structure with respect to multiple individual networks, taking into account the number of underlying dimensions (e.g., communities, latent factors). The algorithmic complexity of multiplex networks can be used to determine the optimal number of layers needed to represent a multiplex network and to detect structural and dynamical similarities among its layers (Santoro & Nicosia, 2020). Representing intraindividual structures as a multiplex network and quantifying their information relative to a single population network structure are the two central ideas behind our EII.

Algorithmic (or Kolmogorov) Complexity of Networks:

The individual networks and the population network in Figure 1 can be compared in terms of their algorithmic complexity. Algorithmic complexity can be used to analyze complex objects in an unbiased manner using mathematical principles (Zenil et al., 2018), and is based on the work of Kolmogorov, Martin-Löf, Solomonoff, and others. The figure below helps to illustrate the idea of Kolmogorov (or algorithmic) complexity. The two six-node networks have different Kolmogorov complexities because the programs needed to describe them differ in length.


The “program” defining network one has the following code:

- red is connected to blue, green, and purple

- blue is connected to red and green

- green is connected to red and blue

- purple is connected to red, orange, and yellow

- orange is connected to purple and yellow

- yellow is connected to orange and purple


While the “program” defining network two has the following code:

- All pairs of nodes are connected.



Network one has a higher Kolmogorov complexity because the program needed to define it is long, while network two has a lower Kolmogorov complexity because the program needed to define it is short. In sum, in information theory, complexity is akin to description length: the longer the description, the higher the complexity. This concept is also related to randomness and structure, chaos and regularity (Downey & Hirschfeldt, 2010; Shen, Uspensky, & Vereshchagin, 2022; Velupillai, 2011). Kolmogorov complexity, while challenging to estimate directly, can be approximated in networks by applying compression algorithms to the weighted edge list, providing a practical way to measure the complexity of network structures (Morzy et al., 2017; Santoro & Nicosia, 2020). Shuffling the weighted edge lists of network one and network two (figure above) 1,000 times and compressing each shuffled edge list generates the distributions seen in the figure below.

The mean length of the compressed shuffled edge lists is each network's estimated Kolmogorov complexity.
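To make the shuffle-and-compress idea concrete, here is a minimal Python sketch (not the paper's implementation): it serializes each toy network's edge list, shuffles it repeatedly, compresses each shuffle with zlib, and takes the mean compressed length as the complexity estimate. The edge lists correspond to the six-node example above; for weighted networks, the edge weights would be serialized along with the node pairs.

```python
# A minimal sketch of compression-based complexity estimation, assuming
# unweighted toy networks; function and variable names are illustrative.
import random
import zlib

def estimated_complexity(edge_list, n_shuffles=1000, seed=42):
    """Mean compressed length (bytes) of the shuffled edge list,
    used as a practical proxy for Kolmogorov complexity."""
    rng = random.Random(seed)
    edges = list(edge_list)
    lengths = []
    for _ in range(n_shuffles):
        rng.shuffle(edges)
        serialized = ";".join(f"{i}-{j}" for i, j in edges).encode()
        lengths.append(len(zlib.compress(serialized)))
    return sum(lengths) / len(lengths)

# Network one: two triangles joined by a single edge (the colored example;
# nodes 0-5 stand for red, blue, green, purple, orange, yellow).
network_one = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4), (3, 5), (4, 5)]
# Network two: all pairs of six nodes are connected (a complete graph).
network_two = [(i, j) for i in range(6) for j in range(i + 1, 6)]

print(estimated_complexity(network_one))
print(estimated_complexity(network_two))
```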



Kolmogorov Complexity of Multiplex Networks:

For multiplex networks, estimating the Kolmogorov complexity requires a strategy to encode all individual graphs into a single network. Santoro and Nicosia (2020) proposed a prime-weight encoding matrix Ω that assigns a distinct prime number p^[α] to each individual network (i.e., each of the A layers of the multiplex network) and sets each element Ω_ij equal to the product of the primes associated with the layers in which an edge between nodes i and j exists. The prime-weight encoding matrix preserves full information about the placement of all edges of the multiplex network (Santoro & Nicosia, 2020), and it enables the estimation of the Kolmogorov complexity of all layers (networks) simultaneously.

The prime-weight encoding uses prime numbers to uniquely “tag” each network. By assigning the primes as “tags” in increasing order (canonical prime encoding), with smaller primes given to layers with fewer edges, the primes act as unique IDs that record which edges came from which individual networks. This enables a joint representation of all networks in the multiplex network. It also opens up the possibility of comparing these individual networks with population structures estimated using the dynamic exploratory graph analysis (DynEGA) technique.
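As an illustration, here is a minimal Python sketch of the canonical prime-weight encoding under the scheme just described; the toy layers and all names are made up for the example.

```python
import numpy as np

# Primes available for tagging layers; enough for this toy example.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

def prime_weight_matrix(layers):
    """Encode a list of binary adjacency matrices (layers) into one matrix
    whose entries are products of the primes tagging each layer."""
    n = layers[0].shape[0]
    # Canonical encoding: smaller primes go to layers with fewer edges.
    order = np.argsort([int(layer.sum()) for layer in layers])
    omega = np.ones((n, n), dtype=np.int64)
    for rank, idx in enumerate(order):
        omega[layers[idx] > 0] *= PRIMES[rank]
    return omega

# Two toy 4-node layers (e.g., two individuals' binary networks).
layer_a = np.array([[0, 1, 1, 0],
                    [1, 0, 0, 0],
                    [1, 0, 0, 1],
                    [0, 0, 1, 0]])
layer_b = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 0],
                    [0, 0, 0, 0]])

omega = prime_weight_matrix([layer_a, layer_b])
print(omega)
# Entry (0, 1) equals 2 * 3 = 6: that edge exists in both layers, so it
# carries both primes; factoring any entry recovers which layers contain it.
```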

Santoro and Nicosia (2020) also proposed a metric quantifying the algorithmic complexity of multiplex networks, computed as the ratio of the (approximate) Kolmogorov complexity of the prime-weight matrix Ω of a multiplex network with A layers to the Kolmogorov complexity of an aggregated network combining all layers.

The Ergodicity Information Index:

In the current paper, we propose a similar strategy to quantify the algorithmic complexity of the networks estimated using DynEGA. The layers of the multiplex network are the individual networks estimated from the derivatives computed via generalized local linear approximation (GLLA) in the DynEGA technique. Instead of comparing the algorithmic complexity of Ω with a weighted aggregation of the layers, it is more informative to compare it with the population network (i.e., the network estimated by stacking the GLLA-estimated derivatives of all individuals). Additionally, in psychology it is important to consider the number of latent factors underlying the intensive longitudinal data. Therefore, our ergodicity information index can be computed as:

ξ = √(F_P + 1)^{[K_Ω / K_{P*}] / log(L_χ)}

where √(F_P + 1) is the square root of the number of factors estimated in the population structure using DynEGA (plus one), K_Ω is the algorithmic complexity of the prime-weight encoding matrix of the individual networks (which together compose the multiplex network χ), K_{P*} is the algorithmic complexity of the prime-weight transformation of the population network (i.e., each element of the population network, P_ij, is transformed such that P*_ij = 2^{P_ij}), and L_χ is the number of distinct edges across the networks that make up the multiplex network (i.e., non-zero edges). One is added to the number of factors estimated in the population network to handle unidimensional population structures.
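Putting the pieces together, here is a minimal Python sketch of the formula itself. The complexity values would come from compressing the prime-weight encodings (as sketched earlier); the inputs below are placeholders, and a natural logarithm is assumed.

```python
import math

def eii(k_omega, k_pstar, n_factors, n_edges):
    """Ergodicity Information Index:
    xi = sqrt(F_P + 1) ^ ((K_Omega / K_P*) / log(L_chi)).
    The base of the logarithm is assumed to be e."""
    base = math.sqrt(n_factors + 1)
    exponent = (k_omega / k_pstar) / math.log(n_edges)
    return base ** exponent

# Placeholder inputs: a two-factor population structure and 120 distinct
# edges across the individual networks; complexity values are illustrative.
print(eii(k_omega=850.0, k_pstar=240.0, n_factors=2, n_edges=120))
```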

The EII (ξ) quantifies the amount of information lost by representing a set of measures as a single interindividual (nomothetic) structure rather than as multiple individual (within-person, or intraindividual) structures. Larger values of the EII indicate that the intraindividual networks encode a relatively larger amount of information with respect to the population network.

The use of the EII implies a different type of ergodicity that we call super-weak ergodicity. A strict definition of ergodicity has two central requirements: homogeneity across all participants and stationarity across all time points. A softer type of ergodicity, termed weak ergodicity, requires only that the marginal distributions for all participants and for all time points be identical. Super-weak ergodicity, in turn, doesn't require homogeneity across participants (i.e., the same covariance matrix for all subjects), stationarity across time points (i.e., the same covariance matrix across time), or equal marginal distributions for all participants and time points. It requires a much weaker condition: that the algorithmic complexity of the population (or between-person) network be similar (but not equal) to the algorithmic complexity of the prime-weight encoded network of all individuals.

Simulation:

The Monte Carlo simulation tested EII across 216 conditions, varying:

Sample sizes (50, 100)

Variables per factor (4, 6)

Number of factors (2, 3)

Error levels (0.125, 0.25, 0.5)

Dynamic factor loadings (0.4, 0.6, 0.8)

Key findings:

Unweighted EII: Highest accuracy (83.48%).

High accuracy for low error (99.98%) & high loadings (92%).

Performance declines with high error.

More variables per factor improve accuracy (90.16% for 6 vs. 76.79% for 4).

A Test for Ergodicity:

The paper introduces a novel bootstrap test for ergodicity using the Ergodicity Information Index (EII):

1. Compute empirical EII

2. Randomly shuffle shared edges between population & individual networks

3. Recompute EII for shuffled networks (repeat 1000+ times)

4. Compare empirical EII to the distribution of shuffled EIIs

Hypothesis test:

H0: ξ ≥ ξ_R (non-ergodic)

HA: ξ < ξ_R (ergodic)

where ξ is the empirical EII and ξ_R is the EII obtained from the random shuffling process.

If p < 0.05, reject H0 and conclude the process is ergodic.

This test helps determine if individual structures can be meaningfully represented by a population structure.
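Here is a minimal Python sketch of the bootstrap logic in steps 1-4. The functions estimate_eii and shuffle_shared_edges are hypothetical stand-ins for the paper's procedures; only the resampling and p-value logic is shown concretely.

```python
import numpy as np

def bootstrap_ergodicity_test(population_net, individual_nets,
                              estimate_eii, shuffle_shared_edges,
                              n_boot=1000, seed=0):
    """Steps 1-4 of the ergodicity test; the two function arguments are
    hypothetical stand-ins for the paper's EII estimator and edge shuffler."""
    rng = np.random.default_rng(seed)
    # Step 1: empirical EII from the observed networks.
    empirical = estimate_eii(population_net, individual_nets)
    # Steps 2-3: shuffle the shared edges and recompute the EII n_boot times.
    null_dist = np.empty(n_boot)
    for b in range(n_boot):
        pop_s, ind_s = shuffle_shared_edges(population_net, individual_nets, rng)
        null_dist[b] = estimate_eii(pop_s, ind_s)
    # Step 4: proportion of shuffled EIIs at or below the empirical EII.
    # A small p favors HA (xi < xi_R), i.e., an ergodic process.
    p_value = float(np.mean(null_dist <= empirical))
    return empirical, p_value
```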

When a system is non-ergodic, this new clustering approach finds potential ergodic subgroups:

1. Compute the pairwise Jensen-Shannon Distance (JSD) between individual networks

2. Apply Ward's hierarchical clustering to the JSD values

3. Cut the tree at all possible levels (2 to N clusters)

4. Compute the modularity of each cluster solution using the similarity matrix (1 - JSD)

5. Select the cluster solution with the highest modularity

Key features:

- Uses the Von Neumann entropy of networks

- Balances within-cluster similarity and between-cluster differences

- Identifies generalizable characteristics from individual structures

This method enables researchers to find meaningful subgroups in non-ergodic systems, supporting a more nuanced understanding of psychological processes.
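For concreteness, here is a minimal Python sketch of steps 1-5, using SciPy for Ward's method. The pairwise JSD matrix is assumed to be precomputed (in the paper it is based on the networks' Von Neumann entropies), and the random example matrix is purely illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def modularity(similarity, labels):
    """Newman modularity of a hard partition on a weighted similarity graph."""
    A = similarity.copy()
    np.fill_diagonal(A, 0.0)
    two_m = A.sum()                      # total edge weight, counted twice
    k = A.sum(axis=1)                    # weighted degrees
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m)[same]).sum() / two_m)

def jsd_cluster(jsd):
    """Ward clustering on a pairwise JSD matrix; returns the tree cut with
    the highest modularity on the similarity matrix (1 - JSD)."""
    n = jsd.shape[0]
    Z = linkage(squareform(jsd, checks=False), method="ward")
    similarity = 1.0 - jsd
    best_q, best_labels = -np.inf, None
    for n_clusters in range(2, n + 1):   # cut the tree at every level
        labels = fcluster(Z, t=n_clusters, criterion="maxclust")
        q = modularity(similarity, labels)
        if q > best_q:
            best_q, best_labels = q, labels
    return best_q, best_labels

# Illustration only: a random symmetric "JSD" matrix for 8 individuals.
rng = np.random.default_rng(1)
d = rng.uniform(0.1, 0.9, size=(8, 8))
jsd = (d + d.T) / 2
np.fill_diagonal(jsd, 0.0)
q, labels = jsd_cluster(jsd)
print(q, labels)
```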


Empirical Examples:

Personality Data (N=119, Big Five Inventory-2):

- Non-ergodic overall (EII = 3.688, p = 0.998)

- Two clusters identified:

Group 1 (N=44): ergodic (EII = 3.994, p < 0.001)

Group 2 (N=75): non-ergodic

Brain Network Data (N=62, younger & older adults):

- Non-ergodic overall (EII = 7.945, p = 0.462)

- Ergodic when separated by age:

Younger: EII = 6.106, p < 0.001

Older: EII = 5.387, p < 0.001

- Information-theoretic clustering identified the age groups with 71% accuracy

I will discuss the empirical examples in more detail later this week. See the link to the pre-print below:

https://osf.io/preprints/psyarxiv/th6rm
