In the TDApplied package vignette “TDApplied Theory
and Practice” simulated data is used to provide examples of package
function. In this vignette we will demonstrate that
TDApplied can carry out meaningful analyses of real
(i.e. non-simulated) data which other software packages cannot. We
analyzed data from a very well-known neurological dataset, and
identified topological features of neurological computation in single
and multiple subjects which were correlated with (i) the task the
subjects were performing while the data was collected, and (ii) the
behavior (i.e. reaction time) of the subjects during the task – these
correlations suggest that the features are meaningful in the context of
an neuroimaging analysis. Moreover, these features were identified,
interpreted and analyzed with the
bootstrap_persistence_diagram
, vr_graphs
and
diagram_kpca
functions, only the first of which has any
implementation in another R package (TDA).
TDApplied is therefore a powerful tool for applied
topological analyses of data.
A popular technology for studying neural function is called functional magnetic resonance imaging (fMRI), in which oxygenated blood-flow across the brain is detected via magnetic resonance over multiple time points; fMRI is a proxy measurement of neural activity. Spatial activity patterns, i.e. vectors of measured values across space in a single time point, are modulated by performing tasks. The study design is the sequence of temporal blocks of performing these tasks and each task type is called a condition, and a study design can evoke meaningful information about neural processing related to tasks. Collections of spatial activity patterns (for example the spatial patterns evoked by a particular task over multiple time points) have previously been analyzed with topological and geometric techniques which capture their global structural features (J. Shine 2019; Saggar et al. 2018, 2021; X. Liu, Chang, and Duyn 2013). However, these analyses are not designed to capture the spatially periodic features of fMRI data which we would expect to exist in abundance (Greve et al. 2013; T. Liu 2016; Caballero-Gaudes and Reynolds 2017). One persistent homology analysis of fMRI spatial pattern data found robust 0-dimensional topological features (i.e. clusters of time points with similar spatial patterns) whose persistence values negatively correlated with fluid intelligence (Anderson et al. 2018). Larger differences between spatial patterns at different time points generally corresponded to lower values of fluid intelligence, but higher-dimensional topological features such as loops were not considered in that analysis.
In this exploratory analysis, using the R package TDApplied, we utilized persistent homology to find, in one subject’s fMRI data in an emotion task, task-related signal in the form of a spatial loop. We then linked the loop back to the subject’s raw data to interpret what neurological features the loop represented. Finally we showed that topological features in 100 subject’s emotion task fMRI data were correlated with behavior (i.e. the subject’s response time to certain task blocks). While neuroimaging researchers would not consider a single loop within one subject “real” or “significant” without finding a similar loop across multiple subjects, the task of optimally matching loops between datasets is an open problem, so we leave the problem of finding group-level spatial loops to future work. For our analysis we will use data from the famous Human Connectome Project (M. Glasser 2016b) which contains extensively preprocessed neuroimaging data from roughly 1200 subjects. We focused on the HCP emotion task data, which alternated between two conditions - deciding which of two faces matched another target face in emotion, and deciding which of two shapes matched another target shape. We only analyzed the right-to-left phase encoding scan (this is a parameter of MRI imaging which determines the ordering of when image slices of the brain are obtained) as this was the phase encoding direction for the specific subject loop we analyzed. Also, all fMRI data was projected onto surface nodes – points on a mesh of the brain’s surface geometry – which are more comparable across subjects than standard 3D volumes (M. Glasser 2016b). The script used to perform the analysis can be found in the exec directory of TDApplied. Our analysis demonstrates the potential of using TDApplied for deriving interpretable and otherwise obscured insights from real datasets.
We next linked the secondary loop back to the fMRI data from which it came. From the 2D layout of its VR graph we computed a θ variable (angle around the loop, between −π and π) and an r variable (distance to the origin). Using linear models of the form Ai = β1cos (θ) + β2sin (θ) + β3 and Ai = β4r + β5 for the activity A of surface node i, we found that out of the total 91282 surface nodes, the activity of 1475 had a significant relationship with either cos (θ) or sin (θ), and the activity of 53 had a significant relationship with r (significance thresholding was done at the Bonferroni level of approximately 0.05/(2 * 91282) = 2.74 * 10−7). These two sets of nodes only shared one common node.
Here are surface plots of the nodes for the left hemisphere, generated using python, responding to θ and r:
A large cluster of surface nodes responding to θ was found in the left hemisphere (and not in the right hemisphere), which appeared to belong to the brain region Pol1 in the insular cortex. Interestingly, Pol1 was not found to show significant differences in activity between the faces and shapes conditions (M. Glasser 2016a), despite the implication of the insular cortex in emotional processing (Gogolla 2017). This finding suggests that spatial loops of fMRI data may contain complementary information to typical task-based analyses of fMRI data.
A t-test then found a significant difference in mean r values between shape and face blocks:
The main goal of using a summary statistic of neurological data, such as persistence diagrams of spatial activity patterns, is to correlate the statistic with behavior. In the HCP dataset, mean reaction time (in milliseconds) was recorded for all subjects and all tasks, so we checked whether topology was predictive of reaction time for the emotion task. We selected 100 subjects at random, computed their emotion right-to-left phase encoding data persistence diagrams, and computed a one dimensional kernel PCA embedding. We then plotted the relationship between the first embedding dimension and reaction time to shape blocks, which had a correlation of about 0.31:
We then computed statistical significance by Fisher-transforming the sample correlation of 0.31 to obtain a test-statistic of about 0.32, and compared this statistic to a normal null distribution with mean 0 and standard error $1/\sqrt{100-3} \approx 0.1$ (Fisher et al. 1936). The resulting p-value, for a two-sided null hypothesis, would have been less than 0.002, suggesting a significant positive correlation between the topology of neural activity spatial patterns and subject behavior.
We analyzed loops of spatial pattern correlations in HCP emotion fMRI data using TDApplied, and were able to visualize and interpret significant loops of an individual subject, relating them to the study design. Such a result implies a complex structure for the representation of the conditions, a complexity that cannot be captured by linear models assuming clustered distributions. Moreover, our topology embedding, enabled with this software and approach, allowed us to discover a relationship between fMRI patterns and reaction time – effectively, capturing a complex brain-behavior relationship. These results point towards the potential usage of 1-dimensional topology in finding task and/or behavior relationships with fMRI or other neuroimaging modalities, again implying that the shape of neuroimaging data may not be completely captured by traditional analysis methods. Moreover, our analysis hinged on the usage of several functions which are (largely) unique to TDApplied. These results show that TDApplied is the most flexible and useful tool for carrying out meaningful analyses of data.
In our analysis of HCP data we converted correlation values to correlation distance values using the formula $\rho \rightarrow \sqrt{2*(1-\rho)}$. To see why this transformation does produce distance values, let X and Y be two vectors (of the same length n) with 0 mean and unit variance. Then $\rho(X,Y) = \frac{Cov(E,Y)}{\sigma_{x}\sigma_{y}} = \frac{E[(X-0)(Y-0)]}{(1)(1)} = E[XY] = \frac{\sum_{i = 1}^{n}X_i Y_i}{n}$. The Euclidean distance of X and Y is therefore $d(X,Y) = \sqrt{\sum_{i=1}^{n}(X_i-Y_i)^2} = \sqrt{\sum_{i=1}^{n}X_i^2 + \sum_{i=1}^{n}Y_i^2 -2\sum_{i=1}^{n}X_i Y_i} = \sqrt{n + n - 2\rho(X,Y)} = \sqrt{2n(1-\rho(X,Y))} \propto \sqrt{2(1-\rho(X,Y))}$. Therefore, since a scaled distance metric (by a positive number like $\sqrt{n}$) is also a distance metric (this is very easy to verify), the transformation $\rho \rightarrow \sqrt{2*(1-\rho)}$ indeed gives distance values. Note that in (Anderson et al. 2018) a proportional transformation was used $\rho \rightarrow \sqrt{1-\rho} \propto \sqrt{2(1-\rho)}$ and therefore our results are consistent with those found in that work.