Commit f6bc49e

Clarify non-span-normalised pca
Fixes #3358
1 parent 104c7ba commit f6bc49e

1 file changed (+19 -14 lines changed)

python/tskit/trees.py

Lines changed: 19 additions & 14 deletions
@@ -9303,23 +9303,28 @@ def pca(
 eigenvectors of the genetic relatedness matrix, which are obtained by a
 randomized singular value decomposition (rSVD) algorithm.

-Concretely, if :math:`M` is the matrix of genetic relatedness values, with
-:math:`M_{ij}` the output of
-:meth:`genetic_relatedness <.TreeSequence.genetic_relatedness>`
-between sample :math:`i` and sample :math:`j`, then by default this returns
-the top ``num_components`` eigenvectors of :math:`M`, so that
+Concretely, take :math:`M` as the matrix of non-span-normalised
+branch-based genetic relatedness values, for instance obtained by
+setting :math:`M_{ij}` to be the :meth:`~.TreeSequence.genetic_relatedness`
+between sample :math:`i` and sample :math:`j` with ``mode="branch"``,
+``proportion=False`` and ``span_normalise=False``. Then by default this
+returns the top ``num_components`` eigenvectors of :math:`M`, so that
 ``output.factors[i,k]`` is the position of sample `i` on the `k` th PC.
-If ``samples`` or ``individuals`` are provided, then this does the same thing,
-except with :math:`M_{ij}` either the relatedness between ``samples[i]``
-and ``samples[j]`` or the nodes of ``individuals[i]`` and ``individuals[j]``,
-respectively.
+If ``samples`` or ``individuals`` are provided, then this does the same
+thing, except with :math:`M_{ij}` either the relatedness between
+``samples[i]`` and ``samples[j]`` or the average relatedness between the
+nodes of ``individuals[i]`` and ``individuals[j]``, respectively.
+Factors are normalized to have L2 norm 1, i.e.,
+``output.factors[:,k] ** 2).sum() == 1)`` for any ``k``.

 The parameters ``centre`` and ``mode`` are passed to
-:meth:`genetic_relatedness <.TreeSequence.genetic_relatedness>`;
-if ``windows`` are provided then PCA is carried out separately in each window.
-If ``time_windows`` is provided, then genetic relatedness is measured using only
-ancestral material within the given time window (see
-:meth:`decapitate <.TreeSequence.decapitate>` for how this is defined).
+:meth:`~.TreeSequence.genetic_relatedness`: the default ``centre=True`` results
+in factors whose elements sum to zero; ``mode`` currently only supports the
+``"branch"`` setting. If ``windows`` are provided then PCA is carried out
+separately in each genomic window. If ``time_windows`` is provided, then genetic
+relatedness is measured using only ancestral material within the given time
+window (see :meth:`decapitate <.TreeSequence.decapitate>` for how this is
+defined).

 So that the method scales to large tree sequences, the underlying method
 relies on a randomized SVD algorithm, using
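
The revised docstring defines the output as the top eigenvectors of a symmetric matrix M, each scaled to unit L2 norm so that `(factors[:, k] ** 2).sum() == 1`. As a minimal sketch of that definition (not tskit's implementation), the following uses numpy and a small hand-written 4x4 matrix standing in for the non-span-normalised branch relatedness matrix. It swaps the randomized SVD that pca relies on for an exact eigendecomposition with `numpy.linalg.eigh`, which for a small symmetric matrix gives the same eigenvectors up to sign. The helper name `top_eigenvectors` and the matrix values are hypothetical.

```python
import numpy as np


def top_eigenvectors(M, num_components):
    """Hypothetical helper: exact top-k eigenvectors of a symmetric matrix.

    Stands in for the randomized SVD used by pca(); returns unit-norm
    eigenvectors as columns, ordered by decreasing eigenvalue.
    """
    M = np.asarray(M, dtype=float)
    # eigh is for symmetric matrices; it returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(M)
    order = np.argsort(eigenvalues)[::-1][:num_components]
    return eigenvectors[:, order], eigenvalues[order]


# Hypothetical relatedness matrix for 4 samples: two pairs of close relatives.
M = np.array([
    [4.0, 3.0, 0.5, 0.5],
    [3.0, 4.0, 0.5, 0.5],
    [0.5, 0.5, 4.0, 3.0],
    [0.5, 0.5, 3.0, 4.0],
])
factors, eigenvalues = top_eigenvectors(M, num_components=2)

# factors[i, k] is the position of sample i on the k-th PC
norms = (factors ** 2).sum(axis=0)  # one value per component, each 1 up to rounding
```

Without centring, the leading component of this M is the constant vector (every entry is 0.5 or -0.5), which says nothing about which samples are related; only the second component separates the two pairs. The signs of the columns are arbitrary, as with any eigenvector computation.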

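The new docstring also states that the default `centre=True` gives factors whose elements sum to zero. The sketch below illustrates why under a simplifying assumption: it treats centring as double-centring M (subtracting row and column means and adding back the grand mean). That operation sends the all-ones vector to zero, so every eigenvector with a nonzero eigenvalue is orthogonal to it, and its entries therefore sum to zero. The random matrix and the helper `double_centre` are hypothetical; this is not a claim about how `genetic_relatedness` implements `centre`.

```python
import numpy as np


def double_centre(M):
    """Hypothetical helper: return C @ M @ C with C = I - J / n (J all ones).

    Computed without forming C: subtract column means and row means,
    then add back the grand mean.
    """
    M = np.asarray(M, dtype=float)
    return (M - M.mean(axis=0, keepdims=True)
            - M.mean(axis=1, keepdims=True) + M.mean())


rng = np.random.default_rng(1)
G = rng.normal(size=(6, 10))  # hypothetical per-sample data for 6 samples
M = G @ G.T                   # symmetric positive semi-definite stand-in for M
M_centred = double_centre(M)

eigenvalues, eigenvectors = np.linalg.eigh(M_centred)
order = np.argsort(eigenvalues)[::-1][:3]  # top 3 components
factors = eigenvectors[:, order]

# Each column sums to (numerically) zero and still has unit L2 norm.
column_sums = factors.sum(axis=0)
```

The argument only needs the top eigenvalues to be strictly positive; here the centred matrix has rank 5 for 6 samples, so the single zero eigenvalue (belonging to the all-ones direction) is never among the top 3.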
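
For `individuals`, the docstring now says M_ij is the average relatedness between the nodes of `individuals[i]` and `individuals[j]`. One way to express that average, sketched here with numpy, is M_ind = A M_nodes A^T, where row a of an averaging matrix A puts weight 1/(number of nodes of individual a) on each of that individual's nodes. The helper `individual_relatedness`, the node-level matrix, and the node lists are hypothetical; this shows only the arithmetic the docstring describes, not tskit's code path.

```python
import numpy as np


def individual_relatedness(M_nodes, individual_nodes):
    """Hypothetical helper: average node-level relatedness per individual pair.

    M_nodes[i, j] is the relatedness between nodes i and j;
    individual_nodes[a] lists the row indices of individual a's nodes.
    Returns M_ind with M_ind[a, b] = mean of M_nodes over a's nodes by b's nodes.
    """
    M_nodes = np.asarray(M_nodes, dtype=float)
    A = np.zeros((len(individual_nodes), M_nodes.shape[0]))
    for a, nodes in enumerate(individual_nodes):
        # equal weight on each node of individual a, weights summing to 1
        A[a, nodes] = 1.0 / len(nodes)
    return A @ M_nodes @ A.T


# Hypothetical node-level matrix: two diploid individuals, nodes (0, 1) and (2, 3).
M_nodes = np.array([
    [4.0, 3.0, 0.5, 0.5],
    [3.0, 4.0, 0.5, 0.5],
    [0.5, 0.5, 4.0, 3.0],
    [0.5, 0.5, 3.0, 4.0],
])
individuals = [[0, 1], [2, 3]]
M_ind = individual_relatedness(M_nodes, individuals)
```

Here M_ind works out to [[3.5, 0.5], [0.5, 3.5]]: each diagonal entry averages all four entries of that individual's 2x2 block, including each node's relatedness with itself.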