Non-destructively sequencing gametes by sequencing meiotic cousins

Previous work: Gwern

My lay understanding is that it's not known how to sequence gametes without destroying them. If we could non-destructively sequence gametes, it would be easier to e.g. screen for genetic illnesses. Instead of fertilizing a few eggs, which are expensive to acquire, and then sequencing the resulting embryos, we could produce many sperm, find acceptable ones, and then use those to fertilize the few available eggs.

This post describes a way to sequence gametes non-destructively, assuming that it's possible to sequence gametes destructively. I lack lots of basic biological knowledge, so I can't verify that this idea makes sense, would work, or would be feasible or efficient. I hope others who are more informed can check and use the idea. I'll focus on sperm for clarity; I don't know whether / how this might extend to ova.

1. Summary of idea: sperm cousins form complementary pairs

In short the idea goes like this: sequence an ordinary somatic cell. Then isolate a primary spermatocyte. Induce meiosis I and II, producing four sperm. Select one sperm. Sequence the other three sperm. Two of the other sperm will form a complementary pair, adding up to a single full set of the chromatids from the somatic cell. The third non-selected sperm's genome is then the complement of the selected sperm's genome, so the selected sperm is still intact and its genome is known as the complement of the third sperm's genome relative to the somatic genome.

[Update: due to the calculations here, this is far less powerful than it seemed naively to me.]

2. Complementary pairs in more detail

(The following line of reasoning ignores the Y chromosome for simplicity. I don't know if it still works for that chromosome. It seems like most of the benefit could be gotten by sequencing the other 22 chromosomes of a sperm, and if the idea works otherwise, the Y chromosome can be figured out later.)

Spermatogenesis, schematically:

Description of spermatogenesis

Summary table lightly modified from (Wiki):


+----------------------------+--------------------+-------------------+-----------------+
| Cell type                  | ploidy/chromosomes | copy #/chromatids | Process entered |
+----------------------------+--------------------+-------------------+-----------------+
| spermatogonium (Ad, Ap, B) | diploid (2N) / 46  | 2C / 46           | mitosis         |
| primary spermatocyte       | diploid (2N) / 46  | 4C / 2x46         | meiosis I       |
| 2 secondary spermatocytes  | haploid (N) / 23   | 2C / 2x23         | meiosis II      |
| 4 spermatids               | haploid (N) / 23   | C / 23            | spermiogenesis  |
| 4 functional spermatozoids | haploid (N) / 23   | C / 23            | spermiation     |
+----------------------------+--------------------+-------------------+-----------------+

Spermatogenesis goes like this (taken from (Wiki) and (Rooij, Russell, 2000)):

  1. A_s spermatogonium. There's a stem cell spermatogonium. Genetically, this is like a normal cell: it has 2 copies of each of 23 chromosomes (each one a single chromatid, shaped like a bar or a mass), one from the father and one from the mother. When an A_s spermatogonium splits, some daughter cells remain A_s spermatogonia, replenishing the stem cell population, and some differentiate into non-stem-cell A_pr spermatogonia committed to the path to spermatogenesis.

  2. B spermatogonium. The A_pr spermatogonia divide a number of times, through A_al / A_n / In spermatogonia, and finally to B spermatogonia. These are genetically like A_s spermatogonia.

  3. B spermatogonium mitosis. The B spermatogonium divides mitotically into two primary spermatocytes. A spermatocyte starts off still genetically like a normal cell, with 2 copies of each of 23 chromosomes.

  4. Chromatid copying. The primary spermatocyte prepares for meiosis I by copying each of its bar-shaped chromosomes, producing X-shaped chromosomes made of two identical chromatids connected by a centromere. It now has, for each genetic locus, 2 pairs of identical copies of alleles, where the alleles are homologous to but possibly different from each other. That is, it has 4 homologous DNA segments for each genetic locus: one per chromatid, two chromatids per chromosome, two chromosomes (of each of the 23 kinds of chromosomes).

  5. Crossover. Chromatids in homologous chromosomes exchange segments, creating chromatids with some segments from the maternal genome and some from the paternal genome.

  6. Meiosis I. The primary spermatocyte meiotically divides, creating two secondary spermatocytes. Each secondary spermatocyte has one X-shaped, two-chromatid chromosome of each kind, 23 total.

  7. Meiosis II. The chromatids in each chromosome unlink, so that each secondary spermatocyte has two single chromatids of each of 23 kinds. Then each secondary spermatocyte meiotically divides, creating two spermatids. Each spermatid has a single chromatid of each of 23 kinds.

  8. Maturation. The four spermatids mature into sperm.

Description of spermatogenesis, tracking one chromosome

If we track a single chromosome, assuming a single crossover point, spermatogenesis looks like this:

  1. ( M P )

B spermatogonium. Two separate copies of the chromosome, one maternal chromatid M and one paternal chromatid P.

  2. ( M P )

B spermatogonium mitosis. Focusing on one spermatocyte, there's still M and P.

  3. ( M'-M      P'-P )

Chromatid copying. M and M' are chromatids connected by a centromere, identical except possibly for copying-error mutations. Likewise P and P'. (All four are homologous.)

  4. ( M'-M₁/P₂     P'-P₁/M₂ )

Crossover. The tail M₂ of chromatid M swaps with the homologous tail P₂ of chromatid P. M'-M becomes M'-M₁/P₂, and P'-P becomes P'-P₁/M₂.

  5. ( M'-M₁/P₂ )      ( P'-P₁/M₂ )

Meiosis I. The two X-shaped chromosomes go into separate secondary spermatocytes.

  6. ( M' )      ( M₁/P₂ )      ( P' )      ( P₁/M₂ )

Meiosis II. Each X-shaped chromosome splits into two separate chromatids which go into separate spermatids, which develop into sperm.
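
To make the bookkeeping concrete, here's a minimal Python sketch of the single-chromosome picture. It's a toy model under assumptions of mine, not a claim about real biology: the chromosome is a list of loci labeled "M" or "P", the centromere sits at locus 0, each crossover swaps tails between one chromatid on the maternal centromere and one on the paternal centromere, and the function name `meiosis_one_chromosome` is made up for illustration.

```python
import random

def meiosis_one_chromosome(n_loci, crossover_points, rng):
    """Return four chromatids (lists of 'M'/'P' labels), one per sperm.

    Toy model: indices 0 and 1 stay attached to the maternal centromere,
    indices 2 and 3 to the paternal centromere, so (0, 1) end up as
    siblings and (2, 3) as their cousins.
    """
    # State after chromatid copying: M', M, P', P.
    chromatids = [["M"] * n_loci, ["M"] * n_loci,
                  ["P"] * n_loci, ["P"] * n_loci]
    for point in crossover_points:
        a = rng.choice([0, 1])  # a chromatid on the maternal centromere
        b = rng.choice([2, 3])  # a chromatid on the paternal centromere
        # Swap the tails from the crossover point onward.
        chromatids[a][point:], chromatids[b][point:] = (
            chromatids[b][point:], chromatids[a][point:])
    # Meiosis I and II only distribute the chromatids into four cells;
    # they don't change the sequences, so the list is the final answer.
    return chromatids

# One crossover at locus 3 of a 6-locus chromosome.
for chromatid in meiosis_one_chromosome(6, [3], random.Random(0)):
    print("".join(chromatid))
```

Whichever chromatids happen to cross over, every locus ends up with exactly two M labels and two P labels across the four sperm. That invariant is what the deduction in the next section relies on.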

3. Necessity and sufficiency of sequencing 3 of 4 grandchild sperm

Roughly speaking, to determine the genome of a sperm without sequencing it, it's both necessary and sufficient to sequence three out of the four sperm descended from the same primary spermatocyte as the given sperm.

Necessity

To know the genome of a selected sperm, it's necessary to sequence the sibling: otherwise there's no way to disambiguate our selected genome from the sibling's genome. And, it's necessary to sequence both cousins. Otherwise it's likely that the selected sperm and its unsequenced cousin crossed over with each other, i.e. they have at least one pair of chromosomes standing in relation to each other like (M₁/P₂) and (P₁/M₂). In that case, there's no way to know where the M₁ / M₂ division lies; even if we know M and P, we don't know where the crossover happened in our selected sperm.

(Speaking more precisely, it's not necessary to completely sequence the genomes of the two sperm not complementary to our selected sperm, as long as we've seen enough to piece together a full knowledge of M and P and enough to know which is the unpaired sperm. If we somehow knew that there was only one crossover event, and we sequenced a somatic cell, then we could sometimes know the genome of a sperm just by sequencing its sibling sperm: if we happened to sequence (M₁/P₂), we'd know that the remaining sibling is (M'). In real life there are multiple crossover events.)

Sufficiency

To make it clear how we can deduce the genome of the selected sperm, suppose that we do sequence an ordinary somatic cell, so that we know (M P). By sequencing the two cousin sperm, we know the set of genomes of {selected sperm, selected sperm's sibling}: if a cousin has (M₁/P₂), then one of the siblings has (P₁/M₂). Taking complements in this way for all chromosomes simultaneously gives us two full genomes. Sequencing the sibling sperm tells us that the other of the two cousin-complement genomes is the genome of our selected sperm. Note that it's not necessary to keep track of which sperm is the sibling: the sibling will be the complement of one of the two cousins, and the other cousin determines the selected sperm's genome.

Now, mainly just to clarify the situation from another angle (sequencing a somatic cell is cheap): it's not necessary to sequence a somatic cell. Say we see three sperm like:

( M₁/P₂/P₃ )      ( P₁/M₂/M₃ )      ( M₁/M₂/P₃ )

Then we know that the fourth is (P₁/P₂/M₃). This shows the simple procedure for reading off the selected sperm's genome: at each locus, see which base pair appears twice in the other three sperm. The third base pair is then the base pair of the selected sperm. (Including if all three are the same.)

Complementary sequencing step-by-step

  1. Isolate a primary spermatocyte.
  2. Induce meiosis I and II, producing four sperm.
  3. Select one sperm, S.
  4. Sequence the other three sperm, giving three genomes G₁, G₂, and G₃.
  5. Deduce that S has genome G equal to G₁ + G₂ + G₃, with base-pair-wise addition, regarding each base pair as a one-hot vector in 𝔽₂⁴. That is, base pair N of S's genome G is the odd base out in the set of the Nth base pairs in G₁, G₂, and G₃, or if those three base pairs are the same, it's just that base pair.
  6. Repeat 1-5 until some S is found to have an acceptable genome G.
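
Step 5 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: genomes are equal-length strings over A, C, G, T that are already aligned, there are no sequencing errors or new mutations, and the names `one_hot` and `deduce_fourth` are mine.

```python
BASES = "ACGT"

def one_hot(base):
    """Encode a base as a one-hot vector in F_2^4."""
    return tuple(int(base == b) for b in BASES)

def deduce_fourth(g1, g2, g3):
    """Deduce the unsequenced sperm's genome from its three relatives.

    At each position, add the three one-hot vectors mod 2. Two equal
    bases cancel, leaving the odd base out; three equal bases leave
    that same base.
    """
    deduced = []
    for a, b, c in zip(g1, g2, g3):
        total = tuple((x + y + z) % 2
                      for x, y, z in zip(one_hot(a), one_hot(b), one_hot(c)))
        if sum(total) != 1:
            # Three different bases can't come from one (M P) donor locus.
            raise ValueError("inconsistent bases at a locus")
        deduced.append(BASES[total.index(1)])
    return "".join(deduced)

# The three-segment example from above, with made-up toy sequences
# M = AA|CC|GG and P = TT|GG|CC (two bases per segment).
print(deduce_fourth("AAGGCC", "TTCCGG", "AACCCC"))  # TTGGGG
```

The printed genome is P₁/P₂/M₃ in the toy encoding, matching the fourth sperm read off by hand above.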

4. Challenges

It seems like a major challenge would be that it's necessary to know which sperm are the other three grandchildren to a selected sperm, i.e. the sperm that came from the same primary spermatocyte. Maybe it's possible to extract spermatocytes and isolate them individually, and then induce meiosis I and II? This seems like a major bottleneck to scaling to searching over millions of sperm. It seems (to my uninformed glance) like the isolation might have to happen exactly at the primary spermatocyte stage: before that point, the non-A_pr spermatogonia in a clone descended from a single A_pr spermatogonium are coupled by bridges and regulate each other's development, maybe indispensably.

It might be hard to induce and support spermatogenesis in vitro, and that might imply significant scaling costs. There might be detrimental effects of in vitro spermatogenesis, e.g. epigenetic differences from in vivo sperm. Sequencing sperm (destructively) might not be feasible or cost-efficient.

Laura Bailey via Ivo Andrews points out that oogenesis should be amenable to complementary sequencing. In female embryos, primary oocytes are halted in prophase of meiosis I and kept halted until they're stimulated to mature in adulthood. At that point they produce one ovum and 3 polar bodies carrying the complementary DNA, which are usually discarded. So oogenesis is already naturally a single-progenitor process, unlike spermatogenesis, which has coupled clones up until spermatocytogenesis.

5. Acknowledgements

Thanks to Ivo Andrews (supported by PIBBSS) for discussion, information, and notes on a draft, and Laura Bailey for information. Also thanks to Sam Eisenstat for conversations about these ideas.

6. Appendix: a hare-brained scaling scheme

It may be prohibitively expensive to separately track sets of four cousin sperm. This may not affect applications much due to sharply diminishing returns (the number of samples needed to get N standard deviations out scales roughly as e^(N²/2)), but I don't know how big a factor that is (i.e. how many sperm one would want to sequence) because I haven't done the calculations. [Update update: the update in the next set of brackets is incorrect. More precisely, it only applies to a sample of sperm where each sperm is sampled randomly from the sperm of the whole population. The variance within the sperm of one donor is about 1/6 of the variance in sperm from the whole population. Likewise for the eggs.] [Update (FALSE): see this post for the math. The upshot is that selecting the best (or top 10) sperm out of 10⁶ and then combining that with 10 eggs and choosing the best embryo should give in the ballpark of (4.8 + 1.3)/√2 ≈ 4.3 SDs of real selection power. Correcting for a PGS explaining 1/5 the variance gives 4.3/√5 ≈ 1.9 SDs of effective selection power.]
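
To get a feel for the diminishing returns, here's a small Python sketch. It assumes the score being selected on is a standard normal draw for each sperm (my simplification, not something established above) and asks how many samples you'd need before you expect one draw beyond N SDs. The function names are made up.

```python
import math

def upper_tail(n_sd):
    """P(Z > n_sd) for a standard normal Z."""
    return 0.5 * math.erfc(n_sd / math.sqrt(2))

def samples_needed(n_sd):
    """Sample size at which one draw beyond n_sd SDs is expected."""
    return 1.0 / upper_tail(n_sd)

for n_sd in range(1, 6):
    print(n_sd, round(samples_needed(n_sd)))
```

Each extra SD multiplies the required sample size by a larger factor than the previous one, so the cost grows faster than exponentially in N.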

Update September 2022 on scaling with Red-Green GFPs

This is a speculative idea to non-destructively genomically sort sperm. I'll describe it tersely but if someone's interested, let me know (tsvibtcontact@gmail.com). The idea is to use GFP (or some other visual cue) to visually identify whether crossover has happened near a point where you want crossover to happen. Then you sort sperm based on how many of the desired visual cues they have.

In a bit more detail, say your sperm donor has paternal chromosome 1 AA' and maternal chromosome 1 BB'. You want a sperm that has chromosome 1 AB'. So you get a sperm sample and you tag near the right end of A with tag 1, like AT₁, and near the left end of B' like T₂B'. If a sperm has both tags T₁ and T₂, it looks roughly like AT₁T₂B' (sperm usually have one crossover per chromosome, IIUC, as distinct from eggs). So it's approximately your desired chromosome. Do this for all the chromosomes and then somehow sort by how many tags they have, choosing the ones with the most tags, e.g. simply sort by brightness. A major issue with this is that CRISPRing a bunch of GFPs would almost certainly seriously damage some of the DNA. Maybe this can be fixed somehow, e.g. by not actually inserting genes and instead just binding some visual cues.

Here's a [previous] very speculative idea to complementarily sequence many sperm in one batch:

Short high-level version

  1. Collect a large sample of spermatocytes.
  2. Tag all the spermatocytes somehow with unique tags, in a way that will propagate through meiosis to sperm. (This seems like the most sci-fi part.)
  3. Induce spermatogenesis.
  4. Hold out some sperm.
  5. Sequence the remaining sperm.
  6. Find sets of 3 sperm with the same tag among those you sequenced. Use those sequences to deduce the genome of one of the sperm in your holdout set.
  7. Select a desired genome among those you've deduced.
  8. Somehow isolate the sperm with the corresponding tag. (This also seems sci-fi.)

I don't know how this stuff works, so I don't know if these steps are feasible. The point would be to make in vitro spermatogenesis and sequencing much cheaper by having step 3 and/or step 5 be doable in a single batch, hence much more cheaply.
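
The bookkeeping in step 6 could look something like the following Python sketch. It takes the sci-fi parts for granted: every sperm from one spermatocyte carries the same tag, each sequenced sperm comes back as a (tag, genome) pair with aligned genomes, and there are no sequencing errors. The data format and function names are hypothetical.

```python
from collections import defaultdict

def odd_one_out(a, b, c):
    """Base of the missing fourth sperm at one position."""
    if a == b:
        return c
    if a == c:
        return b
    if b == c:
        return a
    raise ValueError("three different bases at one position")

def deduce_holdouts(sequenced):
    """Map tag -> deduced genome of the held-out sperm with that tag.

    `sequenced` is an iterable of (tag, genome) pairs. Only tags seen
    exactly three times are usable: then exactly one of the four sperm
    from that spermatocyte is in the holdout set.
    """
    by_tag = defaultdict(list)
    for tag, genome in sequenced:
        by_tag[tag].append(genome)
    deduced = {}
    for tag, genomes in by_tag.items():
        if len(genomes) == 3:
            deduced[tag] = "".join(odd_one_out(*col) for col in zip(*genomes))
    return deduced

reads = [("t1", "AAGGCC"), ("t1", "TTCCGG"), ("t1", "AACCCC"),
         ("t2", "ACGT"), ("t2", "ACGT")]  # t2: two sperm were held out
print(deduce_holdouts(reads))  # {'t1': 'TTGGGG'}
```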

More detailed scheme

  1. Collect a large sample of spermatocytes, before they've copied their chromatids.
  2. Use CRISPR to insert a unique 20-base-pair tag into position 1 on most chromatids in most spermatocytes.
  3. Induce spermatogenesis.
  4. Hold out 1/4 of the resulting sperm. (This maximizes the probability 4×f×(1-f)³ that a given set of four sperm has exactly one member in the holdout set.)
  5. Sequence the remaining sperm.
  6. Sequence a somatic cell.
  7. Find pairs of sequenced sperm that are complementary in the somatic genome, e.g. that look like { (M₁/P₂/P₃), (P₁/M₂/M₃) }. (And also which have both been tagged.)
  8. Find a sequenced sperm with either the same tag as P₁ or the same tag as M₁, e.g. (M₁/M₂/P₃), such that its complement is not among the sequenced genomes.
  9. Deduce that there's a sperm in your holdout set with the same tag T as P₁ and the genome (P₁/P₂/M₃). This happens many times: if the tagging step is completely successful, then the holdout sperm's genomes are deducible at a rate of roughly 4×(1/4)×(1-1/4)³ = (3/4)³ = 27/64 ≈ 0.42.
  10. Select a desired genome among those you've deduced.
  11. Somehow isolate the sperm with the corresponding tag.
  12. If needed, use CRISPR to remove the tag so it doesn't have adverse effects.
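
As a sanity check on the holdout fraction in step 4 and the rate in step 9, here's a short Python sketch. It assumes each sperm is held out independently with probability f, and just scans a grid of candidate fractions with exact arithmetic.

```python
from fractions import Fraction

def p_exactly_one_held_out(f):
    """Probability that exactly one of four sperm lands in the holdout set."""
    return 4 * f * (1 - f) ** 3

# Scan f = 0.00, 0.01, ..., 1.00 using exact rationals.
grid = [Fraction(k, 100) for k in range(101)]
best = max(grid, key=p_exactly_one_held_out)

print(best, p_exactly_one_held_out(best))  # 1/4 27/64
```

This agrees with setting the derivative of f(1-f)³ to zero, which gives (1-f)²(1-4f) = 0 and hence f = 1/4.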

Possible ways this wouldn't work:

  • We can't use CRISPR to insert lots of unique tags.

  • We can't select sperm by its tags.

  • Sequencing just gets confused. If the sequencing method works by making lots of copies of lots of short-ish segments of DNA and then reading those, we'll get a picture of where the crossover events happened, but we won't know which ones happened on the same chromosomes. Maybe it's fine though? IDK how sequencing does or can work (can we easily clone sperm genomes?). If this is a problem, then step 5 could include separating the sperm and sequencing them individually, which seems much less difficult than separating individual spermatocytes before meiosis.

  • Sequencing requires cloning and we can't clone sperm.

  • Chromosomes dissociated. We'll get a bag of chromosomes, but not a bag of full genomes. Maybe this could be fixed by somehow connecting the chromosomes by strands of DNA??

  • The tags are harmful.