Encrypting DNA – A Deep Dive

Undisclosed DNA’s unique and patented DNA solution is based on some mathematical and cryptographical technologies. In this section, we will delve deeper into the science and maths behind our technology to explain how it works.


Undisclosed DNA’s solution is an elegantly simple answer to a complex problem

from base pairs to genetic addresses

A person’s whole genome sequence [WGS] contains about 3 billion base pairs

At heart, DNA sequences to be compared actually have a genetic address extracted from them and the genetic addresses are compared.

In some circumstances, the genetic address could even be extracted by the individual whose DNA it is, on a device of their own.
Only the genetic address itself, containing no sequenced DNA, is shared for comparison.

In another case, such as the NHS’s DNA sequencing of every Newborn in the UK:
There will be many many parties who apply to obtain a copy of the DNA data, for legitimate screening purposes, but the issue is to make illegitimate screening impossible as well as making the accidental disclosure of sequenced DNA impossible.
Illegitimate screening is prevented by having legitimate-context-specific genetic addresses created and that is what is released, instead of the DNA sequence itself.

That which is not held, cannot be leaked / lost / stolen / hacked / accidently-sent-to-the-wrong-person / left-on-a-train !
Every genetic address of focussed for its specific contextual use case, be that familial matching of genetic screening for something specific.

There are 4 types of Genetic Key:  

3 x “Verbatim”

  • One similar, based on the Y Chromosome.
  • One, based on the X Chromosome.
  • One slightly different in generation, due to size,
    based on the Mitochondria.

1 x “Unique”

  • Quite different in generation, structure & algorithm,
    based on Chromosomes 1 to 22.

When the sequence is extracted it is classically held in a BAM file format, or equivalent.

It’s about genetic distance

Each of us receive about ½ of our DNA from our mother & about ½ from our father,  & we may also receive the occasional mutated base-pair.

Distinct from the numbered chromosomes, Mitochondria, Y & many X chromosomes tend to be inherited verbatim.

As a result, we can count the evolutionary steps, measuring the genetic distance between each of us, even without disclosing the actual DNA sequence.

Our algorithm breaks down into distinct operations:

Converting a .bam file containing a persons sequenced DNA into a .kad file containing that person’s DNA-Address & their DNA-cryptographic keys. The rest of the code can work with either, it’s just about subsequent load times.

Extracting a person’s DNA-Address, from their sequenced DNA.

Extracting a person’s DNA keys, from their sequenced DNA.

The Maths behind it

Mathematical proof of the irreversibility of the DNAddress

Foundationally we need to:

Provide a mathematical proof that given the sum of twelve million integers, [< ¼ of the length of the Y chromosome] each in the range of zero to three, [G, C, A & T] it is not possible to determine the original list.

Step 1: Understanding the Problem

We are tasked with proving that given the sum of twelve million integers, each in the range of 0 to 3, it is not possible to determine the original list. This involves showing that multiple lists can have the same sum.

Step 2: Calculating the Range of Possible Sums

The minimum possible sum for a list of twelve million integers, each in the range 0 to 3, is 0 (if all numbers are 0). The maximum possible sum is 12,000,000 times 3 = 36,000,000

Step 3: Demonstrating Non-Uniqueness

Let’s consider two different lists that could potentially have the same sum. For simplicity, let’s aim for a sum of 12,000,000 (which is 1 times 12,000,000).

“List 1: [1, 1, 1, …, 1] (twelve million times)
Sum of List 1 = 12,000,000″

“List 2: [0, 0, 0, …, 0, 2, 2, 2, …, 2] where the number of 2s is 6,000,000 and the rest are 0s.
Sum of List 2 = 6,000,000 times 2 = 12,000,000″

Step 4: Conclusion

Since List 1 and List 2 are different but have the same sum (12,000,000$), this demonstrates that knowing the sum of twelve million integers, each in the range 0 to 3, is not enough to determine the original list. There can be multiple combinations of numbers that result in the same sum.

Step 5: Generalising the Proof

Given the vast number of possible combinations (4^{12,000,000}) and the range of possible sums (0 to 36,000,000), it’s clear that many different lists can yield the same sum, making it impossible to determine the original list from the sum alone.

The Genetic Address

* Note: In this illustration, reference is made to M, X and Y, as that renders the simplest formula.  The Y can be replaced with X’ or X/Y, which is necessary in the context of XX and then the formula introduces an additional term, addressing which Xs to compare, still without sequence disclosure.

Undisclosed DNA’s unique and patented DNA solution is based on some mathematical and cryptographical technologies. In this section, we will delve deeper into the science and maths behind our technology to explain how it works.


Undisclosed DNA’s solution is an elegantly simple answer to a complex problem

from base pairs to genetic addresses

A person’s whole genome sequence [WGS] contains about 3 billion base pairs

At heart, DNA sequences to be compared actually have a genetic address extracted from them and the genetic addresses are compared.

In some circumstances, the genetic address could even be extracted by the individual whose DNA it is, on a device of their own.

Only the genetic address itself, containing no sequenced DNA, is shared for comparison.

In another case, such as the NHS’s DNA sequencing of every Newborn in the UK:

There will be many many parties who apply to obtain a copy of the DNA data, for legitimate screening purposes, but the issue is to make illegitimate screening impossible as well as making the accidental disclosure of sequenced DNA impossible.

Illegitimate screening is prevented by having legitimate-context-specific genetic addresses created and that is what is released, instead of the DNA sequence itself.

That which is not held, cannot be leaked / lost / stolen / hacked / accidently-sent-to-the-wrong-person / left-on-a-train !

Every genetic address of focussed for its specific contextual use case, be that familial matching of genetic screening for something specific.

There are 4 types of Genetic Key:  

3 x “Verbatim”

  • One similar, based on the Y Chromosome.
  • One, based on the X Chromosome.
  • One slightly different in generation, due to size,
    based on the Mitochondria.

1 x “Unique”

  • Quite different in generation, structure & algorithm,
    based on Chromosomes 1 to 22.

When the sequence is extracted it is classically held in a BAM file format, or equivalent.

It’s about genetic distance

Each of us receive about ½ of our DNA from our mother & about ½ from our father,  & we may also receive the occasional mutated base-pair.

Distinct from the numbered chromosomes, Mitochondria, Y & many X chromosomes tend to be inherited verbatim.

As a result, we can count the evolutionary steps, measuring the genetic distance between each of us, even without disclosing the actual DNA sequence.

Our algorithm breaks down into distinct operations:

Converting a .bam file containing a persons sequenced DNA into a .kad file containing that person’s DNA-Address & their DNA-cryptographic keys. The rest of the code can work with either, it’s just about subsequent load times.

Extracting a person’s DNA-Address, from their sequenced DNA.

Extracting a person’s DNA keys, from their sequenced DNA.

The Maths behind it

Mathematical proof of the irreversibility of the DNAddress

Foundationally we need to:

Provide a mathematical proof that given the sum of twelve million integers, [< ¼ of the length of the Y chromosome] each in the range of zero to three, [G, C, A & T] it is not possible to determine the original list.

Step 1: Understanding the Problem

We are tasked with proving that given the sum of twelve million integers, each in the range of 0 to 3, it is not possible to determine the original list. This involves showing that multiple lists can have the same sum.

Step 2: Calculating the Range of Possible Sums

The minimum possible sum for a list of twelve million integers, each in the range 0 to 3, is 0 (if all numbers are 0). The maximum possible sum is 12,000,000 times 3 = 36,000,000

Step 3: Demonstrating Non-Uniqueness

Let’s consider two different lists that could potentially have the same sum. For simplicity, let’s aim for a sum of 12,000,000 (which is 1 times 12,000,000).

“List 1: [1, 1, 1, …, 1] (twelve million times)
Sum of List 1 = 12,000,000″

“List 2: [0, 0, 0, …, 0, 2, 2, 2, …, 2] where the number of 2s is 6,000,000 and the rest are 0s.
Sum of List 2 = 6,000,000 times 2 = 12,000,000″

Step 4: Conclusion

Since List 1 and List 2 are different but have the same sum (12,000,000$), this demonstrates that knowing the sum of twelve million integers, each in the range 0 to 3, is not enough to determine the original list. There can be multiple combinations of numbers that result in the same sum.

Step 5: Generalising the Proof

Given the vast number of possible combinations (4^{12,000,000}) and the range of possible sums (0 to 36,000,000), it’s clear that many different lists can yield the same sum, making it impossible to determine the original list from the sum alone.

The Genetic Address

* Note: In this illustration, reference is made to M, X and Y, as that renders the simplest formula.  The Y can be replaced with X’ or X/Y, which is necessary in the context of XX and then the formula introduces an additional term, addressing which Xs to compare, still without sequence disclosure.