Cycles in family tree software
I am the developer of some family tree software (written in C++ and Qt). I had no problems until one of my customers mailed me a bug report. The problem is that he has two children with his own daughter, and, as a result, he can't use my software because of errors.

Those errors are the result of my various assertions and invariants about the family graph being processed (for example, after walking a cycle, the program states that X can't be both father and grandfather of Y).

How can I resolve those errors without removing all data assertions?
improve this question | comment
Ena Bins DVM Created at: 2013-11-13 17:07:29 UTC By Ena Bins DVM
Sounds like you should limit the sale of the software to those that avoid getting in to tricky family situations!  How does one have children with his own daughter - I hope you're talking about his daughter-in-law! - Curt Nikolaus
You should obviously write your software with Ray Stevens' song in mind. - Camren Bednar
This might be one of those cases where you need to ask yourself: Do I really want to make business with that guy?  Another solution would be to press criminal charges against him. Incest is forbidden in most of the world, after all. Finally, your software is broken anyway, because you can (legally) have cycles in a family tree: cousins are allowed to marry in most (all?) western countries. - Camille Kassulke
Time for Oedipal Edition. Charge extra. - Russ Hayes
What kind of father lets his daughter date a man who is old enough to be her father. - Miss Berniece Prohaska
18 Answers
Another technical workaround:

Incorporate this man ("Daddy") with his daughter ("Daughter") as one node. For present purposes, this is Daddy-Daughter, Incorporated, henceforth DDI.

DDI has a father (Daddy's father) and two mothers (Daddy's mother and Daughter's mother). Surely a person may have two mothers... well, anyway...

Neither Daddy's marriage to Daughter's mother nor the descent of Daughter from that marriage are modeled explicitly. All information that is normally modeled explicitly for each person but that is not modeled explicitly for Daddy or for Daughter is explained textually in the DDI record.

DDI has two children (Daughter's with Daddy) from a "marriage" to a(nother) fictitious person whose identity remains unspecified.

This workaround is not nice and it gives up some functionality, but I think you'll agree that it preserves a certain amount of functionality and all of the information until a full solution may arrive. Until then, it places the burden of supporting the exceptional case on the maintainer of exceptional data (customer) rather than on the maintainer of the code (developer).

In applications other than family trees, this strategy of modeling isolated cases, in which some individual nodes are connected by unsupported relations, as a corporate node may be pleasant enough.
Another mock serious answer for a silly question:

The real answer is, use an appropriate data structure. Human genealogy cannot fully be expressed using a pure tree with no cycles. You should use some sort of graph. Also, talk to an anthropologist before going any further with this, because there are plenty of other places similar errors could be made trying to model genealogy, even in the most simple case of "Western patriarchal monogamous marriage." 

Even if we want to ignore locally taboo relationships as discussed here, there are plenty of perfectly legal and completely unexpected ways to introduce cycles into a family tree.

For example:

Basically, cousin marriage is not only common and expected, it is the reason humans have gone from thousands of small family groups to a worldwide population of 6 billion. It can't work any other way.

There really are very few universals when it comes to genealogy, family and lineage. Almost any strict assumption about norms suggesting who an aunt can be, or who can marry who, or how children are legitimized for the purpose of inheritance, can be upset by some exception somewhere in the world or history.
You should focus on what really makes value for your software. Is the time spent on making it work for ONE consumer worth the price of the license ? Likely not.

I advise you to apologize to this customer, tell him that his situation is out of scope for your software and issue him a refund.
You should have set up the Atreides family (either modern, Dune, or ancient, Oedipus Rex) as a testing case. You don't find bugs by using sanitized data as a test case.
It seems you (and/or your company) have a fundamental misunderstanding of what a family tree is supposed to be. 

Let me clarify, I also work for a company that has (as one of its products) a family tree in its portfolio, and we have been struggling with similar problems.

The problem, in our case, and I assume your case as well, comes from the GEDCOM format that is extremely opinionated about what a family should be. However this format contains some severe misconceptions about what a family tree really looks like.

It contains many issues, such as incompatibility with same sex relations, incest, etc... Which in real life happens more often than you'd imagine (especially when going back in time to the 1700-1800).

We have modeled our family tree to what happens in the real world: Events (for example, births, weddings, engagement, unions, deaths, adoptions, etc.). We do not put any restrictions on these, except for logically impossible ones (for example, one can't be one's own parent, relations need two individuals, etc...)

The lack of validations gives us a more "real world", simpler and more flexible solution.

As for this specific case, I would suggest removing the assertions as they do not hold universally.

For displaying issues (that will arise) I would suggest drawing the same node as many times as needed, hinting at the duplication by lighting up all the copies on selecting one of them.
Relax your assertions.

Not by changing the rules, which are mostly likely very helpful to 99.9% of your customers in catching mistakes in entering their data.

Instead, change it from an error "can't add relationship" to a warning with an "add anyway".
This is one of the reasons why languages like "Go" do not have assertions. They are used to handle cases that you probably didn't think about, all too often. You should only assert the impossible, not simply the unlikely. Doing the latter is what gives assertions a bad reputation. Every time you type assert(, walk away for ten minutes and really think about it.

In your particularly disturbing case, it is both conceivable and appalling that such an assertion would be bogus under rare but possible circumstances. Hence, handle it in your app, if only to say "This software was not designed to handle the scenario that you presented".

Asserting that your great, great, great grandfather being your father as impossible is a reasonable thing to do.

If I was working for a testing company that was hired to test your software, of course I would have presented that scenario. Why? Every juvenile yet intelligent 'user' is going to do the exact same thing and relish in the resulting 'bug report'.
I hate commenting on such a screwed up situation, but the easiest way to not rejigger all of your invariants is to create a phantom vertex in your graph that acts as a proxy back to the incestuous dad.
So, I've done some work on family tree software. I think the problem you're trying to solve is that you need to be able to walk the tree without getting in infinite loops - in other words, the tree needs to be acyclical.

However, it looks like you're asserting that there is only one path between a person and one of their ancestors. That will guarantee that there are no cycles, but is too strict.  Biologically speaking, descendancy is a directed acyclic graph (DAG). The case you have is certainly a degenerate case, but that type of thing happens all the time on larger trees.  

For example, if you look at the 2^n ancestors you have at generation n, if there was no overlap, then you'd have more ancestors in 1000 AD than there were people alive. So, there's got to be overlap.

However, you also do tend to get cycles that are invalid, just bad data. If you're traversing the tree, then cycles must be dealt with. You can do this in each individual algorithm, or on load. I did it on load.

Finding true cycles in a tree can be done in a few ways. The wrong way is to mark every ancestor from a given individual, and when traversing, if the person you're going to step to next is already marked, then cut the link. This will sever potentially accurate relationships. The correct way to do it is to start from each individual, and mark each ancestor with the path to that individual. If the new path contains the current path as a subpath, then it's a cycle, and should be broken. You can store paths as vector<bool> (MFMF, MFFFMF, etc.) which makes the comparison and storage very fast.

There are a few other ways to detect cycles, such as sending out two iterators and seeing if they ever collide with the subset test, but I ended up using the local storage method.
I guess that you have some value that uniquely identifies a person on which you can base your checks.

This is a tricky one. Assuming you want to keep the structure a tree, I suggest this:

Assume this: A has kids with his own daughter.

A adds himself to the program as A and as B. Once in the role of father, let's call it boyfriend.

Add a is_same_for_out() function which tells the output generating part of your program that all links going to B internally should be going to A on presentation of data.

This will make some extra work for the user, but I guess IT would be relatively easy to implement and maintain.

Building from that, you could work on code synching A and B to avoid inconsistencies.

This solution is surely not perfect, but is a first approach.
Potential legal implications aside, it certainly seems that you need to treat a 'node' on a family tree as a predecessor-person rather than assuming that the node can be the-one-and-only person.

Have the tree node include a person as well as the successors - and then you can have another node deeper down the tree that includes the same person with different successors.
A few answers have shown ways to keep the assertions/invariants, but this seems like a misuse of assertions/invariant. Assertions are to make sure something that should be true is true, and invariants are to make sure something that shouldn't change doesn't change.

What you're asserting here is that incestuous relationships don't exist. Clearly they do exist, so your assertion is invalid. You can work around this assertion, but the real bug is in the assertion itself. The assertion should be removed.
Your family tree should use directed relations. This way you won't have a cycle.
The most important thing is to avoid creating a problem, so I believe that you should use a direct relation to avoid having a cycle.

As @markmywords said, #include "fritzl.h".

Finally I have to say recheck your data structure. Maybe something is going wrong over there (maybe a bidirectional linked list solves your problem).
Instead of removing all assertions, you should still check for things like a person being his/her own parent or other impossible situations and present an error. Maybe issue a warning if it is unlikely so the user can still detect common input errors, but it will work if everything is correct.

I would store the data in a vector with a permanent integer for each person and store the parents and children in person objects where the said int is the index of the vector. This would be pretty fast to go between generations (but slow for things like name searches). The objects would be in order of when they were created.
Genealogical data is cyclic and does not fit into an acyclic graph, so if you have assertions against cycles you should remove them.

The way to handle this in a view without creating a custom view is to treat the cyclic parent as a "ghost" parent. In other words, when a person is both a father and a grandfather to the same person, then the grandfather node is shown normally, but the father node is rendered as a "ghost" node that has a simple label like ("see grandfather") and points to the grandfather.

In order to do calculations you may need to improve your logic to handle cyclic graphs so that a node is not visited more than once if there is a cycle.
Duplicate the father (or use symlink/reference).

For example, if you are using hierarchical database:

$ #each person node has two nodes representing its parents.
$ mkdir Family
$ mkdir Family/Son
$ mkdir Family/Son/Daughter
$ mkdir Family/Son/Father
$ mkdir Family/Son/Daughter/Father
$ ln -s Family/Son/Daughter/Father Family/Son/Father
$ mkdir Family/Son/Daughter/Wife
$ tree Family
└── Son
    ├── Daughter
    │   ├── Father
    │   └── Wife
    └── Father -> Family/Son/Daughter/Father

4 directories, 1 file

Here's the problem with family trees: they are not trees. They are directed acyclic graphs or DAGs. If I understand the principles of the biology of human reproduction correctly, there will not be any cycles.

As far as I know, even the Christians accept marriages (and thus children) between cousins, which will turn the family tree into a family DAG.

The moral of the story is: choose the right data structures.
Your Answer