The Family Tree Software Series

Gremlin Traversals: Don’t get lost in the route #5

An article to explore graph traversals and common pitfalls

gksriharsha

--

Source. Right direction is a must to avoid getting lost.

This is the fifth article about building genealogy software using a graph database. Previously in this series of articles, I have shared some CRUD operation commands that are used to add Vertices and Edges into the graph database. In this article, I would be going over the traversals. This is the differentiating characteristic between the SQL and graph databases.

Relationships

A typical family represented in our storage model.

The graph of a typical family having two parents with two kids is shown on the right. This is the model schema similar to which all the relations are stored in the database. If one asks how are Alice and Charlie related. It is easy, right? Alice is the mother of Charlie.

In this schema of storing data, there are always multiple correct answers such as Alice is the wife of Charlie’s father Bob or Alice is the mother of Charlie’s brother Doug. Therefore we should know how to traverse through the right set of relations.

dedup() and paths

dedup() is a function that avoids all cyclic paths between the starting and destination vertices. In the above example, the path Alice — Charlie is direct while Alice — Charlie — Doug and Alice — Bob — Charlie is cyclic. Therefore if we use dedup, we need not worry about such relations. Unfortunately due to this schema of storing the relations, we have to use dedup for all traversals because of the existence of reverse relations. The traversal query, therefore, becomes the following:

path = g.V().has(‘Name’,’Alice’).repeat(out().dedup()).until(has(‘Name’,’Charlie’)).path().fold().next()

Let’s break down the command into parts.

  • g.V().has('Name','Alice') lists all the nodes (g.V() part) and searches the one that has the Name attribute as Alice.
  • repeat(out().dedup()).until(has('Name','Charlie')) performs the traversal from the start vertex (Alice) and proceeds in the outward direction from the current vertex. This outward traversal is made sure not to be cyclic and until a vertex has the name Charlie is reached. If there is no vertex with the name Charlie, the rest of the query returns empty and the next() command throws a StopIteration Error.
  • path().fold().next() gets the path followed by the traverser from the starting vertex to the final vertex. Fold command puts all the vertices into a list and the next command returns the “next” item which is the entire path (as a list).

The above command only shows the list of vertices in the path =[v[1],v[2]]. The crucial information about the edges are missing therefore the query must be modified as:

path = g.V().has('Name','Alice').repeat(outE().otherV().dedup()).until(has('Name','Charlie')).path().by('Name').by(label).fold().next()

The change in out() to outE().otherV() means that instead of simply traversing to the vertex that is connected by the outward edge from this vertex, it lists the outward edges and traverses to the other vertices of the edges. The net traversal is the same but in the second command, it is explicitly mentioned to travel through the outgoing edges. Therefore a list of vertices and edges is returned from the starting vertex to the destination vertex.

The by() function in the query gives the necessary information about the vertex/edge instead of the object. The function by() gets assigned in a round-robin fashion to all the vertices and edges in the list. First, there is a person vertex whose name will be shown. Next the outE() shows the edge and the by function shows the label (relation). Next, is the Person Vertex again shown by the name till the end of the list.

Special Relations

Some relations do not occur as frequently as the basic 6 relations. These are divorce and adoption. There are also few instances where two people may have two or more relations between them. This occurs when 1st or 2nd cousins marry each other or when one person from one part of the family gets adopted into another part of the family. In such cases, all meaningful multiple relations must be figured out.

Checking if such relations exist is impossible if there are no decision-making statements. Gremlin by itself does not have any decision-making functions that do not accumulate the previous step’s result. Gremlin console is developed over groovy programming language. Therefore we can take advantage of Groovy scripts for executing certain complex multi-step queries (Domain Specific Language). Let us look at an example to understand some of the multiple relations.

Example: In my family, there is a relative who is adopted to a different part of my family. Since the person already has a relation, this adoption creates a new set of relations with that person. This multi-relationship is not universal to that person either. Some relations like the son/spouse/grandson of the adoptee would not change because of his/her adoption. Therefore I have devised an algorithm for checking for multiple relations.

Algorithm:

Algorithm to check if there is a multiple relations present

This is implemented in groovy script which will automatically fetch relations through adoption (if any). Groovy is very similar to Java therefore writing such this was relatively easy.

adoptedCheck = g.V().has(id,from).repeat(out().dedup()).until(has(id,to)).
path().unfold().where(has('Adopted','Yes'))
isAdopted = adoptedCheck.hasNext()

The above code checks if any of the people in the shortest path from start to destination is adopted. If a person is adopted, we must then check if there is only one relationship between the start and the final vertex. In other words, we must check for “Is there another relation between the adopted and the final person without adoption getting involved ?”. An example of this is visualized below.

parentID= adoptedCheck().in('Father_Of*').valueMap(true).next()[[id]otherRelationCheck = g.V(from).repeat(out().dedup()).until(has(id,parentID)).path().unfold().where(has(id,adoptedID)).hasNext()//If the above check is true then the following command is executed.
final_path = g.V(from).union(repeat(outE().otherV().dedup()).until(has(id,to)).
path().by('Name').by(label),
\
repeat(outE().otherV().dedup()).until(has(id,parentID)).path().
by('Name').by(label)).fold().next()

If there is, the adoption relation does not get applied to them. It is a bit confusing in words so let’s get back to the diagram.

An unusual but real scenario in my extended family.

To maintain simplicity in the diagram only fathers and sons are shown. This relationship complexity exists in some parts of my extended family tree, therefore could not completely use off-the-shelf solutions to capture the genealogy. Person D has adopted his grandson E, thereby creating a new set of relations between all the people in the family tree with E except F. In other words, F (and his successors) will only have a single relationship with E irrespective of E’s adoption. Similarly, E’s spouse and spouse’s family do not get multiple relations because of this adoption. Therefore an algorithm is required to check if the person would have multiple relations with the adopted one (E). The shortest path between A and F is through B and E (Siblings). A — B — E * — F. Since E is involved in the shortest path, there is also another check that must be done. The newly created relation (due to adoption) would be A — B — C — D — E * — F.

Similar to the previous algorithm, another one should be developed to get multiple relations between two people who are first cousins and are married. Using such algorithms within the groovy script makes the python code easier. Python needs to only invoke one groovy function which performs all the required checks and returns all the possible relations.

Summary

In this article, I have shared the different schemas used to store the same people and relations in different ways. The concept of having multiple relations are explored. The most important part of “naming the relation” will be looked into in the next article. We have also looked at the advantage of having a DSL (groovy scripts) which would streamline the application development.

If you like the article kindly clap and share the article with others. Thankyou for reading the article.

--

--