The Family Tree Software Series

Structure and CRUD operations of a graph database #4

Article explaining the structure of the graph database and CRUD operations

gksriharsha


Source. Highlights of the total picture.

This is the fourth article in a series about building genealogy software using a graph database. The first three articles covered the design of the application and some frequently used Gremlin commands. In this article, I go over the structure of my graph database and some basic operations performed on it.

Structure of the database

We as humans know that if a person, Alice, is the mother of a child, Bob, then Bob is the child of Alice. In other words, once a relation is established in one direction, the relation in the opposite direction is implied. This implied relation can be handled in two ways:

  • Database level: For every relation added to the family tree, the opposite relation is also created and stored in the database. This method causes redundancy in storage, as it stores the same information from a different perspective. The advantage is a lighter processing load when reading. In this model, both relations, “Alice is the mother of Bob” and “Bob is the son of Alice”, are stored.
  • Logic level: The connections stored in the database are kept to a minimum, and the full relation is built after extracting the information. In other words, only one-directional data is stored. Continuing the example, only “Alice is the mother of Bob” is stored, and the implied relations are inferred in Python.

A mixture of both solutions may perform better. For the sake of simplicity and clarity, I have chosen to store all the relations in the database (database level).
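To make the logic-level alternative concrete, here is a minimal sketch of how the implied relation could be inferred in Python when only the parent edges are stored. The tuple layout, the helper names and the extra person "Carl" are assumptions made for illustration; they are not taken from my application.

```python
# Only one direction is stored: (source, relation, target).
# "Carl" is a hypothetical extra person added for the example.
STORED_EDGES = [
    ("Alice", "Mother_Of", "Bob"),
    ("Carl", "Father_Of", "Bob"),
]

PARENT_RELATIONS = {"Mother_Of", "Father_Of"}


def parents_of(person, edges):
    """Infer 'person is the child of X' from the stored parent edges."""
    return [src for src, rel, dst in edges
            if rel in PARENT_RELATIONS and dst == person]


def children_of(person, edges):
    """Read the stored direction directly: who is 'person' a parent of?"""
    return [dst for src, rel, dst in edges
            if rel in PARENT_RELATIONS and src == person]


print(parents_of("Bob", STORED_EDGES))     # ['Alice', 'Carl']
print(children_of("Alice", STORED_EDGES))  # ['Bob']
```

With the database-level model this inference step disappears, at the cost of storing a second edge for every relation.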

Vertices and Edges

Every vertex has a label that states its type. The most important type of vertex in a family tree is Person. These vertices are connected by relations (edges). There are 6 basic types of relations that can connect two Person vertices: Father_Of, Mother_Of, Son_Of, Daughter_Of, Wife_Of and Husband_Of. There would have been fewer edges had we chosen a different storage model.

Person vertices have many properties that should be stored. Some of the most crucial ones are Firstname, Lastname, Gender, Date of Birth and Occupation. Some of my grandparents do not remember their date of birth according to the modern-day calendar; they remember it according to our regional calendar. There are APIs and other online services that convert between our regional calendar and an approximate modern-day calendar value. I would like to keep the original date as it is, because this conversion is not exact. Features like this were the main reason for building the genealogy software from scratch.

Every special category of information that is of interest must be made into a vertex. One such category is Location. This type of vertex has far fewer properties than the Person vertex: Place, State and Country. A Person vertex can be connected to a Location vertex by edges such as Born_in, Died_in, Lives_in, Married_in, etc.

CRUD functions in gremlin

CRUD (Create, Read, Update, Delete) operations are the primary operations of any database. For the creation of a vertex, I have a class in Python that stores all the necessary information about the vertex. One of the methods in that class creates the vertex from the non-empty values present in the class.
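A class of that kind could look roughly like the sketch below. The field names follow the properties listed earlier, but the exact class layout, the DateOfBirth spelling and the non_empty_properties() helper are assumptions for illustration, not the actual class from my application.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Person:
    # Hypothetical layout; every field is optional so partial records can be stored.
    Firstname: Optional[str] = None
    Lastname: Optional[str] = None
    Gender: Optional[str] = None
    DateOfBirth: Optional[str] = None  # kept as the original text, not converted
    Occupation: Optional[str] = None

    def non_empty_properties(self):
        """Return only the fields that hold a value."""
        return {key: value for key, value in asdict(self).items()
                if value not in (None, "")}


alice = Person(Firstname="Alice", Gender="Female")
print(alice.non_empty_properties())  # {'Firstname': 'Alice', 'Gender': 'Female'}
```

Each key and value of the returned dictionary can then be passed to a property() call on the traversal that creates the vertex.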

# Create operation
dictionary = person.__dict__
vert = g.addV('Person')
for key, value in dictionary.items():
    vert = vert.property(key, value)
val = vert.next()  # the query is executed here

An edge can be created with the following command (in gremlin-python, the anonymous traversal is written as __.V(2)):

g.V(1).addE('Father_Of').to(__.V(2)).next()

This command adds a Father_Of relation from vertex 1 to vertex 2. Immediately after this command, the opposite relation (Son_Of or Daughter_Of) is added from vertex 2 to vertex 1. In this way, for every relation added, the opposite relation is also added.
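Choosing the opposite relation depends on the gender of the person the opposite edge starts from. A minimal sketch of that lookup is shown below; the table names, the edge_pair() helper and the "Male"/"Female" gender values are assumptions for illustration.

```python
# The opposite label is picked by the gender of the person at the other end.
CHILD_LABEL = {"Male": "Son_Of", "Female": "Daughter_Of"}
PARENT_LABEL = {"Male": "Father_Of", "Female": "Mother_Of"}
SPOUSE_LABEL = {"Male": "Husband_Of", "Female": "Wife_Of"}

OPPOSITE = {
    "Father_Of": CHILD_LABEL, "Mother_Of": CHILD_LABEL,
    "Son_Of": PARENT_LABEL, "Daughter_Of": PARENT_LABEL,
    "Wife_Of": SPOUSE_LABEL, "Husband_Of": SPOUSE_LABEL,
}


def edge_pair(from_id, label, to_id, to_gender):
    """Return the requested edge and its opposite as (source, label, target)."""
    opposite_label = OPPOSITE[label][to_gender]
    return [(from_id, label, to_id), (to_id, opposite_label, from_id)]


print(edge_pair(1, "Father_Of", 2, "Female"))
# [(1, 'Father_Of', 2), (2, 'Daughter_Of', 1)]
```

Each tuple can then be turned into one addE() call, so both directions are written together.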

The read operation has two variations. In the first, a full read is performed, i.e., all the properties are read and sent to the application. In that scenario, g.V(1).elementMap().next() can be used. If only some properties of the vertex are required, g.V(1).elementMap('Firstname','Lastname').next() can be used. Fetching less data reduces load times, which improves the user experience of the application.

A read operation can also be performed on edges. Generally, reading the edges that originate or end at a vertex of interest is what is useful. g.V().bothE().toList() would give every relationship of every vertex in the graph (next() would return only the first result). This is not useful, as it is not particular to a vertex. If only a particular vertex’s edge information is required, the same elementMap() function can be used. g.V(1).bothE('Mother_Of').elementMap().toList() would give the properties of all the Mother_Of edges connected to vertex 1. If no parameter is passed to the bothE() function, details about all the edges connected to vertex 1 are returned.

The update operation is a little tricky. Due to an issue I ran into in the language variant (gremlin-python), the same property() command could not be used to update the property of a vertex. Therefore, instead of using the library's traversal API, I pass the raw query string to the Gremlin server.

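# Update snippet
from gremlin_python.driver import client

cli = client.Client('ws://localhost:8182/gremlin', 'g')
query = f'g.V({ID})'
for key, value in person.__dict__.items():
    # keys and string values must be quoted inside the Gremlin script
    query += f".property('{key}', '{value}')"
cli.submit(query).all().result()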
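Because the update is sent as a raw string, a value containing a quote (a surname such as O'Neil, for example) would break the query. The sketch below shows one way the string could be built more defensively. The helper names are hypothetical, and only a few value types are handled.

```python
def gremlin_literal(value):
    """Render a Python value as a Gremlin (Groovy) literal; few types handled."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes and single quotes so the text stays inside its quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_update_query(vertex_id, properties):
    """Build a g.V(id).property(...) script, skipping empty values."""
    query = f"g.V({gremlin_literal(vertex_id)})"
    for key, value in properties.items():
        if value is None:
            continue
        query += f".property({gremlin_literal(key)}, {gremlin_literal(value)})"
    return query


print(build_update_query(1, {"Firstname": "Alice", "Lastname": "O'Neil"}))
# g.V(1).property('Firstname', 'Alice').property('Lastname', 'O\'Neil')
```

The resulting string can be passed to cli.submit() as before. Passing the values as script bindings instead of embedding them in the string is another option worth considering.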

The update operation is not used for edges except in the case of a divorce. Only then is the edge between the couple changed to ex-spouse_Of. In all other cases, the relationship does not change. I have not heard of a case where any of the other 4 base relations (father, mother, son, daughter) has ended, so I did not include that in the functionality.
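The label of an existing edge cannot be changed in TinkerPop, so one way to express the divorce update is to drop the old spouse edges and add new ex-spouse_Of edges. The sketch below only builds the Gremlin script strings. The divorce_scripts() helper and the numeric vertex IDs are assumptions for illustration, not the code used in my application.

```python
def divorce_scripts(husband_id, wife_id, new_label="ex-spouse_Of"):
    """Return Gremlin scripts that replace both spouse edges of a couple."""
    scripts = []
    directions = (
        (husband_id, wife_id, "Husband_Of"),
        (wife_id, husband_id, "Wife_Of"),
    )
    for src, dst, old_label in directions:
        # Remove the old edge between exactly these two vertices ...
        scripts.append(
            f"g.V({src}).outE('{old_label}').where(__.inV().hasId({dst})).drop()"
        )
        # ... and add the replacement edge in the same direction.
        scripts.append(f"g.V({src}).addE('{new_label}').to(__.V({dst}))")
    return scripts


for script in divorce_scripts(1, 2):
    print(script)
```

Each script can be submitted to the Gremlin server in the same way as the update query.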

The delete operation is very simple: g.V(1).drop().iterate(). (drop() returns no results, so iterate() is used to execute it rather than next().) In our application, data is created and updated but never deleted, because we are creating records of existence. Neither the people nor their relations with other vertices are ever deleted. Therefore, the delete operation is not present in our application.

Summary

In this article, I have shared the database structure and the CRUD commands used in my genealogy software. In the next article, we will look at traversal between the nodes that are added using these commands.

If you liked this article, please clap and share the article.
