The Family Tree Software Series

Designing The Layout of a Family Tree #2

An article exploring the requirements to finalize the design

gksriharsha

This is the second article in a series about building genealogy software with a graph database. In the previous article, I discussed my motivation for creating such software. In this article, I go over the requirements for building such an application.

Understanding the purpose

Since I am developing the entire family tree application on my own, I can add features and build it in the way that best suits my requirements. I wanted features that make the software easy and intuitive to use regularly. In the rest of this article, I share my thoughts on, and reasons for, choosing the software used to build the web app. If there are multiple options for any of the following decisions, go with the one you are most familiar with, as familiarity speeds up development considerably. This advice applies only to pilot projects, hobby projects and proofs of concept (POCs).

Development Platform

I first had to choose the language in which the application's APIs would be written. The choice had to satisfy the following requirements.

  1. Facial Detection and Recognition: The family tree software I used previously had an option for uploading pictures, although it was available only to premium customers and not on the free tier. I wanted the language to support facial recognition libraries so that, later in development, I could add a feature that recognizes people from a picture, a video or a live (webcam) source. This feature was very important to me because I frequently have a picture of a person but cannot remember other details about them. The rapidly rising interest in AR glasses from tech giants such as Apple also makes me believe that developers will eventually be given access to create AR applications.
  2. Support for OGM: While working with SQL databases, I came across the concept of an ORM (Object Relational Mapper). These libraries convert an object into an SQL entity and vice versa, so that data can flow to and from the database. Having such a library during development is a tremendous advantage, as it takes care of type-cast exceptions and other problems that arise when converting between class types. Graph databases have similar libraries called OGMs (Object Graph Mappers), which help in the development phase of the application. Once the base application is developed, the default queries and query patterns can be optimized for a better user experience.
  3. Support for Native Query generation: Sometimes a query is so specific that the functions provided by the aforementioned OGMs are not sufficient. The language should therefore support writing queries manually and sending them to the database.
  4. Support for NLP models: This requirement is optional, but I considered it too while choosing the development language. Later in the development of the web app, I would like to include an “assistant” feature that keeps the family tree information updated as I converse with my relatives. Such a feature is a bit of a stretch, but on deeper thought it looks much more important: a family tree needs something that keeps it relevant and up to date. The more people stored in the graph, the more numerous and frequent the updates needed to keep the tree complete.
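
To make the OGM idea in item 2 more concrete, here is a minimal sketch of the kind of mapping such a library performs, written with only the Python standard library. The Person class, its fields and the "person" label are hypothetical names chosen for illustration; a real OGM would also handle type conversion, edges and the database round trip.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, ClassVar, Dict, Optional


@dataclass
class Person:
    """Hypothetical domain object; the field names are illustrative only."""

    # Vertex label this class maps to (an assumed naming convention).
    LABEL: ClassVar[str] = "person"

    name: str
    birth_year: Optional[int] = None


def to_vertex(obj: Any) -> Dict[str, Any]:
    """Object -> graph side: a label plus a flat map of non-empty properties."""
    props = {key: value for key, value in asdict(obj).items() if value is not None}
    return {"label": obj.LABEL, "properties": props}


def from_vertex(cls, vertex: Dict[str, Any]):
    """Graph side -> object: copy back only the properties the class declares."""
    kwargs = {
        f.name: vertex["properties"][f.name]
        for f in fields(cls)
        if f.name in vertex["properties"]
    }
    return cls(**kwargs)


person = Person(name="Asha", birth_year=1950)
vertex = to_vertex(person)
print(vertex)
print(from_vertex(Person, vertex) == person)
```

The point of the sketch is only that application code works with Person objects while the mapper decides what is stored on the vertex.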

Based on the requirements I collected, I felt that Python would be the right choice for the web app. It has an OGM in goblin, libraries for facial recognition, and the gremlin-python library for native query generation. Packages such as TensorFlow and PyTorch are also available to build or use NLP models if required.
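
As a rough illustration of the native query requirement, the sketch below builds a Gremlin script by hand and keeps the user-supplied values in a separate bindings map instead of pasting them into the string. The function name, the "person" label and the "name" property are assumptions made for this example and are not taken from the actual application.

```python
from typing import Any, Dict, Tuple


def find_person_query(name: str, limit: int = 10) -> Tuple[str, Dict[str, Any]]:
    """Return a hand-written Gremlin script and its bindings for a name lookup.

    The 'person' label and 'name' property are assumed for illustration.
    """
    script = "g.V().hasLabel('person').has('name', personName).limit(maxResults)"
    bindings = {"personName": name, "maxResults": limit}
    return script, bindings


script, bindings = find_person_query("Asha")
print(script)
print(bindings)
# With gremlin-python, a script and its bindings can then be sent to the
# server through Client.submit(script, bindings) from
# gremlin_python.driver.client; that call is left out here so the sketch
# runs without a database.
```

Keeping values in bindings rather than in the script text avoids quoting problems when a name contains an apostrophe.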

Web development framework

I know of two major web development frameworks for Python: Flask and Django. My requirements call for a framework that works well with an OGM, which makes Flask the go-to choice. This is not because Flask supports OGMs inherently. Rather, Django is built for applications that, for the most part, use SQL databases through its ORM, so building an application around an OGM instead means working against the framework. The Flask-Admin documentation describes the difference in philosophy as follows.

Django does lots of things automatically. For example, it is importing models from the models.py, administrative interface declarations from admin.py and so on.

Flask philosophy is slightly different — explicit is better than implicit. If something should be initialized, it should be initialized by the developer.

Flask-Admin follows this convention. It is up for you, as a developer, to tell Flask-Admin what should be displayed and how.

Sometimes this will require writing a bit of boilerplate code, but it will pay off in the future, especially if you will have to implement some custom logic.

In other words, Flask follows a bottom-up approach in which the developer connects pieces of code and libraries together to form a working application. This freedom to wire things up explicitly is what makes it easier to use an OGM in this case.

Graph Database

I did not know about graph databases when I started this project, so I searched for a few that I could use to get some hands-on experience. In the process, I came across Neo4j and JanusGraph. Initially, I used Neo4j and its Cypher query language to build some graphs. The UI was excellent and helped me understand the gist of what graph databases are. The bundled movie database and the built-in example cards containing the commands were helpful. There is also a Python API for Neo4j that I could have used to develop the application.

I wanted to explore JanusGraph as well, to understand the differences between Cypher in Neo4j and Gremlin in JanusGraph. Gremlin was a bit tougher to understand than Cypher because there are multiple ways to write a command that does the same task. A very simple example is as follows.

If I would like to retrieve the vertex whose id is 1, each of the following commands can be used. The first three return exactly that vertex. The last one reaches it by walking edges, so it returns the vertex once per incoming edge and returns nothing if the vertex has no incoming edges.

g.V(1)
g.V().has(id,1)
g.V().where(has(id,1))
g.V().outE().otherV().has(id,1)
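
The difference between a direct lookup and an edge-walking traversal is easy to miss, so here is a toy in-memory model of it in plain Python. This is not Gremlin and does not talk to a database; the three vertices and two edges are made up purely to show how the result sets can differ.

```python
# Toy graph: vertex ids and directed edges written as (out_vertex, in_vertex).
vertices = {1: {}, 2: {}, 3: {}}
edges = [(2, 1), (3, 1)]  # two edges point at vertex 1, none at vertex 2


def v_by_id(vid):
    """Model of g.V(1): look the vertex up directly by its id."""
    return [vid] if vid in vertices else []


def v_has_id(vid):
    """Model of g.V().has(id, 1): scan every vertex and keep the match."""
    return [v for v in vertices if v == vid]


def via_edges(vid):
    """Model of g.V().outE().otherV().has(id, 1): from every vertex, follow
    each outgoing edge and keep the far-end vertices whose id matches."""
    return [
        dst
        for v in vertices
        for (src, dst) in edges
        if src == v and dst == vid
    ]


print(v_by_id(1))    # one result
print(v_has_id(1))   # one result
print(via_edges(1))  # one result per incoming edge
print(via_edges(2))  # empty, because nothing points at vertex 2
```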

During this time, I stumbled upon Apache TinkerPop, the graph computing framework that Gremlin belongs to, and it appeared to me that Gremlin has a much larger developer base than Cypher. I therefore continued learning Gremlin and chose JanusGraph for developing the family tree.

JanusGraph has a plug-and-play structure for connecting storage and index back-ends. In other words, a developer can choose where the data is stored based on the requirements of the application. Each storage back-end suits a particular purpose: Cassandra, for example, is geared towards high availability, whereas HBase favours consistency. Similarly, there are index back-ends, which are responsible for indexing the data so that the application can search and retrieve it efficiently. I have configured JanusGraph with the Cassandra storage back-end and the Elasticsearch index back-end.
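
As a rough sketch of what such a configuration looks like, the snippet below renders the core settings of a JanusGraph properties file from a Python dictionary. The keys are JanusGraph's storage and index options; the 127.0.0.1 hostnames are an assumed local setup, and "cql" is the Cassandra back-end name in recent JanusGraph releases, so older versions may need a different value.

```python
# Assumed local setup: both services on 127.0.0.1; adjust for a real deployment.
settings = {
    "storage.backend": "cql",                 # Cassandra through the CQL driver
    "storage.hostname": "127.0.0.1",
    # "search" is the user-chosen name of the index back-end.
    "index.search.backend": "elasticsearch",
    "index.search.hostname": "127.0.0.1",
}


def render_properties(cfg):
    """Render a dict as the key=value lines of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in cfg.items()) + "\n"


properties_text = render_properties(settings)
print(properties_text)
```

Swapping the storage or index back-end then comes down to changing the corresponding values, which is the plug-and-play aspect described above.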

Front End

I decided to use the Angular framework because I was already familiar with it. There may be more appropriate frameworks, such as Vue or React, but I felt that the improvement would be too small to notice in a hobby project like this one. If you use another framework and feel that it is better, please let me know in the comments.

Layout

The application is designed as follows. The user interacts with the website, and Angular calls the necessary APIs to retrieve information. The Flask server constructs the required queries and sends them to the Gremlin Server. After receiving the results, Flask places them in the appropriate HTTP response object and relays them back to the website. A mobile application could later be developed to talk to the same Flask server.
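
The request flow described above can be sketched in a few lines of plain Python. This is not the actual Flask code: get_person, fake_submit and the sample person are hypothetical, and the function that talks to the Gremlin Server is passed in as a parameter so the example runs without a database. In the real application, the same three steps would sit inside a Flask route.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Anything that takes a Gremlin script plus bindings and returns result rows.
SubmitFn = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def get_person(person_id: int, submit: SubmitFn) -> Tuple[int, str]:
    """Handle a 'fetch one person' API call and return (status, JSON body)."""
    # 1. Build the query for the graph database.
    script = "g.V(personId).valueMap()"
    # 2. Send it to the Gremlin Server (injected so it can be faked here).
    rows = submit(script, {"personId": person_id})
    # 3. Wrap the result in an HTTP-style response for the front end.
    if not rows:
        return 404, json.dumps({"error": "person not found"})
    return 200, json.dumps({"person": rows[0]})


def fake_submit(script: str, bindings: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Stand-in for the Gremlin Server holding a single made-up person."""
    people = {1: {"name": ["Asha"]}}
    person = people.get(bindings["personId"])
    return [person] if person else []


status, body = get_person(1, fake_submit)
print(status, body)
```

Because the server call is injected, the same handler can be exercised in tests with a fake and wired to a real Gremlin client in production.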

Summary

In this article, I have shared my approach to selecting the required software. A more sophisticated way to make these decisions would be to use a Pugh matrix, which scores each option against a baseline on a set of criteria. Such tools are used in production environments, where finances and user experience are important. In the upcoming article, I will discuss graph database queries and how I formulated them for my application.
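
For readers who have not used one, here is a minimal sketch of a weighted Pugh matrix in Python. The criteria names, weights and scores are made-up placeholders for illustration only; they are not measurements and not the scores behind the choices in this article.

```python
from typing import Dict

# Hypothetical criteria and weights (higher weight = more important).
weights = {"ogm_friendliness": 3, "familiarity": 2, "built_in_features": 1}

# Scores relative to a baseline option: +1 better, 0 same, -1 worse.
# These numbers are illustrative placeholders, not measurements.
candidates = {
    "option_a": {"ogm_friendliness": 1, "familiarity": 0, "built_in_features": -1},
    "option_b": {"ogm_friendliness": 0, "familiarity": -1, "built_in_features": 1},
}


def pugh_totals(
    criteria_weights: Dict[str, int],
    options: Dict[str, Dict[str, int]],
) -> Dict[str, int]:
    """Sum weight * score over all criteria for each option."""
    return {
        name: sum(criteria_weights[criterion] * score for criterion, score in scores.items())
        for name, scores in options.items()
    }


totals = pugh_totals(weights, candidates)
best = max(totals, key=totals.get)
print(totals)
print(best)
```

The option with the highest total is the one that beats the baseline by the widest weighted margin.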

If you liked my article, kindly clap and share this article. If you have any questions, please post them in the comment section.
