Getting Started with Graph Database Queries, with Cheat Sheet!
Graph databases are quickly becoming a core part of the analytics toolset for enterprise IT organizations. If you know SQL, you can easily learn Cypher and open up a huge opportunity for data analysis.
Graph databases are gaining momentum every year. They will never completely replace relational databases, and they aren’t trying to. But they will start to move into the spaces where data lakes and data warehouses are struggling. A graph database is faster and more intuitive for analyzing networks of events, resources, and people:
- Financial transactions involving complex patterns, and occasional fraud
- Healthcare interactions between patients, medical staff, facilities and equipment
- Supply chain webs of customers, vendors, contractors and products
- Manufacturing bill of materials with recipes for input materials
Those types of networked relationships are difficult to model and visualize in a relational or dimensional data model. A graph database provides a structure that mimics the real-world networks in business.
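To make that concrete, here is a minimal sketch in plain Python (not Cypher, and not tied to any database product) of a multi-hop question on a small, hypothetical supply chain: which suppliers feed a given company within a certain number of hops? The node names and the SUPPLIES relationship are invented for illustration. In Cypher, a pattern along the lines of `MATCH (s)-[:SUPPLIES*1..3]->(c {name: 'assembler'}) RETURN DISTINCT s.name` expresses this in one statement, while in SQL it typically needs a recursive CTE or one self-join per hop.

```python
from collections import deque

# Hypothetical supply-chain graph: an entry "a": ["b"] means a SUPPLIES b.
SUPPLIES: dict[str, list[str]] = {
    "ore_mine": ["steel_mill"],
    "steel_mill": ["parts_maker"],
    "rubber_farm": ["parts_maker"],
    "parts_maker": ["assembler"],
    "assembler": ["retailer"],
}


def upstream_suppliers(graph: dict[str, list[str]], target: str, max_hops: int) -> set[str]:
    """Return every node that reaches `target` through 1..max_hops SUPPLIES edges."""
    # Reverse the edges so we can walk from the target back toward its sources.
    reverse: dict[str, list[str]] = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)

    found: set[str] = set()
    visited = {target}
    queue = deque([(target, 0)])  # (node, hops from target)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for supplier in reverse.get(node, []):
            if supplier not in visited:
                visited.add(supplier)
                found.add(supplier)
                queue.append((supplier, depth + 1))
    return found


print(upstream_suppliers(SUPPLIES, "assembler", 2))
```

Here a two-hop search from the assembler finds the parts maker, steel mill and rubber farm; raising `max_hops` to 3 also picks up the ore mine. The breadth-first search keeps a visited set, so a cycle in the graph cannot cause an infinite loop.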
As you get started with graph databases and their query languages, it is important to prepare for a shift in your mental model. First, there is not yet a widely accepted standard query language like SQL. As you can see in the attachment, there is a group of competing languages and a committee struggling to get everyone to agree on a single GQL standard. For our purposes today, we will use the Cypher query language, which was developed and is promoted by the leading graph database vendor, Neo4j.
In graph queries we lose some syntax from SQL and gain other syntax. MATCH describes the pattern of nodes and relationships to find, taking over the job of FROM and JOIN, which have been discarded, and RETURN takes the place of SELECT. The WHERE and ORDER BY clauses work much as they do in SQL. Aggregate functions like SUM and AVG are all there, but GROUP BY has been discarded: any non-aggregated values in the RETURN clause become the grouping keys implicitly. Most importantly, though, we gain the ability to query patterns in the graph by following the relationships between nodes. In the attached Cheat Sheet, you will see a list of the most commonly used query approaches.
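To illustrate that clause mapping, the Python sketch below imitates a small Cypher query against a hypothetical rental graph. The Renter and Property labels, the RENTED relationship and its monthly_rent property are assumptions made up for this example, not taken from the cheat sheet's model. The comments tie each step to the Cypher clause it mimics, including the implicit grouping that replaces GROUP BY.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical Cypher query this sketch imitates:
#   MATCH (r:Renter)-[rent:RENTED]->(p:Property)
#   WHERE p.city = $city
#   RETURN r.name AS renter, count(p) AS rentals, avg(rent.monthly_rent) AS avg_rent
#   ORDER BY rentals DESC


@dataclass(frozen=True)
class Rented:
    """A hypothetical RENTED relationship from a renter to a property."""
    renter: str
    property_id: str
    monthly_rent: float


# Hypothetical Property nodes: property id -> city property.
PROPERTY_CITY = {"p1": "Salt Lake City", "p2": "Salt Lake City", "p3": "Provo"}

# Hypothetical RENTED relationships.
RENTED = [
    Rented("Ana", "p1", 1200.0),
    Rented("Ana", "p2", 1400.0),
    Rented("Ben", "p2", 1300.0),
    Rented("Ben", "p3", 900.0),
]


def rentals_by_renter(city: str) -> list[tuple[str, int, float]]:
    groups: dict[str, list[float]] = defaultdict(list)
    # MATCH: walk every Renter-[RENTED]->Property edge.
    for edge in RENTED:
        # WHERE p.city = $city: filter on a property of the matched node.
        if PROPERTY_CITY[edge.property_id] == city:
            # Implicit grouping: r.name is the only non-aggregate in RETURN.
            groups[edge.renter].append(edge.monthly_rent)
    # RETURN r.name, count(p), avg(rent.monthly_rent)
    rows = [(name, len(rents), sum(rents) / len(rents)) for name, rents in groups.items()]
    # ORDER BY rentals DESC; the secondary sort on name is added here only
    # for deterministic output and is not part of the Cypher query.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


print(rentals_by_renter("Salt Lake City"))
```

For "Salt Lake City" this returns Ana with two rentals and Ben with one, each averaging 1300.0. Notice that nothing says GROUP BY: grouping on the renter happens because it is the only non-aggregated value being returned.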
Following is the graph model that will be used in the attached cheat sheet:
I have selected a rental graph because nearly everyone has rented at some time in their life! Obviously, this graph could be much more complex if we added the full list of properties for each node.
The next step is to get some practice. You can download a sample dataset from a source such as Kaggle or from a graph database project such as JanusGraph or Neo4j.
If you have a dataset at your employer or hobby projects that involves networked relationships, give a graph database a try. You will find that the data that fits awkwardly in a relational database will be right at home in a graph!
Stan Pugsley is a freelance data engineering and analytics consultant based in Salt Lake City, UT. He is also a lecturer at the University of Utah Eccles School of Business. You can reach the author via email.