# Introduction to DSE Graph

## Introduction

DataStax Enterprise Graph (DSE Graph) is a NoSQL graph database optimized for storing objects and the relationships between them. Like Cassandra, it is a distributed database and shares many of Cassandra's advantages: it can scale to handle massive amounts of data, is optimized for high-speed transactions, and is built for high reliability. DSE Graph also comes with access to other DataStax Enterprise features, such as DSE Search and the platform's security features.

## Graph Theory

### Overview

Graph theory is a topic in mathematics and computer science that studies a particular kind of data structure. A graph is a collection of nodes (also called vertices) and edges. A node is a container for any kind of object; its role in a graph is to hold data that is related to other data. Edges connect nodes to each other and define the relationships between them. Edges can be directed, pointing from one node to another, or undirected.
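As a minimal sketch of the node-and-edge model, assuming a plain Python adjacency list (the `Graph` class and its methods are hypothetical illustrations, not part of DSE Graph), each node holds an arbitrary object and each edge is stored as a neighbor entry. An undirected edge is recorded in both directions; a directed edge only from its source.

```python
from collections import defaultdict


class Graph:
    """Adjacency-list graph whose nodes can hold any Python object (hypothetical sketch)."""

    def __init__(self, directed=False):
        self.directed = directed
        self.data = {}                  # node id -> stored object
        self.adj = defaultdict(list)    # node id -> list of neighboring node ids

    def add_node(self, node_id, data=None):
        self.data[node_id] = data
        self.adj.setdefault(node_id, [])  # make sure isolated nodes still appear

    def add_edge(self, a, b):
        # A directed edge can only be followed from a to b; an undirected
        # edge is stored in both directions.
        self.adj[a].append(b)
        if not self.directed:
            self.adj[b].append(a)

    def neighbors(self, node_id):
        return list(self.adj[node_id])


roads = Graph()                      # undirected: roads run both ways
roads.add_node("A", {"city": "Austin"})
roads.add_node("B", {"city": "Boston"})
roads.add_edge("A", "B")
print(roads.neighbors("B"))          # ['A']

follows = Graph(directed=True)       # directed: following is one-way
follows.add_node("A")
follows.add_node("B")
follows.add_edge("A", "B")
print(follows.neighbors("A"), follows.neighbors("B"))  # ['B'] []
```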

Many other data structures can be built from the same components as a graph. For instance, a queue or stack implemented as a linked list has nodes that contain data and edges (links) that define how to move from one node to the next. A tree is also built from nodes and edges, with one node designated as the root that defines the tree's hierarchy. In this sense, graphs are the general case and these structures are special cases, so rules that apply to graphs in general also apply to more specific structures such as trees.
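One common way to make the "a tree is a graph with a root" idea precise: the root has no parent, every other node has exactly one parent, and every node can be reached from the root. The hypothetical `is_rooted_tree` function below checks those conditions on a directed parent-to-child edge list; it is a sketch that assumes every edge endpoint also appears in `nodes`.

```python
def is_rooted_tree(nodes, edges, root):
    """Return True if the (parent, child) edges form a tree rooted at `root`."""
    parent_count = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        parent_count[child] += 1
        children[parent].append(child)

    # The root has no parent; every other node has exactly one.
    if parent_count[root] != 0:
        return False
    if any(count != 1 for n, count in parent_count.items() if n != root):
        return False

    # Every node must be reachable by walking the hierarchy down from the root.
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen == set(nodes)


nodes = {"root", "a", "b", "c"}
tree_edges = [("root", "a"), ("root", "b"), ("a", "c")]
print(is_rooted_tree(nodes, tree_edges, "root"))                 # True
print(is_rooted_tree(nodes, tree_edges + [("b", "c")], "root"))  # False: "c" has two parents
```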

### Graph Taxonomy

Graphs with different characteristics are organized into groups. A simple graph has no duplicate edges and no loops (edges that connect a node to itself). A multi-graph can have more than one edge connecting the same two nodes. A pseudo-graph is a multi-graph that also allows loops. Directed graphs fall into the same categories but use only directed edges, each pointing from one node to another. A mixed graph has both directed and undirected edges. A fully connected (complete) graph is one in which every node is connected to every other node.
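The taxonomy above can be checked mechanically for undirected graphs. In this hypothetical sketch (the `classify` and `is_complete` names are illustrative), any loop makes the graph a pseudo-graph, repeated endpoint pairs make it a multi-graph, and otherwise it is simple; directed and mixed graphs are not handled.

```python
from collections import Counter
from itertools import combinations


def classify(edges):
    """Classify an undirected edge list as 'simple', 'multi-graph', or 'pseudo-graph'."""
    has_loop = any(a == b for a, b in edges)
    # Sort each endpoint pair so (a, b) and (b, a) count as the same undirected edge.
    pair_counts = Counter(tuple(sorted(edge)) for edge in edges)
    has_duplicates = any(n > 1 for n in pair_counts.values())
    if has_loop:
        return "pseudo-graph"
    if has_duplicates:
        return "multi-graph"
    return "simple"


def is_complete(nodes, edges):
    """True if every pair of distinct nodes is joined by at least one edge."""
    joined = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    return all(tuple(sorted(pair)) in joined for pair in combinations(nodes, 2))


print(classify([("a", "b"), ("b", "c")]))   # simple
print(classify([("a", "b"), ("b", "a")]))   # multi-graph
print(classify([("a", "b"), ("b", "b")]))   # pseudo-graph
print(is_complete({"a", "b", "c"}, [("a", "b"), ("b", "c"), ("c", "a")]))  # True
```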

### Graph Search

Finding a specific node in a graph, starting from a given node, is a problem whose solutions have many practical applications. For example, finding a route on a map from one location to another can be modeled as a graph search problem. To solve graph search problems, we use a number of different traversal methods.

Each of these traversal methods starts with a list of the nodes adjacent to the starting point. A random traversal picks the next node to explore at random from that list, then adds the nodes adjacent to the newly explored node to the list. A depth-first search keeps the list as a stack, so it explores a node's children before its siblings. A breadth-first search keeps the list as a queue, so it explores all sibling nodes before any nodes that are farther out. Other traversal types use edge information, such as weights, to build various kinds of best-first search.
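The strategies above differ only in which node they take from the list next, so one loop can cover all of them. This is a minimal sketch over a plain adjacency-list dict (the `search` function, its `strategy` names, and the example graph are hypothetical): a queue gives breadth-first order, a stack gives depth-first order, and a seeded random pick gives the random traversal.

```python
import random
from collections import deque


def search(adj, start, goal, strategy="bfs", seed=0):
    """Visit nodes from `start` until `goal` is found; return the visit order."""
    rng = random.Random(seed)          # seeded so the random traversal is repeatable
    frontier = deque(adj[start])       # every strategy starts with start's neighbors
    explored = {start}
    order = [start]
    while frontier:
        if strategy == "bfs":
            node = frontier.popleft()  # queue: oldest entry first (siblings before children)
        elif strategy == "dfs":
            node = frontier.pop()      # stack: newest entry first (children before siblings)
        else:
            i = rng.randrange(len(frontier))
            node = frontier[i]         # random: any eligible node
            del frontier[i]
        if node in explored:
            continue                   # a node may be queued more than once
        explored.add(node)
        order.append(node)
        if node == goal:
            break
        frontier.extend(n for n in adj[node] if n not in explored)
    return order


adj = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"], "C": [], "D": [], "E": []}
print(search(adj, "S", "E", "bfs"))     # ['S', 'A', 'B', 'C', 'D', 'E']
print(search(adj, "S", "E", "dfs"))     # ['S', 'B', 'E']
print(search(adj, "S", "E", "random"))  # order varies with the seed; starts at 'S', ends at 'E'
```

Replacing the `deque` with a priority queue (for example Python's `heapq`) ordered by edge weight or a heuristic would turn the same loop into a best-first search.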

## DSE Graph

DSE Graph is a graph database, and its structure closely follows the graphs described above. It is composed of vertices and edges, and both can have properties. A property is a key-value pair used to store data, and it can be attached to either a vertex or an edge. Edges can be reflexive, connecting a vertex label back to itself (for example, one person knowing another person), to denote a relationship between two objects of the same type. Two vertices can also be joined by multiple edges, each expressing a different relationship between them. Vertex labels and edge labels provide object types and relationship descriptions; all other information is supplied through properties.
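As an illustrative sketch (plain Python dataclasses, not DSE Graph's actual data model or API), the property-graph model above might look like this: vertices and edges each carry a label and a properties map, a `person`-to-`person` edge relates two objects of the same type, and the same pair of vertices can be linked by several edges with different labels. The labels and property values are made-up examples.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    label: str                                       # object type, e.g. "person"
    properties: dict = field(default_factory=dict)   # key-value data


@dataclass(eq=False)
class Edge:
    label: str                                       # relationship, e.g. "knows"
    out_v: Vertex                                    # vertex the edge leaves
    in_v: Vertex                                     # vertex the edge points to
    properties: dict = field(default_factory=dict)   # edges can carry data too


alice = Vertex("person", {"name": "Alice"})
bob = Vertex("person", {"name": "Bob"})
edges = [
    # person -> person: a relationship between two objects of the same type
    Edge("knows", alice, bob, {"since": 2015}),
    # a second edge between the same pair, describing a different relationship
    Edge("worksWith", alice, bob),
]
print([e.label for e in edges if e.out_v is alice and e.in_v is bob])  # ['knows', 'worksWith']
```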

### Counts

The graph is usually referred to through the variable `g`, the traversal source. Calling `g` by itself does not query anything, however. `g.V()` returns all of the vertices in the graph, and appending `count()`, as in `g.V().count()`, returns the number of vertices. `g.E()` and `g.E().count()` do the same for edges.
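The following is a hypothetical, in-memory stand-in for `g` (the `G` and `Traversal` classes are made up for illustration) that mirrors the shape of `g.V().count()` and `g.E().count()`. In a real Gremlin client, `count()` is itself a step whose result must be iterated, which the Gremlin console does automatically; this sketch returns a plain integer for brevity.

```python
class Traversal:
    """Tiny stand-in for a traversal over in-memory elements (hypothetical)."""

    def __init__(self, elements):
        self._elements = list(elements)

    def count(self):
        return len(self._elements)   # number of elements reaching this step


class G:
    """Hypothetical traversal source: V() and E() start traversals over all vertices/edges."""

    def __init__(self, vertices, edges):
        self._vertices, self._edges = vertices, edges

    def V(self):
        return Traversal(self._vertices)

    def E(self):
        return Traversal(self._edges)


g = G(vertices=["alice", "bob", "carol"], edges=[("alice", "knows", "bob")])
print(g.V().count())   # 3
print(g.E().count())   # 1
```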

### Schema

When creating a graph, you first define a schema for every vertex label, edge label, and property key the graph will use. `schema.propertyKey`, `schema.vertexLabel`, and `schema.edgeLabel` create the schema for these objects. The properties that belong to a particular vertex or edge label are declared when that label's schema is defined, and their values are filled in when each vertex or edge is created. `g.addV` and `g.addE` then create the vertices and edges themselves: vertices are added with `g.addV(label, [propertyKey, propertyValue])`, and edges are added with `g.V(firstVertex).addE(edgeLabel).to(secondVertex)`.
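To show why the schema comes first, here is a hypothetical Python sketch that loosely mirrors the `propertyKey`/`vertexLabel`/`edgeLabel` steps (the `Schema`, `Graph`, `add_v`, and `add_e` names are made up, not DSE's API). Data is accepted only if its labels and properties were declared beforehand; for simplicity, this sketch records a single out/in vertex-label pair per edge label.

```python
class SchemaError(ValueError):
    """Raised when data does not match the declared schema."""


class Schema:
    """Hypothetical schema registry loosely mirroring propertyKey/vertexLabel/edgeLabel."""

    def __init__(self):
        self.property_keys = {}   # key name -> Python type of its values
        self.vertex_labels = {}   # vertex label -> set of allowed property keys
        self.edge_labels = {}     # edge label -> (out vertex label, in vertex label)

    def property_key(self, name, value_type):
        self.property_keys[name] = value_type

    def vertex_label(self, label, *keys):
        undeclared = [k for k in keys if k not in self.property_keys]
        if undeclared:
            raise SchemaError(f"undeclared property keys: {undeclared}")
        self.vertex_labels[label] = set(keys)

    def edge_label(self, label, out_label, in_label):
        self.edge_labels[label] = (out_label, in_label)


class Graph:
    """Stores data only if it matches the schema (hypothetical addV/addE analogues)."""

    def __init__(self, schema):
        self.schema, self.vertices, self.edges = schema, [], []

    def add_v(self, label, **props):
        allowed = self.schema.vertex_labels.get(label)
        if allowed is None:
            raise SchemaError(f"unknown vertex label {label!r}")
        for key, value in props.items():
            if key not in allowed:
                raise SchemaError(f"{key!r} is not a property of {label!r}")
            expected = self.schema.property_keys[key]
            if not isinstance(value, expected):
                raise SchemaError(f"{key!r} expects {expected.__name__}")
        vertex = {"label": label, "properties": props}
        self.vertices.append(vertex)
        return vertex

    def add_e(self, label, out_v, in_v):
        # The edge label fixes which vertex labels it may connect.
        if self.schema.edge_labels.get(label) != (out_v["label"], in_v["label"]):
            raise SchemaError(f"{label!r} cannot connect {out_v['label']!r} to {in_v['label']!r}")
        edge = {"label": label, "out": out_v, "in": in_v}
        self.edges.append(edge)
        return edge


schema = Schema()
schema.property_key("name", str)
schema.vertex_label("person", "name")
schema.edge_label("knows", "person", "person")

graph = Graph(schema)
alice = graph.add_v("person", name="Alice")
bob = graph.add_v("person", name="Bob")
graph.add_e("knows", alice, bob)

try:
    graph.add_v("person", age=30)  # rejected: "age" was never declared for "person"
except SchemaError as err:
    print(err)  # 'age' is not a property of 'person'
```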

### Traversal

In DSE Graph, a traversal is a sequence of conditions and filters used to query the graph. Traversal steps are chained together to build a useful query. One such step is `has`, which selects the vertices or edges that have a specific property value; a common form takes the label together with the property key and value, as in `has('person', 'name', 'Alice')`. You can move along edges with `inE` and `outE`, which return the incoming or outgoing edges of a vertex. `inV` and `outV` then step from an edge to the vertex it points to or the vertex it leaves, so `outE().inV()` reaches a vertex's outgoing neighbors. The `values` step returns the value of a given property for every element produced by the previous steps. There are many more traversal steps, and they can be combined into a wide range of queries.
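To make the step chaining concrete, here is a hypothetical, eager Python imitation of `has`, `outE`, `inE`, `inV`, `outV`, and `values` (the `Traversal` class and example data are made up; real Gremlin traversals are lazy and far richer). Each step consumes the elements produced by the previous one, and the comments show the equivalent Gremlin query for comparison.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    label: str
    props: dict = field(default_factory=dict)


@dataclass(eq=False)
class Edge:
    label: str
    out_v: Vertex                                # vertex the edge leaves
    in_v: Vertex                                 # vertex the edge points to
    props: dict = field(default_factory=dict)


class Traversal:
    """Hypothetical, eager imitation of a few Gremlin steps over in-memory data."""

    def __init__(self, edges, items):
        self.edges = edges      # every edge in the graph, used by inE/outE
        self.items = items      # elements produced by the previous step

    def _then(self, items):
        return Traversal(self.edges, items)

    def has(self, label, key, value):
        return self._then([x for x in self.items
                           if x.label == label and x.props.get(key) == value])

    def outE(self, label=None):
        return self._then([e for v in self.items for e in self.edges
                           if e.out_v is v and label in (None, e.label)])

    def inE(self, label=None):
        return self._then([e for v in self.items for e in self.edges
                           if e.in_v is v and label in (None, e.label)])

    def inV(self):
        return self._then([e.in_v for e in self.items])

    def outV(self):
        return self._then([e.out_v for e in self.items])

    def values(self, key):
        return [x.props[key] for x in self.items if key in x.props]


alice, bob, carol = (Vertex("person", {"name": n}) for n in ("Alice", "Bob", "Carol"))
edges = [Edge("knows", alice, bob), Edge("knows", alice, carol), Edge("knows", carol, bob)]
all_vertices = Traversal(edges, [alice, bob, carol])   # plays the role of g.V()

# Gremlin: g.V().has('person', 'name', 'Alice').outE('knows').inV().values('name')
print(all_vertices.has("person", "name", "Alice").outE("knows").inV().values("name"))  # ['Bob', 'Carol']
# Gremlin: g.V().has('person', 'name', 'Bob').inE('knows').outV().values('name')
print(all_vertices.has("person", "name", "Bob").inE("knows").outV().values("name"))    # ['Alice', 'Carol']
```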

## Conclusion

DSE Graph is a graph database whose design draws on both graph theory and Cassandra. Graph databases are a useful way to store relationships between entities, and the Gremlin query language lets us build traversals that return useful information about the entities in a graph.