The Graph Isomorphism AlgorithmAshay Dharwadker and JohnTagore TevetAND THE STRUCTURE SEMIOTICS RESEARCH GROUP S.E.R.R. • EUROACADEMY • TALLINN • 2009 

We present a new polynomialtime
algorithm for determining whether two given graphs are isomorphic or not.
We prove that the algorithm is necessary and sufficient for solving the
Graph Isomorphism Problem in polynomialtime, thus showing that the Graph
Isomorphism Problem is in P. The semiotic theory for the recognition
of graph structure is used to define a canonical form of the sign matrix
of a graph. We prove that the canonical form of the sign matrix is uniquely
identifiable in polynomialtime for isomorphic graphs. The algorithm is
demonstrated by solving the Graph Isomorphism Problem for many of the hardest
known examples. We implement the algorithm in C++ and provide a demonstration
program for Microsoft Windows [download]. Google Scholar Citations © 2009


The Graph Isomorphism Algorithm and its consequence that Graph Isomorphism is in P
were first announced during a special S.E.R.R. seminar at Euroacademy in 2009. The result was subsequently published in the Euroacademy series
Baltic Horizons No. 14 (111) in 2010. We are pleased to announce that
The Graph Isomorphism Algorithm has also been published by
Amazon in 2011. The Endowed Chair of the Institute of Mathematics was bestowed upon Distinguished Professor Ashay Dharwadker in 2012 to honour his fundamental contributions to Mathematics and Natural Sciences.

1. Introduction 

One of the most fundamental problems in
graph theory is the Graph Isomorphism Problem : given two graphs
G_{A}
and G_{B}, are they isomorphic? Graphs G_{A}
and G_{B} are said to be isomorphic if their vertices
can be rearranged so that the corresponding edge structure is exactly the
same. To show that graphs G_{A} and G_{B}
are isomorphic, it suffices to find one such rearrangement of vertices.
On the other hand, to show that G_{A} and G_{B}
are not isomorphic, one must prove that no such rearrangement of vertices
can exist. Without a good algorithm, this problem can be very difficult
to solve even for relatively small graphs.
We present a new polynomialtime GRAPH ISOMORPHISM ALGORITHM for determining whether two given graphs are isomorphic or not. If the given graphs are isomorphic, the algorithm finds an explicit isomorphism function in polynomialtime. In Section 2, we provide precise DEFINITIONS of all the terminology used and introduce the essential concept of a sign matrix according to the semiotic theory for the recognition of graph structure. In Section 3, we present a formal description of the ALGORITHM followed by an example to show how the algorithm works stepbystep. In Section 4, we prove that the algorithm is NECESSARY AND SUFFICIENT for solving the Graph Isomorphism Problem: if graphs G_{A} and G_{B} are isomorphic, then the algorithm finds an explicit isomorphism function; if graphs G_{A} and G_{B} are not isomorphic, then the algorithm determines that no isomorphism function can exist. In Section 5, we show that the algorithm has polynomialtime COMPLEXITY. Thus, we prove that the Graph Isomorphism Problem is in P. In Section 6, we provide an IMPLEMENTATION of the algorithm as a C++ program, together with demonstration software for Microsoft Windows. In Section 7, we demonstrate the algorithm by solving the Graph Isomorphism Problem for several EXAMPLES of graphs in the hardest known cases. In Section 8, we list the REFERENCES. 

2. Definitions 

To begin with, we present elementary definitions
of all the terminology used, following [1]. Thereafter,
we introduce the essential concept of a sign matrix according to the semiotic
theory for the recognition of graph structure, following [2].
A finite simple graph G consists of a set of vertices V, with V = n, and a set of edges E, such that each edge is an unordered pair of distinct vertices. The definition of a simple graph G forbids loops (edges joining a vertex to itself) and multiple edges (many edges joining a pair of vertices), whence the set E must also be finite, with E = m. We label the vertices of G with the integers 1, 2, ..., n. If the unordered pair of vertices {u, v} is an edge in G, we say that u is adjacent to v and write uv ∈ E. Adjacency is a symmetric relationship: uv ∈ E if and only if vu ∈ E. The degree of a vertex v is the number of vertices that are adjacent to v. A (u, v)path P in G is a sequence of distinct vertices u = v_{1}, v_{2}, ..., v_{k} = v such that v_{i}v_{i}_{+1} ∈ E for i = 1, 2, ..., k1. If such a (u, v)path P exists, then the vertices u and v are said to be connected by a path of length k1. Given any pair of vertices (u,
v) in G, we define
the distance
We now introduce the key ingredients of semiotic theory. For any pair of vertices (u, v) in G, the collateral graph G\uv is defined as follows:
where
The adjacency matrix of G is an n × n matrix with the entry in row u and column v equal to 1 if uv ∈ E and equal to 0 if uv ∉ E. Thus, the adjacency matrix of the graph G can be recovered from the leading binary signs of the entries of the sign matrix S. Note that for a simple graph G, both the adjacency matrix and the sign matrix S are symmetric. We shall now define a canonical form S* of the sign matrix
by ordering the rows and columns of S in a certain way. First, write
the set of all distinct (u,
v)signs s_{uv}
in lexicographic order s_{1},
s_{2}, ...,
s_{r}.
Then, for each row i of the sign matrix, i = 1, 2, ..., n,
compute the sign frequency vector
where f_{i}^{(k)} is the number of times the sign s_{k} occurs in row i. Since S is symmetric, the sign frequency vector for column i is the same as the sign frequency vector for row i, for i = 1, 2, ..., n. Now, write the sign frequency vectors f_{1}, f_{2}, ..., f_{n} in lexicographic order ^{f}i_{1}, ^{f}i_{2}, ..., ^{f}i_{n}. Reorder the rows and columns of the sign matrix according to the permutation i_{1}, i_{2}, ..., i_{n} of the vertices 1, 2, ..., n of G to obtain the canonical form S* of the sign matrix. The vertices of G are partitioned into equivalence classes consisting of vertices with the same sign frequency vectors. Thus, the canonical form S* of the sign matrix is uniquely defined only upto permutations of vertices within each equivalence class. Graphs G_{A} and G_{B} are said to be
isomorphic
if there exists a bijection
from the vertices of graph G_{A} to the vertices of graph G_{B}, such that uv is an edge in graph G_{A} if and only if φ(u)φ(v) is an edge in graph G_{B}. The graph isomorphism problem is to determine whether two given graphs are isomorphic or not. An algorithm is a problemsolving method suitable for implementation as a computer program. While designing algorithms we are typically faced with a number of different approaches. For small problems, it hardly matters which approach we use, as long as it is one that solves the problem correctly. However, there are many problems for which the only known algorithms take so long to compute the solution that they are practically useless. For instance, the naïve approach of computing all n! possible permutations of the n vertices to show that a pair of graphs G_{A} and G_{B} are not isomorphic is impractical even for small inputs. A polynomialtime algorithm is one whose number of computational steps is always bounded by a polynomial function of the size of the input. Thus, a polynomialtime algorithm is one that is actually useful in practice. The class of all problems that have polynomialtime algorithms is denoted by P. If graphs G_{A} and G_{B} are isomorphic then they must have the same sign frequency vectors in lexicographic order ^{f}i_{1}, ^{f}i_{2}, ..., ^{f}i_{n} and we shall show that our algorithm obtains identical canonical forms of their sign matrices S_{A}* and S_{B}* in polynomialtime, thus exhibiting an explicit isomorphism function φ. Conversely, we shall show that our algorithm determines in polynomialtime that the sign matrices S_{A}* and S_{B}* cannot be expressed in identical canonical forms if and only if the graphs G_{A} and G_{B} are not isomorphic. Thus, we have a polynomialtime algorithm for solving the graph isomorphism problem, showing that the graph isomorphism problem is in P. 

3. Algorithm 

We are now ready to present a formal description
of the algorithm. After that, the steps of the algorithm will be illustrated
by an example. We begin by defining four procedures.
3.1. Procedure. This procedure is Dijkstra's algorithm [3]. Given a graph G and a vertex u, we compute shortest (u, v)paths to all vertices v of G. Define a(u, v) = 1 if uv ∈ E and a(u, v) = ∞ if uv ∉ E. We maintain a set V_{known} of vertices to which the shortest (u, v)path is known and a tentative distance d'(u, w) for each vertex w outside V_{known}.
The algorithm first computes all the pair graphs of G_{A}.
To see how this is done, let us explicitly compute the pair graph G_{12}
for the pair of vertices (1, 2). First, Procedure 3.2 computes the shortest
paths from vertex 1 in G_{A}\12 as (1), (1, 7, 2), (1, 7,
3), (1, 7), (1, 8), (1, 4), (1, 5) and (1, 6). Then, Procedure 3.2 computes
the shortest paths from vertex 2 in G_{A}\12 as (2), (2,
7, 1), (2, 7, 3), (2, 7), (2, 8), (2, 4), (2, 5) and (2, 6). The distance
d(1,
2) = 2 is given by the length of any shortest (1, 2)path found in G_{A}\12
so far. Now, Procedure 3.2 obtains the shortest (1, 2)paths (1, 7, 2),
(1, 8, 2), (1, 4, 2), (1, 5, 2) and (1, 6, 2) whose union gives the 7 vertices
of the pair graph {1, 2, 4, 5, 6, 7, 8}. The pair graph has 16 edges {1,7},
{1,8}, {1,4}, {1,5}, {1,6}, {2,7}, {2,8}, {2,4}, {2,5}, {2,6}, {7,4}, {7,5},
{8,4}, {8,5}, {4,6} and {5,6}. Since {1, 2} is not an edge in G_{A},
the leading binary sign is negative and Procedure 3.3 computes the sign
s_{12}
= 2.7.16. Similarly, Procedure 3.3 computes all the signs s_{ij}
for i, j = 1, 2, 3, 4, 5, 6, 7, 8. Note that for i
= j the sign is always 0.1.0. Thus, Procedure 3.3 computes the
sign matrix S_{A}. Then, Procedure 3.3 counts the number
of times each sign occurs in a column of S_{A} and obtains
the sign frequency vectors for each column of S_{A}. Finally,
Procedure 3.3 reorders the rows and columns of S_{A}
according to the lexicographic order of the sign frequency vectors, to
obtain the canonical form of the sign matrix S_{A}*. We
use the following convention to display the sign matrix : the row and column
headers show the vertex labels and the equivalence classes of vertices
are distinguished by different shades of blue; the sign frequency vectors,
vertex degrees and equivalence class numbers are displayed along the column
footers.
Similarly, Procedure 3.3 obtains the canonical form of the sign matrix
S_{B}*
:
Next, the algorithm checks that the sign frequency vectors in lexicographic
order for S_{A}* and S_{B}* are the same,
( ^{f}_{A}4,
^{f}_{A}5,
^{f}_{A}1,
^{f}_{A}2,
^{f}_{A}3,
^{f}_{A}7,
^{f}_{A}8,
^{f}_{A}6
) = ( ^{f}_{B}1,
^{f}_{B}8,
^{f}_{B}7,
^{f}_{B}3,
^{f}_{B}6,
^{f}_{B}4,
^{f}_{B}5,
^{f}_{B}2)
= (01106, 01106, 20132, 20132, 20132, 20132, 20132, 20132). Finally, the
algorithm runs through the loop k = 1, 2, 3, 4, 5, 6, 7, 8 to find
an explicit isomorphism if it exists. Starting with k = 1, set A
= S_{A}* and B = S_{B}* :
Since k = 1, there is no initial interchange of rows and columns
of B. Now, the algorithm uses Procedure 3.4. The entries of A
and B are read in row major order. The first mismatch is found in
the third row and fourth column, shown underlined. The algorithm finds
that exchanging the fourth column with the fifth column (and the fourth
row with the fifth row) of B will push the first mismatch further
along the row major order:
The first mismatch is found in the third row and fifth column, shown
underlined. The algorithm finds that exchanging the fifth column with the
eighth column (and the fifth row with the eighth row) of B will
push the first mismatch further along the row major order:
Now there is no mismatch, A = B. The algorithm exits the
final loop and reports that an isomorphism has been found. The explicit
isomorphism φ is given by reading the vertex
labels of A and B in this order:
If the graphs G_{A} and G_{B} are redrawn
with vertices ordered in this way, the isomorphism φ
is easy to visualize.


5. Complexity 
We shall now show that the algorithm terminates
in polynomialtime, by specifying a polynomial of the larger of the two
number of vertices
n of the input graphs, that is an upper bound
on the total number of computational steps performed by the algorithm.
Note that we consider
5.1. Proposition. Given a graph G with n vertices, Procedure 3.1 takes at most 3n^{2} + 3n steps to find shortest paths from an initial vertex u to all other vertices. Proof. Initialization takes at most 3n steps. To find the minimum distance of an unknown vertex from the initial vertex u takes at most n steps and to update the tentative distances takes at most n steps. There are at most n iterations until all the vertices are known. Finally, it takes at most n^{2} steps to recover the vertices of the shortest paths. Thus, Procedure 3.1 terminates after at most 3n + n(n + n) + n^{2} = 3n^{2} + 3n steps. ☐ 5.2. Proposition. Given a graph G with n vertices, Procedure 3.2 takes at most 7n^{2} + 7n steps to compute the distance d(u, v) in the collateral graph G\uv and the pair graph G_{uv} for a given pair of vertices u and v. Proof. The graph G\uv also has n vertices. By Proposition 5.1, Procedure 3.1 takes at most 3n^{2} + 3n steps to find shortest paths from the initial vertex u to all other vertices and at most 3n^{2} + 3n steps to find shortest paths from the initial vertex v to all other vertices. Then it takes at most n steps to determine the distance d(u, v). Finally, it takes at most n^{2} steps to run through pairs of shortest paths to find the vertices of the pair graph G_{uv}. Thus, Procedure 3.2 terminates after at most 3n^{2} + 3n + 3n^{2} + 3n + n + n^{2} = 7n^{2} + 7n steps. ☐ 5.3. Proposition. Given a graph G with n vertices, Procedure 3.3 takes at most 7n^{4} + 7n^{3} + 2n^{2} steps to compute the canonical form of the sign matrix S*. Proof. By Proposition 5.2, for each pair of vertices it takes at most 7n^{2} + 7n steps to compute the sign s_{uv}. Since there are n^{2} signs, it takes at most n^{2}(7n^{2} + 7n) = 7n^{4} + 7n^{3} steps to compute the sign matrix S. Then it takes at most n^{2} steps to compute the sign frequency vector and at most n^{2} steps to sort it in lexicographic order. Thus, Procedure 3.3 terminates after at most 7n^{4} + 7n^{3} + n^{2}+ n^{2} = 7n^{4} + 7n^{3} + 2n^{2} steps. ☐ 5.4. Proposition. Given a sign matrices S_{A}* and S_{B}* such that the sign frequency vectors in lexicographic order ( ^{f}_{A}i_{1}, ^{f}_{A}i_{2}, ..., ^{f}_{A}i_{n} ) = ( ^{f}_{B}i'_{1}, ^{f}_{B}i'_{2}, ..., ^{f}_{B}i'_{n} ), Procedure 3.4 takes at most 2n^{4} steps to terminate. Proof. Since there are n^{2} entries in S_{B}*, it takes at most n^{2} steps to find a mismatch (i, j) with the corresponding entry in S_{A}* in row major order. Then, it takes at most n^{2} steps along the row i to find a column j' such that interchanging rows ( j, j' ) and columns ( j, j' ) leads to a mismatch later in the row major order. This may be repeated at most n^{2} times until either the corresponding interchange column j' cannot be found or all the entries in the sign matrices are perfectly matched. Thus, Procedure 3.4 terminates after at most n^{2}(n^{2} + n^{2}) = 2n^{4} steps. ☐ 5.5. Proposition. Given graphs G_{A} and G_{B} with n vertices, the algorithm takes at most 2n^{5} + 14n^{4} + 14n^{3} + 4n^{2} steps to terminate. Proof. By Proposition 5.3, Procedure 3.3 takes at most 2(7n^{4} + 7n^{3} + 2n^{2}) = 14n^{4} + 14n^{3} + 4n^{2} steps to compute the canonical forms of the sign matrices S_{A}* and S_{B}*. If the sign frequency vectors in lexicographic order ( ^{f}_{A}i_{1}, ^{f}_{A}i_{2}, ..., ^{f}_{A}i_{n} ) = ( ^{f}_{B}i'_{1}, ^{f}_{B}i'_{2}, ..., ^{f}_{B}i'_{n} ) are the same, then by Proposition 5.4 the final loop for k = 1, 2, ..., n takes at most n(2n^{4}) = 2n^{5} steps. Thus, the algorithm terminates after at most 2n^{5} + 14n^{4} + 14n^{3} + 4n^{2} steps. ☐ From Theorem 4.3 and Proposition 5.5, we have 5.6. Theorem. The Graph Isomorphism Problem is in P. ☐ 
6. Implementation 

We provide a demonstration program for
the Graph Isomorphism Algorithm written in C++ for Microsoft Windows.
The input consists of the files Graph A.txt and Graph B.txt
containing the adjacency matrices of graph G_{A} and graph
G_{B}
respectively. The program computes the sign matrices
S_{A}*
and S_{B}* in canonical form and determines whether G_{A}
and G_{B} are isomorphic or not, in polynomialtime.
We show how to write the input for the computation performed in Example
3.6:
The C++ program is shown below:
The output of the program for the input
in Figure 6.2 is shown in Example 3.6. The next section shows many more
examples of input/output files. The download package also contains a visualizer
for drawing graphs according to the output of the demonstration program.


7. Examples 

We now demonstrate the Graph Isomorphism
Algorithm for several examples. The first set of examples 7.17.5 consists
of isomorphic graphs whose vertices have been permuted randomly so that
the isomorphism is well and truly hidden. The second set of examples 7.67.9
consists of graphs that are not isomorphic and yet have a very similar
structure, hence deciding that they are not isomorphic in polynomialtime
demonstrates the power of the algorithm.
7.1. Example. We run the program on isomorphic Petersen [5]
graphs A and B as input:
The algorithm finds an explicit isomorphism, shown below.
7.2. Example. We run the program on isomorphic Icosahedron [6]
graphs A and B as input:
The algorithm finds an explicit isomorphism, shown below.
7.3. Example. We run the program on isomorphic Ramsey [7]
graphs A and B as input:
The algorithm finds an explicit isomorphism, shown below.
7.4. Example. We run the program on isomorphic Dodecahedron [6]
graphs A and B as input:
The algorithm finds an explicit isomorphism, shown below.
7.5. Example. We run the program on isomorphic Coxeter [8]
graphs A and B as input:
The algorithm finds an explicit isomorphism, shown below.
7.6. Example. We run the program on nonisomorphic Praust [9]
graphs A and B as input:
Graph A and Graph B have the same sign frequency vectors in lexicographic
order, so their structure is very similar. The algorithm determines that
the graphs are not isomorphic, shown below.
7.7. Example. We run the program on nonisomorphic Mathon [10]
graphs A and B as input:
Graph A and Graph B have the same sign frequency vectors in lexicographic
order, so their structure is very similar. The algorithm determines that
the graphs are not isomorphic, shown below.
7.8. Example. We run the program on nonisomorphic Weisfeiler
[11]
graphs A and B as input:
Graph A and Graph B have different sign frequency vectors in lexicographic
order. The algorithm determines that the graphs are not isomorphic, shown
below.
7.9. Example. We run the program on nonisomorphic Siberian
[12]
graphs A and B as input:
Graph A and Graph B have the same sign frequency vectors in lexicographic order, so their structure is very similar. The algorithm determines that the graphs are not isomorphic, shown below.


References 








S.E.R.R. • EUROACADEMY • TALLINN • 2009 



