![]() |
The Graph Isomorphism AlgorithmAshay Dharwadker and John-Tagore TevetAND THE STRUCTURE SEMIOTICS RESEARCH GROUP S.E.R.R. • EUROACADEMY • TALLINN • 2009 |
![]() |
|
We present a new polynomial-time
algorithm for determining whether two given graphs are isomorphic or not.
We prove that the algorithm is necessary and sufficient for solving the
Graph Isomorphism Problem in polynomial-time, thus showing that the Graph
Isomorphism Problem is in P. The semiotic theory for the recognition
of graph structure is used to define a canonical form of the sign matrix
of a graph. We prove that the canonical form of the sign matrix is uniquely
identifiable in polynomial-time for isomorphic graphs. The algorithm is
demonstrated by solving the Graph Isomorphism Problem for many of the hardest
known examples. We implement the algorithm in C++ and provide a demonstration
program for Microsoft Windows [download]. Google Scholar Citations © 2009
|
![]() |
|
The Graph Isomorphism Algorithm and its consequence that Graph Isomorphism is in P
were first announced during a special S.E.R.R. seminar at Euroacademy in 2009. The result was subsequently published in the Euroacademy series
Baltic Horizons No. 14 (111) in 2010. We are pleased to announce that
The Graph Isomorphism Algorithm has also been published by
Amazon in 2011. The Endowed Chair of the Institute of Mathematics was bestowed upon Distinguished Professor Ashay Dharwadker in 2012 to honour his fundamental contributions to Mathematics and Natural Sciences.
|
![]() |
1. Introduction |
||
One of the most fundamental problems in
graph theory is the Graph Isomorphism Problem : given two graphs
GA
and GB, are they isomorphic? Graphs GA
and GB are said to be isomorphic if their vertices
can be rearranged so that the corresponding edge structure is exactly the
same. To show that graphs GA and GB
are isomorphic, it suffices to find one such rearrangement of vertices.
On the other hand, to show that GA and GB
are not isomorphic, one must prove that no such rearrangement of vertices
can exist. Without a good algorithm, this problem can be very difficult
to solve even for relatively small graphs.
We present a new polynomial-time GRAPH ISOMORPHISM ALGORITHM for determining whether two given graphs are isomorphic or not. If the given graphs are isomorphic, the algorithm finds an explicit isomorphism function in polynomial-time. In Section 2, we provide precise DEFINITIONS of all the terminology used and introduce the essential concept of a sign matrix according to the semiotic theory for the recognition of graph structure. In Section 3, we present a formal description of the ALGORITHM followed by an example to show how the algorithm works step-by-step. In Section 4, we prove that the algorithm is NECESSARY AND SUFFICIENT for solving the Graph Isomorphism Problem: if graphs GA and GB are isomorphic, then the algorithm finds an explicit isomorphism function; if graphs GA and GB are not isomorphic, then the algorithm determines that no isomorphism function can exist. In Section 5, we show that the algorithm has polynomial-time COMPLEXITY. Thus, we prove that the Graph Isomorphism Problem is in P. In Section 6, we provide an IMPLEMENTATION of the algorithm as a C++ program, together with demonstration software for Microsoft Windows. In Section 7, we demonstrate the algorithm by solving the Graph Isomorphism Problem for several EXAMPLES of graphs in the hardest known cases. In Section 8, we list the REFERENCES. |
||
2. Definitions |
|||||||||||||||||||||||||||||
To begin with, we present elementary definitions
of all the terminology used, following [1]. Thereafter,
we introduce the essential concept of a sign matrix according to the semiotic
theory for the recognition of graph structure, following [2].
A finite simple graph G consists of a set of vertices V, with |V| = n, and a set of edges E, such that each edge is an unordered pair of distinct vertices. The definition of a simple graph G forbids loops (edges joining a vertex to itself) and multiple edges (many edges joining a pair of vertices), whence the set E must also be finite, with |E| = m. We label the vertices of G with the integers 1, 2, ..., n. If the unordered pair of vertices {u, v} is an edge in G, we say that u is adjacent to v and write uv ∈ E. Adjacency is a symmetric relationship: uv ∈ E if and only if vu ∈ E. The degree of a vertex v is the number of vertices that are adjacent to v. A (u, v)-path P in G is a sequence of distinct vertices u = v1, v2, ..., vk = v such that vivi+1 ∈ E for i = 1, 2, ..., k-1. If such a (u, v)-path P exists, then the vertices u and v are said to be connected by a path of length k-1. Given any pair of vertices (u,
v) in G, we define
the distance
We now introduce the key ingredients of semiotic theory. For any pair of vertices (u, v) in G, the collateral graph G\uv is defined as follows:
where
The adjacency matrix of G is an n × n matrix with the entry in row u and column v equal to 1 if uv ∈ E and equal to 0 if uv ∉ E. Thus, the adjacency matrix of the graph G can be recovered from the leading binary signs of the entries of the sign matrix S. Note that for a simple graph G, both the adjacency matrix and the sign matrix S are symmetric. We shall now define a canonical form S* of the sign matrix
by ordering the rows and columns of S in a certain way. First, write
the set of all distinct (u,
v)-signs suv
in lexicographic order s1,
s2, ...,
sr.
Then, for each row i of the sign matrix, i = 1, 2, ..., n,
compute the sign frequency vector
where fi(k) is the number of times the sign sk occurs in row i. Since S is symmetric, the sign frequency vector for column i is the same as the sign frequency vector for row i, for i = 1, 2, ..., n. Now, write the sign frequency vectors f1, f2, ..., fn in lexicographic order fi1, fi2, ..., fin. Reorder the rows and columns of the sign matrix according to the permutation i1, i2, ..., in of the vertices 1, 2, ..., n of G to obtain the canonical form S* of the sign matrix. The vertices of G are partitioned into equivalence classes consisting of vertices with the same sign frequency vectors. Thus, the canonical form S* of the sign matrix is uniquely defined only upto permutations of vertices within each equivalence class. Graphs GA and GB are said to be
isomorphic
if there exists a bijection
from the vertices of graph GA to the vertices of graph GB, such that uv is an edge in graph GA if and only if φ(u)φ(v) is an edge in graph GB. The graph isomorphism problem is to determine whether two given graphs are isomorphic or not. An algorithm is a problem-solving method suitable for implementation as a computer program. While designing algorithms we are typically faced with a number of different approaches. For small problems, it hardly matters which approach we use, as long as it is one that solves the problem correctly. However, there are many problems for which the only known algorithms take so long to compute the solution that they are practically useless. For instance, the naïve approach of computing all n! possible permutations of the n vertices to show that a pair of graphs GA and GB are not isomorphic is impractical even for small inputs. A polynomial-time algorithm is one whose number of computational steps is always bounded by a polynomial function of the size of the input. Thus, a polynomial-time algorithm is one that is actually useful in practice. The class of all problems that have polynomial-time algorithms is denoted by P. If graphs GA and GB are isomorphic then they must have the same sign frequency vectors in lexicographic order fi1, fi2, ..., fin and we shall show that our algorithm obtains identical canonical forms of their sign matrices SA* and SB* in polynomial-time, thus exhibiting an explicit isomorphism function φ. Conversely, we shall show that our algorithm determines in polynomial-time that the sign matrices SA* and SB* cannot be expressed in identical canonical forms if and only if the graphs GA and GB are not isomorphic. Thus, we have a polynomial-time algorithm for solving the graph isomorphism problem, showing that the graph isomorphism problem is in P. |
|||||||||||||||||||||||||||||
3. Algorithm |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
We are now ready to present a formal description
of the algorithm. After that, the steps of the algorithm will be illustrated
by an example. We begin by defining four procedures.
3.1. Procedure. This procedure is Dijkstra's algorithm [3]. Given a graph G and a vertex u, we compute shortest (u, v)-paths to all vertices v of G. Define a(u, v) = 1 if uv ∈ E and a(u, v) = ∞ if uv ∉ E. We maintain a set Vknown of vertices to which the shortest (u, v)-path is known and a tentative distance d'(u, w) for each vertex w outside Vknown.
The algorithm first computes all the pair graphs of GA.
To see how this is done, let us explicitly compute the pair graph G12
for the pair of vertices (1, 2). First, Procedure 3.2 computes the shortest
paths from vertex 1 in GA\12 as (1), (1, 7, 2), (1, 7,
3), (1, 7), (1, 8), (1, 4), (1, 5) and (1, 6). Then, Procedure 3.2 computes
the shortest paths from vertex 2 in GA\12 as (2), (2,
7, 1), (2, 7, 3), (2, 7), (2, 8), (2, 4), (2, 5) and (2, 6). The distance
d(1,
2) = 2 is given by the length of any shortest (1, 2)-path found in GA\12
so far. Now, Procedure 3.2 obtains the shortest (1, 2)-paths (1, 7, 2),
(1, 8, 2), (1, 4, 2), (1, 5, 2) and (1, 6, 2) whose union gives the 7 vertices
of the pair graph {1, 2, 4, 5, 6, 7, 8}. The pair graph has 16 edges {1,7},
{1,8}, {1,4}, {1,5}, {1,6}, {2,7}, {2,8}, {2,4}, {2,5}, {2,6}, {7,4}, {7,5},
{8,4}, {8,5}, {4,6} and {5,6}. Since {1, 2} is not an edge in GA,
the leading binary sign is negative and Procedure 3.3 computes the sign
s12
= -2.7.16. Similarly, Procedure 3.3 computes all the signs sij
for i, j = 1, 2, 3, 4, 5, 6, 7, 8. Note that for i
= j the sign is always -0.1.0. Thus, Procedure 3.3 computes the
sign matrix SA. Then, Procedure 3.3 counts the number
of times each sign occurs in a column of SA and obtains
the sign frequency vectors for each column of SA. Finally,
Procedure 3.3 reorders the rows and columns of SA
according to the lexicographic order of the sign frequency vectors, to
obtain the canonical form of the sign matrix SA*. We
use the following convention to display the sign matrix : the row and column
headers show the vertex labels and the equivalence classes of vertices
are distinguished by different shades of blue; the sign frequency vectors,
vertex degrees and equivalence class numbers are displayed along the column
footers.
Similarly, Procedure 3.3 obtains the canonical form of the sign matrix
SB*
:
Next, the algorithm checks that the sign frequency vectors in lexicographic
order for SA* and SB* are the same,
( fA4,
fA5,
fA1,
fA2,
fA3,
fA7,
fA8,
fA6
) = ( fB1,
fB8,
fB7,
fB3,
fB6,
fB4,
fB5,
fB2)
= (01106, 01106, 20132, 20132, 20132, 20132, 20132, 20132). Finally, the
algorithm runs through the loop k = 1, 2, 3, 4, 5, 6, 7, 8 to find
an explicit isomorphism if it exists. Starting with k = 1, set A
= SA* and B = SB* :
Since k = 1, there is no initial interchange of rows and columns
of B. Now, the algorithm uses Procedure 3.4. The entries of A
and B are read in row major order. The first mismatch is found in
the third row and fourth column, shown underlined. The algorithm finds
that exchanging the fourth column with the fifth column (and the fourth
row with the fifth row) of B will push the first mismatch further
along the row major order:
The first mismatch is found in the third row and fifth column, shown
underlined. The algorithm finds that exchanging the fifth column with the
eighth column (and the fifth row with the eighth row) of B will
push the first mismatch further along the row major order:
Now there is no mismatch, A = B. The algorithm exits the
final loop and reports that an isomorphism has been found. The explicit
isomorphism φ is given by reading the vertex
labels of A and B in this order:
If the graphs GA and GB are redrawn
with vertices ordered in this way, the isomorphism φ
is easy to visualize.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5. Complexity |
We shall now show that the algorithm terminates
in polynomial-time, by specifying a polynomial of the larger of the two
number of vertices
n of the input graphs, that is an upper bound
on the total number of computational steps performed by the algorithm.
Note that we consider
5.1. Proposition. Given a graph G with n vertices, Procedure 3.1 takes at most 3n2 + 3n steps to find shortest paths from an initial vertex u to all other vertices. Proof. Initialization takes at most 3n steps. To find the minimum distance of an unknown vertex from the initial vertex u takes at most n steps and to update the tentative distances takes at most n steps. There are at most n iterations until all the vertices are known. Finally, it takes at most n2 steps to recover the vertices of the shortest paths. Thus, Procedure 3.1 terminates after at most 3n + n(n + n) + n2 = 3n2 + 3n steps. ☐ 5.2. Proposition. Given a graph G with n vertices, Procedure 3.2 takes at most 7n2 + 7n steps to compute the distance d(u, v) in the collateral graph G\uv and the pair graph Guv for a given pair of vertices u and v. Proof. The graph G\uv also has n vertices. By Proposition 5.1, Procedure 3.1 takes at most 3n2 + 3n steps to find shortest paths from the initial vertex u to all other vertices and at most 3n2 + 3n steps to find shortest paths from the initial vertex v to all other vertices. Then it takes at most n steps to determine the distance d(u, v). Finally, it takes at most n2 steps to run through pairs of shortest paths to find the vertices of the pair graph Guv. Thus, Procedure 3.2 terminates after at most 3n2 + 3n + 3n2 + 3n + n + n2 = 7n2 + 7n steps. ☐ 5.3. Proposition. Given a graph G with n vertices, Procedure 3.3 takes at most 7n4 + 7n3 + 2n2 steps to compute the canonical form of the sign matrix S*. Proof. By Proposition 5.2, for each pair of vertices it takes at most 7n2 + 7n steps to compute the sign suv. Since there are n2 signs, it takes at most n2(7n2 + 7n) = 7n4 + 7n3 steps to compute the sign matrix S. Then it takes at most n2 steps to compute the sign frequency vector and at most n2 steps to sort it in lexicographic order. Thus, Procedure 3.3 terminates after at most 7n4 + 7n3 + n2+ n2 = 7n4 + 7n3 + 2n2 steps. ☐ 5.4. Proposition. Given a sign matrices SA* and SB* such that the sign frequency vectors in lexicographic order ( fAi1, fAi2, ..., fAin ) = ( fBi'1, fBi'2, ..., fBi'n ), Procedure 3.4 takes at most 2n4 steps to terminate. Proof. Since there are n2 entries in SB*, it takes at most n2 steps to find a mismatch (i, j) with the corresponding entry in SA* in row major order. Then, it takes at most n2 steps along the row i to find a column j' such that interchanging rows ( j, j' ) and columns ( j, j' ) leads to a mismatch later in the row major order. This may be repeated at most n2 times until either the corresponding interchange column j' cannot be found or all the entries in the sign matrices are perfectly matched. Thus, Procedure 3.4 terminates after at most n2(n2 + n2) = 2n4 steps. ☐ 5.5. Proposition. Given graphs GA and GB with n vertices, the algorithm takes at most 2n5 + 14n4 + 14n3 + 4n2 steps to terminate. Proof. By Proposition 5.3, Procedure 3.3 takes at most 2(7n4 + 7n3 + 2n2) = 14n4 + 14n3 + 4n2 steps to compute the canonical forms of the sign matrices SA* and SB*. If the sign frequency vectors in lexicographic order ( fAi1, fAi2, ..., fAin ) = ( fBi'1, fBi'2, ..., fBi'n ) are the same, then by Proposition 5.4 the final loop for k = 1, 2, ..., n takes at most n(2n4) = 2n5 steps. Thus, the algorithm terminates after at most 2n5 + 14n4 + 14n3 + 4n2 steps. ☐ From Theorem 4.3 and Proposition 5.5, we have 5.6. Theorem. The Graph Isomorphism Problem is in P. ☐ |
6. Implementation |
||||||||||||||
We provide a demonstration program for
the Graph Isomorphism Algorithm written in C++ for Microsoft Windows.
The input consists of the files Graph A.txt and Graph B.txt
containing the adjacency matrices of graph GA and graph
GB
respectively. The program computes the sign matrices
SA*
and SB* in canonical form and determines whether GA
and GB are isomorphic or not, in polynomial-time.
We show how to write the input for the computation performed in Example
3.6:
The C++ program is shown below:
The output of the program for the input
in Figure 6.2 is shown in Example 3.6. The next section shows many more
examples of input/output files. The download package also contains a visualizer
for drawing graphs according to the output of the demonstration program.
|
||||||||||||||
7. Examples |
||||||||||||||||||||||||||||||||||||||||||||||||||||||
We now demonstrate the Graph Isomorphism
Algorithm for several examples. The first set of examples 7.1-7.5 consists
of isomorphic graphs whose vertices have been permuted randomly so that
the isomorphism is well and truly hidden. The second set of examples 7.6-7.9
consists of graphs that are not isomorphic and yet have a very similar
structure, hence deciding that they are not isomorphic in polynomial-time
demonstrates the power of the algorithm.
7.1. Example. We run the program on isomorphic Petersen [5]
graphs A and B as input:
The algorithm finds an explicit isomorphism, shown below.
7.2. Example. We run the program on isomorphic Icosahedron [6]
graphs A and B as input:
The algorithm finds an explicit isomorphism, shown below.
7.3. Example. We run the program on isomorphic Ramsey [7]
graphs A and B as input:
The algorithm finds an explicit isomorphism, shown below.
7.4. Example. We run the program on isomorphic Dodecahedron [6]
graphs A and B as input:
The algorithm finds an explicit isomorphism, shown below.
7.5. Example. We run the program on isomorphic Coxeter [8]
graphs A and B as input:
The algorithm finds an explicit isomorphism, shown below.
7.6. Example. We run the program on nonisomorphic Praust [9]
graphs A and B as input:
Graph A and Graph B have the same sign frequency vectors in lexicographic
order, so their structure is very similar. The algorithm determines that
the graphs are not isomorphic, shown below.
7.7. Example. We run the program on nonisomorphic Mathon [10]
graphs A and B as input:
Graph A and Graph B have the same sign frequency vectors in lexicographic
order, so their structure is very similar. The algorithm determines that
the graphs are not isomorphic, shown below.
7.8. Example. We run the program on nonisomorphic Weisfeiler
[11]
graphs A and B as input:
Graph A and Graph B have different sign frequency vectors in lexicographic
order. The algorithm determines that the graphs are not isomorphic, shown
below.
7.9. Example. We run the program on nonisomorphic Siberian
[12]
graphs A and B as input:
Graph A and Graph B have the same sign frequency vectors in lexicographic order, so their structure is very similar. The algorithm determines that the graphs are not isomorphic, shown below.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
References |
||||||||||||||||||||||||
|
|
||||||||||||
|
||||||||||||
|
||||||||||||
S.E.R.R. • EUROACADEMY • TALLINN • 2009 |
||||||||||||
|
||||||||||||
|