Understanding disease spreading on networks

In this notebook we will explore disease spreading on networks. Let us work with the really wonderful Network X Python package. Please install it on your python notebook using the command pip install networkx.

Our goal will be to first think about the appearance of giant components and then to think about disease propagation. This is a really fun place to learn about generating function.

In [ ]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import networkx as nx
from operator import itemgetter
from datetime import datetime, timedelta
from scipy.special import gamma,gammainc,gammaincc,erf
from scipy.optimize import minimize, root_scalar
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from matplotlib.backends import backend_pdf as bpdf

#from covid_functions import *
%matplotlib inline

Looking at the largest component with simulations

In class, we have used two different methods to think about the phase transition and the appearance of a giant component in random Erdos-Reyni graphs. Consider a random graph with $N$ edges and average degree $\langle k \rangle$.

  • What is the probability p of having an edge between any two nodes?





Play with the code and visualization below. In it we generate random graphs and then examine the largest component.Vary the c and see what happens

  • Make a sketch of the size of giant component as a function of $\langle k \rangle$?How does this function look and why are physicists so interested in this?





In [ ]:
#create a graph with with network x python package
import warnings; warnings.simplefilter('ignore')



    #'Scale-free_gamma4'G=nx.expected_degree_graph(nx.utils.powerlaw_sequence(N, 2.5), selfloops=False)
    #'Scale-free_gamma4':nx.expected_degree_graph(nx.utils.powerlaw_sequence(N, 4), selfloops=False),
    #'Erdos-Renyi_k5':nx.gnp_random_graph(N,5/N),
    #'Erdos-Renyi_k10':nx.gnp_random_graph(N,10/N)
        
        
# Create Erdos-Reyni graph with N vertices and M edges
N=1000
k_avg=0.5
G=nx.gnp_random_graph(N,k_avg/N)
#Visualize draw and show graph

pos = nx.spring_layout(G)
nx.draw_spring(G, )
plt.show()

components = nx.connected_components(G)
print("The number of connected components:",nx.number_connected_components(G))
largest_component = max(components, key=len)

subgraph = G.subgraph(largest_component)
diameter = nx.diameter(subgraph)
fraction = subgraph.number_of_nodes()/N

print("Fraction of nodes in largest component", fraction)
print("Network diameter of largest component:", diameter)


#pos = nx.spring_layout(G.subgraph(largest_component))
nx.draw_spring(subgraph)
plt.show()


#Make Graph

Understanding the size of the graph

-Use the code and make a plot of the size of the largest component as a function of $\langle k \rangle$? Does this agree with your sketch above?





-How do you expect this graph to change as a function of $\$? Test your hypothesis below?





In [ ]:
N=100
k_avg_array=[0.25,0.5,0.75,1,1.25,2]
S=[]
for k_avg in k_avg_array:
    G=nx.gnp_random_graph(N,k_avg/N)
    components = nx.connected_components(G)
    print("The number of connected components for "+str(k_avg)+" is:",nx.number_connected_components(G))
    largest_component = max(components, key=len)
    subgraph = G.subgraph(largest_component)
    diameter = nx.diameter(subgraph)
    fraction = subgraph.number_of_nodes()/N
    S.append(fraction)
    print("Fraction of nodes in largest component", fraction)
    print("Network diameter of largest component:", diameter)





plt.plot(k_avg_array,S , 'o-')
plt.xlabel("k_avg")
plt.ylabel("Fraction of nodes in giant component")
plt.show()

Disease spreading on networks

We will now simulate disease spreading on networks to try to understand the Corona virus.

In [ ]:
def simulate_pandemic_Gaussian(G,TG,sigG,N_0=5,p=1,tmax=60):
    
    #Sample waiting times
    N = G.number_of_nodes()
    graph_waiting_times=np.abs(np.random.normal(TG, sigG, N))

    #Create list of what nodes are infected and absolute time at
    #which node infects neighbor node infects all its neighbors
    data=[]
    #This list is of people who have been infected
    infected=[]
    #This is the time at which they will infect their neighbors
    infection_times=[]
    time_infected=[]


    #Draw node to infect
    t=0
    tmax_running = 0
    generation=0
    infected= list(np.random.randint(N,size=N_0))
    infection_times=list(graph_waiting_times[infected])
    infected, infection_times=[list(x) for x in zip(*sorted(zip(infected, infection_times), key=itemgetter(1)))]
    Rtild = []
    t_in = []

    while generation < max(len(infected),1):
        if generation %1000 ==0:
            print('Generation '+str(generation))
        current_node=infected[generation]
        t=infection_times[generation]
        
        #Get neighbors of current node that will infect all
        neighbors=G.neighbors(current_node)
        
        #Find uninfected neighbors
        uninfected_neighbors= list(set(neighbors)-set(infected))
        if len(uninfected_neighbors):
            Rtild.append(len(uninfected_neighbors))
            t_in.append(t)
            #Determine which uninfected neighbors to infect
            infected_neighbors=list(np.array(uninfected_neighbors)[np.random.uniform(size=len(uninfected_neighbors))>1-p])
            #Determine time when infections occur
            neighbor_infection_times=graph_waiting_times[infected_neighbors]+t
            #Update list of infected nodes
            infected=list(infected)+list(infected_neighbors)
            #Update list of infection times
            infection_times=list(infection_times)+list(neighbor_infection_times)

            #Repackage
            infected, infection_times=[list(x) for x in zip(*sorted(zip(infected, infection_times), key=itemgetter(1)))]
            time_infected=list(time_infected)+len(uninfected_neighbors)*[t]
        
        generation=generation+1
    
    #Make time axis
    t=np.arange(int(tmax))

    #Extract cumulative number of cases at each time point
    infection_times_array=np.tile(infection_times,(len(t),1))
    t_array=np.tile(t,(len(infection_times),1)).T
    cum_cases=np.sum(infection_times_array < t_array, axis=1)

    return t, cum_cases, t_in, Rtild


def simulate_pandemic_Exponential(G,TG,N_0=5,p=1,tmax=60):
    
    #Sample waiting times
    N = G.number_of_nodes()
    graph_waiting_times=np.random.normal(1/TG, N)

    #Create list of what nodes are infected and absolute time at
    #which node infects neighbor node infects all its neighbors
    data=[]
    #This list is of people who have been infected
    infected=[]
    #This is the time at which they will infect their neighbors
    infection_times=[]
    time_infected=[]


    #Draw node to infect
    t=0
    tmax_running = 0
    generation=0
    infected= list(np.random.randint(N,size=N_0))
    infection_times=list(graph_waiting_times[infected])
    infected, infection_times=[list(x) for x in zip(*sorted(zip(infected, infection_times), key=itemgetter(1)))]
    Rtild = []
    t_in = []

    while generation < max(len(infected),1):
        if generation %1000 ==0:
            print('Generation '+str(generation))
        current_node=infected[generation]
        t=infection_times[generation]
        
        #Get neighbors of current node that will infect all
        neighbors=G.neighbors(current_node)
        
        #Find uninfected neighbors
        uninfected_neighbors= list(set(neighbors)-set(infected))
        if len(uninfected_neighbors):
            Rtild.append(len(uninfected_neighbors))
            t_in.append(t)
            #Determine which uninfected neighbors to infect
            infected_neighbors=list(np.array(uninfected_neighbors)[np.random.uniform(size=len(uninfected_neighbors))>1-p])
            #Determine time when infections occur
            neighbor_infection_times=graph_waiting_times[infected_neighbors]+t
            #Update list of infected nodes
            infected=list(infected)+list(infected_neighbors)
            #Update list of infection times
            infection_times=list(infection_times)+list(neighbor_infection_times)

        #Repackage
        infected, infection_times=[list(x) for x in zip(*sorted(zip(infected, infection_times), key=itemgetter(1)))]
        time_infected=list(time_infected)+len(uninfected_neighbors)*[t]
        
        generation=generation+1
    
    #Make time axis
    t=np.arange(int(tmax))

    #Extract cumulative number of cases at each time point
    infection_times_array=np.tile(infection_times,(len(t),1))
    t_array=np.tile(t,(len(infection_times),1)).T
    cum_cases=np.sum(infection_times_array < t_array, axis=1)

    return t, cum_cases, t_in, Rtild
In [ ]:
N=5000 
graphs = {'Scale-free_gamma2.5':nx.expected_degree_graph(nx.utils.powerlaw_sequence(N, 2.5), selfloops=False),
          #'Scale-free_gamma4':nx.expected_degree_graph(nx.utils.powerlaw_sequence(N, 4), selfloops=False),
          'Erdos-Renyi_k5':nx.gnp_random_graph(N,5/N),
          'Erdos-Renyi_k10':nx.gnp_random_graph(N,10/N)}


tmax = 1000
TG = 10
time_axis = [(datetime.today()-timedelta(days=tmax-k)).isoformat()[:10] for k in range(tmax)]
sim_data = pd.DataFrame(index=time_axis)
for item in graphs.keys():
    print(item)
    for sigmaG in [5*TG]:
        print(sigmaG)
        t,cum_cases,t_in,Rtild = simulate_pandemic_Gaussian(graphs[item],TG,sigmaG,N_0=5,p=1,tmax=tmax)
        plt.plot(t,cum_cases)
        plt.show()
        plt.semilogy(t_in,np.asarray(Rtild)+1,'o',markersize=3)
        plt.show()
        sim_data[item+'_TG'+str(TG)+'_sigmaG'+str(sigmaG)[:3]] = cum_cases
sim_data.to_csv('simulation_pankaj.csv')