COVID-19 Twitter Graph

This COVID-19 Twitter graph dataset is built from the COVID-19-Tweet-IDs dataset created by the University of Southern California. From the original COVID-19-Tweet-IDs data, we constructed a retweet network and a semantic network. Keeping only English tweets yields 6 million tweets in total, from which we construct the retweet network.

Statistics of the retweet network are shown below.

Dataset  Nodes (total)  Tweet nodes  User nodes  Edges (total)  Retweeted-from edges  Post edges  Maximal depth
Retweet  6,021,624      4,159,802    1,861,822   9,634,294      2,953,175             6,681,119   2

The retweet network is distributed as two files: the retweet network itself and an attributes file. The retweet network file lists the retweet relationships among all extracted tweets, one per line, for example:

1231987151606140928 49916007 \t 1231722638751326208 729246649

Each line is a pair of entries separated by a tab ("\t"); each entry gives a tweet ID followed by the ID of the user who posted it. The example above means that tweet 1231987151606140928, posted by user 49916007, is a retweet of tweet 1231722638751326208, posted by user 729246649.
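A minimal Python sketch of reading the retweet network file, assuming every line follows the example above: two entries separated by a tab, each holding a tweet ID and a user ID separated by whitespace. `RetweetEdge`, `parse_retweet_line`, and `read_retweet_edges` are hypothetical helper names, not part of the dataset. IDs are kept as strings so they round-trip exactly.

```python
from typing import Iterator, NamedTuple


class RetweetEdge(NamedTuple):
    """One line of the retweet network: a retweet and the tweet it was retweeted from."""
    tweet_id: str         # the retweeting tweet
    user_id: str          # author of the retweeting tweet
    source_tweet_id: str  # the tweet that was retweeted
    source_user_id: str   # author of the retweeted tweet


def parse_retweet_line(line: str) -> RetweetEdge:
    # Assumed layout: "<tweet> <user>\t<tweet> <user>"; split() tolerates extra spaces.
    left, right = line.rstrip("\n").split("\t")
    tweet_id, user_id = left.split()
    source_tweet_id, source_user_id = right.split()
    return RetweetEdge(tweet_id, user_id, source_tweet_id, source_user_id)


def read_retweet_edges(path: str) -> Iterator[RetweetEdge]:
    """Yield one RetweetEdge per non-empty line of a local copy of the file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_retweet_line(line)


edge = parse_retweet_line("1231987151606140928 49916007\t1231722638751326208 729246649")
print(edge.source_tweet_id)  # 1231722638751326208
```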

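For each tweet and user, we extract the following attributes (value type in parentheses):
tweet: createdAt (time), hashtag (text), mention (user ID), retweetCount (number)
user: createdAt (time), followerCount (number), followingCount (number)
Each line of the attributes file is a tab-separated triple of node ID, attribute name, and value; a "type" attribute gives the node type. Example: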
1231987148699639812 \t type \t Tweet
1231987148699639812 \t createdAt \t Mon Feb 24 17:00:00 +0000 2020
1231987148699639812 \t mention \t 968847322653306880
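A minimal Python sketch for loading the attributes file, assuming every line is a tab-separated (ID, attribute, value) triple as in the example, and that a repeated attribute (for instance several mentions in one tweet) appears on separate lines. `parse_attributes` is a hypothetical helper; the `createdAt` format string is chosen to match the example timestamp.

```python
from collections import defaultdict
from datetime import datetime

# createdAt values look like "Mon Feb 24 17:00:00 +0000 2020".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_attributes(lines):
    """Group (id, attribute, value) lines into {id: {attribute: [values]}}.

    Values are stored in lists because attributes such as mention or hashtag
    may occur on several lines for the same tweet.
    """
    attrs = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        node_id, key, value = (part.strip() for part in line.rstrip("\n").split("\t", 2))
        if key == "createdAt":
            value = datetime.strptime(value, CREATED_AT_FORMAT)  # timezone-aware datetime
        attrs[node_id][key].append(value)
    return attrs


sample = [
    "1231987148699639812\ttype\tTweet",
    "1231987148699639812\tcreatedAt\tMon Feb 24 17:00:00 +0000 2020",
    "1231987148699639812\tmention\t968847322653306880",
]
tweet = parse_attributes(sample)["1231987148699639812"]
print(tweet["type"], tweet["mention"], tweet["createdAt"][0].isoformat())
# ['Tweet'] ['968847322653306880'] 2020-02-24T17:00:00+00:00
```

Keeping values in lists avoids silently overwriting multi-valued attributes; single-valued ones such as retweetCount simply end up in one-element lists (still as strings, since the sketch only converts createdAt).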

COVID-19 Twitter-Semantic Network

By extracting the full text of each COVID-19-related tweet and applying OpenIE, we generated a semantic graph of COVID-related information. The COVID-twitter-semantic graph contains 15,853,229 triples in total, covering 437,022 distinct relation types. Each triple is associated with the ID of the original tweet from which it was extracted. The maximal depth of the COVID-twitter-semantic network is 6.
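The file layout of the semantic network is not described above, so the following Python sketch only illustrates its data structure (a triple plus the ID of the tweet it came from) under an assumed tab-separated layout of subject, relation, object, and tweet ID. `Triple`, `parse_triples`, and `index_by_tweet` are hypothetical names, and the input lines are toy values, not dataset records. Counting distinct relation values is how one could check a figure such as the number of relation types.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str
    tweet_id: str  # ID of the tweet the triple was extracted from


def parse_triples(lines: Iterable[str]) -> List[Triple]:
    # HYPOTHETICAL layout: subject<TAB>relation<TAB>object<TAB>tweet_id.
    # Verify against the files in COVID-Twitter-Semantic.zip before relying on it.
    triples = []
    for line in lines:
        if line.strip():
            subject, relation, obj, tweet_id = line.rstrip("\n").split("\t")
            triples.append(Triple(subject, relation, obj, tweet_id))
    return triples


def index_by_tweet(triples: Iterable[Triple]) -> Dict[str, List[Triple]]:
    """Map each source tweet ID to the triples extracted from it."""
    by_tweet: Dict[str, List[Triple]] = defaultdict(list)
    for t in triples:
        by_tweet[t.tweet_id].append(t)
    return by_tweet


# Toy input for illustration only (not taken from the dataset).
toy = parse_triples([
    "entity_a\trelation_x\tentity_b\t1001",
    "entity_b\trelation_y\tentity_c\t1001",
    "entity_a\trelation_x\tentity_c\t1002",
])
relation_types = Counter(t.relation for t in toy)
print(len(relation_types))               # 2 distinct relation types
print(len(index_by_tweet(toy)["1001"]))  # 2 triples extracted from tweet 1001
```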

Download Links

Twitter-Semantic-Network [COVID-Twitter-Semantic.zip]
Twitter-Retweet-Network [COVID-Twitter-Retweet.zip]
Twitter-Retweet-Attributes [COVID-Twitter-Attributes.zip]