COVID-19 Twitter Graph

This COVID-19 Twitter graph dataset is built from COVID-19-Tweet-IDs, a collection of tweet IDs created by the University of Southern California. From the original COVID-19-Tweet-IDs data, we created two graphs: a retweet network and a semantic network. Keeping only English tweets yields about 6 million tweets in total, from which we construct the retweet network.

The details of the retweet network are shown below.

Dataset	# of nodes	# of tweet nodes	# of user nodes	# of edges	# of "retweeted from" edges	# of "post" edges	Maximal depth
Retweet	6,021,624	4,159,802	1,861,822	9,634,294	2,953,175	6,681,119	2
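The per-type columns partition the totals: tweet and user nodes add up to the node count, and "retweeted from" and "post" edges add up to the edge count. A minimal Python check using only the numbers in the table (the variable and function names are illustrative, not part of the dataset):

```python
# Counts copied from the retweet-network table above.
NODES = {"tweet": 4159802, "user": 1861822}
EDGES = {"retweeted from": 2953175, "post": 6681119}
TOTAL_NODES = 6021624
TOTAL_EDGES = 9634294


def totals_consistent() -> bool:
    """Return True if the per-type counts add up to the reported totals."""
    return (sum(NODES.values()) == TOTAL_NODES
            and sum(EDGES.values()) == TOTAL_EDGES)


if __name__ == "__main__":
    print(totals_consistent())  # True
```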

The retweet network is provided as two files: the retweet file and the attributes file. The retweet file lists the retweet relationships among all extracted tweets, one per line. Each line is a pair of entries separated by a tab ("\t"), and each entry is a tweet ID followed by the ID of the user who posted it. Example: "1231987151606140928 49916007 \t 1231722638751326208 729246649". This line means that tweet 1231987151606140928, posted by user 49916007, is a retweet of tweet 1231722638751326208, posted by user 729246649.
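A minimal Python sketch for reading the retweet file, assuming UTF-8 text with one edge per line, a tab between the two sides, and whitespace between the tweet ID and the user ID on each side (as in the example above). RetweetEdge, parse_retweet_line and read_retweet_edges are hypothetical helpers, not part of the dataset. IDs are kept as strings so they can be compared directly with the raw files and the attributes file.

```python
from typing import Iterable, Iterator, NamedTuple


class RetweetEdge(NamedTuple):
    """One line of the retweet file: retweet (tweet, user) -> source (tweet, user)."""
    tweet_id: str
    user_id: str
    source_tweet_id: str
    source_user_id: str


def parse_retweet_line(line: str) -> RetweetEdge:
    """Split on the tab, then split each side on whitespace into (tweet ID, user ID)."""
    left, right = line.rstrip("\n").split("\t")
    tweet_id, user_id = left.split()  # split() also drops spaces around the tab
    source_tweet_id, source_user_id = right.split()
    return RetweetEdge(tweet_id, user_id, source_tweet_id, source_user_id)


def read_retweet_edges(lines: Iterable[str]) -> Iterator[RetweetEdge]:
    """Yield parsed edges from any iterable of lines, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield parse_retweet_line(line)


if __name__ == "__main__":
    example = "1231987151606140928 49916007\t1231722638751326208 729246649"
    edge = parse_retweet_line(example)
    print(edge.tweet_id, "retweets", edge.source_tweet_id)
    # prints "1231987151606140928 retweets 1231722638751326208"
```

To stream the whole file, pass an open handle, e.g. read_retweet_edges(open(path, encoding="utf-8")), where path is wherever the extracted retweet file is stored locally. Because it is a generator, the roughly 3 million retweet lines are never held in memory at once.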

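For each tweet and user, we extract the following attributes (value type in parentheses):
tweet: createdAt (time), hashtag (text), mention (user ID), retweetCount (number)
user: createdAt (time), followerCount (number), followingCount (number)
Each line of the attributes file is a tab-separated triple of entity ID, attribute name, and value; a type line records the entity type (e.g. Tweet). Example: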
1231987148699639812 \t type \t Tweet
1231987148699639812 \t createdAt \t Mon Feb 24 17:00:00 +0000 2020
1231987148699639812 \t mention \t 968847322653306880
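The attributes file can be loaded into per-entity dictionaries. The sketch below assumes each line is an (ID, attribute, value) triple separated by tabs, as in the example, and that createdAt always uses the Twitter timestamp layout shown there. The helper names convert and load_attributes, and the choice to collect values in lists so that repeated hashtag or mention lines are all kept, are our own and hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List

# Assumed timestamp layout, matching "Mon Feb 24 17:00:00 +0000 2020".
TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"
# Attributes listed above as numbers.
INT_ATTRS = {"retweetCount", "followerCount", "followingCount"}


def convert(attr: str, value: str) -> object:
    """Convert a raw value by attribute name; anything else stays a string."""
    if attr == "createdAt":
        return datetime.strptime(value, TIME_FORMAT)
    if attr in INT_ATTRS:
        return int(value)
    return value  # hashtag, mention, type, ...


def load_attributes(lines: Iterable[str]) -> Dict[str, Dict[str, List[object]]]:
    """Group (ID, attribute, value) lines by entity ID.

    Values are collected in lists because one tweet may have several
    hashtag or mention lines.
    """
    entities = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entity_id, attr, value = (part.strip() for part in line.split("\t", 2))
        entities[entity_id][attr].append(convert(attr, value))
    # Return plain dicts so missing keys raise KeyError instead of being created.
    return {eid: dict(attrs) for eid, attrs in entities.items()}
```

For example, loading the three lines above gives a dictionary whose "1231987148699639812" entry maps type to ["Tweet"], mention to ["968847322653306880"], and createdAt to a timezone-aware datetime for 24 February 2020, 17:00 UTC.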

COVID-19 Twitter-Semantic Network

By extracting the full text of each COVID-19-related tweet and applying OpenIE, we generate a semantic graph that contains COVID-related information. The resulting graph, COVID-twitter-semantic, contains 15,853,229 triples in total, covering 437,022 distinct relation types. Each triple is associated with the ID of the original tweet from which it was extracted. The maximal depth of the COVID-twitter-semantic network is 6.
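The file layout of the semantic network is not described here, so the following sketch assumes, hypothetically, one triple per line as subject, relation, object and source tweet ID separated by tabs. Triple, parse_triple and index_triples are illustrative names. The sketch indexes triples by relation and by source tweet, which is one way to use the per-triple tweet ID for provenance lookups; if the assumed layout holds, len(by_relation) on the full file would give the number of distinct relation types.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str
    tweet_id: str  # ID of the tweet the triple was extracted from


def parse_triple(line: str) -> Triple:
    # Hypothetical layout: subject \t relation \t object \t tweet ID.
    subject, relation, obj, tweet_id = (p.strip() for p in line.rstrip("\n").split("\t"))
    return Triple(subject, relation, obj, tweet_id)


def index_triples(
    lines: Iterable[str],
) -> Tuple[Dict[str, List[Triple]], Dict[str, List[Triple]]]:
    """Return (triples grouped by relation, triples grouped by source tweet)."""
    by_relation: Dict[str, List[Triple]] = defaultdict(list)
    by_tweet: Dict[str, List[Triple]] = defaultdict(list)
    for line in lines:
        if line.strip():
            t = parse_triple(line)
            by_relation[t.relation].append(t)
            by_tweet[t.tweet_id].append(t)
    return dict(by_relation), dict(by_tweet)
```

Keeping the tweet ID on every triple means any extracted fact can be traced back to its tweet, and from there to the tweet and user attributes.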

Download Links

Twitter-Semantic-Network [COVID-Twitter-Semantic.zip]
Twitter-Retweet-Network [COVID-Twitter-Retweet.zip]
Twitter-Retweet-Attributes [COVID-Twitter-Attributes.zip]