This COVID-19 Twitter graph dataset is derived from COVID-19-Tweet-IDs, a collection created by the University of Southern California. From the original COVID-19-Tweet-IDs information, we created a retweet network and a semantic network. Extracting only English tweets yields a total of 6 million tweets, from which we construct the retweet network.
The details of the retweet network are shown below.
| Dataset | # of nodes | type: tweet | type: user | # of edges | type: retweeted from | type: post | Maximal depth |
Each network consists of two files: the retweet network file and the attributes file. The retweet network file contains the retweet relationships of all extracted tweets. Each line is a pair separated by "\t", and each element of the pair is a tweet ID followed by the ID of the user who posted it. Example: "1231987151606140928 49916007 \t 1231722638751326208 729246649". This example means that tweet 1231987151606140928, posted by user 49916007, is retweeted from tweet 1231722638751326208, posted by user 729246649.
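As an illustration of the line format described above, here is a minimal Python sketch that parses one line of the retweet network file. The names `Retweet` and `parse_retweet_line` are hypothetical and not part of the dataset. The sketch assumes that the "\t" in the example stands for a real tab character and that any spaces around it are incidental.

```python
from typing import NamedTuple


class Retweet(NamedTuple):
    """One retweet edge (hypothetical helper type, not part of the dataset)."""
    tweet_id: str
    user_id: str
    source_tweet_id: str
    source_user_id: str


def parse_retweet_line(line: str) -> Retweet:
    # Assumption: the two halves of the pair are separated by a single tab.
    left, right = line.rstrip("\n").split("\t")
    # Each half is "<tweet id> <user id>"; split() tolerates extra spaces.
    tweet_id, user_id = left.split()
    source_tweet_id, source_user_id = right.split()
    return Retweet(tweet_id, user_id, source_tweet_id, source_user_id)


# The example line from the description, with a real tab as the separator.
example = "1231987151606140928 49916007 \t 1231722638751326208 729246649"
edge = parse_retweet_line(example)
print(edge.user_id, "retweeted", edge.source_user_id)
```

IDs are kept as strings here so they can be used directly as node keys in any graph library.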
For each tweet and user, we extract the following attributes:
tweet: (createdAt “time”) (hashtag “text”) (mention “userid”) (retweetCount “number”)
user: (createdAt “time”) (followerCount “number”) (followingCount “number”)
Each line of the attributes file gives an ID, an attribute name, and a value, separated by "\t". Example:
1231987148699639812 \t type \t Tweet
1231987148699639812 \t createdAt \t Mon Feb 24 17:00:00 +0000 2020
1231987148699639812 \t mention \t 968847322653306880
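The attribute lines above can be read into a per-node dictionary. The following is a minimal Python sketch under some assumptions: fields are separated by real tab characters, an attribute such as hashtag or mention may occur several times for the same ID, and createdAt follows the timestamp layout shown in the example. The function name `parse_attributes` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def parse_attributes(lines):
    """Group (id, attribute, value) lines into {id: {attribute: [values]}}."""
    attrs = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split into at most three fields; strip incidental spaces around tabs.
        node_id, name, value = (part.strip() for part in line.split("\t", 2))
        attrs[node_id][name].append(value)
    return attrs


# The three example lines from the description, with real tabs.
sample = [
    "1231987148699639812 \t type \t Tweet",
    "1231987148699639812 \t createdAt \t Mon Feb 24 17:00:00 +0000 2020",
    "1231987148699639812 \t mention \t 968847322653306880",
]
attrs = parse_attributes(sample)
tweet = attrs["1231987148699639812"]

# Assumption: createdAt always uses this layout (English day and month names).
created = datetime.strptime(tweet["createdAt"][0], "%a %b %d %H:%M:%S %z %Y")
print(tweet["type"][0], created.isoformat(), tweet["mention"])
```

Values are stored as lists because the description does not say whether repeated attributes are allowed; a list handles both cases.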
By extracting the full text of each COVID-19-related tweet and applying OpenIE,
we generate a semantic graph that contains COVID-related information. The graph
COVID-twitter-semantic contains 15,853,229 triples in total, covering
437,022 types of relations. Each triple is associated with the ID of the tweet from which it was
generated. The maximal depth of the COVID-twitter-semantic network is 6.