The collected posts of 19,320 bloggers were gathered from blogger.com in August 2004. The corpus incorporates a total of 681,288 posts.
How to download files from archive.org in an automated way using wget.
Access data from posts, threads, comments, users, and more from Reddit and subreddits.
Historical Reddit data has been collected as monthly CSV downloads.
Stanford Large Network Dataset Collection
The SNAP library has collected data on large social and information networks since 2004.
The DREAM Lab supports research using social media data sources, with a focus on access to and use of Twitter data. They provide consultation and instruction on a variety of tools and techniques, including Brandwatch (formerly Crimson Hexagon) and NCapture.
Monthly database backups of all Wikimedia wikis in various formats.
Access to business data, including location, photos, Yelp rating, price levels, hours of operation, and types of transactions. Also includes a Review API, which returns up to 3 review excerpts for a business.