-# 🏛️ Wayback Tweets
+# Wayback Tweets
-[](https://waybacktweets.streamlit.app) [](https://github.com/claromes/waybacktweets/releases)
+[](https://pypi.org/project/waybacktweets)
-Tool that displays, via [Wayback CDX Server API](https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server), multiple archived tweets on Wayback Machine to avoid opening each link manually. The application is a prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud, allowing users to apply filters based on specific years and view tweets that lack the original URL.
+Retrieves archived tweets' CDX data from the Wayback Machine, parses it, and saves the result.
-## Community
+## Installation
-> "We're always delighted when we see our community members create tools for open source research." - [Bellingcat](https://twitter.com/bellingcat/status/1728085974138122604)
+```shell
+pip install waybacktweets
+```
-> "#myOSINTtip Clarissa Mendes launched a new tool for accessing old tweets via archive.org called the Wayback Tweets app. For those who love to look deeper at #osint tools, it is available on GitHub and uses the Wayback CDX Server API server (which is a hidden gem for accessing archive.org data!)" - [My OSINT Training](https://www.linkedin.com/posts/my-osint-training_myosinttip-osint-activity-7148425933324963841-0Q2n/)
+## Quickstart
-> "Original way to find deleted tweets." - [Henk Van Ess](https://twitter.com/henkvaness/status/1693298101765701676)
+### Using Wayback Tweets as a standalone command line tool
-> "This is an excellent tool to use now that most Twitter API-based tools have gone down with changes to the pricing structure over at X." - [The OSINT Newsletter - Issue #22](https://osintnewsletter.com/p/22#%C2%A7osint-community)
+waybacktweets [OPTIONS] USERNAME
-> "One of the keys to using the Wayback Machine effectively is knowing what it can and can't archive. It can, and has, archived many, many Twitter accounts... Utilize fun tools such as Wayback Tweets to do so more effectively." - [Ari Ben Am](https://memeticwarfareweekly.substack.com/p/mww-paradise-by-the-telegram-dashboard)
+```shell
+waybacktweets --from 20150101 --to 20191231 --limit 250 jack
+```
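The `--from` and `--to` values above use the compact `YYYYMMDD` form rather than ISO `YYYY-MM-DD`. As a minimal sketch, assuming you start from ISO-style dates, the conversion could look like the following. `to_wayback_date` is a hypothetical helper for illustration and is not part of the `waybacktweets` package.

```python
from datetime import datetime


def to_wayback_date(iso_date: str) -> str:
    """Hypothetical helper: turn 'YYYY-MM-DD' into the 'YYYYMMDD' form."""
    # strptime raises ValueError for malformed or impossible dates.
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%Y%m%d")


print(to_wayback_date("2015-01-01"))  # 20150101
print(to_wayback_date("2019-12-31"))  # 20191231
```

Validating through `strptime` first means an impossible date such as `2019-02-30` fails early instead of being passed on as a malformed timestamp.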
-> "Want to see archived tweets on Wayback Machine in bulk? You can use Wayback Tweets." - [Daily OSINT](https://twitter.com/DailyOsint/status/1695065018662855102)
+### Using Wayback Tweets as a Python Module
-> "To make searching the archive easier, use Wayback Tweets." - [GIJN Indonesia](https://twitter.com/gijnIndonesia/status/1685912219408805888)
+```python
+from waybacktweets import WaybackTweets
+from waybacktweets.utils import parse_date
-> "A tool to quickly view tweets saved on archive.org." - [Irina_Tech_Tips Newsletter #3](https://irinatechtips.substack.com/p/irina_tech_tips-newsletter-3-2023#%C2%A7wayback-tweets)
+username = "jack"
+collapse = "urlkey"
+timestamp_from = parse_date("20150101")
+timestamp_to = parse_date("20191231")
+limit = 250
+offset = 0
-## Development
+api = WaybackTweets(username, collapse, timestamp_from, timestamp_to, limit, offset)
-### Requirement
+archived_tweets = api.get()
+```
-- Python 3.8+
+### Using Wayback Tweets as a Web App
-### Installation
+[Access the application](https://waybacktweets.streamlit.app), a prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.
-$ `git clone git@github.com:claromes/waybacktweets.git`
+## Documentation
-$ `cd waybacktweets`
-
-$ `pip install -r requirements.txt`
-
-$ `streamlit run app.py`
-
-Streamlit will be served at http://localhost:8501
-
-### Changelog
-
-Check out the [releases](https://github.com/claromes/waybacktweets/releases).
-
-### Todo (2024 planning)
-
-- [ ] Code review
-- [ ] UX review (filter before requesting)
-- [ ] Add a calendar interface (Wayback Machine timestamp)
-- [ ] Prevent duplicate URLs/Review the "Unique tweets" option
- - Counters
- - Collapsing
-- [ ] Sorting in ascending and descending order
-- [ ] Download dataset
-- [ ] Fix `parse_links` exception
-- [ ] Update Streamlit version
-- [ ] Add metadata information
-- [ ] Parse MIME types: `warc/revisit`, `text/plain`, `application/http`
-- [ ] Documentation: Explain the mapping of archived URLs and the parsing process
-- [ ] Create CLI
-- [x] Pagination
- - [x] Footer
- - [x] Disabled/Empty states
-- [x] Feedback
-- [x] Review data cache
-- [x] Changelog
-- [x] Define range size by user
-- [x] Filter by period/datetime
-- [x] Add contributing guidelines
-
-## Contributing
-
-We welcome contributions from everyone, whether it's through bug reporting, feature suggestions or code contributions.
-
-If you need help, or have ideas on improving this app, please open a new issue or reach out to support@claromes.com.
+- [Wayback Tweets documentation]()
+- [Wayback CDX Server API - Beta documentation](https://archive.org/developers/wayback-cdx-server.html)
## Acknowledgements
- Tristan Lee (Bellingcat's Data Scientist) for the idea of the application.
- Jessica Smith (Snowflake's Marketing Specialist) and Streamlit/Snowflake teams for the additional server resources on Streamlit Cloud.
- OSINT Community for recommending the application.
-
-> [!NOTE]
-> If the application is down, please check the [Streamlit Cloud Status](https://www.streamlitstatus.com/).
-Quick Start
+Quickstart
================
CLI
Using Wayback Tweets as a standalone command line tool
-wbt [OPTIONS] USERNAME
+waybacktweets [OPTIONS] USERNAME
-$ ``wbt --from 2015-01-01 --to 2019-12-31 --limit 250 jack``
+.. code-block:: shell
+
+ waybacktweets --from 20150101 --to 20191231 --limit 250 jack
Module
username = "jack"
collapse = "urlkey"
- timestamp_from = parse_date("2015-01-01")
- timestamp_to = parse_date("2019-12-31")
+ timestamp_from = parse_date("20150101")
+ timestamp_to = parse_date("20191231")
limit = 250
offset = 0
api = WaybackTweets(username, collapse, timestamp_from, timestamp_to, limit, offset)
+
archived_tweets = api.get()
+
+Web App
+-------------
+
+Using Wayback Tweets as a Streamlit Web App
+
+`Access the application <https://waybacktweets.streamlit.app>`_, a prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.
+++ /dev/null
-site_name: Wayback Tweets
-repo_url: https://github.com/claromes/waybacktweets/
-repo_name: claromes/waybacktweets
-edit_uri: tree/main/docs
-site_author: Claromes
-site_description: Retrieves archived tweets' CDX data from the Wayback Machine
-copyright: Copyright © 2023 - 2024 Claromes · Icons by TheDoodleLibrary
-
-theme:
- name: material
- logo: assets/parthenon.svg
- favicon: assets/parthenon.svg
- icon:
- repo: thedoodlelibrary/octopus
- search: thedoodlelibrary/magnifyingglass
- language: en
- palette:
- - scheme: slate
- primary: red
- accent: lime
- custom_dir: overrides
-
-extra:
- social:
- - icon: thedoodlelibrary/mammoth
- link: https://ruby.social/@claromes
- name: Claromes on Mastodon
- - icon: thedoodlelibrary/butterfly
- link: https://bsky.app/profile/claromes.com
- name: Claromes on Bluesky
- - icon: thedoodlelibrary/email
- link: mailto:support@claromes.com
- name: Support
-
-nav:
- - Home: index.md
- - Getting started:
- - License: license.md
- - Changelog: https://github.com/claromes/waybacktweets/releases/" target="_blank
-
-markdown_extensions:
- - attr_list
- - pymdownx.emoji:
- emoji_index: !!python/name:material.extensions.emoji.twemoji
- emoji_generator: !!python/name:material.extensions.emoji.to_svg
- options:
- custom_icons:
- - overrides/.icons
- - pymdownx.superfences:
- custom_fences:
- - name: mermaid
- class: mermaid
- format: !!python/name:pymdownx.superfences.fence_code_format
-
-extra_css:
- - stylesheets/extra.css