From b55d087742be5905541e4e1fb82424ad7b486903 Mon Sep 17 00:00:00 2001
From: Claromes
Date: Fri, 22 Mar 2024 18:42:12 -0300
Subject: [PATCH] update readme

---
 README.md         | 55 ++++++++++++++++++++-------
 docs/CHANGELOG.md | 94 ----------------------------------------------
 docs/ROADMAP.md   | 18 ---------
 3 files changed, 42 insertions(+), 125 deletions(-)
 delete mode 100644 docs/CHANGELOG.md
 delete mode 100644 docs/ROADMAP.md

diff --git a/README.md b/README.md
index bf2676d..ea9ad37 100644
--- a/README.md
+++ b/README.md
@@ -1,18 +1,14 @@
 # 🏛️ Wayback Tweets
 
-[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://waybacktweets.streamlit.app) [![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/claromes/waybacktweets?include_prereleases)](https://github.com/claromes/waybacktweets/releases) [![License](https://img.shields.io/github/license/claromes/waybacktweets)](https://github.com/claromes/waybacktweets/blob/main/LICENSE.md)
+[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://waybacktweets.streamlit.app) [![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/claromes/waybacktweets?include_prereleases)](https://github.com/claromes/waybacktweets/releases)
 
-Tool that displays, via [Wayback CDX Server API](https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server), multiple archived tweets on Wayback Machine to avoid opening each link manually. The app is a prototype written in Python with Streamlit and hosted at Streamlit Cloud with an extra 7 GiB provided free of charge by the Streamlit team (special thanks to Jessica Smith).
-
-Users can apply filters based on specific years and view tweets that lack the original URL.
-
-_Thanks Tristan Lee for the idea._
+Tool that displays, via [Wayback CDX Server API](https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server), multiple archived tweets on the Wayback Machine to avoid opening each link manually. The application is a prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud, allowing users to apply filters based on specific years and view tweets that lack the original URL.
 
 ## Community
 
 > "We're always delighted when we see our community members create tools for open source research." — [Bellingcat](https://twitter.com/bellingcat/status/1728085974138122604)
 
-> "#myOSINTtip Clarissa Mendes launched a new tool for accessing old tweets via archive.org called the Wayback Tweets app. For those who love to look deeper at #osint tools, it is available on GitHub and uses the Wayback CDX Server API server (which is a hidden gem for accessing archive.org data!)" - [My OSINT Training](https://www.linkedin.com/posts/my-osint-training_myosinttip-osint-activity-7148425933324963841-0Q2n/)
+> "#myOSINTtip Clarissa Mendes launched a new tool for accessing old tweets via archive.org called the Wayback Tweets app. For those who love to look deeper at #osint tools, it is available on GitHub and uses the Wayback CDX Server API server (which is a hidden gem for accessing archive.org data!)" — [My OSINT Training](https://www.linkedin.com/posts/my-osint-training_myosinttip-osint-activity-7148425933324963841-0Q2n/)
 
 > "Original way to find deleted tweets." — [Henk Van Ess](https://twitter.com/henkvaness/status/1693298101765701676)
 
@@ -26,11 +22,6 @@ _Thanks Tristan Lee for the idea._
 
 > "A tool to quickly view tweets saved on archive.org." — [Irina_Tech_Tips Newsletter #3](https://irinatechtips.substack.com/p/irina_tech_tips-newsletter-3-2023#%C2%A7wayback-tweets)
 
-## Docs
-
-- [Roadmap](docs/ROADMAP.md)
-- [Changelog](docs/CHANGELOG.md)
-
 ## Development
 
 ### Requirement
@@ -49,9 +40,47 @@ $ `streamlit run app.py`
 
 Streamlit will be served at http://localhost:8501
 
+### Changelog
+
+Check out the [releases](https://github.com/claromes/waybacktweets/releases).
+
+### Todo (2024 planning)
+
+- [ ] Code review
+- [ ] UX review (filter before requesting)
+- [ ] Add a calendar interface (Wayback Machine timestamp)
+- [ ] Prevent duplicate URLs/Review the "Unique tweets" option
+    - Counters
+    - Collapsing
+- [ ] Sorting in ascending and descending order
+- [ ] Download dataset
+- [ ] Fix `parse_links` exception
+- [ ] Update Streamlit version
+- [ ] Add metadata information
+- [ ] Parse MIME types: `warc/revisit`, `text/plain`, `application/http`
+- [ ] Documentation: Explain the mapping of archived URLs and the parsing process
+- [ ] Create CLI
+- [x] Pagination
+    - [x] Footer
+    - [x] Disabled/Empty states
+- [x] Feedback
+- [x] Review data cache
+- [x] Changelog
+- [x] Define range size by user
+- [x] Filter by period/datetime
+- [x] Add contributing guidelines
+
 ## Contributing
 
-PRs are welcome. Check the roadmap or add a new feature.
+We welcome contributions from everyone, whether it's through bug reporting, feature suggestions or code contributions.
+
+If you need help or have ideas for improving this app, please open a new issue or reach out to support@claromes.com.
+
+## Acknowledgements
+
+- Tristan Lee (Bellingcat's Data Scientist) for the idea of the application.
+- Jessica Smith (Snowflake's Marketing Specialist) and the Streamlit/Snowflake teams for the additional server resources on Streamlit Cloud.
+- The OSINT community for recommending the application.
 
 > [!NOTE]
 > If the application is down, please check the [Streamlit Cloud Status](https://www.streamlitstatus.com/).
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
deleted file mode 100644
index 11cc44c..0000000
--- a/docs/CHANGELOG.md
+++ /dev/null
@@ -1,94 +0,0 @@
-# Changelog
-
-## [v0.4.3](https://github.com/claromes/waybacktweets/releases/tag/v0.4.3) - 2023-12-13
-- Add:
-    - 8-digit collapsing strategy (one capture per day)
-    - Messages about collapsing strategy and number of tweets displayed
-
-## [v0.4.2](https://github.com/claromes/waybacktweets/releases/tag/v0.4.2) - 2023-12-13
-- Add:
-    - Parse tweet URLs to delete `/photos`, `/likes`, `/retweets` and other sub-endpoints
-        - Only for "original url"
-
-## [v0.4.1](https://github.com/claromes/waybacktweets/releases/tag/v0.4.1) - 2023-12-13
-- Add:
-    - Warning message for non 200/300 status code
-- Update:
-    - Set a fixed tweets per page (25) due the API rate limit
-
-## [v0.4](https://github.com/claromes/waybacktweets/releases/tag/v0.4) - 2023-12-13
-- Add:
-    - Parse old tweets URLs
-        - Picture: `twimg.com`
-        - Reply `username/status/"/user_reply/status/user_reply_msg_ID"`
-    - Allows MIME type `warc/revisit` and `unk` (**to be reviewed**)
-
-- Update:
-    - Change filter text "Only deleted tweets" to "Original URLs not available" with a help info
-    - Change "tweet" text to "original link" on each header
-
-## [v0.3](https://github.com/claromes/waybacktweets/releases/tag/v0.3) - 2023-11-13
-- Add:
-    - Add filter by year
-    - Add filter by range size
-    - Add spinner to load data
-    - Add f-string to code
-
-- Update:
-    - Streamlit version to 1.27.0
-    - Style (font, BG color)
-    - README
-    - Fix MIME type display logic
-    - Fix pagination
-    - Fix error messages
-    - Fix JSON response
-
-- Delete:
-    Progress bar
-
-## [v0.2](https://github.com/claromes/waybacktweets/releases/tag/v0.2) - 2023-08-16
-- Displays tweets as text
-- Displays RTs info
-- Displays JSON MIME type as JSON (if tweet was deleted)
-- Adds progress bar
-- Adds warning to `warc/revisit` MIME type
-- Improves code quality
-- Screenshot tests as an alternative to `iframe`
-    - Keeps `iframe`
-    - Each website screenshot takes too long
-
-## [v0.1.4](https://github.com/claromes/waybacktweets/releases/tag/v0.1.4) - 2023-07-21
-- Add Pagination via CDX Server API
-- Update theme/ style
-- Update about
-- Decrease tweets per page (30)
-- Fix `cache_data`
-
-## [v0.1.3.2](https://github.com/claromes/waybacktweets/releases/tag/v0.1.3.2) - 2023-06-04
-- Update Streamlit version
-
-## [v0.1.3.1](https://github.com/claromes/waybacktweets/releases/tag/v0.1.3.1) - 2023-06-01
-- Add `cache_data`
-
-## [v0.1.3](https://github.com/claromes/waybacktweets/releases/tag/v0.1.3) - 2023-05-31
-- Fix TypeError 'NoneType'
-
-## [v0.1.2.1](https://github.com/claromes/waybacktweets/releases/tag/v0.1.2.1) - 2023-05-27
-- Fix range
-
-## [v0.1.2](https://github.com/claromes/waybacktweets/releases/tag/v0.1.2) - 2023-05-19
-- Increase tweets per page (100)
-- Increase iframe height
-- Fix "Only deleted tweets" msg
-
-## [v0.1.1](https://github.com/claromes/waybacktweets/releases/tag/v0.1.1) - 2023-05-19
-- Fix scroll to top
-
-## [v0.1.0](https://github.com/claromes/waybacktweets/releases/tag/v0.1.0) - 2023-05-19
-- Add Pagination
-
-## [v0.0.2](https://github.com/claromes/waybacktweets/releases/tag/v0.0.2) - 2023-05-12
-- Minor bugs
-
-## [v0.0.1](https://github.com/claromes/waybacktweets/releases/tag/v0.0.1) - 2023-05-11
-- Initial commit
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
deleted file mode 100644
index 7f67c09..0000000
--- a/docs/ROADMAP.md
+++ /dev/null
@@ -1,18 +0,0 @@
-# Roadmap
-
-- [x] Pagination
-    - [x] Footer
-    - [x] Disabled/ Empty
-- [x] Feedbacks
-- [ ] Download dataset
-- [x] Review data cache
-- [x] Changelog
-- [ ] Prevent duplicate URLs
-- [x] Range size defined by user
-- [ ] `parse_links` exception
-- [ ] Parse MIME type `warc/revisit`
-- [ ] Parse MIME type `text/plain`
-- [ ] Parse MIME type `application/http`
-- [x] Filter by period/datetime
-- [ ] Apply filters by API endpoints
-- [x] Add contributing guidelines
\ No newline at end of file
-- 
2.34.1
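
The README describes the tool as listing archived tweets through the Wayback CDX Server API, and the removed changelog mentions an 8-digit collapsing strategy (v0.4.3) and stripping `/photos`, `/likes` and `/retweets` from tweet URLs (v0.4.2). The Python sketch below illustrates those ideas under stated assumptions; it is not the app's own code. The helper names (`build_cdx_url`, `parse_cdx_json`, `strip_sub_endpoint`, `fetch_captures`) are hypothetical, and the query assumes a prefix match on `twitter.com/<username>/status/` with `output=json`.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback CDX Server endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Hypothetical pattern: drops the sub-endpoints named in the v0.4.2 changelog
# (and anything after them) while keeping the /status/<id> part of the URL.
SUB_ENDPOINT = re.compile(r"(/status/\d+)/(?:photos|likes|retweets)(?:/.*)?$")


def build_cdx_url(username: str, collapse_daily: bool = True, limit: int = 25) -> str:
    """Build a CDX query listing archived captures under twitter.com/<username>/status/."""
    params = {
        "url": f"twitter.com/{username}/status/",
        "matchType": "prefix",  # every archived URL starting with the prefix above
        "output": "json",
        "fl": "timestamp,original,mimetype,statuscode",
        "limit": limit,
    }
    if collapse_daily:
        # Collapse adjacent rows whose timestamps share the first 8 digits (YYYYMMDD).
        params["collapse"] = "timestamp:8"
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(rows: list) -> list:
    """Convert CDX output=json rows to dicts; the first row holds the field names."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def strip_sub_endpoint(url: str) -> str:
    """Reduce e.g. .../status/123/likes to .../status/123; other URLs are unchanged."""
    return SUB_ENDPOINT.sub(r"\1", url)


def fetch_captures(username: str, timeout: float = 30) -> list:
    """Query the live CDX server (requires network access)."""
    with urlopen(build_cdx_url(username), timeout=timeout) as response:
        return parse_cdx_json(json.load(response))
```

For example, `strip_sub_endpoint("https://twitter.com/jack/status/20/likes")` returns `"https://twitter.com/jack/status/20"`. Note that the CDX server compares only adjacent rows when collapsing, and rows are sorted by URL key before timestamp, so under a prefix query `collapse=timestamp:8` may also merge captures of different tweets archived on the same day.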