update readme and var saved_at
author Claromes <claromes@hey.com>
Sat, 4 Nov 2023 12:35:05 +0000 (09:35 -0300)
committer Claromes <claromes@hey.com>
Sat, 4 Nov 2023 12:35:05 +0000 (09:35 -0300)
.gitignore
README.md
app.py
assets/wbt-0.2.gif [deleted file]
docs/ROADMAP.md

index eba74f4cd2e2acfb1470300d69b58d8cff458c68..0cafc1cde1985c69113a5b2ae7ba42299aa7ebc2 100644 (file)
@@ -1 +1 @@
-venv/
\ No newline at end of file
+.venv/
\ No newline at end of file
index 5002568bfe6e02c0781d6c7f134dbe75e4f60c41..3d2432a30f3f1f9342e81a0a40adef2bb150b6aa 100644 (file)
--- a/README.md
+++ b/README.md
@@ -1,26 +1,17 @@
-> [!IMPORTANT]
-> If the application is down, please check the [Streamlit Cloud Status](https://www.streamlitstatus.com/).
-
-<br>
-
 # 🏛️ Wayback Tweets
 
 [![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://waybacktweets.streamlit.app) [![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/claromes/waybacktweets?include_prereleases)](https://github.com/claromes/waybacktweets/releases) [![License](https://img.shields.io/github/license/claromes/waybacktweets)](https://github.com/claromes/waybacktweets/blob/main/LICENSE.md)
 
 
-Tool that displays multiple archived tweets on Wayback Machine to avoid opening each link manually. Via [Wayback CDX Server API](https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server).
-
-<p align="center">
-    <img src="assets/wbt-0.2.gif" width="500">
-</p>
+Tool that displays, via the [Wayback CDX Server API](https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server), multiple archived tweets on the Wayback Machine to avoid opening each link manually. The app is a prototype written in Python with Streamlit and hosted on Streamlit Cloud.
 
 *Thanks Tristan Lee for the idea.*
 
 ## Features
 
 - Tweets per page defined by user
-- Filtering by saved date
-- Filtering by deleted tweets
+- Filter by years
+- Filter to show only deleted tweets
 
 ## Development
 
@@ -43,13 +34,35 @@ Streamlit will be served at http://localhost:8501
 ## Bugs
 
 - [ ] "web.archive.org took too long to respond."
+- [ ] Pagination: set session variable on first click
+- [ ] Timeout error
 - [x] `only_deleted` checkbox selected for handles without deleted tweets
-- [x] Pagination: set session variable on first click
 - [x] Pagination: scroll to top
 - [x] `IndexError`
-- [ ] Timeout error
 
 ## Docs
 
 - [Roadmap](docs/ROADMAP.md)
 - [Changelog](docs/CHANGELOG.md)
+
+## Testimonials
+
+>"Original way to find deleted tweets." — [Henk Van Ess](https://twitter.com/henkvaness/status/1693298101765701676)
+
+>"This is an excellent tool to use now that most Twitter API-based tools have gone down with changes to the pricing structure over at X." — [The OSINT Newsletter - Issue #22](https://osintnewsletter.com/p/22#%C2%A7osint-community)
+
+>"One of the keys to using the Wayback Machine effectively is knowing what it can and can’t archive. It can, and has, archived many, many Twitter accounts... Utilize fun tools such as Wayback Tweets to do so more effectively." — [Ari Ben Am](https://memeticwarfareweekly.substack.com/p/mww-paradise-by-the-telegram-dashboard)
+
+>"Want to see archived tweets on Wayback Machine in bulk? You can use Wayback Tweets." — [Daily OSINT](https://twitter.com/DailyOsint/status/1695065018662855102)
+
+>"Untuk mempermudah penelusuran arsip, gunakan Wayback Tweets." ("To make searching the archive easier, use Wayback Tweets.") — [GIJN Indonesia](https://twitter.com/gijnIndonesia/status/1685912219408805888)
+
+>"A tool to quickly view tweets saved on archive.org." — [Irina_Tech_Tips Newsletter #3](https://irinatechtips.substack.com/p/irina_tech_tips-newsletter-3-2023#%C2%A7wayback-tweets)
+
+
+## Contributing
+
+PRs are welcome. Please check the bugs listed above and the [roadmap](docs/ROADMAP.md), or add a new feature.
+
+> [!NOTE]
+> If the application is down, please check the [Streamlit Cloud Status](https://www.streamlitstatus.com/).
\ No newline at end of file
diff --git a/app.py b/app.py
index 2347fd6822bcc1a7d9d506eb691674bc00b08480..a28477449a967dba34639f4280cc7907701bdf36 100644 (file)
--- a/app.py
+++ b/app.py
@@ -18,13 +18,13 @@ st.set_page_config(
 
         [![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/claromes/waybacktweets?include_prereleases)](https://github.com/claromes/waybacktweets/releases) [![License](https://img.shields.io/github/license/claromes/waybacktweets)](https://github.com/claromes/waybacktweets/blob/main/LICENSE.md)
 
-        Tool that displays multiple archived tweets on Wayback Machine to avoid opening each link manually. Via Wayback CDX Server API.
+        Tool that displays, via the Wayback CDX Server API, multiple archived tweets on the Wayback Machine to avoid opening each link manually.
 
         - Tweets per page defined by user
-        - Filtering by saved date
-        - Filtering by deleted tweets
+        - Filter by years
+        - Filter to show only deleted tweets
 
-        This tool is experimental, please feel free to send your [feedbacks](https://github.com/claromes/waybacktweets/issues).
+        This tool is a prototype; please feel free to send [feedback](https://github.com/claromes/waybacktweets/issues). Created and maintained by [@claromes](https://github.com/claromes).
 
         -------
         ''',
@@ -42,6 +42,9 @@ hide_streamlit_style = '''
         background-color: #dddddd;
         border-radius: 0.5rem;
     }
+    div[data-testid="InputInstructions"] {
+        visibility: hidden;
+    }
 </style>
 '''
 
@@ -71,8 +74,8 @@ if 'update_component' not in st.session_state:
 if 'offset' not in st.session_state:
     st.session_state.offset = 0
 
-if 'date_created' not in st.session_state:
-    st.session_state.date_created = (2006, year)
+if 'saved_at' not in st.session_state:
+    st.session_state.saved_at = (2006, year)
 
 if 'count' not in st.session_state:
     st.session_state.count = False
@@ -134,8 +137,8 @@ def embed(tweet):
         st.error('Connection to publish.twitter.com timed out.')
 
 @st.cache_data(ttl=1800, show_spinner=False)
-def tweets_count(handle, date_created):
-    url = f'https://web.archive.org/cdx/search/cdx?url=https://twitter.com/{handle}/status/*&output=json&from={date_created[0]}&to={date_created[1]}'
+def tweets_count(handle, saved_at):
+    url = f'https://web.archive.org/cdx/search/cdx?url=https://twitter.com/{handle}/status/*&output=json&from={saved_at[0]}&to={saved_at[1]}'
     try:
         response = requests.get(url)
 
@@ -148,14 +151,15 @@ def tweets_count(handle, date_created):
                 return 0
     except requests.exceptions.Timeout:
         st.error('Connection to web.archive.org timed out.')
+        st.stop()
 
 @st.cache_data(ttl=1800, show_spinner=False)
-def query_api(handle, limit, offset, date_created):
+def query_api(handle, limit, offset, saved_at):
     if not handle:
         st.warning('username, please!')
         st.stop()
 
-    url = f'https://web.archive.org/cdx/search/cdx?url=https://twitter.com/{handle}/status/*&output=json&limit={limit}&offset={offset}&from={date_created[0]}&to={date_created[1]}'
+    url = f'https://web.archive.org/cdx/search/cdx?url=https://twitter.com/{handle}/status/*&output=json&limit={limit}&offset={offset}&from={saved_at[0]}&to={saved_at[1]}'
     try:
         response = requests.get(url)
         response.raise_for_status()
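For reference, the CDX query URL that `tweets_count` and `query_api` assemble in the hunks above can be sketched as a standalone helper. This is a minimal sketch; `build_cdx_url` is a hypothetical name, and the parameters mirror the f-strings in the diff (`saved_at` being the `(from_year, to_year)` tuple the commit renames `date_created` to):

```python
def build_cdx_url(handle, saved_at, limit=None, offset=None):
    """Build a Wayback CDX Server API query for a user's tweet status URLs.

    Mirrors the f-strings in tweets_count/query_api above; saved_at is the
    (from_year, to_year) tuple kept in st.session_state.saved_at.
    """
    url = (
        'https://web.archive.org/cdx/search/cdx'
        f'?url=https://twitter.com/{handle}/status/*'
        '&output=json'
    )
    # query_api paginates with limit/offset; tweets_count omits them.
    if limit is not None:
        url += f'&limit={limit}&offset={offset}'
    return url + f'&from={saved_at[0]}&to={saved_at[1]}'
```

The `from`/`to` parameters restrict results by capture year, which is what the "Tweets saved between" slider feeds in.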
@@ -185,7 +189,7 @@ def parse_links(links):
     return parsed_links, tweet_links, parsed_mimetype, timestamp
 
 def attr(i):
-    st.markdown(f'{i+1 + st.session_state.offset}. **Wayback Machine:** [link]({link}) · **MIME Type:** {mimetype[i]} · **Saved at:** {datetime.datetime.strptime(timestamp[i], "%Y%m%d%H%M%S")} · **Tweet:** [link]({tweet_links[i]})')
+    st.markdown(f'{i+1 + st.session_state.offset}. [**web.archive.org**]({link}) · **MIME Type:** {mimetype[i]} · **Saved at:** {datetime.datetime.strptime(timestamp[i], "%Y%m%d%H%M%S")} · [**tweet**]({tweet_links[i]})')
 
 # UI
 st.title('Wayback Tweets [![Star](https://img.shields.io/github/stars/claromes/waybacktweets?style=social)](https://github.com/claromes/waybacktweets)', anchor=False)
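The "Saved at" value rendered by `attr` above comes from the Wayback Machine's 14-digit snapshot timestamp (`YYYYMMDDhhmmss`). A minimal sketch of that conversion, under a hypothetical helper name:

```python
import datetime

def parse_wayback_timestamp(ts):
    # Wayback Machine snapshot timestamps are 14 digits, YYYYMMDDhhmmss,
    # parsed in attr() with the same format string.
    return datetime.datetime.strptime(ts, '%Y%m%d%H%M%S')
```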
@@ -193,16 +197,14 @@ st.write('Display multiple archived tweets on Wayback Machine and avoid opening
 
 handle = st.text_input('Username', placeholder='jack')
 
-st.session_state.date_created = st.slider('Tweets saved between', 2006, year, (2006, year))
+st.session_state.saved_at = st.slider('Tweets saved between', 2006, year, (2006, year))
 
-tweets_per_page = st.slider('Tweets per page', 25, 1000, 25, 25)
+tweets_per_page = st.slider('Tweets per page', 25, 250, 25, 25)
 
 only_deleted = st.checkbox('Only deleted tweets')
 
 query = st.button('Query', type='primary', use_container_width=True)
 
-bar = st.empty()
-
 if query or st.session_state.count:
     if handle != st.session_state.current_handle:
         st.session_state.offset = 0
@@ -210,17 +212,17 @@ if query or st.session_state.count:
     if query != st.session_state.current_query:
         st.session_state.offset = 0
 
-    st.session_state.count = tweets_count(handle, st.session_state.date_created)
+    st.session_state.count = tweets_count(handle, st.session_state.saved_at)
 
     st.write(f'**{st.session_state.count} URLs have been captured**')
 
-    if tweets_per_page > st.session_state.count:
-        tweets_per_page = st.session_state.count
+    if st.session_state.count:
+        if tweets_per_page > st.session_state.count:
+            tweets_per_page = st.session_state.count
 
     try:
-        bar.progress(0)
         progress = st.empty()
-        links = query_api(handle, tweets_per_page, st.session_state.offset, st.session_state.date_created)
+        links = query_api(handle, tweets_per_page, st.session_state.offset, st.session_state.saved_at)
 
         parse = parse_links(links)
         parsed_links = parse[0]
@@ -290,56 +292,54 @@ if query or st.session_state.count:
             start_index = st.session_state.offset
             end_index = min(st.session_state.count, start_index + tweets_per_page)
 
-            for i in range(tweets_per_page):
-                try:
-                    bar.progress((i*3) + 13)
+            with st.spinner('Fetching...'):
+                for i in range(tweets_per_page):
+                    try:
+                        link = parsed_links[i]
+                        tweet = embed(tweet_links[i])
 
-                    link = parsed_links[i]
-                    tweet = embed(tweet_links[i])
+                        if not only_deleted:
+                            attr(i)
 
-                    if not only_deleted:
-                        attr(i)
+                            if tweet:
+                                status_code = tweet[0]
+                                tweet_content = tweet[1]
+                                user_info = tweet[2]
+                                is_RT = tweet[3]
 
-                        if tweet:
-                            status_code = tweet[0]
-                            tweet_content = tweet[1]
-                            user_info = tweet[2]
-                            is_RT = tweet[3]
+                                if mimetype[i] == 'application/json':
+                                    display_tweet()
 
-                            if mimetype[i] == 'application/json':
-                                display_tweet()
+                                if mimetype[i] == 'text/html':
+                                    display_tweet()
+                            elif not tweet:
+                                display_not_tweet()
 
-                            if mimetype[i] == 'text/html':
-                                display_tweet()
-                        elif not tweet:
-                            display_not_tweet()
+                        if only_deleted:
+                            if not tweet:
+                                return_none_count += 1
+                                attr(i)
 
-                    if only_deleted:
-                        if not tweet:
-                            return_none_count += 1
-                            attr(i)
+                                display_not_tweet()
 
-                            display_not_tweet()
+                            progress.write(f'{return_none_count} URLs have been captured in the range {start_index}-{end_index}')
 
-                        progress.write(f'{return_none_count} URLs have been captured in the range {start_index}-{end_index}')
+                        if start_index <= 0:
+                            st.session_state.prev_disabled = True
+                        else:
+                            st.session_state.prev_disabled = False
 
-                    if start_index <= 0:
-                        st.session_state.prev_disabled = True
-                    else:
-                        st.session_state.prev_disabled = False
+                        if i + 1 == st.session_state.count:
+                            st.session_state.next_disabled = True
+                        else:
+                            st.session_state.next_disabled = False
+                    except IndexError:
+                        if start_index <= 0:
+                            st.session_state.prev_disabled = True
+                        else:
+                            st.session_state.prev_disabled = False
 
-                    if i + 1 == st.session_state.count:
                         st.session_state.next_disabled = True
-                    else:
-                        st.session_state.next_disabled = False
-                # TODO
-                except IndexError:
-                    if start_index <= 0:
-                        st.session_state.prev_disabled = True
-                    else:
-                        st.session_state.prev_disabled = False
-
-                    st.session_state.next_disabled = True
 
             prev, _ , next = st.columns([3, 4, 3])
 
@@ -350,7 +350,7 @@ if query or st.session_state.count:
             st.error('Unable to query the Wayback Machine API.')
     except TypeError as e:
         st.error(f'''
-        {f}. Refresh this page and try again.
+        {e}. Refresh this page and try again.
 
         If the problem persists [open an issue](https://github.com/claromes/waybacktweets/issues).
         ''')
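The pagination state the hunks above manage (`st.session_state.offset`, `prev_disabled`, `next_disabled`, and the `start_index`/`end_index` window) follows a standard limit/offset pattern. A minimal sketch with a hypothetical `page_window` helper, assuming the clamping behavior shown in the diff:

```python
def page_window(count, offset, per_page):
    """Compute the visible slice and button states for one page of results.

    count: total captured URLs; offset: current pagination offset;
    per_page: the 'Tweets per page' slider value.
    """
    start = offset
    # Clamp to the total, as the start_index/end_index computation above does.
    end = min(count, start + per_page)
    prev_disabled = start <= 0       # on the first page
    next_disabled = end >= count     # on the last page
    return start, end, prev_disabled, next_disabled
```

On a "Next" click the app advances `offset` by `per_page` and reruns; the disabled flags keep the buttons consistent at both ends of the result set.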
diff --git a/assets/wbt-0.2.gif b/assets/wbt-0.2.gif
deleted file mode 100644 (file)
index 1cf478c..0000000
Binary files a/assets/wbt-0.2.gif and /dev/null differ
index e6ae500cac89fc85cc55f459d381df6ef8bcb79f..04a30e263ffc8ed6f3b721c7a03a5ed7d735ab0b 100644 (file)
@@ -10,9 +10,8 @@
 - [ ] Prevent duplicate URLs
 - [x] Range size defined by user
 - [ ] `parse_links` exception
-- [ ] Add current page to page title
 - [ ] Parse MIME type `warc/revisit`
 - [ ] Parse MIME type `text/plain`
 - [x] Filter by period/datetime
 - [ ] Apply filters by API endpoints
-- [ ] Add contributing guidelines
\ No newline at end of file
+- [x] Add contributing guidelines
\ No newline at end of file