From: Claromes Date: Sat, 4 Nov 2023 12:35:05 +0000 (-0300) Subject: update readme and var saved_at X-Git-Url: https://git.claromes.com/?a=commitdiff_plain;h=482b805fe6891d61d2c36efdf1a6d942a8275b0e;p=waybacktweets.git update readme and var saved_at --- diff --git a/.gitignore b/.gitignore index eba74f4..0cafc1c 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1 @@ -venv/ \ No newline at end of file +.venv/ \ No newline at end of file diff --git a/README.md b/README.md index 5002568..3d2432a 100644 --- a/README.md +++ b/README.md @@ -1,26 +1,17 @@ -> [!IMPORTANT] -> If the application is down, please check the [Streamlit Cloud Status](https://www.streamlitstatus.com/). - -
- # 🏛️ Wayback Tweets [![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://waybacktweets.streamlit.app) [![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/claromes/waybacktweets?include_prereleases)](https://github.com/claromes/waybacktweets/releases) [![License](https://img.shields.io/github/license/claromes/waybacktweets)](https://github.com/claromes/waybacktweets/blob/main/LICENSE.md) -Tool that displays multiple archived tweets on Wayback Machine to avoid opening each link manually. Via [Wayback CDX Server API](https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server). - -

- -

+Tool that displays multiple archived tweets on the Wayback Machine, via the [Wayback CDX Server API](https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server), so you don't have to open each link manually.

The app is a prototype written in Python with Streamlit and hosted on Streamlit Cloud.

*Thanks to Tristan Lee for the idea.*

## Features

- Tweets per page defined by user
-- Filtering by saved date
-- Filtering by deleted tweets
+- Filter by year range
+- Filter to show only deleted tweets

## Development
@@ -43,13 +34,35 @@ Streamlit will be served at http://localhost:8501

## Bugs

- [ ] "web.archive.org took too long to respond."
+- [ ] Pagination: set session variable on first click
+- [ ] Timeout error
- [x] `only_deleted` checkbox selected for handles without deleted tweets
-- [x] Pagination: set session variable on first click
- [x] Pagination: scroll to top
- [x] `IndexError`
-- [ ] Timeout error

## Docs

- [Roadmap](docs/ROADMAP.md)
- [Changelog](docs/CHANGELOG.md)
+
+## Testimonials
+
+>"Original way to find deleted tweets." — [Henk Van Ess](https://twitter.com/henkvaness/status/1693298101765701676)
+
+>"This is an excellent tool to use now that most Twitter API-based tools have gone down with changes to the pricing structure over at X." — [The OSINT Newsletter - Issue #22](https://osintnewsletter.com/p/22#%C2%A7osint-community)
+
+>"One of the keys to using the Wayback Machine effectively is knowing what it can and can’t archive. It can, and has, archived many, many Twitter accounts... Utilize fun tools such as Wayback Tweets to do so more effectively." — [Ari Ben Am](https://memeticwarfareweekly.substack.com/p/mww-paradise-by-the-telegram-dashboard)
+
+>"Want to see archived tweets on Wayback Machine in bulk? You can use Wayback Tweets." — [Daily OSINT](https://twitter.com/DailyOsint/status/1695065018662855102)
+
+>"Untuk mempermudah penelusuran arsip, gunakan Wayback Tweets." ("To make searching the archives easier, use Wayback Tweets.") — [GIJN Indonesia](https://twitter.com/gijnIndonesia/status/1685912219408805888)
+
+>"A tool to quickly view tweets saved on archive.org." — [Irina_Tech_Tips Newsletter #3](https://irinatechtips.substack.com/p/irina_tech_tips-newsletter-3-2023#%C2%A7wayback-tweets)
+
+
+## Contributing
+
+PRs are welcome. Please check the Bugs list above, the [roadmap](docs/ROADMAP.md), or propose a new feature.
+
+> [!NOTE]
+> If the application is down, please check the [Streamlit Cloud Status](https://www.streamlitstatus.com/).
\ No newline at end of file
diff --git a/app.py b/app.py
index 2347fd6..a284774 100644
--- a/app.py
+++ b/app.py
@@ -18,13 +18,13 @@ st.set_page_config(
[![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/claromes/waybacktweets?include_prereleases)](https://github.com/claromes/waybacktweets/releases)
[![License](https://img.shields.io/github/license/claromes/waybacktweets)](https://github.com/claromes/waybacktweets/blob/main/LICENSE.md)

- Tool that displays multiple archived tweets on Wayback Machine to avoid opening each link manually. Via Wayback CDX Server API.
+ Tool that displays multiple archived tweets on the Wayback Machine, via the Wayback CDX Server API, so you don't have to open each link manually.

- Tweets per page defined by user
- - Filtering by saved date
- - Filtering by deleted tweets
+ - Filter by year range
+ - Filter to show only deleted tweets

- This tool is experimental, please feel free to send your [feedbacks](https://github.com/claromes/waybacktweets/issues).
+ This tool is a prototype; please feel free to send [feedback](https://github.com/claromes/waybacktweets/issues).

Created and maintained by [@claromes](https://github.com/claromes). 
------- ''', @@ -42,6 +42,9 @@ hide_streamlit_style = ''' background-color: #dddddd; border-radius: 0.5rem; } + div[data-testid="InputInstructions"] { + visibility: hidden; + } ''' @@ -71,8 +74,8 @@ if 'update_component' not in st.session_state: if 'offset' not in st.session_state: st.session_state.offset = 0 -if 'date_created' not in st.session_state: - st.session_state.date_created = (2006, year) +if 'saved_at' not in st.session_state: + st.session_state.saved_at = (2006, year) if 'count' not in st.session_state: st.session_state.count = False @@ -134,8 +137,8 @@ def embed(tweet): st.error('Connection to publish.twitter.com timed out.') @st.cache_data(ttl=1800, show_spinner=False) -def tweets_count(handle, date_created): - url = f'https://web.archive.org/cdx/search/cdx?url=https://twitter.com/{handle}/status/*&output=json&from={date_created[0]}&to={date_created[1]}' +def tweets_count(handle, saved_at): + url = f'https://web.archive.org/cdx/search/cdx?url=https://twitter.com/{handle}/status/*&output=json&from={saved_at[0]}&to={saved_at[1]}' try: response = requests.get(url) @@ -148,14 +151,15 @@ def tweets_count(handle, date_created): return 0 except requests.exceptions.Timeout: st.error('Connection to web.archive.org timed out.') + st.stop() @st.cache_data(ttl=1800, show_spinner=False) -def query_api(handle, limit, offset, date_created): +def query_api(handle, limit, offset, saved_at): if not handle: st.warning('username, please!') st.stop() - url = f'https://web.archive.org/cdx/search/cdx?url=https://twitter.com/{handle}/status/*&output=json&limit={limit}&offset={offset}&from={date_created[0]}&to={date_created[1]}' + url = f'https://web.archive.org/cdx/search/cdx?url=https://twitter.com/{handle}/status/*&output=json&limit={limit}&offset={offset}&from={saved_at[0]}&to={saved_at[1]}' try: response = requests.get(url) response.raise_for_status() @@ -185,7 +189,7 @@ def parse_links(links): return parsed_links, tweet_links, parsed_mimetype, timestamp def attr(i): - 
st.markdown(f'{i+1 + st.session_state.offset}. **Wayback Machine:** [link]({link}) · **MIME Type:** {mimetype[i]} · **Saved at:** {datetime.datetime.strptime(timestamp[i], "%Y%m%d%H%M%S")} · **Tweet:** [link]({tweet_links[i]})') + st.markdown(f'{i+1 + st.session_state.offset}. [**web.archive.org**]({link}) · **MIME Type:** {mimetype[i]} · **Saved at:** {datetime.datetime.strptime(timestamp[i], "%Y%m%d%H%M%S")} · [**tweet**]({tweet_links[i]})') # UI st.title('Wayback Tweets [![Star](https://img.shields.io/github/stars/claromes/waybacktweets?style=social)](https://github.com/claromes/waybacktweets)', anchor=False) @@ -193,16 +197,14 @@ st.write('Display multiple archived tweets on Wayback Machine and avoid opening handle = st.text_input('Username', placeholder='jack') -st.session_state.date_created = st.slider('Tweets saved between', 2006, year, (2006, year)) +st.session_state.saved_at = st.slider('Tweets saved between', 2006, year, (2006, year)) -tweets_per_page = st.slider('Tweets per page', 25, 1000, 25, 25) +tweets_per_page = st.slider('Tweets per page', 25, 250, 25, 25) only_deleted = st.checkbox('Only deleted tweets') query = st.button('Query', type='primary', use_container_width=True) -bar = st.empty() - if query or st.session_state.count: if handle != st.session_state.current_handle: st.session_state.offset = 0 @@ -210,17 +212,17 @@ if query or st.session_state.count: if query != st.session_state.current_query: st.session_state.offset = 0 - st.session_state.count = tweets_count(handle, st.session_state.date_created) + st.session_state.count = tweets_count(handle, st.session_state.saved_at) st.write(f'**{st.session_state.count} URLs have been captured**') - if tweets_per_page > st.session_state.count: - tweets_per_page = st.session_state.count + if st.session_state.count: + if tweets_per_page > st.session_state.count: + tweets_per_page = st.session_state.count try: - bar.progress(0) progress = st.empty() - links = query_api(handle, tweets_per_page, 
st.session_state.offset, st.session_state.date_created) + links = query_api(handle, tweets_per_page, st.session_state.offset, st.session_state.saved_at) parse = parse_links(links) parsed_links = parse[0] @@ -290,56 +292,54 @@ if query or st.session_state.count: start_index = st.session_state.offset end_index = min(st.session_state.count, start_index + tweets_per_page) - for i in range(tweets_per_page): - try: - bar.progress((i*3) + 13) + with st.spinner('Fetching...'): + for i in range(tweets_per_page): + try: + link = parsed_links[i] + tweet = embed(tweet_links[i]) - link = parsed_links[i] - tweet = embed(tweet_links[i]) + if not only_deleted: + attr(i) - if not only_deleted: - attr(i) + if tweet: + status_code = tweet[0] + tweet_content = tweet[1] + user_info = tweet[2] + is_RT = tweet[3] - if tweet: - status_code = tweet[0] - tweet_content = tweet[1] - user_info = tweet[2] - is_RT = tweet[3] + if mimetype[i] == 'application/json': + display_tweet() - if mimetype[i] == 'application/json': - display_tweet() + if mimetype[i] == 'text/html': + display_tweet() + elif not tweet: + display_not_tweet() - if mimetype[i] == 'text/html': - display_tweet() - elif not tweet: - display_not_tweet() + if only_deleted: + if not tweet: + return_none_count += 1 + attr(i) - if only_deleted: - if not tweet: - return_none_count += 1 - attr(i) + display_not_tweet() - display_not_tweet() + progress.write(f'{return_none_count} URLs have been captured in the range {start_index}-{end_index}') - progress.write(f'{return_none_count} URLs have been captured in the range {start_index}-{end_index}') + if start_index <= 0: + st.session_state.prev_disabled = True + else: + st.session_state.prev_disabled = False - if start_index <= 0: - st.session_state.prev_disabled = True - else: - st.session_state.prev_disabled = False + if i + 1 == st.session_state.count: + st.session_state.next_disabled = True + else: + st.session_state.next_disabled = False + except IndexError: + if start_index <= 0: + 
st.session_state.prev_disabled = True + else: + st.session_state.prev_disabled = False - if i + 1 == st.session_state.count: st.session_state.next_disabled = True - else: - st.session_state.next_disabled = False - # TODO - except IndexError: - if start_index <= 0: - st.session_state.prev_disabled = True - else: - st.session_state.prev_disabled = False - - st.session_state.next_disabled = True prev, _ , next = st.columns([3, 4, 3]) @@ -350,7 +350,7 @@ if query or st.session_state.count: st.error('Unable to query the Wayback Machine API.') except TypeError as e: st.error(f''' - {f}. Refresh this page and try again. + {e}. Refresh this page and try again. If the problem persists [open an issue](https://github.com/claromes/waybacktweets/issues). ''') diff --git a/assets/wbt-0.2.gif b/assets/wbt-0.2.gif deleted file mode 100644 index 1cf478c..0000000 Binary files a/assets/wbt-0.2.gif and /dev/null differ diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index e6ae500..04a30e2 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -10,9 +10,8 @@ - [ ] Prevent duplicate URLs - [x] Range size defined by user - [ ] `parse_links` exception -- [ ] Add current page to page title - [ ] Parse MIME type `warc/revisit` - [ ] Parse MIME type `text/plain` - [x] Filter by period/datetime - [ ] Apply filters by API endpoints -- [ ] Add contributing guidelines \ No newline at end of file +- [x] Add contributing guidelines \ No newline at end of file
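
For reviewers: the `date_created` → `saved_at` rename touches the CDX query string in both `tweets_count` and `query_api`, which format the same URL inline. A minimal standalone sketch of that construction (the helper name `build_cdx_url` is illustrative only and does not exist in the app):

```python
def build_cdx_url(handle, saved_at, limit=None, offset=None):
    """Build a Wayback CDX Server query URL for a handle's tweet statuses.

    saved_at is the renamed (from_year, to_year) tuple,
    e.g. st.session_state.saved_at == (2006, 2023).
    """
    url = (
        'https://web.archive.org/cdx/search/cdx'
        f'?url=https://twitter.com/{handle}/status/*'
        f'&output=json&from={saved_at[0]}&to={saved_at[1]}'
    )
    if limit is not None:
        # query_api additionally pages through results with limit/offset
        url += f'&limit={limit}&offset={offset or 0}'
    return url
```

Extracting the format string this way is one option for keeping the two call sites in sync; the committed code keeps the f-strings inline.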