# Wayback Tweets
-[](https://doi.org/10.5281/zenodo.12528448) [](https://pypi.org/project/waybacktweets) [](https://github.com/claromes/waybacktweets/actions/workflows/docs.yml) [](https://waybacktweets.streamlit.app)
+[](https://pypi.org/project/waybacktweets) [](https://doi.org/10.5281/zenodo.12528448) [](https://waybacktweets.streamlit.app) [](https://colab.research.google.com/drive/1zRqi6uTMiGi5z8GQ-PC0tbpCJWULCqMO?usp=sharing)
-Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see [Field Options](https://claromes.github.io/waybacktweets/field_options.html)), and saves the data in HTML (for easy viewing of the tweets using the `iframe` tag), CSV, and JSON formats.
+
+Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see [Field Options](https://claromes.github.io/waybacktweets/field_options.html)), and saves the data in CSV, JSON, and HTML formats (the HTML output uses iframe tags for easy viewing of the tweets).
## Installation
## Acknowledgements
- Tristan Lee (Bellingcat's Data Scientist) for the idea of the application.
-- Jessica Smith (Snowflake's Marketing Specialist) and Streamlit/Snowflake teams for the additional server resources on Streamlit Cloud.
+- Jessica Smith (Snowflake's Community Growth Specialist) and Streamlit/Snowflake team for the additional server resources on Streamlit Cloud.
- OSINT Community for recommending the application.
> [!NOTE]
layout="centered",
menu_items={
"About": f"""
- [](https://github.com/claromes/waybacktweets/releases) [](https://github.com/claromes/waybacktweets/blob/main/LICENSE.md) [](https://github.com/claromes/waybacktweets)
+ [](https://github.com/claromes/waybacktweets/blob/main/LICENSE.md)
The application is a prototype hosted on Streamlit Cloud, serving as an alternative to the command line tool.
# ------ User Interface Settings ------ #
-st.info(
- "🥳 [**Pre-release 1.0x: Python module, CLI, and new Streamlit app**](https://github.com/claromes/waybacktweets/releases)" # noqa: E501
-)
-
st.image(TITLE, use_column_width="never")
st.caption(
- "[](https://github.com/claromes/waybacktweets/releases) [](https://github.com/claromes/waybacktweets)" # noqa: E501
+ "[](https://github.com/claromes/waybacktweets/releases) [](https://github.com/sponsors/claromes)" # noqa: E501
)
st.write(
- "Retrieves archived tweets CDX data in HTML (for easy viewing of the tweets using the `iframe` tag), CSV, and JSON formats." # noqa: E501
+ "Retrieves archived tweets CDX data in HTML (for easy viewing of the tweets using the iframe tag), CSV, and JSON formats." # noqa: E501
)
st.write(
# -- Rendering -- #
- if csv_data and json_data and html_content:
- st.session_state.count = len(df)
- st.write(f"**{st.session_state.count} URLs have been captured**")
+ st.session_state.count = len(df)
+ st.write(f"**{st.session_state.count} URLs have been captured**")
- # -- HTML -- #
+ tab1, tab2, tab3 = st.tabs(["HTML", "CSV", "JSON"])
- st.header("HTML", divider="gray", anchor=False)
+ # -- HTML -- #
+ with tab1:
st.write(
- f"Visualize tweets more efficiently through `iframes`. Download the @{st.session_state.current_username}'s archived tweets in HTML." # noqa: E501
+ f"Visualize tweets more efficiently through iframe tags. Download @{st.session_state.current_username}'s archived tweets in HTML." # noqa: E501
)
col5, col6 = st.columns([1, 18])
)
# -- CSV -- #
-
- st.header("CSV", divider="gray", anchor=False)
+ with tab2:
st.write(
"Check the data returned in the dataframe below and download the file."
)
st.dataframe(df, use_container_width=True)
# -- JSON -- #
-
- st.header("JSON", divider="gray", anchor=False)
+ with tab3:
st.write(
"Check the data returned in JSON format below and download the file."
)
"sphinx_new_tab_link",
"sphinx_click.ext",
"sphinx_autodoc_typehints",
+ "sphinxcontrib.youtube",
]
templates_path = ["_templates"]
--- /dev/null
+Hands-On Examples
+====================
+
+- **Notebook**
+
+ This notebook demonstrates how to fetch, parse, and export archived tweets for a specific user using the ``waybacktweets`` library.
+
+ .. image:: https://colab.research.google.com/assets/colab-badge.svg
+ :target: https://colab.research.google.com/drive/1zRqi6uTMiGi5z8GQ-PC0tbpCJWULCqMO?usp=sharing
+ :alt: Open In Colab
+
+.. raw:: html
+
+ <br>
+ <br>
+
+- **Video**
+
+ Demonstration of how to use Wayback Tweets and other tools to retrieve tweets (in Spanish).
+
+ .. youtube:: qy3wOnUxe6A
+ :width: 100%
Pre-release: |release|
-Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see :ref:`field_options`), and saves the data in HTML (for easy viewing of the tweets using the ``iframe`` tag), CSV, and JSON formats.
+Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see :ref:`field_options`), and saves the data in CSV, JSON, and HTML formats (the HTML output uses iframe tags for easy viewing of the tweets).
-.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.12528448.svg
- :target: https://doi.org/10.5281/zenodo.12528448
+.. image:: https://img.shields.io/badge/Donate-via%20Sponsors-ff69b4.svg?logo=github
+ :target: https://github.com/sponsors/claromes
+ :alt: GitHub Sponsors
.. note::
Intensive queries can lead to rate limiting, resulting in a temporary ban of a few minutes from web.archive.org.
field_options
outputs
exceptions
+ handson
contribute
todo
layout="centered",
menu_items={
"About": """
- ## 🏛️ Wayback Tweets
-
- Tool that displays, via Wayback CDX Server API, multiple archived tweets on Wayback Machine to avoid opening each link manually. Users can apply filters based on specific years and view tweets that do not have the original URL available.
-
- This tool is a prototype, please feel free to send your [feedbacks](https://github.com/claromes/waybacktweets/issues). Created by [@claromes](https://claromes.com).
+ This is the legacy application of [Wayback Tweets](https://waybacktweets.streamlit.app/).
-------
""", # noqa: E501
# UI
st.title(
- "Wayback Tweets [](https://github.com/claromes/waybacktweets)", # noqa: E501
+ "Wayback Tweets", # noqa: E501
anchor=False,
help="v0.4.3",
)
standalone = ["Sphinx (>=5)"]
test = ["pytest"]
+[[package]]
+name = "sphinxcontrib-youtube"
+version = "1.4.1"
+description = "Sphinx \"youtube\" extension."
+optional = false
+python-versions = "*"
+files = [
+ {file = "sphinxcontrib_youtube-1.4.1-py2.py3-none-any.whl", hash = "sha256:de9cb454f066d580a1e7ad64efae7dd9e12c1b1567a31faa330b1aeaeed40460"},
+ {file = "sphinxcontrib_youtube-1.4.1.tar.gz", hash = "sha256:eb7871c8af47fd2b5c9727615354b7f95bce554be8be45b9fa8e5bc022f88059"},
+]
+
+[package.dependencies]
+requests = "*"
+Sphinx = ">=6.1"
+
+[package.extras]
+dev = ["nox"]
+doc = ["pydata-sphinx-theme", "sphinx-copybutton", "sphinx-design"]
+test = ["beautifulsoup4", "pytest", "pytest-cov", "pytest-regressions"]
+
[[package]]
name = "streamlit"
version = "1.36.0"
[metadata]
lock-version = "2.0"
python-versions = "^3.10"
-content-hash = "4017fc7af7b13a774406ad205ef03952ef96dc5c3e0413c624c8a459e0619a4c"
+content-hash = "e41f880cd350ecafc461396adeec717dd632a56071c030fab761265acc0773f6"
[tool.poetry]
name = "waybacktweets"
-version = "1.0a6"
+version = "1.0a7"
description = "Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing, and saves the data."
authors = ["Claromes <support@claromes.com>"]
license = "GPLv3"
sphinx-new-tab-link = "^0.4.0"
sphinx-click = "^6.0.0"
sphinx-autodoc-typehints = "^2.1.1"
+sphinxcontrib-youtube = "^1.4.1"
[tool.poetry.group.dev.dependencies]
streamlit = "1.36.0"
"verbose",
is_flag=True,
default=False,
- help="Shows the error log.",
+ help="Shows the log.",
)
def main(
username: str,