.. autofunction:: semicolon_parser
.. autofunction:: timestamp_parser
-Exceptions
-------------
-
-.. automodule:: waybacktweets.exceptions.exceptions
-
-.. autoclass:: ReadTimeoutError
- :members:
-
-.. autoclass:: ConnectionError
- :members:
-
-.. autoclass:: HTTPError
- :members:
-
-.. autoclass:: EmptyResponseError
- :members:
-
-.. autoclass:: GetResponseError
- :members:
Config
------------
Retrieves the CDX data of archived tweets from the Wayback Machine, performs the necessary parsing (see :ref:`field_options`), and saves the data in CSV, JSON, and HTML formats.
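As a rough illustration of the saving step, the sketch below serializes a list of already-parsed rows to CSV and JSON text using only the Python standard library. The helpers ``rows_to_csv`` and ``rows_to_json`` and the row field names are hypothetical; they are not the tool's actual API or output schema, and HTML output is left out.

```python
import csv
import io
import json
from typing import Any


def rows_to_csv(rows: list[dict[str, Any]]) -> str:
    """Serialize parsed rows to CSV text (hypothetical helper)."""
    # Header = union of keys in first-seen order, so rows with missing
    # fields still line up; absent values are written as empty strings.
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_json(rows: list[dict[str, Any]]) -> str:
    """Serialize parsed rows to pretty-printed JSON text (hypothetical helper)."""
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

The returned strings can then be written to disk, for example with ``Path("tweets.csv").write_text(rows_to_csv(rows), encoding="utf-8")``. Returning text instead of writing files keeps the serialization independent of where the output is stored.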
-.. image:: ../assets/preview_image.jpg
- :alt: Preview image
- :align: center
-
.. note::
   Intensive queries can lead to rate limiting, resulting in a temporary ban of a few minutes from web.archive.org.
Filters
----------
+
- Filtering by date range: Using the ``from`` and ``to`` filters.
- Limit: Caps the number of results returned by a query.
- Offset: Skips a number of results, allowing a simple way to scroll through them.
-- Only unavailable tweets: Checks if the archived URL still exists on Twitter (see the :ref:`flowchart`)
-
- Only unique Wayback Machine URLs: Filtering with the ``collapse`` option on the ``urlkey`` field, combined with the URL Match Scope ``prefix``.
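The filters above correspond to query parameters of the Wayback CDX Server API. The sketch below assembles such a query with the standard library, assuming each filter maps onto the CDX parameter of the same name and that tweets are matched by the URL prefix ``twitter.com/<username>/status/``. The function ``build_cdx_url`` is hypothetical and is not the tool's own API.

```python
from typing import Optional
from urllib.parse import urlencode

# Public Wayback CDX Server endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(
    username: str,
    from_date: Optional[str] = None,
    to_date: Optional[str] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    unique: bool = False,
) -> str:
    """Build a CDX query URL for a user's archived tweets (hypothetical helper)."""
    params = {
        # Assumed URL pattern; matchType=prefix matches every URL under it.
        "url": f"twitter.com/{username}/status/",
        "matchType": "prefix",
        "output": "json",
    }
    if from_date is not None:
        params["from"] = from_date  # timestamp prefix, e.g. "2020" or "20200131"
    if to_date is not None:
        params["to"] = to_date
    if limit is not None:
        params["limit"] = str(limit)
    if offset is not None:
        params["offset"] = str(offset)
    if unique:
        # Collapse adjacent captures that share the same urlkey.
        params["collapse"] = "urlkey"
    return f"{CDX_ENDPOINT}?{urlencode(params)}"
```

For example, ``build_cdx_url("jack", from_date="2020", to_date="2021", limit=100, unique=True)`` requests at most 100 captures from 2020 to 2021, one per distinct URL. Optional filters are only added when set, so an unfiltered query stays as short as possible.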
Workflow
================
-The tool was written following a proposal not only to Retrieve data from archived tweets, but also to facilitate the reading of these tweets. Therefore, a flow is defined to obtain these results in the best possible way.
+The tool was written with a twofold goal: to retrieve data from archived tweets and to make those tweets easier to read. To achieve this, it follows a defined flow that aims to obtain the best possible result for each tweet.
Due to limitations of the Wayback CDX Server API, it is not always possible to parse results with the mimetype ``application/json``; regardless, the data in CDX format are saved.
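A minimal sketch of that fallback, assuming the archived response body is already available as text: decoding is attempted, a failure yields ``None``, and the CDX fields are kept either way. The names ``parse_archived_json``, ``build_record`` and the ``parsed_json`` key are hypothetical.

```python
import json
from typing import Any, Optional


def parse_archived_json(body: str) -> Optional[Any]:
    """Decode an archived JSON response, or return None if it is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # e.g. a truncated capture or an HTML page served under a JSON mimetype
        return None


def build_record(cdx_row: dict[str, str], body: str) -> dict[str, Any]:
    """Combine CDX fields with the parsed body (hypothetical helper)."""
    record: dict[str, Any] = dict(cdx_row)  # CDX data are always kept
    record["parsed_json"] = parse_archived_json(body)
    return record
```

Catching only ``json.JSONDecodeError`` keeps unrelated errors visible while still treating an unparseable body as missing content rather than a fatal failure.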
C--> |4xx| E[return None]
E--> F{request Archived\nTweet URL}
F--> |4xx| G[return Only CDX data]
- F--> |2xx/3xx: application/json| J[return JSON text]
+ F--> |TODO: 2xx/3xx: application/json| J[return JSON text]
F--> |2xx/3xx: text/html, warc/revisit, unk| K[return HTML iframe tag]
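The branches for the archived tweet URL can be read as a small decision function. The sketch below is one interpretation of the flowchart, assuming the status code and mimetype are already known (for instance from the CDX data). The function names and return labels are hypothetical, and combinations the flowchart does not cover, such as 5xx responses, simply return ``None``. Note that the ``application/json`` branch is still marked TODO in the flowchart.

```python
import html
from typing import Optional

# Mimetypes grouped as in the flowchart's 2xx/3xx branches.
JSON_MIMETYPE = "application/json"
IFRAME_MIMETYPES = {"text/html", "warc/revisit", "unk"}


def classify_archived_response(status: int, mimetype: str) -> Optional[str]:
    """Return a label for the flowchart outcome (hypothetical helper)."""
    if 400 <= status < 500:
        return "cdx_only"  # G: return only CDX data
    if 200 <= status < 400:
        if mimetype == JSON_MIMETYPE:
            return "json_text"  # J: return JSON text
        if mimetype in IFRAME_MIMETYPES:
            return "html_iframe"  # K: return HTML iframe tag
    return None  # combinations the flowchart does not cover


def iframe_tag(archived_url: str) -> str:
    """Embed an archived URL in an iframe, escaping it for an HTML attribute."""
    return f'<iframe src="{html.escape(archived_url, quote=True)}"></iframe>'
```

The 4xx check comes first because, in the flowchart, a 4xx response leads to CDX-only output regardless of mimetype. Escaping the URL keeps characters such as ``&`` from breaking the generated attribute.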