get_mars_data.py

This module retrieves data from the Mars Climate Database (MCD) by scraping its CGI web interface.

Parameters are passed in the URL query string, just as the web interface does. The resulting page is then scraped for the link to the data file and (optionally) the image.

Note that this is a simple scraper and is not affiliated with the MCD project in any way. Please do not send frequent or unnecessarily large numbers of requests to the server. Where possible, use the saved output, which is provided for exactly this reason.
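One way to follow this advice is to check for a saved file before contacting the server. The sketch below is a minimal illustration and not part of this module: `fetch_if_missing` and its `fetch` argument are hypothetical names, and `fetch` stands in for any callable that downloads the data and returns it as text.

```python
from pathlib import Path
from typing import Callable


def fetch_if_missing(path: Path, fetch: Callable[[], str]) -> str:
    """Return the saved text at `path`, calling `fetch` only if no file exists."""
    if path.exists():
        # Reuse the saved output; the server is not contacted.
        return path.read_text()
    text = fetch()
    # Save the result so that later calls can skip the download.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return text
```

The path could be built with generate_fn() so that the same parameters always map to the same saved file.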

base_params

Parameters that can be passed to the server. The defaults set here are taken from the web interface. Any parameter set to None is not sent; to send the literal value "none", pass it as a string. Do not modify this dict directly; instead, pass the parameter and value as keyword arguments to fetch_data().
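The override and None-dropping behaviour can be sketched as follows. This is an illustration under stated assumptions rather than the module's code: `example_base_params` and `build_query` are hypothetical, `var1` is an invented parameter name, and the None filtering is written out explicitly here to make the effect visible.

```python
from urllib.parse import urlencode

# Hypothetical defaults for illustration; the real base_params has its own entries.
example_base_params = {"ls": 0.5, "localtime": 1, "var1": None}


def build_query(**overrides) -> str:
    """Merge overrides into a copy of the defaults and drop None-valued entries."""
    merged = {**example_base_params, **overrides}
    return urlencode({k: v for k, v in merged.items() if v is not None})


print(build_query())             # ls=0.5&localtime=1
print(build_query(var1="none"))  # ls=0.5&localtime=1&var1=none
```

Because the defaults are merged into a fresh dict, the shared defaults are never mutated, which is the reason for passing overrides as keyword arguments.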

FetchingError

Error fetching resource.

The server returns HTTP 200 with an HTML error message rather than an error status code, so we raise an exception and pass the error message up.
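Since the status code cannot be relied on, the check has to inspect the page body. A minimal sketch, reusing the "Ooops!" marker that fetch_data() looks for; `check_page` is a hypothetical helper name, and the exception class is redefined here only to keep the example self-contained:

```python
class FetchingError(Exception):
    """Error fetching resource."""


def check_page(text: str) -> str:
    """Return the page text unchanged, or raise if it is an error page."""
    if "Ooops!" in text:
        # The server answered 200, but the body is an error message.
        raise FetchingError(f"Failed to download, server said {text}")
    return text
```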

fetch_data(outdir='.', get_data=True, get_img=False, **params)

Fetch data from the MCD and save in outdir.

Keyword arguments (other than outdir) will override the defaults in base_params.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| outdir | Union[pathlib.Path, str] | Directory to save the output in. | '.' |
| get_data | bool | Whether to fetch the data file. | True |
| get_img | bool | Whether to fetch the image. | False |
| **params | | Parameters to override. | {} |

Exceptions:

| Type | Description |
| --- | --- |
| FetchingError | Failed to fetch requested data. |

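Returns:

| Type | Description |
| --- | --- |
| _FetchedFiles | Container holding the path of the saved data file (accessed as .dataf) and the path of the saved image. Each is None when the corresponding file was not requested. |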

Call this function to retrieve data from the server and save it in a file. Keyword arguments passed here will override the defaults in base_params, e.g.:

>>> fetch_data(ls=0.5, localtime=1).dataf
Path("marsdata_ls_0.5-localtime_1.txt")

For more information on any particular parameter see the web interface.

Source code in mars_mcd_helper/get_mars_data.py
def fetch_data(outdir: Union[Path, str] = ".", get_data: bool = True, get_img: bool = False, **params):
    """
    Fetch data from the MCD and save in outdir.

    Keyword arguments (other than `outdir`, `get_data` and `get_img`) will
    override the defaults in `base_params`.

    Args:
        outdir (Union[Path, str]): dir to save in (Default value = ".")
        get_data (bool): get data or not (Default value = True)
        get_img (bool): get img or not (Default value = False)
        **params: Parameters to override.

    Raises:
        FetchingError: Failed to fetch requested data.

    Returns:
        (_FetchedFiles): paths of the saved data file and image (None if not requested).

    Call this function to retrieve data from the server and save it in a file.
    Keyword arguments passed here will override the defaults in `base_params`,
    e.g.:

    ```python
    >>> fetch_data(ls=0.5, localtime=1).dataf
    Path("marsdata_ls_0.5-localtime_1.txt")
    ```
    For more information on any particular parameter see the web interface.
    """
    p = base_params.copy()
    p.update(params)
    logger.info("Fetching page")
    r = get(url, params=p)
    if "Ooops!" in r.text:
        raise FetchingError(f"Failed to download, server said {r.text}")
    logger.debug(f"Server responded {r}")
    soup = BeautifulSoup(r.text, features="html.parser")
    # Normalise both str and Path inputs to an absolute path.
    outdir = Path(outdir).expanduser().resolve()

    dataf, imgf = None, None

    if get_data:
        data_url = urlbase + soup.body.a["href"].replace("../", "")
        logger.info(f"Fetching ascii data from {data_url}")
        r = get(data_url)
        dataf = outdir / generate_fn(**params)
        with dataf.open("w") as f:
            f.write(r.text)

    if get_img:
        img_url = urlbase + soup.body.img["src"].replace("../", "")
        logger.info(f"Fetching img from {img_url}")
        r = get(img_url)
        imgf = (outdir / generate_fn(**params)).with_suffix(".png")
        with imgf.open("wb") as im:
            im.write(r.content)

    return _FetchedFiles(dataf, imgf)

generate_fn(**params)

Generate a unique filename from given params.

This function is used internally with the parameters used by fetch_data(). It is provided here in case you need to generate the filename from a given set of params.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| **params | | Parameters to build the filename from. | {} |

Returns:

| Type | Description |
| --- | --- |
| str | Filename built from params. |
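The naming scheme is easiest to see by example. The function below is copied from the source listing for this module so that the snippet runs on its own; the calls show that None-valued parameters are skipped and that the filename depends on the order in which keyword arguments are given.

```python
def generate_fn(**params) -> str:
    # Join "name_value" pairs with "-", skipping parameters set to None.
    fn = "-".join(f"{k}_{x}" for k, x in params.items() if x is not None)
    return f"marsdata_{fn}.txt"


print(generate_fn(ls=0.5, localtime=1))     # marsdata_ls_0.5-localtime_1.txt
print(generate_fn(localtime=1, ls=0.5))     # marsdata_localtime_1-ls_0.5.txt
print(generate_fn(ls=0.5, localtime=None))  # marsdata_ls_0.5.txt
```

To find a previously saved file, pass the same parameters in the same order as in the original fetch_data() call.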

Source code in mars_mcd_helper/get_mars_data.py
def generate_fn(**params) -> str:
    """
    Generate a unique filename from given params.

    This function is used internally with the parameters used by
    `fetch_data()`.  It is provided here in case you need to generate the
    filename from a given set of params.

    Args:
        **params: params to consider.

    Returns:
        (str): Filename built from params.
    """
    fn = "-".join(f"{k}_{x}" for k, x in params.items() if x is not None)
    return f"marsdata_{fn}.txt"

get(*args, max_wait=30, **kwargs)

Get with exponential backoff.

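Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| *args | | Positional arguments for requests.get. | () |
| max_wait | int | Maximum number of seconds to keep retrying. | 30 |
| **kwargs | | Keyword arguments for requests.get. | {} |

Returns:

| Type | Description |
| --- | --- |
| requests.Response | The response from the server. |

Exceptions:

| Type | Description |
| --- | --- |
| ConnectionError | If unable to connect. |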
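The retry strategy can be illustrated independently of requests. The sketch below is a generic version of exponential backoff with jitter, not this module's implementation: `retry_with_backoff` is a hypothetical name, it retries on the builtin ConnectionError, and the `sleep` and `clock` arguments exist only so that the waits can be replaced in tests.

```python
import random
import time


def retry_with_backoff(func, max_wait=30.0, sleep=time.sleep, clock=time.monotonic):
    """Call func() until it succeeds or the next wait would exceed max_wait seconds."""
    start = clock()
    attempt = 0
    while True:
        try:
            return func()
        except ConnectionError:
            # Wait 1, 2, 4, ... seconds plus up to 1 s of random jitter.
            wait = 2 ** attempt + random.random()
            if clock() - start + wait > max_wait:
                # Out of time budget: re-raise the last connection error.
                raise
            sleep(wait)
            attempt += 1
```

The jitter spreads out retries from several clients so that they do not all hit the server at the same instant, which fits the request above to go easy on the MCD server.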

Source code in mars_mcd_helper/get_mars_data.py
def get(*args, max_wait: int = 30, **kwargs) -> requests.Response:
    """
    Get with exponential backoff.

    Args:
        *args: Args for requests.get
        max_wait (int): Max seconds to wait.  (Default value = 30)
        **kwargs: Kwargs for requests.get

    Returns:
        (requests.Response): response

    Raises:
        ConnectionError: if unable to connect.


    """
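    start = time.monotonic()
    i = 0
    # time.monotonic() counts seconds, so max_wait is compared directly.
    while time.monotonic() - start < max_wait:
        try:
            return requests.get(*args, **kwargs)
        # Note: this must be requests.exceptions.ConnectionError to catch
        # failures raised by requests; the builtin of the same name is unrelated.
        except ConnectionError:
            # Exponential backoff: 1, 2, 4, ... s plus up to 1 s of jitter.
            wait = (2 ** i) + (random.randint(0, 1000) / 1000)
            logger.warning(f"Failed to connect, pausing {wait} s and retrying")
            time.sleep(wait)
            i += 1
    logger.error("Failed to fetch.")
    raise ConnectionError("Max retries exceeded.")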