get_mars_data.py
This module fetches data from the Mars Climate Database (MCD) by scraping its CGI web interface.
Parameters are passed in the URL query string, just as the web interface does. The resulting web page is then scraped for the link to the data and (optionally) the image(s).
Note that this is a simple scraper and is not affiliated with the MCD project in any way. Please do not run it against the server more often than you need to. Where possible, use the saved output, which is provided for exactly this reason.
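The scraping step described above can be sketched with the standard library alone. This is only an illustration, not the module's code: the module itself uses BeautifulSoup, and the `LinkFinder` class, the `URLBASE` value and the sample page below are assumptions made up for the example.

```python
from html.parser import HTMLParser

# Hypothetical values for illustration only; not the real server address or page.
URLBASE = "http://example.org/mcd/"
SAMPLE_PAGE = (
    '<html><body><a href="../txt/result.txt">data</a>'
    '<img src="../img/result.png"></body></html>'
)


class LinkFinder(HTMLParser):
    """Record the first <a href> and the first <img src> seen in a page."""

    def __init__(self):
        super().__init__()
        self.href = None
        self.src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and self.href is None:
            self.href = attrs.get("href")
        elif tag == "img" and self.src is None:
            self.src = attrs.get("src")


def resolve(relative):
    """Turn a '../'-relative link into an absolute URL by prefixing URLBASE."""
    return URLBASE + relative.replace("../", "")


finder = LinkFinder()
finder.feed(SAMPLE_PAGE)
data_url = resolve(finder.href)
img_url = resolve(finder.src)
print(data_url)
print(img_url)
```

Taking the first link and the first image is fragile by design: if the page layout changes, the scraper will need updating.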
base_params
Parameters which can be passed to the server. Defaults set here are extracted from the web interface. Any parameter set to `None` will not be passed. To pass "none", use a string. Do not override this dict directly; rather, pass the parameter and value as keyword arguments to `fetch_data()`.
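The override-and-skip behaviour can be sketched as follows. The dictionary contents are hypothetical (only `ls` and `localtime` appear elsewhere on this page; `zonalmean` and `averaging` are invented), and `build_query` is an illustrative helper, not part of the module. In the module the `None` filtering presumably happens when the dict is handed to `requests`, which omits `None`-valued parameters from the query string; here it is done explicitly so the example runs without the network.

```python
from urllib.parse import urlencode

# Hypothetical defaults for illustration; the real base_params has its own keys and values.
base_params = {"ls": 85.3, "localtime": 0.0, "zonalmean": None, "averaging": "off"}


def build_query(**overrides):
    """Copy the defaults, apply the overrides, and drop anything set to None."""
    merged = base_params.copy()  # work on a copy so the shared defaults stay untouched
    merged.update(overrides)
    return {k: v for k, v in merged.items() if v is not None}


# "none" as a string is sent to the server; None (the object) is skipped.
query = build_query(ls=0.5, averaging="none")
print(urlencode(query))
```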
FetchingError
Error fetching resource.
The server returns 200 with an HTML error message, so we raise an exception and pass the error message up.
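Because the status code cannot be trusted, the check has to look at the page body. A minimal sketch, assuming the error page can be recognised by the marker string `"Ooops!"` (the marker used in the source listing on this page); `check_page` is a hypothetical helper and the class below is a stand-in for the module's own exception.

```python
class FetchingError(Exception):
    """Stand-in for the module's FetchingError."""


def check_page(text):
    """Raise FetchingError if the page body carries the server's error marker."""
    if "Ooops!" in text:
        raise FetchingError(f"Failed to download, server said {text}")
    return text


try:
    check_page("<html><body>Ooops! Bad request</body></html>")
except FetchingError as err:
    message = str(err)
    print(message)
```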
fetch_data(outdir='.', get_data=True, get_img=False, **params)
Fetch data from the MCD and save in `outdir`.
Keyword arguments (other than `outdir`) will override the defaults in `base_params`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outdir | Union[pathlib.Path, str] | Directory to save the output in. | '.' |
get_data | bool | Whether to fetch the data file. | True |
get_img | bool | Whether to fetch the image. | False |
**params | | Parameters to override. | {} |
Exceptions:
Type | Description |
---|---|
FetchingError | Failed to fetch requested data. |
Returns:
Type | Description |
---|---|
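_FetchedFiles | Container holding the paths of the saved data file (attribute `dataf`) and image file; an entry is None if that output was not requested. |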
Call this function to retrieve data from the server and save it in a file. Keyword arguments passed here will override the defaults in `base_params`, e.g.:
>>> fetch_data(ls=0.5, localtime=1).dataf
Path("marsdata_ls_0.5-localtime_1.txt")
Source code in mars_mcd_helper/get_mars_data.py
def fetch_data(outdir: Union[Path, str] = ".", get_data: bool = True, get_img: bool = False, **params):
    """
    Fetch data from the MCD and save in outdir.

    Keyword arguments (other than `outdir`) will override the defaults in `base_params`.

    Args:
        outdir (Union[Path, str]): dir to save in (Default value = ".")
        get_data (bool): get data or not (Default value = True)
        get_img (bool): get img or not (Default value = False)
        **params: Parameters to override.

    Raises:
        FetchingError: Failed to fetch requested data.

    Returns:
        (_FetchedFiles): paths of the saved data and image files (None if not requested).

    Call this function to retrieve data from the server and save it in a file.
    Keyword arguments passed here will override the defaults in `base_params`,
    e.g.:
    ```python
    >>> fetch_data(ls=0.5, localtime=1).dataf
    Path("marsdata_ls_0.5-localtime_1.txt")
    ```
    For more information on any particular parameter see the web interface.
    """
    # Merge the caller's overrides into a copy of the defaults.
    p = base_params.copy()
    p.update(params)
    logger.info("Fetching page")
    r = get(url, params=p)
    # The server answers 200 even on failure, so detect the error page by its text.
    if "Ooops!" in r.text:
        raise FetchingError(f"Failed to download, server said {r.text}")
    # Log the raw response at debug level instead of printing it.
    logger.debug(f"{r} {r.text}")
    soup = BeautifulSoup(r.text, features="html.parser")
    if isinstance(outdir, str):
        outdir = Path(outdir).expanduser().resolve()
    dataf, imgf = None, None
    if get_data:
        # The first link in the page body points at the ASCII data file.
        data_url = urlbase + soup.body.a["href"].replace("../", "")
        logger.info(f"Fetching ascii data from {data_url}")
        r = get(data_url)
        dataf = outdir / generate_fn(**params)
        with dataf.open("w") as f:
            f.write(r.text)
    if get_img:
        # The first image in the page body is the plot.
        img_url = urlbase + soup.body.img["src"].replace("../", "")
        logger.info(f"Fetching img from {img_url}")
        r = get(img_url)
        imgf = (outdir / generate_fn(**params)).with_suffix(".png")
        with imgf.open("wb") as im:
            im.write(r.content)
    return _FetchedFiles(dataf, imgf)
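In keeping with the request to go easy on the server, a caller can check for a previously saved file before fetching again. The sketch below is one possible way to do that and is not part of the module: `fetch_if_missing` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for `fetch_data()` so that the example runs without network access. The copy of `generate_fn()` follows the naming scheme shown on this page.

```python
import tempfile
from pathlib import Path


def generate_fn(**params):
    """Same naming scheme as the module's generate_fn()."""
    fn = "-".join(f"{k}_{x}" for k, x in params.items() if x is not None)
    return f"marsdata_{fn}.txt"


def fetch_if_missing(fetch, outdir, **params):
    """Return the saved file if it exists; otherwise call fetch(outdir=..., **params)."""
    target = Path(outdir) / generate_fn(**params)
    if target.exists():
        return target  # reuse the saved output, no request made
    fetch(outdir=outdir, **params)
    return target


calls = []


def fake_fetch(outdir, **params):
    """Hypothetical stand-in for fetch_data(): writes a dummy file, no network."""
    calls.append(params)
    (Path(outdir) / generate_fn(**params)).write_text("dummy data")


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_if_missing(fake_fetch, tmp, ls=0.5, localtime=1)
    second = fetch_if_missing(fake_fetch, tmp, ls=0.5, localtime=1)
    cached_name = first.name
    same_file = first == second

print(cached_name, len(calls))
```

The second call finds the file written by the first and never invokes the fetcher, so only one request would have reached the server.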
generate_fn(**params)
Generate a unique filename from given params.
This function is used internally with the parameters used by `fetch_data()`. It is provided here in case you need to generate the filename from a given set of params.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**params | | Params to consider. | {} |
Returns:
Type | Description |
---|---|
str | Filename generated from the params. |
Source code in mars_mcd_helper/get_mars_data.py
def generate_fn(**params) -> str:
    """
    Generate a unique filename from given params.

    This function is used internally with the parameters used by `fetch_data()`.
    It is provided here in case you need to generate the filename from a given
    set of params.

    Args:
        **params: params to consider.

    Returns:
        (str): Fn from params.
    """
    fn = "-".join(f"{k}_{x}" for k, x in params.items() if x is not None)
    return f"marsdata_{fn}.txt"
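A quick check of the naming scheme, using a copy of the function body from the listing above so that the example runs on its own. It shows two properties worth knowing: parameters set to `None` are left out of the name, and the name depends on the order in which the keyword arguments are given.

```python
def generate_fn(**params) -> str:
    """Copy of the function shown above."""
    fn = "-".join(f"{k}_{x}" for k, x in params.items() if x is not None)
    return f"marsdata_{fn}.txt"


name_basic = generate_fn(ls=0.5, localtime=1)
name_with_none = generate_fn(ls=0.5, localtime=None)  # None is skipped
name_reordered = generate_fn(localtime=1, ls=0.5)     # same params, different order

print(name_basic)      # marsdata_ls_0.5-localtime_1.txt
print(name_with_none)  # marsdata_ls_0.5.txt
print(name_reordered)  # marsdata_localtime_1-ls_0.5.txt
```

If you rely on the filename to locate saved output, pass the parameters in a consistent order.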
get(*args, max_wait=30, **kwargs)
Get with exponential backoff.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args | | Args for requests.get | () |
max_wait | int | Max seconds to wait. | 30 |
**kwargs | | Kwargs for requests.get | {} |
Returns:
Type | Description |
---|---|
requests.Response | The response from the server. |
Exceptions:
Type | Description |
---|---|
ConnectionError | If unable to connect. |
Source code in mars_mcd_helper/get_mars_data.py
def get(*args, max_wait: int = 30, **kwargs) -> requests.Response:
    """
    Get with exponential backoff.

    Args:
        *args: Args for requests.get
        max_wait (int): Max seconds to wait. (Default value = 30)
        **kwargs: Kwargs for requests.get

    Returns:
        (requests.Response): response

    Raises:
        ConnectionError: if unable to connect.
    """
    start = time.monotonic()
    # time.monotonic() counts seconds, so max_wait is compared directly (no scaling).
    i = 0
    while time.monotonic() - start < max_wait:
        try:
            return requests.get(*args, **kwargs)
        # Assumes ConnectionError is the requests exception, imported at module level (not shown).
        except ConnectionError:
            wait = (2 ** i) + (random.randint(0, 1000) / 1000)
            logger.warning(f"Failed to connect, pausing {wait} s and retrying")
            time.sleep(wait)
            i += 1  # double the base delay on the next attempt
    logger.error("Failed to fetch.")
    raise ConnectionError("Max retries exceeded.")
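The retry pattern can be exercised without a network connection. The following is a generic sketch of exponential backoff with jitter, not the module's function: `retry_with_backoff` is a hypothetical helper, it catches the built-in `ConnectionError` rather than the `requests` one, and `sleep` and `clock` are injectable purely so that the demonstration does not actually wait.

```python
import random
import time


def retry_with_backoff(func, max_wait=30, sleep=time.sleep, clock=time.monotonic):
    """Call func() until it succeeds or max_wait seconds have elapsed."""
    start = clock()
    attempt = 0
    while clock() - start < max_wait:
        try:
            return func()
        except ConnectionError:
            # Base delay of 1, 2, 4, ... seconds plus up to one second of random jitter.
            wait = (2 ** attempt) + random.random()
            sleep(wait)
            attempt += 1
    raise ConnectionError("Max retries exceeded.")


# Demonstration: a fake operation that fails twice, then succeeds.
outcomes = [ConnectionError("down"), ConnectionError("down"), "ok"]
waits = []


def flaky():
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


# Recording the waits instead of sleeping keeps the demonstration instant.
result = retry_with_backoff(flaky, sleep=waits.append)
print(result, [int(w) for w in waits])
```

The jitter spreads out retries from several clients so they do not all hit the server at the same moment after an outage.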