client.get_measurements problem #151

ColorfulQuark · 2023-01-19T10:51:17Z

Late yesterday (18 Jan) client.get_measurements stopped working for me. Logging in with client = myfitnesspal.Client() and client.get_date continue to work.

I did notice the design of the https://www.myfitnesspal.com/measurements/check-in page changed, so perhaps that's related.


TimOgden · 2023-01-20T02:19:04Z

The same thing is happening to me: no measurements are found on that page. It looks like the web scraper for this part will have to be redone. They seem to have removed any easily identifiable ids, and I don't have experience parsing XML, so hopefully someone can find a fix for this.

ColorfulQuark · 2023-01-20T13:11:51Z

What's the function to get a mfp page? Pending a fix, I'd like to retrieve https://www.myfitnesspal.com/measurements/check-in which contains the data I most need and scrape it myself. I seem to be missing something obvious.

TimOgden · 2023-01-20T19:10:49Z

@ColorfulQuark You can see the process in Client.get_measurements() in client.py line 528. self._get_url_for_measurements() returns 'https://www.myfitnesspal.com/measurements/edit?page=1&type=1' which I believe needs to be changed.

Then in line 531, we call self._get_measurement_ids(document) on the document we loaded and do some XML scraping to find the measurements on the page. This scraping is also broken because it relies on matching id attributes, which no longer seem to exist on the new page.

It'd be great if you or someone else could figure out the scraping. I tried for about an hour, but without ids it's really hard for me to find the information I'm looking for, especially because I've never worked with XPath.

ColorfulQuark · 2023-01-20T21:29:46Z

@TimOgden I was looking for the function that returns the contents of a page given its URL, something like requests.get(url), but one that also sends the cookies and whatever else is needed for authentication. There are a number of likely-looking function names in client.py, but I can't figure out which one gets the contents of a page.

https://www.myfitnesspal.com/measurements/edit?page=1&type=1 contains weight information, so it should be possible to extract that information. You can also get that page from https://www.myfitnesspal.com/measurements/edit?type=Weight&page=1 and other measurements by substituting the measurement you're looking for in the URL, e.g., type=Neck.

The data is a list of dicts: [{"id":"12345678901234","date":"2023-01-20","unit":"pounds","type":"Weight","updated_at":"2023-01-20T13:23:38Z","value":123}, ...] If nothing else, it should be possible to fetch the list and parse the dicts, presuming I can figure out how to download the contents of the relevant pages. Alas, this will only get recent entries, so we need to figure out how to get the pages with older entries.
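As a rough sketch of the "parse the dicts" step, assuming the list has already been cut out of the page as a JSON string: the sample payload below copies the shape quoted above, but the second entry and the helper name items_to_series are made up for illustration.

```python
import json

# Sample payload in the shape quoted above; ids and values are illustrative only.
sample = (
    '[{"id":"12345678901234","date":"2023-01-20","unit":"pounds","type":"Weight",'
    '"updated_at":"2023-01-20T13:23:38Z","value":123},'
    '{"id":"12345678901235","date":"2023-01-19","unit":"pounds","type":"Weight",'
    '"updated_at":"2023-01-19T12:00:00Z","value":124.5}]'
)


def items_to_series(raw):
    """Map each entry's ISO date string to its measured value."""
    return {item["date"]: item["value"] for item in json.loads(raw)}


series = items_to_series(sample)
print(series)
```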

TimOgden · 2023-01-20T22:00:43Z

Can't you use client._get_document_for_url(url)? Not sure where you found that list of dicts but that seems perfect! As for non-recent entries, seems like we have to iterate the page number in the url until we find that the table says "No measurements found".
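The page-iteration idea might look roughly like the sketch below. It is only an outline: fetch_entries is a hypothetical callable standing in for "download page N and parse its entries", and an empty result stands in for the "No measurements found" table. The max_pages cap is an added safety net, not something the site requires.

```python
from itertools import count


def collect_pages(fetch_entries, max_pages=100):
    """Call fetch_entries(1), fetch_entries(2), ... until a page comes back empty."""
    collected = []
    for page_num in count(1):
        if page_num > max_pages:
            break  # safety cap so a misbehaving page cannot loop forever
        entries = fetch_entries(page_num)
        if not entries:
            break  # stands in for the "No measurements found" page
        collected.extend(entries)
    return collected


# Fake two-page history used in place of real downloads.
fake_pages = {
    1: [("2023-01-20", 123), ("2023-01-19", 124)],
    2: [("2023-01-18", 125)],
}
result = collect_pages(lambda n: fake_pages.get(n, []))
print(result)
```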

TimOgden · 2023-01-20T22:01:59Z

Oh I just found them, good catch, that should be perfect

ColorfulQuark · 2023-01-20T22:05:24Z

def _get_document_for_url(self, url):
    content = self._get_content_for_url(url)

    return lxml.html.document_fromstring(content)

That parses the content into an lxml document. I had thought content = self._get_content_for_url(url) would do it, but for some reason it doesn't return the page I see when logged in.

TimOgden · 2023-01-20T22:15:05Z

Weird, I get the page and am logged in just fine using self._get_content_for_url(url). I can write a parser, but I would probably have to use BeautifulSoup, and it probably won't be until after this weekend. It's up to you if you want to try to figure out the issue you're facing in the meantime; maybe try clearing your cookies in Chrome, restarting Chrome, restarting the Python script, etc.

ColorfulQuark · 2023-01-20T22:24:06Z

EDIT: this is now working:

import datetime
import json
import re

import myfitnesspal

client = myfitnesspal.Client()
day = client.get_date(datetime.date.today())
print(day)

url = "https://www.myfitnesspal.com/measurements/edit?type=Weight&page=1"
data = client._get_content_for_url(url)
print(len(data))

if res := re.search(r'\[\\"idm-user-with-consents\\"]"},{"state":{"data":{"items":(.*?)]', data):
    for item in json.loads(res[1]+']'):
        print(item['date'], item['value'])
else:
    print('oops')
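One possible weakness of the pattern above is that (.*?)] stops at the first ] it sees, so an item containing a bracket would be cut short. A hedged alternative is to find the "items": marker and let json.JSONDecoder.raw_decode read exactly one JSON value from there. This sketch assumes the first "items": in the page is the right one and that the list follows it with no whitespace; the sample page string and the note field are made up.

```python
import json


def extract_items(page, marker='"items":'):
    """Decode the JSON value that immediately follows the first marker, or return None."""
    start = page.find(marker)
    if start == -1:
        return None
    # raw_decode parses one JSON value starting at the given index and ignores what follows.
    items, _end = json.JSONDecoder().raw_decode(page, start + len(marker))
    return items


# Made-up page fragment; the "note" value contains a ']' on purpose.
page = (
    'junk{"state":{"data":{"items":[{"date":"2023-01-20","value":123,"note":"a]b"}],'
    '"has_more":false}}}'
)
items = extract_items(page)
print(items)
```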

ColorfulQuark · 2023-01-21T22:35:02Z

import datetime
import json
import re
from itertools import count

import myfitnesspal


def get_day(client):
    # Print today's diary entry as a quick check that the login works.
    day = client.get_date(datetime.date.today())
    print(day)

def get_measures(client, measure_type, lower_date):
    # Collect {date: value} for one measurement type back to lower_date
    # ("YYYY-MM-DD"). Assumes each page lists entries newest first.
    data = {}
    stop = False
    for page_num in count(1):
        url = f"https://www.myfitnesspal.com/measurements/edit?type={measure_type}&page={page_num}"
        page = client._get_content_for_url(url)

        if res := re.search(r'\[\\"idm-user-with-consents\\"]"},{"state":{"data":{"items":(.*?)]', page):
            for item in json.loads(res[1]+']'):
                if item['date'] < lower_date:
                    stop = True
                    break
                data[item['date']] = item['value']
        else:
            print('oops', len(page))
        # Stop if we are done, or if the page has no usable has_more flag.
        more = re.search('"has_more":(.*?),', page)
        if stop or more is None or more[1] == 'false':
            break

    return data

def latest_measures(client):
    # Return {measurement type: latest value} from the check-in page.
    url = "https://www.myfitnesspal.com/measurements/check-in"
    page = client._get_content_for_url(url)
    res = re.search(r'{"mutations":\[\],"queries":\[{"state":{"data":{"items":(.*?)]', page)
    if res is None:
        return {}
    data = {}
    for item in json.loads(res[1]+']'):
        data[item['type']] = item['value']
    return data


client = myfitnesspal.Client()

data = latest_measures(client)
print(data)

print(data.keys()) # measurement ids

data = get_measures(client, 'Weight', '2023-01-02')
for dt, item in data.items():
    print(dt, item)

TimOgden · 2023-01-23T15:11:40Z

Sorry @ColorfulQuark, I was gone for the weekend. I just ran your script and it seems to work perfectly; it also grabs the whole dataset instead of just the first page. I can integrate this into the actual code and make a PR so it will be fixed for everyone.

ColorfulQuark · 2023-01-25T00:22:41Z

@TimOgden Sounds good. Glad you like it. With luck it will fit in with just a bit of tweaking: getting data between two dates (rather than everything back to a specified date, as mine does), adding annotations, etc. I don't think the mainline has a latest_measures function, but I find it useful.
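The "between two dates" tweak could be as small as the sketch below, which filters a {date: value} dict like the one get_measures returns. The name measures_between and the sample values are hypothetical. It relies on "YYYY-MM-DD" strings sorting in the same order as the dates they represent.

```python
import datetime


def measures_between(measures, lower, upper):
    """Keep entries whose ISO date key falls within [lower, upper], inclusive."""
    lo, hi = lower.isoformat(), upper.isoformat()
    return {day: value for day, value in measures.items() if lo <= day <= hi}


# Illustrative data only.
sample = {"2023-01-20": 123, "2023-01-10": 124, "2023-01-02": 125}
window = measures_between(sample, datetime.date(2023, 1, 5), datetime.date(2023, 1, 20))
print(window)
```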

hannahburkhardt mentioned this issue May 27, 2023

Fix get_measurements #161

Merged
