client.get_measurements problem #151

Open
ColorfulQuark opened this issue Jan 19, 2023 · 12 comments

Comments

@ColorfulQuark

ColorfulQuark commented Jan 19, 2023

Late yesterday (18 Jan) client.get_measurements stopped working for me. Logging in with client = myfitnesspal.Client() and calling client.get_date both continue to work.

I did notice the design of the https://www.myfitnesspal.com/measurements/check-in page changed, so perhaps that's related.

@TimOgden

TimOgden commented Jan 20, 2023

The same thing is happening to me: no measurements are found on that page. It looks like the web scraper for this part will have to be redone. They seem to have removed any easily identifiable ids, and I don't have experience parsing XML, so hopefully someone can find a fix for this.

@ColorfulQuark
Author

What's the function to get a mfp page? Pending a fix, I'd like to retrieve https://www.myfitnesspal.com/measurements/check-in which contains the data I most need and scrape it myself. I seem to be missing something obvious.

@TimOgden

@ColorfulQuark You can see the process in Client.get_measurements() in client.py line 528. self._get_url_for_measurements() returns 'https://www.myfitnesspal.com/measurements/edit?page=1&type=1' which I believe needs to be changed.

Then in line 531, we call self._get_measurement_ids(document) on the document we loaded and do some XML scraping to find the measurements on the page. This XML scraping is also broken because it relies on id attribute matching, and those ids don't seem to exist in the new page.

It'd be great if you or someone could figure out the XML scraping. I tried for like an hour but with the lack of ids, it's really hard for me to find the information I'm looking for, especially because I've never worked with XPath
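
For anyone else new to XPath, id-based matching looks roughly like the sketch below. The markup is made up for illustration (the real page structure isn't shown in this thread), and it uses the standard library's ElementTree, which supports only a subset of XPath, rather than the lxml calls in client.py.

```python
import xml.etree.ElementTree as ET

# Made-up markup for illustration; not the real MyFitnessPal page.
markup = """
<div>
  <select id="type">
    <option value="1">Weight</option>
    <option value="2">Neck</option>
  </select>
</div>
"""

root = ET.fromstring(markup)

# Find every <option> under the <select> whose id attribute is "type".
options = root.findall(".//select[@id='type']/option")

# Map each measurement name to its option value.
ids = {opt.text: opt.get("value") for opt in options}
print(ids)
```

Once the id attribute disappears from the page, the `[@id='type']` predicate matches nothing and the result is empty, which would fit the "no measurements found" symptom.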

@ColorfulQuark
Author

ColorfulQuark commented Jan 20, 2023

@TimOgden I was looking for the function that will return the contents of a page given the URL, something like requests.get(url), but one that transmits cookies and whatever else might be needed for authentication. There are a number of likely-looking function names in client.py, but I can't figure out how to get the contents of a page.

https://www.myfitnesspal.com/measurements/edit?page=1&type=1 contains weight information, so it should be possible to extract that information. You can also get that page from https://www.myfitnesspal.com/measurements/edit?type=Weight&page=1 and other measurements by substituting the measurement you're looking for in the URL, e.g., type=Neck.
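
A small sketch of building those URLs, assuming the type and page query parameters behave as described above. measurement_url is a hypothetical helper name, not something in the library.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.myfitnesspal.com/measurements/edit"


def measurement_url(measure_type, page=1):
    # Hypothetical helper: build the edit-page URL for one measurement type.
    query = urlencode({"type": measure_type, "page": page})
    return f"{BASE_URL}?{query}"


print(measurement_url("Weight"))
print(measurement_url("Neck", page=2))
```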

The data is a list of dicts: [{"id":"12345678901234","date":"2023-01-20","unit":"pounds","type":"Weight","updated_at":"2023-01-20T13:23:38Z","value":123}, ...] If nothing else, it should be possible to fetch the list and parse the dicts, presuming I can figure out how to download the contents of the relevant pages. Alas, this will only get recent entries, so we need to figure out how to get the pages with older entries.
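
Once the list has been extracted from the page, parsing it could look something like this. The sample string just repeats the placeholder entry above, and items_to_series is a hypothetical name.

```python
import json

# Sample payload in the shape quoted above (placeholder values).
raw = (
    '[{"id":"12345678901234","date":"2023-01-20","unit":"pounds",'
    '"type":"Weight","updated_at":"2023-01-20T13:23:38Z","value":123}]'
)


def items_to_series(raw_json):
    # Map each entry's date to its recorded value.
    return {item["date"]: item["value"] for item in json.loads(raw_json)}


series = items_to_series(raw)
print(series)
```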

@TimOgden

Can't you use client._get_document_for_url(url)? Not sure where you found that list of dicts but that seems perfect! As for non-recent entries, seems like we have to iterate the page number in the url until we find that the table says "No measurements found".
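
Roughly, that page loop could be sketched as below. fetch_page is a hypothetical callable standing in for something like client._get_content_for_url on a page-specific URL, and the sentinel text is assumed to appear verbatim in the page; the fake pages are only there so the loop can be tried offline.

```python
from itertools import count


def iter_pages(fetch_page, sentinel="No measurements found", max_pages=1000):
    # fetch_page is a hypothetical callable: page number -> page text.
    # max_pages is a safety cap so a changed page layout cannot loop forever.
    for page_num in count(1):
        if page_num > max_pages:
            break
        text = fetch_page(page_num)
        if sentinel in text:
            break
        yield page_num, text


# Offline stand-in for the real fetch, just to exercise the loop.
fake_pages = {1: "page one data", 2: "page two data"}


def fake_fetch(page_num):
    return fake_pages.get(page_num, "No measurements found")


pages = list(iter_pages(fake_fetch))
print([num for num, _ in pages])
```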

@TimOgden

Oh I just found them, good catch, that should be perfect

@ColorfulQuark
Author

def _get_document_for_url(self, url):
    content = self._get_content_for_url(url)

    return lxml.html.document_fromstring(content)

That parses XML. I had thought content = self._get_content_for_url(url) would do it, but for some reason it doesn't return the page I see when logged in.

@TimOgden

Weird, it seems like I get the page and am logged in just fine using self._get_content_for_url(url). I can write a parser, but I would probably have to use BeautifulSoup and it probably won't be until after this weekend. So it's up to you if you want to try to figure out the issue you're facing with that; maybe try clearing your cookies in Chrome, restarting Chrome, restarting the Python script, etc.

@ColorfulQuark
Author

ColorfulQuark commented Jan 20, 2023

EDIT: this is now working:

import datetime
import json
import re

import myfitnesspal

client = myfitnesspal.Client()
day = client.get_date(datetime.date.today())
print(day)

url = "https://www.myfitnesspal.com/measurements/edit?type=Weight&page=1"
data = client._get_content_for_url(url)
print(len(data))

# The entries are embedded in the page as a JSON array; capture up to the first ']'.
if res := re.search(r'\[\\"idm-user-with-consents\\"]"},{"state":{"data":{"items":(.*?)]', data):
    for item in json.loads(res[1]+']'):
        print(item['date'], item['value'])
else:
    print('oops')

@ColorfulQuark
Author

import datetime
import json
import re
from itertools import count

import myfitnesspal


def get_day(client):
    day = client.get_date(datetime.date.today())
    print(day)

def get_measures(client, measure_type, lower_date):
    # Collect {date: value} for one measurement type, going back to lower_date ("YYYY-MM-DD").
    data = {}
    stop = False
    for page_num in count(1):
        url = f"https://www.myfitnesspal.com/measurements/edit?type={measure_type}&page={page_num}"
        page = client._get_content_for_url(url)

        if res := re.search(r'\[\\"idm-user-with-consents\\"]"},{"state":{"data":{"items":(.*?)]', page):
            for item in json.loads(res[1]+']'):
                # ISO dates compare correctly as plain strings
                if item['date'] < lower_date:
                    stop = True
                    break
                data[item['date']] = item['value']
        else:
            print('oops', len(page))
        # Stop if has_more is missing too, rather than crashing on a None match
        more = re.search(r'"has_more":(.*?),', page)
        if stop or more is None or more[1] == 'false':
            break

    return data

def latest_measures(client):
    # Return {measurement type: latest value} from the check-in page.
    url = "https://www.myfitnesspal.com/measurements/check-in"
    page = client._get_content_for_url(url)
    res = re.search(r'{"mutations":\[\],"queries":\[{"state":{"data":{"items":(.*?)]', page)
    if res is None:
        # Page layout not recognized
        print('oops', len(page))
        return {}
    data = {}
    for item in json.loads(res[1]+']'):
        data[item['type']] = item['value']
    return data


client = myfitnesspal.Client()

data = latest_measures(client)
print(data)

print(data.keys())  # measurement ids

data = get_measures(client, 'Weight', '2023-01-02')
for dt, item in data.items():
    print(dt, item)

@TimOgden

Sorry @ColorfulQuark, I was gone for the weekend. I just ran your script and it seems to work perfectly; it also grabs the whole dataset instead of just the first page. I can integrate this into the actual code and make a PR so it will be fixed for everyone.

@ColorfulQuark
Author

@TimOgden Sounds good. Glad you like it. With luck it will fit in with just a bit of tweaking to just get data between two dates (rather than my everything back to a specified date), add annotations, etc. I don't think the mainline has a latest_measures function, but I find it useful.
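
The between-two-dates tweak might look something like this, assuming the data is the {date string: value} dict that get_measures returns. between_dates is a hypothetical name, and this version filters after fetching rather than stopping the page loop early.

```python
import datetime


def between_dates(data, lower, upper):
    # data maps ISO date strings ("YYYY-MM-DD") to values.
    # ISO date strings sort chronologically, so string comparison is enough.
    lo, hi = lower.isoformat(), upper.isoformat()
    return {d: v for d, v in data.items() if lo <= d <= hi}


sample = {"2023-01-20": 123, "2023-01-10": 124, "2022-12-31": 125}
selected = between_dates(sample, datetime.date(2023, 1, 1), datetime.date(2023, 1, 15))
print(selected)
```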


2 participants