-
Notifications
You must be signed in to change notification settings - Fork 38
Fix string length computation for issue_command function #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
+ Determine the Python version of current environment + Deal with the args according to py2 or py3
|
@quique0194, @pjrobertson: # -*- coding: utf-8 -*-
import sys
import dryscrape
dryscrape.start_xvfb()
br = dryscrape.Session()
br.visit("https://google.com")
if sys.version_info[0] == 2:
PY2 = True
PY3 = False
if sys.version_info[0] == 3:
PY2 = False
PY3 = True
def test_py2():
q = br.at_xpath('//*[@name="q"]')
q.set(u"canción")
q.form().submit()
br.render("py2-unicode.png")
q = br.at_xpath('//*[@name="q"]')
q.set("北京")
q.form().submit()
br.render("py2-str.png")
q = br.at_xpath('//*[@name="q"]')
q.set(b"123")
q.form().submit()
br.render("py2-bytes.png")
def test_py3():
q = br.at_xpath('//*[@name="q"]')
q.set("北京")
q.form().submit()
br.render("py3-unicode.png")
q = br.at_xpath('//*[@name="q"]')
q.set("canción".encode("utf-8"))
q.form().submit()
br.render("py3-bytes.png")
if __name__ == "__main__":
if PY2:
test_py2()
if PY3:
test_py3()
print("Done") |
| """ Deal with the unicode and bytes problem in Python 2 and Python 3 for socket.sendall """ | ||
|
|
||
| if PY2: | ||
| if type(arg) != unicode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this raise an NameError in Python 3 since there's no unicode type in Python 3.
But I've tested it in Python 3 and it worked.
Would be appreciate if anyone has some advices on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this would be a problem in other languages such as C++, where identifiers need to be defined at compilation time even if they are never used at runtime, this should not be an issue for Python, since here definitions of identifiers are only checked when encountered, and not during the syntax check.
In Python, it is also common to import packages only when they are required, and therefore import statements can be nested into if statements, indicating that an identifier does not have to be defined until it is encountered by the interpreter.
Therefore, I do not expect to find any problems with this line of code. In addition, I have successfully tested the suggested changes while observing no error messages or warnings.
webkit_server.py
Outdated
| arg = str(arg) | ||
| self._writeline(str(len(arg))) | ||
| self._sock.sendall(arg.encode("utf-8")) | ||
| self.send_arg(arg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file seems to be using an indentation of 2 spaces rather than 4 consistently throughout the file. I suggest correcting the indentation of line 523.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've fixed it in this commit 62ad2e9.
|
I came across the same problem, and found a very similar solution. I can confirm that the fundamental problem is the computed length of each argument as I have reviewed the solution presented in this pull request. I find the updated code well written, and have successfully tested the updated version with both Python2 and Python3 using the provided example script from Dryscrape, as well as a script that I had written myself for a different purpose. In response to @M157q's comment on the use of the I suggest to the maintainers of this repository to merge this pull request #33, to reject pull requests #28 and #31, and to mark issues #27 and #30 as fixed (or duplicates where appropriate). |
To fit the coding style of this project. ref: niklasb#33 (comment)
|
@M157q thank you for the patch and @PhilippVeerport thank you for your detailed analysis. I will try to look at this ASAP. |
|
@M157q @PhilippVerpoort Thank you for your contribution. Unfortunately I have decided to no longer maintain dryscrape or webkit-server, since properly doing so would require rewriting webkit-server using QtWebEngine. Consider using a different package for your scraping if you do not want to be vulnerable to old WebKit bugs. |
|
@niklasb Thanks for your reply. This is completely fair, but would you consider implementing this change and filing a new release with this bug fix anyway? At least people who do consider for some reason to use this piece of software will not have to go through the same process of manually applying this patch. |
When
argis not a ascii char,len(arg)may be differrent fromlen(arg.encode("utf-8")).For example, a Chinese character
'哈',len('哈')is 1, butlen('哈'.encode("utf-8"))is 3,which will raise an
InvalidResponseErrorbecause of the wrong length.This PR should fix this kind of problem.