This may be a ticket that needs to be broken into several tickets, but it seemed easier to open one and split off new ones as needed than to file a bunch up front.
Support for text files is not documented. I was surprised to see txt files copied and then modified. This actually fails when the files are not UTF-8 and/or the system locale is not UTF-8, because the JSON payload added at the head of the file won't match the encoding of the rest of the file. The hack below got it working on my machine. I'm still unclear on the why here (I suspect backtracking support).
Some Linux installs default to 7-bit us-ascii
Many Windows installations in the US and Western Europe default to
The error count at the end of an import is confusing. In my case I'm pretty confident it is a "skip" count, where the files were skipped because they already exist (i.e. duplicate avoidance).
Basically, preserve the existing data (in whatever encoding it's in) but add a UTF-8 encoded first line followed by a newline.
$ git diff elodie/media/text.py
diff --git a/elodie/media/text.py b/elodie/media/text.py
index 4e3c6bb..54b9d33 100644
--- a/elodie/media/text.py
+++ b/elodie/media/text.py
@@ -145,8 +145,15 @@ class Text(Base):
if source is None:
return None
+ # FIXME / TODO document why this is being done. A *.txt file is opened BUT only the first line is read and then assumed to be a complete, valid payload? Why do the IO, and what purpose does it serve?
with open(source, 'r') as f:
- first_line = f.readline().strip()
+ #first_line = f.readline().strip()
+ try:
+ first_line = f.readline().strip()
+ except UnicodeDecodeError:
+ print('file %r UnicodeDecodeError' % source)
+ #raise
+ return None # seems to be in keeping with other exit points
try:
parsed_json = loads(first_line)
@@ -191,6 +198,9 @@ class Text(Base):
copyfileobj(f_read, f_write)
else:
# Prepend the metadata to the file
+ #print('DEBUG %r' % metadata_as_json)
+ #print('DEBUG %r' % type(metadata_as_json))
+ """
with open(source, 'r') as f_read:
original_contents = f_read.read()
with open(source, 'w') as f_write:
@@ -198,6 +208,13 @@ class Text(Base):
metadata_as_json,
original_contents)
)
+ """
+ with open(source, 'rb') as f_read:
+ original_contents = f_read.read()
+ with open(source, 'wb') as f_write:
+ f_write.write(metadata_as_json.encode('utf8')) # write first line json (utf-8 encoded) header
+ f_write.write(original_contents) # what ever format was already there
+
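The byte-preserving approach in the patch above can be sketched as a standalone pair of helpers. This is only a minimal illustration, not elodie's actual code: the names `prepend_json_header` and `read_json_header` are hypothetical, and the sketch assumes the header is a single JSON object on the first line, terminated by an explicit newline.

```python
import json


def prepend_json_header(path, metadata):
    """Prepend a UTF-8 JSON line, leaving the existing bytes untouched."""
    # json.dumps emits ASCII by default, so encoding to UTF-8 cannot fail.
    header = json.dumps(metadata).encode('utf-8') + b'\n'
    with open(path, 'rb') as f_read:
        original_contents = f_read.read()  # raw bytes, whatever the encoding
    with open(path, 'wb') as f_write:
        f_write.write(header)
        f_write.write(original_contents)


def read_json_header(path):
    """Return the first-line JSON header as a dict, or None if absent."""
    with open(path, 'rb') as f:
        first_line = f.readline().strip()
    try:
        parsed = json.loads(first_line.decode('utf-8'))
    except ValueError:
        # UnicodeDecodeError and JSON decoding errors both subclass ValueError.
        return None
    return parsed if isinstance(parsed, dict) else None
```

Because both helpers open the file in binary mode, the system locale never takes part in decoding, so a Latin-1 or cp1252 body no longer raises `UnicodeDecodeError` when only the header line is needed.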
error report doc comments
$ git diff elodie/result.py
diff --git a/elodie/result.py b/elodie/result.py
index 3fa7851..650411d 100644
--- a/elodie/result.py
+++ b/elodie/result.py
@@ -15,7 +15,7 @@ class Result(object):
if status:
self.success += 1
else:
- self.error += 1
+ self.error += 1 # which may simply mean skipped, not an actual error!
self.error_items.append(id)
def write(self):
@@ -32,7 +32,7 @@ class Result(object):
headers = ["Metric", "Count"]
result = [
["Success", self.success],
- ["Error", self.error],
+ ["Error", self.error], # which may simply mean skipped, not an actual error!
]
print("****** SUMMARY ******")
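Beyond documenting the ambiguity in comments, one possible direction is to count skips separately from real errors. The sketch below is hypothetical and is not the project's `Result` class: the `skipped` flag, the `summary` method, and the assumption that a row is an `(id, status)` pair are all illustrative.

```python
class Result(object):
    """Hypothetical counter that tracks skipped files apart from errors."""

    def __init__(self):
        self.success = 0
        self.skipped = 0
        self.error = 0
        self.error_items = []

    def append(self, row, skipped=False):
        # Assumption: each row is an (id, status) pair.
        id, status = row
        if status:
            self.success += 1
        elif skipped:
            # Already-imported duplicates land here, not in the error count.
            self.skipped += 1
        else:
            self.error += 1
            self.error_items.append(id)

    def summary(self):
        """Return rows suitable for a Metric/Count table."""
        return [
            ["Success", self.success],
            ["Skipped", self.skipped],
            ["Error", self.error],
        ]
```

With a split like this, a re-run over an already imported folder would report a high "Skipped" count and zero errors, which is closer to what actually happened.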
BTW thanks for making this available. It's implemented a bunch of stuff I don't want to implement myself so saved me a bunch of work (even with the current reverse geocode issue with MapQuest API) :-)
text file hack
Superseded by #441 to resolve #440