using/doc issues #424

clach04 · 2022-11-14T18:44:30Z

This may be a ticket that needs to be broken into many tickets, but I thought it easier to open new ones as needed rather than a bunch up front.

Support for text files is not documented - I was surprised to see txt files copied and then modified. This actually fails when the files are not UTF-8 and/or the system locale does not match UTF-8, as the JSON payload added to the head won't match. Hack below to get it working on my machine. I'm still unclear on the why here (I suspect backtracking support).
- Some Linux installs default to 7-bit us-ascii
- Many Windows installations in the US and Western Europe default to
The error count at the end of an import is confusing. In my case I'm pretty confident this is a "skip" count, where the files were skipped because they already exist (i.e. duplicate avoidance).
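
To make the encoding failure concrete, here is a minimal sketch of a locale-independent way to read the first-line JSON header. The helper name `read_json_header` and the sample file contents are assumptions for illustration only, not elodie's actual API. The file is opened in binary mode and the first line is decoded explicitly as UTF-8, so the result should not depend on the system's default encoding.

```python
import json
import os
import tempfile


def read_json_header(path):
    """Return the JSON object on the first line of path, or None.

    Hypothetical helper, not part of elodie. Binary mode keeps the
    read independent of the locale's default text encoding.
    """
    with open(path, 'rb') as f:
        first_line = f.readline().strip()
    try:
        parsed = json.loads(first_line.decode('utf-8'))
    except (UnicodeDecodeError, ValueError):
        # Not valid UTF-8, or not JSON: treat as "no metadata header".
        return None
    return parsed if isinstance(parsed, dict) else None


# A file in a legacy single-byte encoding: byte 0xE9 is invalid as UTF-8.
fd, legacy_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'caf\xe9 notes\nsecond line\n')

# A file that already carries a JSON header on its first line.
fd, tagged_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'{"title": "demo"}\nbody\n')

legacy_header = read_json_header(legacy_path)
tagged_header = read_json_header(tagged_path)
print(legacy_header)
print(tagged_header)

os.remove(legacy_path)
os.remove(tagged_path)
```

Returning `None` for an undecodable first line mirrors the other early exits in the patched method, instead of letting `UnicodeDecodeError` abort the import.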

text file hack

Superseded by #441 to resolve #440

Basically, preserve the existing data (in whatever encoding it's in) but add a UTF-8 encoded first line with a newline.

$ git diff elodie/media/text.py
diff --git a/elodie/media/text.py b/elodie/media/text.py
index 4e3c6bb..54b9d33 100644
--- a/elodie/media/text.py
+++ b/elodie/media/text.py
@@ -145,8 +145,15 @@ class Text(Base):
         if source is None:
             return None

+        # FIXME / TODO document why this is being done. A *.txt file is being opened BUT only the first line is read and then assumed to be a complete valid payload? Why do the IO, what purpose does this serve? What possible sort of usefulness does this offer?
         with open(source, 'r') as f:
-            first_line = f.readline().strip()
+            #first_line = f.readline().strip()
+            try:
+                first_line = f.readline().strip()
+            except UnicodeDecodeError:
+                print('file %r UnicodeDecodeError' % source)
+                #raise
+                return None  # seems to be in keeping with other exit points

         try:
             parsed_json = loads(first_line)
@@ -191,6 +198,9 @@ class Text(Base):
                     copyfileobj(f_read, f_write)
         else:
             # Prepend the metadata to the file
+            #print('DEBUG %r' % metadata_as_json)
+            #print('DEBUG %r' % type(metadata_as_json))
+            """
             with open(source, 'r') as f_read:
                 original_contents = f_read.read()
                 with open(source, 'w') as f_write:
@@ -198,6 +208,13 @@ class Text(Base):
                         metadata_as_json,
                         original_contents)
                     )
+            """
+            with open(source, 'rb') as f_read:
+                original_contents = f_read.read()
+                with open(source, 'wb') as f_write:
+                    f_write.write(metadata_as_json.encode('utf8'))  # write first line json (utf-8 encoded) header
+                    f_write.write(original_contents)  # what ever format was already there
+
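
The byte-level idea in the hack can also be sketched as a standalone function. This is only an illustration under assumptions: `prepend_json_header` is a hypothetical name, the metadata dict is made up, and the sketch appends the newline explicitly, as the description above calls for. The existing bytes are never decoded, so their original encoding is left untouched.

```python
import json
import os
import tempfile


def prepend_json_header(path, metadata):
    """Prepend metadata as one UTF-8 JSON line; keep existing bytes as-is.

    Hypothetical helper, not elodie's API.
    """
    header = json.dumps(metadata).encode('utf-8') + b'\n'
    # Read everything first: opening with 'wb' truncates the file.
    with open(path, 'rb') as f_read:
        original_contents = f_read.read()
    with open(path, 'wb') as f_write:
        f_write.write(header)             # UTF-8 encoded JSON first line
        f_write.write(original_contents)  # whatever encoding was there


# Demo with content that is not valid UTF-8 (byte 0xE9).
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'caf\xe9\n')

prepend_json_header(demo_path, {"title": "demo"})

with open(demo_path, 'rb') as f:
    result_bytes = f.read()
print(result_bytes)

os.remove(demo_path)
```

Note that the sketch holds the whole file in memory, like the hack does; for very large text files a temporary file plus rename might be a safer choice.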

error report doc comments

$ git diff elodie/result.py
diff --git a/elodie/result.py b/elodie/result.py
index 3fa7851..650411d 100644
--- a/elodie/result.py
+++ b/elodie/result.py
@@ -15,7 +15,7 @@ class Result(object):
         if status:
             self.success += 1
         else:
-            self.error += 1
+            self.error += 1  # which may simply mean skipped, not an actual error!
             self.error_items.append(id)

     def write(self):
@@ -32,7 +32,7 @@ class Result(object):
         headers = ["Metric", "Count"]
         result = [
                     ["Success", self.success],
-                    ["Error", self.error],
+                    ["Error", self.error],  # which may simply mean skipped, not an actual error!
                  ]

         print("****** SUMMARY ******")
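
One possible way to remove the ambiguity, rather than only commenting on it, would be to count skips separately from real failures. The sketch below is hypothetical: the class `ResultSketch`, its `record` method, and the status strings are assumptions for illustration and do not reflect elodie's actual `Result` interface.

```python
class ResultSketch(object):
    """Hypothetical counter that separates skipped items from errors."""

    def __init__(self):
        self.success = 0
        self.skipped = 0
        self.error = 0
        self.error_items = []

    def record(self, item_id, status):
        # status is one of 'success', 'skipped', 'error' (assumed values).
        if status == 'success':
            self.success += 1
        elif status == 'skipped':
            self.skipped += 1  # e.g. destination already exists
        else:
            self.error += 1
            self.error_items.append(item_id)

    def rows(self):
        # Same "Metric" / "Count" shape as the summary table above.
        return [
            ["Success", self.success],
            ["Skipped", self.skipped],
            ["Error", self.error],
        ]


summary = ResultSketch()
summary.record('a.jpg', 'success')
summary.record('b.txt', 'skipped')
summary.record('c.txt', 'error')
summary_rows = summary.rows()
print(summary_rows)
```

With a split like this, only genuine failures would end up in `error_items`, so the final report would not suggest that duplicates are errors.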


clach04 · 2022-11-14T18:47:09Z

BTW thanks for making this available. It's implemented a bunch of stuff I don't want to implement myself so saved me a bunch of work (even with the current reverse geocode issue with MapQuest API) :-)

clach04 mentioned this issue Dec 11, 2022

Crash on text file NOT containing us-ascii or utf-8 in first line #440

Open
