Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 18 additions & 3 deletions src/extract-kindle-book.ts
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@ import {
parseJsonpResponse
} from './utils'

import * as os from 'os'

interface PageNav {
page?: number
location?: number
Expand Down Expand Up @@ -44,11 +46,23 @@ async function main() {
const krRendererMainImageSelector = '#kr-renderer .kg-full-page-img img'
const bookReaderUrl = `https://read.amazon.com/?asin=${asin}`

//Switch for multi-OS operation
const getChromeExecutablePath = () => {
switch (os.platform()) {
case 'win32':
return 'C:/Program Files/Google/Chrome/Application/chrome.exe';
case 'darwin':
return '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome';
default:
return '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome';
}
}
const chromePath = getChromeExecutablePath();
Comment on lines +50 to +60
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldnt merge this in this PR as it's unrelated, and hardcoding paths is not guaranteed, e.g. i do not use chrome and use a custom build of chromium


const context = await chromium.launchPersistentContext(userDataDir, {
headless: false,
channel: 'chrome',
executablePath:
'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
executablePath: chromePath,
args: ['--hide-crash-restore-bubble'],
ignoreDefaultArgs: ['--enable-automation'],
deviceScaleFactor: 2,
Expand Down Expand Up @@ -259,7 +273,8 @@ async function main() {
if (pageNav?.page === undefined) {
break
}
if (pageNav.page > totalContentPages) {
if (pageNav.page >= totalContentPages) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this in one of the books I extracted, but I had to do (totalContentPages - 1) otherwise it would keep trying to go to a last page that couldn't be navigated to.

It's strange because the kindle app calculates the total content pages the same, but you can't navigate to that last page. It then gets stuck in a loop of screenshotting the same page and trying to navigate to the next.

With more time, a smarter solution would probably be checking if the current page has already been navigated to, then break

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, in my testing it seemed to be slightly different for different books. i never had the chance to improve the robustness, but I like the idea of trying to get the current page ID and breaking if it's been visited

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some books do not include page numbers. I stumbled across this project while doing some research for another, so I decided to see if I could add some robustness to extract-kindkle-book. Seems to work well with three different books I tried, one is a biblical commentary, and the other two were both business books. I hope it helps.

console.log("Last page reached.")
break
}

Expand Down