Augment non-ISBN ASIN BWB records with BookWorm data #8903
For books, Amazon usually returns ISBN 10s as its ASIN, but sometimes, particularly in the case of ebooks, they may be numbers starting with `B`, rather than ISBN 10s. This commit keeps the non-ISBN 10 ASIN in the `source_records: ["amazon:asin"]` line, but prevents them from entering the `isbn_10` field. It also prevents `isbn_10` and `isbn_13` from having truthy `[None]` values.
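A minimal sketch of the behaviour described in this commit message, not the code from the PR. The helper names `is_isbn_10` and `build_identifier_fields` are hypothetical, and the record layout is assumed from the `source_records: ["amazon:asin"]` line mentioned above.

```python
import re
from typing import Optional


def is_isbn_10(value: str) -> bool:
    """Return True if value has the shape and checksum of an ISBN-10."""
    if not re.fullmatch(r"[0-9]{9}[0-9Xx]", value):
        return False
    check = 10 if value[9] in "Xx" else int(value[9])
    # Weights run 10, 9, ..., 2 over the first nine digits; check digit has weight 1.
    total = sum(int(d) * w for d, w in zip(value[:9], range(10, 1, -1))) + check
    return total % 11 == 0


def build_identifier_fields(asin: Optional[str]) -> dict:
    """Hypothetical helper: map an Amazon ASIN onto import-record fields."""
    fields: dict = {}
    if not asin:
        # Leave the keys out entirely; a list such as [None] is truthy
        # and would look like real data to later checks.
        return fields
    fields["source_records"] = [f"amazon:{asin}"]
    if is_isbn_10(asin):
        # Only ISBN-shaped ASINs enter isbn_10; B* ASINs stay in source_records.
        fields["isbn_10"] = [asin]
    return fields
```

With this sketch, a `B*` ASIN is kept in `source_records` but never appears under `isbn_10`, and a missing ASIN produces no identifier keys at all.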
I think the code is probably all good, the one potential blocker (isbn_13 -> isbn_10 not always a thing) seems like it shouldn't break anything.
We may be able to DRY up the code by having a discrete union of cases where something is exactly one of an isbn13, isbn10, or asin, and never more than one (e.g. 1234567890 is an isbn10 and not an asin, B127939 is an asin).
Feel free to try a data class :)
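One possible shape for such a data class, as a sketch only. `IdKind` and `BookIdentifier` are hypothetical names, the checks are shape-only (no checksum validation), and the sketch assumes `B*` ASINs are always 10 characters long.

```python
import re
from dataclasses import dataclass
from enum import Enum


class IdKind(Enum):
    ISBN_13 = "isbn_13"
    ISBN_10 = "isbn_10"
    ASIN = "asin"


@dataclass(frozen=True)
class BookIdentifier:
    """Hypothetical value object: every identifier has exactly one kind."""

    value: str
    kind: IdKind

    @classmethod
    def parse(cls, raw: str) -> "BookIdentifier":
        value = raw.replace("-", "").strip().upper()
        # The patterns are mutually exclusive, so at most one branch can match.
        if re.fullmatch(r"97[89][0-9]{10}", value):
            return cls(value, IdKind.ISBN_13)
        if re.fullmatch(r"[0-9]{9}[0-9X]", value):
            return cls(value, IdKind.ISBN_10)
        if re.fullmatch(r"B[0-9A-Z]{9}", value):
            return cls(value, IdKind.ASIN)
        raise ValueError(f"not an ISBN-13, ISBN-10, or B* ASIN: {raw!r}")
```

Callers could then branch on `identifier.kind` instead of carrying names like `isbn_10s_or_asins` around.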
@@ -993,6 +1001,13 @@ def load(rec, account_key=None, from_marc_record: bool = False):

    normalize_import_record(rec)

    # For recs with a non-ISBN ASIN, supplement the record with BookWorm data.
I don't really understand what BookWorm data is. The code appears to be looking for pre-existing "staged or pending" import items by ASIN, regardless of the original source?
I will fix that comment because it's actually supplementing with data from the `import_item` table, which could be ISBNdb data, it could be from Amazon's Product API, or another source. Thanks for pointing this out.
And indeed, for any `rec` with a `B*` ASIN, this will try to supplement that `rec` by using a match from the `import_item` table. This could easily be more limited if desired.
As for what BookWorm is, that's the name for the affiliate-server.
scripts/affiliate_server.py
     """
     Call the Amazon API to get the products for a list of isbn_10s and store
     each product in memcache using amazon_product_{isbn_13} as the cache key.
     """
-    logger.info(f"process_amazon_batch(): {len(isbn_10s)} items")
+    logger.info(f"process_amazon_batch(): {len(isbn_10s_or_asins)} items")
Some of the Amazon-specific terminology in the comments and naming may be a bit confusing. An ASIN is a 10-character identifier which is often an ISBN-10. Variable names like `isbn_10s_or_asins` could simply be `asin` if that's what the Amazon API is returning, but from the surrounding context I'm not sure.
Referring to all ASINs as ISBN-10s is going to be wrong sometimes (unless they are pre-filtered ASINs), but calling them "ISBN-10s or ASINs" is either redundant, or implies that some of the values are ISBN-10s which do not correspond to ASINs on Amazon.com, and I don't think that's what's happening in this module?
I thought the Amazon API was clear with its terminology, distinguishing between ISBNs and ASINs?
I will try to fix up the nomenclature. I agree it's not going to win awards. :)
    from openlibrary.core.imports import ImportItem  # Evade circular import.

    if item := ImportItem.find_staged_or_pending([non_isbn_asin]).first():
        rec = json.loads(item.get("data", '{}')) | rec
The PR title and description say "augment" and "supplement", but this looks like it simply replaces the record if a match is found. I don't know if it'll make much difference in practice, but it'll help to be clear about what is happening. I don't think anyone has investigated whether the source Amazon record is guaranteed to be complete...
Now that I think of it, it needs to be augmented, because at the very least the BWB SKU and source record from the original record need to be carried forward to enable the barcode matching we need for scanning. This might be a blocker.
I may be repeating what you've just stated is the problem, so if that's the case, my apologies. I just want to get on the same page first.
What the highlighted code section is hopefully doing is having the incoming `rec` data clobber fields that exist in both `rec` and whatever was found in the `import_item` table by way of `ImportItem.find_staged_or_pending()`.
E.g. in the following, the `publish_date` from `staged_import_item` is added to the returned dictionary, but the `title` is not:
>>> staged_import_item = {"title": "A Great Book %!-", "identifiers": {"amazon": ["B012345678"]} , "publish_date": "2023"}
>>> rec = {"title": "A Great Book", "identifiers": {"amazon": ["B012345678"]}}
>>> staged_import_item | rec
{'title': 'A Great Book', 'identifiers': {'amazon': ['B012345678']}, 'publish_date': '2023'}
In this way, `rec` should be supplemented with whatever was found in the match from `import_item`, but no fields in `rec` should be overwritten.
@scottbarnes That `|` is doing a lot of work, and it's hard to tell how the problem data is going to be processed. I don't have a good picture of what keys each of the two `rec`s has.
Here are some real samples of promise item data from one of the promise items for testing:
wget https://archive.org/download/bwb_daily_pallets_2023-11-02/DailyPallets__2023-11-02.json
First 10 no-Author no-Date B0* ASINs from promise item bwb_daily_pallets_2023-11-02 :
sed 's/^\[//;s/},{/}\n{/g;s/\]$//' DailyPallets__2023-11-02.json | grep B0 | grep '"Author":""' | grep '"PublicationDate":null' | head
{"BookBarcode":"KT-047-072","PackedLocation":"Dunfermline","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"KT-047-072","BookSKU":"","ISBN":"BWBM57920566","ASIN":"B0007IX3CM","ProductJSON":{"ISBN":"BWBM57920566","ASIN":"B0007IX3CM","Title":"The economics of \" under-developed \" areas;: An annotated reading list of books, articles, and official publications","MasterProductId":"57920566","BookId":"227351119","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"KT-017-548","PackedLocation":"Dunfermline","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"KT-017-548","BookSKU":"","ISBN":"BWBM57920580","ASIN":"B001W8W7EQ","ProductJSON":{"ISBN":"BWBM57920580","ASIN":"B001W8W7EQ","Title":"The New Cambridge Bibliography of English Literature","MasterProductId":"57920580","BookId":"227351595","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"KT-085-946","PackedLocation":"Dunfermline","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"KT-085-946","BookSKU":"","ISBN":"BWBM57920582","ASIN":"B000QV8BMM","ProductJSON":{"ISBN":"BWBM57920582","ASIN":"B000QV8BMM","Title":"The Barbarians: Warriors and Wars of the Dark Ages","MasterProductId":"57920582","BookId":"227351630","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-518","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-518","BookSKU":"","ISBN":"BWBM57920590","ASIN":"B000OSM4I4","ProductJSON":{"ISBN":"BWBM57920590","ASIN":"B000OSM4I4","Title":"The Next Move: Current Events in Bible Prophecy","MasterProductId":"57920590","BookId":"227351789","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-521","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-521","BookSKU":"","ISBN":"BWBM57920592","ASIN":"B0017XK6GW","ProductJSON":{"ISBN":"BWBM57920592","ASIN":"B0017XK6GW","Title":"The Experience","MasterProductId":"57920592","BookId":"227351786","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-524","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-524","BookSKU":"","ISBN":"BWBM57920595","ASIN":"B01JCR2WES","ProductJSON":{"ISBN":"BWBM57920595","ASIN":"B01JCR2WES","Title":"Nandi customary law","MasterProductId":"57920595","BookId":"227351962","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-551","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-551","BookSKU":"","ISBN":"BWBM57920622","ASIN":"B0000EE0HG","ProductJSON":{"ISBN":"BWBM57920622","ASIN":"B0000EE0HG","Title":"VISIONS OF AFRICA","MasterProductId":"57920622","BookId":"227352471","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCB-679","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCB-679","BookSKU":"","ISBN":"BWBM57920626","ASIN":"B004GF3UUW","ProductJSON":{"ISBN":"BWBM57920626","ASIN":"B004GF3UUW","Title":"Eusebius Werke, Achter Band: Die Praeparatio Evangelica, Erster Teil: Einleitung, Die Bucher I bis X (Die Griechischen Christlichen Schriftsteller der Ersten Jahrhunderte 8\/1)","MasterProductId":"57920626","BookId":"227352681","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-555","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-555","BookSKU":"","ISBN":"BWBM57920630","ASIN":"B0046GNRKO","ProductJSON":{"ISBN":"BWBM57920630","ASIN":"B0046GNRKO","Title":"TOPICS IN EAST AFRICAN HISTORY 1000-1970.","MasterProductId":"57920630","BookId":"227352779","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCB-681","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCB-681","BookSKU":"","ISBN":"BWBM57920633","ASIN":"B001KVTCLG","ProductJSON":{"ISBN":"BWBM57920633","ASIN":"B001KVTCLG","Title":"Hegemonius, Acta Archelai","MasterProductId":"57920633","BookId":"227352891","Author":"","Publisher":null,"PublicationDate":null}}
Random sample of 10:
sed 's/^\[//;s/},{/}\n{/g;s/\]$//' DailyPallets__2023-11-02.json | grep B0 | grep '"Author":""' | grep '"PublicationDate":null' | shuf -n10
{"BookBarcode":"P8-CQU-196","PackedLocation":"Reno","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"P8-CQU-196","BookSKU":"","ISBN":"BWBM57921210","ASIN":"B000YBOQM8","ProductJSON":{"ISBN":"BWBM57921210","ASIN":"B000YBOQM8","Title":"Revelations of an Opera Manager in 19th Century America","MasterProductId":"57921210","BookId":"227392511","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-BJH-243","PackedLocation":"York","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-BJH-243","BookSKU":"","ISBN":"BWBM57921011","ASIN":"B09CSCZ3RD","ProductJSON":{"ISBN":"BWBM57921011","ASIN":"B09CSCZ3RD","Title":"Letters from Madame la Marquise de Sevigne. Selected, translated, and introduced by Violet Hammersley. With a preface by W. Somerset Maugham","MasterProductId":"57921011","BookId":"227376486","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"P9-AHG-152","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"P9-AHG-152","BookSKU":"","ISBN":"BWBM57921312","ASIN":"B00KFT10UA","ProductJSON":{"ISBN":"BWBM57921312","ASIN":"B00KFT10UA","Title":"A Taste of Heaven - United Methodist Women Church Cookbook, North Carolina Cook Book","MasterProductId":"57921312","BookId":"227403938","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"P8-CQP-559","PackedLocation":"Reno","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"P8-CQP-559","BookSKU":"","ISBN":"BWBM57921089","ASIN":"B009NNX69W","ProductJSON":{"ISBN":"BWBM57921089","ASIN":"B009NNX69W","Title":"Baby Looney Tunes I Play","MasterProductId":"57921089","BookId":"227380126","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"P9-AYU-600","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"P9-AYU-600","BookSKU":"","ISBN":"BWBM57921206","ASIN":"B077KLGN71","ProductJSON":{"ISBN":"BWBM57921206","ASIN":"B077KLGN71","Title":"Born in Paradise","MasterProductId":"57921206","BookId":"227392442","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"P8-CRU-380","PackedLocation":"Reno","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"P8-CRU-380","BookSKU":"","ISBN":"BWBM57920894","ASIN":"B0006RX9JG","ProductJSON":{"ISBN":"BWBM57920894","ASIN":"B0006RX9JG","Title":"Sharing your faith with people of other faiths","MasterProductId":"57920894","BookId":"227372026","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-ATC-548","PackedLocation":"Reno","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-ATC-548","BookSKU":"","ISBN":"BWBM57920846","ASIN":"B001QVTZ1W","ProductJSON":{"ISBN":"BWBM57920846","ASIN":"B001QVTZ1W","Title":"Warbonnet Law (Signet Brand Western, 451-Q5867-095)","MasterProductId":"57920846","BookId":"227365450","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-ACL-477","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-ACL-477","BookSKU":"","ISBN":"BWBM57920754","ASIN":"B001P8E2R8","ProductJSON":{"ISBN":"BWBM57920754","ASIN":"B001P8E2R8","Title":"Pagan and Christian Rome","MasterProductId":"57920754","BookId":"227361420","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"KT-042-281","PackedLocation":"Dunfermline","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"KT-042-281","BookSKU":"","ISBN":"BWBM57920797","ASIN":"B000TLRD0U","ProductJSON":{"ISBN":"BWBM57920797","ASIN":"B000TLRD0U","Title":"Principles of Understanding: an Introduction to Logic from the Standpoint of Personal Idealism","MasterProductId":"57920797","BookId":"227363861","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-604","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-604","BookSKU":"","ISBN":"BWBM57920699","ASIN":"B001AIR496","ProductJSON":{"ISBN":"BWBM57920699","ASIN":"B001AIR496","Title":"The Maasai","MasterProductId":"57920699","BookId":"227359752","Author":"","Publisher":null,"PublicationDate":null}}
I think it is scripts/promise_batch_imports.py that converts these to record JSON for import.
This is what that script converts one of the records into:
{'local_id': ['urn:bwbsku:P8-CQU-196'], 'identifiers': {'amazon': ['B000YBOQM8'], 'better_world_books': ['BWBM57921210']}, 'title': 'Revelations of an Opera Manager in 19th Century America', 'authors': [{'name': '????'}], 'publishers': ['????'], 'source_records': ['promise:bwb_daily_pallets_2023-11-02:P8-CQU-196'], 'publish_date': '????'}
Will the `????` placeholder values get overwritten? It looks like for some of these ASINs we won't get any more metadata. If we don't have a date, ISBN, or other non-ASIN identifier, there might be a strong case for discarding the record entirely, as it's effectively just a title.
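To make the placeholder question concrete: with a plain dict union, any key present in `rec` wins, so a `????` value in `rec` survives the merge. The sketch below shows that, plus one possible way to treat placeholders as empty. `is_placeholder` and `supplement_rec` are hypothetical names, the staged values are invented for illustration, and this is not necessarily how the PR handles it.

```python
PLACEHOLDER = "????"


def is_placeholder(value) -> bool:
    """True for missing values and for '????' placeholder values."""
    if value in (None, "", [], {}, PLACEHOLDER):
        return True
    if isinstance(value, list):
        # Covers ["????"] and [{"name": "????"}].
        return all(
            item == PLACEHOLDER or item == {"name": PLACEHOLDER} for item in value
        )
    return False


def supplement_rec(rec: dict, staged: dict) -> dict:
    """Fill only empty or placeholder fields in rec from staged."""
    merged = dict(rec)
    for key, value in staged.items():
        if is_placeholder(merged.get(key)) and not is_placeholder(value):
            merged[key] = value
    return merged


rec = {
    "title": "The Maasai",
    "authors": [{"name": "????"}],
    "publishers": ["????"],
    "publish_date": "????",
    "local_id": ["urn:bwbsku:O9-CCW-604"],
}
# Invented staged data, for illustration only.
staged = {
    "title": "Some Other Title",
    "authors": [{"name": "Example Author"}],
    "publish_date": "1968",
}

plain_union = staged | rec            # rec's "????" values win
supplemented = supplement_rec(rec, staged)
```

Here `plain_union["authors"]` is still `[{"name": "????"}]`, while `supplemented["authors"]` comes from `staged`. In both cases the `title` and `local_id` from `rec` are carried forward, and `publishers` stays as a placeholder because the staged item has nothing better.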
@hornc, I am still fiddling with this, but here is how those 10 items might look.
Before:
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B000YBOQM8"], "better_world_books": ["BWBM57921210"]}, "local_id": ["urn:bwbsku:P8-CQU-196"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CQU-196"], "title": "Revelations of an Opera Manager in 19th Century America"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B09CSCZ3RD"], "better_world_books": ["BWBM57921011"]}, "local_id": ["urn:bwbsku:O9-BJH-243"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-BJH-243"], "title": "Letters from Madame la Marquise de Sevigne. Selected, translated, and introduced by Violet Hammersley. With a preface by W. Somerset Maugham"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B00KFT10UA"], "better_world_books": ["BWBM57921312"]}, "local_id": ["urn:bwbsku:P9-AHG-152"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P9-AHG-152"], "title": "A Taste of Heaven - United Methodist Women Church Cookbook, North Carolina Cook Book"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B009NNX69W"], "better_world_books": ["BWBM57921089"]}, "local_id": ["urn:bwbsku:P8-CQP-559"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CQP-559"], "title": "Baby Looney Tunes I Play"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B077KLGN71"], "better_world_books": ["BWBM57921206"]}, "local_id": ["urn:bwbsku:P9-AYU-600"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P9-AYU-600"], "title": "Born in Paradise"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B0006RX9JG"], "better_world_books": ["BWBM57920894"]}, "local_id": ["urn:bwbsku:P8-CRU-380"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CRU-380"], "title": "Sharing your faith with people of other faiths"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B001QVTZ1W"], "better_world_books": ["BWBM57920846"]}, "local_id": ["urn:bwbsku:O9-ATC-548"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-ATC-548"], "title": "Warbonnet Law (Signet Brand Western, 451-Q5867-095)"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B001P8E2R8"], "better_world_books": ["BWBM57920754"]}, "local_id": ["urn:bwbsku:O9-ACL-477"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-ACL-477"], "title": "Pagan and Christian Rome"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B000TLRD0U"], "better_world_books": ["BWBM57920797"]}, "local_id": ["urn:bwbsku:KT-042-281"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:KT-042-281"], "title": "Principles of Understanding: an Introduction to Logic from the Standpoint of Personal Idealism"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B001AIR496"], "better_world_books": ["BWBM57920699"]}, "local_id": ["urn:bwbsku:O9-CCW-604"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-CCW-604"], "title": "The Maasai"}
After:
{"authors": [{"name": "Max Maretzek"}, {"name": "Poesal"}], "identifiers": {"amazon": ["B000YBOQM8"], "better_world_books": ["BWBM57921210"]}, "local_id": ["urn:bwbsku:P8-CQU-196"], "physical_format": "paperback", "publish_date": "Jan 01, 1968", "publishers": ["DOVER-1957"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CQU-196"], "title": "Revelations of an Opera Manager in 19th Century America"}
{"authors": [{"name": "Marie de Rabutin-Chantal d S_vign_"}], "identifiers": {"amazon": ["B09CSCZ3RD"], "better_world_books": ["BWBM57921011"]}, "local_id": ["urn:bwbsku:O9-BJH-243"], "physical_format": "hardcover", "publish_date": "Apr 23, 1956", "publishers": ["New York, Harcourt, Brace and Company"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-BJH-243"], "title": "Letters from Madame la Marquise de Sevigne. Selected, translated, and introduced by Violet Hammersley. With a preface by W. Somerset Maugham"}
{"authors": [{"name": "Lebanon Methodist Women"}], "identifiers": {"amazon": ["B00KFT10UA"], "better_world_books": ["BWBM57921312"]}, "local_id": ["urn:bwbsku:P9-AHG-152"], "number_of_pages": 192, "physical_format": "loose leaf", "publish_date": "????", "publishers": ["Morris Press"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P9-AHG-152"], "title": "A Taste of Heaven - United Methodist Women Church Cookbook, North Carolina Cook Book"}
{"authors": [{"name": "Editor"}], "identifiers": {"amazon": ["B009NNX69W"], "better_world_books": ["BWBM57921089"]}, "local_id": ["urn:bwbsku:P8-CQP-559"], "physical_format": "hardcover", "publish_date": "Apr 23, 2000", "publishers": ["DS-MAX"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CQP-559"], "title": "Baby Looney Tunes I Play"}
{"authors": [{"name": "Tempski, Armine von"}], "identifiers": {"amazon": ["B077KLGN71"], "better_world_books": ["BWBM57921206"]}, "local_id": ["urn:bwbsku:P9-AYU-600"], "physical_format": "hardcover", "publish_date": "Apr 23, 1968", "publishers": ["Meredith Press"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P9-AYU-600"], "title": "Born in Paradise"}
{"authors": [{"name": "Cooper, David C"}], "identifiers": {"amazon": ["B0006RX9JG"], "better_world_books": ["BWBM57920894"]}, "local_id": ["urn:bwbsku:P8-CRU-380"], "number_of_pages": 273, "physical_format": "unknown binding", "publish_date": "Apr 23, 1996", "publishers": ["David C. Cooper"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CRU-380"], "title": "Sharing your faith with people of other faiths"}
{"authors": [{"name": "Frank O'Rourke"}], "identifiers": {"amazon": ["B001QVTZ1W"], "better_world_books": ["BWBM57920846"]}, "local_id": ["urn:bwbsku:O9-ATC-548"], "physical_format": "mass market paperback", "publish_date": "Apr 23, 1967", "publishers": ["Signet"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-ATC-548"], "title": "Warbonnet Law (Signet Brand Western, 451-Q5867-095)"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B001P8E2R8"], "better_world_books": ["BWBM57920754"]}, "local_id": ["urn:bwbsku:O9-ACL-477"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-ACL-477"], "title": "Pagan and Christian Rome"}
{"authors": [{"name": "Henry Sturt"}], "identifiers": {"amazon": ["B000TLRD0U"], "better_world_books": ["BWBM57920797"]}, "local_id": ["urn:bwbsku:KT-042-281"], "physical_format": "hardcover", "publish_date": "Apr 23, 1915", "publishers": ["Cambridge University Press"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:KT-042-281"], "title": "Principles of Understanding: an Introduction to Logic from the Standpoint of Personal Idealism"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B001AIR496"], "better_world_books": ["BWBM57920699"]}, "local_id": ["urn:bwbsku:O9-CCW-604"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-CCW-604"], "title": "The Maasai"}
Summary of changes:
- 7 books get 'good' `authors` data (and an eighth book gets `Editor` as the author).
- 7 books get 'good' `publish_date` data.
- 7 (or 8) books get 'good' `publishers` data.
- A handful of books get some other stuff, such as `number_of_pages` and `physical_format`.
The Amazon Products API used by the affiliate server can return product information for Amazon-specific ASINs that start with `B`. This commit makes changes sufficient to allow `/isbn` to support "ISBNs" (i.e. Amazon-specific ASINs) that start with `B`. The high-level description of how this works is that the validation has been modified all through the pipeline to allow `B` ASINs, from `/isbn` on through to the validation for importing items from Amazon.
Sometimes promise items with non-ISBN ASINs (e.g. ASINs that start with `B`) don't have very complete metadata. This commit causes such promise items to look to the affiliate server to supplement their data. When promise items are processed by `scripts/promise_batch_imports.py`, any promise items with such non-ISBN ASINs will make a request to the affiliate server ("BookWorm") to `stage` the items for import. Then, when the promise item eventually hits `load()`, it will check the `import_item` table for a matching record. If a match is found, that metadata is added to *empty* fields in the promise item; no promise item metadata is overwritten.
When visiting `/api/books.json?bibkeys=bibkey&high_priority=true`, if a bibkey is a non-ISBN ASIN (i.e. one starting with `B`), then the code will check whether there's metadata for a matching `staged` `B*` bibkey, and if so, it will trigger an import/reimport, which will either: 1. create a new edition, work, etc., based on BookWorm metadata, or; 2. match the existing edition, etc., and supplement the metadata with BookWorm metadata for any empty fields in the original record. See internetarchive#9030.
The upshot of this commit is that it's now easier to import functions, such as `do_import()` from `manage-imports.py`. However, it will also break cron. At the very least the cron to "Add new scans of yesterday to import queue" on `ol-home0` will need updating with something like: `30 4 * * * PYTHONPATH=/openlibrary $PYTHON /openlibrary/scripts..etc`
These classes are being moved prior to code modification so code modifications are easier to see in the `diff`. In the next commit, functions defined before `PrioritizedISBN` will be modified to rely on `PrioritizedISBN`, and `PrioritizedISBN` itself will be modified and renamed.
This commit adds a URL parameter to BookWorm's `/isbn` endpoint such that adding `?stage_import=false` will stop the result from being staged for import. This setting is likely only useful if `high_priority=true`, because the endpoint only returns the result, if any, when `high_priority=true`. Note: the exception to the above is if the result is already cached, in which case the endpoint will return the result as well.
This takes advantage of `/isbn/<asin>?high_priority=true&stage_import=false` on BookWorm to return metadata without staging anything for import, so that the new metadata can supplement the import record associated with the `B*` ASIN (e.g. a BWB promise item). The strategy this uses is quite slow, as it goes through the existing `_get_amazon_metadata`, which is not `async`, and therefore each request for `B*` metadata takes at least a second. This may need updating in the future. Another alternative would be to simply stage the BookWorm records for import, and update them in `load()`, after first staging them here in `scripts/promise_batch_imports`.
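For reference, a small sketch of how a client might build such a request URL. `build_isbn_url` and the base URL `http://bookworm.example` are hypothetical; only the `/isbn/<asin>` path and the `high_priority` and `stage_import` parameters come from the commit messages above.

```python
from urllib.parse import urlencode


def build_isbn_url(
    base_url: str,
    identifier: str,
    *,
    high_priority: bool = False,
    stage_import: bool = True,
) -> str:
    """Build an /isbn URL with lowercase true/false query values."""
    params = {
        "high_priority": str(high_priority).lower(),
        "stage_import": str(stage_import).lower(),
    }
    return f"{base_url.rstrip('/')}/isbn/{identifier}?{urlencode(params)}"


# Hypothetical host; fetch metadata for a B* ASIN without staging an import.
url = build_isbn_url(
    "http://bookworm.example", "B000YBOQM8", high_priority=True, stage_import=False
)
```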
This commit (hopefully) clears up some of the nomenclature around ASINs, ISBN 10s, and ISBN 13s. The solution is to simply create whatever is available of `b_asin`, `isbn_10`, and `isbn_13`, where `b_asin` is a `B*` ASIN from Amazon. Then, a variable `key` is introduced, which is used for querying Amazon via BookWorm, and it is either the value of `isbn_10`, or `b_asin` if `isbn_10` doesn't exist.
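A rough sketch of that `key` selection, under stated assumptions. `isbn_13_to_isbn_10` and `pick_key` are hypothetical helpers, and deriving `isbn_10` from `isbn_13` is an assumption of this sketch, not something the commit message states. Only 978-prefixed ISBN-13s have an ISBN-10 equivalent.

```python
from typing import Optional


def isbn_13_to_isbn_10(isbn_13: str) -> Optional[str]:
    """Convert a 978-prefixed ISBN-13 to ISBN-10; return None otherwise."""
    if len(isbn_13) != 13 or not (isbn_13.isascii() and isbn_13.isdigit()):
        return None
    if not isbn_13.startswith("978"):
        return None  # 979-prefixed ISBN-13s have no ISBN-10 form
    body = isbn_13[3:12]
    total = sum(int(d) * w for d, w in zip(body, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return body + ("X" if check == 10 else str(check))


def pick_key(
    isbn_10: Optional[str] = None,
    isbn_13: Optional[str] = None,
    b_asin: Optional[str] = None,
) -> Optional[str]:
    """Prefer isbn_10 as the lookup key; fall back to the B* ASIN."""
    if not isbn_10 and isbn_13:
        isbn_10 = isbn_13_to_isbn_10(isbn_13)
    return isbn_10 or b_asin
```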
This commit moves the metadata supplementing of `B*` ASIN records back to `load()`. This makes the code slightly cleaner, centralizes the import logic, and avoids the need to either do slow, non-async HTTP GETs to BookWorm in `promise_batch_imports` to supplement each record before staging, or to substantially modify code to use async requests.
Closes #9030.
- Updates the `/isbn` endpoint to handle `B*` non-ISBN ASINs.
- Updates `scripts/promise_batch_imports.py` so it makes a request to `/isbn/<isbn>` to `stage` the item for import with metadata found on BookWorm.
- Updates `load()` so that if an item has a `B*` Amazon `source_record` or `identifier`, `load()` will attempt to supplement the incoming `rec` with `staged` BookWorm data (for fields that are empty).
- `/api/books.json?bibkeys=whatever&high_priority=true` augments ASIN-only records with BookWorm metadata.
Technical
Note for the reviewer. The commits should be meaningful such that stepping through them and reading the commit messages may be the easiest way to review this. Maybe.
One question: when do we want to look to `staged` BookWorm/Amazon data to supplement empty fields? In the current state of this PR it's doing that if an incoming `rec` (i.e. a record to be imported) has a `B*` ASIN either in `source_records`, which it gets if imported via `/isbn`, or in the `amazon` key of `identifiers` (such as in the case of a BWB promise item), and there's a `staged` item with a matching ASIN.
If this is too broad, one way to restrict it could be to attempt to supplement only if an ISBN is absent. Currently the attempt to supplement happens if there's a `B*` ASIN in `source_records` or `identifiers` and there's a `staged` item with a matching ASIN.
Stakeholders
@mekarpeles
@hornc
@judec