Augment non-ISBN ASIN BWB records with BookWorm data #8903

Conversation

scottbarnes
Collaborator
@scottbarnes scottbarnes commented Mar 13, 2024

Closes #9030.

  • updates the /isbn endpoint to handle B* non-ISBN ASINs.
  • updates scripts/promise_batch_imports.py so it makes a request to /isbn/<isbn> to stage the item for import with metadata found on BookWorm.
  • updates load() so that if an item has a B* Amazon source_record or identifier, load() will attempt to supplement the incoming rec with staged BookWorm data (for fields that are empty).
  • makes sure /api/books.json?bibkeys=whatever&high_priority=true augments ASIN-only records with BookWorm metadata.

Technical

Note for the reviewer. The commits should be meaningful such that stepping through them and reading the commit messages may be the easiest way to review this. Maybe.

One question: when do we want to look to staged BookWorm/Amazon data to supplement empty fields? In the current state of this PR it's doing that if an incoming rec (i.e. a record to be imported) has a B* ASIN in either source_records, which it gets if imported via /isbn, or if it has an ASIN in the amazon key of an identifier (such as in the case of a BWB promise item) and there's a staged item with a matching ASIN.

If this is too broad, one way to restrict it could be to attempt to supplement only if an ISBN is absent. Currently the attempt to supplement happens if there's a B* ASIN in source_records or identifiers and there's a staged item with a matching ASIN.

Stakeholders

@mekarpeles
@hornc
@judec

@scottbarnes scottbarnes marked this pull request as draft March 13, 2024 18:57
@scottbarnes scottbarnes force-pushed the feature/make-affiliate-server-look-up-non-isbn-10-asins branch from d22019c to f2f509f Compare March 13, 2024 21:54
@scottbarnes scottbarnes force-pushed the feature/make-affiliate-server-look-up-non-isbn-10-asins branch 8 times, most recently from 46d1710 to 93b7848 Compare April 12, 2024 03:59
@scottbarnes scottbarnes marked this pull request as ready for review April 12, 2024 04:13
@scottbarnes scottbarnes changed the title Feature/make affiliate server look up non isbn 10 asins Augument non-ISBN ASIN BWB records with BookWorm data Apr 12, 2024
@mekarpeles mekarpeles self-assigned this Apr 15, 2024
@mekarpeles mekarpeles added the Priority: 1 Do this week, receiving emails, time sensitive, . [managed] label Apr 15, 2024
For books, Amazon usually returns ISBN 10s as its ASIN, but sometimes,
particularly in the case of ebooks, they may be numbers starting with
`B`, rather than ISBN 10s. This commit keeps the non-ISBN 10 ASIN in the
`source_records: ["amazon:asin"]` line, but prevents them from entering
the `isbn_10` field.

It also prevents `isbn_10` and `isbn_13` from having truthy `[None]`
values.
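
A minimal sketch of the behaviour this commit message describes, assuming a hypothetical helper `build_identifier_fields` (not a function from the PR) and a shape-only ISBN-10 check with no checksum validation:

```python
from __future__ import annotations

import re

# Shape check only: nine digits followed by a digit or X. No checksum is verified.
ISBN_10_SHAPE = re.compile(r"^\d{9}[\dX]$")


def build_identifier_fields(asin: str, isbn_13: str | None = None) -> dict:
    """Hypothetical helper: keep the ASIN in source_records, but only
    populate isbn_10 when the ASIN has the shape of an ISBN-10."""
    asin = asin.upper()
    rec: dict = {"source_records": [f"amazon:{asin}"]}
    if ISBN_10_SHAPE.match(asin):
        rec["isbn_10"] = [asin]
    # Omit the key entirely instead of storing [None]: a one-element
    # list is truthy even when its only element is None.
    if isbn_13:
        rec["isbn_13"] = [isbn_13]
    return rec
```

With this sketch a `B*` ASIN yields only `source_records`, so a later `if rec.get("isbn_10")` check is not fooled by a `[None]` value.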
@scottbarnes scottbarnes force-pushed the feature/make-affiliate-server-look-up-non-isbn-10-asins branch from 93b7848 to 67dc088 Compare April 17, 2024 18:55
Member
@mekarpeles mekarpeles left a comment

I think the code is probably all good, the one potential blocker (isbn_13 -> isbn_10 not always a thing) seems like it shouldn't break anything.

We may be able to DRY up the code by having a discrete union of cases where something is exactly one of an isbn13, isbn10, or asin, and never more than one (e.g. 1234567890 is an isbn10 and not an asin, B127939 is an asin).

Feel free to try a data class :)
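
One way the data class idea might look, as a sketch only. The names `IdKind` and `BookIdentifier` are hypothetical, the checks are shape-only (no checksums), and the assumption that a non-ISBN ASIN is `B` followed by nine alphanumeric characters is mine rather than something stated in this thread:

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from enum import Enum


class IdKind(Enum):
    ISBN_13 = "isbn_13"
    ISBN_10 = "isbn_10"
    ASIN = "asin"


@dataclass(frozen=True)
class BookIdentifier:
    """A value that is exactly one of ISBN-13, ISBN-10, or B* ASIN."""

    value: str
    kind: IdKind

    @classmethod
    def parse(cls, raw: str) -> BookIdentifier:
        value = raw.replace("-", "").strip().upper()
        # Order matters: an ISBN-10-shaped value is never treated as an ASIN.
        if re.fullmatch(r"\d{13}", value):
            return cls(value, IdKind.ISBN_13)
        if re.fullmatch(r"\d{9}[\dX]", value):
            return cls(value, IdKind.ISBN_10)
        # Assumption: non-ISBN ASINs are "B" plus nine alphanumerics.
        if re.fullmatch(r"B[0-9A-Z]{9}", value):
            return cls(value, IdKind.ASIN)
        raise ValueError(f"Not an ISBN-13, ISBN-10, or B* ASIN: {raw!r}")
```

Callers could then branch on `identifier.kind` rather than re-checking string prefixes at each step.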

@@ -993,6 +1001,13 @@ def load(rec, account_key=None, from_marc_record: bool = False):

normalize_import_record(rec)

# For recs with a non-ISBN ASIN, supplement the record with BookWorm data.
Collaborator

I don't really understand what BookWorm data is. The code appears to be looking for pre-existing "staged or pending" import items by ASIN regardless of the original source?

Collaborator Author

I will fix that comment because it's actually supplementing with data from the import_item table, which could be ISBNdb data, it could be from Amazon's Product API, or another source. Thanks for pointing this out.

And indeed, for any rec with a B* ASIN, this will try to supplement that rec by using a match from the import_item table. This could easily be more limited if desired.

As for what BookWorm is, that's the name for the affiliate-server.

"""
Call the Amazon API to get the products for a list of isbn_10s and store
each product in memcache using amazon_product_{isbn_13} as the cache key.
"""
- logger.info(f"process_amazon_batch(): {len(isbn_10s)} items")
+ logger.info(f"process_amazon_batch(): {len(isbn_10s_or_asins)} items")
Collaborator

Some of the Amazon-specific terminology in the comments and naming may be a bit confusing. An ASIN is a 10-character identifier which is often an ISBN-10. Variable names like isbn_10s_or_asins could simply be asin if that's what the Amazon API is returning, but from the surrounding context I'm not sure.

Referring to all ASINs as ISBN-10 is going to be wrong sometimes (unless they are pre-filtered ASINs), but calling them ISBN-10s or ASINs is either redundant, or implies that some of the values are ISBN-10s which do not correspond to ASINs on Amazon.com, but I don't think that's what's happening in this module?

I thought the Amazon API was clear with its terminology and distinguishing between ISBNs and ASINs?

Collaborator Author

I will try to fix up the nomenclature. I agree it's not going to win awards. :)

from openlibrary.core.imports import ImportItem # Evade circular import.

if item := ImportItem.find_staged_or_pending([non_isbn_asin]).first():
    rec = json.loads(item.get("data", '{}')) | rec
Collaborator

The PR title and description say "augment" and "supplement" but this looks like it simply replaces the record if a match is found. I don't know if it'll have much difference in practice, but it'll help to be clear about what is happening. I don't think anyone has investigated whether the source Amazon record is guaranteed to be complete...
Now that I think of it, it needs to be augmented because at the very least the BWB SKU and source record from the original record need to be carried forward to enable the barcode matching we need for scanning. This might be a blocker.

Collaborator Author

I may be repeating what you've just stated is the problem, so if that's the case, my apologies. I just want to get on the same page first.

What the highlighted code section is hopefully doing is having the incoming rec data clobber fields that exist in both rec and whatever was found in the import_item table by way of ImportItem.find_staged_or_pending().

E.g. in the following, the publish_date from staged_import_item is added to the returned dictionary, but the title is not:

>>> staged_import_item = {"title": "A Great Book %!-", "identifiers": {"amazon": ["B012345678"]} , "publish_date": "2023"}
>>> rec = {"title": "A Great Book", "identifiers": {"amazon": ["B012345678"]}}
>>> staged_import_item | rec
{'title': 'A Great Book', 'identifiers': {'amazon': ['B012345678']}, 'publish_date': '2023'}

In this way, rec should be supplemented with whatever was found in the match from import_item, but no fields in rec should be overwritten.

Collaborator
@hornc hornc Apr 19, 2024

@scottbarnes That | is doing a lot of work, and it's hard to tell how the problem data is going to be processed. I don't have a good picture of what keys each of the two recs has.

Here are some real samples of promise item data from one of the promise items for testing:

wget https://archive.org/download/bwb_daily_pallets_2023-11-02/DailyPallets__2023-11-02.json

First 10 no-Author no-Date B0* ASINs from promise item bwb_daily_pallets_2023-11-02 :

 sed 's/^\[//;s/},{/}\n{/g;s/\]$//'  DailyPallets__2023-11-02.json | grep B0 | grep '"Author":""' | grep '"PublicationDate":null' | head
{"BookBarcode":"KT-047-072","PackedLocation":"Dunfermline","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"KT-047-072","BookSKU":"","ISBN":"BWBM57920566","ASIN":"B0007IX3CM","ProductJSON":{"ISBN":"BWBM57920566","ASIN":"B0007IX3CM","Title":"The economics of \" under-developed \" areas;: An annotated reading list of books, articles, and official publications","MasterProductId":"57920566","BookId":"227351119","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"KT-017-548","PackedLocation":"Dunfermline","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"KT-017-548","BookSKU":"","ISBN":"BWBM57920580","ASIN":"B001W8W7EQ","ProductJSON":{"ISBN":"BWBM57920580","ASIN":"B001W8W7EQ","Title":"The New Cambridge Bibliography of English Literature","MasterProductId":"57920580","BookId":"227351595","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"KT-085-946","PackedLocation":"Dunfermline","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"KT-085-946","BookSKU":"","ISBN":"BWBM57920582","ASIN":"B000QV8BMM","ProductJSON":{"ISBN":"BWBM57920582","ASIN":"B000QV8BMM","Title":"The Barbarians: Warriors and Wars of the Dark Ages","MasterProductId":"57920582","BookId":"227351630","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-518","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-518","BookSKU":"","ISBN":"BWBM57920590","ASIN":"B000OSM4I4","ProductJSON":{"ISBN":"BWBM57920590","ASIN":"B000OSM4I4","Title":"The Next Move: Current Events in Bible Prophecy","MasterProductId":"57920590","BookId":"227351789","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-521","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-521","BookSKU":"","ISBN":"BWBM57920592","ASIN":"B0017XK6GW","ProductJSON":{"ISBN":"BWBM57920592","ASIN":"B0017XK6GW","Title":"The Experience","MasterProductId":"57920592","BookId":"227351786","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-524","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-524","BookSKU":"","ISBN":"BWBM57920595","ASIN":"B01JCR2WES","ProductJSON":{"ISBN":"BWBM57920595","ASIN":"B01JCR2WES","Title":"Nandi customary law","MasterProductId":"57920595","BookId":"227351962","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-551","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-551","BookSKU":"","ISBN":"BWBM57920622","ASIN":"B0000EE0HG","ProductJSON":{"ISBN":"BWBM57920622","ASIN":"B0000EE0HG","Title":"VISIONS OF AFRICA","MasterProductId":"57920622","BookId":"227352471","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCB-679","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCB-679","BookSKU":"","ISBN":"BWBM57920626","ASIN":"B004GF3UUW","ProductJSON":{"ISBN":"BWBM57920626","ASIN":"B004GF3UUW","Title":"Eusebius Werke, Achter Band: Die Praeparatio Evangelica, Erster Teil: Einleitung, Die Bucher I bis X (Die Griechischen Christlichen Schriftsteller der Ersten Jahrhunderte 8\/1)","MasterProductId":"57920626","BookId":"227352681","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-555","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-555","BookSKU":"","ISBN":"BWBM57920630","ASIN":"B0046GNRKO","ProductJSON":{"ISBN":"BWBM57920630","ASIN":"B0046GNRKO","Title":"TOPICS IN EAST AFRICAN HISTORY 1000-1970.","MasterProductId":"57920630","BookId":"227352779","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCB-681","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCB-681","BookSKU":"","ISBN":"BWBM57920633","ASIN":"B001KVTCLG","ProductJSON":{"ISBN":"BWBM57920633","ASIN":"B001KVTCLG","Title":"Hegemonius, Acta Archelai","MasterProductId":"57920633","BookId":"227352891","Author":"","Publisher":null,"PublicationDate":null}}

Random sample of 10:

sed 's/^\[//;s/},{/}\n{/g;s/\]$//'  DailyPallets__2023-11-02.json | grep B0 | grep '"Author":""' | grep '"PublicationDate":null' | shuf -n10
{"BookBarcode":"P8-CQU-196","PackedLocation":"Reno","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"P8-CQU-196","BookSKU":"","ISBN":"BWBM57921210","ASIN":"B000YBOQM8","ProductJSON":{"ISBN":"BWBM57921210","ASIN":"B000YBOQM8","Title":"Revelations of an Opera Manager in 19th Century America","MasterProductId":"57921210","BookId":"227392511","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-BJH-243","PackedLocation":"York","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-BJH-243","BookSKU":"","ISBN":"BWBM57921011","ASIN":"B09CSCZ3RD","ProductJSON":{"ISBN":"BWBM57921011","ASIN":"B09CSCZ3RD","Title":"Letters from Madame la Marquise de Sevigne. Selected, translated, and introduced by Violet Hammersley. With a preface by W. Somerset Maugham","MasterProductId":"57921011","BookId":"227376486","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"P9-AHG-152","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"P9-AHG-152","BookSKU":"","ISBN":"BWBM57921312","ASIN":"B00KFT10UA","ProductJSON":{"ISBN":"BWBM57921312","ASIN":"B00KFT10UA","Title":"A Taste of Heaven - United Methodist Women Church Cookbook, North Carolina Cook Book","MasterProductId":"57921312","BookId":"227403938","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"P8-CQP-559","PackedLocation":"Reno","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"P8-CQP-559","BookSKU":"","ISBN":"BWBM57921089","ASIN":"B009NNX69W","ProductJSON":{"ISBN":"BWBM57921089","ASIN":"B009NNX69W","Title":"Baby Looney Tunes I Play","MasterProductId":"57921089","BookId":"227380126","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"P9-AYU-600","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"P9-AYU-600","BookSKU":"","ISBN":"BWBM57921206","ASIN":"B077KLGN71","ProductJSON":{"ISBN":"BWBM57921206","ASIN":"B077KLGN71","Title":"Born in Paradise","MasterProductId":"57921206","BookId":"227392442","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"P8-CRU-380","PackedLocation":"Reno","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"P8-CRU-380","BookSKU":"","ISBN":"BWBM57920894","ASIN":"B0006RX9JG","ProductJSON":{"ISBN":"BWBM57920894","ASIN":"B0006RX9JG","Title":"Sharing your faith with people of other faiths","MasterProductId":"57920894","BookId":"227372026","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-ATC-548","PackedLocation":"Reno","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-ATC-548","BookSKU":"","ISBN":"BWBM57920846","ASIN":"B001QVTZ1W","ProductJSON":{"ISBN":"BWBM57920846","ASIN":"B001QVTZ1W","Title":"Warbonnet Law (Signet Brand Western, 451-Q5867-095)","MasterProductId":"57920846","BookId":"227365450","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-ACL-477","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-ACL-477","BookSKU":"","ISBN":"BWBM57920754","ASIN":"B001P8E2R8","ProductJSON":{"ISBN":"BWBM57920754","ASIN":"B001P8E2R8","Title":"Pagan and Christian Rome","MasterProductId":"57920754","BookId":"227361420","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"KT-042-281","PackedLocation":"Dunfermline","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"KT-042-281","BookSKU":"","ISBN":"BWBM57920797","ASIN":"B000TLRD0U","ProductJSON":{"ISBN":"BWBM57920797","ASIN":"B000TLRD0U","Title":"Principles of Understanding: an Introduction to Logic from the Standpoint of Personal Idealism","MasterProductId":"57920797","BookId":"227363861","Author":"","Publisher":null,"PublicationDate":null}}
{"BookBarcode":"O9-CCW-604","PackedLocation":"Mishawaka","Sort":"Never Seen","PalletBarcode":"11\/1\/2023 8:30:14 PM","BookSKUB":"O9-CCW-604","BookSKU":"","ISBN":"BWBM57920699","ASIN":"B001AIR496","ProductJSON":{"ISBN":"BWBM57920699","ASIN":"B001AIR496","Title":"The Maasai","MasterProductId":"57920699","BookId":"227359752","Author":"","Publisher":null,"PublicationDate":null}}

I think it is scripts/promise_batch_imports.py that converts these to record JSON for import.

This is what that script converts one of the records into:

{'local_id': ['urn:bwbsku:P8-CQU-196'], 'identifiers': {'amazon': ['B000YBOQM8'], 'better_world_books': ['BWBM57921210']}, 'title': 'Revelations of an Opera Manager in 19th Century America', 'authors': [{'name': '????'}], 'publishers': ['????'], 'source_records': ['promise:bwb_daily_pallets_2023-11-02:P8-CQU-196'], 'publish_date': '????'}
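
A rough sketch of that conversion, inferred only from the sample input and output in this comment. The function name `promise_item_to_rec`, the choice of `BookSKUB` as the SKU source, and the fallback order are assumptions; the real `scripts/promise_batch_imports.py` may handle ISBNs, dates, and SKUs differently:

```python
PLACEHOLDER = "????"


def promise_item_to_rec(item: dict, batch_id: str) -> dict:
    """Hypothetical conversion of one promise item into an import record."""
    product = item.get("ProductJSON") or {}
    # Assumption: the SKU comes from BookSKUB, falling back to the barcode.
    sku = item.get("BookSKUB") or item.get("BookBarcode")
    return {
        "local_id": [f"urn:bwbsku:{sku}"],
        "identifiers": {
            "amazon": [item["ASIN"]],
            "better_world_books": [item["ISBN"]],
        },
        "title": product.get("Title"),
        # Empty strings and nulls both become the placeholder.
        "authors": [{"name": product.get("Author") or PLACEHOLDER}],
        "publishers": [product.get("Publisher") or PLACEHOLDER],
        "source_records": [f"promise:{batch_id}:{sku}"],
        "publish_date": product.get("PublicationDate") or PLACEHOLDER,
    }
```

Called with the `P8-CQU-196` item and `batch_id="bwb_daily_pallets_2023-11-02"`, this sketch reproduces the record shown above.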

Will the ???? placeholder values get overwritten? It looks like for some of these ASINs we won't get any more metadata. There might be a strong case for discarding the record entirely if we don't have a date or ISBN or other non-ASIN identifier, as it's effectively just a title.
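
To make the placeholder question concrete: a plain `staged | rec` dict union keeps every key present in `rec`, including `????` values, because the right-hand operand wins. A sketch of a merge that instead treats placeholders as empty might look like the following. The helper names `is_empty` and `supplement` are hypothetical, and the staged values in the usage example are made up for illustration:

```python
PLACEHOLDER = "????"


def is_empty(value) -> bool:
    """Treat None, empty containers, and '????' placeholders as empty."""
    if value is None or value == "" or value == PLACEHOLDER:
        return True
    if isinstance(value, list):
        return all(is_empty(v) for v in value)
    if isinstance(value, dict):
        return all(is_empty(v) for v in value.values())
    return False


def supplement(rec: dict, staged: dict) -> dict:
    """Copy staged values into rec only where rec's field is empty."""
    out = dict(rec)
    for key, staged_value in staged.items():
        if is_empty(out.get(key)) and not is_empty(staged_value):
            out[key] = staged_value
    return out
```

For example, with `rec = {"title": "The Maasai", "authors": [{"name": "????"}], "source_records": ["promise:x"]}` and a made-up `staged = {"title": "Other", "authors": [{"name": "Example Author"}], "source_records": ["amazon:B001AIR496"]}`, the title and `source_records` from `rec` are kept and only `authors` is filled in, which also carries the promise source record forward.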

Collaborator Author

@hornc, I am still fiddling with this, but here is how those 10 items might look.

Before:

{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B000YBOQM8"], "better_world_books": ["BWBM57921210"]}, "local_id": ["urn:bwbsku:P8-CQU-196"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CQU-196"], "title": "Revelations of an Opera Manager in 19th Century America"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B09CSCZ3RD"], "better_world_books": ["BWBM57921011"]}, "local_id": ["urn:bwbsku:O9-BJH-243"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-BJH-243"], "title": "Letters from Madame la Marquise de Sevigne. Selected, translated, and introduced by Violet Hammersley. With a preface by W. Somerset Maugham"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B00KFT10UA"], "better_world_books": ["BWBM57921312"]}, "local_id": ["urn:bwbsku:P9-AHG-152"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P9-AHG-152"], "title": "A Taste of Heaven - United Methodist Women Church Cookbook, North Carolina Cook Book"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B009NNX69W"], "better_world_books": ["BWBM57921089"]}, "local_id": ["urn:bwbsku:P8-CQP-559"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CQP-559"], "title": "Baby Looney Tunes I Play"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B077KLGN71"], "better_world_books": ["BWBM57921206"]}, "local_id": ["urn:bwbsku:P9-AYU-600"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P9-AYU-600"], "title": "Born in Paradise"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B0006RX9JG"], "better_world_books": ["BWBM57920894"]}, "local_id": ["urn:bwbsku:P8-CRU-380"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CRU-380"], "title": "Sharing your faith with people of other faiths"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B001QVTZ1W"], "better_world_books": ["BWBM57920846"]}, "local_id": ["urn:bwbsku:O9-ATC-548"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-ATC-548"], "title": "Warbonnet Law (Signet Brand Western, 451-Q5867-095)"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B001P8E2R8"], "better_world_books": ["BWBM57920754"]}, "local_id": ["urn:bwbsku:O9-ACL-477"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-ACL-477"], "title": "Pagan and Christian Rome"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B000TLRD0U"], "better_world_books": ["BWBM57920797"]}, "local_id": ["urn:bwbsku:KT-042-281"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:KT-042-281"], "title": "Principles of Understanding: an Introduction to Logic from the Standpoint of Personal Idealism"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B001AIR496"], "better_world_books": ["BWBM57920699"]}, "local_id": ["urn:bwbsku:O9-CCW-604"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-CCW-604"], "title": "The Maasai"}

After:

{"authors": [{"name": "Max Maretzek"}, {"name": "Poesal"}], "identifiers": {"amazon": ["B000YBOQM8"], "better_world_books": ["BWBM57921210"]}, "local_id": ["urn:bwbsku:P8-CQU-196"], "physical_format": "paperback", "publish_date": "Jan 01, 1968", "publishers": ["DOVER-1957"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CQU-196"], "title": "Revelations of an Opera Manager in 19th Century America"}
{"authors": [{"name": "Marie de Rabutin-Chantal d S_vign_"}], "identifiers": {"amazon": ["B09CSCZ3RD"], "better_world_books": ["BWBM57921011"]}, "local_id": ["urn:bwbsku:O9-BJH-243"], "physical_format": "hardcover", "publish_date": "Apr 23, 1956", "publishers": ["New York, Harcourt, Brace and Company"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-BJH-243"], "title": "Letters from Madame la Marquise de Sevigne. Selected, translated, and introduced by Violet Hammersley. With a preface by W. Somerset Maugham"}
{"authors": [{"name": "Lebanon Methodist Women"}], "identifiers": {"amazon": ["B00KFT10UA"], "better_world_books": ["BWBM57921312"]}, "local_id": ["urn:bwbsku:P9-AHG-152"], "number_of_pages": 192, "physical_format": "loose leaf", "publish_date": "????", "publishers": ["Morris Press"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P9-AHG-152"], "title": "A Taste of Heaven - United Methodist Women Church Cookbook, North Carolina Cook Book"}
{"authors": [{"name": "Editor"}], "identifiers": {"amazon": ["B009NNX69W"], "better_world_books": ["BWBM57921089"]}, "local_id": ["urn:bwbsku:P8-CQP-559"], "physical_format": "hardcover", "publish_date": "Apr 23, 2000", "publishers": ["DS-MAX"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CQP-559"], "title": "Baby Looney Tunes I Play"}
{"authors": [{"name": "Tempski, Armine von"}], "identifiers": {"amazon": ["B077KLGN71"], "better_world_books": ["BWBM57921206"]}, "local_id": ["urn:bwbsku:P9-AYU-600"], "physical_format": "hardcover", "publish_date": "Apr 23, 1968", "publishers": ["Meredith Press"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P9-AYU-600"], "title": "Born in Paradise"}
{"authors": [{"name": "Cooper, David C"}], "identifiers": {"amazon": ["B0006RX9JG"], "better_world_books": ["BWBM57920894"]}, "local_id": ["urn:bwbsku:P8-CRU-380"], "number_of_pages": 273, "physical_format": "unknown binding", "publish_date": "Apr 23, 1996", "publishers": ["David C. Cooper"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:P8-CRU-380"], "title": "Sharing your faith with people of other faiths"}
{"authors": [{"name": "Frank O'Rourke"}], "identifiers": {"amazon": ["B001QVTZ1W"], "better_world_books": ["BWBM57920846"]}, "local_id": ["urn:bwbsku:O9-ATC-548"], "physical_format": "mass market paperback", "publish_date": "Apr 23, 1967", "publishers": ["Signet"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-ATC-548"], "title": "Warbonnet Law (Signet Brand Western, 451-Q5867-095)"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B001P8E2R8"], "better_world_books": ["BWBM57920754"]}, "local_id": ["urn:bwbsku:O9-ACL-477"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-ACL-477"], "title": "Pagan and Christian Rome"}
{"authors": [{"name": "Henry Sturt"}], "identifiers": {"amazon": ["B000TLRD0U"], "better_world_books": ["BWBM57920797"]}, "local_id": ["urn:bwbsku:KT-042-281"], "physical_format": "hardcover", "publish_date": "Apr 23, 1915", "publishers": ["Cambridge University Press"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:KT-042-281"], "title": "Principles of Understanding: an Introduction to Logic from the Standpoint of Personal Idealism"}
{"authors": [{"name": "????"}], "identifiers": {"amazon": ["B001AIR496"], "better_world_books": ["BWBM57920699"]}, "local_id": ["urn:bwbsku:O9-CCW-604"], "publish_date": "????", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:O9-CCW-604"], "title": "The Maasai"}

Summary of changes:

  • 7 books get 'good' authors data (and an eighth book gets Editor as the author).
  • 7 books get 'good' publish_date data.
  • 7 (or 8) books get 'good' publishers data.
  • A handful of books get some other stuff, such as number_of_pages and physical_format.

The Amazon Products API used by the affiliate server can return product
information for Amazon-specific ASINs that start with `B`. This commit
makes changes sufficient to allow `/isbn` to support "ISBNs" (i.e.
Amazon-specific ASINs) that start with `B`.

The high level description of how this works is that the validation has
been modified all through the pipeline to allow `B` ASINs, from `/isbn`
on through to the validation for importing items from Amazon.

Sometimes promise items with non-ISBN ASINs (e.g. ASINs that start with
`B`) have incomplete metadata. This commit causes such
promise items to look to the affiliate server to supplement their data.

When promise items are processed by `scripts/promise_batch_imports.py`,
any promise items with such non-ISBN ASINs will make a request to the
affiliate server ("BookWorm") to `stage` the items for import.

Then, when the promise item eventually hits `load()`, it will check the
`import_item` table for a matching record. If a match is found, that
metadata is added to *empty* fields in the promise item--no promise item
metadata is overwritten.

When visiting `/api/books.json?bibkeys=bibkey&high_priority=true`, if
a bibkey is a non-ISBN ASIN (i.e. one starting with `B`), then the code
will check whether there's metadata for a matching `staged` `B*` bibkey,
and if so, it will trigger an import/reimport, which will either:
1. create a new edition, work, etc., based on BookWorm metadata, or;
2. match the existing edition, etc., and supplement the metadata with
   BookWorm metadata for any empty fields in the original record.

See internetarchive#9030.

The upshot of this commit is that it's now easier to import functions,
such as `do_import()` from `manage-imports.py`.

However, it will also break cron. At the very least the cron to "Add new
scans of yesterday to import queue" on `ol-home0` will need updating
with something like:
`30 4 * * * PYTHONPATH=/openlibrary $PYTHON /openlibrary/scripts..etc`

These classes are being moved prior to code modification so code
modifications are easier to see in the `diff`.

In the next commit, functions defined before `PrioritizedISBN` will be
modified to rely on `PrioritizedISBN`, and `PrioritizedISBN` itself will
be modified and renamed.

This commit adds a URL parameter to BookWorm's `/isbn` endpoint such
that adding `?stage_import=false` will stop the result from being staged
for import. This setting is likely only useful with `high_priority=true`,
because the endpoint only returns the result, if any, when
`high_priority=true` is set.

Note: the exception to the above is if the result is already cached.
Then the endpoint will return the result as well.

This takes advantage of
`/isbn/<asin>?high_priority=true&stage_import=false` on BookWorm to
return metadata without staging anything for import so that the new
metadata can supplement the import record associated with the `B*` ASIN
(e.g. BWB promise item).
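
As an illustration of the request described here, a small sketch that only builds the URL. The helper name `bookworm_isbn_url` and the `base_url` argument are hypothetical; only the `/isbn/<asin>` path and the `high_priority` and `stage_import` parameters come from the commit messages:

```python
from urllib.parse import urlencode


def bookworm_isbn_url(
    base_url: str,
    identifier: str,
    *,
    high_priority: bool = True,
    stage_import: bool = False,
) -> str:
    """Build a BookWorm /isbn URL that returns metadata without staging it."""
    query = urlencode(
        {
            # The endpoint parameters are lowercase "true"/"false" strings.
            "high_priority": str(high_priority).lower(),
            "stage_import": str(stage_import).lower(),
        }
    )
    return f"{base_url.rstrip('/')}/isbn/{identifier}?{query}"
```

For instance, `bookworm_isbn_url("http://localhost:1234", "B000YBOQM8")` gives `http://localhost:1234/isbn/B000YBOQM8?high_priority=true&stage_import=false`, where the host and port are placeholders.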

The strategy this uses is quite slow, as it goes through the existing
`_get_amazon_metadata`, which is not `async`, and therefore each request
for `B*` metadata takes at least a second. This may need updating in
the future.

Another alternative would be to simply stage the BookWorm records for
import, and update them in `load()`, after first staging them here in
`scripts/promise_batch_imports`.

This commit (hopefully) clears up some of the nomenclature around ASINs,
ISBN 10s, and ISBN 13s.

The solution is to simply create whatever is available of `b_asin`,
`isbn_10`, and `isbn_13`, where `b_asin` is a `B*` ASIN from Amazon.

Then, a variable `key` is introduced, which is used for querying Amazon
via BookWorm, and it is either the value of `isbn_10`, or `b_asin` if
`isbn_10` doesn't exist.
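
The `key` selection could be sketched roughly as below. The function names are hypothetical, and deriving `isbn_10` from `isbn_13` is an assumption added for illustration. Only ISBN-13s with the 978 prefix have an ISBN-10 form, so the conversion returns `None` otherwise:

```python
from __future__ import annotations


def isbn_13_to_isbn_10(isbn_13: str) -> str | None:
    """Convert a 978-prefixed ISBN-13 to ISBN-10; None if not convertible."""
    if len(isbn_13) != 13 or not isbn_13.isdigit() or not isbn_13.startswith("978"):
        return None
    body = isbn_13[3:12]
    # ISBN-10 check digit: weights 10 down to 2 over the nine body digits.
    total = sum((10 - i) * int(d) for i, d in enumerate(body))
    check = (11 - total % 11) % 11
    return body + ("X" if check == 10 else str(check))


def lookup_key(
    isbn_10: str | None = None,
    isbn_13: str | None = None,
    b_asin: str | None = None,
) -> str | None:
    """Prefer isbn_10 for the Amazon lookup; fall back to the B* ASIN."""
    if not isbn_10 and isbn_13:
        isbn_10 = isbn_13_to_isbn_10(isbn_13)
    return isbn_10 or b_asin
```

Here `lookup_key(b_asin="B000YBOQM8")` returns the ASIN, while `lookup_key(isbn_13="9780140328721")` returns `"0140328726"`.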
This commit moves the metadata supplementing of `B*` ASIN records back
to `load()`.

This makes the code slightly cleaner, centralizes the import logic,
and avoids the need to either do slow, non-async http GETs to BookWorm
in `promise_batch_imports` to supplement each record before staging, or
to substantially modify code to use async requests.
@scottbarnes scottbarnes force-pushed the feature/make-affiliate-server-look-up-non-isbn-10-asins branch from 67dc088 to b3dfddd Compare April 26, 2024 14:50
@scottbarnes scottbarnes changed the title Augument non-ISBN ASIN BWB records with BookWorm data Augment non-ISBN ASIN BWB records with BookWorm data Apr 26, 2024
@mekarpeles mekarpeles merged commit 2858ee2 into internetarchive:master Apr 26, 2024
3 checks passed
@scottbarnes scottbarnes deleted the feature/make-affiliate-server-look-up-non-isbn-10-asins branch April 27, 2024 16:21