Community Batch Imports endpoint #8122
Force-pushed from c721508 to fb0f567.
Converting this to a draft for now since it's kind of in limbo!
Force-pushed from fb0f567 to f0ce028.
This:
- still has debugging lines that need to be removed;
- creates an endpoint at /api/import?batch=true that reads JSON data;
- uses an ISBN (preferring ISBN 13) for the ia_id;
- uses the `Batch` class for imports;
- relies on there being a 'submitter' column in the import_item table;
- only works for people with `can_write()` privileges; and
- generates a hash based on the incoming JSON byte string to use as the
name of the import batch.
To use:
curl -X POST http://localhost:8080/api/import\?batch\=true -H \
"Content-Type: application/json" -H "Cookie: $OL_COOKIE" -d \
'[{"title": "test book 1", "isbn_10": "test_1"}, {"title": "test book 2", "isbn_13": "test_2"}]'
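The hash-based batch naming described above (hashing the incoming JSON byte string so that resubmitting the same payload produces the same batch name) might look roughly like this. This is a sketch, not the PR's actual code; `generate_batch_name` is a hypothetical helper name:

```python
import hashlib

def generate_batch_name(raw_data: bytes) -> str:
    """Derive a stable batch name from the incoming JSON byte string.

    The same bytes always hash to the same name, so a repeated import of
    the same payload reuses its batch. Note: reordering the records (or
    any whitespace change) produces different bytes, hence a new name.
    """
    digest = hashlib.sha256(raw_data).hexdigest()
    return f"batch-{digest[:12]}"

payload = b'[{"title": "test book 1", "isbn_10": "test_1"}]'
print(generate_batch_name(payload))
```

The truncation to 12 hex characters is arbitrary here; any stable digest prefix keeps the name short while remaining effectively collision-free for this use.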
Force-pushed from f0ce028 to 23a7dd8.
cdrini
left a comment
Nice! This looks great! A few code fixes, and one annoying rename :P
# Create the batch
batch = Batch.find(batch_name) or Batch.new(name=batch_name, submitter=username)
I think in the future we should move away from treating batch_name like a unique string; it should just be effectively a comment. The batch id is what should be... the id :P But that's a future PR problem; I think there were some good reasons why we did this.
@@ -31,17 +30,28 @@
class Batch(web.storage):
Todo in a future PR: make this not extend web.storage
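As a purely hypothetical illustration of that TODO (not this PR's code, which keeps the `web.storage` base and handles database persistence), `Batch` could become a plain dataclass with explicit attributes instead of dict-backed attribute access:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Batch:
    # Sketch only: explicit fields replace web.storage's dict-backed
    # attribute access; persistence is deliberately omitted here.
    id: int
    name: str
    submitter: Optional[str] = None
    items: list = field(default_factory=list)

    def add_items(self, items: list) -> None:
        """Queue items onto this batch (no database writes in this sketch)."""
        self.items.extend(items)

b = Batch(id=1, name="batch-abc123", submitter="someuser")
b.add_items([{"ia_id": "isbn_1111111111"}])
```

The upside of dropping `web.storage` would be that typos in attribute names fail loudly instead of silently creating new keys.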
Adding feedback from CR

Co-authored-by: Drini Cami <cdrini@gmail.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
Force-pushed from 042d27f to ce52c90.
Force-pushed from ec84040 to fdcf2c4.
cdrini
left a comment
Nice, LGTM! Tested it erroring correctly, and tested it actually queueing items 🥳 Note I didn't see it actually get imported, 'cause around this time of the month we have a big import that goes through, so it'll take a while for it to actually go through (not ideal UX :/ but a problem for another time, and it should only affect roughly the 15th–20th of every month).
Closes #7705
Feature

This adds a (currently admin-only) endpoint at /import/batch/new for community batch imports.

Technical

This:
- creates an endpoint at /import/batch/new that takes JSONL data via a multipart POST;
- allows # as a comment character in the JSONL, just to make life a bit easier;
- validates each rec for import via load();
- uses source_records[0] for the ia_id;
- uses the Batch class for imports;
- relies on there being a 'submitter' column in the import_item table;
- only works for members of the /usergroup/admin group; and
- generates a hash based on the incoming data to use as the name of the import batch (so that an attempted import of the same items will get the same name). Note: this isn't super sophisticated, as the hash would change if the order changed.

Some outstanding questions:
- we now have both /import/batch/new and /api/import/bulk; which do we want to use, if either?
- if there is to be a limit on the number of items passed to Batch.add_items(), what should that limit be?

This was also tested against some of the data from @Billa05's work in #8551, found here: #8551 (comment)
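The JSONL handling described above (one JSON record per line, with # lines skipped as comments) could be sketched roughly like this; `parse_jsonl` is a hypothetical helper name, not the PR's actual function:

```python
import json

def parse_jsonl(text: str) -> list:
    """Parse JSONL where blank lines and lines starting with '#' are skipped.

    Raises json.JSONDecodeError on the first malformed line, which is
    where per-line validation errors would be collected for the patron.
    """
    records = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # '#' lines are comments, per the endpoint's behavior
        records.append(json.loads(stripped))
    return records

sample = (
    '{"title": "Blob Book 1", "isbn_10": "1111111111"}\n'
    "# this line is commented out and will be skipped\n"
    '{"title": "Blob Book 2", "isbn_13": "2222222222222"}\n'
)
print(len(parse_jsonl(sample)))  # → 2
```

Skipping blank lines as well as # lines makes hand-edited test files more forgiving, which matters for the comment-out-the-bad-lines workflow shown in the testing steps below.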
Testing
Visit http://localhost:8080/import/batch/new
Try to upload a JSONL file with some errors:
{"title": "Blob Book 1", "source_records": "blob_source", "authors": [{"name": "Blob Author 1"}], "publishers": "Fail Publishers", "publish_date": "January 1, 2000", "isbn_10": "1111111111"}
{'blah': True}
{"title": "Blob Book 2", "source_records": "blob_source", "authors": [{"name": "Blob Author 2"}], "publishers": "Fail Publishers", "publish_date": "January 2, 2000", "isbn_10": "2222222222"}
{"source_records": ["blob_source"], "authors": [{"name": "Blob Author 2"}], "publishers": ["Not Fail Publishers"], "publish_date": "January 2, 2000", "isbn_10": "2222222222"}
{"title": "Blob Book 3", "source_records": ["blob_source"], "authors": [{"name": "Blob Author 2"}], "publishers": ["Not Fail Publishers"], "publish_date": "January 2, 2040", "isbn_10": "2222222222"}
{"identifiers": {"open_textbook_library": ["1581"]}, "source_records": ["open_textbook_library:1581"], "title": "Legal Fundamentals of Healthcare Law", "languages": ["eng"], "description": "Healthcare, a field dedicated to the well-being of individuals and communities, operates within an intricate web of legal principles. Understanding these laws is not simply a professional necessity for doctors, nurses, administrators, and researchers; it\u2019s also an ethical imperative for anyone who interacts with the healthcare system. This book is your compass, guiding you through the labyrinth of legal fundamentals that shape the landscape of healthcare.", "subjects": ["Medicine", "Law"], "publishers": ["University of West Florida Pressbooks"], "publish_date": "2024", "authors": [{"name": "Tiffany Jackman"}], "lc_classifications": ["RA440", "KF385.A4"]}
{"identifiers": {"open_textbook_library": ["1580"]}, "source_records": ["open_textbook_library:1580"], "title": "Introduction to Literature: Fairy Tales, Folk Tales, and How They Shape Us", "languages": ["eng"], "description": "Introduction to Literature: Fairy Tales, Folk Tales, and How They Shape Us introduces college freshmen to the study of literature through a focus on texts that, generally, they already know, or think they know, and how those texts aim to shape audiences to be compliant cultural objects. The book is organized around several prominent story groups, including various genres and forms, meant to promote discussion and discovery leading to students\u2019 understanding that these texts function as cultural sculptors of readers\u2019 principles and behaviors. Students develop the skill of analyzing texts and creating sound arguments about them through class discussions and a series of writing assignments. Ideally, they leave the course understanding how to create a sound argument and, more pointedly, that there is no such thing as \u201cjust a story.\u201d", "subjects": ["Humanities", "Literature, Rhetoric, and Poetry"], "publishers": ["University of West Florida Pressbooks"], "publish_date": "2023", "authors": [{"name": "Judy Young"}], "lc_classifications": ["PE1408"]}

See that the validation errors are displayed to the patrons:
Fix the errors by... commenting out the problematic lines:
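For instance, the obviously malformed line can be neutralized by prefixing it with # (which the endpoint skips as a comment); a hypothetical, abridged version of the edited file might start like:

```
# {'blah': True}
```

The same # prefix works for any of the other records that failed validation.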
Verify the records are not in the import_item table:

Then try to upload again:
See that the items ended up in import_item:

Try to upload the same file again and see that the duplicates are caught and reported, and that the total tried, added, and not queued are correct, along with the duplicate items: