Commit 00a3d7b

Source identifier validations (#1165)
* Source identifier validations

  Validates that source identifiers are present and unique. This addresses three situations that can occur with source identifiers:

  1. A source identifier is missing from a row in the CSV file.
  2. A source identifier is duplicated within the same CSV file.
  3. A source identifier already exists in the repository, which would result in an update instead of a new record being created.

* Update finding existing records

  Finding existence in the entity table isn't adequate for determining whether a record exists in the repository. Additionally, the find by identifier wasn't using the correct term for the lookup.

* Split row validation displays

  Splits row validations into errors and warnings. This allows us to display warnings in the UI without preventing the user from proceeding with the import. The UI will display warnings in a different color and with a warning icon, while errors will still prevent the user from proceeding until they are resolved.
1 parent 3423eb6 commit 00a3d7b

18 files changed

Lines changed: 347 additions & 107 deletions

app/parsers/concerns/bulkrax/csv_parser/csv_validation.rb

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ def validate_csv(csv_file:, zip_file: nil, admin_set_id: nil)
 
         header_issues = check_headers(headers, raw_csv, mapping_manager, mappings, field_metadata, field_analyzer)
         missing_required = header_issues[:missing_required]
-        find_record = build_find_record(mapping_manager, mappings)
+        find_record = build_find_record
         row_errors = run_row_validators(csv_data, all_ids, source_id_key, mappings, field_metadata, find_record)
         file_validator = CsvTemplate::FileValidator.new(csv_data, zip_file, admin_set_id)
         collections, works, file_sets = extract_hierarchy_items(csv_data, all_ids, find_record, mappings)

app/parsers/concerns/bulkrax/csv_parser/csv_validation_helpers.rb

Lines changed: 12 additions & 10 deletions
@@ -109,10 +109,12 @@ def apply_rights_statement_validation_override!(result, missing_required)
 
       # Assembles the final result hash returned to the guided import UI.
       def assemble_result(headers:, missing_required:, header_issues:, row_errors:, csv_data:, file_validator:, collections:, works:, file_sets:) # rubocop:disable Metrics/ParameterLists
+        row_error_entries = row_errors.select { |e| e[:severity] == 'error' }
+        row_warning_entries = row_errors.select { |e| e[:severity] == 'warning' }
         has_errors = missing_required.any? || headers.blank? || csv_data.empty? ||
-                     file_validator.missing_files.any? || row_errors.any?
+                     file_validator.missing_files.any? || row_error_entries.any?
         has_warnings = header_issues[:unrecognized].any? || header_issues[:empty_columns].any? ||
-                       file_validator.possible_missing_files?
+                       file_validator.possible_missing_files? || row_warning_entries.any?
 
         {
           headers: headers,
@@ -135,26 +137,26 @@ def assemble_result(headers:, missing_required:, header_issues:, row_errors:, cs
       end
 
       # Builds the find_record lambda used by row validators and hierarchy extraction.
-      def build_find_record(mapping_manager, mappings)
-        work_identifier = mapping_manager.resolve_column_name(flag: 'source_identifier', default: 'source').first&.to_s || 'source'
-        work_identifier_search = Array.wrap(mappings.dig(work_identifier, 'search_field')).first&.to_s ||
+      def build_find_record
+        all_mappings = Bulkrax.field_mappings['Bulkrax::CsvParser'] || {}
+        work_identifier = all_mappings.find { |_k, v| v['source_identifier'] == true }&.first || 'source'
+        work_identifier_search = Array.wrap(all_mappings.dig(work_identifier, 'search_field')).first&.to_s ||
                                  "#{work_identifier}_sim"
         ->(id) { find_record_by_source_identifier(id, work_identifier, work_identifier_search) }
       end
 
       # Attempt to locate an existing repository record by its identifier.
-      # The identifier may be a Bulkrax source_identifier or a repository object ID.
-      # This mimics the find behavior of the actual import process, which checks for existing records to determine whether to create or update.
-      # Since we don't have the full importer context here, we check both the Entry model and the repository directly.
+      # The identifier may be a repository object ID or a source_identifier property value.
+      # Checks the repository directly (by ID, then by Solr property search) — a Bulkrax
+      # Entry record alone is not sufficient, as the object may never have been created.
       #
       # @param identifier [String]
       # @param work_identifier [String] the source_identifier property name (e.g. "source")
       # @param work_identifier_search [String] the Solr field for source_identifier (e.g. "source_sim")
-      # @return [Boolean] true if a matching Entry or repository object is found
+      # @return [Boolean] true if a matching repository object is found
       def find_record_by_source_identifier(identifier, work_identifier, work_identifier_search)
         return false if identifier.blank?
 
-        return true if Entry.exists?(identifier: identifier, importerexporter_type: 'Bulkrax::Importer')
         return true if Bulkrax.object_factory.find_or_nil(identifier).present?
 
         [Bulkrax.collection_model_class, *Bulkrax.curation_concerns].any? do |klass|
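
The mapping lookup that `build_find_record` now performs can be tried outside Rails. The following is a minimal sketch, not the Bulkrax implementation: `sample_mappings` is a hypothetical stand-in for `Bulkrax.field_mappings['Bulkrax::CsvParser']`, `resolve_source_identifier` and `build_find_record_sketch` are illustrative names, the repository is replaced by an in-memory hash of Solr field names to values, and core Ruby's `Array()` is used where the diff uses ActiveSupport's `Array.wrap`.

```ruby
# Hypothetical stand-in for Bulkrax.field_mappings['Bulkrax::CsvParser'].
def sample_mappings
  {
    'title' => { 'from' => ['title'] },
    'source' => { 'from' => ['source_identifier'], 'source_identifier' => true,
                  'search_field' => 'source_tesim' }
  }
end

# Same shape as the lookup in the diff: take the first mapping flagged as the
# source identifier, fall back to 'source', and default the Solr field to
# "<name>_sim" when no search_field is configured.
def resolve_source_identifier(mappings)
  name = mappings.find { |_key, value| value['source_identifier'] == true }&.first || 'source'
  search_field = Array(mappings.dig(name, 'search_field')).first&.to_s || "#{name}_sim"
  [name, search_field]
end

# Assumption: `index` is an in-memory hash (Solr field => values) that replaces
# the real repository query for the purpose of this sketch.
def build_find_record_sketch(mappings, index)
  _name, search_field = resolve_source_identifier(mappings)
  ->(id) { !id.to_s.strip.empty? && index.fetch(search_field, []).include?(id) }
end

find_record = build_find_record_sketch(sample_mappings, { 'source_tesim' => ['work-1'] })
find_record.call('work-1') # true: the identifier is present in the fake index
find_record.call('work-9') # false: no match
```

Because the lambda closes over the resolved field name, callers only pass an identifier, which is why `validate_csv` no longer needs to hand `mapping_manager` and `mappings` to the builder.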

app/services/bulkrax/stepper_response_formatter.rb

Lines changed: 25 additions & 11 deletions
@@ -121,7 +121,8 @@ def build_messages
       issues << missing_required_issue if @data[:missingRequired]&.any?
       issues << unrecognized_fields_issue if @data[:unrecognized]&.any? || @data[:emptyColumns]&.any?
       issues << file_references_issue if @data[:fileReferences]&.positive?
-      issues << row_errors_issue if @data[:rowErrors]&.any?
+      issues << row_errors_issue if @data[:rowErrors]&.any? { |e| e[:severity] == 'error' }
+      issues << row_warnings_issue if @data[:rowErrors]&.any? { |e| e[:severity] == 'warning' }
 
       {
         validationStatus: validation_status,
@@ -279,20 +280,33 @@ def no_zip_issue
     end
 
     def row_errors_issue
-      filtered = filtered_row_errors
-      return nil if filtered.empty?
-
-      severity = filtered.any? { |e| e[:severity] == 'error' } ? 'error' : 'warning'
-      icon = severity == 'error' ? 'fa-times-circle' : 'fa-exclamation-triangle'
+      entries = filtered_row_errors.select { |e| e[:severity] == 'error' }
+      return nil if entries.empty?
 
       {
         type: 'row_level_errors',
-        severity: severity,
-        icon: icon,
-        title: I18n.t('bulkrax.importer.guided_import.stepper_response_formatter.row_errors_issue.title'),
-        count: filtered.length,
+        severity: 'error',
+        icon: 'fa-times-circle',
+        title: I18n.t('bulkrax.importer.guided_import.stepper_response_formatter.row_errors_issue.title_errors'),
+        count: entries.length,
+        description: I18n.t('bulkrax.importer.guided_import.stepper_response_formatter.row_errors_issue.description'),
+        items: row_error_items(entries),
+        defaultOpen: false
+      }
+    end
+
+    def row_warnings_issue
+      entries = filtered_row_errors.select { |e| e[:severity] == 'warning' }
+      return nil if entries.empty?
+
+      {
+        type: 'row_level_warnings',
+        severity: 'warning',
+        icon: 'fa-exclamation-triangle',
+        title: I18n.t('bulkrax.importer.guided_import.stepper_response_formatter.row_errors_issue.title_warnings'),
+        count: entries.length,
         description: I18n.t('bulkrax.importer.guided_import.stepper_response_formatter.row_errors_issue.description'),
-        items: row_error_items(filtered),
+        items: row_error_items(entries),
         defaultOpen: false
       }
     end
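
The effect of splitting one mixed issue into an error issue and a warning issue can be shown with plain hashes. This is a sketch under stated assumptions rather than the formatter itself: `row_issue` and `row_issues` are hypothetical helper names, the titles are hard-coded from the English locale entries in this commit instead of going through `I18n.t`, and the `description`, `items`, and `defaultOpen` keys are left out for brevity.

```ruby
# Builds one issue hash for the given severity, or nil when nothing matches.
def row_issue(row_errors, severity)
  entries = row_errors.select { |e| e[:severity] == severity }
  return nil if entries.empty?

  # Per-severity presentation, using the values that appear in the diff.
  presentation = {
    'error' => { type: 'row_level_errors', icon: 'fa-times-circle',
                 title: 'Row Validation Errors' },
    'warning' => { type: 'row_level_warnings', icon: 'fa-exclamation-triangle',
                   title: 'Row Validation Warnings' }
  }.fetch(severity)

  presentation.merge(severity: severity, count: entries.length)
end

# Errors first, then warnings, which matches the order used in build_messages.
def row_issues(row_errors)
  %w[error warning].map { |severity| row_issue(row_errors, severity) }.compact
end

rows = [
  { row: 2, severity: 'warning', category: 'existing_source_identifier' },
  { row: 3, severity: 'error', category: 'duplicate_source_identifier' }
]
row_issues(rows).map { |issue| [issue[:type], issue[:count]] }
# => [["row_level_errors", 1], ["row_level_warnings", 1]]
```

With a warnings-only input the result contains a single `row_level_warnings` issue, which is the case the commit message describes: the warning is shown, but nothing marks the import as blocked.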

app/validators/bulkrax/csv_row/duplicate_identifier.rb

Lines changed: 40 additions & 14 deletions
@@ -14,24 +14,50 @@ def self.call(record, row_index, context)
         first_row = context[:seen_ids][source_id]
 
         if first_row
-          context[:errors] << {
-            row: row_index,
-            source_identifier: source_id,
-            severity: 'error',
-            category: 'duplicate_source_identifier',
-            column: source_id_label,
-            value: source_id,
-            message: I18n.t('bulkrax.importer.guided_import.validation.duplicate_identifier_validator.errors.message',
-                            value: source_id,
-                            field: source_id_label,
-                            original_row: first_row),
-            suggestion: I18n.t('bulkrax.importer.guided_import.validation.duplicate_identifier_validator.errors.suggestion',
-                               field: source_id_label)
-          }
+          add_duplicate_error(context, row_index, source_id, source_id_label, first_row)
         else
           context[:seen_ids][source_id] = row_index
+          add_existing_warning(context, row_index, source_id, source_id_label)
         end
       end
+
+      def self.add_duplicate_error(context, row_index, source_id, source_id_label, first_row)
+        context[:errors] << {
+          row: row_index,
+          source_identifier: source_id,
+          severity: 'error',
+          category: 'duplicate_source_identifier',
+          column: source_id_label,
+          value: source_id,
+          message: I18n.t('bulkrax.importer.guided_import.validation.duplicate_identifier_validator.errors.message',
+                          value: source_id,
+                          field: source_id_label,
+                          original_row: first_row),
+          suggestion: I18n.t('bulkrax.importer.guided_import.validation.duplicate_identifier_validator.errors.suggestion',
+                             field: source_id_label)
+        }
+      end
+      private_class_method :add_duplicate_error
+
+      def self.add_existing_warning(context, row_index, source_id, source_id_label)
+        find_record = context[:find_record_by_source_identifier]
+        return unless find_record&.call(source_id)
+
+        context[:errors] << {
+          row: row_index,
+          source_identifier: source_id,
+          severity: 'warning',
+          category: 'existing_source_identifier',
+          column: source_id_label,
+          value: source_id,
+          message: I18n.t('bulkrax.importer.guided_import.validation.existing_source_identifier_validator.warnings.message',
+                          value: source_id,
+                          field: source_id_label),
+          suggestion: I18n.t('bulkrax.importer.guided_import.validation.existing_source_identifier_validator.warnings.suggestion',
+                             field: source_id_label)
+        }
+      end
+      private_class_method :add_existing_warning
     end
   end
 end
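
To see how the duplicate error and the existing-record warning interact across rows, the validator's control flow can be replayed in plain Ruby. This is a sketch, not the Bulkrax module: `check_duplicate_identifier` is a hypothetical name, the way `source_id` is read from the record is an assumption (those lines fall outside the hunk), messages are plain strings rather than `I18n.t` lookups, and row numbers starting at 2 assume a header in row 1.

```ruby
# Replays the branch structure from the diff: a repeated identifier is an
# error; a first occurrence that matches a repository record is a warning.
def check_duplicate_identifier(record, row_index, context)
  source_id = record[:source_identifier] # assumption: how the real validator reads the value
  return if source_id.to_s.strip.empty?

  first_row = context[:seen_ids][source_id]
  if first_row
    context[:errors] << { row: row_index, severity: 'error',
                          category: 'duplicate_source_identifier', value: source_id,
                          message: "Duplicate '#{source_id}', also appears in row #{first_row}." }
  else
    context[:seen_ids][source_id] = row_index
    find_record = context[:find_record_by_source_identifier]
    if find_record&.call(source_id)
      context[:errors] << { row: row_index, severity: 'warning',
                            category: 'existing_source_identifier', value: source_id,
                            message: "'#{source_id}' matches an existing repository record." }
    end
  end
end

existing_ids = ['work-1'] # hypothetical repository contents
context = {
  seen_ids: {},
  errors: [],
  find_record_by_source_identifier: ->(id) { existing_ids.include?(id) }
}
rows = [
  { source_identifier: 'work-1' }, # row 2: already in the repository
  { source_identifier: 'work-2' }, # row 3: new
  { source_identifier: 'work-2' }  # row 4: repeats row 3
]
rows.each_with_index { |row, index| check_duplicate_identifier(row, index + 2, context) }

context[:errors].map { |e| [e[:row], e[:severity], e[:category]] }
# => [[2, "warning", "existing_source_identifier"], [4, "error", "duplicate_source_identifier"]]
```

Note that the repository lookup only runs for the first occurrence of an identifier, so a value repeated many times in the CSV triggers one lookup at most.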
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# frozen_string_literal: true
+
+module Bulkrax
+  module CsvRow
+    ##
+    # Validates that each row has a value for source_identifier unless
+    # fill_in_blank_source_identifiers is configured (in which case Bulkrax
+    # will generate one automatically).
+    module MissingSourceIdentifier
+      def self.call(record, row_index, context)
+        return if Bulkrax.fill_in_blank_source_identifiers.present?
+        return if record[:source_identifier].present?
+
+        source_id_label = context[:source_identifier] || 'source_identifier'
+
+        context[:errors] << {
+          row: row_index,
+          source_identifier: nil,
+          severity: 'error',
+          category: 'missing_source_identifier',
+          column: source_id_label,
+          value: nil,
+          message: I18n.t('bulkrax.importer.guided_import.validation.missing_source_identifier_validator.errors.message',
+                          field: source_id_label),
+          suggestion: I18n.t('bulkrax.importer.guided_import.validation.missing_source_identifier_validator.errors.suggestion',
+                             field: source_id_label)
+        }
+      end
+    end
+  end
+end
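
The two early returns in the new validator decide whether a blank identifier is reported at all. The sketch below illustrates that behavior in plain Ruby under a few assumptions: `check_missing_source_identifier` is a hypothetical name, the `generator:` keyword stands in for the global `Bulkrax.fill_in_blank_source_identifiers` setting (treated here as either `nil` or some callable), and the message is the English locale string written inline.

```ruby
# Reports a blank source identifier unless a generator is configured.
def check_missing_source_identifier(record, row_index, context, generator: nil)
  return unless generator.nil?                               # Bulkrax would fill the value in
  return unless record[:source_identifier].to_s.strip.empty? # value is present

  label = context[:source_identifier] || 'source_identifier'
  context[:errors] << {
    row: row_index,
    source_identifier: nil,
    severity: 'error',
    category: 'missing_source_identifier',
    column: label,
    value: nil,
    message: "Row is missing a value for '#{label}'."
  }
end

context = { source_identifier: 'source', errors: [] }
check_missing_source_identifier({ source_identifier: '' }, 2, context)       # reported
check_missing_source_identifier({ source_identifier: 'work-1' }, 3, context) # fine
# Hypothetical generator: with one configured, the blank row is not reported.
check_missing_source_identifier({ source_identifier: nil }, 4, context,
                                generator: ->(*_args) { 'generated-id' })

context[:errors].map { |e| [e[:row], e[:category]] }
# => [[2, "missing_source_identifier"]]
```

Blank rows are reported with `severity: 'error'`, so unlike the existing-record warning they block the import until the CSV is fixed or automatic identifier generation is configured.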

config/locales/bulkrax.de.yml

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -277,7 +277,8 @@ de:
277277
row_errors_issue:
278278
description: "Die folgenden Probleme bestehen bei den Daten in Ihrer CSV-Datei:"
279279
row_label: "Zeile %{row} · %{column}"
280-
title: Zeilenvalidierungsfehler
280+
title_errors: Zeilenvalidierungsfehler
281+
title_warnings: Zeilenvalidierungswarnungen
281282
success:
282283
message: Ihr Importvorgang wird derzeit im Hintergrund verarbeitet. Sie werden benachrichtigt, sobald er abgeschlossen ist.
283284
start_another: Einen weiteren Import starten
@@ -295,13 +296,21 @@ de:
295296
errors:
296297
message: "Doppelter %{field} '%{value}' — erscheint auch in Zeile %{original_row}."
297298
suggestion: "Jeder %{field} muss innerhalb der CSV-Datei eindeutig sein."
299+
existing_source_identifier_validator:
300+
warnings:
301+
message: "'%{value}' entspricht einem vorhandenen Repository-Datensatz — diese Zeile wird ihn aktualisieren."
302+
suggestion: "Falls Sie keinen bestehenden Datensatz aktualisieren wollten, ändern Sie den Wert für %{field}."
298303
failed: Validierung fehlgeschlagen
299304
file_path_not_exist: Der Dateipfad existiert nicht.
300305
file_references_title: Dateiverweise
301306
files_found_in_zip: "%{found} von %{total} Dateien im ZIP-Archiv gefunden."
302307
files_missing_from_zip: "%{count} %{files_word} werden in Ihrer CSV-Datei referenziert, fehlen aber in der ZIP-Datei:"
303308
files_referenced: Die in der CSV-Datei referenzierten %{count}-Dateien wurden beim Import nicht gefunden.
304309
missing_from_zip: fehlt in der Postleitzahl
310+
missing_source_identifier_validator:
311+
errors:
312+
message: "In der Zeile fehlt ein Wert für '%{field}'."
313+
suggestion: "Fügen Sie dieser Zeile einen '%{field}'-Wert hinzu oder konfigurieren Sie Bulkrax, um Quellkennungen automatisch zu generieren."
305314
missing_required_desc: 'Folgende erforderliche Spalten müssen Ihrer CSV-Datei hinzugefügt werden:'
306315
missing_required_hint: Fügen Sie diese Spalte zu Ihrer CSV-Datei hinzu.
307316
missing_required_title: Fehlende Pflichtfelder

config/locales/bulkrax.en.yml

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -304,7 +304,8 @@ en:
304304
row_errors_issue:
305305
description: "The following issues exist with data in your CSV:"
306306
row_label: "Row %{row} · %{column}"
307-
title: Row Validation Errors
307+
title_errors: Row Validation Errors
308+
title_warnings: Row Validation Warnings
308309
success:
309310
message: Your import is now processing in the background. You'll be notified when it's complete.
310311
start_another: Start Another Import
@@ -322,13 +323,21 @@ en:
322323
errors:
323324
message: "Duplicate %{field} '%{value}' — also appears in row %{original_row}."
324325
suggestion: "Each %{field} must be unique within the CSV."
326+
existing_source_identifier_validator:
327+
warnings:
328+
message: "'%{value}' matches an existing repository record — this row will update it."
329+
suggestion: "If you did not intend to update an existing record, change the %{field} value."
325330
failed: Validation Failed
326331
file_path_not_exist: File path does not exist
327332
file_references_title: File References
328333
files_found_in_zip: "%{found} of %{total} files found in ZIP."
329334
files_missing_from_zip: "%{count} %{files_word} referenced in your CSV but missing from the ZIP:"
330335
files_referenced: "%{count} files referenced in CSV not found in import."
331336
missing_from_zip: missing from ZIP
337+
missing_source_identifier_validator:
338+
errors:
339+
message: "Row is missing a value for '%{field}'."
340+
suggestion: "Add a '%{field}' value to this row, or configure Bulkrax to generate source identifiers automatically."
332341
missing_required_desc: 'These required columns must be added to your CSV:'
333342
missing_required_hint: add this column to your CSV
334343
missing_required_title: Missing Required Fields

config/locales/bulkrax.es.yml

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -277,7 +277,8 @@ es:
277277
row_errors_issue:
278278
description: "Existen los siguientes problemas con los datos en su CSV:"
279279
row_label: "Fila %{row} · %{column}"
280-
title: Errores de validación de filas
280+
title_errors: Errores de validación de filas
281+
title_warnings: Advertencias de validación de filas
281282
success:
282283
message: Tu importación se está procesando en segundo plano. Recibirás una notificación cuando finalice.
283284
start_another: Iniciar otra importación
@@ -295,13 +296,21 @@ es:
295296
errors:
296297
message: "%{field} duplicado '%{value}': también aparece en la fila %{original_row}."
297298
suggestion: "Cada %{field} debe ser único dentro del CSV."
299+
existing_source_identifier_validator:
300+
warnings:
301+
message: "'%{value}' coincide con un registro existente en el repositorio — esta fila lo actualizará."
302+
suggestion: "Si no tenía intención de actualizar un registro existente, cambie el valor de %{field}."
298303
failed: Validación fallida
299304
file_path_not_exist: La ruta del archivo no existe
300305
file_references_title: Referencias de archivos
301306
files_found_in_zip: Se encontraron %{found} de %{total} archivos en ZIP.
302307
files_missing_from_zip: "%{count} %{files_word} referenciado en su CSV pero falta en el ZIP:"
303308
files_referenced: Los archivos %{count} referenciados en CSV no se encontraron en la importación.
304309
missing_from_zip: Falta en el código postal
310+
missing_source_identifier_validator:
311+
errors:
312+
message: "Falta un valor para '%{field}' en esta fila."
313+
suggestion: "Añada un valor de '%{field}' a esta fila o configure Bulkrax para generar identificadores de origen automáticamente."
305314
missing_required_desc: 'Estas columnas obligatorias deben agregarse a su CSV:'
306315
missing_required_hint: Añade esta columna a tu CSV
307316
missing_required_title: Campos obligatorios faltantes

config/locales/bulkrax.fr.yml

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -277,7 +277,8 @@ fr:
277277
row_errors_issue:
278278
description: "Les problèmes suivants existent avec les données de votre CSV :"
279279
row_label: "Ligne %{row} · %{column}"
280-
title: Erreurs de validation de ligne
280+
title_errors: Erreurs de validation de ligne
281+
title_warnings: Avertissements de validation de ligne
281282
success:
282283
message: Votre importation est en cours de traitement en arrière-plan. Vous serez averti(e) lorsqu'elle sera terminée.
283284
start_another: Lancer une autre importation
@@ -295,13 +296,21 @@ fr:
295296
errors:
296297
message: "%{field} en double « %{value} » — apparaît également à la ligne %{original_row}."
297298
suggestion: "Chaque %{field} doit être unique dans le CSV."
299+
existing_source_identifier_validator:
300+
warnings:
301+
message: "« %{value} » correspond à un enregistrement existant dans le dépôt — cette ligne le mettra à jour."
302+
suggestion: "Si vous ne souhaitiez pas mettre à jour un enregistrement existant, modifiez la valeur de %{field}."
298303
failed: Échec de la validation
299304
file_path_not_exist: Le chemin d'accès au fichier n'existe pas.
300305
file_references_title: Références de fichiers
301306
files_found_in_zip: "%{found} fichiers sur %{total} trouvés dans le fichier ZIP."
302307
files_missing_from_zip: "%{count} %{files_word} référencé dans votre fichier CSV mais absent du fichier ZIP :"
303308
files_referenced: "%{count} fichiers référencés dans le fichier CSV sont introuvables lors de l'importation."
304309
missing_from_zip: manquant dans le fichier ZIP
310+
missing_source_identifier_validator:
311+
errors:
312+
message: "Il manque une valeur pour '%{field}' dans cette ligne."
313+
suggestion: "Ajoutez une valeur '%{field}' à cette ligne ou configurez Bulkrax pour générer automatiquement des identifiants source."
305314
missing_required_desc: 'Les colonnes suivantes doivent être ajoutées à votre fichier CSV :'
306315
missing_required_hint: Ajoutez cette colonne à votre fichier CSV.
307316
missing_required_title: Champs obligatoires manquants
