Data Catalog Questions

  • What should happen when an existing file is re-registered?
    • Duplicate entry (what happens now)
    • Error
    • Replace
    • New version
  • If we support multiple versions, what options do we give the user?
    • Replace (purge old version)
    • Add next version
    • Add an explicit version (if so, do we enforce that it is greater than the existing version?)
    • What impact does this have on search performance?
  • When the crawler finds a missing file, what should it do (flag, hide, or delete the entry)?
  • When a user deletes a dataset, should we give them the option to also delete its files?
    • Possibly by deleting the virtual folder that contains it
    • Or indirectly by deleting the pipeline task
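A minimal sketch, assuming a hypothetical in-memory catalog keyed by file path, of how the re-registration and versioning options above could be modeled in Python. The names `ReRegisterPolicy`, `Catalog`, `register`, and `latest` are illustrative only and not part of any existing API. Rejecting an explicit version that does not exceed the current one is shown as one possible answer to the open question, not a decision.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ReRegisterPolicy(Enum):
    """Hypothetical policies for registering a path that is already in the catalog."""
    DUPLICATE = "duplicate"      # add another entry (current behavior)
    ERROR = "error"              # reject the registration
    REPLACE = "replace"          # purge old entries, keep only the new one
    NEW_VERSION = "new_version"  # keep old entries, add the next version


@dataclass
class Entry:
    path: str
    version: int


@dataclass
class Catalog:
    """Hypothetical in-memory catalog; a real one would be backed by a database."""
    entries: List[Entry] = field(default_factory=list)

    def _for_path(self, path: str) -> List[Entry]:
        return [e for e in self.entries if e.path == path]

    def register(self, path: str, policy: ReRegisterPolicy,
                 explicit_version: Optional[int] = None) -> Entry:
        existing = self._for_path(path)
        latest = max((e.version for e in existing), default=0)

        if existing and policy is ReRegisterPolicy.ERROR:
            raise ValueError(f"{path} is already registered")

        if existing and policy is ReRegisterPolicy.REPLACE:
            # Purge every old version of this path before adding the new entry.
            self.entries = [e for e in self.entries if e.path != path]
            version = 1
        elif policy is ReRegisterPolicy.NEW_VERSION:
            if explicit_version is None:
                version = latest + 1  # "Add next version"
            elif explicit_version <= latest:
                # Assumed answer to the open question: explicit versions must increase.
                raise ValueError(f"version {explicit_version} must be > {latest}")
            else:
                version = explicit_version  # "Add explicit version"
        else:
            # First registration (any policy) or DUPLICATE; explicit_version is ignored.
            version = 1

        entry = Entry(path, version)
        self.entries.append(entry)
        return entry

    def latest(self, path: str) -> Optional[Entry]:
        # With versioning, every lookup must select the newest entry for a path,
        # which is extra work compared with keeping one entry per path.
        return max(self._for_path(path), key=lambda e: e.version, default=None)
```

For example, registering `/data/a.csv` three times with `NEW_VERSION`, the last time with `explicit_version=5`, yields versions 1, 2 and 5; a later `REPLACE` leaves a single entry at version 1. The `latest` helper hints at the search-performance question: once several versions share a path, every lookup has to choose one of them.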