Step-by-Step Guide: Importing Data from SharePoint to ARHUT
Overview
This guide provides step-by-step instructions for importing data from a dataset as a spreadsheet in SharePoint into the ARHUT system. The process below is written to streamline the process and ensure the data is consistent, validated, and correctly imported into the ARHUT collection.
Systems and Data Sources
https://tartuulikool.sharepoint.com/
1. SharePoint- Source: The "Biological Sample Masterlist" Excel workbook, with a worksheet MAINTABLE, which contains raw data about biological samples.
- Role: Acts as the primary data source to be imported into ARHUT.
https://db.arh.ut.ee/
2. ARHUT- Collection: E.g. The Biological_Samples collection in ARHUT.
- Role: The target database where the data will be stored and managed.
- Tools: The Directus interface is used to export, import and manage data in ARHUT.
Workflow: Step-by-Step Instructions
Step 1: Access the SharePoint Sheet
- Log in to your SharePoint account and navigate to the "Biological Sample Masterlist".
- Open the sheet and review the data to ensure it is complete and up-to-date.
- Create a new worksheet in the same SharePoint workbook:
- Name the new worksheet MACHINE_TABLE.
- Leave the MACHINE_TABLE worksheet empty for now. This will serve as the formatted and aligned version of the data for ARHUT.
Step 2: Export the Table Schema from ARHUT
- Log in to the ARHUT system through the Directus interface.
- Navigate to the PaleoMIX O.A.D -> Bioloogilised proovid collection.
- Export the table schema to get a blank template:
- Go to the Export (Ekspordi kirjed) option for the collection.
- Ensure the format is in CSV and set LIMIT to 1, to get just one record and titles
- Save the schema or a sample CSV file.
- Open the CSV in your favourite Spreadsheet program and copy the column titles to the new MACHINE_TABLE worksheet in your Excel Workbook:
Step 3: Map and Align the Data in MACHINE_TABLE Worksheet
- Open the file in Excel (or Google Sheets) and perform the following:
- Remove columns that are not coming from the current sheet, e.g., 'user_created', 'user_created' etc
- You can rearrange columns to match the order in the ARHUT template.
- Check data types (e.g., dates, numbers, text) to ensure they align with ARHUT’s requirements.
- Under each column, write a query to pull data from the main sheet in the form: =MAINTABLE!B2:B1300 where X is the column ID in the original MAINTABLE
- In such a form we can get a automatically up-to-date dataset that can be imported to ARHUT
- NOTE- the schema setup in ARHUT and Excel should ensure that duplicates are not updated to ARHUT!
Step 4: Export the Finalized Sheet as a CSV
- Once the data is aligned and cleaned, save the file as a CSV:
- In Excel, choose File > Save As or Download As.
- Select the CSV (Comma Separated Values) format.
Step 5: Validate and Import into ARHUT
- Log in to the ARHUT system (Directus interface).
- Navigate to the Biological_Samples collection.
- Go to the Import section:
- Click on Import Data.
- Upload the prepared CSV file.
- Review the import preview to ensure (Currently not implemented):
- All required fields are populated.
- No errors or mismatches are detected in the data.
- Resolve any warnings or errors (Currently not implemented):
- Update the CSV file and re-upload if necessary.
- Confirm the import to transfer the data into ARHUT.
Post-Import Verification
- Verify the imported data in ARHUT:
- Navigate to the Biological_Samples collection.
- Check a few records to ensure the data appears as expected.
- Report any issues to the technical team if discrepancies are found.
Tips for Success
- Start Small: Test the process with a small subset of data before importing the full dataset.
- Save a Template: Keep a copy of the aligned CSV file as a reusable template for future imports.
- Use Validation Rules: In Excel, apply data validation to flag potential issues (e.g., missing fields, incorrect formats).
- Backup Data: Always save a backup of the original and cleaned data files before importing.