Data Ingestion: Imports

Modified on Fri, 22 Dec, 2023 at 3:51 PM

TABLE OF CONTENTS

Introduction
- Key Terms
Considerations Before you Begin
Integrations vs. Imports
Navigation
Create an Import
Locate your Import
Edit an Import
Import Run Logs
Next
Glossary of Key Terms

Introduction

Data import is the process of bringing external data into TRP. It is the first and pivotal stage of working in TRP. This external data can come from various sources, including third-party software, spreadsheets, databases such as MySQL or SQL Server, or events posted to TRP.

Data imports in TRP enable users to add new data, update existing records (Load), or perform data transformation (Transform) from various sources (Extract).

Key Terms

Dataset

A dataset is a collection of one kind of data. It has consistent columns and one or more rows of data. TRP organizes everything under independent datasets: dashboards, filters, and metrics all live in the context of a dataset.

A data import can either create a new dataset or bring data into an existing dataset. Key assumption: in the second scenario, the incoming data from every import must match the existing dataset exactly in its data columns and their types.
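
As a rough pre-check for the second scenario, the columns and types of an incoming file can be compared with those of the existing dataset before the import is set up. The sketch below is illustrative only: the `schema_mismatches` helper, the column names, and the type names are assumptions made for this example and are not part of TRP.

```python
def schema_mismatches(existing: dict[str, str], incoming: dict[str, str]) -> list[str]:
    """Compare two {column name: type name} mappings and describe every difference."""
    problems = []
    for column in sorted(existing.keys() - incoming.keys()):
        problems.append(f"missing column: {column}")
    for column in sorted(incoming.keys() - existing.keys()):
        problems.append(f"unexpected column: {column}")
    for column in sorted(existing.keys() & incoming.keys()):
        if existing[column] != incoming[column]:
            problems.append(
                f"type mismatch for {column}: {existing[column]} vs {incoming[column]}"
            )
    return problems


# Hypothetical schemas; the column and type names are made up for illustration.
existing_dataset = {"encounter_id": "integer", "start_time": "datetime", "provider": "text"}
incoming_file = {"encounter_id": "text", "start_time": "datetime", "specialty": "text"}

for problem in schema_mismatches(existing_dataset, incoming_file):
    print(problem)
```

An empty result means the two schemas agree on both column names and types.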

Entity

An entity in TRP groups different datasets together and gives TRP context for the incoming data. This helps TRP determine what type of analytics the user is interested in. Encounter, Claims, Vidyo CDR, and Procedures are examples of entities. Here, Encounter represents the provider-patient consult, while Claims represents the charge posted to the patient or insurance and the payment received for that charge.

Considerations Before you Begin

Keep the following key points in mind before starting an import:

1. Data Quality Assessment: Before importing data, ensure that the data is accurate and formatted correctly to prevent any issues during the import process.

2. Data Validation: Implement validation rules to check the integrity of the data being imported. This includes checking for duplicates, missing values, and inconsistencies.

3. User Training: Train users and administrators on correctly performing data imports. This includes understanding the import process, data mapping, and validation.
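
The checks described in points 1 and 2 can be run on a file before it is uploaded. The following is a minimal sketch using only the Python standard library; the `validate_rows` helper and the sample data are hypothetical and do not reflect any validation that TRP itself performs.

```python
import csv
import io
from collections import Counter


def validate_rows(csv_text: str, key_column: str) -> dict[str, list]:
    """Report duplicate key values and rows with empty cells in a CSV export."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    key_counts = Counter(row[key_column] for row in rows)
    duplicates = sorted(key for key, count in key_counts.items() if count > 1)
    # Data row numbers are 1-based and do not count the header row.
    rows_with_missing = [
        number
        for number, row in enumerate(rows, start=1)
        if any(
            value is None or (isinstance(value, str) and not value.strip())
            for value in row.values()
        )
    ]
    return {"duplicate_keys": duplicates, "rows_with_missing_values": rows_with_missing}


# Made-up sample data for illustration.
sample = """claim_id,amount,payer
C1,120.00,Acme
C2,,Acme
C1,120.00,Acme
"""
print(validate_rows(sample, "claim_id"))
```

Here the report flags `C1` as a duplicated key and the second data row as having a missing value.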

Integrations vs. Imports

Data can be brought into TRP via Integrations or Imports, but there is a key difference. An Import acts at the level of a single dataset and is only responsible for bringing data into TRP. An Integration is higher-level functionality that creates one or more imports and datasets along with dashboards, metrics, and filters. Integrations are reserved for third-party systems, e.g. Zoom, Vidyo, and Teams.

Navigation

Step 1: Go to Imports Page

Log into the TRP system using your credentials, then click "Imports and Integrations" in the left nav bar.

Step 2: New Import V2

Click on Imports and then select “New Import V2” to begin a new import.

Create an Import

The next steps walk through creating a new import.

Step 3: Dataset Creation

At this step, you can either create a new dataset or route data to an existing dataset.

For a new dataset, enter its name in the field.
For an existing dataset, select its name from the dropdown.

Then select the ‘Tag Entity’, which helps TRP group different datasets together.

Step 4: ETL

Select the “full load” option when the data needs a full refresh every time the import runs. If not, leave the option as it is.
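
Conceptually, the choice is between replacing the dataset on every run and merging incoming rows into it. The sketch below illustrates that general difference under stated assumptions: the function names are hypothetical, rows are plain dictionaries, and the code does not describe how TRP implements either mode internally.

```python
def full_load(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Full refresh: discard what was there and keep only the incoming rows."""
    return list(incoming)


def incremental_load(existing: list[dict], incoming: list[dict], key: str) -> list[dict]:
    """Incremental: update rows whose key already exists, add the rest."""
    merged = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row  # same key replaces the old row, new key is added
    return list(merged.values())


existing = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
incoming = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]

print(full_load(existing, incoming))
print(incremental_load(existing, incoming, key="id"))
```

With a full load, row 1 disappears because it is absent from the incoming data. With an incremental load, row 1 is kept, row 2 is updated, and row 3 is added.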

Step 5: Connection Details

Data Connection and Source Types

In the next few fields, users tell TRP where the data resides and how to connect to the data source. Depending on the source type, the appropriate fields are presented for the user to fill out.

TRP can ingest data from the following data sources:

Source 1: Via CSV
Source 2: Via Database
- SQL Server
- MySQL server
- PostgreSQL server
Source 3: Via Vidyo
Source 4: Via Zoom
Source 5: Via MS Teams
Source 6: POST Data (Webhook)
Source 7: TRP Database
Source 8: REDCap

The selection of source type determines the next steps of import. In this document, we’ll proceed with Import via CSV as an example. For a full list of source types, please refer to the document on Import Types.

Source 1: Import via CSV

Step 5.1. Select CSV

Once all the above-mentioned details are filled in, select CSV from the dropdown menu of the data source field.

Upload your data file

Select the data delimiter/separator that splits the data into columns in your file. Most CSV files are comma-separated, but other separators such as tabs are sometimes used. Opening the file and inspecting it will generally help you determine the separator.
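
If inspecting the file by eye is inconclusive, the separator can also be guessed programmatically. The sketch below uses `csv.Sniffer` from the Python standard library; the `detect_delimiter` helper and the list of candidate separators are assumptions for this example, and the result is a heuristic guess that should still be checked against the file.

```python
import csv


def detect_delimiter(sample_text: str, candidates: str = ",;\t|") -> str:
    """Guess the column separator from a sample of the file's text."""
    dialect = csv.Sniffer().sniff(sample_text, delimiters=candidates)
    return dialect.delimiter


# Made-up tab-separated sample for illustration.
sample = "id\tname\tvisit_date\n1\tAda\t2023-01-05\n2\tGrace\t2023-01-06\n"
print(repr(detect_delimiter(sample)))
```

For a real file, passing the first few kilobytes of its text as the sample is usually enough.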

Enter the Header Row number. This is the row that contains the column names/titles. It is generally the first row; however, in some files the table may not start on the first row.
Enter the First Data Row number, that is, the number of the first row that holds data after the column titles row.

(Example shown below in a CSV file)
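
As a text-only illustration of the Header Row and First Data Row settings, the sketch below reads a made-up export in which the table starts on the third row. It assumes rows are counted from 1, as in a spreadsheet; the `read_table` helper is hypothetical and only mirrors the idea behind the two fields.

```python
import csv
import io


def read_table(csv_text: str, header_row: int, first_data_row: int, delimiter: str = ","):
    """Return (column names, data rows) using 1-based row numbers."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    columns = rows[header_row - 1]
    data = [dict(zip(columns, row)) for row in rows[first_data_row - 1:] if row]
    return columns, data


# Made-up export with a title line and a filler line above the header.
export = """Encounter report
,,
encounter_id,provider,start_time
101,Dr. Lee,2023-12-01 09:00
102,Dr. Shah,2023-12-01 09:30
"""

columns, data = read_table(export, header_row=3, first_data_row=4)
print(columns)
print(data[0])
```

For this file, the Header Row would be 3 and the First Data Row would be 4.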

Once done, click the Save button in the bottom right corner. This saves the CSV connection details; the dataset itself is populated when the import is run (Step 7).
After you click Save, the next step is “Settings”.

Step 6: Data Settings

After the connection is set up, TRP brings in sample data from the data source (for CSV, the sample is all the data in the file). In this step, TRP does the following:

Infers the data type of each data column from the sample data, behind the scenes
Establishes labels for the data columns
Identifies the unique columns when data is loaded incrementally, so TRP can distinguish between updates to existing data and new data
Establishes a timezone for the dates in the incoming data (defaults to UTC)
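
To make the idea of inferring column types from sample data concrete, here is a minimal sketch. The `infer_type` helper, its type names, and its order of checks are assumptions for this example; TRP's actual inference rules are not documented here and may differ.

```python
from datetime import datetime


def infer_type(values: list[str]) -> str:
    """Pick the narrowest type that fits every non-empty sample value."""
    samples = [value.strip() for value in values if value.strip()]
    if not samples:
        return "text"

    def all_parse(parser) -> bool:
        for value in samples:
            try:
                parser(value)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "number"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    return "text"


# Made-up sample columns for illustration.
sample_columns = {
    "encounter_id": ["101", "102"],
    "charge": ["120.50", "80"],
    "start_time": ["2023-12-01 09:00", "2023-12-01 09:30"],
    "provider": ["Dr. Lee", ""],
}
for name, values in sample_columns.items():
    print(name, "->", infer_type(values))
```

Because inference of this kind depends entirely on the sample, a column whose sample happens to look numeric may need its type reviewed.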

Step 6.1

Select the timezone of the datetime fields in the data.

Many applications export data in UTC even though they display local time. Please ensure the timezone selected for the data brought into TRP is correct; otherwise, time-based analysis will return different results than expected.
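
The effect of a wrong timezone can be seen with a single timestamp. In the sketch below, the same exported value is interpreted once as UTC and once as local time. The `report_date` helper and the fixed UTC-5 offset are assumptions for this example.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo

# Fixed UTC-5 offset standing in for a local zone; a real setup would more
# likely use a named zone such as zoneinfo.ZoneInfo("America/New_York").
LOCAL = timezone(timedelta(hours=-5), "UTC-5")


def report_date(raw: str, source_tz: tzinfo, report_tz: tzinfo) -> date:
    """Attach source_tz to a zone-less timestamp, then view it in report_tz."""
    naive = datetime.fromisoformat(raw)
    return naive.replace(tzinfo=source_tz).astimezone(report_tz).date()


raw = "2023-12-01 02:30:00"  # exported value with no timezone attached

print(report_date(raw, timezone.utc, LOCAL))  # data declared as UTC
print(report_date(raw, LOCAL, LOCAL))         # data declared as local time
```

The two interpretations place the same record on different calendar days (30 November versus 1 December), which is enough to shift daily counts in a report.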

Step 6.2

TRP takes the data column labels from the data source; for CSV, they come from the header row. These labels are editable.

Step 6.3

The last essential step is to choose at least one unique identifier. Check the boxes for the unique columns in the "Unique Identifier" section. Users can select one or multiple unique identifiers.

Finally, click the "Save" button in the bottom right corner to save your label settings successfully.
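
Before choosing unique identifiers in Step 6.3, it can help to confirm that the chosen column, or combination of columns, really is unique in the data. The sketch below is one way to check this outside TRP; the `is_unique` helper and the sample rows are hypothetical.

```python
def is_unique(rows: list[dict], key_columns: list[str]) -> bool:
    """True when no two rows share the same combination of key column values."""
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            return False
        seen.add(key)
    return True


# Made-up rows for illustration.
rows = [
    {"patient_id": "P1", "visit_date": "2023-12-01", "provider": "Dr. Lee"},
    {"patient_id": "P1", "visit_date": "2023-12-02", "provider": "Dr. Lee"},
    {"patient_id": "P2", "visit_date": "2023-12-01", "provider": "Dr. Shah"},
]
print(is_unique(rows, ["patient_id"]))
print(is_unique(rows, ["patient_id", "visit_date"]))
```

In this sample, `patient_id` alone repeats, while the combination of `patient_id` and `visit_date` is unique, which is the situation where selecting multiple unique identifiers is useful.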

Step 7: Start Import run

This is the last step of creating an import.

Click Start Import to run the import process, which brings in the data. This step is not required: creating an import only creates a dataset with no data yet, whereas running the import populates the dataset with data.

Once the Import is started successfully, wait for the Import completion message.

This concludes the creation of your first Import in TRP.

Locate your Import

To locate the newly imported dataset, click the “List of Imports” button.
This opens the list of all imported datasets, where users can find their most recent dataset using the pagination at the bottom of the list.

Furthermore, to locate your datasets in TRP, click the entity you selected for the dataset in the left nav bar of TRP.

Tip: This is where the users can start creating their dashboards.

Edit an Import

A TRP import can be edited to make the following changes:

Data Updates: Bring in new data columns.
Time Zone Adjustments: Change the time zone settings for the data.
Customizing Labels: Modify label names to improve data reporting.

To edit details from an existing successful import:

Go to your imports list.
Choose the import that you want to edit and click the ‘Edit’ button in the actions column, which brings you to the same page as the initial import setup.
Make the necessary changes in the connection details.
Next, update the time zone or labels as needed and click Save.
The import is now updated.

Import Run Logs

Users can view the logs to see who viewed or made changes to the import.

Follow the steps below to open the logs and detailed information:

Go to the list of imports.
Choose the desired import and click on its actions button.
In the actions column, there are two options:
1. “Logs” button: view the logs of the import, i.e., the user information or the system information of the import.

Next

Learn about the different data sources from which TRP can ingest data in the Import: Data Sources Guide.

Learn about how to run data ingestion periodically or just once in our article Import Runs in TRP.

Glossary of Key Terms

CSV File: A common format for data storage.
Data Connection: How TRP connects to the data source.
Data Imports: Bringing external data into TRP.
Data Labels: The display names of data columns. TRP takes them from the data source (for CSV, from the header row), and they can be edited to improve data reporting.
Data Quality Assessment: Ensuring accurate and well-formatted data.
Data Separator: Character that separates data columns. It could be a comma, space, semicolon, tab, or any other character.
Data Settings: Configuring data column labels, unique columns, and time zones.
Data Validation: Rules to check data integrity.
Data Warehouse: A central storage for cleaned and organized data.
Database Source Type: Connecting to a database.
Dataset: A collection of one kind of data, with consistent columns and one or more rows.
ETL (Extract, Transform, Load): The process of combining data from various sources into a central repository.
Entity: An entity in TRP adds context to healthcare data, enabling features like tagging providers, specialties, and automated chart creation for encounters.
First Data Row: The row with data following the column titles.
Header Row: The row containing column titles.
Imports: Bringing data into TRP at the dataset level.
Port: The server port for the database connection.
Query: Extracting data from a database.
Timezone Setting: The time zone that TRP should use for recording, processing, and displaying time-related data. Accurate time zone settings are crucial for reports dealing with scheduling, logging, and real-time data.
Unique Column: A column in which each value is distinct and does not repeat. Unique columns are essential for preventing data duplication and ensuring accuracy in data management and analysis.