Import: Data Sources Guide

Modified on Mon, 22 Jul, 2024 at 10:09 AM

TABLE OF CONTENTS

CSV Source Type
Databases source type
Vidyo
Zoom
MS Teams
TRP Data Source
Post Data Webhook
RedCap
Quick links
Glossary of Key Term

Imports: Source Types in TRP

Since data in TRP can originate from multiple sources, including third-party software, and databases like MySQL or SQL Server, this document is a guide on how to connect to each data source.

If you’re not familiar with Imports in TRP, please refer to the previous article by clicking here.

TRP currently supports the following data sources.

Source 1: Via CSV
Source 2: Via Database
- SQL server
- MYSQL server
- Postgres server
Source 3: Via Vidyo
Source 4: Via Zoom
Source 5: Via MS Teams
Source 6: POST Data (Webhook)
Source 7: TRP Database
Source 8: Redcap

Source 1: CSV Source Type

Please refer to the Data Ingestion document for details on the CSV source Type.

Source 2: Import via Database source type

TRP supports three types of database:

SQL Server
MySQL
PostgreSQL

The connection details for all the three databases is the same. There are 2 fields to determine the address (server address and port), one for the database name (Database) and 2 fields (username, password) are the user credentials that have access to the data. More details on each field is presented below.

Username: is the username of the user that has access to the data in the server
Password: is the password of the user accessing the database
Database: name of the database on the server
Port: Port of the server where connection can be made.

Default ports are as such:

MySQL is 3306
1433 for SQL Server
5432 for PostGres

Server Address: is the address of the server, either hostname or IP address will work

Currently, TRP does not support connections with Oracle DB.

Data Pull Strategy

TRP utilizes a user provided query to fetch the data which is input in the Enter search query field. This allows the user to extract data across multiple tables to create their dataset and also include any calculations along with the data.

Successive Data Pulls

TRP fetches the data exactly as dictated by the query and does not process the query itself to determine updates to the data resulted by the query. Hence, the query should be written such that every time the query is run, it gets the latest data from the database.

Full Load

For a full load, i.e. dataset gets wiped every time the import runs and all the data is fetched again, a static query can be utilized. This query should get all the data needed for analytics in the dataset. This strategy is especially useful when the compiled data from the query does not seem to have any obvious identifier to determine the updated rows.

Incremental Load

With very large datasets, generally incremental loads are used ie. TRP should only get the data that is updated. For this purpose the query should be dynamic based on some date time column. An example of such a query would be:

SELECT * 
FROM db.table
WHERE column_date > NOW() - INTERVAL 1 DAY;

In the above example if column_date is a date time column that tracks the update time to a data row, then the given query will always get the new data since yesterday.

Data Type Inference

TRP retrieves data from the database in chunks and will determine the data types from the data example from the first chunk of the data. TRP does not depend on the data type in the source database and rather works out the data types from actual data.

Source 3: Import via Vidyo

This data source is Vidyo CDR. Vidyo is a video conferencing system that keeps its call records in a CDR (call detail record) which is a MySQL database. Hence, many connection details are the same as a database connection.

This import is automatically created by the Vidyo Integration and users may seldom need to create this import directly.

Fields and Selection

To import Vidyo CDR data, the Tag Entity field from the dropdown is selected to be Vidyo CDR. The data to be entered in the connection fields can be determined by following the Vidyo integration guide.

Data Pull Strategy

TRP uses a built-in query to fetch the data from Vidyo CDR. Further the data pull strategy is also fixed to incremental where new data is determined based on the Call ID field. TRP uses the following method to determine updates to the data:

New Call ID with the Call Status is Completed
All the data where the call status is ‘In Progress’

Source 4: Import via Zoom

Zoom is a popular video conferencing system. TRP integrates with zoom to get multiple data elements that can support call analytics for utilization reports, debugging, quality assurance, license optimization etc. The integration includes getting the following data:

Meetings: each data row is one meeting
Participants: a single data row is a participant in the meeting
Quality of Service: A single data row is 1 minute of QoS metrics for one participant in a call
User: all zoom users in the account along with their last login attempts

These zoom imports are automatically created by the zoom integration and a user will seldom need to create the individual zoom import themselves. Click here to learn about how to add Zoom Instance to TRP.

Source 5: Import via MS Teams

Just like Zoom, TRP can integrate with MS Teams which is a popular Microsoft video conferencing system. TRP can track

Participants: a single data row represents a participant in the meeting. A participant can be a user, device, application or guest.
Call segments: a single data row represents a portion of a User-User communication or a User-Meeting communication in the case of a Conference call. A typical VOIP call has one segment per session. A session represents a User-User communication or a User-Meeting communication in the case of a conference call.
Media: a single data row represents the media (audio, video, video-based screen-sharing, etc.) used in a call.

These imports are created automatically by the MS Teams integration and a user will seldom need to create it by themselves. Click here to learn about how to add Microsoft Teams Instance to TRP.

Source 6: TRP Data Source

New datasets can be formed by combining or extracting data from the current datasets in TRP. This allows transformation on any dataset or combines data from multiple datasets.

Connection and Fields

To use TRP Data source, there should already be existing data available in the system.

Data Source is TRP Database
Base Dataset This is the base data used for our data extraction
Query This query section allows users to write queries to create the new data extract. Users can use /c and /t to refer to the data columns or table respectively

Data Pull Strategy

This part acts just like any database connection, however in this case the database is TRP itself. Here the query follows the same data pull strategy as the one described in the database source type section.

Source 7: Post Data Webhook

TRP can ingest events data from third party systems. Many software like Zoom can emit data on events e.g. when a call starts. TRP can capture any arbitrary data of software sent via such events. This is often called a webhook where the third party system can send data to TRP.

TRP can also integrate with enterprise softwares where a real time integration can be supported by this data webhook type import.

This import supports JSON data load from such events. At the end of the import, TRP will generate a link which is copy and pasted in the third party system to subscribe to its events.

Fields and Connection

Select POST Data - Wehbook from the data source dropdown field.
Next, select the API user OR Basic Auth from the dropdown list.
Enter JSON Data example with the correct values.

{ 'data': 
    { 'field1_label':'field1_value', 
      'field2_label':'field2_value', 
       ... 
    }
}

The field value expected should be in the correct data type as TRP will use the field value to decide on the data type of that column. E.g.

{ 'data': 
    { 'Encounter Number': 'N2232', 
      'Patient Age': 35,
      'Encounter Datetime': '2024-01-01 11:30 AM', 
       ... 
    }
}

Authentication

TRP supports two types of authentication for such data events. The API user is a JWT token created by TRP. A new API user can be added by adding a new API user from the Users list view. This generates a secret token and an api token to be used for the JWT authentication.

TRP also supports basic authentication where a password and username is given to the import. TRP import will expect this username and password by the third party system when receiving data.

Data Strategy

This type of import is not a data pull and in fact a data push to TRP. This data is real time as the data is generally sent right as the events occur on the third party system.

Source 8: RedCap

REDCap integration allows the connection of the REDCap system with TRP. REDCap is a popular survey tool used in the healthcare system.

One TRP import captures data from a single REDCap survey.

Fields and Connection

Each REDCap survey allows creating an API token that can be used to get all the survey data.

Select Redcap from the data source dropdown field.
Enter / paste the Redcap API URL link from the Redcap project
Enter the API token which is a unique code associated with a single user on a specific REDCap project.

Data Pull Strategy

RedCap data is always full load, as every time this import runs, it brings all the data from the survey again.

Quick links

Glossary of Key Term

CSV File: A common format for data storage.
Data Connection: How TRP connects to the data source.
Data Imports: Bringing external data into TRP.
Data Labels: Data labeling is the process of adding tags or labels to raw data such as images, videos, text, and audio. Data labels act as a reference point, which helps categorize new, unlabeled data in the right fields.
Data Quality Assessment: Ensuring accurate and well-formatted data.
Data Separator: Character that separates data columns. It could be a comma, space, semicolon, tab, or any other character.
Data Settings: Configuring data column labels, unique columns, and time zones.
Data Validation: Rules to check data integrity.
Data Warehouse: A central storage for cleaned and organized data.
Database Source Type: Connecting to a database.
Dataset: A collection of data related to a specific area.
ETL (Extract, Transform, Load): The process of combining data from various sources into a central repository.
Entity: An entity in TRP adds context to healthcare data, enabling features like tagging providers, specialties, and automated chart creation for encounters.
First Data Row: The row with data following the column titles.
Header Row: The row containing column titles.
Imports: Bringing data into TRP at the dataset level.
Port: The server port for the database connection.
Query: Extracting data from a database.
Timezone Setting: Time zone settings in TRP allow configuration of the specific time zone that should be used for recording, processing, and displaying time-related data in the database. (Accurate time zone settings are crucial to creating reports dealing with scheduling, logging, and real-time data).
Unique Column: Unique columns are specific columns within a database where each value in the column is distinct and does not repeat. (Unique columns are essential for preventing data duplication and ensuring accuracy in data management and analysis.)