Message Crawler Manual

Work in progress. Message Crawler Manual for Version 5 - Last updated June 28, 2021

What is Message Crawler

Message Crawler is a conversion tool that lets you import data from multiple sources, preview it in a grid view, perform data manipulation, and export it in a new format. The tool is designed specifically to create RSMF files for Relativity, but it can also be used to work with DAT files and directory lists, and with cloud services for document translation and image identification. Many of the tools it provides can be combined in creative ways to solve problems encountered during document discovery.

Message Crawler does not collect data from mobile devices. Instead, it loads files generated by various forensic software packages or files downloaded directly from social media websites (generally XML or JSON).

Notes about Conversion

 

Document conversion takes data from one data source and puts it into another format as required. Most data sources loaded into Message Crawler contain vast amounts of metadata and can have deep, complex structures. The RSMF file format, on the other hand, has a simple structure with a limited number of fields. During conversion, the most relevant metadata fields are extracted and brought into RSMF. Given how much metadata the various platforms generate, and how often those platforms change, a conversion may not include 100% of the data stored in the source XML/JSON files.

Message Crawler is not meant to be a one-click conversion tool. It is a tool for analyzing and understanding your data and for deciding what data to take for conversion. It is important for the user performing the conversion to understand what metadata to include or exclude.

Importing Data

DAT File

An industry-standard DAT file can be imported into Message Crawler. The DAT file should use Concordance-style delimiters. It is also possible to load a CSV file by changing the Delimiters option before clicking Import DAT.
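To illustrate what Concordance-style delimiters look like, here is a minimal Python sketch that reads such a file outside of Message Crawler. It assumes the common Concordance defaults (ASCII 20 as the column separator, ASCII 254 as the text qualifier, ASCII 174 as the in-field newline) and that the first row is a header. The function names are hypothetical and this is not how Message Crawler itself parses the file.

```python
FIELD_DELIM = "\x14"  # ASCII 20, common Concordance column separator
QUOTE = "\xfe"        # ASCII 254 (þ), common Concordance text qualifier
NEWLINE = "\xae"      # ASCII 174 (®), stands in for line breaks inside a field


def parse_dat_line(line, field_delim=FIELD_DELIM, quote=QUOTE, newline=NEWLINE):
    """Split one DAT record into a list of field values.

    Assumes the column separator never occurs inside a value.
    """
    values = []
    for field in line.rstrip("\r\n").split(field_delim):
        # Drop the qualifier that wraps each value.
        if len(field) >= 2 and field.startswith(quote) and field.endswith(quote):
            field = field[1:-1]
        values.append(field.replace(newline, "\n"))
    return values


def parse_dat(text, **delims):
    """Return a list of dicts keyed by the header row."""
    lines = [ln for ln in text.split("\n") if ln.strip("\r")]
    header = parse_dat_line(lines[0], **delims)
    return [dict(zip(header, parse_dat_line(ln, **delims))) for ln in lines[1:]]
```

Because the delimiters are parameters, the same sketch can be pointed at a file that uses different separator characters, which mirrors the idea behind the Delimiters option.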

 

Directory List

A directory list of a folder structure can be created and loaded into the grid for further processing or batch file creation.
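As a rough idea of what a directory list contains, the following Python sketch walks a folder and produces one row per file. The column names ("Path", "File Name", "Extension", "Size") are assumptions chosen for illustration and may not match the columns Message Crawler generates.

```python
import os


def list_directory(root):
    """Return one row per file under root, with path, name, extension and size."""
    rows = []
    for folder, _subdirs, files in os.walk(root):
        for name in sorted(files):
            full_path = os.path.join(folder, name)
            rows.append({
                "Path": full_path,
                "File Name": name,
                "Extension": os.path.splitext(name)[1].lower(),
                "Size": os.path.getsize(full_path),  # size in bytes
            })
    return rows
```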

 

Slack (channels/dms/mpims/groups)

Slack data exported from the Admin Panel can be loaded into Message Crawler. The following types of JSON are supported: Channels, DMS, MPIMS, GROUPS. The export must contain a users.json file.
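The reason users.json matters is that messages in a Slack export identify the sender only by user ID. The Python sketch below shows the lookup, assuming the usual export layout: users.json at the root of the export and one folder per conversation containing daily JSON files. The function names are hypothetical, and the fields used ("id", "real_name", "name", "user", "ts", "text") are those commonly found in Slack exports.

```python
import json
from pathlib import Path


def load_user_names(export_dir):
    """Map Slack user IDs to readable names using users.json."""
    users_file = Path(export_dir) / "users.json"
    users = json.loads(users_file.read_text(encoding="utf-8"))
    return {u["id"]: u.get("real_name") or u.get("name") or u["id"] for u in users}


def read_messages(export_dir, conversation):
    """Yield (sender name, timestamp, text) for one conversation folder."""
    names = load_user_names(export_dir)
    for day_file in sorted((Path(export_dir) / conversation).glob("*.json")):
        for msg in json.loads(day_file.read_text(encoding="utf-8")):
            user_id = msg.get("user", "")
            # Fall back to the raw ID when the user is not listed in users.json.
            yield names.get(user_id, user_id), msg.get("ts"), msg.get("text", "")
```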

 

Slack for Teams

Loose or unstructured Slack JSON files can be loaded with this tool. This tool is more limited than the original Slack import and may import less data. Since no users.json is used by this tool, each imported JSON file must have a user_profile section, or the user name will remain an ID.
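The fallback described above can be pictured with a short Python sketch. It assumes each message may carry a user_profile object with "real_name", "display_name" or "name" keys, as Slack exports commonly do. The helper name is hypothetical.

```python
def sender_name(message):
    """Return a readable sender name, or the raw user ID if no profile is present."""
    profile = message.get("user_profile") or {}
    return (
        profile.get("real_name")
        or profile.get("display_name")
        or profile.get("name")
        or message.get("user", "")  # no profile: the name stays an ID
    )
```

A message without a user_profile section therefore yields only its user ID, which is the behavior the paragraph above warns about.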

Google Hangouts

Data from Google Hangouts can be loaded, and linked attachments can be downloaded to a folder. The export often already contains attachments, but downloading attachments with Message Crawler will usually produce a higher attachment count. If contacts are no longer “friends”, a user name may be replaced by a user ID, because Google can no longer cross-reference the ID to the actual user name.

 

Bloomberg

Bloomberg JSON files can be loaded into Message Crawler after they have been organized and foldered correctly. Be sure that you are working with Instant Bloomberg data, which is indicated by _IB_ in the file name. Data that contains _B_ is not Bloomberg chat and cannot be parsed with this tool. To parse _B_ data, go to the menu Misc Tools > Bloomberg Email Converter.
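When a delivery mixes both kinds of files, a quick pre-sort by file name can help. The Python sketch below applies the naming rule described above. The function name, return labels and example file names are hypothetical, and the match is case-sensitive as an assumption.

```python
def classify_bloomberg_file(file_name):
    """Classify a Bloomberg file by the marker in its name."""
    if "_IB_" in file_name:
        return "instant_bloomberg"  # chat data, suitable for the Bloomberg import
    if "_B_" in file_name:
        return "bloomberg_email"    # use Misc Tools > Bloomberg Email Converter
    return "unknown"


# Hypothetical file names, for illustration only.
for name in ["firm_IB_20210410.json", "firm_B_20210410.json", "notes.txt"]:
    print(name, "->", classify_bloomberg_file(name))
```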

 

Oxygen Forensics / Relativity DAT file

This is similar to a standard DAT file import, but additional processing takes place to extract the fields relevant to RSMF conversion.

 

Cellebrite Legalview

This is similar to a standard DAT file import, but additional processing takes place to extract the fields relevant to RSMF conversion. It is recommended to use this tool only as a last resort and to use Cellebrite XML instead.

 

GroupMe

You can import either a single JSON file or the multiple JSON files that make up a GroupMe export.

 

ESI Analyst DAT File

This is similar to a standard DAT file import, but additional processing takes place to extract the fields relevant to RSMF conversion.

 

Cellebrite XML

Use this tool to load XML files generated by Cellebrite. These files are large and may take some time to load. Phone owner information is not included in these XML files, so it must be provided at import. Use the grid to check that you have entered it correctly.

 

Teams

You can load Teams data exported in either PST or MSG format.

 


Exporting Data as RSMF

RSMF files generated by Message Crawler can be customized to meet the client’s needs. The user of the tool is responsible for assigning the correct fields from the grid to the RSMF fields. Read more about RSMF required fields on Relativity’s website at the following link.

Relativity Short Message Format

 

Sorting

Important information about sorting

Once data is loaded into the grid, it is sorted in the correct order for RSMF creation. Data loaded from a DAT file is not sorted automatically. If you perform data manipulation, change the sort order, or create custom conversations, you must re-sort the data before starting the export.

Data MUST be sorted by the Conversation Identifier field first, then by Sort Date/Time (including seconds), and then by the Control Number field. The sort date should be in a “text sortable” format, for example 2021-04-10T17:32:15, and must also be applied to attachments. You can use the Date Format tool to generate the sort date or reformat it as required.
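The required order can be expressed as a three-part sort key. The Python sketch below is only an illustration of that rule: the column names and the input date format are assumptions, and inside Message Crawler the grid and the Date Format tool do this work.

```python
from datetime import datetime


def to_sort_date(value, input_format="%m/%d/%Y %I:%M:%S %p"):
    """Convert a date string into a text-sortable form such as 2021-04-10T17:32:15."""
    return datetime.strptime(value, input_format).strftime("%Y-%m-%dT%H:%M:%S")


def sort_for_export(rows):
    """Sort by conversation, then sort date (to the second), then control number."""
    return sorted(
        rows,
        key=lambda r: (r["Conversation Identifier"], r["Sort Date"], r["Control Number"]),
    )
```

A text-sortable date matters because the comparison is done on strings: with the year first and zero-padded parts, alphabetical order and chronological order are the same.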

 

Required Fields

These fields are required for conversion. Leaving them blank can result in a malformed or unusable RSMF file.

Control Number: Unique number used to keep track of records in the grid.

Group Identifier: Field that is the same for all members of the family. This is similar to the Begin Attach field.

Conversation Identifier: Field that is the same for the entire conversation. This determines which records are exported to the same RSMF file.

Sender/From: Person sending the message.

Time Stamp: Date/Time when the message was sent.

Message Body: Content of the message.

Messaging Platform: You can specify the messaging platform from a field, or you can select a single platform for all messages by selecting an option in angle brackets <>. Platforms supported by Relativity: slack, sms, mms, bloomberg, skype, imessage, googlechat.

Type of Conversation: Select direct or channel. If members join or leave the conversation, it must be a channel or you will get validation errors during export.
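A pre-export check along these lines can catch blank required values before they produce a malformed file. The Python sketch below is a hypothetical helper, not part of Message Crawler: it assumes the grid columns carry the same names as the fields listed above and that every row must have a value in each of them.

```python
REQUIRED_FIELDS = [
    "Control Number", "Group Identifier", "Conversation Identifier",
    "Sender/From", "Time Stamp", "Message Body",
    "Messaging Platform", "Type of Conversation",
]
SUPPORTED_PLATFORMS = {"slack", "sms", "mms", "bloomberg", "skype", "imessage", "googlechat"}
CONVERSATION_TYPES = {"direct", "channel"}


def find_problems(row):
    """Return a list of human-readable problems for one grid row."""
    problems = [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if not str(row.get(name) or "").strip()
    ]
    platform = str(row.get("Messaging Platform") or "").strip().lower()
    if platform and platform not in SUPPORTED_PLATFORMS:
        problems.append(f"unsupported platform: {platform}")
    conv_type = str(row.get("Type of Conversation") or "").strip().lower()
    if conv_type and conv_type not in CONVERSATION_TYPES:
        problems.append(f"unknown conversation type: {conv_type}")
    return problems
```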

Optional Fields

Optional fields should be specified if available. While not required, they greatly enhance the review experience.

Message Type: Type of the message. The following values are supported: message, disclaimer, join, leave, history, unknown.

Recipients: While not required, it is highly recommended that this field be provided so that reviewers can tell who has seen a message.

Names Delimiter: Select or enter delimiter that is used to separate recipient names.

Is Deleted: Indicates whether the message was deleted. Allowed values: y, yes, 1, true, deleted.

Importance: Indicates the importance of the message. Allowed values: normal, high.

Reactions: This field is generated by Message Crawler when it is able to read reactions to a message. This is not supported for all messaging platforms. Select appropriate field if available.

Custodian: Select the custodian of a document. Enter a name in angle brackets to apply the same custodian to all messages.

Direction: Direction of the message. Allowed values: incoming, outgoing.

EventCollectionID: This field can be used to group RSMF messages together in Relativity to form a single conversation.
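Since several optional fields accept only a fixed set of values, it can be useful to check them before export. The Python sketch below uses the allowed values listed above. The helper names are hypothetical, the column names are assumed to match the field names, and treating values as case-insensitive is an assumption.

```python
DELETED_VALUES = {"y", "yes", "1", "true", "deleted"}
ALLOWED_VALUES = {
    "Message Type": {"message", "disclaimer", "join", "leave", "history", "unknown"},
    "Importance": {"normal", "high"},
    "Direction": {"incoming", "outgoing"},
}


def is_deleted(value):
    """True when the Is Deleted value marks the message as deleted."""
    return str(value or "").strip().lower() in DELETED_VALUES


def invalid_optional_values(row):
    """Return {field: value} for optional fields holding a value outside the allowed set."""
    invalid = {}
    for field, allowed in ALLOWED_VALUES.items():
        value = str(row.get(field) or "").strip()
        if value and value.lower() not in allowed:
            invalid[field] = value
    return invalid


def split_recipients(value, delimiter=";"):
    """Split a recipients string on the chosen Names Delimiter."""
    return [name.strip() for name in str(value or "").split(delimiter) if name.strip()]
```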

Attachment Information

Attachment-specific information is required if attachments are present. If there are no attachments, the path and name can be left blank.

Attachment Path: Tells Message Crawler where to look for the attachment. This can be a full or relative path. If you get an attachment not found error during export, use this field to check whether the attachment is present in the directory.

Attachment Name: Specify field that contains name of the attachment.

Path Prefix: If you are using a relative path, specify the initial part of the path so that Message Crawler can locate the attachments. This is populated automatically most of the time.

Include Missing Attachments: In some situations attachments may be missing. During normal conversion, Message Crawler reports that an attachment is missing and leaves it out of the RSMF file, as required by the file format specification. However, if you would like the reviewer to know that the file existed at some point, you can select this option. The RSMF file will then fail validation, and when the data is loaded into Relativity, a red box will indicate that the attachment is missing. Even though the file fails validation, it will load fine in Relativity.
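To troubleshoot attachment not found errors, it helps to see how the path pieces fit together. The Python sketch below assumes that Attachment Path holds a folder, Attachment Name holds the file name, and Path Prefix is prepended only to relative paths. That combination rule and the helper names are assumptions for illustration, not a description of Message Crawler internals.

```python
from pathlib import Path


def resolve_attachment(attachment_path, attachment_name, path_prefix=""):
    """Build the full attachment location from path, name and optional prefix."""
    path = Path(attachment_path)
    if not path.is_absolute():
        # Relative paths are resolved against the prefix.
        path = Path(path_prefix) / path
    return path / attachment_name


def missing_attachments(rows, path_prefix=""):
    """Return control numbers of rows whose attachment file cannot be found."""
    missing = []
    for row in rows:
        name = row.get("Attachment Name", "")
        if not name:
            continue  # no attachment on this row
        location = resolve_attachment(row.get("Attachment Path", ""), name, path_prefix)
        if not location.is_file():
            missing.append(row["Control Number"])
    return missing
```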

Additional Metadata

There are multiple ways of adding supplemental metadata to improve the review experience. Select the additional fields you would like to load into Relativity.

Write to Cross Reference File: A cross reference file is generated during export that contains the file name and the additional metadata selected. Once the files are loaded into Relativity, the cross reference file can be loaded using Relativity Desktop Client (RDC). You will need to use File Name as the overlay identifier.

Write fields to RSMF Header: An RSMF file uses the EML file format, and custom fields can be added to the header of the file. Once the RSMF file is processed, those fields will be visible in the Email Header field in Relativity.

Add to Body: Metadata added here will be visible when the user hovers the mouse over the question mark next to the actual message.

When saving attachments as separate files: A cross reference file is generated if you choose to save attachments as separate files rather than embedding them in the RSMF file. After importing those RSMF files into Relativity, you will need to overlay the new Group Identifier and Relativity AttachmentID fields using RDC. See the section about attachment handling.
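To show what a custom field in an EML header looks like, the Python sketch below adds extra headers to a message with the standard email library. The "X-Custom-" prefix, the field names and the helper name are hypothetical. The header names Message Crawler actually writes are not documented here, so treat this only as an illustration of the mechanism.

```python
from email.message import EmailMessage


def add_custom_headers(message, fields, prefix="X-Custom-"):
    """Add each metadata field to the message as its own header line."""
    for name, value in fields.items():
        # Header names cannot contain spaces, so replace them with hyphens.
        message[prefix + name.replace(" ", "-")] = str(value)
    return message


msg = EmailMessage()
msg["Subject"] = "Example conversation"
msg.set_content("Placeholder body")
add_custom_headers(msg, {"Source Device": "Phone 1", "Matter": "2021-001"})
print(msg.as_string())
```

Headers written this way travel with the file itself, which is why they show up in the email header text once the file is processed.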

Export Section

Destination Folder: Specify where to write RSMF files to. Files will be split into subfolders in order to keep number of files per folder down.

High Performance Mode: This option tells Message Crawler that you are using a high-performance workstation with plenty of RAM and CPU. The software will run much faster, using all the resources it can allocate. This is recommended for most conversions; however, you may want to turn it off for very large jobs or if you want to keep CPU utilization low.

Conform to RSMF version: If you are on RelativityOne, you can take advantage of additional RSMF 2.0 functionality by specifying 2 as the version number. These files will be compatible with Relativity Server.

External Attachments: Large RSMF files (over 500 MB) may fail to build in Message Crawler or fail to process in Relativity. If you encounter this problem, you can have Message Crawler store attachments separately from the RSMF files. If you do so, you will have to load new values for Group Identifier and Relativity AttachmentID from the cross reference file that is generated. Additionally, you can choose not to embed attachments only when the total attachment size exceeds a specified size.

Note that a total attachment size of 250 MB can generate RSMF files a little over 300 MB, because EML encoding makes files larger. If you have a specific target RSMF file size, set the attachment size limit 50-100 MB lower.

Three step export: You can generate RSMF files in memory (without writing to disk), preview information about the files such as expected attachment size and message counts, and perform the export to disk only once you are satisfied. This can help you decide whether you need to store attachments as separate files.

Error log: The error log tracks all messages, warnings and errors during conversion. It is important to review the error log to make sure no unexpected errors occurred.

Presets: You can save and load settings used for conversion for reuse on another job.
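The growth in file size mentioned in the External Attachments note can be estimated ahead of time. The Python sketch below assumes attachments are embedded with Base64 encoding wrapped at 76 characters per line, which is typical for EML/MIME files. The manual itself only says "EML encoding", so treat the numbers as a rough guide. The function name is hypothetical.

```python
import math


def estimated_encoded_size(raw_bytes, line_length=76):
    """Estimate the size in bytes of raw data after Base64 encoding with line breaks."""
    # Base64 turns every 3 input bytes into 4 output characters.
    encoded_chars = 4 * math.ceil(raw_bytes / 3)
    # Each wrapped line ends with a two-byte CRLF.
    line_breaks = math.ceil(encoded_chars / line_length)
    return encoded_chars + 2 * line_breaks


raw = 250 * 1024 * 1024  # 250 MB of attachments
ratio = estimated_encoded_size(raw) / raw
print(f"Encoded size is about {ratio:.2f} times the raw size")
```

Under these assumptions the encoded data is roughly a third larger than the raw attachments, which is consistent with leaving 50-100 MB of headroom when aiming for a particular RSMF file size.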
