Centralized control of File and Data movement in the Enterprise

A few years ago, the IT operations manager at an international corporation walked out of his office, just diagonally across from where I was seated, looking rather distressed. He threw his hands up in the air and announced, “I just don’t know!”. Being the IT architect around those parts, I got up, walked over and asked him, “Don’t know what!?”. He explained that a file scheduled to be copied daily from one country location to another had not been copied, and that three weeks later the business had discovered its data was unavailable: the scheduled job had in fact been failing for the last three weeks. Because this job was scheduled on a server running in that country, and similar data movement jobs were scheduled in just about every country with no central visibility of job success or failure, there was no way (other than logging into every server every day) to check that jobs were running successfully. Of course, one could script each job to send email to a central mailbox, but who would keep track of which emails had arrived, and which had not, for every job in every country? In many locations connectivity was also poor and unreliable, so if the line was bad, the job would simply fail; there was no way to try again later without manual intervention. The manager said, “I need an EAI solution!”. I replied, “That would be overkill. All you need is a central file and data management solution with central reporting and alerting to see what is happening out there. Something simple and easy to use ….”.
The EAI (Enterprise Application Integration) solutions mentioned are enterprise middleware systems targeted at complex business integration; they are expensive and complex to implement and maintain, and I needed something simpler and cheaper. After a bit of Googling, I came across an old Gartner Magic Quadrant on MFT (Managed File Transfer); I also noted that Gartner had deprecated this Magic Quadrant in favour of one for EAI. Nevertheless, I realised I didn’t need EAI, I just needed MFT. A now more focussed search led to the product that we now resell, GoAnywhere MFT by Linoma Software. This cost-effective solution was implemented very successfully, and in fact to a far greater extent than just copying data around the organisation; for samples of what was done, please see the use cases presented in the GoAnywhere section of this website here. The rest of this document discusses some of the key mechanisms built into good MFT systems that enable the value they deliver; use this knowledge to figure out where MFT would be appropriate in your organisation to add value and close some gaps that you may have.

Characteristics of MFT systems

Centralised management and control: There would not be much point to an MFT system if it did not offer centralised management, control and global visibility of file and data transfer operations. To make it universally accessible and client independent, a fully web-based user interface, accessible from anywhere, is best. Functionally, the automation of file and data transfer-related activities and processes is required, together with the key quality requirements of resiliency and end-to-end security. Beyond a proven and robust MFT platform that is well supported by its vendor, some elements of resiliency and security are discussed directly below.

Security: For end-to-end security, an MFT system should itself be secured with a secure connection to its server backend, whilst supporting a number of authentication mechanisms to allow it to securely identify itself when accessing secure resources, e.g. using AD (Active Directory), LDAP (Lightweight Directory Access Protocol), IBM i and SAML (Security Assertion Markup Language). Once authenticated, data in motion needs to be secured through encryption protocols like those mentioned in the section on “Encrypted file and data transfer”. Data at rest may also have to be secured; see the discussion in the section on “Secure File Storage”. A built-in key management feature for creating and managing encryption key pairs is essential, together with support for encrypted email and password-protected file compression.

Resiliency: This section refers to MFT itself as a robust platform and the ability of its transfer mechanisms to do everything possible to ensure successful task execution. Some of these are discussed here:

Auto resume: with large file transfers, uncontrolled network glitches and low-quality connections can cause a transfer to fail most of the way through. Starting again wastes time and bandwidth, and provides no guarantee that the transfer will succeed the next time. File transfer should therefore be resilient: an auto-resume function allows the transfer to continue from where it left off once network conditions have normalised.
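
To make the idea concrete, here is a minimal sketch in Python of resuming from an offset, assuming both ends are plain files so it is self-contained. The function name `resume_copy` is my own for illustration; a real MFT product (or a protocol feature such as FTP’s REST command) implements this transparently.

```python
import os

def resume_copy(source_path: str, dest_path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy source to dest, resuming from however many bytes dest already has.

    An interrupted transfer leaves a partial destination file; on the next
    attempt we seek past the bytes already delivered instead of starting over.
    Returns the number of bytes transferred in this attempt only.
    """
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    copied = 0
    with open(source_path, "rb") as src, open(dest_path, "ab") as dst:
        src.seek(offset)  # skip what the previous attempt already delivered
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

A second call against an already complete destination transfers nothing further, which is exactly the behaviour that saves time and bandwidth after a mid-transfer failure.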

Retry on failure: in Africa, where we operate, a number of services such as electricity and network connectivity are not guaranteed. A retry-on-failure mechanism is therefore required to enable retries over a period of time at a set interval. For example, a retry configuration might look something like: should the current transfer attempt fail, try again every 30 minutes for 3 hours; if this still fails, alert an administrator.
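
The “every 30 minutes for 3 hours, then alert” policy can be sketched in a few lines. `run_with_retries` is a hypothetical helper of my own, not a GoAnywhere API; the injectable `sleep` and `alert` callbacks simply make the sketch testable.

```python
import time

def run_with_retries(task, interval_secs=30 * 60, max_duration_secs=3 * 60 * 60,
                     alert=lambda exc: None, sleep=time.sleep):
    """Run `task`; on failure retry every `interval_secs` for up to
    `max_duration_secs`, then call `alert` (e.g. email an administrator)."""
    attempts = 1 + max_duration_secs // interval_secs  # first try + retries
    last_exc = None
    for attempt in range(attempts):
        try:
            return task()
        except Exception as exc:
            last_exc = exc
            if attempt < attempts - 1:
                sleep(interval_secs)  # wait before the next attempt
    alert(last_exc)  # all attempts exhausted: raise the alarm
    raise last_exc
```

Note that alerting only fires once the whole retry window is exhausted, which keeps the alert channel quiet for transient failures, the point made in the Alerting section below.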

Distributed MFT Processing: Some MFT systems, like GoAnywhere MFT, allow multiple MFT servers to communicate with each other as resources. In other words, you can execute projects created on other MFT servers, and also use this in a high-availability scenario through clustering. Load balancing, enabled through the same clustering capability, is also highly desirable in a high-performance environment. Distributed MFT is also useful when, for example, extracting and/or compressing data for transfer at a remote system on the other side of a WAN connection: a local MFT instance could do the overall process orchestration, instructing the remote MFT system to extract data locally, compress it, and then transfer it across the WAN to the local network, so that extraction and compression do not occur over the WAN.

Unencrypted file and data transfer: on the Windows platform, file transfer usually takes place from file share to file share using the SMB protocol. Other platforms more often use mechanisms like FTP, with HTTP also available where data sources are hosted on a web server. Use unencrypted file transfer only internally, where the data is not deemed sensitive in any way, or where the transfer stays within a data centre and does not traverse a LAN or WAN on which physical access to the network environment is not controlled.

Encrypted file and data transfer: There are a number of encrypted file transfer mechanisms available; the most common are SFTP, FTPS, SCP, AS2 and HTTPS. Use encrypted transfer mechanisms where sensitive data is being transferred, and especially where physical access to the network environment is not controlled, such as on the corporate LAN or WAN or over the Internet. In fact, when transferring internal non-public data over the Internet, always use encrypted file transfer mechanisms. MFT solutions will allow you to create encryption key pairs for supported security protocols, as well as register externally created keys for communicating securely with those systems.

Non-file based data transfer: Data is not only stored in files; it also resides in databases (MS SQL Server, Oracle, etc.) or in systems fronting a data store, e.g. MS Exchange or MS SharePoint (the latter using SQL Server in the backend). Any good MFT system should be able to retrieve data directly from almost any database system and transfer it to another database of the same or a different type. Or you may want to extract data from a database and transform it into a file format like MS Excel or CSV for user consumption, for backup purposes, or to load into another system that only accepts input in a particular text file format. Most data-providing applications today, like MS SharePoint, expose web service based APIs; your MFT system should allow you to access and invoke standards-based web services to retrieve data or kick off processes on the invoked system.
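
The database-to-CSV extraction described above reduces to a small routine. As a rough illustration using only Python’s standard library, with SQLite standing in for whatever database the MFT system would connect to (the function name `export_query_to_csv` is my own):

```python
import csv
import sqlite3

def export_query_to_csv(db_path: str, query: str, csv_path: str) -> int:
    """Run a SQL query and write the result set to a CSV file,
    returning the number of data rows written."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(query)
        headers = [col[0] for col in cur.description]  # column names
        rows = cur.fetchall()
    finally:
        conn.close()
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)
    return len(rows)
```

An MFT product wraps this kind of step in a configurable task, so no code is written, but the underlying operation is the same: query, fetch, serialise to a text format the receiving system accepts.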

Alerting: As mentioned in the opening scenario, when you are transferring large numbers of files between locations you want to be notified of exceptions where operations fail. Alerts are usually issued via an email mechanism; this can be extended to SMS by subscribing to an email-to-SMS service. Of course, if there are too many alerts then alerting quickly becomes overwhelming and loses its value as a mechanism to maintain file transfer service levels. For this reason, alerting should be a last resort; MFT systems should have several mechanisms available that allow them to recover from failure or retry before alerting. These were discussed above in the section on resiliency.
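
As a rough illustration using Python’s standard library, a last-resort alert email might be assembled as follows; the addresses, subject format and wording are placeholders of my own, and a real MFT system templates all of this in configuration.

```python
from email.message import EmailMessage

def build_failure_alert(job_name: str, error: str, attempts: int,
                        to_addr: str) -> EmailMessage:
    """Build the alert email sent only after all retries are exhausted."""
    msg = EmailMessage()
    msg["Subject"] = f"[MFT ALERT] job '{job_name}' failed after {attempts} attempts"
    msg["From"] = "mft-alerts@example.com"  # placeholder sender address
    msg["To"] = to_addr
    msg.set_content(
        f"Scheduled job '{job_name}' has exhausted its retries.\n"
        f"Last error: {error}\n"
        "Please investigate before the next business cycle."
    )
    # To actually send: smtplib.SMTP(host).send_message(msg)
    return msg
```

Routing the same message to an email-to-SMS gateway address is what extends this to SMS without any extra machinery.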

Dashboards: With a large number of file transfers occurring daily, as in the opening scenario, you need a quick and easy visual mechanism to monitor transfer activity in one place. In addition, there may be a number of stakeholder groups interested in various transfer activities relevant to them, so you may want to empower those groups with a view into the file activity relevant to their areas of work. For example, in the above scenario, this corporation had a number of banking-related file transfers to and from multiple banks of interest to the internal finance community. A role-based access mechanism would allow you to set up dashboards as appropriate and give non-IT users access to monitor, and be alerted to, activity related to their area of interest. In fact, with simplified web-based interfaces, such users could also be given limited access to restart jobs or run them at ad-hoc times according to their needs, without requiring IT intervention.

Auditing: One of the key issues with user-accessible and automated systems is knowing who did what and when, and, of course, whether this information can be relied upon, i.e. is there support for non-repudiation. Audit logs can grow huge, so there should be a mechanism to search and categorise auditing data by user, scheduled job, operation type and so on. This is important where the known search value can vary significantly, e.g. given the name of a file that was transferred, provide all audit logs from last week for any operations on this file; similarly by user or by scheduled job.

Reporting: Reporting is especially important when looking at non-incident-related historic data to determine behavioural changes or trends in operational activity across large activity data sets, through appropriate visualisations. This can be used to establish and monitor performance metrics, and to make global changes to steer the direction of certain activities or remediate others that appear to be degrading over time. Reporting is also useful for monitoring user-related activity for security purposes.

Sharing Data: Whilst I would not place this function as a requirement for an MFT solution, it does make sense to be able to share data that can be accessed and used by people and systems. Besides running standard FTP, SFTP and other servers, an HTTP/HTTPS service together with role-based user management is a welcome MFT feature that allows ordinary users to upload and download files shared with other people and systems. Some MFT systems may even allow you to kick off processes on upload or download, and even transform uploaded data into a target system. Read more about such functionality in the ETL section.

Secure File Storage: We discussed the secure transfer of data earlier, but at some point the data has to come to rest, so how do we protect it there? MFT systems usually support multiple data encryption methods to securely store files at rest. Built-in ETL mechanisms and secure repository mechanisms allow for manual or automatic decryption and encryption when retrieving or storing files in such repositories. Password-protected file compression, e.g. ZIP, TAR, GZIP, etc., should also be supported.

Extract, Transform and Load (ETL) processes:

As discussed in the previous section on “Non-file based data transfer”, data can be extracted from a number of sources such as databases and applications with exposed web APIs. Sometimes you may need to sequence a number of tasks with some logic to enable a process that performs multiple tasks across multiple systems. ETL process functionality provides such capability, allowing tasks to be packaged and scheduled as a repeatable process; these functions are often used for integration-services type work. Some of the tasks that an ETL package may perform include:

  • Retrieve files from multiple sources like file shares and FTP / SFTP services.
  • Move or copy files.
  • Retrieve data from a database.
  • Transform the data into another format or perform some basic calculations like calculate VAT or change a data format.
  • Write the transformed data into a file format like MS Excel or CSV file.
  • Load the data into another system via its API or directly into a database.
  • Execute a program on the operating system, target application or database.
  • Notify a user of success or failure.
  • Encrypt or decrypt a file.
  • Compress or uncompress a file.
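
Several of the tasks above chain naturally into one process: extract rows, apply a calculation such as VAT, and load the result as a compressed CSV. A minimal sketch in Python, with a VAT rate, field names and function names that are purely illustrative:

```python
import csv
import gzip
import io

VAT_RATE = 0.15  # assumed rate, for illustration only

def transform_add_vat(rows):
    """Transform step: add VAT and gross-amount columns to each row."""
    out = []
    for row in rows:
        net = float(row["net"])
        vat = round(net * VAT_RATE, 2)
        out.append({**row, "vat": vat, "gross": round(net + vat, 2)})
    return out

def write_compressed_csv(rows, path):
    """Load step: write rows to a gzip-compressed CSV file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    with gzip.open(path, "wt", newline="") as f:
        f.write(buf.getvalue())
```

In an MFT system each of these steps would be a drag-and-drop task in a project; the sketch simply shows that the underlying operations are straightforward data plumbing.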

Of course, there are many other useful features; these depend on the selected MFT system.

Workflow / Orchestration: sometimes you may want to execute a number of coordinated tasks packaged into a single process. These tasks may be coordinated in a logic-based decision tree with sections repeatable (loops) on logic conditions. Using readiness checks and system or user provided data to trigger such workflow events, a fairly sophisticated sequence of tasks can be constructed without any coding in a good MFT system.

Triggers and Monitors: A trigger can be set to fire on a particular event such as a file upload, download, process execution pass or fail, etc. On firing, one could initiate a process to deal with the event or to kick off further processing. File monitors are also a type of trigger; these can be placed on a folder to watch for file operation events such as new file creation. A successful trigger could then execute a process to handle the file that raised the event. For example, your payroll system generates a monthly payment file that it delivers to an internal folder; a file monitor set on this folder could trigger a process that loads the file into your ERP system for payment.
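
The essence of a file monitor is a repeated diff of a folder listing. A minimal polling sketch in Python (function names are my own; production monitors would also confirm the file has finished being written, e.g. by checking that its size is stable across two polls):

```python
import os
import time

def scan_new_files(folder, seen):
    """One polling pass: return paths of entries not yet in `seen`,
    updating `seen` in place."""
    current = set(os.listdir(folder))
    new = sorted(current - seen)
    seen |= current
    return [os.path.join(folder, name) for name in new]

def watch_folder(folder, handle_file, poll_secs=5.0, stop=lambda: False):
    """Monitor a folder and call `handle_file` for each newly created file."""
    seen = set(os.listdir(folder))  # ignore files that pre-date the monitor
    while not stop():
        for path in scan_new_files(folder, seen):
            handle_file(path)  # e.g. load the payment file into the ERP
        time.sleep(poll_secs)
```

In the payroll example, `handle_file` would be the process that loads the payment file into the ERP system.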

Transfer Optimisation: When transferring large files over a WAN connection, the file should ideally be compressed before transfer to minimise the network payload that moves between the two points. Quite often, extracted data is in text format; normal text files have a high compression ratio and can be significantly reduced before transfer. MFT systems should have built-in compression algorithms with ETL support to compress and uncompress files easily. Note that this works well when sending out data that is local to the MFT system, but not when the data is remote on the other side of a WAN. In that case, use the MFT system to execute a remote command to compress the data first before transfer, or consider the distributed MFT processing model as discussed in the section on resiliency above.
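
To see how much a typical text extract shrinks, here is a small compress-before-send sketch using Python’s standard gzip module (the helper name is illustrative; MFT products expose this as a built-in task):

```python
import gzip
import os
import shutil

def compress_for_transfer(path):
    """Gzip a file before sending it over the WAN.

    Returns the compressed file's path and the compression ratio
    (compressed size / original size); lower is better.
    """
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files fit in memory
    ratio = os.path.getsize(gz_path) / os.path.getsize(path)
    return gz_path, ratio
```

Repetitive extracts such as CSV dumps routinely compress to a small fraction of their original size, which is exactly the WAN payload saving the section describes.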

Scheduling: Every MFT system should have a powerful scheduling mechanism for creating, prioritising, searching and reporting on scheduled jobs. Other essential management features include putting jobs on hold, re-prioritising, cancelling, tracking with job numbers, alerting, retry on failure and multi-threading.

Use cases: A number of use cases for MFT are discussed in the GoAnywhere MFT section of our website; there are many business solution use cases as well as the IT use case already discussed in this document.

An MFT solution could be a very valuable asset to an organisation of almost any size. Some MFT solutions have many more file and data related functions, like EFSS (enterprise file sharing and synchronisation), which provides “Drop Box”-like functionality, secure email, external secure file sharing, secure gateways, mobile app support and others. My next article on enterprise file management will be called: “How can I share files securely inside and outside my organization?”.