Cloud Adoption for the Security Big Data Lake:
I am working on a plan for a healthcare client for a project called Security Big Data Lake. The objective of the project is to aggregate logs from all layers (prod/test/dev), store them in a centralized location, and analyze the aggregated logs to detect potential intrusions into systems and data breaches, and to support threat hunting. The preferred way to store these large log volumes was a big data system that provides both storage and enough computing power to process the data, and that can be integrated with third-party security tools for breach detection and threat hunting. Hadoop was identified as the platform that meets these needs and can be integrated with security tools such as Interset and FireMon.
A basic estimate put the daily inflow of data from all network layers at approximately 1-3 TB/day. With Hadoop's default replication factor of 3, this becomes 3-9 TB/day of disk consumption. For legal reasons, the data must be retained for 10 years. When the system is deployed into production, it needs three data movement frameworks, which can be provided by HDF (NiFi) clusters attached to the three layers (prod/dev/test); all data collected across the platforms is then aggregated into HDFS on an HDP cluster.
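The storage arithmetic above can be checked with a short sketch. It assumes a constant daily inflow (no growth), no compression, 365-day years and decimal units (1 PB = 1000 TB); only the 1-3 TB/day range, the replication factor of 3 and the 10-year retention come from the plan.

```python
# Sketch of the HDFS capacity arithmetic for the Security Big Data Lake.
# Assumptions (not from the plan): constant inflow, no compression,
# 365-day years, decimal units.

DAYS_PER_YEAR = 365

def hdfs_daily_tb(raw_tb_per_day: float, replication: int = 3) -> float:
    """Disk consumed per day once HDFS stores `replication` copies of each block."""
    return raw_tb_per_day * replication

def retained_tb(raw_tb_per_day: float, years: int, replication: int = 3) -> float:
    """Total replicated disk needed to keep `years` of logs in HDFS."""
    return hdfs_daily_tb(raw_tb_per_day, replication) * DAYS_PER_YEAR * years

if __name__ == "__main__":
    for raw in (1.0, 3.0):
        daily = hdfs_daily_tb(raw)
        total = retained_tb(raw, years=10)
        print(f"{raw:.0f} TB/day raw -> {daily:.0f} TB/day in HDFS, "
              f"{total:,.0f} TB (~{total / 1000:.1f} PB) for 10 years")
```

Under these assumptions, keeping all 10 years in HDFS would require roughly 10,950-32,850 TB of raw disk, before any free-space headroom the cluster needs.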
The Data Architects team was assigned to estimate the expected data: the format in which data would be received and the daily inflow from each of the three layers, broken down by source (routers, firewalls, network devices, password authentication management (PAM) tools, Unix machines, and Windows machines).
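One way to record the Data Architects' estimates is a simple inventory keyed by layer and source, from which per-layer daily totals and the set of formats per source can be derived. This is a hypothetical sketch: the `LogSource` record, its fields and the helper functions are illustrative names, and the numbers in the example are placeholders, not project estimates.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class LogSource:
    """One row of a hypothetical source inventory."""
    layer: str         # "prod", "test" or "dev"
    source: str        # e.g. "Router", "Firewall", "PAM", "Unix", "Windows"
    log_format: str    # expected format as delivered, e.g. "syslog"
    gb_per_day: float  # estimated daily inflow in GB

def inflow_by_layer(inventory: list[LogSource]) -> dict[str, float]:
    """Sum the estimated daily inflow (GB/day) for each layer."""
    totals: dict[str, float] = defaultdict(float)
    for row in inventory:
        totals[row.layer] += row.gb_per_day
    return dict(totals)

def formats_by_source(inventory: list[LogSource]) -> dict[str, set[str]]:
    """Collect every expected format per source type, to plan parsing in ingestion."""
    formats: dict[str, set[str]] = defaultdict(set)
    for row in inventory:
        formats[row.source].add(row.log_format)
    return dict(formats)

if __name__ == "__main__":
    # PLACEHOLDER values for illustration only, not estimates from the project.
    inventory = [
        LogSource("prod", "Firewall", "syslog", 400.0),
        LogSource("prod", "Windows", "Windows Event Log", 250.0),
        LogSource("dev", "Unix", "syslog", 50.0),
    ]
    print(inflow_by_layer(inventory))
    print(formats_by_source(inventory))
```

Keeping the estimates in one structure like this makes it easy to hand the per-layer totals to the System Architects as input for sizing.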
The System Architects team was assigned to use the Data Architects' findings to estimate the system requirements for deploying clusters integrated with all three layers: system configuration (RAM, CPU, disk), load balancers, VIPs, and software versions compatible with data movement from all three layers. Based on the data inflow, they also estimate the disk capacity needed to support the 10-year retention required by law. Business teams work with the System Architects team to compare the cost of deploying the clusters on-premises against a cloud solution. Legal teams advise whether the clusters can be deployed entirely in the cloud with nothing on-premises, because the logs collected across the platforms contain sensitive data with legal implications. Even if the IT teams, based on the System Architects' report, conclude that a full cloud deployment is the most cost-effective option, Legal must sign off that no law or policy is breached by moving the data into the cloud.
In addition to the fully on-premises and fully cloud options, the System Architects are asked to evaluate pushing data into the cloud for archival once the 1-year business requirement for log analysis has passed. Legal, Finance, and Security teams would be asked to sign off before proceeding with this archival model, if it is considered.
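The archival option changes how much disk the on-premises cluster needs. The sketch below compares keeping all 10 years in HDFS against a steady-state split where 1 year stays in HDFS and the remaining 9 years sit in cloud archive storage. It assumes constant inflow, no compression, and that the archive is counted once (any durability copies are the cloud provider's concern); the function names are hypothetical and no prices are modeled.

```python
from dataclasses import dataclass

DAYS_PER_YEAR = 365

@dataclass(frozen=True)
class Footprint:
    hdfs_tb: float     # replicated disk held on the on-prem HDP cluster
    archive_tb: float  # logical (unreplicated) data held in cloud archive

def all_on_prem_tb(raw_tb_per_day: float, years: int = 10, replication: int = 3) -> float:
    """Disk needed if every year of retention stays in HDFS."""
    return raw_tb_per_day * DAYS_PER_YEAR * years * replication

def tiered_footprint(raw_tb_per_day: float, total_years: int = 10,
                     hot_years: int = 1, replication: int = 3) -> Footprint:
    """Steady-state split: `hot_years` in HDFS, the remainder archived in the cloud."""
    if not 0 <= hot_years <= total_years:
        raise ValueError("hot_years must be between 0 and total_years")
    yearly_raw = raw_tb_per_day * DAYS_PER_YEAR
    return Footprint(
        hdfs_tb=yearly_raw * hot_years * replication,
        # Assumption: archived data is counted once; provider-side copies are not.
        archive_tb=yearly_raw * (total_years - hot_years),
    )

if __name__ == "__main__":
    for raw in (1.0, 3.0):
        split = tiered_footprint(raw)
        print(f"{raw:.0f} TB/day: all on-prem {all_on_prem_tb(raw):,.0f} TB | "
              f"HDFS {split.hdfs_tb:,.0f} TB + cloud archive {split.archive_tb:,.0f} TB")
```

With the plan's 1-3 TB/day range, the steady-state HDFS footprint drops from 10,950-32,850 TB to 1,095-3,285 TB, while 3,285-9,855 TB moves to the archive; whether that is cheaper depends on the vendor pricing the Business teams gather.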
Network teams review the System Architects' model for the requirements of connecting the data centers to the various cloud vendors, and submit their report to the Security Operations team for review of the project's security aspects. The Security Operations team reviews the security implications of deploying systems in the cloud versus on-premises, identifies potential security breaches that a cloud solution could introduce, and proposes workarounds in case the cloud solution is chosen. The System Architects team submits a report to Security Operations so they can advise on the solution. If Security cannot sign off on a fully cloud-based approach, they are asked to consider whether pushing the logs into the cloud for archival after the 1-year business retention period would be acceptable: after one year the data is no longer used for analysis, and archiving it in the cloud is expected to cost less than storing it on-premises.
The IT team was asked to run a proof of concept (POC) of the System Architects' solution after it has been reviewed and signed off by the Legal, Finance, and Security Operations teams. Once the proposed solution is implemented, the Data Science team and the technology owners (Network, Firewall, PAM, Unix, Windows) review the data. After all technology owners and the Data Science team sign off on the data processed into HDFS, third-party security tool vendors are asked to propose how to integrate their applications with the Hadoop clusters and to demo the tool to Data Science and IT users, who either sign off that the application works as expected or raise any issues they see with the proposed solution.
The Project Management team coordinates the tasks assigned to each team, defines timelines, and ensures they are met through a daily touchpoint until the final report is submitted. That report states where the data lake should be deployed and estimates the costs of deploying the clusters in a public cloud, private cloud, hybrid, or on-premises model. The decision must be agreed upon by Legal, Finance, Security, System Architects, and Data Architects.
Based on this report, the final document with the proposed cloud solution for the Security Big Data Lake is submitted to senior management (CEO, CTO, CISO, etc.), together with a timeline for when the application can be deployed into production and all the Legal, Finance, and Security aspects of the application.