Tag: Data

  • Six ways to improve data lake security

    Data lakes, such as Oracle Big Data Service, represent an efficient and secure way to store all of your incoming data. Worldwide big data is projected to rise from 2.7 zettabytes to 175 zettabytes by 2025, and this means an exponentially growing number of ones and zeroes, all pouring in from an increasing number of data sources. Unlike data warehouses, which require structured and processed data, data lakes act as a single repository for raw data across numerous sources.

    What do you get when you establish a single source of truth for all your data? Having all that data in one place creates a cascading effect of benefits, starting with simplifying IT infrastructure and processes and rippling outward to workflows with end users and analysts. Streamlined and efficient, a single data lake makes everything from analysis to reporting faster and easier.

    There’s just one issue: all of your proverbial digital eggs are in one “data lake” basket.

    For all of the benefits of consolidation, a data lake also comes with the inherent risk of a single point of failure. Of course, in today’s IT world, it’s rare for IT departments to set anything up with a true single point of failure: backups, redundancies, and other standard failsafe techniques tend to protect enterprise data from true catastrophic failure. This is doubly so when enterprise data lives in the cloud, such as with Oracle Cloud Infrastructure, as data entrusted to the cloud rather than stored locally has the added benefit of trusted vendors building their entire business around keeping your data safe.

    Does that mean that your data lake comes protected from all threats out of the box? Not necessarily; as with any technology, a true assessment of security risks requires a 360-degree view of the situation. Before you jump into a data lake, consider the following six ways to secure your configuration and safeguard your data.

    Establish Governance: A data lake is built for all data. As a repository for raw and unstructured data, it can ingest just about anything from any source. But that doesn’t necessarily mean that it should. The sources you select for your data lake should be vetted for how that data will be managed, processed, and consumed. The perils of a data swamp are very real, and avoiding them depends on the quality of several things: the sources, the data from the sources, and the rules for treating that data when it is ingested. By establishing governance, it’s possible to identify things such as ownership, security rules for sensitive data, data history, source history, and more.
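
    As a minimal sketch of what such governance rules might look like in practice, the following checks that a source carries ownership, sensitivity, and origin metadata before it is admitted to the lake. The `SourceRecord` fields, the sensitivity labels, and the function name are assumptions made for illustration, not part of any particular data lake product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical governance metadata for one data source."""
    name: str
    owner: str        # accountable team or person
    sensitivity: str  # assumed labels: "public", "internal", "restricted"
    origin: str       # where the data comes from


# Assumed set of labels; a real policy would define its own.
ALLOWED_SENSITIVITY = {"public", "internal", "restricted"}


def governance_issues(source: SourceRecord) -> list[str]:
    """Return the list of governance gaps; an empty list means the source passes."""
    issues = []
    if not source.owner.strip():
        issues.append("missing owner")
    if source.sensitivity not in ALLOWED_SENSITIVITY:
        issues.append("unknown sensitivity label")
    if not source.origin.strip():
        issues.append("missing origin")
    return issues


vetted = SourceRecord("crm", "sales-ops", "internal", "crm-export")
unvetted = SourceRecord("dump", "", "secret", "")
vetted_issues = governance_issues(vetted)
unvetted_issues = governance_issues(unvetted)
```

    A source with a non-empty issue list would be held back from ingestion until someone takes ownership of it and classifies it.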

    Access: One of the biggest security risks involved with data lakes is related to data quality. Rather than a macro-scale problem such as an entire dataset coming from a single source, a risk can stem from individual files within the dataset, either during ingestion or afterward through hacker infiltration. For example, malware can hide within a seemingly benign raw file, waiting to execute. Another possible vulnerability stems from user access: if sensitive data is not properly protected, unscrupulous users can access those records and possibly even modify them. These examples demonstrate the importance of establishing various levels of user access across the entire data lake. By creating strategic and strict rules for role-based access, it’s possible to minimize the risks to data, particularly sensitive data or raw data that has yet to be vetted and processed. In general, the widest access should be for data that has been confirmed to be clean, accurate, and ready for use, thus limiting the possibility of accessing a potentially damaging file or gaining inappropriate access to sensitive data.

    Use Machine Learning: Some data lake platforms come with built-in machine learning (ML) capabilities. The use of ML can significantly minimize security risks by accelerating raw data processing and categorization, particularly if used in conjunction with a data cataloging tool. By implementing this level of automation, large amounts of data can be processed for general use while also identifying red flags in raw data for further security investigation.

    Partitions and Hierarchy: When data gets ingested into a data lake, it’s important to store it in a proper partition. The general consensus is that data lakes require several standard zones to house data based on how trusted it is and how ready-to-use it is. These zones are:

    • Temporal: Where ephemeral data such as copies and streaming spools live prior to deletion.
    • Raw: Where raw data lives prior to processing. Data in this zone may also be further encrypted if it contains sensitive material.
    • Trusted: Where data that has been validated as trustworthy lives for easy access by data scientists, analysts, and other end users.
    • Refined: Where enriched and manipulated data lives, often as final outputs from tools.

    Using zones like these creates a hierarchy that, when coupled with role-based access, can help minimize the possibility of the wrong people accessing potentially sensitive or malicious data. 
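
    To make the pairing of zones and role-based access concrete, here is a small sketch under stated assumptions: the four zone names come from the list above, while the role names and their grants are hypothetical and would in practice come from your own governance policy.

```python
from enum import Enum


class Zone(Enum):
    TEMPORAL = "temporal"
    RAW = "raw"
    TRUSTED = "trusted"
    REFINED = "refined"


# Hypothetical role-to-zone grants. The widest audience only sees vetted data.
ROLE_ZONES = {
    "data_engineer": {Zone.TEMPORAL, Zone.RAW, Zone.TRUSTED, Zone.REFINED},
    "data_scientist": {Zone.TRUSTED, Zone.REFINED},
    "analyst": {Zone.TRUSTED, Zone.REFINED},
}


def can_read(role: str, zone: Zone) -> bool:
    """Allow access only when the role has an explicit grant for the zone."""
    return zone in ROLE_ZONES.get(role, set())


analyst_reads_trusted = can_read("analyst", Zone.TRUSTED)
analyst_reads_raw = can_read("analyst", Zone.RAW)
unknown_role_reads_refined = can_read("intern", Zone.REFINED)
```

    Unknown roles get no access by default, so a missing grant fails closed rather than open.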

    Data Lifecycle Management: Which data is constantly used by your organization? Which data hasn’t been touched in years? Data lifecycle management is the process of identifying and phasing out stale data. In a data lake environment, older stale data can be moved to a specific tier designed for efficient storage, ensuring that it is still available should it ever be needed but not taking up needed resources. A data lake powered by ML can even use automation to identify and process stale data to maximize overall efficiency. While this may not touch directly on security concerns, an efficient, well-managed data lake functions like a well-oiled machine rather than collapsing under the weight of its own data.
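
    Identifying stale data can start with something as simple as comparing last-access times against an idle limit. The sketch below assumes a catalog that maps object names to last-access timestamps; the catalog contents, the one-year default, and the function name are illustrative assumptions.

```python
from datetime import datetime, timedelta


def find_stale(last_access: dict[str, datetime], now: datetime,
               max_idle: timedelta = timedelta(days=365)) -> list[str]:
    """Return names of objects not accessed within max_idle, sorted for stable output."""
    return sorted(name for name, ts in last_access.items() if now - ts > max_idle)


# Hypothetical catalog entries: object name -> last access time.
now = datetime(2024, 1, 1)
catalog = {
    "sales_2019.parquet": datetime(2020, 3, 1),
    "clicks_latest.json": datetime(2023, 12, 20),
}
stale = find_stale(catalog, now)
```

    Objects returned by `find_stale` are candidates for the archival tier, not for deletion, which keeps them available should they ever be needed.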

    Data Encryption: The idea of encryption being vital to data security is nothing new, and most data lake platforms come with their own methodology for data encryption. How your organization executes, of course, is critical. Regardless of which platform you use or whether you choose on premises or cloud, a sound data encryption strategy that works with your existing infrastructure is absolutely vital to protecting all of your data, whether in motion or at rest, and your sensitive data in particular.

    Create Your Secure Data Lake

    What’s the best way to create a secure data lake? With Oracle’s family of products, a powerful data lake is just steps away. Built upon the foundation of Oracle Cloud Infrastructure, Oracle Big Data Service delivers cutting-edge data lake capabilities while integrating with premier analytics tools and one-touch Hadoop security functions. Learn more about Oracle Big Data Service to see how easy it is to deploy a powerful cloud-based data lake in your organization, and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.

    Via: https://blogs.oracle.com/

  • Four ways financial services companies use big data

    Big data is rapidly becoming the key driver in the financial services industry. Big data covers a lot of areas: transactions, customer accounts, vendors, and more. All include individual fields of data, from time stamps to payment amounts to unstructured text fields of additional data (such as call center notes). Consider these numbers: the share of people using digital banking increased from 20% in 2010 to 61% in 2018, more than tripling in eight years. At the same time, the number of connected devices has grown exponentially over the past decade, with more than 90% of the world’s digital data generated in the past two years alone.

    The majority of people are accessing their money digitally, and the use of smart devices (phones, tablets, laptops, and even web-connected appliances with purchase capabilities) is growing exponentially. The volume of transactions happening per second feels countless, and perhaps even more daunting is the amount of security required to handle them.

    If you consider that every device in the world, be it a phone or a smart TV, is a potential access point for hackers, the need for reliable security suddenly gets put into perspective.

    Fortunately, the financial services industry is already on top of this. Many of the world’s biggest providers are leading the charge by combining big data with machine learning (ML). Not only does ML make your money safer, it delivers a better customer experience. Let’s take a look at four specific ways the financial services sector is integrating big data into everyday operations.

    Fraud Detection

    The digital age has transformed the way fraud works, not just for the people unscrupulously trying to steal but also for the security teams attempting to protect customer money. Today’s economy is run via online transactions and transfers, which means that for fraudsters, gaining access (usually by stealing someone’s identity or credentials) is the goal. They attempt this in a number of ways, from skimmers on PIN pads to malware transmitted online to brute-force hacks of accounts. On a macro scale, transaction data can tell a lot about the different parties involved; patterns can create expected profiles and, more importantly, identify when potentially fraudulent activity occurs outside of those expectations. While the finance industry can’t protect everyone at every transaction, it can act as both a safety net and a firewall against these types of bad actors thanks to big data.
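
    The idea of an expected profile can be illustrated with a deliberately simple statistical check: build a profile from a customer's past transaction amounts and flag a new amount that falls far outside it. This is only a sketch of the concept, not how any particular bank's ML models work; the z-score approach, the threshold of 3.0, and the sample amounts are all assumptions.

```python
from statistics import mean, stdev


def is_outlier(history: list[float], amount: float, z_threshold: float = 3.0) -> bool:
    """Flag an amount whose z-score against the customer's history exceeds the threshold."""
    if len(history) < 2:
        return False  # not enough data to form a profile
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu  # any deviation from a constant history is unusual
    return abs(amount - mu) / sigma > z_threshold


# Hypothetical purchase history for one customer.
history = [20.0, 25.0, 22.0, 30.0, 18.0]
typical_flagged = is_outlier(history, 27.0)
unusual_flagged = is_outlier(history, 500.0)
```

    Production systems would combine many more signals (location, device, merchant, timing), but the principle is the same: score new activity against what the profile expects.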

    Challenges

    To properly process this volume of data, various transaction datasets—with additional information such as interaction events and customer behavior—must be consolidated. That means storing data in an appropriate repository, such as a data lake, and applying ML to efficiently crunch the data while identifying patterns.

    Financial Regulatory and Compliance Analytics

    Regulatory compliance has been an issue for financial institutions since their inception. But in the digital world, regulations have rapidly changed. In addition to working within a digital landscape, regulations have quickly evolved to get a handle on new issues such as an increasing amount of cross-border transactions and the rise of cryptocurrencies.

    Because of evolving regulatory rules, big data benefits financial services by offering large-scale processing of data sets as well as the ability to enact wholesale rule tweaks that quickly enable process updates for compliance. The collection of big data is the foundation for compliance, as it provides real-time proof of adherence to regulations (or identification of issues). This will never change the need for a compliance department to oversee and steer such things, but it will streamline and consolidate involved workflows, as well as minimize human error on records. A prime example of this comes from Caixa Bank, which saved 60,000 work hours overseeing Spain’s direct debits process.

    Challenges

    Similar to fraud detection, regulatory compliance requires bringing together multiple sources. On top of that, compliance also utilizes advanced risk models, and these must be generated quickly without creating any impact on other projects.

    Improve Customer Service Through Big Data

    Any organization’s operations can achieve valuable improvements with big data, and the financial services industry is no different. Consider the steps along any workflow; externally, banks and organizations are looking at customer retention and activity on loans, special offers, balance transfers, and other types of financial offerings. Internally, these same organizations are looking for any sort of process improvement, whether it’s in HR, IT, marketing, sales, or any other organization.

    Big data provides insights that lead to innovation. Let’s take the example of maximizing customer engagement. Big data can look at a customer’s transactional data and account history to identify purchase patterns, geographic locations, and other potential engagement triggers. With ML, models can be built to identify the customer’s needs based on this data and extend appropriate offers that maximize potential for engagement. For example, if the ML model determines that a customer is doing a bit of remodeling work by shopping at hardware stores and related businesses, it could trigger an offer for a home equity line of credit.

    Challenges 

    To get the most accurate view of a customer, as many sources as possible need to be used, including licensed third-party data regarding outside factors such as demographic and geographic data. Data scientists will also need to build and constantly refine customer models while also looking at big-picture economic factors such as interest rates.

    Anti-Money Laundering Strategies

    As a subset of both fraud detection and compliance, anti-money laundering (AML) is an area where financial services firms are facing increasing pressure from governments. Money laundering is a different issue from purely fraudulent transactions, and the laws and regulations targeting it have a much wider scope, including tax evasion, public fund corruption, and market manipulation. Other elements involve concealing these crimes and any money derived from these actions.

    For AML compliance, data must be ingested from extremely diverse sources (sanctions lists, legal data, transactions, application logs). Also, ML models need to look at known money-laundering methods across timing and context in order to flag items for further investigation. Merely working within established rules (such as a transaction threshold) applies black-and-white thinking to an issue with a lot of gray-area manipulation by criminals. This is where ML can truly add value thanks to models that evolve over time as criminal schemes become more nuanced and sophisticated.
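
    To see why a single transaction threshold falls short, consider "structuring," where deposits are split so that each stays under the limit. The sketch below contrasts the plain threshold rule with a check that adds timing context by summing sub-threshold transactions over a sliding window. It is still a hand-written rule rather than an ML model, and the threshold value, window length, and sample data are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

THRESHOLD = 10_000  # hypothetical reporting threshold


def over_threshold(txns):
    """Plain rule: accounts with any single transaction at or above the threshold."""
    return {acct for acct, _, amount in txns if amount >= THRESHOLD}


def possible_structuring(txns, window=timedelta(days=3)):
    """Accounts whose sub-threshold transactions reach the threshold within the window."""
    by_account = defaultdict(list)
    for acct, ts, amount in txns:
        if amount < THRESHOLD:
            by_account[acct].append((ts, amount))
    flagged = set()
    for acct, items in by_account.items():
        items.sort()
        start = 0
        running = 0.0
        for ts, amount in items:
            running += amount
            # Drop transactions from the left until the span fits the window.
            while ts - items[start][0] > window:
                running -= items[start][1]
                start += 1
            if running >= THRESHOLD:
                flagged.add(acct)
                break
    return flagged


# Hypothetical transactions: (account, timestamp, amount).
txns = [
    ("A", datetime(2024, 1, 1), 4000), ("A", datetime(2024, 1, 2), 3500),
    ("A", datetime(2024, 1, 3), 3000),
    ("B", datetime(2024, 1, 1), 12000),
    ("C", datetime(2024, 1, 1), 4000), ("C", datetime(2024, 2, 1), 7000),
]
rule_hits = over_threshold(txns)
structuring_hits = possible_structuring(txns)
```

    Here the plain rule catches only account B, while the windowed check surfaces account A, whose three deposits together exceed the threshold within three days. Account C is left alone because its deposits are a month apart.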

    Challenges 

    A wide range of sources is required for AML compliance, including taking on datasets that have many combinations of structured, unstructured, and multi-structured data. Models have to be built to meet the latest regulations, along with constant updating to maintain compliance. Other elements include using tools such as graph analytics to reveal hidden relationships.
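
    One basic form of graph analytics is finding groups of accounts that are connected, directly or indirectly, through transfers or shared details. The following sketch uses a breadth-first search over an undirected graph to recover such groups; the account names and links are hypothetical, and dedicated graph tooling would offer far richer analysis than this.

```python
from collections import defaultdict, deque


def connected_groups(edges):
    """Group nodes that are linked directly or indirectly (BFS over an undirected graph)."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen = set()
    groups = []
    for node in sorted(graph):
        if node in seen:
            continue
        queue = deque([node])
        seen.add(node)
        group = []
        while queue:
            current = queue.popleft()
            group.append(current)
            for neighbor in sorted(graph[current]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(sorted(group))
    return groups


# Hypothetical links, e.g. a transfer or a shared phone number between accounts.
links = [("acct1", "acct2"), ("acct2", "acct3"), ("acct4", "acct5")]
groups = connected_groups(links)
```

    Although acct1 and acct3 never interact directly, they land in the same group through acct2, which is the kind of hidden relationship an investigator would want surfaced.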

    Other Big Data Use Cases

    This post featured an up-close look at big data in the financial services industry, but big data and ML can provide the same types of benefits for just about any industry. To learn more, take a look at Oracle’s Top 22 Use Cases for Big Data. Covering manufacturing, retail, healthcare, and more, this ebook provides insights into the power of big data across multiple industries.

    And for more about how you can benefit from Oracle Big Data, visit Oracle’s Big Data page—and don’t forget to subscribe to the Oracle Big Data blog to get the latest posts sent to your inbox.

    Via: https://blogs.oracle.com/