The Myth of Analytics Self-Service

 

BI solutions promise “User Self-Service.” But when users need to download more than a few thousand rows of data, “self-service” becomes “ask IT.”
Business users routinely need millions of rows: to send to partners or customers, for further offline review, or to address a regulator’s concerns. These requests are difficult:
• The data is too large to easily download as a CSV
• There are corporate security concerns
• How does the user even open the file (too large for Excel or Sheets)?
• Once the file is created, how can users be sure of its provenance: when was it created, and has it been modified?

EXPONAM for BI

By incorporating the ability to write Exponam .BIG files, BI vendors can easily provide true self-service for downloading large datasets. Users can download files with hundreds of millions of rows, open them in a spreadsheet, and filter and sort the data instantly. Security concerns are addressed. File size is minimized. Files are immutable, and provenance and lineage are tracked. Data subsets are moved to Excel with a click.
You do all the heavy lifting: making data accessible, providing visualizations, and uncovering insights.
Let us help you go the final mile – true user self-service for secure, large data download, sharing and exploration.

 

Download this article.

In the News: Data Headaches Targeted with a Dose of .BIG via DataNami

Working with large numbers of files–and large files–remains a roadblock to productivity for data professionals around the world. Now a software startup named Exponam says it has come up with a potential solution to the problem with a new data file format called .BIG.

Read the full article at:

https://www.datanami.com/2020/11/23/data-headaches-targeted-with-a-dose-of-big/

Exponam & Apache Spark

 

Exponam’s direct integration with Apache Spark, including Databricks’ commercial offering, improves the time-to-value of quantitative, analytic, and machine learning results available with Spark.  The integration offers these advantages:

  • A native data source for loading and saving Exponam .BIG files

    The native data source is built with Exponam’s powerful core technology, which dynamically tunes itself to your enterprise Spark clusters’ runtime capabilities.  This ensures lean execution, high performance, and brisk throughput to and from Spark’s internal RDD (resilient distributed dataset) structures.  With Exponam, you can ingest large datasets into Spark out of highly compressed import files without wasting space and time.  And you can egress Spark data into a format that is orders of magnitude more compressed than standard delimited formats, allowing much larger datasets to be faithfully preserved for audit and archival needs.
  • Frictionless access with Spark DataFrames

    Exponam data load and save operations are available using standard DataFrame syntax that data scientists use every day, whether with Scala, Python, or Spark SQL.  Exponam’s default options can be trivially overridden using standard DataFrame options, unleashing the full power of Exponam’s underlying technology: security, file optimization levels, story files, and application-defined supplemental metadata.

    An Exponam file can contain any number of tables, each with its own schema and row-level data.  Each table can be loaded individually, allowing a single Exponam file to transport entire rich repositories of data into Spark.  Exponam’s schemas eliminate the potential ambiguity of inferred schemas, and mean that the native representation of objects in RDDs is always optimal and correct.

    Further, Exponam’s save operation with Spark DataFrames allows the flexibility that DataFrame users demand.  Save can be invoked in a cluster-aware fashion, with each node in the cluster generating an output file for its local data only, which can be advantageous for extremely large RDDs.  Alternatively, DataFrame results can be coalesced (or glom’ed) through the master node, and result in a single output file.  The point is that Exponam allows you to use the pattern that best fits your cluster profile and data egress requirements.
  • Data lineage

    Modern data architectures seek to preserve data lineage across disparate products and solutions, an almost insurmountable task when data is moved between traditional silos, compute grids, and data grids.  With Exponam, the provenance of data is integral to the file itself.  This allows solution architectures using Apache Spark to maintain data lineage from ingest through egress, so that the linkage to upstream systems is faithfully preserved.
  • Security

    Standard data exchange formats for Spark require data that is unencrypted when at rest.  Exponam, in contrast, is always encrypted at rest, even as it is being loaded into the cluster.  The attack surface for potential data breach is demonstrably smaller with Exponam.

    Further, Exponam’s default behavior on load operations is to first establish the integrity of the file.  If the file has been tampered with, the load will fail with a standard Spark exception, and absolutely no row-level data will be generated in Spark.
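
The remark above about the ambiguity of inferred schemas can be shown without Spark.  The following Python sketch is an illustration only and does not use Exponam’s API or file format: the sample data and the names `infer`, `read_inferred`, `read_with_schema`, and `SCHEMA` are hypothetical.  It reads the same delimited text twice, once guessing each value’s type and once applying a declared schema.

```python
import csv
import io
from decimal import Decimal
from typing import Callable, Dict, List

# Hypothetical sample extract: a ZIP code with a leading zero and a money column.
RAW = "zip_code,amount\n02134,10.50\n10001,7\n"


def infer(value: str):
    """Naive per-value inference: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def read_inferred(text: str) -> List[dict]:
    """Parse delimited text, guessing a type for every individual value."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: infer(v) for k, v in row.items()} for row in reader]


def read_with_schema(text: str, schema: Dict[str, Callable]) -> List[dict]:
    """Parse delimited text, converting each column with its declared type."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: schema[k](v) for k, v in row.items()} for row in reader]


# Declared schema (an assumption for this example): ZIP codes are text,
# amounts are exact decimals.
SCHEMA = {"zip_code": str, "amount": Decimal}

if __name__ == "__main__":
    print(read_inferred(RAW))
    print(read_with_schema(RAW, SCHEMA))
```

With inference, the ZIP code `02134` loses its leading zero and the `amount` column ends up with a float in one row and an integer in the next; with the declared schema, both columns keep a single, intended type.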

 


Data Sharing in a COVID-19 World

Accessing and exploring data sets is hard in the best of times.  Data access is a huge challenge when working within a contingency/continuity, work-from-home mode.  We need a way to distribute data to employees such that it is secure, highly compressed, and ultra-easy to access.

Exponam .BIG files are an easy-to-use alternative to CSV files.  .BIG files are fully encrypted, ensuring that data files are completely secure anywhere they reside: at rest, in use, and in transit.  Access is controlled via passphrase, token, or multifactor authentication.  Dynamic entitlement is available for further control of sensitive GDPR- or HIPAA-governed data.

.BIG files are super-efficient for transferring and querying data.  Time to migrate large data sets into or out of databases is slashed from hours to minutes.  Queries execute in a fraction of the time they take with CSVs.  .BIG files are highly compressed, enabling fast download, distribution, and sharing – no matter how many rows of data (millions or hundreds of millions).

.BIG files are easily accessed via the Exponam Explorer™, a spreadsheet viewer for instant filtering, sorting, and finding data.  Isolated subsets are moved to Excel with a single click.  .BIG data is also accessed via JDBC with the speed of database queries.

.BIG files are completely tamper-proof and their provenance is guaranteed.  You can be confident that the publisher is genuine, the file properties are accurate, and the data is unaltered.


The Great Myth of Data Security

MYTH: Perimeter & Endpoint Security Protects Your Data

FACT: Your Data Will Get Out

You protect your data repository.
You protect your network.
You provide users with visualizations, summaries, and exception analysis.

Yet, thousands of data files are extracted and downloaded from your repository.  These CSV and XLS files reside on your network drives, on employee machines, in the public cloud, and in employees’ email.  These data files are the most vulnerable point of attack for malicious actors.

If you are not securing data files, you are not protecting your data.

Current State of Data Security:

Data security in 2019 is achieved through a series of concentric security perimeters which attempt to ensure that access to data is authorized.  We secure network and application endpoints.  We secure, manage, monitor, and audit access to databases.  We attempt to scan and block content from leaving our perimeter.

 

But data gets out.  It gets out intentionally when we distribute to partners, vendors, clients.  It gets out for legitimate reasons when managers and analysts drill into the data.  It gets out when we are hacked or otherwise compromised.

We have thousands of data files that are extracts from our databases.  They exist as CSV files, as XLS files, and other delimited file types.  They reside on our network drives, on employee machines, in the public cloud, and in employees’ email.  These files are all potential security concerns.

We address this massive security concern by training employees on corporate data policies.  We instruct them on how data should be handled, where it should be sent, and to whom it should be entrusted.

THIS IS INSANE.

 

The Exponam .BIG Solution:

Exponam .BIG files are an easy-to-use alternative to downloaded CSV files.  .BIG files are fully encrypted, ensuring that data files are completely secure anywhere they reside: at rest, in use, and in transit.  Access is controlled via passphrase, token, or multifactor authentication.  Dynamic entitlement is available for further control of sensitive GDPR- or HIPAA-governed data.

Your data is secure anywhere it resides – at rest, in use, and in transit.

.BIG files are highly compressed, enabling easy download, distribution, and sharing – no matter how many rows of data (thousands, millions, or hundreds of millions).

.BIG files are easily accessed via a spreadsheet viewer for instant filtering, sorting, and finding data.  Isolated subsets are moved to Excel with a single click.  .BIG data is also accessed via JDBC, Java, and Windows libraries.

 

.BIG files are completely tamper-proof and their provenance is guaranteed.  You can be confident that the publisher is genuine, the file properties are accurate, and the data is unaltered.
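
As a general illustration of tamper evidence (not a description of the .BIG format’s internals, which this article does not specify), the Python sketch below attaches an HMAC-SHA256 tag to a delimited payload and refuses to parse any rows unless the tag verifies.  The names `seal`, `load_rows`, and `TamperedFileError` are hypothetical, and a secret key shared between publisher and reader is assumed.

```python
import csv
import hashlib
import hmac
import io
from typing import List

TAG_LEN = 32  # length in bytes of a SHA-256 digest


class TamperedFileError(Exception):
    """Raised when the integrity tag does not match the payload."""


def seal(payload: bytes, key: bytes) -> bytes:
    """Return the payload prefixed with an HMAC-SHA256 tag computed over it."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def load_rows(blob: bytes, key: bytes) -> List[List[str]]:
    """Verify the tag first; parse rows only if the check passes."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise TamperedFileError("integrity check failed; no rows loaded")
    return list(csv.reader(io.StringIO(payload.decode("utf-8"))))


if __name__ == "__main__":
    key = b"demo-key-for-illustration-only"  # assumption: shared secret
    blob = seal(b"id,amount\n1,10\n", key)
    print(load_rows(blob, key))
```

Because verification happens before parsing, a modified file yields an exception rather than partial data.  A keyed digest of this kind only detects changes; it does not encrypt the contents.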

It is time to learn more.  Contact us for trial information.  
