Exponam & Apache Spark


Exponam’s direct integration with Apache Spark, including Databricks’ commercial offering, shortens the time-to-value of the quantitative, analytic, and machine learning results you produce with Spark.  The integration offers these advantages:

  • A native data source for loading and saving Exponam .BIG files

    The native data source is built with Exponam’s powerful core technology, which dynamically tunes itself to the runtime capabilities of your enterprise Spark clusters.  This ensures lean execution, high performance, and brisk throughput to and from Spark’s internal RDD (resilient distributed dataset) structures.  With Exponam, you can ingest large datasets into Spark from highly compressed import files without wasting space or time.  You can also egress Spark data into a format that is orders of magnitude more compressed than standard delimited formats, so much larger datasets can be faithfully preserved for audit and archival needs.
  • Frictionless access with Spark DataFrames

    Exponam data load and save operations use the standard DataFrame syntax that data scientists work with every day, whether in Scala, Python, or Spark SQL.  Exponam’s defaults can be overridden with standard DataFrame options, unlocking the full power of Exponam’s underlying technology: security, file optimization levels, story files, and application-defined supplemental metadata.

    An Exponam file can contain any number of tables, each with its own schema and row-level data.  Each table can be loaded individually, so a single Exponam file can transport an entire rich repository of data into Spark.  Because Exponam files carry explicit schemas, they eliminate the ambiguity of inferred schemas and ensure that the native representation of objects in RDDs is always optimal and correct.

    Further, Exponam’s save operation gives DataFrame users the flexibility they demand.  Save can be invoked in a cluster-aware fashion, with each node in the cluster writing an output file for its local data only, which can be advantageous for extremely large RDDs.  Alternatively, DataFrame results can be coalesced into a single partition and written as a single output file.  Either way, Exponam lets you use the pattern that best fits your cluster profile and data egress requirements.
  • Data lineage

    Modern data architectures seek to preserve data lineage across disparate products and solutions, an almost insurmountable task when data is moved between traditional silos, compute grids, and data grids.  With Exponam, the provenance of data is integral to the file itself.  This allows solution architectures using Apache Spark to maintain data lineage from ingest through egress, so that the linkage to upstream systems is faithfully preserved.
  • Security

    Standard data exchange formats for Spark leave data unencrypted at rest.  Exponam files, in contrast, are always encrypted at rest, even while they are being loaded into the cluster.  The attack surface for a potential data breach is demonstrably smaller with Exponam.

    Further, by default Exponam establishes the integrity of a file before loading it.  If the file has been tampered with, the load fails with a standard Spark exception, and absolutely no row-level data is generated in Spark.
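The load-time integrity check described in the Security bullet follows a verify-then-parse pattern: authenticate the whole file first, and only then produce rows. The sketch below illustrates that general pattern in plain Python; it is not Exponam’s implementation, whose file layout and cryptography are not described here. It assumes a hypothetical layout in which a 32-byte HMAC-SHA256 tag precedes a comma-delimited payload, and it uses a hypothetical `IntegrityError` in place of the Spark exception. Note that an HMAC detects tampering only; it does not encrypt the payload.

```python
import hashlib
import hmac
from typing import List

TAG_SIZE = 32  # length of an HMAC-SHA256 digest in bytes


class IntegrityError(Exception):
    """Hypothetical stand-in for the Spark exception raised on a tampered file."""


def seal(payload: bytes, key: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag (hypothetical file layout)."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def load_rows(sealed: bytes, key: bytes) -> List[List[str]]:
    """Verify the whole file first; parse rows only if the tag matches."""
    tag, payload = sealed[:TAG_SIZE], sealed[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        # Fail before any row-level data is produced.
        raise IntegrityError("file failed integrity check; no rows loaded")
    return [line.split(",") for line in payload.decode("utf-8").splitlines()]


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical shared secret
    sealed = seal(b"id,name\n1,Ada\n2,Grace", key)
    print(load_rows(sealed, key))  # [['id', 'name'], ['1', 'Ada'], ['2', 'Grace']]

    tampered = sealed[:-1] + b"X"  # alter the last payload byte
    try:
        load_rows(tampered, key)
    except IntegrityError as err:
        print(err)  # file failed integrity check; no rows loaded
```

Because the tag is checked over the entire payload before any line is split, a single altered byte makes the load fail without yielding any rows, which is the all-or-nothing behavior the Security bullet describes.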



Data Sharing in a COVID-19 World

Accessing and exploring data sets is hard in the best of times.  Data access becomes an even bigger challenge when working in a business-continuity, work-from-home mode.  We need a way to distribute data to employees that is secure, highly compressed, and ultra-easy to access.

Exponam .BIG files are an easy-to-use alternative to CSV files.  .BIG files are fully encrypted, ensuring that data files are completely secure wherever they reside: at rest, in use, and in transit.  Access is controlled via passphrase, token, or multifactor authentication.  Dynamic entitlement is available for further control of sensitive GDPR- or HIPAA-governed data.

.BIG files are super-efficient for transferring and querying data.  The time to migrate large data sets into and out of databases drops from hours to minutes.  Queries execute in a fraction of the time they take with CSVs.  .BIG files are highly compressed, enabling fast download, distribution, and sharing, no matter how many rows the data contains (millions or hundreds of millions).

.BIG files are easily accessed via the Exponam Explorer™, a spreadsheet viewer for instantly filtering, sorting, and finding data.  Filtered subsets can be moved to Excel with a single click.  .BIG data can also be accessed via JDBC with the speed of database queries.

.BIG files are completely tamper-proof and their provenance is guaranteed.  You can be confident that the publisher is genuine, the file properties are accurate, and the data is unaltered.


How Valuable Is Your Voter Data?

You share files: voter files, donor lists, supporter data.
It’s how you win elections.

Who do you fear accessing your data?

~ Foreign Agents ~ Domestic Meddlers ~ Other Candidates ~

What happens when your files are stolen or altered?

~ Lost elections ~ Regulatory entanglement ~ Lost donor revenue ~


How secure is your data?  Even if you send data through an encrypted channel, once received it is vulnerable.  “Secure” zip files do not protect you.   CSVs and Excel files sitting on a laptop are not secure.  Unsecured data at rest is a huge concern.

We created .BIG™ data files to enable easy sharing and exploring of data, from a few rows to hundreds of millions of rows.  But security was always our primary concern.  Our files are architected to be the most impenetrable data vehicles on the market.

Read about how .BIG files secure data.

And then download the Exponam Explorer to see how easily you can explore large sets of data.

Want to know a little secret?

Data isn’t hard. You don’t need a PhD in Data Science to understand big data. I spent years running reporting services in the finance industry.
The truth? We don’t want to give data to you.

Why?

A few reasons:

  • The tools to view, query, and sort large data sets are not made for you. They are made for programmers. If we give you a data set with 10 million records, what are you going to do with it? (Hint: It will probably mean more work for us.)
  • We don’t trust you to draw the correct conclusions from the data. This may be because the data is not “clean.” Maybe we haven’t done as good a job as we should have in storing data.  Maybe we want you to NEED us to provide you with answers.
  • Once you have the data, we don’t know what you might do with it, how you might try to use it, with whom you might (intentionally or unintentionally) share it, or how you might corrupt it.

No longer.

Exponam .BIG™ files make data sets portable, secure, and tamper-proof.

The Exponam Explorer™ makes filtering and sorting data easy and fast. No need to ask your IT group to run queries for you. You don’t need to wait for someone to create custom reports.

We architected Exponam .BIG files to be the most secure data files on the market. (See Free Your Data for more on security.)

It’s your data. Download. Share. Explore.

Find out more. Visit www.exponam.com
