diff --git a/docs/howto/load_data_from_external_data_stores.md b/docs/howto/load_data_from_external_data_stores.md
index 36c15b501b..6fa9616ab9 100644
--- a/docs/howto/load_data_from_external_data_stores.md
+++ b/docs/howto/load_data_from_external_data_stores.md
@@ -3,7 +3,9 @@
 
 SnappyData comes bundled with the libraries to access HDFS (Apache compatible). You can load your data using SQL or DataFrame API.
 
-## Example - Loading data from CSV file using SQL
+## Example - Loading Data from CSV File using SQL
+
+The following example demonstrates how you can load data from a CSV file in a local file system by using SQL:
 
 ```pre
 // Create an external table based on CSV file
@@ -14,7 +16,7 @@ CREATE TABLE CUSTOMER using column options() as (select * from CUSTOMER_STAGING_
 ```
 
 !!! Tip
-    Similarly, you can create an external table for all data sources and use SQL "insert into" query to load data. For more information on creating external tables refer to, [CREATE EXTERNAL TABLE](../reference/sql_reference/create-external-table/)
+    Similarly, you can create an external table for all data sources and use a SQL "insert into" query to load data. For more information on creating external tables, refer to [CREATE EXTERNAL TABLE](../reference/sql_reference/create-external-table/).
 
 ## Example - Loading CSV Files from HDFS using API
 
@@ -73,7 +75,7 @@ val df = session.createDataFrame(rdd, ds.schema)
 df.write.format("column").saveAsTable("columnTable")
 ```
 
-## Importing Data using JDBC from a relational DB
+## Importing Data using JDBC from a Relational DB
 
 !!! Note
 	Before you begin, you must install the corresponding JDBC driver. To do so, copy the JDBC driver jar file in **/jars** directory located in the home directory and then restart the cluster.
diff --git a/docs/howto/load_data_into_snappydata_tables.md b/docs/howto/load_data_into_snappydata_tables.md
index cbd0b7864c..23c59ece31 100644
--- a/docs/howto/load_data_into_snappydata_tables.md
+++ b/docs/howto/load_data_into_snappydata_tables.md
@@ -3,16 +3,13 @@
 
 SnappyData relies on the Spark SQL Data Sources API to parallelly load data from a wide variety of sources. By integrating the loading mechanism with the Query engine (Catalyst optimizer) it is often possible to push down filters and projections all the way to the data source minimizing data transfer. Here is the list of important features:
 
-**Support for many Sources**<br>
-There is built-in support for many data sources as well as data formats. Data can be accessed from S3, file system, HDFS, Hive, RDB, etc. And the loaders have built-in support to handle CSV, Parquet, ORC, Avro, JSON, Java/Scala Objects, etc as the data formats.
+* **Support for many Sources**<br>
+There is built-in support for many data sources as well as data formats. Data can be accessed from S3, file system, HDFS, Hive, RDB, etc. Moreover, the loaders have built-in support to handle CSV, Parquet, ORC, Avro, JSON, Java/Scala Objects, etc. as the data formats.
+* **Access virtually any modern data store**<br>
+Virtually all major data providers have a native Spark connector that complies with the Data Sources API. For example, you can load data from any RDB (such as Amazon Redshift), or from stores such as Cassandra, Redis, Elastic Search, Neo4J, etc. While these connectors are not built-in, you can easily deploy them as dependencies into a SnappyData cluster. All the connectors are typically registered on spark-packages.org.
+* **Avoid Schema wrangling**<br>
+Spark supports schema inference, which means all you need to do is point to the external source in your 'create table' DDL (or Spark SQL API), and the schema definition is learned by reading in the data. There is no need to define each column and type explicitly. This is extremely useful when dealing with disparate, complex, and wide data sets.
+* **Read nested, sparse data sets**<br>
+When data is accessed from a source, the schema inference occurs by not just reading a header but often by reading the entire data set. For instance, when reading JSON files, the structure could change from document to document. The inference engine builds up the schema as it reads each record and keeps unioning them to create a unified schema. This approach allows developers to become very productive with disparate data sets.
 
-**Access virtually any modern data store**<br>
-Virtually all major data providers have a native Spark connector that complies with the Data Sources API. For e.g. you can load data from any RDB like Amazon Redshift, Cassandra, Redis, Elastic Search, Neo4J, etc. While these connectors are not built-in, you can easily deploy these connectors as dependencies into a SnappyData cluster. All the connectors are typically registered in spark-packages.org
-
-**Avoid Schema wrangling**<br>
-Spark supports schema inference. Which means, all you need to do is point to the external source in your 'create table' DDL (or Spark SQL API) and schema definition is learned by reading in the data. There is no need to explicitly define each column and type. This is extremely useful when dealing with disparate, complex and wide data sets.
-
-**Read nested, sparse data sets**<br>
-When data is accessed from a source, the schema inference occurs by not just reading a header but often by reading the entire data set. For instance, when reading JSON files the structure could change from document to document. The inference engine builds up the schema as it reads each record and keeps unioning them to create a unified schema. This approach allows developers to become very productive with disparate data sets.
-
-**Load using Spark API or SQL**<br>
-You can use SQL to point to any data source or use the native Spark Scala/Java API to load.
-For instance, you can first [create an external table](../reference/sql_reference/create-external-table.md).
+## Loading Data using Spark API or SQL
+You can use SQL to point to any data source or use the native Spark Scala/Java API to load. For instance, you can first [create an external table](../reference/sql_reference/create-external-table.md).
 
 ```pre
 CREATE EXTERNAL TABLE USING OPTIONS
@@ -20,15 +17,17 @@
 
+For example, `snc.sparkContext.hadoopConfiguration.set("fs.s3a.connection.maximum", "1000")`
\ No newline at end of file
diff --git a/docs/programming_guide/tables_in_snappydata.md b/docs/programming_guide/tables_in_snappydata.md
index c80b890d9b..d4e4ecdd45 100644
--- a/docs/programming_guide/tables_in_snappydata.md
+++ b/docs/programming_guide/tables_in_snappydata.md
@@ -31,7 +31,7 @@ CREATE TABLE [IF NOT EXISTS] table_name
 )
 [AS select_statement];
 
-DROP TABLE [IF EXISTS] table_name
+DROP TABLE [IF EXISTS] table_name;
 ```
 
 Refer to the [Best Practices](../best_practices/design_schema.md) section for more information on partitioning and colocating data and [CREATE TABLE](../reference/sql_reference/create-table.md) for information on creating a row/column table.
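As a companion to the SQL route shown in the hunk above, the following is a minimal sketch of the Spark API route with schema inference. It assumes a `SparkContext` is already available as `sc`; the CSV path and table name are hypothetical placeholders.

```pre
// Minimal sketch of the API route, assuming a SparkContext is available as `sc`;
// the CSV path and table name below are hypothetical placeholders.
import org.apache.spark.sql.SnappySession

val snappy = new SnappySession(sc)

// Infer the schema from the CSV header and data instead of declaring each column explicitly.
val customerDF = snappy.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/customer.csv")

// Save the result as a SnappyData column table.
customerDF.write.format("column").saveAsTable("CUSTOMER")
```

The saved table can then be queried with SQL in the same way as the external-table example above.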
diff --git a/docs/reference/command_line_utilities/modify_disk_store.md b/docs/reference/command_line_utilities/modify_disk_store.md
index 93aef166dc..fad38926d7 100644
--- a/docs/reference/command_line_utilities/modify_disk_store.md
+++ b/docs/reference/command_line_utilities/modify_disk_store.md
@@ -16,6 +16,8 @@ Snappy>create region --name=regionName --type=PARTITION_PERSISTENT_OVERFLOW
 
 **For non-secured cluster**
 
+## Description
+
 The following table describes the options used for `snappy modify-disk-store`:
 
 | Items | Description |
@@ -27,8 +29,6 @@ The following table describes the options used for `snappy modify-disk-store`:
 
 !!! Note
 	The name of the disk store, the directories its files are stored in, and the region to target are all required arguments.
-
-## Description
-
 ## Examples
 
 **Secured cluster**
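For orientation, an offline `snappy modify-disk-store` invocation supplies the three required arguments called out in the note above: the disk store name, the directory (or directories) holding its files, and the region to target. The sketch below is illustrative only; the store name, directory, and region are hypothetical, and the exact option spellings are an assumption here, so treat the options table above as the authoritative reference.

```pre
# Illustrative shape only: the disk store name, directory, and /APP/CUSTOMER region are
# hypothetical placeholders; see the options table above for the exact option names.
./bin/snappy modify-disk-store MY-DISK-STORE /data/snappydata/server1 -region=/APP/CUSTOMER
```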