esProc is software written purely in Java. It can serve as a programming language for data analysis, as middleware for report data preparation, and as an efficient big data computing engine.
SPL (short for Structured Process Language) is the programming language used in esProc, designed specifically for processing structured data. SPL is currently available only in esProc, so we often use 'SPL' and the full term 'esProc SPL' interchangeably to refer to the same product, 'esProc'.
SPL adopts a distinctive grid-style programming approach: code is written in cells, and the result of each calculation step can be viewed in real time in those cells. It is as interactive as Excel and can also be embedded into Excel, which makes it well suited to exploratory data analysis.
SPL provides a rich computing library that is particularly strong in set-related and order-related operations. Complex calculations can be written without multiple layers of nesting or lengthy code, which makes SPL simpler than SQL and Python for such tasks. It also supports straightforward parallel computing, which helps maintain performance during big data analysis.
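To make "order-related operations" concrete, consider a typical task of this kind: finding the longest run of consecutive days on which a price rose. The sketch below is plain Java, not SPL, and is not taken from the esProc documentation. The class name `RisingRun`, the method `longestRisingRun`, and the sample prices are all hypothetical and chosen only for illustration. It shows that the answer depends on each value's relationship to the previous one, which is what makes the calculation order-related.

```java
class RisingRun {
    /**
     * Returns the largest number of consecutive rises in the series,
     * where a "rise" is a value strictly greater than the one before it.
     */
    public static int longestRisingRun(double[] prices) {
        int best = 0;
        int current = 0;
        for (int i = 1; i < prices.length; i++) {
            // Extend the run when the value rose; otherwise reset the count.
            current = prices[i] > prices[i - 1] ? current + 1 : 0;
            best = Math.max(best, current);
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical closing prices, ordered by date.
        double[] closes = {10.0, 10.5, 11.0, 10.8, 11.2, 11.9, 12.4, 12.1};
        System.out.println(longestRisingRun(closes)); // prints 3
    }
}
```

In a language built on unordered sets, the same question usually has to be expressed through generated row numbers and grouping, which is where the nesting mentioned above tends to come from.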
SPL is developed in Java and can be embedded seamlessly into Java reporting tools. Because SPL code is interpreted, it supports hot swapping, which suits reporting businesses whose requirements change constantly.
SPL has lightweight multi-source computing capabilities. Without a logical data warehouse, it can directly perform mixed computations over multiple sources such as text files, databases, NoSQL stores, and cloud data, so it naturally supports real-time reporting on hot data.
SPL syntax is more concise than SQL, which greatly reduces the difficulty of coding complex report logic and speeds up report development. For existing SQL reports, SPL also provides the ability to migrate SQL across databases.
SPL does not adopt the SQL syntax based on relational algebra. Instead, it introduces an algebraic system called the discrete data set to address the problem of complex SQL being hard to write. SPL makes high-performance algorithms convenient to implement and thus achieves much higher computing performance than traditional relational data warehouses. Its algorithms are designed to make the most of the available hardware resources. In many practical cases, esProc running on a single machine has matched or even exceeded the performance of distributed databases.
Running as a data warehouse, esProc abandons the concept of “house”, breaks the closedness typical of conventional databases, and creates an open computing system. This makes it capable of replacing most MPP data warehouses at a lower resource cost and with a lighter framework.
The book SPL Programming is a good starting point for learning SPL syntax. It is intended for beginners with no programming experience. Veterans can skim it, but the explanation of objects in section 4.4 is worth studying. Chapter 5 is also important: it explains SPL’s set-oriented way of thinking, which differs considerably from other languages, and once you have mastered it you can write elegant code. Chapters 8-10 are the core of learning SPL. They look at structured data computation from a different perspective than SQL, which is significant even for professional programmers. From the SPL point of view, SQL’s understanding of structured data is too simple for a complex world, and the knowledge gained in typical database courses is neither broad nor deep enough, so it is worth reviewing and brushing up.
Find basic SPL concepts in this post: SPL concepts for beginners. Beginners can find the characteristic basic computations of SPL in SPL Operations for Beginners. Experienced programmers can quickly understand the differences between SPL and SQL. Software architects can understand the differences between SPL and traditional databases by reading Q&A of esProc Architecture.
Find comprehensive SPL documentation in SPL Learning materials. An application programmer can generally start with basic operations: database connection (SPL: Connecting to Databases), database reads and writes (SPL: Reading and Writing Database Data), or file access and computation (SPL: Reading and Writing Structured Text Files). Then learn how to integrate SPL into a Java application with How to Call an SPL Script in Java. Together these form a simple learning loop.
High-performance computation is relatively difficult, but there is a systematic book on the algorithms: Performance Optimization. These performance optimization algorithms are not unique to SPL. Once you have learned them, you can implement high-performance computations in another programming language (except SQL). The key lies in the algorithm rather than the syntax. Still, you need a good grasp of SPL concepts and syntax to understand the algorithms well.
The SPL learning posts above also contain applications of the performance optimization algorithms.
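As an illustration of the point that the algorithm matters more than the syntax, here is a minimal Java sketch of one well-known technique: computing the top N values in a single pass with a small bounded heap instead of sorting the whole data set. This example is an assumption chosen for illustration and is not taken from the Performance Optimization book; the class `TopN`, the method `largest`, and the sample numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopN {
    /**
     * Returns the n largest values in descending order.
     * Only n values are held in memory at any time, and the input is read once.
     */
    public static List<Integer> largest(Iterable<Integer> values, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the smallest of the current candidates sits at the head.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);
        for (int v : values) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (v > heap.peek()) {
                // The new value beats the weakest candidate, so replace it.
                heap.poll();
                heap.add(v);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(largest(List.of(7, 42, 3, 19, 88, 5, 61), 3)); // prints [88, 61, 42]
    }
}
```

For a fixed small N, the work per input value is bounded by the heap size rather than by the size of the data, which is why this approach scales better than a full sort when the data does not fit comfortably in memory.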
Storage is the cornerstone of high-performance computing. For beginners, the following post introduces the proprietary storage schemas commonly used in SPL: How to use SPL storage for beginners. The first step of performance optimization is usually to design an appropriate storage schema.
If you run into problems while trying to achieve high-performance computing, you are welcome to post them and discuss them with us to find a solution: Wanted! Unbearably slow query and batch job.
- esProc Official Website: http://www.scudata.com
- Forum: http://c.scudata.com/
- Tutorial: esProc download, installation, principles, and applications
- Function Reference: esProc syntax, applications, and examples
- User Reference: esProc programming by examples
- External Library Guide: deployment of and connection to esProc external libraries
- Please head to Download esProc SPL to download esProc executable files
- How to Get Open-source esProc for Eclipse through Git
esProc is under the Apache 2.0 license. See the LICENSE file for details.