Maclean Liu (刘相兵) posted on 2012-4-9 19:07:52

OOW 2011 Big Data Management – Are You Ready?


Capturing the broadest set of information and data available (both structured and unstructured)
Including data not in your data warehouse… weblogs, Twitter feeds, Facebook feeds
Organizing this information using highly scalable platforms
Analyzing data within context of all your enterprise data using advanced analytics
Deciding on…

Apache Hadoop is open source software
Tool to transform large quantities of data (rather than in-depth analysis)
Key capabilities:
HDFS: Cluster filesystem with redundant storage
Highly scalable data processing
Map/Reduce programming paradigm
Cost-effective storage and processing of large data volumes
Typical customers:
Internet companies are using large Hadoop clusters
Enterprise customers are still kicking the tires with small clusters
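
To make the Map/Reduce paradigm from the list above concrete, here is a minimal sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; it only imitates the map, shuffle and reduce steps inside one process. The log format (visitor, then page, separated by whitespace) and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (page, 1) per log line. Assumed format: '<visitor> <page>'."""
    fields = line.split()
    if len(fields) >= 2:  # silently skip malformed lines
        yield fields[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(page: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single result."""
    return page, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(page, counts) for page, counts in shuffle(pairs).items())


# Hypothetical sample input
LOG_LINES = ["alice /home", "bob /home", "alice /pricing"]
PAGE_HITS = run_job(LOG_LINES)
```

On a real cluster the map and reduce functions run in parallel on many nodes over data stored in HDFS; the sketch only shows the shape of the computation.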

Hadoop is good for processing huge weblogs and tweets (as one type of big data) into actionable information. For example, say we want to take logs of web events and reconstruct who visited a company web site, which pages they clicked on and in what order, and how long they spent there. How do we get at this information when all we have is the raw data? Although Hadoop does the underlying work, we’re going to use a new feature in Oracle Data Integrator (ODI) to set up and run the whole process from start to finish. The advantage is that Oracle customers can use a tool, process and languages that they already know. ODI has knowledge modules that drive Hadoop, so the DBA only needs to specify the mapping expressions for the transformations in the ODI graphical UI. ODI does the rest.
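
The weblog example in the paragraph above can be sketched as follows. This is plain Python rather than anything ODI or Hadoop would generate, and the event layout (visitor, timestamp in seconds, page) plus every name in it are assumptions for illustration. Time on site is approximated as the gap between a visitor's first and last hit, which undercounts because the time spent on the last page is unknown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

Event = Tuple[str, int, str]  # (visitor, timestamp in seconds, page), assumed layout


class Visit(NamedTuple):
    visitor: str
    pages: List[str]       # pages in the order they were clicked
    seconds_on_site: int   # last hit minus first hit


def reconstruct_visits(events: Iterable[Event]) -> Dict[str, Visit]:
    """Group events by visitor, order them by time, and summarize each visit."""
    by_visitor: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for visitor, timestamp, page in events:
        by_visitor[visitor].append((timestamp, page))

    visits: Dict[str, Visit] = {}
    for visitor, hits in by_visitor.items():
        hits.sort()  # log lines may arrive out of order
        pages = [page for _, page in hits]
        visits[visitor] = Visit(visitor, pages, hits[-1][0] - hits[0][0])
    return visits


# Hypothetical sample events, deliberately out of order
EVENTS = [
    ("bob", 105, "/home"),
    ("alice", 130, "/pricing"),
    ("alice", 100, "/home"),
    ("alice", 190, "/contact"),
]
VISITS = reconstruct_visits(EVENTS)
```

The grouping by visitor plays the role of the shuffle step, and the per-visitor summary is what a reducer would compute.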

Typically this kind of data will be reduced in size by the time we’re done. Some of the data is not relevant, and there are duplicates and multiple copies. We use Hadoop to extract the valuable information, which is what we really need for the analysis. Finally, ODI together with Oracle Loader for Hadoop moves the results into our sandbox, which is based on Oracle Database technology.
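
A rough sketch of why the data shrinks: drop records that are not relevant and keep only one copy of each duplicate. The relevance rule used here (ignore requests for static assets) is a hypothetical stand-in, as are the names and the event layout; the real rules depend on the analysis.

```python
from typing import Iterable, List, Set, Tuple

Event = Tuple[str, int, str]  # (visitor, timestamp in seconds, page), assumed layout

# Hypothetical rule: requests for static assets are not relevant to the analysis
IRRELEVANT_SUFFIXES = (".css", ".js", ".png")


def extract_relevant(events: Iterable[Event]) -> List[Event]:
    """Keep the first copy of each relevant event, preserving input order."""
    seen: Set[Event] = set()
    kept: List[Event] = []
    for event in events:
        page = event[2]
        if page.endswith(IRRELEVANT_SUFFIXES):
            continue  # not relevant
        if event in seen:
            continue  # exact duplicate
        seen.add(event)
        kept.append(event)
    return kept


# Hypothetical raw input containing one duplicate and one irrelevant record
RAW = [
    ("alice", 100, "/home"),
    ("alice", 100, "/home"),
    ("alice", 101, "/style.css"),
    ("bob", 105, "/home"),
]
CLEAN = extract_relevant(RAW)
```

Only the reduced result would then be loaded into the database sandbox.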

The way to get business value out of all this new data is to do the right analysis to understand what the data is telling you. Oracle Database has the most advanced analytics on the market and we’ve gone one step further with the addition of Oracle R Enterprise. With Oracle R Enterprise we can run the statistical analysis in the database, and the analyst’s laptop is only used to control the session and view the results. This means bigger datasets and faster results.

What’s the benefit of using ODI for Big Data?

Improves productivity for transforming Big Data
Optimizes the loading of Big Data
Reduces the complexity of Hadoop through graphical tooling
Minimizes the size of data results
Consistent tooling across BI/DW, SOA, Integration and Big Data

