Carlos Garcia-Alvarado


Publications [DBLP | Google Scholar Profile]

[29]Clustering binary cube dimensions to compute relaxed GROUP BY aggregations,
Garcia-Alvarado, C., Ordonez, C.,
Information Systems, 2015.
[28]Orca: a modular query optimizer architecture for big data,
Soliman, M. A., Antova, L., Raghavan, V., El-Helw, A., Gu, Z., Shen, E., Caragea, G. C., Garcia-Alvarado, C., Rahman, F., Petropoulos, M., Waas, F., Narayanan, S., Krikellas, K., Baldwin, R.,
SIGMOD Conference, 337-348, 2014.
[27]ONTOCUBO: Cube-based Ontology Construction and Exploration (system demo),
Garcia-Alvarado, C., Ordonez, C.,
ACM SIGMOD Conference, 2014.
[26]Bayesian Variable Selection in Linear Regression in One Pass for Large Data Sets,
Ordonez, C., Garcia-Alvarado, C., Baladandayuthapani, V.,
ACM Transactions on Knowledge Discovery from Data, 2014.
[25]PCA for Large Data Sets with Parallel Data Summarization,
Ordonez, C., Mohanam, N., Garcia-Alvarado, C.,
Journal on Distributed and Parallel Databases, 2014.
[24]Clustering Cubes with Binary Dimensions in One Pass,
Garcia-Alvarado, C., Ordonez, C.,
Proc. ACM CIKM Workshop on Data Warehousing and OLAP, 2013.
[23]Data mining algorithms as a service in the cloud exploiting relational database systems (system demo),
Ordonez, C., Garcia-Garcia, J., Garcia-Alvarado, C., Cabrera, W., Baladandayuthapani, V., Quraishi, M.S.,
ACM SIGMOD Conference, pp. 1001-1004, 2013.
[22]Total Operator State Recall - Cost-effective Reuse of Results in Greenplum Database,
Caragea, G., Garcia-Alvarado, C., Petropoulos, M., Waas, F. M.,
Proc. of IEEE Workshop on Self-Managing Database Systems (Workshop at ICDE), 2013.
[21]Querying External Source Code Files of Programs Connecting to a Relational Database,
Garcia-Alvarado, C., Ordonez, C., Baladandayuthapani, V.,
Proc. of ACM CIKM Ph.D. Workshop, 2012.
[20]Query Processing on Cubes Mapped from Ontologies to Dimension Hierarchies,
Garcia-Alvarado, C., Ordonez, C.,
Proc. ACM CIKM Workshop on Data Warehousing and OLAP, 2012.
[19]Fast PCA Computation in a DBMS with Aggregate UDFs and LAPACK,
Ordonez, C., Mohanam, N., Garcia-Alvarado, C., Tosic, P. T., Martinez, E.,
ACM CIKM Conference, 2012.
[18]Integrating and Querying Source Code of Programs Working on a Database,
Garcia-Alvarado, C., Ordonez, C.,
ACM KEYS Workshop (SIGMOD Conference Workshop), 2012.
[17]Automatic Data Placement in MPP Databases,
Garcia-Alvarado, C., Raghavan, V., Narayanan, S., Waas, F. M.,
Proc. of IEEE Workshop on Self-Managing Database Systems (Workshop at ICDE), 2012.
[16]A Data Mining System Based on SQL Queries and UDFs for Relational Databases (system demo),
Ordonez, C., Garcia-Alvarado, C.,
Proc. ACM CIKM Conference, pp. 2521-2524, 2011.
[15]Integrating and Querying Web Databases and Documents,
Garcia-Alvarado, C., Ordonez, C.,
Proc. ACM CIKM Conference, pp. 2369-2372, 2011.
[14]ONTOCUBE: Efficient Ontology Extraction using OLAP Cubes,
Garcia-Alvarado, C., Chen, Z., Ordonez, C.,
Proc. ACM CIKM Conference, pp. 2429-2432, 2011.
[13]Parallel Multithreaded Processing for Data Set Summarization on Multicore CPUs,
Ordonez, C., Navas, M., Garcia-Alvarado, C.,
Journal of Computing Science and Engineering (JCSE), 5(2): 111-120, 2011.
[12]Comparing SQL and MapReduce to compute Naive Bayes in a Single Table Scan,
Pitchaimalai, S.K., Ordonez, C., Garcia-Alvarado, C.,
ACM CIKM Workshop on Cloud Data Management, 2010.
[11]OLAP-based Query Recommendation,
Garcia-Alvarado, C., Chen, Z., Ordonez, C.,
Proc. ACM CIKM Conference, 2010.
[10]Relational versus non-relational database systems for data warehousing,
Ordonez, C., Song, I.-Y., Garcia-Alvarado, C.,
Proc. ACM CIKM Workshop on Data Warehousing and OLAP, pp. 67-68, 2010.
[9]Enhancing Document Exploration with OLAP (system demo),
Chen, Z., Garcia-Alvarado, C., Ordonez, C.,
Proc. of IEEE ICDM Conference, 2010.
[8]Keyword Search Across Databases and Documents,
Garcia-Alvarado, C., Ordonez, C.,
ACM KEYS Workshop (SIGMOD Conference Workshop), 2010.
[7]Query Recommendation in Digital Libraries using OLAP (system demo),
Garcia-Alvarado, C., Ordonez, C., Chen, Z.,
ACM KEYS Workshop (SIGMOD Conference Workshop), 2010.
[6]OLAP with UDFs in Digital Libraries,
Garcia-Alvarado, C., Chen, Z., Ordonez, C.,
ACM CIKM Conference, 2009.
[5]A Referential Integrity Browser for Distributed Databases,
Ordonez, C., Garcia-Garcia, J., Montero-Campos, R., Garcia-Alvarado, C.,
ACM WebDB (SIGMOD Conference Workshop), 2009.
[4]DBDOC: Querying and Browsing Interrelated Documents in SQL (system demo),
Garcia-Alvarado, C., Ordonez, C., Chen, Z.,
ACM KEYS Workshop (SIGMOD Conference Workshop), 2009.
[3]Fast and Dynamic OLAP Exploration Using UDFs (system demo),
Chen, Z., Ordonez, C., Garcia-Alvarado, C.,
ACM SIGMOD Conference, 2009.
[2]Efficient Distance Computation Using SQL Queries and UDFs,
Pitchaimalai, S., Ordonez, C., Garcia-Alvarado, C.,
IEEE HPDM (High Performance Data Mining Workshop, at ICDM), 2008.
[1]Information Retrieval from Digital Libraries in SQL,
Garcia-Alvarado, C., Ordonez, C.,
ACM WIDM (Web Information and Data Management, at CIKM), 2008.


Posters

  1. Garcia-Alvarado, C., Ordonez, C., Mohanam, N., Martinez-Encarnacion, E., Tosic, P., Efficient Linear Algebra Processing Inside a Database Management System. International Workshop on Accurate Solution of Eigenvalue Problems (IWASEP), Poster presentation, 2012.

  2. Ordonez, C., Mohanam, N., Garcia-Alvarado, C., Efficient One-pass Algorithms for Data Mining based on UDFs. New England Database Summit, Poster presentation, 2012.



Projects

  • Data mining.
  • Scalable algorithms.
  • Information retrieval in database systems.
  • Query recommendation.
  • Ontology extraction.
Information Retrieval in SQL

Abstract: We propose several techniques for performing document preprocessing, stop-word removal, term extraction, and information retrieval in SQL, using two fundamental IR models: the Vector Space Model and the Okapi Probabilistic Model.
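
As a rough illustration of ranking documents inside a relational engine, the sketch below stores term frequencies in a table and scores documents with a single SQL query. It is a simplified TF-IDF dot product (binary query vector, no length normalization), not the techniques from the papers. The table names (`tf`, `idf`), the tiny stop-word list, the whitespace tokenizer, and the helper `py_ln` are assumptions made for the example, and SQLite stands in for the DBMS.

```python
import math
import sqlite3

STOP_WORDS = {"the", "a", "of", "in", "and"}  # illustrative stop-word list


def build_index(docs):
    """Load {doc_id: text} into an in-memory SQLite term-frequency index."""
    conn = sqlite3.connect(":memory:")
    # SQLite may lack a built-in logarithm, so register one from Python.
    conn.create_function("py_ln", 1, math.log)
    conn.execute("CREATE TABLE tf (doc_id INTEGER, term TEXT, freq INTEGER)")
    for doc_id, text in docs.items():
        counts = {}
        for token in text.lower().split():
            if token not in STOP_WORDS:  # stop-word removal
                counts[token] = counts.get(token, 0) + 1
        conn.executemany(
            "INSERT INTO tf VALUES (?, ?, ?)",
            [(doc_id, term, freq) for term, freq in counts.items()],
        )
    # Inverse document frequency per term: ln(N / df).
    conn.execute(
        """
        CREATE TABLE idf AS
        SELECT term,
               py_ln(1.0 * (SELECT COUNT(DISTINCT doc_id) FROM tf) / COUNT(*))
                   AS weight
        FROM tf
        GROUP BY term
        """
    )
    return conn


def search(conn, query):
    """Return (doc_id, score) pairs ranked by summed tf * idf of query terms."""
    terms = sorted(set(query.lower().split()) - STOP_WORDS)
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT tf.doc_id, SUM(tf.freq * idf.weight) AS score
        FROM tf JOIN idf ON tf.term = idf.term
        WHERE tf.term IN ({placeholders})
        GROUP BY tf.doc_id
        ORDER BY score DESC, tf.doc_id
    """
    return conn.execute(sql, terms).fetchall()
```

Calling `search(build_index(docs), "okapi retrieval")` returns the matching document identifiers with the highest-scoring document first. A fuller treatment would normalize by document length, which is also where the Okapi model differs from this plain sum.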

   
Cube Operator in Documents

Abstract: Digital libraries are generally queried to find specific documents, but not to compute aggregations in an exploratory fashion. We developed a system that enables OLAP processing on the metadata of digital libraries by exploiting a combination of efficient UDFs and optimized SQL queries. We propose the creation of OLAP networks, a data structure that contains a summarized representation of the original data set, on a document metadata collection to enrich ordinary retrievals. With such a network, we give users the ability to quickly explore the metadata collection. Additionally, our system provides capabilities for visualizing and navigating large digital libraries and for setting support thresholds, which we evaluate using iceberg queries.

We extended this work in later papers to obtain query recommendations and ontologies.
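
The support thresholds mentioned in the abstract can be pictured with an iceberg query: a GROUP BY aggregation that keeps only groups whose count reaches a minimum support. The sketch below runs one such query per non-empty subset of the dimensions. The table `doc_meta`, its columns, and the function names are hypothetical, SQLite stands in for the DBMS, and the system described above relies on UDFs and OLAP networks rather than this brute-force enumeration.

```python
import sqlite3
from itertools import combinations

# Hypothetical metadata dimensions; fixed here so that no user input is
# ever interpolated into the SQL text.
DIMENSIONS = ("venue", "year")


def load_metadata(rows):
    """Create an in-memory table of (doc_id, venue, year) metadata rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_meta (doc_id INTEGER, venue TEXT, year INTEGER)"
    )
    conn.executemany("INSERT INTO doc_meta VALUES (?, ?, ?)", rows)
    return conn


def iceberg_cube(conn, min_support):
    """Count documents per group for every subset of DIMENSIONS.

    Only groups with at least min_support documents are kept.
    Returns {(dimension_names, group_values): count}.
    """
    result = {}
    for size in range(1, len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            cols = ", ".join(dims)
            sql = (
                f"SELECT {cols}, COUNT(*) FROM doc_meta "
                f"GROUP BY {cols} HAVING COUNT(*) >= ?"
            )
            for row in conn.execute(sql, (min_support,)):
                result[(dims, row[:-1])] = row[-1]
    return result
```

Raising `min_support` prunes sparse groups, which keeps the result small enough to browse even when the full cube would be large.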

   
Document Metadata and Relational Schemas

Abstract: There exist many interrelated information sources on the Internet that can be categorized into structured (databases) and semistructured (documents). A key challenge is to integrate, query, and analyze such heterogeneous collections of information. In this paper, we defend the idea of building web metadata repositories using relational databases as the main source and central data management technology of structured data, enriched by the semistructured data surrounding it. Our proposal rests on the assumption that heterogeneous relational databases can be integrated (i.e., entity resolution is assumed to work well) and thus can serve as references for external data. That is, we tackle the problem of integrating information in the deep web, starting from databases. We discuss a prototype system, based on relational database technology, that can integrate and query metadata and related documents. Metadata includes database ER model elements such as database name, table, and column (entity, attribute). Web document data include files, documents, and web pages. Links between metadata and external documents are built with SQL queries. Once databases and documents are linked, they are managed and queried with SQL. We discuss an interesting scientific application of our solution with a water pollution database.
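
To make the linking step concrete, the sketch below builds a link table with one SQL join that matches column names against document text, then queries the linked documents with SQL. It is a minimal sketch under stated assumptions, not the prototype's implementation: the schema (`metadata`, `document`, `link`), the function names, and the naive case-insensitive substring match are all chosen for illustration, with SQLite as the engine.

```python
import sqlite3


def build_repository(metadata, documents):
    """Store schema metadata and documents, then link them with SQL.

    metadata:  iterable of (db_name, table_name, column_name)
    documents: iterable of (doc_id, path, content)
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE metadata (db_name TEXT, table_name TEXT, column_name TEXT);
        CREATE TABLE document (doc_id INTEGER PRIMARY KEY, path TEXT, content TEXT);
        """
    )
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", metadata)
    conn.executemany("INSERT INTO document VALUES (?, ?, ?)", documents)
    # Assumed linking rule: a document is linked to a column when the
    # column name occurs anywhere in its text (case-insensitive).
    conn.execute(
        """
        CREATE TABLE link AS
        SELECT m.db_name, m.table_name, m.column_name, d.doc_id
        FROM metadata m
        JOIN document d
          ON instr(lower(d.content), lower(m.column_name)) > 0
        """
    )
    return conn


def documents_for_column(conn, table_name, column_name):
    """Return paths of documents linked to the given table column."""
    rows = conn.execute(
        """
        SELECT d.path
        FROM link l JOIN document d ON d.doc_id = l.doc_id
        WHERE l.table_name = ? AND l.column_name = ?
        ORDER BY d.path
        """,
        (table_name, column_name),
    )
    return [row[0] for row in rows]
```

With a hypothetical water-quality schema, `documents_for_column(conn, "sample", "nitrate")` lists every stored document whose text mentions that column. A real system would need a stricter matching rule than substring search to avoid false links.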