INFORMATION TECHNOLOGY AND DATA COLLECTIONS BOTH VIRTUAL AND PHYSICAL
- Part 3 of 4
THE VIRTUAL COLLECTION OF THE PRESENT
Virtual collections have exploded since the start of the new millennium, and in many library models they are beginning to relegate the physical collection to archive status. The virtual revolution has become a paradigm shift for the entire library profession. There is no longer a choice as to whether a system or its branches will embrace and implement the new virtual collection models; to refuse is the functional suicide of irrelevance, which any library administrator in public or academic institutions knows is soon followed by cuts in funding. Patrons are already used to e-books and readily consume online downloads of e-texts via their ubiquitous network access. Most carry with them the technological tools to consume this digital information almost anywhere they go, a telling milestone illustrating the future of the Information and Digital Age.
Advances in Networking to the Present
New technological developments in the hardware and software foundations of the Internet have led to greater access and faster data transfer, especially as bandwidth in the networks themselves has significantly increased. Many home users have moved away from older dial-up connections to higher-bandwidth, persistent connections such as DSL (Digital Subscriber Line), which runs over the old POTS infrastructure, and broadband cable, which runs over the older coaxial TV cable plant. Add to these the ISP (Internet service provider) packages offered by the large telecom companies, satellite services, and wide-area wireless providers such as the cellular networks, and home consumers have many options to choose from that were not available ten years ago. The cost of Internet access has decreased as well, and this has played a role in the spread of Internet use among the general public.
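To put these connection options in perspective, the following back-of-the-envelope sketch compares nominal download times for a hypothetical 100 MB video file at dial-up, DSL, and cable rates; the file size and line rates are illustrative, not figures from this chapter:

```python
# Rough download-time comparison at nominal line rates, ignoring protocol
# overhead and congestion. The file size and rates below are illustrative.

def download_minutes(file_size_mb: float, rate_mbit_s: float) -> float:
    """Minutes to transfer file_size_mb megabytes at rate_mbit_s megabits per second."""
    return (file_size_mb * 8) / rate_mbit_s / 60

video_mb = 100  # hypothetical video file

for label, rate_mbit_s in [("56k dial-up", 0.056),
                           ("1.5 Mbit/s DSL", 1.5),
                           ("10 Mbit/s cable", 10.0)]:
    print(f"{label:>15}: {download_minutes(video_mb, rate_mbit_s):7.1f} minutes")
```

At these rates the dial-up transfer takes roughly four hours, while the broadband transfers finish in minutes, which is why streaming and large downloads only became practical with persistent high-bandwidth connections.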
As increases in the bandwidth, stability, and data security of Internet connections lead to more options for virtual collections in the type and volume of their material, we will continue to see exponential growth in the size and complexity of the content on the Internet as a whole. Multimedia virtual collections such as YouTube would not have been possible over older dial-up connections; streaming video requires higher-bandwidth connections to play in reasonable real time with minimal buffering, and with today's access, larger files can be downloaded in practical times. The original hypertext language of the first web pages has been further augmented by more active applications such as embedded multimedia, software services, database access, and dynamic web pages driven by real-time user interaction. To give the reader a better idea of the massive size of the Internet, the following image gives a small snapshot of less than 30% of the Class-C IP addresses collected from usage data on January 15, 2005 by the Opte Project. Each line is drawn between two network nodes and their corresponding IP addresses, and the length of each line indicates the network delay between the two nodes (2005). Though only a small snapshot, this keyhole view captures a slice of the vast digital sum we know as the Internet.
Figure 6
A living digital universe. The Opte Project snapshot of Class-C IP addresses on the Internet, January 15, 2005. Compare this with chapter 1, figure 3's image of the first network, DARPA's ARPANET, March 1977. Image licensed under the Creative Commons Attribution 2.5 Generic license.
The increased use of the Internet can be linked to the increased use of virtual collections in all their forms. As patrons become used to getting their information from websites and Internet-based applications such as e-mail and apps on their phones, accessing virtual collections will become a standard part of their daily activities, especially given their increased access to constantly networked technology such as smartphones and ever smaller computers designed for portability and mobility.
Current 'Wired' Networking Technologies: Gigabit Ethernet and Fiber Optics
The explosive increase in bandwidth is mostly tied to the hardware technologies of networking. With the development of IP routing and MAC-based switching, the entire line of communication can be dedicated to a momentary data transfer, creating a sort of point-to-point connection as needed within the network. Frame switching and packet routing were major innovations in communication management: traditional analog voice lines required constant connections that remained largely idle given the real-time transfer of the human voice, and those idle periods could not be aggregated under the older technology paradigms. Even though network hubs, the predecessors of network switches, used to replicate network data over all of their connections, drastically dividing up the bandwidth, the revolution in digital communications and frame switching still marked a great leap in data transfer over older analog methods. All of these developments rest on the first two layers of the OSI model.
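The MAC-based switching mentioned above can be illustrated with a small sketch of the "learning switch" logic: the switch notes which port each source address arrives on and forwards later frames only to that port, instead of flooding every connection the way a hub would. The frame and port representations here are hypothetical simplifications, not an implementation of any particular device:

```python
# Minimal sketch of MAC-address learning in a Layer 2 switch: the switch
# remembers which port each source address arrived on, then forwards frames
# only to the learned port instead of flooding every port (as a hub would).

from typing import Dict, List

mac_table: Dict[str, int] = {}  # MAC address -> port number where it was last seen

def handle_frame(src_mac: str, dst_mac: str, in_port: int, all_ports: List[int]) -> List[int]:
    """Return the list of ports the frame should be sent out on."""
    mac_table[src_mac] = in_port                    # learn where the sender lives
    if dst_mac in mac_table:                        # known destination: deliver point-to-point
        return [mac_table[dst_mac]]
    return [p for p in all_ports if p != in_port]   # unknown destination: flood the other ports

ports = [1, 2, 3, 4]
print(handle_frame("aa:aa", "bb:bb", 1, ports))  # bb:bb unknown -> flood [2, 3, 4]
print(handle_frame("bb:bb", "aa:aa", 2, ports))  # aa:aa learned -> [1]
```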
There have also been great advances in the transfer infrastructure over wired connections. Ethernet (IEEE 802.3) cabling with a CSMA/CD (carrier sense multiple access with collision detection) networking scheme has won out as the preferred data transfer standard at the Physical and Data Link Layers of the OSI model; its competitors were the legacy Token Ring and Token Bus technologies. The wired infrastructure was originally based on varieties of copper-core coaxial cable, since replaced by copper twisted-pair cables inspired by the old telephone wiring, which are the standard today and terminate in the ubiquitous RJ-45 connector. Coaxial cable, the prior standard, is still widely used for cable television and for the Internet services provided by cable companies. Many people also do not realize that they can use their building's internal power wiring as a wired network infrastructure through the simple application of specialized adapters at their power sockets. Though these powerline connections are not as fast as Ethernet or fiber optic cabling, they have the automatic benefit of not requiring additional cabling throughout the building, and they do not interfere with the building's electricity.
Ethernet has gone through several evolutions in speed and distance. Due to the physical limitations of electrical signals sent over copper conductors, attenuation limits both speed and distance, but with the proper copper-core wiring, twisted to reduce cross-talk between the wire pairs, these cables have reached gigabit bandwidths. Category 5e and Category 6 cables, called Cat-5e and Cat-6 respectively, reach these bit rates. The copper-based technology is unlikely to go far beyond this stage, though, and with the development of sending light pulses through optical media, i.e. fiber optics, networks have achieved greater speeds over much greater distances. The data transfer is far more stable as well, given that light is barely affected by the electromagnetic fields (EMF) created by other electrical devices or signals, nor does it give off an EMF of its own, so there is no interference from cross-talk. With metal conductors such as copper core, EMF interference becomes more pronounced as more energy is placed into the system to achieve greater speeds and distances; because light rests on different physical principles, these limitations are overcome. To give an example, the Ethernet 1000BASE-T and -TX standards using Cat-6 twisted-pair cable deliver gigabit bandwidth over 100 meters, while the Ethernet 1000BASE-LX10 standard uses fiber optics to deliver gigabit bandwidth over 10 kilometers, a transfer distance a hundred times longer than the copper cable. With these wire technologies and the global infrastructure built on them, the Ethernet standard has become the backbone of the Internet.
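For quick reference, the reach and rate figures quoted above can be summarized in a small lookup; this is only a sketch of the nominal values from the text, and real installations vary with cable quality and optics:

```python
# Nominal reach of the Ethernet variants discussed above, using the figures
# quoted in the text; real installations vary with cable quality and optics.

ethernet_standards = {
    "1000BASE-T (Cat-5e/Cat-6 twisted pair)": {"rate_gbit_s": 1, "max_reach_m": 100},
    "1000BASE-LX10 (single-mode fiber)":      {"rate_gbit_s": 1, "max_reach_m": 10_000},
}

for name, spec in ethernet_standards.items():
    print(f"{name}: {spec['rate_gbit_s']} Gbit/s over up to {spec['max_reach_m']:,} m")
```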
Current 'Wireless' Networking Technologies
In order to provide network access to an area, you must invest time and material in establishing a physical infrastructure. In the early days of networking, this was all done through a wired infrastructure, either by laying dedicated cables or by using the existing telephone cable plant. Now that networking technology has evolved to meet increased demand, wireless standards have been established and widely implemented, providing the primary access for many of the networks in libraries and public areas.
Wireless networking is significantly slower, covers less distance, is considerably less stable, and is far less secure than wired networking. It has one major advantage, however, that makes it the technology of choice for home and wide-area network topologies that lack the expertise and resources to establish complex wired systems: it does not require the laying of wire (hence, wireless), and it can be used by mobile nodes on the network where plugging into a network jack is not feasible or presents a security risk. Wireless transfer of data is not a new concept; ever since the rise of radios and transceivers, it has been a cornerstone of communication technology. In addition to radio frequencies (RF), other forms of wireless technology include line-of-sight infrared, laser, and visible light as well as acoustic energy. RF has become the most widely used medium in wireless networking and in cellular telecommunication networks. In wireless networking, the various RF standards fall under the IEEE 802.11 family and the IMT (International Mobile Telecommunications) generational groups such as 2G, 3G, and 4G networks, with secondary short-range protocols such as Bluetooth. Like Ethernet for wired networking, the various wireless standards cover the Physical and Data Link Layers of the OSI model.
Wi-Fi technology is defined by the IEEE 802.11 standards a, b, g, and n, with several additional subdivisions within each letter. Each standard sets radio frequencies, sub-channels, and bandwidth requirements over certain distances. For example, one of the first consumer Wi-Fi standards, IEEE 802.11b, has a maximum raw data rate of 11 Mbit/s and uses the 2.4 GHz range. Unfortunately, many other RF devices, including microwaves, cell phones, and cordless phones, also use the 2.4 GHz range, so there were issues with devices interfering with each other's signals. Other standards were developed to overcome this challenge and to increase range and bandwidth. The most modern Wi-Fi standard on the market today is the IEEE 802.11n variant, augmented with MIMO (multiple-input multiple-output) technology through the use of more than one built-in antenna; it has a maximum raw data rate of 54 Mbit/s to 600 Mbit/s and uses both the 5 GHz and 2.4 GHz ranges, a capability referred to as dual-band.
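To make the difference in raw data rates concrete, the following rough calculation compares transfer times at the 802.11b and 802.11n figures quoted above; the file size is hypothetical, and real throughput is typically well below these raw rates once protocol overhead is accounted for:

```python
# Rough transfer-time comparison at the raw (PHY) data rates quoted above for
# 802.11b and 802.11n. Raw rates overstate real throughput, which is usually
# well under half of these figures after protocol overhead.

def transfer_minutes(size_gb: float, raw_rate_mbit_s: float) -> float:
    """Minutes to move size_gb gigabytes (1 GB taken as 1000 MB) at a raw rate."""
    return (size_gb * 8000) / raw_rate_mbit_s / 60

file_gb = 2.0  # hypothetical video file
for standard, rate in [("802.11b (11 Mbit/s)", 11), ("802.11n (600 Mbit/s)", 600)]:
    print(f"{standard}: ~{transfer_minutes(file_gb, rate):.1f} minutes for a {file_gb} GB file")
```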
New technologies are constantly being developed and introduced into the consumer market. The cellular network technologies, traditionally aimed at wide-area networking (WAN), are becoming stronger and more stable. WiMAX (Worldwide Interoperability for Microwave Access), defined by IEEE 802.16, is a technology derived from this family, and it is becoming an effective wireless competitor to established DSL and cable ISP services. The fourth-generation IMT standard in cellular networking, more commonly marketed as 4G, though not as fast, is far more accessible to people around the world. The wired technologies are countering with fully optical networks, greater stability, and better prices. Networking technology remains a field of intense competition and development.
Virtual Collections, Public and Private
With increases in the public use of web page technologies, namely coding standards as well as the plug-ins, back-end, and front-end applications that can be embedded in a page to add functionality, access to virtual collections and the databases that make them possible has become a daily part of our lives. Without these improvements, access to popular websites such as YouTube, Google Books, Wikipedia, and the thousands of other sites that act as portals to massive databases of indexed or stored information would be severely limited for the home user. Virtual libraries also tend to use standard web page technologies to provide relatively user-friendly access to their supporting databases. Altogether, these virtual collections and modes of access have significantly altered the way most patrons consume information, for browsing the virtual library is different from browsing a physical collection, and this in turn has affected how libraries mold their user environments. One element that is often lost while searching online sources is serendipity (Thomas, 2000). Patrons can make their best finds while perusing the stacks, notes Thomas, and one suggestion Thomas offers to assist researchers in the virtual library is to place electronic indices next to the collection in the physical library.
User perception of the university library website is an important consideration. Researchers have found that opinions of electronic material and of virtual libraries differ among user groups; undergraduates, for example, tend to have little experience with their particular institution's online library and less research experience in general than graduate students and faculty. Students are now required to have at least some technical ability in order to use academic library online resources, and technical expertise is also required of library and information professionals. More libraries are adopting integrated information technologies so that they may better serve their patrons, and these features enable users to customize and simplify websites. Research shows the importance of asking how the use of a university library's website can be enhanced. Enhancing virtual or electronic library resources requires a user perspective, and this perspective has been the focus of several studies referenced in Yong-Mi Kim's article, "Users' Perceptions of University Library Websites: A Unifying View." Studies suggest that people are likely to use a system when it is useful, and research has also shown that doctoral students and faculty members are more likely to find electronic resources useful. Other website attributes preferred by library users include a simple design, logical information layouts, ease of navigation, and customization (Kim, 2011).
Through marketing research, user feedback, and trial and error, private websites have arrived at the same conclusions. Take, for example, Google's long-standing design, simple and elegant, which takes minimalism to its aesthetic extreme, and contrast it with the information-overload visuals of other sites such as Yahoo! and AOL. Website design has become a high-value and practical art. Embedded client-side and server-side applications add considerable functionality to many of these websites. Technologies such as Flash have increasingly fallen out of favor with web developers, not because of their functionality, but because of the difficulty of indexing them and the reduced ability of search engines to crawl them for information; Flash is an image-based technology, as opposed to the text standards of HTML/XHTML. HTML coding has become more like a skeleton structure that additional web applications, such as JavaScript and PHP, dress according to unique user parameters and interactions. For example, every time you make a query in an Internet search engine, the results page is dynamically populated by the data returned from the search engine's web servers. The page coding provides the structure, but client-side applications such as JavaScript, DHTML, Flash, and Silverlight provide added functionality not possible through simple HTML coding alone. More recently, techniques have been developed to coordinate client-side scripting with server-side technologies such as PHP. Ajax, an acronym for Asynchronous JavaScript and XML, is a web development technique that combines several of these technologies for more streamlined development and portability between web browsers, creating an interactive and rich web viewing experience.
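As a rough illustration of the client-server division of labor behind such dynamic pages, the sketch below implements only the server side of an Ajax-style search endpoint using the Python standard library; a client-side script (JavaScript's XMLHttpRequest or fetch) would request the endpoint asynchronously and redraw part of the page with the returned JSON rather than reloading the whole document. The endpoint path and the tiny in-memory "catalog" are hypothetical:

```python
# Minimal sketch of the server side of an Ajax-style interaction, using only
# the Python standard library. A browser script would call /search?q=... in
# the background and update part of the page with the JSON it receives.
# The endpoint path and the in-memory catalog below are hypothetical.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

CATALOG = ["Digital Libraries", "Networking Basics", "Database Design"]

class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/search":
            self.send_error(404)
            return
        query = parse_qs(url.query).get("q", [""])[0].lower()
        hits = [title for title in CATALOG if query in title.lower()]
        body = json.dumps({"query": query, "results": hits}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), SearchHandler).serve_forever()
```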
Given these technologies, the Web has come a long way from its modest roots. The WWW Virtual Library was the first virtual collection to index sites on the web, a sort of manual search engine whose content was reviewed and vetted by thousands of volunteers. The site is still operational today (retrieved from https://vlib.org/):
Figure 7
The Old and the New: Tim Berners-Lee's original web catalog at CERN, 1991 (viewed in a modern Firefox web browser). This was the first virtual library on the web, offering access to a virtual collection of web links stored in databases and accessed via web servers. Photo by the author.
If you compare the first web page and the first virtual collection site with the complex and rich websites of today, the advancement of web page technology and web browsers since 1991 has done much to vault patrons' data consumption via virtual collections well ahead of their physical counterparts. Given these growing trends, libraries cannot afford to deny virtual access to their collections if they wish to retain the interest and support of their patrons. The libraries of today will have to invest as many resources in virtual content and online staff services as they do in the brick and mortar and physical collections of the traditional institution, and the library of the future will see that balance of content and services swing overwhelmingly toward the virtual side of the spectrum.
Database Models and Technologies that Make the Virtual Collection Possible
Database technology and design, along with the underlying data models, is at the very core of every virtual collection. Alongside the client portals each patron uses to access the virtual environment and the thousands of network connections and the Internet they form part of, the database structure and its supporting technology is the third element that makes every virtual collection possible. The database design and technology directly determine how the information within the virtual collection is stored and accessed by its patrons. Every librarian should become acquainted with basic database design, technology, and data model theory in order to understand the virtual collections that are a major part of their institution and that help mold their user environments. Data models are the basic informational designs into which collection data is fit so that it can be organized, stored, and retrieved in a manner that is useful and effective; essentially, this is the theory behind any database concept. The field of data models is populated with a bewildering cloud of colloquial and imprecise terms, as each model borrows and redefines terminology for the essential functions within its own system and logic; from one data model to another, the same term may even refer to a different basic element of the data organization. At present, there are four broad types of data models, with hybrids and amalgamations also being fairly common. The first was the Hierarchical model, followed by the Network model, the widely used Relational model, and the newest Object-oriented model, strongly influenced by the rise of object-oriented programming (OOP) in software development in the 1980s.
The Hierarchical data model is the oldest model. The core principle behind it is the one-parent-to-many-children data relationship inherent in its tree-like structure. We mostly see this in ordinary spreadsheet programs and computer file systems, where all data takes its meaning from its position in the hierarchical order; the Windows Registry is an important example of a Hierarchical data model still in continuous use. The Relational data model focuses on the relationships between data objects. Through the use of keys and established relations between data records, tables can be recombined according to the needs of an information query, as expressed in SQL (Structured Query Language) code, slightly modified per database management system (DBMS). Relational designs are the most common data models used today, but there are still some larger legacy hold-outs based on the Hierarchical and Network models, and plenty of hybrids are implemented within database technologies according to the nature of the data and its relationship and retrieval requirements. There are also data models that try to mimic the programming theory behind object orientation, such as the Object-oriented data model. Database hardware and management software are literally the nuts and bolts (and circuits and bits) that handle the data so it can be organized, stored, and retrieved in a manner that is fast, useful, and effective. Essentially, the data model and the management system together compose the structure behind every database.
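The contrast between the hierarchical and relational approaches can be sketched with a small, hypothetical catalog; here Python's bundled sqlite3 module stands in for a full relational DBMS, and the table and field names are invented for illustration:

```python
# Contrast of the two data models discussed above, using hypothetical catalog
# data. sqlite3 (bundled with Python) stands in for a full relational DBMS.

import sqlite3

# Hierarchical model: one parent, many children, meaning carried by position.
hierarchical_catalog = {
    "Main Library": {
        "Fiction": ["Dune", "Foundation"],
        "Non-fiction": ["A Brief History of Time"],
    }
}

# Relational model: rows related through keys and recombined at query time.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE branch (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE item (title TEXT, branch_id INTEGER REFERENCES branch(id))")
db.execute("INSERT INTO branch VALUES (1, 'Main Library')")
db.executemany("INSERT INTO item VALUES (?, 1)",
               [("Dune",), ("Foundation",), ("A Brief History of Time",)])

# The same question, "which titles does this branch hold?", answered by a join.
rows = db.execute("""SELECT item.title FROM item
                     JOIN branch ON item.branch_id = branch.id
                     WHERE branch.name = 'Main Library'""").fetchall()
print([title for (title,) in rows])
```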
At the moment, three broad database management technologies dominate the field. IBM has long been a heavy hitter, and its DB2 series of DBMS, currently DB2 9, has been on the market since the earliest days of digital technology. Oracle has the biggest name in the market, and its Database series, currently Database 11g, is long serving, taking the honor of being the first commercial database based on SQL, even though IBM created that powerful language but was slow to use it in its own products. Today, SQL is the most widely used method of accessing and managing a developed database system. Both IBM's and Oracle's DBMSs require specialized software running on specific server and networking hardware platforms, also developed and sold by these companies; because of this design, it is difficult not to think of these DBMSs as hardware-software systems, inseparable from each other. At the other end of the spectrum, MySQL is the newest to the market, but it has been boosted by the fact that it is free and open source and can run on most server systems, and is thus not dependent on specially designed and extremely expensive hardware platforms. MySQL is a popular choice for database functionality in many web applications and is a base component of the widely used LAMP web application stack, LAMP being an acronym for the free applications used by many web servers: Linux, Apache, MySQL, and the Perl/PHP/Python group. Whatever DBMS is used, databases will always be at the core of the functionality of any extensive virtual collection.
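As a brief sketch of how an application in a LAMP-style stack talks to MySQL, the following assumes the MySQL Connector/Python driver is installed and a MySQL server is running; the host, credentials, database, and table names are all hypothetical:

```python
# Minimal sketch of querying a MySQL database as used in a LAMP-style stack,
# assuming the MySQL Connector/Python driver is installed
# (pip install mysql-connector-python). All names and credentials are hypothetical.

import mysql.connector

conn = mysql.connector.connect(
    host="localhost",
    user="catalog_reader",        # hypothetical read-only account
    password="example-password",
    database="library_catalog",
)
cursor = conn.cursor()
cursor.execute("SELECT title, author FROM holdings WHERE format = %s", ("e-book",))
for title, author in cursor.fetchall():
    print(title, "-", author)
cursor.close()
conn.close()
```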
Online Databases
Vast online journal and book databases became the second great stimulus to virtual collections, along with the rise of networking technologies. Leveraging developments in data models and management systems, many companies positioned themselves to sell subscriptions to their online database content, mostly of a specialized research nature; examples include LexisNexis, Elsevier, and Gale. These online database companies have become very important sources for scholars conducting research, and their product lines depend heavily on the Internet and web technology as online access portals to their databases. Their content is mostly composed of digitized versions of large collections of older printed scholarly journals and monographs. Because these are specialized products, libraries have had to pay subscription fees in order to provide access to their patrons. These well-established services have become important factors in the progress of more formal virtual collections, though they are starting to lose significant ground to companies, such as Google, that offer general access to content for free. The publishing industry is going through the same growing pains and indecision over online content that the music and movie industries went through in the late 1990s and the first part of this century. Pirating of digital content is a result of over-restrictive attempts by copyright holders with unrealistic ideas about controlling their information once it is put onto the Internet; they will learn that controlling digital information is a near impossible task, as stronger industries with vastly more resources have tried and failed miserably before them. The publishing industry would do better to adapt its business models to harness the flow of information dissemination and commerce already well established on the Internet, as many companies have done. Any information, whether public or private, once put onto the Internet ought to be considered free, legally or not. But due to their closed-collection, subscription-based business models, the traditional online databases will never reach the size and market penetration of more open, though not necessarily free, virtual collections such as Google Books and YouTube, and they will remain secondary to these newer and more vibrant collections.
Advances in data models and management software and hardware will shape the future of more sophisticated computer systems, including OPAC systems. The key to advanced complexity and comprehension in computers, not necessarily digital technology alone, is leveraging the ability to collect, collate, and process large amounts of data from various sources in a short period of time. Improvements in online databases will help to improve each of these processes, since the database acts as a central technology on which the others are built, particularly collation, which requires a solid sense of order and correlation between data sets and records. The traditional data models used in computing are powerful, but they are not sophisticated enough for the next generation of information technology, which will need to process significantly larger amounts of data through increasingly faster network feeds. The advanced algorithms in search engine web-crawlers, the automated software programs that analyze web content for indexing in the search database, are steps in the right direction, but only the first few steps on a long road of development. A new database architecture and core data model, the LISA Informationbase and Taxonomy data model, is introduced in chapter 7; it is designed specifically to advance database development, data indexing, and collation with the above principles in mind for the next generation of information technology.
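As a toy illustration of the crawl-and-index loop such web-crawlers perform, the sketch below fetches a handful of pages, records which words appear on each, and queues the links it finds. The seed URL is the WWW Virtual Library mentioned earlier; a real crawler would also respect robots.txt, deduplicate aggressively, and distribute the work across many machines:

```python
# Toy sketch of a search engine crawl-and-index loop: fetch a page, record
# which words appear on it, queue the links it contains. Standard library only;
# a production crawler is far more careful and runs at massive scale.

import re
import urllib.request
from collections import defaultdict

def crawl(seed_url: str, max_pages: int = 5) -> dict:
    index = defaultdict(set)              # word -> set of URLs containing it
    queue, seen = [seed_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue                      # skip pages that fail to load or decode
        for word in re.findall(r"[a-z]{3,}", html.lower()):
            index[word].add(url)
        queue.extend(re.findall(r'href="(https?://[^"#]+)"', html))
    return index

index = crawl("https://vlib.org/")        # seed taken from the text above
print(len(index), "distinct terms indexed")
```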
- Continues in part 4 of 4.
REFERENCES
Please see the last part.
Copyright Joseph Walker, 2020