Web Services
Web services improve development efficiency by letting providers share interface descriptions, written as WSDL documents, with consumers. Providers publish these services in a UDDI registry, where consumers can search for them and then invoke them using the SOAP protocol.
Recommender Systems
Recommender systems help users find relevant information amid large data volumes by leveraging user preferences like "likes" and "dislikes." They use information filtering to narrow down choices and save users’ search time. These systems are widely used in e-commerce and social networks.
Background & Collaborative QoS Prediction
Recommender systems assist users in selecting web services by predicting Quality of Service (QoS) properties, which can be user-dependent or user-independent. Collaborative QoS prediction relies on sharing past user experiences to estimate QoS for services accurately without expensive direct testing. Users contribute their observed QoS data to a centralized server, and prediction accuracy improves as more data are contributed.
Methodology
Collaborative filtering (CF) requires active user participation, representation of user interests, and matching algorithms. The goal is to accurately predict unknown QoS values by testing real-world services, expanding QoS properties, and incorporating user feedback and activity data.
Literature Survey
CF faces challenges such as data sparsity and privacy. Techniques include memory-based, model-based, and hybrid CF. Recent work explores privacy-aware systems, runtime service adaptation, and advanced algorithms such as convolutional neural networks. Machine learning methods such as decision trees, support vector machines (SVM), and genetic algorithms improve classification and prediction accuracy. Blockchain has been proposed to enhance the reliability of QoS prediction.
System Architecture
The proposed system, HAPA with fuzzy clustering, improves user/item similarity measurement over traditional Pearson correlation, leading to better QoS prediction quality. Modules calculate similarity measures (max, average, reciprocal standard deviation) to produce accurate predictions.
Collaborative Filtering Types
Neighborhood-based CF: Uses observed QoS data to find similar users/services for prediction.
Model-based CF: Uses models such as probabilistic matrix factorization (PMF) to predict unknown QoS values.
User-based and Item-based CF: Calculate similarity between users or items to make predictions.
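As a rough illustration of the model-based approach listed above, the sketch below factorizes a small user-by-service QoS matrix with stochastic gradient descent. The function name `factorize_qos`, the hyperparameters, and the toy response-time values are assumptions made for illustration; this is a plain regularized matrix factorization, not the PMF or AMF implementation discussed elsewhere in this document.

```python
import numpy as np

def factorize_qos(R, mask, rank=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Fit R ~ U @ S.T by SGD, using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_services = R.shape
    U = 0.1 * rng.standard_normal((n_users, rank))     # user latent factors
    S = 0.1 * rng.standard_normal((n_services, rank))  # service latent factors
    observed = np.argwhere(mask)                       # (user, service) index pairs
    for _ in range(epochs):
        rng.shuffle(observed)                          # visit observations in random order
        for u, s in observed:
            err = R[u, s] - U[u] @ S[s]                # error on one observed QoS value
            u_old = U[u].copy()
            U[u] += lr * (err * S[s] - reg * U[u])
            S[s] += lr * (err * u_old - reg * S[s])
    return U, S

# Hypothetical response times in seconds; NaN marks unobserved user-service pairs.
R = np.array([
    [0.50, 1.00, 2.00, np.nan],
    [0.60, 1.20, np.nan, 0.36],
    [np.nan, 0.80, 1.60, 0.24],
    [0.75, np.nan, 3.00, 0.45],
])
mask = ~np.isnan(R)
U, S = factorize_qos(R, mask)
predicted = U @ S.T  # dense matrix that also holds estimates for the missing entries
```

The missing entries of `R` are never read during training; their estimates come only from the learned user and service factors.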
Performance Analysis
Using real-world QoS data (response time and throughput), the Adaptive Matrix Factorization (AMF) approach outperforms other methods (UPCC, IPCC, UIPCC, PMF) in accuracy (MRE, NPRE) and prediction error distribution. Data transformation improves accuracy significantly, and increased data density reduces errors and overfitting. AMF also shows faster convergence and better scalability to new users and services.
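The two accuracy metrics named above can be sketched as follows, assuming MRE denotes the median relative error and NPRE the ninetieth-percentile relative error, which is how these abbreviations are commonly expanded in QoS prediction work. The function names and sample values are hypothetical.

```python
import numpy as np

def relative_errors(predicted, actual):
    """Element-wise |predicted - actual| / actual (actual values must be non-zero)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.abs(predicted - actual) / actual

def mre(predicted, actual):
    """Median relative error (assumed meaning of MRE)."""
    return float(np.median(relative_errors(predicted, actual)))

def npre(predicted, actual):
    """Ninetieth-percentile relative error (assumed meaning of NPRE)."""
    return float(np.percentile(relative_errors(predicted, actual), 90))

# Hypothetical observed and predicted response times (seconds).
actual = [1.0, 2.0, 4.0, 0.5, 10.0]
predicted = [1.1, 1.8, 4.0, 0.75, 5.0]
print(round(mre(predicted, actual), 3), round(npre(predicted, actual), 3))  # 0.1 0.5
```

Relative rather than absolute errors are a natural fit here because response time and throughput values can span several orders of magnitude.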
Conclusion
Collaborative filtering (CF) is one of the most successful recommender techniques. Broadly, there are memory-based CF techniques, such as the neighborhood-based CF algorithm; model-based CF techniques, such as Bayesian belief net, clustering, and Markov decision process (MDP)-based CF algorithms; and hybrid CF techniques, such as the content-boosted CF algorithm and personality diagnosis. As a representative memory-based CF technique, neighborhood-based CF computes the similarity between users or items and then uses a weighted sum of ratings or a simple weighted average to make predictions based on the similarity values. Pearson correlation and vector cosine similarity are the commonly used similarity measures; they are usually computed between items co-rated by a given user, or between users who have both rated a given item. Neighborhood-based methods can also produce top-N recommendations from the similarity values. Memory-based CF algorithms are easy to implement and perform well on dense datasets. Their shortcomings include dependence on user ratings, decreased performance when data are sparse, difficulty handling new users and items, and limited scalability for large datasets. Memory-based CF on imputed rating data and on dimensionality-reduced rating data produces more accurate predictions than on the original sparse rating data.
Model-based CF techniques need to train algorithmic models, such as Bayesian belief nets, clustering techniques, or MDP-based models, to make predictions for CF tasks. Advanced Bayesian belief net CF algorithms that can deal with missing data have been found to perform better than simple Bayesian CF models and Pearson correlation-based algorithms. Clustering CF algorithms make recommendations within small clusters rather than over the whole dataset, and achieve better scalability.
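The neighborhood-based prediction described above (Pearson similarity over co-rated entries, followed by a similarity-weighted sum) might look like the sketch below. The mean-centered weighting, the restriction to positively correlated neighbors, the function names, and the toy rating matrix are all assumptions chosen for illustration; other variants of the weighted sum exist.

```python
import numpy as np

def pearson_sim(a, b):
    """Pearson correlation over the entries observed (non-NaN) in both vectors."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < 2:
        return 0.0
    x, y = a[both], b[both]
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0

def predict_user_based(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated `item`."""
    sims = []
    for other in range(R.shape[0]):
        if other == user or np.isnan(R[other, item]):
            continue
        s = pearson_sim(R[user], R[other])
        if s > 0:                         # keep positively correlated neighbors only
            sims.append((s, other))
    user_mean = float(np.nanmean(R[user]))
    if not sims:
        return user_mean                  # fall back to the user's own average
    top = sorted(sims, reverse=True)[:k]
    num = sum(s * (R[o, item] - np.nanmean(R[o])) for s, o in top)
    den = sum(s for s, _ in top)
    return float(user_mean + num / den)

# Rows are users, columns are items; NaN marks a missing rating.
R = np.array([
    [5.0, 3.0, 4.0, np.nan],
    [4.0, 2.0, 3.0, 2.0],
    [1.0, 5.0, 2.0, 5.0],
])
print(predict_user_based(R, user=0, item=3))  # 3.25
```

In this toy matrix only the second user is positively correlated with the first, so the prediction is the first user's mean rating shifted by the second user's deviation from their own mean on the target item.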
An MDP-based CF algorithm incorporates the users’ action of taking or not taking a recommendation into the model, and the optimal solution to the MDP maximizes a function of its reward stream. The MDP-based CF algorithm can therefore bring profit to the system that deploys it. Model-based CF techniques also have downsides: they may not be practical when the data are extremely sparse; solutions that use dimensionality reduction or transform multiclass data into binary data may decrease recommendation performance; the model-building expense may be high; and many algorithms trade prediction performance against scalability.
Most hybrid CF techniques combine CF methods with content-based techniques or other recommender systems to alleviate the shortcomings of either system and to improve prediction and recommendation performance. Despite the improved performance, hybrid CF techniques rely on external content information that is usually not available, and they are generally more complex. It is always desirable to design a CF approach that is easy to implement, takes few resources, produces accurate predictions and recommendations, and overcomes the challenges presented by real-world CF applications, such as data sparsity, scalability, synonymy, and privacy protection. Although no cure-all solution is available yet, researchers are developing solutions for each of these problems.
To alleviate the sparsity problem of CF tasks, missing-data algorithms such as TAN-ELR, imputation techniques such as Bayesian multiple imputation, and dimensionality reduction techniques such as SVD and matrix factorization can be used. Clustering CF algorithms and other approaches such as an incremental SVD CF algorithm have been found promising in dealing with the scalability problem. Latent semantic indexing (LSI) helps handle the synonymy problem, and sparse factor analysis has been found helpful in protecting user privacy. Besides addressing the above challenges, future CF techniques should also be able to make accurate predictions in the presence of shilling attacks and noisy data, and be effectively applied in fast-growing mobile applications. There are many evaluation metrics for CF techniques. The most commonly used metrics for prediction accuracy include mean absolute error (MAE), recall and precision, and ROC sensitivity. Because artificial data are usually not reliable due to the characteristics of CF tasks, real-world datasets from live experiments are more desirable for CF research.
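One simple way to apply SVD-based dimensionality reduction to a sparse rating matrix is sketched below. It assumes missing entries are first filled with item (column) means and that a low-rank reconstruction then serves as the densified matrix; the function name `svd_impute` and the sample ratings are hypothetical, and this is not the incremental SVD algorithm mentioned above.

```python
import numpy as np

def svd_impute(R, rank=2):
    """Fill NaN entries with column means, then return a rank-`rank` SVD reconstruction."""
    R = np.asarray(R, dtype=float)
    col_means = np.nanmean(R, axis=0)               # per-item average of observed ratings
    filled = np.where(np.isnan(R), col_means, R)    # mean-impute the missing entries
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    k = min(rank, len(s))
    return (U[:, :k] * s[:k]) @ Vt[:k]              # keep the k largest singular values

# Rows are users, columns are items; NaN marks a missing rating.
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, 5.0, np.nan],
    [np.nan, 1.0, 4.0, 5.0],
])
dense = svd_impute(ratings, rank=2)  # every entry now has an estimate
```

Memory-based CF can then be run on `dense` instead of the original sparse matrix, which is the setting the text describes as producing more accurate predictions.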
We have presented a novel approach to assist users and Web service providers in the composition and selection of composite services that are more privacy-preserving. Compared with other proposals for privacy-preserving Web service composition, our approach supports the specification of fine-grained privacy policies and preferences based on different privacy dimensions, namely purpose, visibility, retention period, and sensitivity. In addition, our approach ranks the generated composite Web services by their privacy level, which quantifies the risk of unauthorized disclosure of user information based on sensitivity, visibility, and retention period. As future work, we plan to conduct an extensive evaluation of our Java-based prototype. First, we will evaluate its performance with respect to the number of candidate Web services, the complexity of the privacy policies of the orchestrator and component services, and the (re)delegation depth.
Then, we will conduct a controlled experiment with master's students in computer science to evaluate participants’ perceived ease of use, perceived usefulness, and intention to use according to the Technology Acceptance Model (TAM).